# Genetic Algorithms – Basic Implementation

Standard

After I used Python and Scikit-Learn for Twitter Sentiment analysis – which worked quite well – I started to read and learn about Genetic Algorithms.

Genetic Algorithms (GA) provide a very efficient way to search for optimum solutions in a very huge space of possible solutions. This does not mean that a GA always finds the best solution out there, but it finds a solutions that is working.
Of course you could use brute force to loop through all possible solutions and find the best one. But with large search spaces this would lead to a veeeeeery long runtime.

Example:
Imagine you want to find a certain number with a certain number of 1’s and 0’s (only 0 and 1 is allowed for each digit).
Wanted solution / number is 10 digits with only 1’s (or any other combination of 1’s and 0’s). This would mean 2^10 = 1024 options. Lets say each option takes 1ms to test. Then the overall time that a program needs would be ~1 second.
If you wanted 1000 digits with only 1’s then that means 2^1000 = 1.072 * 10^301 combinations. If each combination would also take 1ms then the overall time needed would be 3.398 * 10^290 years !!!
You see, brute force does not work – at least in my lifetime.

Genetic Algorithms are inspired by Darwin’s theory about evolution. That means GA uses crossover and mutations for finding an optimum solution to your problem.

The core functionality for deciding which partial solution (sometimes called individual or creature) should be used for further reproduction is the calculation of the fitness for each individual (i.e. how high can an individual jump, how fast can it go, or how closed does it come to a certain point, etc.)

Each individual has chromosomes which are encoded by genes.

I started with a fairly easy problem – finding an individual that creates the string “danielkoitzsch” (remember: brute force would be 14^26 possible gene combinations)

The basic idea of the algorithms is:

1. [Start] Generate random population of n chromosomes (suitable solutions for the problem)
2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
3. [New population] Create a new population by repeating following steps until the new population is complete
1. [Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)
2. [Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.
3. [Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
4. [Accepting] Place new offspring in a new population
4. [Replace] Use new generated population for a further run of algorithm
5. [Test] If the end condition is satisfied, stop, and return the best solution in current population
6. [Loop] Go to step 2

My main loop looks like:

The crossover function:

The mutation function:

And finally the fitness calculation function (this one is pretty simple !):