After I used Python and Scikit-Learn for Twitter Sentiment analysis – which worked quite well – I started to read and learn about Genetic Algorithms.
Genetic Algorithms (GA) provide a very efficient way to search for optimum solutions in a very huge space of possible solutions. This does not mean that a GA always finds the best solution out there, but it finds a solutions that is working.
Of course you could use brute force to loop through all possible solutions and find the best one. But with large search spaces this would lead to a veeeeeery long runtime.
Example:
Imagine you want to find a certain number with a certain number of 1’s and 0’s (only 0 and 1 is allowed for each digit).
Wanted solution / number is 10 digits with only 1’s (or any other combination of 1’s and 0’s). This would mean 2^10 = 1024 options. Lets say each option takes 1ms to test. Then the overall time that a program needs would be ~1 second.
If you wanted 1000 digits with only 1’s then that means 2^1000 = 1.072 * 10^301 combinations. If each combination would also take 1ms then the overall time needed would be 3.398 * 10^290 years !!!
You see, brute force does not work – at least in my lifetime.
Genetic Algorithms are inspired by Darwin’s theory about evolution. That means GA uses crossover and mutations for finding an optimum solution to your problem.
The core functionality for deciding which partial solution (sometimes called individual or creature) should be used for further reproduction is the calculation of the fitness for each individual (i.e. how high can an individual jump, how fast can it go, or how closed does it come to a certain point, etc.)
Each individual has chromosomes which are encoded by genes.
I started with a fairly easy problem – finding an individual that creates the string “danielkoitzsch” (remember: brute force would be 14^26 possible gene combinations)
The basic idea of the algorithms is:
- [Start] Generate random population of n chromosomes (suitable solutions for the problem)
- [Fitness] Evaluate the fitness f(x) of each chromosome x in the population
- [New population] Create a new population by repeating following steps until the new population is complete
- [Selection] Select two parent chromosomes from a population according to their fitness (the better fitness, the bigger chance to be selected)
- [Crossover] With a crossover probability cross over the parents to form a new offspring (children). If no crossover was performed, offspring is an exact copy of parents.
- [Mutation] With a mutation probability mutate new offspring at each locus (position in chromosome).
- [Accepting] Place new offspring in a new population
- [Replace] Use new generated population for a further run of algorithm
- [Test] If the end condition is satisfied, stop, and return the best solution in current population
- [Loop] Go to step 2
My main loop looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
// create GA object GeneticAlgorithm ga(100, 0.01, 0.95, 2); // initialize population const int chromosomeLength = 14; std::shared_ptr<Population> population = ga.initPopulation(chromosomeLength); std::cout << "Population size after initialization: " << population->size() << std::endl; // eval population ga.evalPopulation(population); std::cout << "Population size after first evaluation: " << population->size() << std::endl; // keep track of current generation int generation = 1; while (!ga.isTerminationConditionMet(population)) { // print fittest individual from population std::cout << "Best solution (gen. " << generation << ", pop.fit. " << population->getPopulationFitness() << "): " << population->getFittest(0)->toString() << std::endl; // apply crossover population = ga.crossoverPopulation(population); // apply mutation population = ga.mutatePopulation(population); // eval population ga.evalPopulation(population); // next generation generation++; } std::cout << "Found solution in " << generation << " generations" << std::endl; std::cout << "Best solution: " << population->getFittest(0)->toString() << std::endl; |
The crossover function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
std::shared_ptr<Population> GeneticAlgorithm::crossoverPopulation(std::shared_ptr<Population> population) { // create new population std::shared_ptr<Population> newPopulation(new Population(population->size(), population->getChromosomeLength())); // loop over current population by fitness for (int individualIdx = 0; individualIdx < population->size(); individualIdx++) { std::shared_ptr<Individual> parent_A = population->getFittest(individualIdx); int randomNo = s_Distribution(s_Generator); // apply crossover to this individual ? if (m_CrossoverRate > (randomNo / 100.0) && individualIdx > m_ElitismCount) { // take only individuals that are not the top fittest ! the best will be moved to the new generation immediately. // initialize offspring std::shared_ptr<Individual> offspring(new Individual(parent_A->getChromosomeLength())); // now find second parent for "reproduction" std::shared_ptr<Individual> parent_B = selectParent(population); // loop over genome for (int geneIdx = 0; geneIdx < parent_A->getChromosomeLength(); geneIdx++) { // use half of A's genes and half of B's genes float randomNo2 = (float) s_Distribution(s_Generator) / 100.0f; if (randomNo2 > 0.5f) { offspring->setGene(geneIdx, parent_A->getGene(geneIdx)); } else { offspring->setGene(geneIdx, parent_B->getGene(geneIdx)); } } // add offspring to new population newPopulation->setIndividual(individualIdx, offspring); } else { // add individual to new population without applying crossover - since the individual is elite !! newPopulation->setIndividual(individualIdx, parent_A); } } return newPopulation; } |
The mutation function:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
std::shared_ptr<Population> GeneticAlgorithm::mutatePopulation(std::shared_ptr<Population> population) { // create new population std::shared_ptr<Population> newPopulation(new Population(population->size(), population->getChromosomeLength())); // loop over current population by fitness for (int individualIdx = 0; individualIdx < population->size(); individualIdx++) { std::shared_ptr<Individual> individual = population->getFittest(individualIdx); // skip mutation if this is an elite individual if (individualIdx >= m_ElitismCount) { // loop over individual's genes for (int geneIdx = 0; geneIdx < individual->getChromosomeLength(); geneIdx++) { // does this gene need mutation? int randomNo = s_Distribution(s_Generator); if (m_MutationRate > randomNo / 100.0) { randomNo = s_Distribution(s_Generator); double charIndex = 25.0 * (randomNo / 100.0); char newGene = char(charIndex + 97); // mutate gene individual->setGene(geneIdx, newGene); } } } // add individual to population newPopulation->setIndividual(individualIdx, individual); } return newPopulation; } |
And finally the fitness calculation function (this one is pretty simple !):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
double GeneticAlgorithm::calcFitness(std::shared_ptr<Individual> individual) { // track no of correct genes int correctGenes = 0; std::vector<char> chromosome = individual->getChromosome(); if (chromosome[0] == 'd') correctGenes++; if (chromosome[1] == 'a') correctGenes++; if (chromosome[2] == 'n') correctGenes++; if (chromosome[3] == 'i') correctGenes++; if (chromosome[4] == 'e') correctGenes++; if (chromosome[5] == 'l') correctGenes++; if (chromosome[6] == 'k') correctGenes++; if (chromosome[7] == 'o') correctGenes++; if (chromosome[8] == 'i') correctGenes++; if (chromosome[9] == 't') correctGenes++; if (chromosome[10] == 'z') correctGenes++; if (chromosome[11] == 's') correctGenes++; if (chromosome[12] == 'c') correctGenes++; if (chromosome[13] == 'h') correctGenes++; // calculate fitness double fitness = (double) correctGenes / individual->getChromosomeLength(); // store fitness individual->setFitness(fitness); return fitness; } |
Here you can download my source files: GeneticAlgorithm