I have previously been studying a number of nature-inspired algorithms, that is, optimisation methods inspired by natural phenomena. In this article, I’ll cover the basics of creating a genetic algorithm (GA).
The nature of evolutionary algorithms
A GA is an evolutionary algorithm. Evolutionary algorithms are inspired by the mechanisms of biological evolution by means of natural selection (as proposed by Darwin). These mechanisms revolve around the modification and propagation of genetic material over time: parents contribute genetic material to create offspring, and along the way there is some probability of natural mutation.
In layman’s terms
Basically, a GA mimics the evolution of some population with a sequence of selections of ‘genetic’ traits.
To give you a feel for the difference between simple search algorithms and a GA, let’s say you want to guess the number I’m thinking of between 1 and 1 million. You could just try random numbers from 1 to 1 million, but it would take a long time to cover the entire search space, and who knows when you would hit my number (it’s 280393). If, however, I can tell you whether your guess was hot or cold, you’d quickly improve.
A more relevant example could be optimising a travel path represented as a graph (see the picture below). The green path represents the actual path that minimises the travel time, and the red path represents a proposed solution. If we had more destinations to cover (more vertices in the graph), we could make an algorithm that in each iteration generated \(N\) solution proposals and evaluated the theoretical travel time of each proposal. Then, for the proposals with the least travel time, the algorithm would use parts of their routes to create a new route and evaluate its travel time. Eventually, the algorithm converges towards a time-optimal route without systematically testing every possible path imaginable.
GAs are useful whenever we have an idea of the ‘fitness’ of proposed solutions. They can help us reach a desirable solution significantly faster than brute force searches can. A brute force method will waste time testing every possible solution, while the genetic algorithm will propose new solutions with characteristics similar to the best proposals so far.
The concept of a GA is generally very simple to grasp. Let’s break the task at hand down into a simple recipe and gradually add more to it:
0. Formulate the objective and fitness functions
1. Generate an initial population of \(N\) members and compute their fitness
2. Reproduce \(N\) children and replace the old population. This includes:
    - Selection: select parents
    - Crossover: cross over genetic material from the parents to create offspring with similar characteristics
    - Mutation: mutate the genetic material of the offspring with some probability
    - Fitness: compute each new member’s fitness
3. Repeat step 2 until the population converges
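The recipe above can be sketched as a minimal Python loop. The helper logic here (tournament selection, one-point crossover, per-bit mutation, a fixed generation count as the stopping rule) is one illustrative set of choices, not the only way to fill in each step:

```python
import random

def run_ga(fitness, n_members=20, n_bits=8, n_generations=50, p_mut=0.01):
    """Minimal GA loop evolving a population of bit arrays.
    Here, higher fitness is assumed to be better."""
    # Step 1: random initial population of bit arrays
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(n_members)]
    best = max(population, key=fitness)
    for _ in range(n_generations):           # Step 3: repeat step 2
        children = []
        while len(children) < n_members:     # Step 2: reproduce N children
            # Selection: each parent is the fittest of 3 random members
            p1 = max(random.sample(population, 3), key=fitness)
            p2 = max(random.sample(population, 3), key=fitness)
            # Crossover: one-point crossover of the parents' bits
            cut = random.randrange(1, n_bits)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability
            child = [b ^ 1 if random.random() < p_mut else b for b in child]
            children.append(child)
        population = children                # replace the old population
        best = max(population + [best], key=fitness)
    return best

# Toy fitness: count of 1-bits, so the GA converges towards all ones
print(run_ga(sum))
```

The toy fitness function (`sum`, i.e. the number of 1-bits) stands in for step 0; a real problem would supply its own.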
Formulate the objective and fitness functions
Your objective function needs to be expressed in a way that can act as the DNA of the solutions. If your problem cannot naturally be expressed in a linear (and probably ideally also binary) representation, this algorithm is not what you need to solve your problem. Formulating your problem suitably is not a trivial task, and it is therefore naturally a crucial part of the design. For this tutorial, we’ll exemplify the algorithm using bit arrays.
Your fitness function is also a potential computational bottleneck, so keep it as simple as possible: it has to be evaluated many times per iteration.
For this tutorial, let’s say our solutions are bit arrays describing travel paths, and the fitness function is either the number of miles or the time it takes to travel along a path – which we want to minimise.
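As a sketch of such a fitness function: here a candidate route is a sequence of vertex indices (a decoded form of the bit array), and the fitness is the total travel time along its edges. The travel-time matrix is made up for illustration:

```python
# Hypothetical symmetric travel-time matrix: times[i][j] is the time
# between vertices i and j (values invented for this example).
times = [
    [0, 4, 7, 3],
    [4, 0, 2, 5],
    [7, 2, 0, 6],
    [3, 5, 6, 0],
]

def travel_time(route):
    """Fitness to minimise: total time along consecutive edges."""
    return sum(times[a][b] for a, b in zip(route, route[1:]))

print(travel_time([0, 1, 2, 3]))  # 4 + 2 + 6 = 12
print(travel_time([0, 3, 1, 2]))  # 3 + 5 + 2 = 10, the fitter route
```

Since we minimise, a route with a smaller total time is the fitter one.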
Generate an initial population of N members and compute their fitness
To initiate your algorithm, generate N solutions: N bit arrays randomly filled with 1’s and 0’s. Using the fitness function, evaluate the fitness of each population member.
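A minimal sketch of this initialisation step, with the bit count standing in as a placeholder fitness function:

```python
import random

N, n_bits = 6, 8

def fitness(member):
    # Placeholder fitness: number of 1-bits in the array
    return sum(member)

# N bit arrays randomly filled with 1's and 0's
population = [[random.randint(0, 1) for _ in range(n_bits)]
              for _ in range(N)]
# Evaluate the fitness of each population member
scores = [fitness(m) for m in population]
```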
Reproduce N children and replace the old population
The principle behind reproduction can be summarised in the following figure. Two parents combine their genetic material in order to create two offspring:
Selecting parents can be done in various ways. You could:
- Sort all population members by their fitness and use the first half of the population to generate offspring.
- Randomly select population members with fitness above some threshold and generate offspring, until \(N\) new members are created.
- Identify all fit population members, but only generate offspring from a sample of them.
- Define some other way of doing it.
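Two of these selection strategies, sketched in Python. Higher fitness is assumed to be better here; invert the sort and comparison if you minimise:

```python
import random

def top_half(population, fitness):
    """Sort members by fitness and keep the fittest half as parents."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:len(ranked) // 2]

def threshold_pairs(population, fitness, threshold, n_pairs):
    """Randomly draw parent pairs whose fitness exceeds a threshold.
    Assumes at least two members clear the threshold."""
    fit = [m for m in population if fitness(m) > threshold]
    return [random.sample(fit, 2) for _ in range(n_pairs)]
```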
Whatever you choose, the idea is to combine the genetic material of two fit members into new ones. This could be as simple as copying 50 % of each parent’s genetic material and combining them as in the picture above, or it could be a more random combination:
The process of combining the parents’ genetic material is called a crossover. After the crossover, run each offspring through a mutation function that, with some predefined low probability, mutates (i.e. changes) some of the offspring’s genes. This can be done with a simple random uniform number \(\in [0,1] \), so that if the random number is below the threshold, some mutation will happen:
There are no restrictions on the mutations, and it can be fruitful to play around with this. Maybe you’ll find that a larger mutation probability at the beginning of the search helps explore the search domain, but that you want it to decrease towards the end.
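The crossover and mutation steps above can be sketched like this. One-point crossover and per-bit flipping are illustrative choices; the default mutation rate is an assumption you should tune:

```python
import random

def crossover(parent_a, parent_b):
    """One-point crossover: two children, each taking one part
    of each parent's bit array."""
    cut = random.randrange(1, len(parent_a))
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def mutate(member, p=0.01):
    """Flip each bit independently with a small probability p:
    a uniform draw in [0, 1) below p triggers the mutation."""
    return [bit ^ 1 if random.random() < p else bit for bit in member]
```

A time-varying schedule, e.g. passing a `p` that shrinks with the iteration count, is one way to get the large-then-small mutation behaviour described above.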
When you have produced N new population members, compute each offspring’s fitness and save the fittest candidate so far as the current best solution.
Repeat step 2, until the population converges
You’ll find that after a number of iterations, all the population members will share similar characteristics, as the best genetic traits survive better. Eventually, they will all represent the same solution.
It might not always be time efficient or necessary to find the ideal solution to the problem. Sometimes, it is enough to find an approximate solution. It is therefore very important to define a criterion for ending the search, either after a fixed number of iterations or when the solution converges to some precision.
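One convergence check is to stop when (nearly) all members encode the same solution; a sketch, where the `tolerance` fraction is an assumed tuning knob:

```python
def has_converged(population, tolerance=0.9):
    """Stop when a fraction `tolerance` of the members are identical,
    i.e. the population has collapsed onto one dominant solution."""
    counts = {}
    for member in population:
        key = tuple(member)              # lists aren't hashable; tuples are
        counts[key] = counts.get(key, 0) + 1
    return max(counts.values()) / len(population) >= tolerance
```

In practice you would combine this with a cap on the number of iterations, so the search always terminates.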