“The advantage of evolutionary algorithms is their ability
to address problems for which there is no human expertise.
Even though human expertise should be used when it is needed and
available,
it often proves less than adequate for automated problem-solving
routines.
Certain problems exist with expert systems:
the experts may not agree, may not be qualified,
may not be self-consistent, or may simply be in error.
Artificial intelligence may be applied to several difficult problems
requiring high computational speed,
but such systems cannot compete with human intelligence.
As Fogel (1995) declared, they solve problems,
but they do not solve the problem of how to solve problems.
In contrast, evolutionary computation provides a method
for solving the problem of how to solve problems.”
Natural Evolution | Problem Solving |
---|---|
Environment | Problem |
Individual | Candidate solution |
Fitness | Solution quality |
A population of individuals exists in an environment with limited
resources.
Competition for those resources causes differential survival
and reproduction of the fitter individuals,
those better adapted to the particular environment they
encounter.
Fitness is like the “fit of a glove”, not the “fitness of an athlete”!
These individuals act as seeds for the generation of new
individuals.
Diversity is introduced through recombination and mutation.
The new individuals have their fitness “evaluated” and “compete”
(possibly also with parents) for survival.
Over time, natural selection causes a rise in the fitness of the
population.
It causes a population’s gene pool to adapt to its particular past
demands.
EAs fall into the category of “generate and test” algorithms.
They are stochastic, population-based algorithms.
Variation operators (recombination and mutation) create the necessary
diversity,
and thereby facilitate novelty.
Selection reduces diversity,
and acts as a force pushing or selecting for quality.
Outline for the day:
Recombination happens at a slightly different stage in living
organisms,
but it’s logically equivalent.
Optimization according to some fitness-criterion
(optimization on a fitness landscape)
There are two competing forces:
If these two forces occur together with differential survival,
evolution must (in general) happen.
Further, for evolution to succeed,
we cannot ignore either force.
The following major parts are to be defined for each problem to be solved:
Representation
Fitness function (a.k.a. objective function)
Selection mechanisms
Initialization parameters
Parameter choices
Termination condition
Role:
Provides code for candidate solutions that can be manipulated by
variation operators.
Leads to two levels of existence:
the phenotype (the candidate solution, as evaluated in the problem context),
and the genotype (its encoding, which the variation operators manipulate).
Implies two mappings:
encoding (phenotype to genotype),
and decoding (genotype to phenotype).
Chromosomes contain genes,
which are in positions called loci,
and have a value (allele):
Example representation: represent integer values by their binary
code
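As a sketch, such an integer-to-binary mapping in Python (the 5-bit width is an arbitrary choice here, matching the x-squared example later):

#!/usr/bin/python3

def encode(x: int, n_bits: int = 5) -> list[int]:
    # Phenotype (integer) -> genotype (bits, most significant first).
    return [(x >> i) & 1 for i in reversed(range(n_bits))]

def decode(genome: list[int]) -> int:
    # Genotype (bits) -> phenotype (integer).
    x = 0
    for bit in genome:
        x = (x << 1) | bit
    return x

assert decode(encode(13)) == 13  # 13 <-> [0, 1, 1, 0, 1]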
In order to find the global optimum,
every feasible solution must be represented in genotype space
A fitness function is a particular type of objective function
that is used to summarize, as a single figure of merit,
how close a given design solution is to achieving the set aims.
Role:
Represents the task to solve, the requirements to adapt to.
Can be seen as “the environment”.
Enables selection (since it provides basis for comparison).
Some phenotypic traits may be advantageous or desirable,
e.g. big ears;
such traits are rewarded with more offspring,
which are expected to carry the same trait.
Assigns a single real-valued fitness to each individual
phenotype.
This number forms the basis for selection.
Thus, more discrimination (a larger range of different values) is
better.
Typically we talk about fitness being maximized.
Some problems may be best posed as minimization problems,
but conversion is a trivial inversion.
Role:
Holds the candidate solutions of the problem as individuals
(genotypes).
Formally, a population is a multi-set of individuals;
repetitions are possible.
The population is the basic unit of evolution:
the population is evolving, not the individuals.
Selection operators act on population level.
Variation operators act on individual level.
Selection operators usually take whole population into account.
Reproductive probabilities are relative to the current
generation.
Some sophisticated EAs also assert a spatial structure on the
population, like a grid.
Diversity of a population refers to the number of different fitnesses
/ phenotypes / genotypes present.
These are not the same thing.
There are many ways to define diversity formally.
One can build an EA that increases mutation rates
if diversity drops too low, as sketched below.
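A minimal sketch of one such measure (counting distinct genotypes; the threshold and the doubling rule below are illustrative assumptions, not standard values):

#!/usr/bin/python3

def genotype_diversity(population: list[list[int]]) -> float:
    # Fraction of unique genotypes (1.0 means all members are distinct).
    unique = {tuple(genome) for genome in population}
    return len(unique) / len(population)

mutation_rate = 0.01
if genotype_diversity([[0, 1], [0, 1], [1, 1], [1, 0]]) < 0.5:
    mutation_rate *= 2  # boost exploration when diversity is low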
Two main types:
1. Parent selection (who gets to breed to produce children).
2. Survivor selection (which children get to live into the next
iteration, to potentially breed later)
Example: roulette wheel selection
In principle,
any selection mechanism can be used for parent selection,
as well as for survivor selection
Most EAs use fixed population size,
so need a way of going from (parents + offspring) to next
generation.
Often deterministic (while parent selection is usually stochastic).
Fitness based: rank parents + new offspring, and take best.
Age based: create as many offspring as parents, and delete all/some
parents.
Sometimes a combination of stochastic and deterministic (elitism).
Proportional selection:
Parents are chosen to mate
with probability proportional to their fitness.
Traditionally, children replace their parents,
but if the best parents remain, it is called “elitism”.
Many other variations now more commonly used:
roulette wheel, tournament selection, rank-based selection,
stochastic universal sampling, truncation selection, etc.
Overall principle: survival of the fittest.
In a classical, “generational” GA:
Based on fitness, choose the set of breeders (the “intermediate”
population)
that, in pairs, will be crossed over and possibly mutated,
with offspring replacing parents.
One individual may appear several times in the intermediate
population,
or the next population.
Role:
to generate new candidate solutions, with more variety.
Usually divided into two types according to their arity (number of
inputs/parents):
Arity 1 : mutation operators.
Arity >1 : recombination operators.
Arity = 2 : typically called crossover.
Arity > 2 : is formally possible, but seldom used in EC.
There has been much debate about relative importance of recombination
and mutation.
Now, most EAs use both.
Variation operators (mutation, recombination) must match the given
representation:
Trees might swap branches.
Strings might swap positions.
Numbers might be randomly perturbed.
Mutation is the first type of variation operator.
Role:
Causes small, random variation.
Acts on one genotype, and delivers another.
Element of randomness is essential,
and differentiates it from other unary heuristic operators.
The importance researchers ascribe to this operator depends on
the representation and on historical dialect:
Binary GAs: mutation is a background operator, responsible for preserving
and introducing diversity.
Evolutionary programming for FSMs / continuous variables: mutation is
the only search operator.
Genetic programming: mutation is hardly used.
May guarantee connectedness of the search space,
and hence enable convergence proofs.
Discussion question:
Can you generate all possible solutions with enough time?
before
1 1 1 1 1 1 1
after
1 1 1 0 1 1 1
Recombination is the second type of variation operator.
Role:
Merges information from parents into offspring.
Choice of what information to merge is largely stochastic.
But, complements modularity in the genome.
Most offspring may be worse, or the same as the parents.
Hope is that some are better by combining elements of genotypes that
lead to good traits.
Principle has been used for millennia by breeders of plants and
livestock.
Parents
Mark middle of genome:
Parent 1: 1 1 1 | 1 1 1 1
Parent 2: 0 0 0 | 0 0 0 0
Offspring
Swap around mark to generate new genome:
Child 1: 1 1 1 | 0 0 0 0
Child 2: 0 0 0 | 1 1 1 1
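A minimal sketch of this one-point crossover in Python (here the crossover point is drawn at random rather than fixed at the middle):

#!/usr/bin/python3
import random

def one_point_crossover(p1: list[int],
                        p2: list[int]) -> tuple[list[int], list[int]]:
    # Swap the tails of two equal-length genomes around a random point.
    point = random.randint(1, len(p1) - 1)  # never cut at the very ends
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

child1, child2 = one_point_crossover([1] * 7, [0] * 7)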
Initialization is usually performed by creating a random
population.
Need to ensure even spread, and mixture of possible allele values.
Can include existing solutions, or
use problem-specific heuristics, to “seed” the population (not usually
worth the effort).
Population size and number of generations should generally be as large as computationally practical!
The termination condition is checked every generation
(e.g., a fitness threshold, a budget of generations, or stagnation of the best fitness).
Historically, different flavors of EAs have been associated with
different data types to represent solutions.
These differences are largely irrelevant now; the best modern
strategy is to:
Choose representation to suit your problem.
Choose variation operators to suit your representation.
Selection operators, which choose survival and reproduction,
only use abstract fitness itself,
and so are independent of the representation you choose!
Discussion question:
Recall the quote at the top of the lecture.
Does it seem like EC can solve, in part,
the problem of learning how to solve problems,
which is broader than solving the problem itself?
Is this always the case,
or does it depend on what you use the EC for?
We’ll go over these now:
Remember:
The following major parts are to be defined for each problem:
A trivial optimization problem:
Maximizing the value of \(x^2\) for integers
in the range 0-31 (the full range of 5 bits).
Yes, this is a bit of a contrived example…
We’ll get to more realistic ones soon!
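As a sketch, the whole evaluation for this toy problem is just decoding 5 bits and squaring (names are illustrative):

#!/usr/bin/python3

def fitness(genome: list[int]) -> int:
    # Decode the 5-bit genome to an integer x, then return x squared.
    x = 0
    for bit in genome:
        x = (x << 1) | bit
    return x * x

assert fitness([1, 1, 1, 1, 1]) == 31 * 31  # x = 31 is the global optimum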
Must make design decisions regarding the EA components:
representation, parent selection, recombination, mutation, and survivor
selection.
The following major parts are required:
Initialize, parent selection
How do we implement this kind of random, fitness-proportional selection in practice?
Basic idea:
#!/usr/bin/python3
import random

# Individuals 1-4 appear in proportion to their selection probability
# (14%, 49%, 6%, 31%), so a uniform choice is fitness-proportional:
rand_sample: list[int] = [1] * 14 + [2] * 49 + [3] * 6 + [4] * 31
new_pop: list[int] = []
for i in range(4):
    new_pop.append(random.choice(rand_sample))
Random magic:
#!/usr/bin/python3
import random

rand_sample: list[int] = [1, 2, 3, 4]
# Weights match the proportions above: 1 -> 14%, 2 -> 49%, 3 -> 6%, 4 -> 31%.
weights: list[float] = [0.14, 0.49, 0.06, 0.31]
new_pop: list[int] = random.choices(rand_sample, weights, k=4)
A way to do this manually and more efficiently, with cumulative probabilities:
#!/usr/bin/python3
import random

new_pop: list[int] = []
for i in range(4):
    spin = random.random()  # uniform in [0, 1)
    # Cumulative probabilities, in ascending order of weight:
    if spin < 0.06:
        new_pop.append(3)
    elif spin < (0.06 + 0.14):
        new_pop.append(1)
    elif spin < (0.06 + 0.14 + 0.31):
        new_pop.append(4)
    elif spin < (0.06 + 0.14 + 0.31 + 0.49):  # aka < 1
        new_pop.append(2)
    # The last elif could just be else...
    # but this is explicit for readability.
Crossover
Mutation
(note a couple math errors in this one…)
In this case,
the mutations shown happen to have caused positive changes in
fitness,
but we should emphasize that in later generations,
as the number of 1’s in the population rises,
mutation will be on average (but not always) deleterious.
Survival selection
Children only, no parents.
Check the termination condition.
Rinse, repeat…
Code for x-squared optimization:
03-MetaheuristicParts/01_xsquared.py
https://en.wikipedia.org/wiki/Eight_queens_puzzle
Place 8 queens on an 8x8 chessboard in such a way that they cannot check
each other.
What kind of data structure do we employ,
to represent a complete solution to the n-queens problem?
Phenotype: a board configuration
Genotype: a permutation of the numbers 1-8
Penalty of one queen/gene:
the number of queens she can check.
Penalty of a whole individual/configuration/solution:
the sum of penalties of all queens.
Penalty is to be minimized.
Fitness of a configuration:
inverse penalty to be maximized.
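A sketch of this penalty-based fitness for the permutation representation (a permutation already rules out row/column clashes, so only diagonals are counted; counting each attacking pair once, rather than per queen, only rescales the penalty):

#!/usr/bin/python3

def penalty(perm: list[int]) -> int:
    # perm[i] is the row of the queen in column i.
    n = len(perm)
    conflicts = 0
    for i in range(n):
        for j in range(i + 1, n):
            if abs(perm[i] - perm[j]) == abs(i - j):  # same diagonal
                conflicts += 1
    return conflicts

def fitness(perm: list[int]) -> float:
    # Inverse penalty, to be maximized; 1.0 means a valid solution.
    return 1.0 / (1.0 + penalty(perm))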
Small variation in one permutation, e.g.:
swapping values of two randomly chosen positions
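A minimal sketch of this swap mutation:

#!/usr/bin/python3
import random

def swap_mutation(perm: list[int]) -> list[int]:
    # Swap the values at two randomly chosen positions;
    # the result is still a valid permutation.
    child = perm.copy()
    i, j = random.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child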
Discuss:
Why swapping instead of random single point mutation?
There are a variety of recombination options.
What is a permutation?
What is a set?
Our recombination should combine two permutations into two new
permutations.
What happens if we just pick a position,
and swap the arms, like we did with x-squared above?
Instead, we choose a random crossover point.
Copy first parts into children (somewhat like before).
Create second part by inserting values from other parent (new option
here).
in the order they appear there,
beginning after crossover point,
skipping values already in child.
The 2nd half is a subtle operation.
Make sure you understand it!
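A sketch of this operator (often called “cut-and-crossfill” crossover):

#!/usr/bin/python3
import random

def cut_and_crossfill(p1: list[int],
                      p2: list[int]) -> tuple[list[int], list[int]]:
    n = len(p1)
    point = random.randint(1, n - 1)  # random crossover point

    def fill(head: list[int], other: list[int]) -> list[int]:
        # Append values from the other parent in the order they appear
        # there, starting after the crossover point (wrapping around),
        # skipping values already in the child.
        child = head.copy()
        for k in range(n):
            value = other[(point + k) % n]
            if value not in child:
                child.append(value)
        return child

    return fill(p1[:point], p2), fill(p2[:point], p1)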
Important discussion:
Why permutation by the opposing strand?
Does it accomplish something similar to just swapping “arms”?
Does it still capitalize on modularity?
Can you think of any other ways to recombine a permutation?
Parent selection (who breeds):
Pick 5 parents and take best two to undergo crossover.
Survivor selection (who gets replaced):
When inserting a new child into the population,
choose an existing member to replace by:
sorting the whole population by decreasing fitness,
enumerating this list from high to low, and
replacing the first with a fitness lower than the given child.
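A sketch of this replacement rule, reusing the fitness function sketched above (note it sorts the population in place):

#!/usr/bin/python3

def insert_child(population: list[list[int]], child: list[int]) -> None:
    population.sort(key=fitness, reverse=True)  # decreasing fitness
    for i, member in enumerate(population):
        if fitness(member) < fitness(child):
            population[i] = child  # replace the first member the child beats
            return
    # Otherwise the child is no better than any member, and is discarded.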
This configuration of an EC implementation is only one
possibility.
There are other sets of choices of operators and parameters.
Discussion questions:
What about when there is one fewer queen…?
Example: The 7-queens problem
Can you think of any other way we could have encoded solutions to
this problem?
Would you guess those methods have been easier or harder to solve?
Code for n-queens:
03-MetaheuristicParts/02_nqueens.py
https://en.wikipedia.org/wiki/Knapsack_problem
Given a set of items, each with a weight and a value,
determine the number of each item to include in a collection,
so that the total weight is less than or equal to a given limit,
and the total value is as large as possible.
There are a variety of types of knapsack problem.
The most common type being solved is the 0-1 knapsack problem,
which restricts the number \(x_i\) of copies of each kind of item
to zero or one.
You are given a set of n items numbered from 1 up to n,
each with a weight \(w_i\) and a value \(v_i\),
along with a maximum weight capacity \(W\).
The task is to select a subset of those items
that maximizes the sum of the values,
while keeping the summed weight within the capacity \(W\).
Maximize:
\(\sum_{i=1}^n v_i x_i\)
Subject to:
\(\sum_{i=1}^n w_i x_i \leq W\) and \(x_i \in \{0,1\}\)
Genotype and phenotype:
We’ll start with a natural idea to represent candidate solutions for
this problem.
Binary strings of length n, where:
a 1 in a given position indicates that an item is included, and
a 0 indicates that it is omitted.
Below, columns are items, and rows are individuals.
The corresponding genotype space G
is the set of all such strings, with size \(2^n\),
which increases exponentially with the number of items considered.
This is the number of rows below.
Given a set of items, each with a weight and value,
we encode all the combinations of 3 items (\(2^3 = 8\) possible
genotypes/individuals):
Item 1 (w,v) | Item 2 (w,v) | Item 3 (w,v) | … |
---|---|---|---|
0 | 0 | 1 | … |
0 | 0 | 0 | … |
0 | 1 | 1 | … |
0 | 1 | 0 | … |
1 | 0 | 1 | … |
1 | 0 | 0 | … |
1 | 1 | 1 | … |
1 | 1 | 0 | … |
Some of these will violate total weight, W,
and thus are invalid solutions.
Side-note:
It is often formulated with a cost, c, as an equivalent generalization of
weight, w.
We will review two possible phenotype encodings.
Possibility 1:
The first representation (in the sense of a mapping) that we consider
takes the phenotype space P and the genotype space G to be
identical.
The quality of a given solution p,
represented by a binary genotype g,
is thus determined by summing the values of the included items,
i.e.,
\(q(p) = \sum_{i=1}^n v_i g_i\)
However, this simple representation leads us to some immediate
problems.
By using a one-to-one mapping between the genotype space G and the
phenotype space P,
individual genotypes may correspond to invalid solutions
that have an associated cost greater than the capacity, i.e.,
\(\sum_{i=1}^n c_i g_i > C_{max}\)
This issue is typical of a class of problems that we return to later in
the class,
and a number of mechanisms have been proposed for dealing with it.
Possibility 2:
The second representation that we outline here solves this problem by
employing a decoder function,
that breaks the one-to-one correspondence between the genotype space G,
and the solution space P.
Our genotype representation remains the same,
but when creating a solution we read from left to right along the binary
string,
and keep a running tally of the cost of included items.
When we encounter a value 1,
we first check the total cost,
to see if including the item would break our capacity constraint.
Rather than interpreting a value 1 as meaning include this item,
we interpret it as meaning include this item,
if and only if it does not take us over the cost constraint.
Then, the mapping from genotype to phenotype space is many-to-one (as in
biology),
since once the capacity has been reached,
the values of all bits to the right of the current position are
irrelevant,
as no more items will be added to the solution.
This mapping ensures that all binary strings represent valid
solutions,
with a unique fitness (to be maximized).
def decoder(individual_genome: list[int],
            weights: list[float],
            values: list[float],
            w_max: float) -> float:
    # Decode left to right, keeping a running tally of weight:
    # a 1 means "include this item only if it still fits".
    v = 0.0
    w = 0.0
    for i in range(len(individual_genome)):
        if individual_genome[i] == 1:
            if w + weights[i] <= w_max:
                v = v + values[i]
                w = w + weights[i]
    return v
Recall what we did with the n-queens permutation:
by removing large parts of the search space,
we improve the efficiency of our search process!
Having decided on a fixed-length binary representation,
we can now choose off-the-shelf variation operators from the GA
literature,
because the bit-string representation is ‘standard’ there.
A suitable recombination operator is the so-called one-point
crossover,
where we align two parents and pick a random point along their length.
Then, at that point, we exchange the tails,
creating two new offspring (as sketched earlier in the recombination
section).
This is not necessarily optimal, but it works.
Probability
We will apply this with 70% probability, i.e.,
for each pair of parents,
there is a 70% chance that we will create two offspring by
crossover,
and 30% that the children will be just copies of the parents.
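A sketch of applying this rate (the 70% figure is from the text; everything else is illustrative):

#!/usr/bin/python3
import random

P_CROSSOVER = 0.7

def maybe_crossover(p1: list[int],
                    p2: list[int]) -> tuple[list[int], list[int]]:
    # With probability 0.7, recombine via one-point crossover;
    # otherwise the children are plain copies of the parents.
    if random.random() < P_CROSSOVER:
        point = random.randint(1, len(p1) - 1)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1.copy(), p2.copy()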
A suitable mutation operator is so-called bit-flipping:
in each position, we invert the value with a small probability \(p_m\),
by comparing a uniform random draw from [0, 1) against \(p_m\).
Probability
Mutation rate of \(p_m = 1/n\), where n is the genome length.
On average that will change about one bit value in every
offspring.
A bit is seen as an allele value for a single gene.
For example, a 1/500 chance for any given bit (base), for a genome of
length 500.
import random

n = len(individual_genome)  # genome length
for i in range(n):
    chance = random.random()  # uniform in [0, 1)
    if chance < 1 / n:
        # xor with 1 to flip the bit:
        individual_genome[i] = individual_genome[i] ^ 1
In this case we will create the same number of offspring as we have
members in our initial population.
Population size: 500
Number of offspring: 500
Parent and survivor selection:
We will use a tournament for selecting the parents:
We pick two members of the population at random (with
replacement).
The one with the highest value q(p) wins the tournament and becomes a
parent.
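A sketch of this binary tournament (q is passed in as the fitness function; names are illustrative):

#!/usr/bin/python3
import random
from typing import Callable

def tournament_select(population: list[list[int]],
                      q: Callable[[list[int]], float]) -> list[int]:
    # Pick two members at random, with replacement;
    # the one with the higher value q(p) becomes a parent.
    a = random.choice(population)
    b = random.choice(population)
    return a if q(a) >= q(b) else b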
Generational scheme for survivor selection:
The entire old population is discarded in each iteration,
and the parents are replaced by their offspring.
For our initial population,
each position of each genome is set
by a random choice of 0 or 1, as sketched below.
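A one-line sketch of that initialization (the genome length n is an assumption here, since the number of items is not fixed in the text):

#!/usr/bin/python3
import random

n = 100         # genome length: one bit per item (illustrative)
pop_size = 500  # population size from the text

population = [[random.randint(0, 1) for _ in range(n)]
              for _ in range(pop_size)]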
In this case, we do not know the maximum value that we can
achieve.
We will run our algorithm until no improvement in the fitness
of the best member of the population
has been observed for 25 generations.
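A sketch of this stagnation test, given the best fitness recorded each generation:

#!/usr/bin/python3

PATIENCE = 25  # generations without improvement before stopping

def should_terminate(best_per_generation: list[float]) -> bool:
    # Stop once the running best has not improved for PATIENCE generations.
    if len(best_per_generation) <= PATIENCE:
        return False
    old_best = max(best_per_generation[:-PATIENCE])
    recent_best = max(best_per_generation[-PATIENCE:])
    return recent_best <= old_best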
Our evolutionary algorithm to tackle this problem can be specified as below:
Remember:
The following major parts are to be defined for each problem to be
solved:
Discussion questions:
Which of the above parts do you think is the computational bottleneck,
and why?
Which of the above parts is the design (human creativity/difficulty)
bottleneck, and why?
+++++++++++++++ Cahoot-03-1
Progression in optimizing on a 1-dimensional fitness landscape:
Early stage:
Quasi-random initial population distribution
Mid-stage exploration:
Population starts to arrange around/on hills.
Late stage exploitation:
Population concentrated on high hills.
What is the “memory” of a GA?
Each string has a goodness (fitness) or optimality.
This value determines its future influence on the search process,
i.e., survival of the fittest.
Solutions which are good, are used to generate other,
similar solutions which may also be good (or even better).
The population at any time stores all we have learned about the
solution,
accessible at any point.
You just access a good individual.
A typical run of an EA shows so-called “anytime behavior”.
Answer:
It depends on how much you want the last bit of progress.
It may be pragmatically better to do more short runs,
with more time spent achieving rapid, efficient returns.
This is the notion of:
https://en.wikipedia.org/wiki/Diminishing_returns
Continued cost for less and less benefit likely has diminishing
practical utility,
though it may yield marginally better solutions.
Discuss:
Are all problems going to show progressive improvement?
Might some runs demonstrate insight,
or sudden bursts of discovery?
Would this same principle apply in such situations?
Answer: it depends.
Possibly good, if good solutions/methods exist.
Care is needed, see upcoming lectures on hybridization.
There are many views on the use of EAs as robust problem solving
tools.
For most problems a problem-specific tool may:
perform better than a generic search algorithm on most instances,
but not do well on all instances.
The goal is to provide robust tools that give
evenly good performance
over a wide range of problems and instances.
Goldberg view (1989)
Trend in the 1990’s:
Add problem specific knowledge to EAs,
(special variation operators, repair, etc).
Result: EA performance curve “deformation”:
better on problems of the given type,
worse on problems different from given type,
amount of added knowledge is variable.
Some recent learning and information theory suggests trade-offs,
where the search for an “all-purpose” algorithm may be of limited
utility.
Discussion questions:
Do you think we will discover a learning rule so general
that it can match human generality?
Do you think it might have constraints
in how it must be taught/handled/babied/trained to perform well?
Any intuition for what they may be?
Michalewicz view (1996)
Global optimization problems:
search with the goal of finding the best solution, \(x^*\),
out of some fixed set of solutions, S.
Deterministic approaches (classic AI):
Guarantee to find an optimal solution, \(x^*\).
Complete enumeration of all s in S
is guaranteed to find the global optimum…
but it’s brute force.
Can also order search through S.
e.g. box decomposition (branch and bound, etc)
May have bounds on run-time, usually super-polynomial
Heuristic Approaches (extensions to classic
AI):
Cull the search space (and then generate and test).
Some estimation heuristic helps to search the space in a more
directed way.
We must choose heuristic rules for deciding which x in S to
generate next.
No bounds on run-time.
For some randomized heuristics, such as simulated annealing,
convergence proofs do in fact exist,
i.e., they are guaranteed to find \(x^*\) in many cases.
Unfortunately, these guarantees are fairly weak,
in the sense that the algorithms will not identify \(x^*\) as being
globally optimal,
rather as simply the best solution seen so far.
Let’s think about:
“I don’t care if it works as long as it converges”
vs.
“I don’t care if it converges as long as it works”
We can also use operators that impose some kind of structure onto the
elements of S,
such that each point x has associated with it a set of neighbors N(x).
In our “landscape” the variables (traits) x and z were taken to be
real-valued,
which imposes a natural structure on S.
This yields a continuous landscape.
However, genomes may not always produce continuous fitness
landscapes.
For example, combinatorial optimization (integer or binary genomes) can
produce discontinuous spaces.
Imagine optimizing something more like this:
https://en.wikipedia.org/wiki/Ackley_function
Many heuristics impose a neighborhood structure on the search space,
S.
Such heuristics may guarantee that the best point found is locally
optimal,
e.g. hill-climbers.
Heuristics are often very quick to identify good solutions.
But, getting stuck in local optima is a frequent problem.
Question:
What is the neighborhood in an EA?
What part of the meta-heuristic moves the search through the
neighborhood with EC?
Is the entire search space reachable by any heuristic algorithm?
EAs are distinguished by:
Probability distributions over the neighborhood structure:
The breeding population provides the algorithm with
a means of defining a nonuniform probability distribution function
(p.d.f.)
governing the generation of new points from the existing S.
This p.d.f. reflects possible interactions between points in S,
which are currently represented in the population.
The interactions arise from the recombination of partial solutions
from two or more members of the population (parents).
This potentially complex p.d.f. contrasts with:
the lack of any probability distribution in deterministic solvers,
the globally uniform distribution of blind random search, and
the locally uniform distribution used by many heuristic or stochastic
algorithms,
such as simulated annealing, and various hill-climbing algorithms.
The ability of EAs to maintain a diverse set of points,
provides not only a means of escaping from local optima,
but also a means of coping with large and discontinuous search
spaces.
In addition, as will be seen later,
if several copies of a solution can be generated, evaluated, and
maintained in the population,
this provides a natural and robust way of dealing with problems,
where there is noise or uncertainty,
associated with the assignment of a fitness score to a candidate
solution.
Implicit parallelism:
Each evaluation provides information on many possible candidate
solutions.
Modularity:
Class example: does crossing over help more with modular templates, or
with non-modular templates?
Consider generating something like the phrase in your first assignment,
“methinks it is like a weasel”,
either from a random sample of a paragraph in English,
or from that same sample with its letters randomized.
Which is faster?
Building blocks:
“Short, low-order, and highly fit schemata are sampled, recombined, and
re-sampled,
to form strings of potentially higher fitness…
we construct better and better strings,
from the best partial solutions of the past samplings.”
– David Goldberg, 1989
T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford
University Press, New York, 1996
T. Bäck, H.-P. Schwefel. An overview of evolutionary algorithms for
parameter optimization. Evolutionary Computation, 1:1, pp. 1–23, 1993.
One of the first papers to really unify historically isolated strands of
EA research within a common framework.
A.E. Eiben. Evolutionary computing: the most powerful problem solver in
the universe? Dutch Mathematical Archive (Nederlands Archief voor
Wiskunde), 5/3:2 pp. 126–131, 2002
Available at http://www.cs.vu.nl/~gusz/papers/ec-intro-naw.pdf
D.B. Fogel. Evolutionary Computation. IEEE Press, 1995
M.S. Hillier, F.S. Hillier. Conventional optimization techniques.
Chapter 1, pp. 3–25 in Sarker et al., Eds. Evolutionary Optimization,
Kluwer Academic Publishers, Boston, 2002.
Provides a background on other methods, to place EC in context.
X. Yao. Evolutionary computation: A gentle introduction. Chapter 2,
pp. 27–53 in Sarker et al., Eds. Evolutionary Optimization, Kluwer
Academic Publishers, Boston, 2002.