1 08-ParameterControl

Can we optimally vary parameters over the run?

1.1 Introduction

1.1.1 Parameter Control

Motivation
Parameter setting
- Tuning
- Control
Examples
Where to apply parameter control
How to apply parameter control

1.1.2 Motivation

An EA has many strategy parameters, e.g.

mutation operator and mutation rate
crossover operator and crossover rate
selection mechanism and selective pressure (e.g. tournament size)
population size

Good parameter values facilitate good performance.

Q1: How to find good parameter values?
EA parameters are rigid (constant during a run).
But, an EA is a dynamic, adaptive process.
Thus, optimal parameter values may vary during a run.

Q2: How to vary parameter values?

1.1.3 Parameter Settings

08-ParameterControl/ch08-Parameter_Control-20140.png

1.1.4 Tuning

Parameter tuning:
the traditional way of testing and comparing different values before the “real” run.

Problems:

users mistakes in settings can be sources of errors or sub-optimal performance
parameters interact: exhaustive search is not practicable
costs much time even with “smart” tuning
good values may become bad during the run

1.1.5 Control

Parameter control:
setting values on-line, during the actual run , e.g.

predetermined time-varying schedule p = p(t)
using feedback from the search process
encoding parameters in chromosomes and rely on selection

Problems:
* finding optimal p is hard, finding optimal p(t) is harder
* still user-defined feedback mechanism, how to “optimize”?
* when would natural selection work for strategy parameters?

1.2 Initial examples

We’ve covered some of these previously:

1.2.1 Varying mutation step size

Task to solve:
- min f(x₁,…,x_n)
- L_i ≤ x_i ≤ U_i for i = 1,…,n bounds
- g_i (x) ≤ 0 for i = 1,…,q inequality constraints
- h_i (x) = 0 for i = q+1,…,m equality constraints
Algorithm:
- EA with real-valued representation (x₁,…,x_n)
- arithmetic averaging crossover
- Gaussian mutation: x’_i = x_i + N(0, σ)
- standard deviation σ is called mutation step size

1.2.1.1 option 1 deterministic

Replace the constant σ by a function σ(t)
\(\sigma(t) = 1 - 0.9 \frac{t}{T}\)
0 ≤ t ≤ T is the current generation number
Features:
- changes in σ are independent from the search progress
- strong user control of σ by the above formula
- σ is fully predictable
- a given σ acts on all individuals of the population

1.2.1.2 option 2 adaptive

Replace the constant σ by a function σ(t) updated after every n steps by the 1/5 success rule
Remember 1/5=0.2 …
\(\sigma(x) = \sum^mj=1 \left\{ \begin{array}{cc} \sigma(t - n) / c, & if\ p_s > 0.2 \\ \sigma(t - n) * c, & if\ p_s < 0.2 \\ \sigma(t - n), & otherwise \end{array} \right\}\)
p_s is the % of successful mutations
constant c is \(0.8 \leq c \leq 1\)
Features:
- changes in σ are based on feedback from the search progress
- some user control of σ by the above formula
- σ is not predictable
- a given σ acts on all individuals of the population

1.2.1.3 option 3 self-adaptive evolving

Assign a personal σ to each individual
Incorporate this σ into the chromosome: (x₁, …, x_n, σ)
Apply variation operators to x_i’s and σ
Features:
- changes in σ are results of natural selection
- (almost) no user control of σ
- σ is not predictable
- a given σ acts on one individual

1.2.1.4 option 4 self-adaptive evolving more

Assign a personal σ to each variable in each individual
Incorporate σ’s into the chromosomes: (x₁, …, x_n, σ₁, …, σ_n)
Apply variation operators to x_i’s and σ_i’s
Features:
- changes in σ_i are results of natural selection
- (almost) no user control of σ_i
- σ_i is not predictable
- a given σ_i acts on one gene of one individual

1.2.2 Varying penalties

Constraints in CSP
- g_i(x) ≤ 0 for i = 1,…,q inequality constraints
- h_i(x) = 0 for i = q+1,…,m equality constraints
are handled by penalties:
eval (x) = f(x) + W × penalty(x)
where
\(penalty(x) = \sum^m_{j=1} \left\{ \begin{array}{cc} 1 & for\ violated\ constraint, \\ 0 & for\ satisfied\ constraint. \\ \end{array} \right\}\)

1.2.2.1 option 1 deterministic

Replace the constant W by a function W(t):
W(t) = (C x T)^σ
0 ≤ t ≤ T is the current generation number
Features:
- changes in W independent from the search progress
- strong user control of W by the above formula
- W is fully predictable
- a given W acts on all individuals of the population

1.2.2.2 option 2 adaptive

Replace the constant W by W(t) updated in each generation
\(W(t + 1) = \left\{ \begin{array}{cc} \beta \cdot W(t) & if\ last\ k\ champions\ all\ feasible\ \\ \gamma \cdot W(t) & if\ last\ k\ champions\ all\ infeasible\ \\ W(t), & otherwise \end{array} \right\}\)
β < 1, γ > 1, β × γ ≠ 1 champion: best of its generation
Features:
- changes in W are based on feedback from the search progress
- some user control of W by the above formula
- W is not predictable
- a given W acts on all individuals of the population

1.2.2.3 option 3 self-adaptive evolving

Assign a personal W to each individual
Incorporate this W into the chromosome: (x₁, …, x_n, W)
Apply variation operators to x_i’s and W
Alert:
val((x, W)) = f(x) + W × penalty(x)
while for mutation step sizes we had
eval((x, σ))= f(x)
this option is thus sensitive “cheating” ⇒ makes no sense

However, one could use an objective tournament to evaluate fitness,
and still evolve the fitness function.

1.2.3 Past lessons learned

Various forms of parameter control can be distinguished by:
primary features:
- what component of the EA is changed
- how the change is made
secondary features:
- evidence /data backing up changes
- level/scope of change

Various forms of parameter control discussed above can be distinguished by:

	σ(t) = 1-0.9*t/T	σ’ = σ/c, if r > ⅕ …	(x1, …, xn, σ)	(x1, …, xn, σ1, …, σn)	W(t) = (C*t)α	W’=β*W, if bi∈F	(x1, …, xn, W)
What	Step size	Step size	Step size	Step size	Penalty weight	Penalty weight	Penalty weight
How	Deterministic	Adaptive	Self-adaptive	Self-adaptive	Deterministic	Adaptive	Self-adaptive
Evidence	Time	Successful mutations rate	(Fitness)	(Fitness)	Time	Constraint satisfaction history	(Fitness)
Scope	Population	Population	Individual	Gene	Population	Population	Individual

1.3 Classification of control methods

What is changed?
How are changes made?
What evidence informs the change?
What is the scope of the change?

1.3.1 1. Where to apply parameter control?

What is changed?

Practically any EA component can be parameterized,
and thus controlled on-the-fly:

representation
evaluation function
variation operators
selection operator (parent or mating selection)
replacement operator (survival or environmental selection)
population (size, topology)

Note that each component can be parameterized,
and that the number of parameters is not clearly defined.
Despite the somewhat arbitrary character of this list of components,
and of the list of parameters of each component,
we will maintain the ‘what-aspect’ as one of the main classification features,
since this allows us to locate where a specific mechanism has its effect.

1.3.2 2. How to apply parameter control?

How are changes made?

Three major types of parameter control:

1.3.2.1 Deterministic

some rule modifies strategy parameter without feedback from the search,
based on some counter)

1.3.2.2 Adaptive

feedback rule based on some measure monitoring search progress

1.3.2.3 Self-adaptative

parameter values evolve along with solutions;
encoded into chromosomes, and they undergo variation and selection

1.3.3 3. What evidence informs changes?

The parameter changes may be based on:
- time or number of evaluations (deterministic control)
- population statistics (adaptive control)
  - progress made
  - population diversity
  - gene distribution, etc.
- relative fitness of individuals created with given values (adaptive or self-adaptive control)

1.3.3.1 Absolute evidence

predefined event triggers change, e.g.
- increase p_m by 10% if population diversity falls under threshold x
Direction and magnitude of change is fixed

1.3.3.2 Relative evidence

compare values through solutions created with them, e.g.
- increase p_m if top quality offspring came by high mutation rates
Direction and magnitude of change is not fixed

1.3.4 4. What is the scope/level of the change?

The parameter may take effect on different levels:

environment (fitness function)
population
individual
sub-individual

Note: given component (parameter) determines possibilities

Thus: scope/level is a derived or secondary feature in the classification scheme

1.4 Refined taxonomy

08-ParameterControl/ch08-Parameter_Control-20141.png
Combinations of types and evidences
* Possible: +
* Impossible: -

1.5 Evaluation/Summary

Parameter control offers the possibility to use appropriate values in various stages of the search
Adaptive and self-adaptive parameter control
- offer users “liberation” from parameter tuning
- delegate parameter setting task to the evolutionary process
- the latter implies a double task for an EA: problem solving + self-calibrating (overhead)

1.6 More examples

Of each “What”?

1.6.1 Representation

L.D. Whitley, K.E. Mathias, and P. Fitzhorn.
Delta coding: An iterative search strategy for genetic algorithms,.
In Belew and Booker [46], pages 77–84.

We illustrate variable representations with the delta coding algorithm of Mathias and Whitley,
which effectively modifies the encoding of the function parameters.
The motivation behind this algorithm is to:
maintain a good balance between fast search and sustaining diversity.
In our taxonomy, it can be categorised as an adaptive adjustment of the representation,
based on absolute evidence.

The GA is used with multiple restarts;
the first run is used to find an interim solution,
and subsequent runs decode the genes as distances (delta values) from the last interim solution.
This way each restart forms a new hypercube,
with the interim solution at its origin.
The resolution of the delta values can also be altered at the restarts,
to expand or contract the search space.
Population density can be measured,
by the Hamming distance between the best and worst strings of the current population.
The restarts are triggered when population diversity
is not greater than 1.
The sketch of the algorithm showing the main idea is given below.
Note that the number of bits for δ can be increased if the same solution INTERIM is found.

BEGIN
    /* given a starting population and genotype-phenotype encoding */
    WHILE (HD > 1) DO
        RUN GA with k bits per object variable;
    OD
    REPEAT UNTIL (global termination is satisfied) DO
        save best solution as INTERIM;
        reinitialise population with new coding;
        /* k-1 bits as the distance δ to the object value in */
        /* INTERIM and one sign bit */
        WHILE (HD > 1) DO
            RUN GA with this encoding;
        OD
    OD
END

1.6.2 Evaluation function

A.E. Eiben and J.K. van der Hauw.
Solving 3-SAT with adaptive genetic algorithms.
In ICEC-97 [231], pages 81–86.

Evaluation functions are typically not varied in an EA,
because they are often considered as part of the problem to be solved,
and not as part of the problem-solving algorithm.
In fact, an evaluation function forms the bridge between the two,
so both views are at least partially true.
In many EAs, the evaluation function is derived from the (optimization) problem at hand,
with a simple transformation of the objective function.
In the class of constraint satisfaction problems, however,
there is no objective function in the problem definition.
One possible approach here is based on penalties.
Let us assume that we have m constraints c_i(i ∈ {1, . . . , m})
and n variables v_j(j ∈ {1, . . . , n}) with the same domain S.
Then the penalties can be defined as follows:
\(f(\bar{s}) = \sum_{i=1}^m \chi (\bar{s}, c_i),\)
where w_i is the weight associated with violating c_i, and
\(\chi(\bar{s}, c_i) = \left\{ \begin{array}{cc} 1 & if\ \bar{s}\ violated\ c_i, \\ 0 & otherwise. \\ \end{array} \right\}\)
Obviously, the setting of these weights has a large impact on the EA performance,
and ideally w_i should reflect how hard c_i is to satisfy.
The problem is that:
finding the appropriate weights requires much insight into the given problem instance,
and therefore it might not be practicable.
The step-wise adaptation of weights (SAW) mechanism,
introduced by Eiben and van der Hauw,
provides a simple and effective way to set these weights.
The basic idea behind the SAW mechanism is that,
constraints that are not satisfied after a certain number of steps (fitness evaluations),
must be difficult, and thus must be given a high weight (penalty).
SAW-ing changes the evaluation function adaptively in an EA,
by periodically checking the best individual in the population,
and raising the weights of those constraints this individual violates.
Then the run continues with the new evaluation function.
A nice feature of SAW-ing is that:
it liberates the user from seeking good weight settings,
thereby eliminating a possible source of error.
Furthermore, the used weights reflect the difficulty of constraints,
for the given algorithm, on the given problem instance,
in the given stage of the search.
This property is also valuable since, in principle,
different weights could be appropriate for different algorithms.

1.6.3 Mutation

Oliver Kramer.
Evolutionary self-adaptation: a survey of operators and strategy parameters.
Evolutionary Intelligence, 3(2):51–65, 2010.

A large majority of work on adapting or self-adapting EA parameters concerns variation operators:
mutation and recombination (crossover).
The 1/5 rule of Rechenberg we discussed earlier,
constitutes a classic example for adaptive mutation step size control in ES.
Furthermore, self-adaptive control of mutation step sizes is traditional in ES

1.6.4 Crossover

L. Davis, editor.
Handbook of Genetic Algorithms.
Van Nostrand Reinhold, 1991.

The classic example for adapting crossover rates in GAs is Davis’s adaptive operator fitness.
The method adapts the rates of crossover operators,
by rewarding those that are successful in creating better offspring.
This reward is diminishingly propagated back to operators of a few generations back,
who helped set it all up;
the reward is an increase in probability at the cost of other operators.
The GA using this method applies several crossover operators simultaneously,
within the same generation, each having its own crossover rate p_c(op_i).
Additionally, each operator has its local delta value, d_i,
that represents the strength of the operator,
measured by the advantage of a child created by using that operator,
with respect to the best individual in the population.
The local deltas are updated after every use of operator i.
The adaptation mechanism recalculates the crossover rates periodically,
redistributing 15% of the probabilities biased by the accumulated operator strengths,
that is, the local deltas.
To this end, these d_i values are normalized,
so that their sum equals 15, yielding d^norm_i for each i.
Then the new value for each p_c(op_i) is 85% of its old value, and its normalized strength:

\(p_c(op_i) = 0.85 \cdot p_c(op_i) + d^{norm}_i\)

Clearly, this method is adaptive based on relative evidence.

1.6.5 Selection

S.W. Mahfoud.
Boltzmann selection.
In Bäck et al., pages C2.5:1–4.

T. Bäck, D.B. Fogel, and Z. Michalewicz, editors.
Handbook of Evolutionary Computation.
Institute of Physics Publishing, Bristol, and Oxford University
Press, New York, 1997.

https://en.wikipedia.org/wiki/Simulated_annealing

todo

1.6.6 Population

J. Arabas, Z. Michalewicz, and J. Mulawka. GAVaPS – a genetic algorithm
with varying population size. In ICEC-94 [229], pages 73–78.

Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs.
Springer, 3rd edition, 1996.

todo

1.6.7 Multiple simultaneously

T. Bäck.
U. The interaction of mutation rate, selection and self-adaptation within a genetic algorithm.
V. In Männer and Manderick [282], pages 85–94.

T. Bäck, A.E. Eiben, and N.A.L. van der Vaart.
U. An empirical study on GAs “without parameters”.
V. In Schoenauer et al. [368], pages 315–324.

todo

1.7 Discussion

Tuning works, and increases the search space.
Control works, and increases it further.
Be systematic in both approaches!