Differential Evolution for Hyperparameter Search

Quick Overview
You will implement a simple generic Differential Evolution search
You will apply this generic algorithm to two problems
You are provided with scaffolding code that you will need to complete
You will also perform experiments and report on your results
Introduction
In machine learning, a hyperparameter is a parameter whose value is set before the learning process
begins. By contrast, the values of other parameters like the weights of a neural network are derived
via training. Examples of hyperparameters for neural networks include the number of layers, the
number of neurons in each layer and the learning rate.
Different training algorithms require different hyperparameters. Some simple algorithms (such as
ordinary least squares regression) require none. The optimal values for the hyperparameters depend
also on the dataset. In this assignment, you will implement a search algorithm to find good values
for a small set of hyperparameters of a neural network.
Differential Evolution (DE) is an efficient and high performing optimizer for real-valued functions.
DE is the search technique that you are required to use in this assignment to solve the optimization
problems. As DE is based on evolutionary computing, it performs well on multi-modal,
discontinuous optimization landscapes. Another appealing feature of DE is that it is still applicable
even if the function we try to optimize is not differentiable. This is often the case when some
simulation is involved in the computation of the real-valued function being optimized. In these
scenarios, a gradient descent approach is not applicable. DE performs a parallel search over a
population of fixed size, where each population member is a higher dimensional vector.
In this assignment, the core problem is to adapt DE to perform a search for four hyperparameters of
a neural network. In the next section, we describe in more details DE.
Differential Evolution
Let f(w) be the cost function which must be minimized. Note that maximization can be performed
by considering the function -f . The function f(w) takes a candidate solution as argument in the form
of a vector w of real numbers and produces a real number as output which indicates the cost of the 

There exist many variations of the DE algorithm. The variation that you have to implement has
been chosen for its simplicity. In the rest of this section, variable names written in blue italic
correspond to Python program variables used in the provided scaffolding code.
We constrain the search by specifying bounds for each component of with a list of pairs called
bounds in the scaffolding code. That is, we impose
bounds[k][0] <= w[k] <= bounds[k][1]
for k in range(len(w))
For example, if bounds is [(1,100),(1,100),(-6,2),(-6,1)] , then w[1] is constrained to the interval
[1, 100] and w[2] is constrained to the interval [-6, 2].
To make the code generic, we normalize each component of to the interval [0, 1] by mapping
with an affine function the interval [ bounds[k][0], bounds[k][1] ] to the interval [0, 1].
We first initialize all individuals of the initial population with random points from the search-
space.
Until a termination criterion is met (maximum number of iterations/generations done, or acceptable
cost is reached), we repeat the following loop:
For each individual in the population do:
Pick three distinct individuals aand from the current population at random. The
individuals aand must be distinct from as well.
Create a mutant vector mut * (– c) where mut is a constant (passed as an argument to
the differential_evolution function)
Clip the mutant entries to the interval [0, 1]
Create a trial vector by assigning to trial[k] with probability crossp the value mutant[k]
and probability 1-crossp the value of w[k]
Evaluate the cost of the trial vector. If the trial vector is better than w, then replace by the
trial vector
In other words, for each individual in the current population, we create a challenger trial vector.
If this challenger is better than w, then is replaced by the trial vector .
The mutation factor mut with and the crossover constant crossp are numerically defined in the
scaffolding code.
Scaffolding Code
The file “my_submission.py” contains scaffolding code. The key function is differential_evolution.
This function return a generator. You should have seen generators in the Python
tutorial
To validate your implementation of the function differential_evolution, you should first test and
debug it with the function task_1 as it tries to solve a simpler problem than task_2.
In Task 1, you fit a polynomial to a noisy dataset. This is the same problem as you have seen in the
early pracs where you solve this problem with the pseudo-inverse and a gradient descent approach.

Leave a Reply

Your email address will not be published. Required fields are marked *