Package io.jenetics.prog


package io.jenetics.prog

Example

The following example shows how to solve a GP problem with Jenetics. We are trying to find a polynomial (or an arbitrary mathematical function) which approximates a given data set.
Sample points
xy
0.000.0000
0.100.0740
0.200.1120
0.300.1380
......
The sample points has been created with the function f(x) = 4*x^3 - 3*x^2 + x. The knowledge of the creating function makes it easier to compare the quality of the evolved function. For the example we created 21 data points.

NOTE: The function which created the sample points is not needed in the error function we have to define for the GP. It just let us verify the final, evolved result.

As first step, we have to define the set of allowed non-terminal and the terminal operations the GP is working with. Selecting the right set of operation has a big influence on the performance of the GP. If the operation set is bigger than necessary, we will expand the potential search space, and the execution time for finding a solution. For our polynomial example we will chose the following operations and terminals.
static final ISeq<Op<Double>> OPERATIONS = ISeq.of(
    MathOp.ADD,
    MathOp.SUB,
    MathOp.MUL
);

static final ISeq<Op<Double>> TERMINALS = ISeq.of(
    Var.of("x", 0),
    EphemeralConst.of(() -> (double)RandomRegistry.getRandom().nextInt(10))
);
The chosen non-terminal operation set is sufficient to create any polynomial. For the terminal operations, we added a variable "x", with index zero, and an ephemeral integer constant. The purpose of the ephemeral constant is to create constant values, which will differ for every tree, but stay constant within a tree.

In the next step define the fitness function for the GP, which will be an error function we will minimize.

// The lookup table where the data points are stored.
static final double[][] SAMPLES = new double[][] {
    {-1.0, -8.0000},
    {-0.9, -6.2460}
    // ...
};

static double error(final ProgramGene<Double> program) {
    return Arrays.stream(SAMPLES).mapToDouble(sample -> {
        final double x = sample[0];
        final double y = sample[1];
        final double result = program.eval(x);
        return abs(y - result) + program.size()*0.00005;
    })
    .sum();
}
The error function calculates the sum of the (absolute) difference between the sample value and the value calculated the by the evolved program (ProgramGene). Since we prefer compact programs over complex one, we will add a penalty for the program size (the number of nodes of the program tree).

CAUTION: The penalty for the tree size must be small enough to not dominate the error function. We still want to find an approximating function and not the smallest possible one

After we have defined the error function, we need to define the proper Codec.
static final Codec<ProgramGene<Double>, ProgramGene<Double>> CODEC =
    Codec.of(
        Genotype.of(ProgramChromosome.of(
            // Program tree depth.
            5,
            // Chromosome validator.
            ch -> ch.root().size() <= 50,
            OPERATIONS,
            TERMINALS
        )),
        Genotype::gene
    );
There are two particularities in the definition of the ProgramChromosome:
  1. Since we want to narrow the search space, we are limiting the depth of newly created program trees to 5.
  2. Because of crossover operations performed during evolution, the resulting programs can grow quite big. To prevent an unlimited growth of the program trees, we mark programs with more than _50_ nodes as invalid.
Now we are ready to put everything together:
public static void main(final String[] args) {
    final Engine<ProgramGene<Double>, Double> engine = Engine
        .builder(Polynomial::error, CODEC)
        .minimizing()
        .alterers(
            new SingleNodeCrossover<>(),
            new Mutator<>())
        .build();

    final ProgramGene<Double> program = engine.stream()
        .limit(500)
        .collect(EvolutionResult.toBestGenotype())
        .gene();

    System.out.println(Tree.toString(program));
}
The GP is capable of finding the polynomial which created the sample data. After a few tries, we got the following (correct) output program:
 add
 ├── mul
 │   ├── x
 │   └── sub
 │       ├── 0.0
 │       └── mul
 │           ├── x
 │           └── sub
 │               ├── sub
 │               │   ├── sub
 │               │   │   ├── sub
 │               │   │   │   ├── 3.0
 │               │   │   │   └── x
 │               │   │   └── x
 │               │   └── x
 │               └── x
 └── x
 
This program can be reduced to 4*x^3 - 3*x^2 + x, which is exactly the polynomial, which created the sample data.
Since:
3.9
Version:
3.9