Introduction to genetic programming and evolving expressions using genetic programming

What is genetic programming and how genetic programming can evolve mathematical expressions using given data


Category: Machine Learning Tags: Python, Python 3

Evolve expression using Genetic Programming Code Files

Introduction

    Genetic programming idea is taken from biology, as name suggest a program can be improved from one generation to another. It actually starts with initial set of programs that may be handpicked or randomly generated then programs compete with each other over given input and expected output data. Top performing programs are picked, mutation and breeding is performed on them to generate next generation. Next generation compete with each other, the process goes on until perfect program is evolved.

Genetic Programming Model
Fig 1: Genetic Programming Model


Program as tree: In generic programming we build program using nodes, a node can be parent or leaf and program can be interpreted as tree. A parent node may do operations like if, addition, multiplication etc.

Program as tree
Fig 2: Program as tree

Implementation

    We are going to evolve an expression using genetic programming, suppose we have an expression Z = X^2 + 5Y + 1. I am using Jupyter Notebook to implement this. Import below methods.

from random import random, randint, choice
from copy import deepcopy
from math import log

Below is the function returns value of the Z

def expression(x, y):
    return x**2+5*y+1

If we run above function with random values of x and y:

def builddataset():
    rows=[]
    for i in range(200):
        x=randint(0,40)
        y=randint(0,40)
        rows.append([x,y,expression(x,y)])
    return rows

Will get data set having values of x, y and z as below:

[[7, 20, 150],

 [8, 5, 90],

 [18, 9, 370],…]

The problem statement comes when we are given the above data but the expression is unknown and we are asked to evolve an expression which calculates z by substituting x, y. Let’s see how we can evolve the expression using genetic programming. Create a class fwrapper.

class fwrapper:
    def __init__(self, function, paramcount, name):
        self.function = function
        self.paramcount = paramcount
        self.name = name

Create a class node.

# Node of the tree
class node:
    def __init__(self, fw, children):
        self.function = fw.function
        self.name = fw.name
        self.children = children
        
    def evaluate(self, inp):
        results = [n.evaluate(inp) for n in self.children]
        return self.function(results)
    
    def display(self, indent=0):
        print((' '*indent)+self.name)
        for c in self.children:
            c.display(indent+1)

A node consists a function, children and node name. you can see evaluate is a bottom up approach which first evaluates its children the evaluate itself. Two more nodes which does not have function but returns a value.

# Parameter node returns parameter itself
class paramnode:
    def __init__(self, idx):
        self.idx = idx
    
    def evaluate(self, inp):
        return inp[self.idx]
                   
    def display(self, indent=0):
        print('%sp%d' % (' '*indent,self.idx))
# Constant node returns a constant value
class constnode:
    def __init__(self, v):
        self.v = v
        
    def evaluate(self, inp):
        return self.v
    
    def display(self, indent=0):
        print('%s%d' % (' '*indent,self.v))

paramnode returns one of the parameters passed and constnode returns what constant is passed to it. Lets create a tree, first we create function wrappers using lambda:

addw = fwrapper(lambda l: l[0] + l[1], 2, 'add')
subw = fwrapper(lambda l: l[0] - l[1], 2, 'sub')
mulw = fwrapper(lambda l: l[0] * l[1], 2, 'mul')

Or with function definitions if function is complicated

def iffun(l):
    if(l[0] > 0):
        return l[1]
    else:
        return l[2]
def isgrater(l):
    if(l[0] > l[1]):
        return 1
    else:
        return 0
ifw = fwrapper(iffun, 3, 'if')
isgraterw = fwrapper(isgrater, 2, 'isgrater')

List of function wrappers below:

flist = [addw, subw, mulw, ifw, isgraterw]

Now we can create a tree:

def exampletree():
    return node(ifw, [
        node(isgraterw, [paramnode(0), constnode(3)]),
        node(addw, [paramnode(1), constnode(5)]),
        node(subw, [paramnode(1), constnode(2)])
    ])
exampletree().evaluate([0,3])

Output

1

Display the tree:

exampletree().display()

Output

if

 isgrater

  p0

  3

 add

  p1

  5

 sub

  p1

  2

Create a score function which can give difference between expected and actual values calculated by a tree, if function return 0 means the tree has perfect expression which produces correct results for each input.

def scorefunction(tree, dataset):
    difference = 0
    for row in dataset:
        z = tree.evaluate([row[0], row[1]])
        # Difference betwen actual and evaluated value
        difference += abs(z - row[2])
    return difference

Now we will write a method which produces random tree.

def makerandomtree(param_count,maxdepth=4,fun_prob=0.5,param_prob=0.6):
    if random()<fun_prob and maxdepth>0:
        # Choosing a random function
        function=choice(flist)
        # Random children of node
        children=[makerandomtree(param_count,maxdepth-1,fun_prob,param_prob) for i in range(function.paramcount)]
        return node(function,children)
    elif random()<param_prob:
        return paramnode(randint(0,param_count-1))
    else:
        return constnode(randint(0,10))

Above method will generate a random tree with maximum depth of 4. Let’s create a random tree and try to generate the score.

random_tree = makerandomtree(2)
scorefunction(random_tree, builddataset())

Output

131214

makerandomtree can be used to create initial population, we have to mutate and breed them to build next generations so let’s create mutate and breed methods.

def mutate(tree,param_count,probchange=0.1):
    if random()<probchange:
        return makerandomtree(param_count)
    else:
        result=deepcopy(tree)
        if isinstance(tree,node):
            # Changing children of input tree(mutating)
            result.children=[mutate(child,param_count,probchange) for child in tree.children]
    return result

Mutation is changing a part of tree slightly. Let’s mutate random_tree previously generated and see the difference:

random_tree.display()

Output

mul

 p0

 add

  isgrater

   mul

    p0

    p0

   5

  add

   sub

    p1

    p1

   isgrater

    p0

    p1

muttree=mutate(random_tree,2)
muttree.display()

Output

mul

 p0

 add

  isgrater

   mul

    p0

    p0

   5

  add

   sub

    p1

    p1

   isgrater

    p0

    2

You can see last node of the mutated tree is different. We can generate it’s score too and if score is less than above one we can say mutated tree is better than original.

def breed(tree1,tree2,probswap=0.7,top=1):
    if random()<probswap and not top:
        return deepcopy(tree2)
    else:
        result=deepcopy(tree1)
        if isinstance(tree1,node) and isinstance(tree2,node):
            # Replacing tree1 children with tree2
            result.children=[breed(child,choice(tree2.children),probswap,0) for child in tree1.children]
        return result

Breeding is taking some nodes from two different trees to make a new one. In above method we take two tree and randomly take nodes from both of the trees to create new one.

Let’s create two random trees, breed them and see the difference.

tree1 = makerandomtree(2)
tree2 = makerandomtree(2)
tree1.display()

Output

isgrater

 add

  if

   add

    p0

    p0

   isgrater

    6

    7

   0

  if

   sub

    3

    p0

   p0

   add

    6

    p1

 isgrater

  add

   p0

   isgrater

    5

    p0

  p1

tree2.display()

Output

mul

 isgrater

  sub

   9

   sub

    p1

    p1

  if

   p0

   sub

    p0

    p1

   isgrater

    p0

    p1

 0

newbreed = breed(tree1, tree2)
newbreed.display()

Output

 add

  sub

   9

   sub

    p1

    p1

  sub

   9

   sub

    p1

    p1

 0

We can see above tree has nodes from both of the trees. New we will write a function which can rank list of trees on a dataset:

def getrankfunction(dataset):
    def rankfunction(population):
        scores=[(scorefunction(t,dataset),t) for t in population]
        # Sorting based on score
        scores.sort(key=lambda val: val[0])
        return scores
    return rankfunction

The last function “evolve” which make use of all above methods. This method first generate the initial population rank them then mutate and breed the top trees to generate next generation until error score reaches zero or loop ends.

def evolve(param_countc,population_size,rankfunction,maxgen=500,mutationrate=0.1,breedingrate=0.4,exp_prob=0.7,new_prob=0.05):
    # Returns a random number, always lower numbers.
    def selectindex():
        return int(log(random())/log(exp_prob))
    # Create a random initial population
    population=[makerandomtree(param_countc) for i in range(population_size)]
    for i in range(maxgen):
        scores=rankfunction(population)
        print(scores[0][0])
        if scores[0][0]==0: break
        # The two best always make it
        newpop=[scores[0][1],scores[1][1]]
        # Build the next generation
        while len(newpop)<population_size:
            if random()>new_prob:
                newpop.append(mutate(
                    breed(scores[selectindex()][1],
                    scores[selectindex()][1],
                    probswap=breedingrate),
                    param_countc,probchange=mutationrate))
            else:
            # Add a random node to mix things up
                newpop.append(makerandomtree(param_countc))
        population=newpop
    scores[0][1].display()
    return scores[0][1]

Let’s run above method to generate tree which represents our expression.

dataset = builddataset()
rankfun = getrankfunction(dataset)
tree = evolve(2,100,rankfun,mutationrate=0.2,breedingrate=0.1,exp_prob=0.7,new_prob=0.1)

Output

52714

51097

13335

13335

8344

5967

3232

3232

.

.

5

5

0

add

 add

  add

   p1

   p1

  add

   p1

   1

.....

You can see above when score is reduced to zero tree is finalized and printed. Let’s test our evolved expression against actual expression.

print("Result of x={}, y={} from actual expression is: {} "
      "and from evolved expression is {}".format(25, 19, expression(25, 19), tree.evaluate([25, 19])))
print("Result of x={}, y={} from actual expression is: {} "
      "and from evolved expression is {}".format(20, 38, expression(20, 38), tree.evaluate([20, 38])))
print("Result of x={}, y={} from actual expression is: {} "
      "and from evolved expression is {}".format(11, 35, expression(11, 35), tree.evaluate([11, 35])))
print("Result of x={}, y={} from actual expression is: {} "
      "and from evolved expression is {}".format(9, 34, expression(9, 34), tree.evaluate([9, 34])))

Output

Result of x=25, y=19 from actual expression is: 721 and from evolved expression is 721

Result of x=20, y=38 from actual expression is: 591 and from evolved expression is 591

Result of x=11, y=35 from actual expression is: 297 and from evolved expression is 297

Result of x=9, y=34 from actual expression is: 252 and from evolved expression is 252

Evolved expression’s output matches with actual expression which means the expression is accurate.

Conclusion

    We often see difference between actual expression and evolved expression because evolved expression is not simplified. Suppose actual expression is (x^2 - y^2) then evolved expression might be ((x-y)(x+y) + 5xy - 4xy - xy) which is actually same as actual. We might not end up having best expression sometimes where output is close but not exactly same, in that case increasing initial population and increase in number of generation increases chances of evolving the best expression. Executing evolve method multiple times might help too.

This was a small example of evolving expression but genetic programming can do more. We can even design games which can actually compete against each other or with a person.


Like 1 Person
Last modified about 27 days ago
Nikhil Joshi

Nikhil Joshi
Ceo & Founder at Dotnetlovers
Atricles: 133
Questions: 9
Given Best Solutions: 9 *

Reference:

Programming Collective Intelligence - by Toby Segaran and published by O'Reilly

Comments:

Nitin

Good article.

Thank you

Like 0 People about 6 days ago

You are not loggedin, please login or signup to add comments:

Existing User

Login via:

New User



x