Saint Louis University
Computer Science 462
Dept. of Math & Computer Science
For this assignment, you must work individually on the design and implementation of your project. Please make sure you adhere to the policies on academic integrity in this regard.
Topic: Local Search
Related Reading: Ch. 4
Due: Wednesday, 3 March 2010, 11:59pm
We will perform experiments in trying to tune our hill-climbing and genetic algorithm implementations for solving N-queens and TSP instances.
In order to examine a variety of layouts with known optimal solutions we will focus on cases where we set the city seed to the same as the number of cities. Benchmarks we will use are n=30, n=60, n=100. Here are those cases and others:
params | Optimal (rounded) | NEOS optimal | our best | likeness
---|---|---|---|---
-n 20 -c 20 | 3842.44 | | |
-n 30 -c 30 | 4810.00 | | |
-n 40 -c 40 | 5108.72 | | |
-n 50 -c 50 | 6085.61 | | |
-n 60 -c 60 | 6301.80 | | |
-n 70 -c 70 | ~6667 | | |
-n 80 -c 80 | ~7298 | | |
-n 90 -c 90 | ~7664 | | |
-n 100 -c 100 | ~7730 | | |
Increasing the beam size can increase the chance of success, but at the expense of more computation.
Please note that for beam size > 1, it does not make sense to consider "first" selections, since those will presumably be based on the first entry of the beam. Also, the notion of a "sideways" move during beam search is unclear.
The other selection rule we have is the roulette wheel selection, in which we randomly choose two parents directly, but with probabilities that are proportional to the fitness values.
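The fitness-proportional idea can be sketched in a few lines of generic Python (an illustration only, not our software's actual implementation):

```python
import random

def roulette_select(population, fitness, k=2):
    """Choose k parents with probability proportional to fitness.
    Draws are with replacement, so the two parents may coincide."""
    return random.choices(population, weights=fitness, k=k)

# tiny illustration: the fittest individual dominates the draws
random.seed(0)
pop = ["a", "b", "c"]
fit = [1, 1, 8]
picks = [roulette_select(pop, fit)[0] for _ in range(1000)]
```

With fitness 8 out of a total of 10, individual "c" should account for roughly 80% of the selections.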
You are to perform a variety of experiments related to the performance of these algorithms under various problem settings and time limits, and to analyze your results. As a final report, we ask that you give a clear discussion of the experiments and your conclusions for each. Please keep a log of all relevant experimental data, and include that in an appendix. Please refer to the various experiments using our suggested labels.
Since our algorithms rely on randomization, please run enough independent trials to provide some confidence in your conclusions. Also, since we are relying on CPU limits, if you are going to be comparing results across related experiments, please try to gather your data on the same machine under reasonably stable system conditions, and do not use any of the graphical visualizations. If you stop and come back to the assignment at a later time, please retry an earlier experiment to check the calibration.
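One rough way to judge whether you have run enough trials is to attach a back-of-the-envelope confidence interval to an observed success rate. The normal-approximation interval below is a standard choice; it is not something our software computes for you:

```python
import math

def success_rate_ci(successes, trials, z=1.96):
    """Rough 95% normal-approximation confidence interval
    for an observed success rate."""
    p = successes / trials
    se = math.sqrt(p * (1 - p) / trials)  # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# e.g. 14 successes in 100 trials gives roughly (0.14, 0.07, 0.21)
p, lo, hi = success_rate_ci(14, 100)
```

If the resulting interval is too wide to support the comparison you want to make, run more trials.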
It is possible to write a script to invoke the experiments (demo.py), or you may perform them manually (so that you can examine results before doing the next experiment). Please be aware that our software accepts a command-line argument -t ping which will give brief trace data letting you see how your experiment is progressing (as opposed to -t all, which is verbose). Unfortunately, even for ping, the stdout will have a slight effect on the CPU usage for timed experiments. Finally, if you are going to perform your experiments manually (and wait for results), you are free to use the main turing server or the lab machines. But if you choose to run large experiments that will be running unmonitored, please do not use turing; make sure to run such processes on one of the lab machines (e.g., linuxlab5).
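If you do script your runs, a minimal driver might look like the following. This is only a sketch: the two commands are the ones shown in this handout, while the run_trial_batch helper and the log-file naming are hypothetical and not part of demo.py.

```python
import shlex
import subprocess

def run_trial_batch(cmd):
    """Run one experiment command and return its stdout for the log."""
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.stdout

# Commands taken from this handout; keyed by suggested experiment label.
experiments = {
    "I":  "python hillclimb.py queens -T 10000",
    "II": "python hillclimb.py queens -T 1000 -Y 100",
}

# Uncomment to run each experiment and save its output verbatim:
# for label, cmd in experiments.items():
#     with open("experiment_%s.log" % label, "w") as f:
#         f.write(run_trial_batch(cmd))
```

Running experiments from a script also makes it easy to rerun an earlier configuration later to check calibration, as suggested above.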
When discussing hill-climbing in the text, the authors note that for the standard queens formulation with n=8, a steepest-ascent climb on a random initial state has a 14% chance of success without sideways moves, with an average of 4 steps per success and 3 per failure.
This can be verified experimentally, with a run
python hillclimb.py queens -T 10000
This probability can be boosted by performing repeated climbs with randomly selected initial states, at the expense of increased time.
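The boost from restarts is easy to quantify: if a single climb succeeds with probability p, then k independent climbs succeed at least once with probability 1 - (1 - p)^k. A quick generic sketch, using the textbook's 14% figure:

```python
def restart_success(p, k):
    """Probability that at least one of k independent climbs succeeds,
    given per-climb success probability p."""
    return 1 - (1 - p) ** k

# With the textbook's 14% per-climb rate, roughly 22 restarts already
# push the overall success probability above 95%.
boosted = restart_success(0.14, 22)
```

Of course, each restart costs another climb's worth of CPU time, which is exactly the trade-off these experiments measure.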
Please make note of this time limit. We will use it for many of the remaining experiments.
The textbook notes that we can also increase the success rate by allowing sideways moves. For example, if 100 consecutive sideways moves are allowed, the chance of success goes up to 94%, but the average number of steps goes up to 21 for a success and 64 for a failure. This too can be verified experimentally, with a run
python hillclimb.py queens -T 1000 -Y 100
For this experiment, consider enforcing the same time limit per trial as determined in Experiment I. To be fair, we should again allow the algorithm arbitrary restarts until the time limit is reached (although on some runs with many sideways moves, it may not even finish the first climb).
Recall that we have a choice of taking the steepest-ascent or the first-choice hill climbing. The steepest ascent would tend to get us closer to the goal with fewer steps, but it takes more time per step to evaluate the neighbors. Whether that yields a net improvement in running time is unclear. It is also not clear whether to expect any change in the chance of getting stuck at a non-global optimum.
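The distinction between the two selection rules can be sketched generically; the toy landscape below is only an illustration, and our software's actual code may differ:

```python
import random

def steepest_ascent_step(state, all_neighbors, value):
    """Scan every neighbor and move to the single best one,
    or return None at a local optimum."""
    best = max(all_neighbors(state), key=value)
    return best if value(best) > value(state) else None

def first_choice_step(state, random_neighbor, value, tries=200):
    """Generate random neighbors one at a time and take the
    first strict improvement found."""
    current = value(state)
    for _ in range(tries):
        cand = random_neighbor(state)
        if value(cand) > current:
            return cand
    return None

# toy landscape: maximize -x^2 over the integers; neighbors are x +/- 1
value = lambda x: -x * x
nbrs = lambda x: [x - 1, x + 1]
rand_nbr = lambda x: random.choice(nbrs(x))

state = 5
while (nxt := steepest_ascent_step(state, nbrs, value)) is not None:
    state = nxt
# the climb ends at the global maximum, x = 0
```

Steepest ascent pays to evaluate the whole neighborhood at every step; first-choice pays only until it stumbles on an improvement, which is why it tends to win when neighborhoods are large.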
Turning our attention to beam search, please note that we assume that we go back to using the default steepest-ascent selection and we do not consider "sideways" moves. Increasing the beam size for a single climb should increase the likelihood of success, but at the expense of more computation.
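A generic local beam search with steepest-style selection can be sketched as follows (an illustration on a toy landscape, not our software's implementation):

```python
import heapq

def beam_search(starts, all_neighbors, value, beam=4, max_steps=1000):
    """Keep the best `beam` states; each step expands all of them and
    retains the top `beam` successors overall."""
    frontier = sorted(starts, key=value, reverse=True)[:beam]
    for _ in range(max_steps):
        successors = [n for s in frontier for n in all_neighbors(s)]
        best = heapq.nlargest(beam, successors, key=value)
        if not best or value(best[0]) <= value(frontier[0]):
            break  # no successor strictly improves on the current best
        frontier = best
    return frontier[0]

# toy check on a -x^2 landscape, where the global maximum is x = 0
value = lambda x: -x * x
nbrs = lambda x: [x - 1, x + 1]
result = beam_search([9, -7, 12], nbrs, value, beam=2)
```

Note that each step evaluates the neighbors of every state in the beam, which is the source of the extra computation mentioned above.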
How did the success rate, steps per climb, and CPU time per climb compare to the traditional hill-climbing experiment?
We wish to look at the permutation-based model of N-queens (queensPerm rather than queens). In this case, there are three different ways we can define neighboring relationships. We presume that first-choice will outperform steepest-ascent, given the continued large number of neighbors in this model.
Let's check out the success rate for single climbs using the following.
Our next goal is to compare the overall results we get with the permutation-based model for N-queens versus the original model from the book, as reported for Experiment III.
We want to repeat several of the previous experiments, but this time on a 40-by-40 board size. This is considerably more difficult to solve given similar time constraints. This also makes it more difficult to perform as many independent trials, but we will do our best to gather data. We will start with 100 trials just to see if we can get the flavor.
As a baseline, we want to examine the success rates for single climbs. Our experiences with the 8-queens board should give us intuition for the most promising model and settings, but things do not always scale as expected. That said, we will admit that we are going to stick with the -S first selection method, given that the number of possible neighbors that would otherwise be evaluated is quadratic in the board size.
With our software, if you just want to find a single example, you can request many trials but add the -Q flag, which tells the software to quit the entire process once a goal is found.
Unlike the N-queens problem, in which there was a clearly recognizable goal state, our software does not know what the best cost tour is for an arbitrary TSP instance. So rather than evaluating our algorithms based on a success rate, we will consider the best quality solution found in each trial, and average the quality of those solutions over the set of trials. As reference data, we have the true optimal tour costs for several benchmark examples. For example, the 30-city data set generated with parameters (-n 30 -c 30) has an optimal tour of length 4810.00, as shown earlier on this page.
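The quality measure here is simply the total length of the closed tour. A generic version of that evaluation (our software's internal representation may differ):

```python
import math

def tour_length(cities, order):
    """Total Euclidean length of the closed tour that visits
    `cities` (a list of (x, y) points) in the given `order`."""
    n = len(order)
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % n]])
               for i in range(n))

# a unit square visited around its perimeter has tour length 4.0
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
perimeter = tour_length(square, [0, 1, 2, 3])
```

Any permutation of the city indices is a legal tour; local search only changes the order, never the set of cities.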
We begin by examining the performance of a single random climb. For this permutation problem, we have the choice of defining "neighbors" based on relocating a single city in our order, swapping any pair of cities, or inverting any contiguous segment of our tour. Given the large number of possible neighbors, we will restrict our focus entirely to the first-choice selection method.
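The three neighborhood definitions can be sketched generically as follows (each returns a new permutation and leaves the input tour untouched; our software's implementation may differ):

```python
import random

def relocate(tour):
    """Remove one city and reinsert it at another position."""
    t = tour[:]
    i = random.randrange(len(t))
    city = t.pop(i)
    t.insert(random.randrange(len(t) + 1), city)
    return t

def swap(tour):
    """Exchange the positions of two cities."""
    t = tour[:]
    i, j = random.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def invert(tour):
    """Reverse a contiguous segment (a 2-opt-style move)."""
    t = tour[:]
    i, j = sorted(random.sample(range(len(t)), 2))
    t[i:j + 1] = reversed(t[i:j + 1])
    return t
```

All three moves preserve the set of cities, so each produces another legal tour; they differ in how drastically they perturb the edge structure, which is what drives their different behavior in the experiments.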
Take the average time per climb, as reported when using all for the neighbor selection. For this experiment, let's try to boost the results by allowing it arbitrary restarts with a time limit that is three times the average from the previous experiment. Each trial will report a solution that is the best it found during the small number of climbs it completed.
Our final experiment with hill-climbing will be to tackle a larger TSP instance, namely the (-n 60 -c 60) example. Using an external solver, we know that this instance has an optimal solution of length 6301.80.
Unfortunately, even a single hill climb in this model is expensive.
Although we have already seen that hill-climbing can be quite effective for the N-queens problem, we will also use it as a basis for genetic algorithm experiments. We begin with the textbook's model for the problem, and we restrict ourselves to the benchmark time limit originally defined in Experiment I.
By default, parameter settings are
Let's try to tune the permutation-based version of the problem. The most significant new decision is which crossover rule to use (order, pmx, edge). So we begin by leaving all default values, except varying the -X setting. We use the same time limit as before and examine the success rate and number of generations.
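As one concrete point of reference, here is a generic sketch of the classic order crossover (OX), which is presumably what the -X order rule corresponds to; whether our software implements exactly this variant, or how the pmx and edge rules are realized, is an assumption and is not shown here:

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random slice from parent 1, then fill
    the remaining slots with parent 2's cities in their parent-2 order."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]        # inherited slice from parent 1
    kept = set(child[i:j + 1])
    fill = [c for c in p2 if c not in kept]
    for k in range(n):                   # fill the gaps left to right
        if child[k] is None:
            child[k] = fill.pop(0)
    return child
```

The key property is that the child is always a valid permutation, unlike naive one-point crossover on city lists, which can duplicate and drop cities.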
Let's examine the use of genetic algorithms for TSP. When running our software for experiments, the key data point we are interested in is the "Final Value (avg)". To make comparisons to our hill-climbing experiments, let's use the same time constraints that we established in the final part of Experiment IX (that was three times the average from the earlier part of that experiment). Because we have more time to work with, we can utilize many generations. The default setting for generations may become limiting. We have added a feature to the software that if you set -G 0 then there is no limit on the generations (presumably, you should only use this setting with an explicit time limit).
We are going to examine the three crossover rules separately, trying to fine-tune each as follows. For each rule, we suggest that you again start by assuming that C = P/2 and trying out a variety of P values to see which gets the better results. (We suggest a modest 20 to 30 trials each, until you decide you have found a promising range.)
Then leave P fixed and try to vary C downward or upward for better results. Finally, experiment with variations for the mutation rate M.
For what it's worth, I find that a well-tuned genetic algorithm should perform at least as well as hill climbing on this example (if not better).
Finally, repeat the previous experiment for tuning the GA, but this time using the 60-city example (and the time limit established for the 60-city hill-climbing from Experiment X).
For what it's worth, I find that a well-tuned genetic algorithm should clearly outperform our well-tuned hill climbing for this scenario.
You should submit a writeup of your results, using a relatively standard file format (e.g., txt, html, pdf, rtf, odt, doc, docx).
Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.
The assignment is worth 50 points.