Saint Louis University
Computer Science 462
Dept. of Math & Computer Science
For this assignment, you must work individually on the design and implementation of your project. Please make sure you adhere to the policies on academic integrity in this regard.
Topic: Local Search
Related Reading: Ch. 4
Due: Wednesday, 3 March 2010, 11:59pm
We will perform experiments in trying to tune our hill-climbing and genetic algorithm implementations for solving N-queens and TSP instances.
In order to examine a variety of layouts with known optimal solutions we will focus on cases where we set the city seed to the same as the number of cities. Benchmarks we will use are n=30, n=60, n=100. Here are those cases and others:
params | Optimal (rounded) | NEOS optimal | our best | likeness
---|---|---|---|---
-n 20 -c 20 | 3842.44 | | |
-n 30 -c 30 | 4810.00 | | |
-n 40 -c 40 | 5108.72 | | |
-n 50 -c 50 | 6085.61 | | |
-n 60 -c 60 | 6301.80 | | |
-n 70 -c 70 | ~6667 | | |
-n 80 -c 80 | ~7298 | | |
-n 90 -c 90 | ~7664 | | |
-n 100 -c 100 | ~7730 | | |
Increasing the beam size can increase the chance of success, but at the expense of more computation.
Please note that for beam size > 1, it does not make sense to consider "first" selections, since those will presumably be based on the first entry of the beam. Also, the notion of a "sideways" move during beam search is unclear.
The other selection rule we have is the roulette wheel selection, in which we randomly choose two parents directly, but with probabilities that are proportional to the fitness values.
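The fitness-proportional idea can be sketched in a few lines of generic Python (an illustration only, not our software's actual implementation):

```python
import random

def roulette_select(population, fitness, k=2):
    """Choose k parents with probability proportional to fitness.
    Draws are with replacement, so the two parents may coincide."""
    return random.choices(population, weights=fitness, k=k)

# tiny illustration: the fittest individual dominates the draws
random.seed(0)
pop = ["a", "b", "c"]
fit = [1, 1, 8]
picks = [roulette_select(pop, fit)[0] for _ in range(1000)]
```

With fitness 8 out of a total of 10, individual "c" should account for roughly 80% of the selections.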
You are to perform a variety of experiments related to the performance of these algorithms under various problem settings and time limits, and to analyze your results. As a final report, we ask that you give a clear discussion of the experiments and your conclusions for each. Please keep a log of all relevant experimental data, and include that in an appendix. Please refer to the various experiments using our suggested labels.
Since our algorithms rely on randomization, please run enough independent trials to provide some confidence in your conclusions. Also, since we are relying on CPU limits, if you are going to be comparing results across related experiments, please try to gather your data on the same machine under reasonably stable system conditions, and do not use any of the graphical visualizations. If you stop and come back to the assignment at a later time, please retry an earlier experiment to check the calibration.
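One rough way to judge whether you have run enough trials is to attach a back-of-the-envelope confidence interval to an observed success rate. The normal-approximation interval below is a standard choice; it is not something our software computes for you:

```python
import math

def success_rate_ci(successes, trials, z=1.96):
    """Rough 95% normal-approximation confidence interval
    for an observed success rate."""
    p = successes / trials
    se = math.sqrt(p * (1 - p) / trials)  # standard error of the proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# e.g. 14 successes in 100 trials gives roughly (0.14, 0.07, 0.21)
p, lo, hi = success_rate_ci(14, 100)
```

If the resulting interval is too wide to support the comparison you want to make, run more trials.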
It is possible to write a script to invoke the experiments (demo.py), or you may perform them manually (so that you can examine results before doing the next experiment). Please be aware that our software accepts a command-line argument -t ping which will give brief trace data letting you see how your experiment is progressing (as opposed to -t all, which is verbose). Unfortunately, even for ping, the stdout will have a slight effect on the CPU usage for timed experiments. Finally, if you are going to perform your experiments manually (and wait for results), you are free to use the main turing server or the lab machines. But if you choose to run large experiments that will be running unmonitored, please do not use turing; make sure to run such processes on one of the lab machines (e.g., linuxlab5).
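If you do script your runs, a minimal driver might look like the following. This is only a sketch: the two commands are the ones shown in this handout, while the run_trial_batch helper and the log-file naming are hypothetical and not part of demo.py.

```python
import shlex
import subprocess

def run_trial_batch(cmd):
    """Run one experiment command and return its stdout for the log."""
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.stdout

# Commands taken from this handout; keyed by suggested experiment label.
experiments = {
    "I":  "python hillclimb.py queens -T 10000",
    "II": "python hillclimb.py queens -T 1000 -Y 100",
}

# Uncomment to run each experiment and save its output verbatim:
# for label, cmd in experiments.items():
#     with open("experiment_%s.log" % label, "w") as f:
#         f.write(run_trial_batch(cmd))
```

Running experiments from a script also makes it easy to rerun an earlier configuration later to check calibration, as suggested above.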
When discussing hill-climbing in the text, the authors note that for the standard queens formulation with n=8, a steepest-ascent climb on a random initial state has a 14% chance of success without sideways moves, with an average of 4 steps per success and 3 per failure.
This can be verified experimentally, with a run
python hillclimb.py queens -T 10000
This probability can be boosted by performing repeated climbs with randomly selected initial states, at the expense of increased time.
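The boost from restarts is easy to quantify: if a single climb succeeds with probability p, then k independent climbs succeed at least once with probability 1 - (1 - p)^k. A quick generic sketch, using the textbook's 14% figure:

```python
def restart_success(p, k):
    """Probability that at least one of k independent climbs succeeds,
    given per-climb success probability p."""
    return 1 - (1 - p) ** k

# With the textbook's 14% per-climb rate, roughly 22 restarts already
# push the overall success probability above 95%.
boosted = restart_success(0.14, 22)
```

Of course, each restart costs another climb's worth of CPU time, which is exactly the trade-off these experiments measure.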
Please make note of this time limit. We will use it for many of the remaining experiments.
The textbook notes that we can also increase the success rate by allowing sideways moves. For example, if 100 consecutive sideways moves are allowed, the chance of success goes up to 94%, but the average number of steps goes up to 21 for a success and 64 for a failure. This too can be verified experimentally, with a run
python hillclimb.py queens -T 1000 -Y 100
For this experiment, consider enforcing the same time limit per trial as determined in Experiment I. To be fair, we should again allow the algorithm arbitrary restarts until the time limit is reached (although on some runs with many sideways moves, it may not even finish the first climb).
Recall that we have a choice of taking the steepest-ascent or the first-choice hill climbing. The steepest ascent would tend to get us closer to the goal with fewer steps, but it takes more time per step to evaluate the neighbors. Whether that yields a net improvement in running time is unclear. It is also not clear whether to expect any change in the chance of getting stuck at a non-global optimum.
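The distinction between the two selection rules can be sketched generically; the toy landscape below is only an illustration, and our software's actual code may differ:

```python
import random

def steepest_ascent_step(state, all_neighbors, value):
    """Scan every neighbor and move to the single best one,
    or return None at a local optimum."""
    best = max(all_neighbors(state), key=value)
    return best if value(best) > value(state) else None

def first_choice_step(state, random_neighbor, value, tries=200):
    """Generate random neighbors one at a time and take the
    first strict improvement found."""
    current = value(state)
    for _ in range(tries):
        cand = random_neighbor(state)
        if value(cand) > current:
            return cand
    return None

# toy landscape: maximize -x^2 over the integers; neighbors are x +/- 1
value = lambda x: -x * x
nbrs = lambda x: [x - 1, x + 1]
rand_nbr = lambda x: random.choice(nbrs(x))

state = 5
while (nxt := steepest_ascent_step(state, nbrs, value)) is not None:
    state = nxt
# the climb ends at the global maximum, x = 0
```

Steepest ascent pays to evaluate the whole neighborhood at every step; first-choice pays only until it stumbles on an improvement, which is why it tends to win when neighborhoods are large.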
Turning our attention to beam search, please note that we assume that we go back to using the default steepest-ascent selection and we do not consider "sideways" moves. Increasing the beam size for a single climb should increase the likelihood of success, but at the expense of more computation.
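A generic local beam search with steepest-style selection can be sketched as follows (an illustration on a toy landscape, not our software's implementation):

```python
import heapq

def beam_search(starts, all_neighbors, value, beam=4, max_steps=1000):
    """Keep the best `beam` states; each step expands all of them and
    retains the top `beam` successors overall."""
    frontier = sorted(starts, key=value, reverse=True)[:beam]
    for _ in range(max_steps):
        successors = [n for s in frontier for n in all_neighbors(s)]
        best = heapq.nlargest(beam, successors, key=value)
        if not best or value(best[0]) <= value(frontier[0]):
            break  # no successor strictly improves on the current best
        frontier = best
    return frontier[0]

# toy check on a -x^2 landscape, where the global maximum is x = 0
value = lambda x: -x * x
nbrs = lambda x: [x - 1, x + 1]
result = beam_search([9, -7, 12], nbrs, value, beam=2)
```

Note that each step evaluates the neighbors of every state in the beam, which is the source of the extra computation mentioned above.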
How did the success rate, steps per climb, and CPU time per climb compare to the traditional hill-climbing experiment?
We wish to look at the permutation-based model of N-queens (queensPerm rather than queens). In this case, there are three different ways we can define neighboring relationships. We presume that first-choice will outperform steepest-ascent, given the continued large number of neighbors in this model.
Let's check out the success rate for single climbs using the following.
Our next goal is to compare the overall results we get with the permutation-based model for N-queens versus the original model from the book, as reported for Experiment III.
We want to repeat several of the previous experiments, but this time on a 40-by-40 board size. This is considerably more difficult to solve given similar time constraints. This also makes it more difficult to perform as many independent trials, but we will do our best to gather data. We will start with 100 trials just to see if we can get the flavor.
As a baseline, we want to examine the success rates for single climbs. Our experiences with the 8-queens board should give us intuition for the most promising model and settings, but things do not always scale as expected. That said, we will admit that we are going to stick with the -S first selection method, given that the number of possible neighbors that would otherwise be evaluated is quadratic in the board size.
With our software, if you just want to find a single example, you can request many trials but add the -Q flag, which tells the software to quit the entire process once a goal is found.
Unlike the N-queens problem, in which there was a clearly recognizable goal state, our software does not know what the best cost tour is for an arbitrary TSP instance. So rather than evaluating our algorithms based on a success rate, we will consider the best quality solution found in each trial, and average the quality of those solutions over the set of trials. As reference data, we have the true optimal tour costs for several benchmark examples. For example, the 30-city data set generated with parameters (-n 30 -c 30) has an optimal tour of length 4810.00, as shown earlier on this page.
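The quality measure here is simply the total length of the closed tour. A generic version of that evaluation (our software's internal representation may differ):

```python
import math

def tour_length(cities, order):
    """Total Euclidean length of the closed tour that visits
    `cities` (a list of (x, y) points) in the given `order`."""
    n = len(order)
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % n]])
               for i in range(n))

# a unit square visited around its perimeter has tour length 4.0
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
perimeter = tour_length(square, [0, 1, 2, 3])
```

Any permutation of the city indices is a legal tour; local search only changes the order, never the set of cities.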
We begin by examining the performance of a single random climb. For this permutation problem, we have the choice of defining "neighbors" based on relocating a single city in our order, swapping any pair of cities, or inverting any contiguous segment of our tour. Given the large number of possible neighbors, we will restrict our focus entirely to the first-choice selection method.
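The three neighborhood definitions can be sketched generically as follows (each returns a new permutation and leaves the input tour untouched; our software's implementation may differ):

```python
import random

def relocate(tour):
    """Remove one city and reinsert it at another position."""
    t = tour[:]
    i = random.randrange(len(t))
    city = t.pop(i)
    t.insert(random.randrange(len(t) + 1), city)
    return t

def swap(tour):
    """Exchange the positions of two cities."""
    t = tour[:]
    i, j = random.sample(range(len(t)), 2)
    t[i], t[j] = t[j], t[i]
    return t

def invert(tour):
    """Reverse a contiguous segment (a 2-opt-style move)."""
    t = tour[:]
    i, j = sorted(random.sample(range(len(t)), 2))
    t[i:j + 1] = reversed(t[i:j + 1])
    return t
```

All three moves preserve the set of cities, so each produces another legal tour; they differ in how drastically they perturb the edge structure, which is what drives their different behavior in the experiments.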
Take the average time per climb, as reported when using all for the neighbor selection. For this experiment, let's try to boost the results by allowing it arbitrary restarts with a time limit that is three times the average from the previous experiment. Each trial will report a solution that is the best it found during the small number of climbs it completed.
Our final experiment with hill-climbing will be to tackle a larger TSP instance, namely the (-n 60 -c 60) example. Using an external solver, we know that this instance has an optimal solution of length 6301.80.
Unfortunately, even a single hill climb in this model is expensive.
Although we have already seen that hill-climbing can be quite effective for the N-queens problem, we will also use it as a basis for genetic algorithm experiments. We begin with the textbook's model for the problem, and we restrict ourselves to the benchmark time limit originally defined in Experiment I.
By default, parameter settings are
Let's try to tune the permutation-based version of the problem. The most significant new decision is which crossover rule to use (order, pmx, edge). So we begin by leaving all default values, except varying the -X setting. We use the same time limit as before and examine the success rate and number of generations.
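As one concrete point of reference, here is a generic sketch of the classic order crossover (OX), which is presumably what the -X order rule corresponds to; whether our software implements exactly this variant, or how the pmx and edge rules are realized, is an assumption and is not shown here:

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random slice from parent 1, then fill
    the remaining slots with parent 2's cities in their parent-2 order."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]        # inherited slice from parent 1
    kept = set(child[i:j + 1])
    fill = [c for c in p2 if c not in kept]
    for k in range(n):                   # fill the gaps left to right
        if child[k] is None:
            child[k] = fill.pop(0)
    return child
```

The key property is that the child is always a valid permutation, unlike naive one-point crossover on city lists, which can duplicate and drop cities.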
Let's examine the use of genetic algorithms for TSP. When running our software for experiments, the key data point we are interested in is the "Final Value (avg)". To make comparisons to our hill-climbing experiments, let's use the same time constraints that we established in the final part of Experiment IX (that was three times the average from the earlier part of that experiment). Because we have more time to work with, we can utilize many generations. The default setting for generations may become limiting. We have added a feature to the software that if you set -G 0 then there is no limit on the generations (presumably, you should only use this setting with an explicit time limit).
We are going to examine the three crossover rules separately, trying to fine-tune each as follows. For each rule, we suggest that you again start by assuming that C = P/2 and trying out a variety of P values to see which gets the better results. (We suggest a modest 20 to 30 trials each, until you decide you have found a promising range.)
Then leave P fixed and try to vary C downward or upward for better results. Finally, experiment with variations for the mutation rate M.
For what it's worth, I find that a well-tuned genetic algorithm should perform at least as well as hill climbing on this example (if not better).
Finally, repeat the previous experiment for tuning the GA, but this time using the 60-city example (and the time limit established for the 60-city hill-climbing from Experiment X).
For what it's worth, I find that a well-tuned genetic algorithm should clearly outperform our well-tuned hill climbing for this scenario.
You should submit a writeup of your results, using a relatively standard file format (e.g., txt, html, pdf, rtf, odt, doc, docx).
Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.
The assignment is worth 50 points.