Saint Louis University | Computer Science A220/P126 | Dept. of Math & Computer Science
Please see the general programming webpage for details about the programming environment for this course, guidelines for programming style, and details on electronic submission of assignments.
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners. You will note that there are two distinct implementation approaches required for this assignment. It may be that the partnership divides the work by having one person do one implementation, and one person another (with consultation as needed). Alternatively both pieces could be developed side-by-side.
Please make sure you adhere to the policies on academic integrity in this regard.
Although no individual part of this assignment is especially difficult, you should expect the assignment as a whole to be the most involved programming assignment so far. Needless to say, please start early and tackle the program in stages.
This assignment will involve several programming techniques which we have not yet used in earlier assignments. Namely,
The implementation of a templated class. Throughout the semester, we have discussed the use and implementation of templated classes for developing generic data structures. However, if you look carefully at past assignments, you have not yet implemented such a templated class. Some assignments (such as 'cards' and 'expression') have involved the use of templated classes; for other assignments we have avoided templates by making specialized versions of generic structures (such as a BoundedStack of strings, or linked lists with integer data).
The extra challenge in truly implementing templated classes is that typical compiler errors are far more cryptic. This is because templated class code is not compiled on its own, but rather compiled only when instantiated with explicit template parameters.
A more minor challenge in this assignment is that we are asking you to use the standard C++ vector class. Though we used this class marginally in the very first assignment, you are likely to need to use more of the supported methods for that class. It is important that you do not confuse the syntax for the standard C++ vector class with that of the Vector ADT introduced in our text. You may wish to look at documentation for C++ vectors.
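As a small illustration of the earlier point that templated code is compiled only when instantiated, consider the following sketch. The names Box, NoCompare, idOf, smaller and demo are hypothetical and are not part of the assignment files. The member lessThan needs operator< on T, yet the compiler checks that requirement only when some code actually calls lessThan for a particular T, which is why such errors can surface late and far from the template itself.

```cpp
#include <cassert>

// Hypothetical class template, for illustration only.
template <typename T>
class Box {
public:
    explicit Box(const T& v) : value(v) {}

    const T& get() const { return value; }

    // Requires T to support operator<. The body is compiled for a given T
    // only when lessThan is actually called on a Box<T>.
    bool lessThan(const Box& other) const { return value < other.value; }

private:
    T value;
};

// A type with no operator<.
struct NoCompare {
    int id;
};

// Compiles: Box<NoCompare> is instantiated, but lessThan is never used on it.
inline int idOf(const Box<NoCompare>& b) { return b.get().id; }

// Compiles: int supports operator<.
inline bool smaller(int a, int b) {
    return Box<int>(a).lessThan(Box<int>(b));
}

// Small self-check exercising both cases.
inline void demo() {
    assert(smaller(3, 5));
    assert(idOf(Box<NoCompare>(NoCompare{7})) == 7);
}
```

Adding a call such as Box<NoCompare>(NoCompare{1}).lessThan(Box<NoCompare>(NoCompare{2})) would trigger a compile error reported inside the template body, which is the kind of message you should expect while developing SlowPQ and FastPQ.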
You have just taken an internship with an online grocer (Please Note: I had originally written this assignment at a time when I thought online grocers might be viable -- who knew?). You will be responsible for packing an order of groceries into 'bins' which will then get loaded onto a truck for delivery. Of course, the goal is to be able to pack the order into as few bins as possible. At the same time, you do not want to pack bins so heavily that they cannot be easily carried. In order to try to find a good solution for packing items, we will have you write a computer program. We will abstract the problem with the following model:
Each item has a weight which is somewhere between 0 and 1.
Each item must be assigned to a particular bin.
Each bin has a maximum capacity of 1.
The goal is to use as few bins as possible.
This setup can actually be used in modeling a variety of different applications, and it has been given the name "Bin Packing" in the literature. Unfortunately, minimizing the number of bins appears to be a very difficult problem to solve efficiently (there are too many possibilities!). Fortunately a number of heuristics have been studied, and some of these seem to get quite good results.
In this program, we ask you to implement two particular heuristics. The first heuristic, called "worst-fit," proceeds as follows. Items are considered one-at-a-time (in the original order presented). When an item is being considered, we look at the existing bin with the most available room. If the new item fits in this bin, we place it there; if it does not fit in this bin then it will not fit in any bin, so we place the item into a newly created bin. For example, on an instance with weights .29, .11, .81 and .68, this algorithm would pack the items into three bins (note that this is not the best possible).
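The worst-fit rule above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the standard library's std::priority_queue rather than the PriorityQueue classes you must write for this assignment, the function name worstFitCount is hypothetical, and it tracks only the remaining capacity of each bin (so it cannot produce verbose output).

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Counts the bins used by worst-fit on the given weights (each in [0, 1]).
// Illustration only: uses std::priority_queue, not the course's PriorityQueue.
inline int worstFitCount(const std::vector<double>& weights) {
    // Max-heap of remaining capacities; top() is the bin with the most room.
    std::priority_queue<double> remaining;
    for (double w : weights) {
        assert(w >= 0.0 && w <= 1.0);
        if (!remaining.empty() && w <= remaining.top()) {
            // The item fits in the emptiest bin: update that bin's capacity.
            double room = remaining.top();
            remaining.pop();
            remaining.push(room - w);
        } else {
            // It fits nowhere (or there are no bins yet): open a new bin.
            remaining.push(1.0 - w);
        }
    }
    return static_cast<int>(remaining.size());
}
```

Calling worstFitCount({0.29, 0.11, 0.81, 0.68}) returns 3, matching the example above.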
A second heuristic is known as "worst-fit-decreasing." This heuristic considers items one-at-a-time, again placing each considered item into the bin with the most remaining space (if such a bin will hold the item). The difference between this heuristic and the last is that items are considered in order of decreasing weight, rather than in the original order. To do this, all items are initially placed into a priority queue, and at each step the remaining item with the largest weight is removed from this queue and then placed into a bin. (Note that for this to work, you will have to adjust your view of priorities so that the "minKey" is the one with the largest weight.) Looking at our previous example, the items would get considered in the order .81, .68, .29 and .11, and thus packed into two bins. By no means are these the only two approaches to bin packing. If anyone is interested in the history of the problem, we will be happy to provide references. But for this assignment, we will focus on these two heuristics.
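One possible way to make the "minKey" correspond to the largest weight is to negate each weight when using it as a key. The sketch below shows only that ordering trick. It assumes a min-oriented queue, imitated here with std::priority_queue and std::greater rather than the assignment's own PriorityQueue, and the function name decreasingOrder is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Returns the weights in decreasing order by draining a min-oriented queue
// whose keys are the negated weights.
inline std::vector<double> decreasingOrder(const std::vector<double>& weights) {
    // std::greater makes top() the smallest key, mimicking a minKey-style queue.
    std::priority_queue<double, std::vector<double>, std::greater<double>> keys;
    for (double w : weights) {
        keys.push(-w);  // the smallest key belongs to the largest weight
    }
    std::vector<double> order;
    order.reserve(weights.size());
    while (!keys.empty()) {
        order.push_back(-keys.top());  // undo the negation
        keys.pop();
    }
    assert(order.size() == weights.size());
    return order;
}
```

For the weights .29, .11, .81 and .68 this yields the order .81, .68, .29, .11 described above.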
In order to test your program and evaluate the bin packing heuristics, we are providing you with a driver which does the following. Given an integer N and a floating-point value u in [0...1], the driver generates N weights each chosen at random in the range [0...u). These weights are stored in an array of double-precision floating point numbers. The driver calculates S, the sum of all of the weights, and reports that any solution will require at least S* bins, where S* is the smallest integer at least as great as S. For example, if the weights sum to 20.3, then we know that no solution can pack these items into 20 or fewer bins. (Please note: this does not mean that there always exists a solution using exactly S* bins - but it gives us a reasonable bound for which to aim when evaluating the heuristics' results.)
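The bound S* is simply the ceiling of the sum of the weights. A minimal sketch follows. The function name binLowerBound is hypothetical, and the provided driver may compute the bound differently.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Returns S*, the smallest integer at least as great as the sum of the weights.
inline int binLowerBound(const std::vector<double>& weights) {
    double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(sum >= 0.0);
    return static_cast<int>(std::ceil(sum));
}
```

For the weights .29, .11, .81 and .68 the sum is 1.89, so the bound is 2 bins. Because the sum is computed in floating point, a total that is mathematically an exact integer may round slightly above or below it.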
The driver accepts the following five command line arguments, as specified by the user. The first four of these are required; the fifth is optional.
N - a non-negative integer, specifying the number of items
u - a double in the range (0...1] specifying an upper limit on the range when choosing random weights.
0 or 1 - interpreted as a boolean specifying whether or not the program should give "verbose" output (explained later). 0 represents no verbose output, and 1 represents verbose.
0 or 1 - specifying which priority queue implementation to use (explained later). 0 means SlowPQ; 1 means FastPQ.
seed - an integer value used to seed the random number generator. If no argument is given, the seed is set based on the system clock.
For example, the execution of the program might be started as:
Binpack 20 0.2 0 1
At this point, the driver calls a routine worstFit which is written by you to simulate the worst-fit heuristic. This routine returns the overall number of bins used. Secondly, the driver calls a routine worstFitDecreasing written by you to simulate the worst-fit-decreasing heuristic.
Finally, although generating random data is useful for experiments, it is troublesome when developing and testing your program. The difficulty is that when a bug arises, you generally want to fix the bug and then re-test on the same data. But if data is generated at random, you will likely get a different data set when you re-test. For this reason, we will give you a way to set a "seed" which is used by the random number generator. In this way, you can re-test on a previous data set by providing the identical seed.
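To see why a fixed seed makes a data set repeatable, here is a sketch of seeded weight generation. It is an assumption-laden stand-in, not the driver's code: the function name makeWeights is hypothetical, and the actual driver may use a different random number generator, so the same seed will not necessarily produce the same weights as Binpack.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Generates n weights in [0, u) from a generator seeded with the given value.
// The same (n, u, seed) triple always yields the same sequence.
inline std::vector<double> makeWeights(int n, double u, unsigned int seed) {
    assert(n >= 0 && u > 0.0 && u <= 1.0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, u);
    std::vector<double> weights;
    weights.reserve(n);
    for (int i = 0; i < n; ++i) {
        weights.push_back(dist(gen));
    }
    return weights;
}
```

Two calls to makeWeights(20, 0.2, 9) return identical vectors, which is the property that lets you fix a bug and then re-test on the same data.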
We are providing you with a general interface for a PriorityQueue class. This class is modeled in the spirit of the ADT described in our text, however we have taken some liberties in simplifying that very general approach as appropriate for this assignment. In particular, rather than rely on a three-parameter template <Element, Key, Comparator>, we assume that keys are always of type double and compared with the standard comparison operator. Thus our definition of a PriorityQueue for this assignment is as a single-parameter template, <typename Element>. You will find this level of generality sufficient for completing this assignment.
You must give two different priority queue implementations. Both should be based on using a C++ vector for the low-level representation. The first implementation, titled SlowPQ, should be implemented by keeping the items unordered. This is quite similar to the discussion in Section 7.2.1, though again using a C++ vector rather than the text's Sequence.
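The unordered approach can be sketched as follows. Treat this as a rough illustration rather than the expected solution: the class name SlowPQSketch is hypothetical, and the method names (insertItem, minKey, minElement, removeMin, size, isEmpty) are assumed in the style of the text's ADT, so the stubs we provide may declare a different interface.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Priority queue over an unordered std::vector of (key, element) pairs.
template <typename Element>
class SlowPQSketch {
public:
    bool isEmpty() const { return items.empty(); }
    std::size_t size() const { return items.size(); }

    // O(1): append at the end; no ordering is maintained.
    void insertItem(double key, const Element& element) {
        items.push_back(std::make_pair(key, element));
    }

    double minKey() const { return items[minIndex()].first; }
    const Element& minElement() const { return items[minIndex()].second; }

    // O(n) to locate the minimum, then O(1) to remove it by swapping it
    // with the last entry and shrinking the vector.
    void removeMin() {
        std::size_t i = minIndex();
        std::swap(items[i], items.back());
        items.pop_back();
    }

private:
    // Linear scan for the position of the smallest key.
    std::size_t minIndex() const {
        assert(!items.empty());
        std::size_t best = 0;
        for (std::size_t i = 1; i < items.size(); ++i) {
            if (items[i].first < items[best].first) best = i;
        }
        return best;
    }

    std::vector<std::pair<double, Element>> items;
};
```

Insertion is constant time while every query or removal scans the whole vector, which is the trade-off the "Slow" in the name refers to.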
The second implementation, titled FastPQ, will be based on a heap as specified in Section 7.3.2. Although we intuitively think of a heap as a binary tree, we will ask you to design your implementation based on the vector-based representation of a heap. Again, you will use a vector as the low-level storage, but rather than keep items unordered, you will partially order them based on the heap property. Section 7.3.3 of the text gives an implementation of a priority queue with a heap, based on using the BinaryTree interface. Though this is quite different syntactically than what you must do, you might choose to look at that implementation in understanding the algorithmic process.
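A vector-based heap might be sketched as below. Several things here are assumptions: the class name FastPQSketch and the method names are hypothetical (the provided stubs may differ), and the sketch stores the root at index 0, so the children of index i sit at 2i+1 and 2i+2, whereas the text's presentation may number positions starting from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Min-heap stored in a std::vector of (key, element) pairs.
template <typename Element>
class FastPQSketch {
public:
    bool isEmpty() const { return heap.empty(); }
    std::size_t size() const { return heap.size(); }

    // O(log n): append at the bottom, then bubble up.
    void insertItem(double key, const Element& element) {
        heap.push_back(std::make_pair(key, element));
        upHeap(heap.size() - 1);
    }

    double minKey() const {
        assert(!heap.empty());
        return heap.front().first;
    }

    const Element& minElement() const {
        assert(!heap.empty());
        return heap.front().second;
    }

    // O(log n): move the last entry to the root, then bubble down.
    void removeMin() {
        assert(!heap.empty());
        std::swap(heap.front(), heap.back());
        heap.pop_back();
        if (!heap.empty()) downHeap(0);
    }

private:
    // Swap upward while the entry is smaller than its parent.
    void upHeap(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (heap[parent].first <= heap[i].first) break;
            std::swap(heap[parent], heap[i]);
            i = parent;
        }
    }

    // Swap downward with the smaller child until heap order is restored.
    void downHeap(std::size_t i) {
        const std::size_t n = heap.size();
        while (true) {
            std::size_t left = 2 * i + 1;
            std::size_t right = left + 1;
            std::size_t smallest = i;
            if (left < n && heap[left].first < heap[smallest].first) smallest = left;
            if (right < n && heap[right].first < heap[smallest].first) smallest = right;
            if (smallest == i) break;
            std::swap(heap[i], heap[smallest]);
            i = smallest;
        }
    }

    std::vector<std::pair<double, Element>> heap;
};
```

Inserting the keys 5, 3, 8, 1, 9, 2 and then repeatedly calling minKey and removeMin yields 1, 2, 3, 5, 8, 9.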
To get you going, we will provide initial stubs for the class definitions and the various methods.
Your other major task is to correctly implement both of the bin-packing heuristics described earlier in this assignment. To get you going, we have provided you a stub for such a method, described in files Heuristic.h and Heuristic.cpp. As you start to think about the task you should realize that the concept of a priority queue will serve you in several ways. For both heuristics you process grocery items one-at-a-time. However the ordering differs between the two heuristics, as the standard worst-fit processes such items in the original order given whereas the worst-fit-decreasing heuristic processes items in decreasing order of their weight. Both of these heuristics can be implemented in similar fashion, using a priority queue to determine which item should next be processed.
But this is not the only use of priority queues. Both heuristics should also make good use of a priority queue to keep track of all currently opened bins, always identifying the particular bin with the most remaining capacity. As you might see, these two uses of a priority queue are quite different. They hold different types of elements and they are based on possibly different criteria for priorities. Fortunately, the abstraction provided by the PriorityQueue interface allows us this flexibility.
Given the similarities between the two heuristics, you must implement them with a single, unified routine, declared with the following signature:
int worstFit(double* w, int N, bool decreasing, bool useFastPQ, bool verbose);
with five parameters interpreted as follows:
w - an array of the original weights
N - the number of such weights
decreasing - If true then the procedure should perform the worst-fit-decreasing heuristic; otherwise it should perform the standard worst-fit heuristic (processing items in the original order given).
useFastPQ - If true then the procedure should rely upon the FastPQ implementation for all priority queues; if false then the SlowPQ implementation.
verbose - If true then the procedure should generate output (via cout), listing the contents and total weight of each bin. If false, no such output should be generated.
The int return value should represent the overall number of bins which were used in the solution. Your procedure is responsible for accurately reporting this piece of information, whether or not it is in verbose mode. The purpose of verbose mode is to give additional information. For example, with our original motivation, it does a person no good to simply say that the solution uses 59 bins - a program must give some sort of mapping to explain which items should be grouped in each such bin. So for this reason, we want your program to be able to produce verbose output which gives such a mapping. Going back to the earlier sample, possible verbose output for the worst-fit heuristic might read (though you are free to format the information as you wish):
Bin 1: weight=0.40 [items: 0.29 0.11]
Bin 2: weight=0.81 [items: 0.81]
Bin 3: weight=0.68 [items: 0.68]
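One way to produce lines in the format shown above is sketched here. The helper name describeBin is hypothetical, and since you are free to format the information as you wish, nothing requires this exact layout.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Builds one line of verbose output for a bin, given its 1-based number
// and the weights of the items it holds.
inline std::string describeBin(int number, const std::vector<double>& items) {
    assert(number >= 1);
    double total = 0.0;
    for (double w : items) total += w;

    std::ostringstream out;
    out << std::fixed << std::setprecision(2);  // two digits after the point
    out << "Bin " << number << ": weight=" << total << " [items:";
    for (double w : items) out << ' ' << w;
    out << ']';
    return out.str();
}
```

Within worstFit, the returned string could be sent to cout only when the verbose flag is true.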
At the same time, we will want to perform some large experiments. Verbose output would be very distracting when packing 1000000 items, and producing such output would skew the running times. Even more important, on large experiments memory will become scarce. The ideal student will realize that the structure used for representing a bin can be greatly simplified if one is not required to produce the verbose output. Streamlined code may allow significantly larger experiments to be performed.
We ask that you perform some experiments which will be reported in your 'readme' file. This will allow us to compare the quality of the solutions produced by the two heuristics as well as the efficiency of the two priority queue implementations.
You must submit a plain-text readme file with the following information:
Show the output of your program, when run using "verbose" mode, for [N=20, u=.2, seed=9].
Show the output of your program, when run using "verbose" mode, for [N=20, u=.7, seed=9].
Give a table of results for u=.2, using values of N as large as your program can reasonably handle from among N=100, 1000, 10000, 100000, 1000000. Your table should contain the total sum S of the weights, the number of bins used by worst-fit, and the number of bins used by worst-fit-decreasing. Furthermore, you should report the running times for both the FastPQ and SlowPQ implementations, when applicable. (Even better is to report the average over several random trials for each value of N.)
Give a second table, as above, for the value u=.7.
You have a choice in how to gather this information. One way is to sit at the computer for a while and gather piece after piece of information using our driver. A wise student might feel this is a mindless use of time that surely can be automated by a new or modified driver. For your benefit, we will tell you that our Binpack.cpp driver relies internally on the method runTrial() with parameters (int N, double u, bool useFast, bool verbose, long seed), with the seed being optional. You could write a loop to call this method repeatedly with various parameter values.
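Such a loop might look like the sketch below. It is written against any callable that takes the five parameters listed above, because the return type and exact declaration of runTrial are not stated here. The function name sweep and the choice to time each call with std::chrono are assumptions for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Calls trial(N, u, useFast, false, seed) for every N in sizes and every
// seed, and returns the mean wall-clock seconds per call for each N.
template <typename Trial>
std::vector<double> sweep(Trial trial, const std::vector<int>& sizes,
                          double u, bool useFast,
                          const std::vector<long>& seeds) {
    assert(!seeds.empty());
    std::vector<double> meanSeconds;
    for (int n : sizes) {
        double total = 0.0;
        for (long seed : seeds) {
            auto start = std::chrono::steady_clock::now();
            trial(n, u, useFast, /*verbose=*/false, seed);
            auto stop = std::chrono::steady_clock::now();
            total += std::chrono::duration<double>(stop - start).count();
        }
        meanSeconds.push_back(total / seeds.size());
    }
    return meanSeconds;
}
```

If runTrial is callable with all five arguments, a call along the lines of sweep(runTrial, {100, 1000, 10000}, 0.2, true, {1, 2, 3}) would run three seeded trials per value of N. Note that the measured time would then include whatever runTrial itself does, such as generating the weights.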
One of the challenges of this program is simply the many required tasks. Here is our suggested approach for the development cycle:
Implement SlowPQ -- the concept could not be simpler. Writing this implementation should be a relatively easy task as a warmup, but already it will help you get used to some of the subtleties of the Priority Queue interface. For example, you will likely need to use the "composition" pattern as described in Section 7.1.4 with the Item class.
Go to the worstFit procedure and add the original items into a priority queue, setting the keys in a way so that the weights can be processed in the original order for the worst-fit heuristic, yet in decreasing order for the worst-fit-decreasing heuristic.
For the moment, instead of truly processing each weight in the bin-packing context, just print out the value. This should already provide you with a way to test whether your SlowPQ (and later your FastPQ) implementation appears to be working.
Implement FastPQ -- This will take a bit more care in development than did SlowPQ. Fortunately, you should have code from the previous step which can be used to help test your implementation.
Go back and complete the task of implementing the worst-fit and worst-fit-decreasing heuristics. Before writing any code, you will want to think a bit about the logic. Decisions will need to be made as to how a "bin" will be represented, how bins are compared to each other, and how to make further use of the PriorityQueue abstraction. Generating verbose output will allow you to examine the behavior of your code on some very small examples which you might be able to compare to a hand simulation.
When you are confident that all of your code is written and working, start running the experiments.
The files you need for this assignment can be downloaded here.
At minimum, you must submit the files: readme, Heuristic.cpp, SlowPQ.h, SlowPQ.tcc, FastPQ.h and FastPQ.tcc. However, if you find need to create any additional files, please make sure to submit those as well, along with an updated makefile, if necessary.
The assignment is worth 10 points. In general, we will try to apportion the points based on the following breakdown (though we will use some of our own judgment in the end):