This checklist summarizes the assignment. It is meant only as a supplement; please read the original assignment description.
[Michael 4/3] At one point, the original assignment states, "Your
driver program should not store the whole text string. Indeed, you
only need to keep track of the most recent k characters in the
string."
However, in one of my precepts I said that we will not be
strictly checking this requirement, and you will notice that it has never
appeared as a requirement anywhere in the checklist. It is unclear
in context whether this comment was talking about the warmup method or
about your final submitted program (it appeared in the paragraph which
started, "As a warmup...").
At the same time, I said that the reason for the comment is
that most people's programs have no real need to explicitly store the
original string. For example, in the direct symbol
table program, the flow of control is to walk through the original
string and call Insert for each string you pass. In this
case, there is never any other time you will need the original string,
so rather than take the space to store the whole
thing, you can just do the insertions while you read in the input.
I think this is true of most of the more interesting methods too.
However, if you feel there is good reason to store the input
string for your program, you are certainly free to do so. (As we said,
just don't throw away space, and explain your method in the
readme.)
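As an illustration of doing the insertions while reading the input, here is a minimal sketch that keeps only the most recent k characters in memory. The names for_each_kgram, kgram_fn, and remember_last are hypothetical and are not part of the assignment; the callback stands in for whatever Insert routine your own symbol table provides.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: receives each k-character window,
 * NUL-terminated. In a real program this would call your Insert. */
typedef void (*kgram_fn)(const char *kgram, void *ctx);

/* Reads `in` one character at a time and keeps only the most recent
 * k characters. Calls visit() once per complete window and returns
 * the number of windows visited (0 if k is 0 or malloc fails). */
size_t for_each_kgram(FILE *in, size_t k, kgram_fn visit, void *ctx)
{
    assert(in != NULL && visit != NULL);
    if (k == 0)
        return 0;

    char *window = malloc(k + 1);
    if (window == NULL)
        return 0;
    window[k] = '\0';

    size_t filled = 0, count = 0;
    int c;
    while ((c = getc(in)) != EOF) {
        if (filled < k) {
            window[filled++] = (char)c;         /* still filling the first window */
        } else {
            memmove(window, window + 1, k - 1); /* drop the oldest character */
            window[k - 1] = (char)c;            /* append the newest one */
        }
        if (filled == k) {
            visit(window, ctx);
            count++;
        }
    }
    free(window);
    return count;
}

/* Example callback for illustration only: remembers the last window.
 * ctx must point to a buffer of at least k + 1 bytes. */
void remember_last(const char *kgram, void *ctx)
{
    strcpy((char *)ctx, kgram);
}
```

On the input "abcde" with k=3, the callback sees the windows "abc", "bcd", and "cde". The shift costs O(k) per character; a circular buffer would avoid that, at the price of slightly more bookkeeping.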
[Michael 3/30] Some performance bounds for my own implementation have been added to the bottom of this file.
[Michael 3/30] The original 'baby.txt' had some extended ASCII values (accented versions of vowels, with values above 127), but these have now been replaced. At this point, all characters in the input files are standard ASCII and thus have values between 0 and 127 inclusive.
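If you want to confirm this for a file of your own, a check along the following lines may help. This is only a sketch; the function name is_all_ascii is made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if every byte read from `in` lies in 0..127, else 0.
 * getc() yields each byte as an unsigned char value (0..255) or EOF,
 * so any value above 127 is outside standard ASCII. */
int is_all_ascii(FILE *in)
{
    assert(in != NULL);
    int c;
    while ((c = getc(in)) != EOF) {
        if (c > 127)
            return 0;
    }
    return 1;
}
```

With input limited to 0..127, a table indexed directly by character value needs only 128 entries.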
[Michael 3/30] For this assignment, it is important to be able to
generate different random trials, by changing the seed used
for the random number generator.
For my program, I use random(), and thus have the following
two lines at the top:
    #include <sys/types.h>
    #include <sys/timeb.h>
And the following two lines within my main routine:
    long seed=time(NULL);
    srandom(seed);
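For reference, here is one possible self-contained variant of the seeding step. It is a sketch rather than the code used for the timings at the bottom of this file: the helper name seed_random and the idea of passing an optional seed string (for example from the command line) are assumptions added for illustration. It includes <stdlib.h> and <time.h>, where random(), srandom(), and time() are declared on POSIX systems.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600 /* some compilers need this to declare random() */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Seeds random() and returns the seed that was used.
 * arg == NULL: seed from the clock, so each run gives a new trial.
 * otherwise:   parse arg as a decimal seed, so a trial can be repeated. */
long seed_random(const char *arg)
{
    /* Callers pass NULL, not an empty string, to ask for a clock seed. */
    assert(arg == NULL || arg[0] != '\0');

    long seed = (arg != NULL) ? strtol(arg, NULL, 10) : (long)time(NULL);
    srandom((unsigned int)seed);
    return seed;
}
```

A main routine might call seed_random(argc > 1 ? argv[1] : NULL) and print the returned seed, so that an interesting trial can be reproduced later by passing that seed back in.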
Inputfile | Source | N
princeton.txt | A Packet article about Princeton | 7959
aesopshort.txt | collection of Aesop's fables | 10280
moby1.txt | Moby Dick - Chapter 1 | 12218
amendments.txt | Constitutional Amendments | 18369
y2kintro.txt | Introduction of the recent Senate report on Y2K | 21224
baby.txt | How babies learn language | 22200
manifesto.txt | Communist Party Manifesto | 72955
muchado.txt | Much Ado about Nothing | 123413
aesop.txt | collection of Aesop's fables | 191945
starr.txt | The Starr Report narrative (warning: explicit language) | 234378
lilwomen.txt | Little Women | 1042048
mobydick.txt | Moby Dick | 1191463
Inputfile | k=3 | k=7 | k=12 | k=20 |
princeton.txt | 0.02 | 0.10 | 0.17 | 0.27 |
aesopshort.txt | 0.05 | 0.15 | 0.22 | 0.34 |
moby1.txt | 0.04 | 0.16 | 0.26 | 0.42 |
amendments.txt | 0.05 | 0.18 | 0.29 | 0.50 |
y2kintro.txt | 0.08 | 0.24 | 0.42 | 0.70 |
baby.txt | 0.08 | 0.25 | 0.39 | 0.70 |
manifesto.txt | 0.17 | 0.67 | 1.20 | 2.14 |
muchado.txt | 0.28 | 1.18 | 2.20 | 3.70 |
aesop.txt | 0.38 | 1.70 | 3.40 | 5.89 |
starr.txt | 0.41 | 1.85 | 3.60 | 6.60 |
lilwomen.txt | 1.86 | 9.45 | 18.75 | 31.03 |
mobydick.txt | 2.38 | 11.69 | 22.04 | 37.34 |