Saint Louis University | Computer Science 150 | Dept. of Math & Computer Science
The goal for this assignment is to write a fully functional spell check program. You will be provided a dictionary of English words and you need to write a program that will find mistakes and allow you to correct them.
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
When the program is run, it will prompt the user for two filenames: the text document to spell-check and a dictionary file of correctly spelled English words. The program should loop through each word of the original text document. When it finds a word that is not in the dictionary, it will print the word along with its line number in the original document, and prompt the user for how to deal with it. The user will have the option to ignore the word, replace it with a word that they enter, or replace it with one of a list of suggestions. These suggestions will be the two dictionary words that fall immediately before and immediately after the place where the misspelled word would have occurred in the dictionary. At the end of the session, the corrected text should be saved to disk as a replacement for the original document (using the same filename).
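As a starting point, the pairing of words with line numbers could be sketched as follows. This is only a minimal illustration, not a required design; the function name words_with_line_numbers and the sample text are made up for this example.

```python
def words_with_line_numbers(text):
    """Yield (line_number, word) for every whitespace-delimited word in text."""
    # enumerate(..., start=1) gives 1-based line numbers, as in the prompts.
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            yield line_number, word


# Hypothetical two-line sample, not the my.txt file from the example.
sample = "This is a tesk\nof the prograg.\n"
pairs = list(words_with_line_numbers(sample))
```

Note that the words still carry their trailing punctuation at this stage ('prograg.' rather than 'prograg'), and that a real solution must also remember enough about the original spacing and line breaks to write the corrected document back out.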
Assume that the original document, my.txt, has the following content:
This is a tesk of the spellchecking prograg, albeit a small tess. This is only a test.
An example session might go like this:
Enter the name of the file to spellcheck: my.txt
Enter the name of the dictionary file: English.dict
The word: tesk on line 1 is not in the dictionary.
i) Ignore
r) Replace
1) terzetto
2) tesla
Option: r
Enter your replacement: test
The word: spellchecking on line 1 is not in the dictionary.
i) Ignore
r) Replace
1) spellbound
2) spellcraft
Option: i
The word: prograg on line 2 is not in the dictionary.
i) Ignore
r) Replace
1) prograde
2) program
Option: 2
The word: tess on line 1 is not in the dictionary.
i) Ignore
r) Replace
1) tesla
2) tessellate
Option: r
Enter your replacement: test
Done spellchecking.
File Saved
At this point, the file my.txt should read:
This is a test of the spellchecking program, albeit a small test. This is only a test.
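The numbered suggestions in the session above are the dictionary's neighbors of the misspelled word. One possible way to find them quickly is a binary search with Python's standard bisect module. The sketch below assumes the words are held in a list that is sorted under Python's default string ordering (the code sorts explicitly, since we are not assuming anything about the order of the dictionary file). The function name neighbor_suggestions and the tiny word list are hypothetical.

```python
import bisect


def neighbor_suggestions(word, sorted_words):
    """Return the words just before and just after where `word` would sort.

    Intended for words that are NOT in sorted_words; the result has fewer
    than two entries when the word falls at either end of the list.
    """
    index = bisect.bisect_left(sorted_words, word)
    suggestions = []
    if index > 0:
        suggestions.append(sorted_words[index - 1])
    if index < len(sorted_words):
        suggestions.append(sorted_words[index])
    return suggestions


# Hypothetical miniature word list, not the real English.dict.
sample_words = sorted(
    ["program", "prograde", "terzetto", "tesla", "tessellate", "test"]
)
```

With this sample list, neighbor_suggestions('tesk', sample_words) gives ['terzetto', 'tesla'], in line with the first prompt of the session. A binary search inspects only a handful of entries even for a very large dictionary, whereas scanning the list from the top for every misspelled word would be noticeably slower.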
You should use whitespace as the delimiter when determining the breakdown into words.
If a word is followed immediately by punctuation (e.g. 'test.') you must make sure to strip off that trailing punctuation character before checking to see if the word is in the dictionary. At the same time, the punctuation should still be included in the final document.
The dictionary contains both capitalized and uncapitalized words. A word that is capitalized in the dictionary is only legitimate if it is capitalized in the document (e.g., 'Missouri' is okay but 'missouri' is not). A word that is uncapitalized in the dictionary can be used in the document in either capitalized or uncapitalized form (e.g., 'This' and 'this' are both legitimate).
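The punctuation and capitalization rules above might be handled along the following lines. This is a rough sketch under a few assumptions: it strips every trailing punctuation character rather than just one, it treats only the first letter as the capitalization to relax, and it keeps the dictionary in a set for fast membership tests. The names split_trailing_punctuation and is_known are invented for this illustration.

```python
import string


def split_trailing_punctuation(token):
    """Split a token into (core_word, trailing_punctuation)."""
    core = token.rstrip(string.punctuation)
    # Whatever was stripped is kept so it can be re-attached when saving.
    return core, token[len(core):]


def is_known(word, dictionary):
    """Check a word against a set of dictionary words, honoring capitalization."""
    if word in dictionary:
        return True
    # A capitalized document word may match an uncapitalized dictionary
    # entry, but never the other way around.
    return word[:1].isupper() and (word[0].lower() + word[1:]) in dictionary


# Hypothetical miniature dictionary, not the real English.dict.
dictionary = {"Missouri", "this", "test"}
```

For example, split_trailing_punctuation('test.') returns ('test', '.'), so the period can be written back after the word is checked or replaced. A token made up entirely of punctuation yields an empty core, which the caller would need to skip.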
Watch out for the effect of newline characters when reading lines from the dictionary or the document.
Make sure to remove punctuation from the end of words before you check if they are in the dictionary.
Many of the methods of the Python string class which we have not previously emphasized will be quite useful for this assignment. Most notable are: isalpha(), isdigit(), islower(), isspace(), isupper(), rstrip(). Type help(str) in a Python interpreter for more details.
Make sure to close your files when you're done with them.
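Two of the hints above, the stray newlines and the closing of files, can be seen together in a short sketch for loading the dictionary. It assumes the dictionary file holds one word per line, which you should verify for yourself; the function name load_words is made up for this example.

```python
def load_words(path):
    """Read one word per line from path, dropping newlines and blank lines."""
    words = []
    # The with statement closes the file automatically, even on an error.
    with open(path) as handle:
        for line in handle:
            word = line.strip()  # removes the trailing '\n' and any stray spaces
            if word:
                words.append(word)
    return words
```

Without the strip() call, every entry would end in a newline character, and a lookup of 'test' would fail because the list would actually contain 'test\n'.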
We will provide an English dictionary. To access it, run the following command:
ln -sf /home/home0/goldwasser/lib/English.dict .
This will create a link to our dictionary within your directory.
As usual, you should submit your source code as well as a separate 'readme' file. If you worked as a pair, please make this clear and briefly describe the contributions of each person in the effort.
Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.
The assignment is worth 10 points. Your program will be graded on how correctly it identifies and corrects misspelled words. Furthermore, your program will be judged on its efficiency: it should find misspellings and correction options fast enough to be a useful tool.
Often a word is misspelled more than once in a document. Have your program keep track of the replacements it has made, and if the same misspelling comes up again, offer the earlier correction as an additional option. Also, provide Ignore All and Replace All options that will, respectively, ignore all future occurrences of a word or apply the same replacement to all future occurrences of that misspelling.
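One way to organize this bookkeeping is with a set and two dictionaries, as in the sketch below. The class CorrectionMemory and its attribute names are purely hypothetical; any structure that records the same three kinds of decisions would do.

```python
class CorrectionMemory:
    """Hypothetical helper that remembers earlier decisions about words."""

    def __init__(self):
        self.ignore_all = set()   # words the user chose to ignore everywhere
        self.replace_all = {}     # misspelling -> replacement applied automatically
        self.previous = {}        # misspelling -> most recent one-off replacement

    def automatic_result(self, word):
        """Return the text to use without prompting, or None if we must ask."""
        if word in self.ignore_all:
            return word
        return self.replace_all.get(word)

    def extra_option(self, word):
        """Return an earlier replacement to offer as an extra menu choice."""
        return self.previous.get(word)


# Example decisions, using the words from the sample session.
memory = CorrectionMemory()
memory.previous["tesk"] = "test"
memory.replace_all["prograg"] = "program"
memory.ignore_all.add("spellchecking")
```

Here memory.automatic_result('prograg') returns 'program' with no prompt needed, while memory.automatic_result('tesk') returns None, meaning the user is asked again and memory.extra_option('tesk') supplies 'test' as the additional choice.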