Saint Louis University | Computer Science 150 | Dept. of Math & Computer Science
To practice using loops and conditionals, we want an easily accessible data set. Our solution is to use the system-wide list of words and phrases that supports spell-checking and other language tools. We load those words into a Python list with the following command:
words = [line.strip() for line in open('/usr/share/dict/words') if line.strip()]
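The one-line command above leaves the file open until Python happens to reclaim it. As a minimal sketch of an alternative, assuming the same file path, the version below closes the file explicitly and strips each line only once. The names clean_lines and load_words are hypothetical helpers introduced here for illustration; they are not part of the original lab.

```python
def clean_lines(lines):
    """Strip whitespace from each line and drop lines that end up empty."""
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if word]


def load_words(path='/usr/share/dict/words'):
    """Read one entry per line from path; the with-block closes the file."""
    with open(path) as source:
        return clean_lines(source)


# clean_lines accepts any iterable of lines, so it can be tried without the file:
sample = clean_lines(["apple\n", "\n", "  Bob's \n", "   "])
print(sample)   # ['apple', "Bob's"]
```

On a machine that has the dictionary file, words = load_words() should produce the same list as the original command.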
Develop Python code to determine the following (consider each problem independently). As an aid to using the str class, feel free to refer to the online documentation.
Create a list of all words, converted to lowercase.
Create a list of all words that were originally lowercase.
Create a list of all words that have an apostrophe character.
(on turing, there are 26226 such words)
Find all entries with 5 or more lowercase 's' characters.
(on turing there are 166 such words)
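One possible approach to the two filtering problems above relies on the in operator and on str.count. This is only a sketch on a small made-up list; the function names and the sample entries are assumptions for illustration, and the counts quoted for turing come from the full word list, not from this sample.

```python
def with_apostrophe(words):
    """Return the entries that contain an apostrophe character."""
    return [w for w in words if "'" in w]


def many_s(words, minimum=5):
    """Return the entries with at least `minimum` lowercase 's' characters."""
    return [w for w in words if w.count('s') >= minimum]


# Small hypothetical sample standing in for the real word list.
sample = ["can't", "possessiveness", "Mississippi", "Sassiness", "stresslessness"]
print(with_apostrophe(sample))   # ["can't"]
print(many_s(sample))            # ['possessiveness', 'stresslessness']
```

Note that str.count is case-sensitive, so the capital S in Sassiness is not counted.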
Compute the average number of characters per entry, and the
average number of vowels ('aeiou') per entry.
(on turing, avg word length is 8.467 and average number of
vowels is 2.978)
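The two averages can be sketched with a single pass over the list. The code below assumes that uppercase vowels should count too, so each entry is lowercased before counting; the problem statement does not say which convention produced the turing figures, so treat that choice as an assumption. The function name averages and the sample list are hypothetical.

```python
def averages(words):
    """Return (average length, average vowel count) for a non-empty list."""
    total_chars = 0
    total_vowels = 0
    for w in words:
        total_chars += len(w)
        # Assumption: count vowels case-insensitively by lowercasing first.
        for ch in w.lower():
            if ch in 'aeiou':
                total_vowels += 1
    return total_chars / len(words), total_vowels / len(words)


# Small hypothetical sample: lengths 5, 3, 5 and vowel counts 2, 0, 4.
avg_len, avg_vowels = averages(['Apple', 'sky', 'queue'])
print(round(avg_len, 3), round(avg_vowels, 3))   # 4.333 2.0
```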
A classic spelling rule is "i before e except after c";
create a list of the words that are exceptions to this rule.
(on turing there are 784 such words)
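What counts as an exception is open to interpretation. The sketch below assumes one reading: a word breaks the rule if it contains 'cie', or if it contains 'ei' that is not immediately preceded by 'c'. This is an assumption, and it may not be the definition behind the count reported on turing. The function name and the sample words are hypothetical.

```python
def breaks_i_before_e(word):
    """Return True if word violates the rule under the reading described above."""
    w = word.lower()
    if 'cie' in w:                  # 'ie' directly after 'c'
        return True
    start = w.find('ei')
    while start != -1:
        # 'ei' at the start of the word, or after any letter other than 'c'
        if start == 0 or w[start - 1] != 'c':
            return True
        start = w.find('ei', start + 1)
    return False


sample = ['science', 'weird', 'receive', 'believe', 'Einstein']
exceptions = [w for w in sample if breaks_i_before_e(w)]
print(exceptions)   # ['science', 'weird', 'Einstein']
```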
Find words that, after being lowercased, have consecutive repeated characters
(e.g., filled, which has ll). Be careful that
each original word appears at most once in the result (e.g.,
coffee, which has both ff and ee).
(on turing there are 18447 such words)
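One way to avoid listing a word twice is to ask a yes/no question per word rather than collecting every repeated pair. The sketch below, with the hypothetical helper has_double, compares each character with its successor and stops at the first match.

```python
def has_double(word):
    """Return True if the lowercased word has two equal adjacent characters."""
    w = word.lower()
    for i in range(len(w) - 1):
        if w[i] == w[i + 1]:
            return True             # stop at the first repeated pair
    return False


# Small hypothetical sample; 'coffee' has two pairs but is listed once.
sample = ['filled', 'coffee', 'Aardvark', 'cat']
doubles = [w for w in sample if has_double(w)]
print(doubles)   # ['filled', 'coffee', 'Aardvark']
```

Because each word is tested once and either kept or skipped, the result keeps the original spelling and never contains duplicates beyond those already in the input.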
Find words that have an 'a' followed (not necessarily
consecutively) by a 'b' and then by a 'c',
as in tablecloth.
(on turing there are 249 such words)
Generalize the previous problem, allowing the user to select a
given pattern (e.g., pattern='abcde'), and determine
those words in which the pattern occurs as a (not necessarily
consecutive) subsequence.
(on turing there is a single word that has
'abcde' as a subsequence; can you find it?)
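A possible sketch of the general subsequence test uses str.find with a moving start position: each pattern character must be found at or after the position just past the previous match. The helper name is_subsequence is hypothetical, the sample list is made up, and the comparison is case-sensitive as written (an assumption; lowercase the word first if that is not what you want).

```python
def is_subsequence(pattern, word):
    """Return True if the characters of pattern appear in word in order."""
    position = 0
    for ch in pattern:
        position = word.find(ch, position)   # search from where we left off
        if position == -1:
            return False
        position += 1                        # next match must come later
    return True


# Small hypothetical sample, tested against the pattern 'abc'.
sample = ['tablecloth', 'cab', 'aerobics', 'back']
matches = [w for w in sample if is_subsequence('abc', w)]
print(matches)   # ['tablecloth', 'aerobics']
```

The same function handles any pattern, so the earlier 'abc' problem is just the call is_subsequence('abc', word).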
Make up your own challenge...