Saint Louis University | Computer Science 150 | Dept. of Math & Computer Science
Chapter 11 describes a project for computing anagrams of words, that is, other words that can be formed through a rearrangement of letters (e.g., 'trace' and 'react'). In this assignment, we make several improvements to that code.
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
Note that in the original project, the results of the anagrams function are not alphabetical. For example, if computing anagrams for the word 'integral', the results are reported as
integral triangle tanglier relating altering alerting
An easy way to ensure that the results are discovered in alphabetical order is to have the main part of the program make its initial call to the anagrams function with the characters in alphabetical order. That is, rather than calling anagrams('integral'), we wish to effectively call anagrams('aegilnrt'). This forces results to be discovered in alphabetical order by the nature of our recursion. That is, it will try to find all results starting with 'a', then all results starting with 'e', and so on, with the same process being followed recursively. (Note: you do not need to re-sort those characters within the recursive function. They will automatically remain sorted given the coded logic.)
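As a minimal sketch of that preprocessing step (the helper name sorted_letters is hypothetical and not part of the Chapter 11 code), the letters can be put in order before the initial call:

```python
def sorted_letters(word):
    """Return the letters of word as a single string in alphabetical order."""
    # sorted() yields a list of characters; join them back into a string.
    return ''.join(sorted(word))

print(sorted_letters('integral'))   # aegilnrt
```

The main program would then pass this string, rather than the word as typed, to its first call of the anagrams function.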
In some cases, you will find that the anagrams function places the same word on the list of results multiple times. For example, a call to anagrams('retrace') will generate results:
retrace retrace terrace terrace terrace terrace retrace retrace caterer caterer caterer caterer
Although there are only three unique anagrams, you will notice that each of those three is reported four times (yet not necessarily consecutively). We could just let it generate duplicates and then filter them from the output during post-processing. However, it is better to change the implementation of the anagrams function to avoid the duplication in the first place, because that will save significant computation time.
A careful analysis of the recursion shows that the problem stems from the fact that the initial sequence of letters contains duplicates (e.g., the two 'r' characters). As an example, let's look at a call for 'retrace', assuming that we use the first improvement above and make the initial call with the sorted string, anagrams('aceerrt', ''). That top-level call spawns several independent recursive calls:
anagrams('ceerrt', 'a') # i=0 pass
anagrams('aeerrt', 'c') # i=1 pass
anagrams('acerrt', 'e') # i=2 pass
anagrams('acerrt', 'e') # i=3 pass
anagrams('aceert', 'r') # i=4 pass
anagrams('aceert', 'r') # i=5 pass
anagrams('aceerr', 't') # i=6 pass
Note well that the i=2 and i=3 passes produce the same results, as in either case one of the 'e' characters is presumed to start the word and the other 'e' remains among the characters to use. The same is true of the i=4 and i=5 passes for 'r'.
The remedy is that when the characters to use contain duplicates, we only want to start the recursion once for each unique character that can be chosen next.
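One way to sketch this remedy is shown below. It is an illustration under stated assumptions, not the Chapter 11 code: the lexicon here is a plain sorted Python list, the helpers has_prefix and contains are hypothetical stand-ins for the book's prefix search, and the signature anagrams(lexicon, charsToUse, prefix) is assumed. Because charsToUse stays sorted, equal letters are adjacent, so a pass can be skipped whenever its letter matches the previous one.

```python
from bisect import bisect_left

def has_prefix(lexicon, prefix):
    """True if some word in the sorted list lexicon starts with prefix."""
    i = bisect_left(lexicon, prefix)
    return i < len(lexicon) and lexicon[i].startswith(prefix)

def contains(lexicon, word):
    """True if word appears in the sorted list lexicon."""
    i = bisect_left(lexicon, word)
    return i < len(lexicon) and lexicon[i] == word

def anagrams(lexicon, charsToUse, prefix=''):
    """Return each anagram once; charsToUse is assumed to be sorted."""
    if not charsToUse:
        return [prefix] if contains(lexicon, prefix) else []
    results = []
    for i, ch in enumerate(charsToUse):
        if i > 0 and ch == charsToUse[i - 1]:
            continue                       # same letter as the previous pass
        candidate = prefix + ch
        if has_prefix(lexicon, candidate): # prune impossible prefixes
            remaining = charsToUse[:i] + charsToUse[i + 1:]
            results.extend(anagrams(lexicon, remaining, candidate))
    return results

# A tiny stand-in lexicon, for illustration only.
lexicon = sorted(['caterer', 'react', 'retrace', 'terrace', 'trace'])
print(anagrams(lexicon, 'aceerrt'))   # ['caterer', 'retrace', 'terrace']
```

With this small lexicon, each of the three anagrams of 'retrace' is reported once, and in alphabetical order.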
Our original version assumed that all given letters must be used
to form a single word. But it is interesting to try to find
multiword anagrams. For example, the string
Determining multiword anagrams efficiently will require more thoughtful code. In particular, consider the same style of recursion, with charsToUse and prefix, where prefix now represents a partial solution that may include spaces. For example, when evaluating anagrams('editions'), or more technically the sorted version anagrams('deiinost'), we might see an intermediate call to anagrams('eno', 'it is d').
To implement this, you may use a relatively similar strategy to the original, in that any of the remaining characters to use can be added to the end of the partial solution. But there are two modifications. First, when doing a prefix search to prune impossible combinations, you should only consider the final partial word in the solution. Second, because we are willing to consider multiword solutions, you may also consider adding a space to the end of the partial solution, but only if the final word of that solution is a legitimate word in the language.
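The two modifications can be sketched as follows. This is only one possible arrangement, under the same assumptions as a list-based lexicon: the function name multiword_anagrams and the helpers has_prefix and contains are hypothetical, and your own code may be organized differently. The space is tried before any letter so that results still come out in alphabetical order.

```python
from bisect import bisect_left

def has_prefix(lexicon, prefix):
    """True if some word in the sorted list lexicon starts with prefix."""
    i = bisect_left(lexicon, prefix)
    return i < len(lexicon) and lexicon[i].startswith(prefix)

def contains(lexicon, word):
    """True if word appears in the sorted list lexicon."""
    i = bisect_left(lexicon, word)
    return i < len(lexicon) and lexicon[i] == word

def multiword_anagrams(lexicon, charsToUse, prefix=''):
    """Return multiword anagrams; charsToUse is assumed to be sorted."""
    last_word = prefix.split(' ')[-1]      # final, possibly partial, word
    if not charsToUse:
        return [prefix] if contains(lexicon, last_word) else []
    results = []
    # Modification 2: a space may follow only a complete, legitimate word.
    if last_word and contains(lexicon, last_word):
        results.extend(multiword_anagrams(lexicon, charsToUse, prefix + ' '))
    for i, ch in enumerate(charsToUse):
        if i > 0 and ch == charsToUse[i - 1]:
            continue                       # skip duplicate letters
        # Modification 1: prune using only the final partial word.
        if has_prefix(lexicon, last_word + ch):
            remaining = charsToUse[:i] + charsToUse[i + 1:]
            results.extend(multiword_anagrams(lexicon, remaining, prefix + ch))
    return results

# A tiny stand-in lexicon, for illustration only.
lexicon = sorted(['done', 'editions', 'is', 'it', 'node', 'sedition'])
for solution in multiword_anagrams(lexicon, 'deiinost'):
    print(solution)
```

With this lexicon, the call for 'editions' passes through the intermediate state anagrams('eno', 'it is d') described above on its way to the solution 'it is done'.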
Word | Version #1 #soln | Version #1 #recur | Version #2 #soln | Version #2 #recur | Version #3 #soln | Version #3 #recur | Extra #soln | Extra #recur |
---|---|---|---|---|---|---|---|---|
trace | 7 | 114 | 7 | 114 | 9 | 339 | 9 | 304 |
retrace | 12 | 737 | 3 | 228 | 57 | 2,088 | 57 | 1,388 |
editing | 6 | 387 | 3 | 214 | 71 | 2,818 | 71 | 1,514 |
editions | 4 | 1,084 | 2 | 640 | 952 | 27,737 | 952 | 10,624 |
integral | 6 | 1,211 | 6 | 1,211 | 712 | 35,951 | 712 | 17,208 |
diameters | 6 | 2,907 | 3 | 1,676 | 6,977 | 186,602 | 6,977 | 62,069 |
coordinate | 6 | 4,051 | 3 | 2,637 | 29,365 | 1,103,198 | 29,365 | 232,337 |
description | 6 | 7,114 | 3 | 4,175 | 106,569 | 5,054,186 | 106,569 | 819,827 |
impersonated | 4 | 15,390 | 2 | 9,167 | 2,998,908 | 89,063,550 | 2,998,908 | 9,728,898 |
disintegration | 48 | 55,326 | 2 | 6,916 | 23,670,072 | ? | 23,670,072 | 31,783,356 |
As another point of reference, I have created a verbose version of my completed code (without extra credit), and have produced several complete traces of the execution:
Note that this debug code is a slightly different version than the one benchmarked in the above table, so the number of recursive calls may vary. But you might consider comparing the trace of the algorithm for my code to what your code does on that same example.
Please submit a revised version of Anagram.py.
You should also submit a separate 'readme' text file, as outlined in the general webpage on programming assignments.
Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.
The assignment is worth 10 points.
There is another improvement to consider. With multiword anagrams,
once you find one way to rearrange letters into legitimate words
(e.g.,
We can force the original anagram recursion to only find one example
of such a group of permuted anagrams by requiring that when multiple
words are used in a solution, each word is alphabetically at least as
great as its preceding words (e.g., as with
Then, once the original recursion completes, we can use the canonical
answers to re-generate the complete set of anagrams, for example
rearranging
The only downside to this approach is that when combining all such solutions, we will no longer have the full list in alphabetical order, so we will need to do an explicit sort at the end.
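The regeneration step might be sketched as follows, assuming the restricted recursion has already produced a list of canonical solutions whose words are in nondecreasing order. The function name expand and its parameter are hypothetical. A set is used so that a solution containing a repeated word does not yield the same rearrangement twice, and the explicit sort at the end restores alphabetical order.

```python
from itertools import permutations

def expand(canonical_solutions):
    """Return every ordering of the words of each solution, sorted."""
    results = set()
    for solution in canonical_solutions:
        words = solution.split()
        # Each rearrangement of the words is itself a valid anagram.
        for ordering in permutations(words):
            results.add(' '.join(ordering))
    return sorted(results)                 # explicit sort at the end

for solution in expand(['done is it', 'editions']):
    print(solution)
```

For these two canonical solutions the sketch produces seven results: the six orderings of 'done', 'is', and 'it', plus the single word 'editions'.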