Computer Science 150
Introduction to Object-Oriented Programming

Programming Assignment 08

More Anagrams

Due: 11:59pm, Wednesday 4 May 2011

Overview

Chapter 11 describes a project for computing anagrams of words, that is, other words that can be formed through a rearrangement of letters (e.g., 'trace' and 'react'). In this assignment, we make several improvements to that code.

Collaboration Policy

For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

Please make sure you adhere to the policies on academic integrity in this regard.

Detailed Requirements

Our starting point is a version of the anagram project from Chapter 11. The goal of this assignment is to implement three improvements to that program.

Have anagrams function discover results in alphabetical order
Note that in the original project, the results of the anagrams function are not alphabetical. For example, if computing anagrams for the word 'integral', the results are reported as
```
integral
triangle
tanglier
relating
altering
alerting
```
An easy way to ensure that the results are discovered in alphabetical order is to have the main part of the program make its initial call to the anagrams function with the characters in alphabetical order. That is, rather than calling anagrams('integral'), we wish to effectively call anagrams('aegilnrt'). By the nature of our recursion, this forces results to be discovered in alphabetical order: it will try to find all results starting with 'a', then all results starting with 'e', and so on, with the same process being followed recursively. (Note: you do not need to re-sort those characters within the recursive function. They will automatically remain sorted given the coded logic.)
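As a minimal sketch of that idea (the helper name sorted_letters is hypothetical and not part of the provided program), the letters can be put in alphabetical order with Python's built-in sorted before the first call is made:

```python
def sorted_letters(word):
    """Return the letters of word as a single string in alphabetical order."""
    # sorted() returns a list of characters; join them back into a string
    return ''.join(sorted(word))

print(sorted_letters('integral'))   # aegilnrt
print(sorted_letters('retrace'))    # aceerrt
```

The main program would then pass the sorted string, rather than the raw input, to its initial call of the anagrams function.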

Have anagrams function avoid computing duplicate answers
In some cases, you will find that the anagrams function places the same word on the list of results multiple times. For example, a call to anagrams('retrace') will generate results:
```
retrace
retrace
terrace
terrace
terrace
terrace
retrace
retrace
caterer
caterer
caterer
caterer
```
Although there are only three unique anagrams, you will notice that each of those three is reported four times (yet not necessarily consecutively). We could just let it generate duplicates and then filter them from the output during post-processing. However, it is better to change the implementation of the anagrams function to avoid the duplication in the first place, because that will save significant computation time.
A careful analysis of the recursion shows that the problem stems from the fact that the initial sequence of letters contains duplicates (e.g., the two 'r' characters). As an example, let's look at a call for 'retrace', assuming that we use the first improvement above and make the initial call with the sorted string, as anagrams('aceerrt', ''). That top-level call spawns several independent recursive calls:
```
anagrams('ceerrt', 'a')       # i=0 pass
anagrams('aeerrt', 'c')       # i=1 pass
anagrams('acerrt', 'e')       # i=2 pass
anagrams('acerrt', 'e')       # i=3 pass
anagrams('aceert', 'r')       # i=4 pass
anagrams('aceert', 'r')       # i=5 pass
anagrams('aceerr', 't')       # i=6 pass
```
Note well that the i=2 and i=3 passes produce the same results, as in either case one of the 'e' characters is being presumed to start the word and the other 'e' remains among the characters to use. The same is true of the i=4 and i=5 passes with 'r'.
The remedy is that when the characters to use contain duplicates, we only want to start the recursion once for each unique character that can be chosen next.
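One possible way to do this, sketched below, relies on the characters being sorted so that duplicates sit next to each other: a position is skipped whenever its character equals the one just before it. The helper distinct_choices is hypothetical and only illustrates the loop; in your program the same test would go inside the loop of the anagrams function.

```python
def distinct_choices(charsToUse):
    """Return (chosen character, remaining characters) pairs, one per
    distinct character. Assumes charsToUse is in alphabetical order."""
    choices = []
    for i in range(len(charsToUse)):
        # a repeated letter would lead to the same recursive call, so skip it
        if i > 0 and charsToUse[i] == charsToUse[i - 1]:
            continue
        remaining = charsToUse[:i] + charsToUse[i + 1:]
        choices.append((charsToUse[i], remaining))
    return choices

for ch, remaining in distinct_choices('aceerrt'):
    print(repr(remaining), repr(ch))
```

For 'aceerrt' this yields five choices rather than seven, dropping the repeated i=3 and i=5 passes from the listing above.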

Allow anagrams function to consider multiword anagrams
Our original version assumed that all given letters must be used to form a single word. But it is interesting to try to find multiword anagrams. For example, the string 'use python' is an anagram for 'pushy note'. In fact, we will be willing to ignore spaces and consider anagrams such as 'editions' and 'it is done'. For this reason, our given program already strips all spaces out of the user's input before computing anagrams.
Determining multiword anagrams efficiently will require more thoughtful code. In particular, consider the same style of recursion, with charsToUse and prefix, but with prefix representing a partial solution that may include spaces. For example, when evaluating anagrams('editions'), or more technically the sorted version anagrams('deiinost'), we might see an intermediate call to anagrams('eno', 'it is d').
To implement this, you may use a relatively similar strategy to the original, in that any of the remaining characters to use can be added to the end of the partial solution. But there are two modifications. First, when doing a prefix search to prune impossible combinations, you should only consider the final partial word in the solution. Second, because we are willing to consider multiword solutions, you may also consider adding a space to the end of the partial solution, but only if the final word of that solution is a legitimate word in the language.
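The two modifications can be sketched as follows. This is not the Chapter 11 code: the function names, the three-parameter signature, and the tiny four-word lexicon are all assumptions made for illustration, and the word and prefix tests use the standard bisect module on a sorted list rather than whatever search the provided program uses.

```python
from bisect import bisect_left

def is_word(lexicon, candidate):
    """True if candidate appears in the sorted list lexicon."""
    i = bisect_left(lexicon, candidate)
    return i < len(lexicon) and lexicon[i] == candidate

def is_prefix(lexicon, fragment):
    """True if some word in the sorted list lexicon starts with fragment."""
    i = bisect_left(lexicon, fragment)
    return i < len(lexicon) and lexicon[i].startswith(fragment)

def multiword_anagrams(lexicon, charsToUse, prefix=''):
    """Return all multiword anagrams; charsToUse must be sorted."""
    lastWord = prefix.split(' ')[-1]      # final (possibly partial) word
    solutions = []
    if not charsToUse:
        if is_word(lexicon, lastWord):    # last word must be complete
            solutions.append(prefix)
        return solutions
    # Modification 2: a space may be added only after a legitimate word
    if is_word(lexicon, lastWord):
        solutions.extend(multiword_anagrams(lexicon, charsToUse, prefix + ' '))
    for i in range(len(charsToUse)):
        if i > 0 and charsToUse[i] == charsToUse[i - 1]:
            continue                      # skip duplicate letters
        ch = charsToUse[i]
        # Modification 1: prune using only the final partial word
        if is_prefix(lexicon, lastWord + ch):
            remaining = charsToUse[:i] + charsToUse[i + 1:]
            solutions.extend(multiword_anagrams(lexicon, remaining, prefix + ch))
    return solutions

# Hypothetical miniature lexicon; the real program reads a full word list.
lexicon = sorted(['done', 'editions', 'is', 'it'])
for result in multiword_anagrams(lexicon, ''.join(sorted('editions'))):
    print(result)
```

In this sketch the space is tried before any letter. Because a space sorts before every lowercase letter, that choice keeps the results in alphabetical order for this example.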

Benchmarks

As a sanity check, the table below reports the number of solutions and the number of internal recursive calls when computing the anagrams of various words. The columns for "Version #1" describe the results of a program with only the first of the three required improvements (the same figures apply to the original program); the columns for "Version #2" refer to a program with the first two improvements implemented; "Version #3" is the final program with all three improvements; the "Extra" columns refer to a program that also implements the extra credit.

	Version #1		Version #2		Version #3		Extra	
	#soln	#recur	#soln	#recur	#soln	#recur	#soln	#recur
trace	7	114	7	114	9	339	9	304
retrace	12	737	3	228	57	2,088	57	1,388
editing	6	387	3	214	71	2,818	71	1,514
editions	4	1,084	2	640	952	27,737	952	10,624
integral	6	1,211	6	1,211	712	35,951	712	17,208
diameters	6	2,907	3	1,676	6,977	186,602	6,977	62,069
coordinate	6	4,051	3	2,637	29,365	1,103,198	29,365	232,337
description	6	7,114	3	4,175	106,569	5,054,186	106,569	819,827
impersonated	4	15,390	2	9,167	2,998,908	89,063,550	2,998,908	9,728,898
disintegration	48	55,326	2	6,916	23,670,072	?	23,670,072	31,783,356

As another point of reference, I have created a verbose version of my completed code (without extra credit), and have produced several complete traces of the execution:

Trace of anagrams for 'tears'
Trace of anagrams for 'retrace'
Trace of anagrams for 'editing'

Note that this debug code is a slightly different version than the one benchmarked in the above table, so the number of recursive calls may vary. But you might consider comparing the trace of the algorithm for my code to what your code does on that same example.

Submitting Your Assignment

Please submit a revised version of Anagram.py.

You should also submit a separate 'readme' text file, as outlined in the general webpage on programming assignments.

Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.

Grading Standards

The assignment is worth 10 points.

Extra Credit

There is another improvement to consider. With multiword anagrams, once you find one way to rearrange letters into legitimate words (e.g., 'it is done'), there will certainly be other such anagrams that are formed by permuting those words (e.g., 'is it done', 'done it is'). Rather than allowing the original anagrams function to compute all of those variants, we can save computation time as follows.

We can force the original anagram recursion to find only one example of such a group of permuted anagrams by requiring that when multiple words are used in a solution, each word is alphabetically at least as great as its preceding words (e.g., as with 'done is it'). By pruning the recursion for any multiword partial solutions that violate this convention, we will save considerable time during the recursive computation.
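One way to express that convention is sketched below. Both function names are hypothetical. The first test is meant for a partial final word: it asks whether the letters chosen so far could still grow into a word that is at least as great as the preceding word. The second checks a finished solution.

```python
def can_still_be_canonical(previous, partial):
    """True if some extension of partial could be alphabetically
    at least as great as the preceding word previous."""
    # compare partial against the equally long start of previous
    return partial >= previous[:len(partial)]

def is_canonical(solution):
    """True if every word is at least as great as the word before it."""
    words = solution.split(' ')
    return all(words[k - 1] <= words[k] for k in range(1, len(words)))

print(can_still_be_canonical('it', 'd'))   # False: nothing starting with 'd' follows 'it'
print(can_still_be_canonical('is', 'i'))   # True: 'i' might still become 'it'
print(is_canonical('done is it'))          # True
print(is_canonical('it is done'))          # False
```

A partial word that passes the first test may still fail once it is complete (for example, 'i' after 'is' cannot stop at 'is' minus a letter), so a full comparison is still needed when a word is finished.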

Then, once the original recursion completes, we can use the canonical answers to re-generate the complete set of anagrams, for example rearranging 'done is it' into the six possible permutations of those three words. Those permutations can be computed with a separate function using a simple recursive approach (almost akin to the inefficient anagram finder from the chapter).

The only downside to this approach is that when combining all such solutions, we will no longer have the full list in alphabetical order, so we will need to do an explicit sort at the end.
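The regeneration step might look like the sketch below, which assumes the canonical solutions arrive as a list of space-separated strings. The names permutations and expand are hypothetical. A set is used so that a solution containing the same word twice does not produce repeated orderings, and the final sorted call performs the explicit sort.

```python
def permutations(words):
    """Return every ordering of the list words, using simple recursion."""
    if len(words) <= 1:
        return [list(words)]
    results = []
    for i in range(len(words)):
        rest = words[:i] + words[i + 1:]
        # place words[i] first, followed by each ordering of the rest
        for tail in permutations(rest):
            results.append([words[i]] + tail)
    return results

def expand(canonicalSolutions):
    """Rebuild the complete, alphabetized list from canonical solutions."""
    full = set()
    for solution in canonicalSolutions:
        for ordering in permutations(solution.split(' ')):
            full.add(' '.join(ordering))
    return sorted(full)

for result in expand(['done is it', 'editions']):
    print(result)
```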

Michael Goldwasser

Last modified: Tuesday, 03 May 2011

Saint Louis University

Spring 2011

Dept. of Math & Computer Science
