
Saint Louis University

Computer Science 150
Introduction to Object-Oriented Programming

Michael Goldwasser

Spring 2011

Dept. of Math & Computer Science

Programming Assignment 08

More Anagrams

Due: 11:59pm, Wednesday 4 May 2011


Overview

Chapter 11 describes a project for computing anagrams of words, that is, other words that can be formed through a rearrangement of letters (e.g., 'trace' and 'react'). In this assignment, we make several improvements to that code.


Collaboration Policy

For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

Please make sure you adhere to the policies on academic integrity in this regard.


Detailed Requirements

Our starting point is a version of the anagram project from Chapter 11. The goal of this assignment is to implement three improvements to that program.
  1. Have anagrams function discover results in alphabetical order

    Note that in the original project, the results of the anagrams function are not alphabetical. For example, if computing anagrams for the word 'integral', the results are reported as

    integral
    triangle
    tanglier
    relating
    altering
    alerting
    

    An easy way to ensure that results are discovered in alphabetical order is to have the main part of the program make its initial call to the anagrams function with the characters in alphabetical order. That is, rather than calling anagrams('integral'), we wish to effectively call anagrams('aegilnrt'). By the nature of our recursion, this forces results to be discovered in alphabetical order: it will try to find all results starting with 'a', then all results starting with 'e', and so on, with the same process being followed recursively. (Note: you do not need to re-sort those characters within the recursive function. They will automatically remain sorted given the coded logic.)
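    As a rough illustration of why sorting the initial characters is enough, here is a minimal sketch. It is not the Chapter 11 code: the tiny WORDS list and the is_prefix helper are hypothetical stand-ins for the real lexicon and its prefix search, and the anagrams function is a simplified version of the recursion described above.

```python
# Hypothetical stand-in for the course lexicon: just the anagrams of 'integral'.
WORDS = ['alerting', 'altering', 'integral', 'relating', 'tanglier', 'triangle']

def is_prefix(prefix):
    """Return True if at least one word in WORDS begins with prefix."""
    return any(word.startswith(prefix) for word in WORDS)

def anagrams(charsToUse, prefix=''):
    """Return every word that is prefix followed by a rearrangement of charsToUse."""
    if not charsToUse:
        return [prefix] if prefix in WORDS else []
    results = []
    for i in range(len(charsToUse)):
        candidate = prefix + charsToUse[i]
        if is_prefix(candidate):                      # prune hopeless branches
            remaining = charsToUse[:i] + charsToUse[i+1:]
            results.extend(anagrams(remaining, candidate))
    return results

word = 'integral'
letters = ''.join(sorted(word))       # 'aegilnrt'
print(anagrams(word))                 # discovery order follows the typed letters
print(anagrams(letters))              # discovery order is alphabetical
```

    Because each recursive call removes one character from an already sorted string, the remaining characters stay sorted, so no re-sorting is needed inside the function.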


  2. Have anagrams function avoid computing duplicate answers

    In some cases, you will find that the anagrams function places the same word on the list of results multiple times. For example, a call to anagrams('retrace') will generate results:

    retrace
    retrace
    terrace
    terrace
    terrace
    terrace
    retrace
    retrace
    caterer
    caterer
    caterer
    caterer
    
    Although there are only three unique anagrams, you will notice that each of those three is reported four times (yet not necessarily consecutively). We could just let it generate duplicates and then filter them from the output during post-processing. However, it is better to change the implementation of the anagrams function to avoid the duplication in the first place, because that will save significant computation time.

    A careful analysis of the recursion shows that the problem stems from the fact that the initial sequence of letters contains duplicates (e.g., the two 'r' characters). As an example, let's look at a call for 'retrace', assuming that we use the first improvement above and make the initial call with the sorted string, anagrams('aceerrt', ''). That top-level call spawns several independent recursive calls:

    anagrams('ceerrt', 'a')       # i=0 pass
    anagrams('aeerrt', 'c')       # i=1 pass
    anagrams('acerrt', 'e')       # i=2 pass
    anagrams('acerrt', 'e')       # i=3 pass
    anagrams('aceert', 'r')       # i=4 pass
    anagrams('aceert', 'r')       # i=5 pass
    anagrams('aceerr', 't')       # i=6 pass
    
    Note well that the i=2 and i=3 passes produce the same results, as in either case one of the 'e' characters is presumed to start the word and the other 'e' remains among the characters to use. The same is true of the i=4 and i=5 passes with the two 'r' characters.

    The remedy is that when the characters to use contain duplicates, we start the recursion only once for each unique character that can be chosen next.
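    One possible way to express this remedy is sketched below. It relies on the characters being sorted (the first improvement), so that equal characters sit next to each other. The WORDS list, the is_prefix helper, and the skipDuplicates flag are all hypothetical names introduced only for this illustration; the flag exists just so the two behaviors can be compared side by side.

```python
# Hypothetical stand-in for the course lexicon: just the anagrams of 'retrace'.
WORDS = ['caterer', 'retrace', 'terrace']

def is_prefix(prefix):
    """Return True if at least one word in WORDS begins with prefix."""
    return any(word.startswith(prefix) for word in WORDS)

def anagrams(charsToUse, prefix='', skipDuplicates=True):
    """Return words formed by prefix plus a rearrangement of sorted charsToUse."""
    if not charsToUse:
        return [prefix] if prefix in WORDS else []
    results = []
    for i in range(len(charsToUse)):
        # Since charsToUse is sorted, a repeated character always directly
        # follows its twin; starting the recursion with it again would only
        # rediscover the same answers.
        if skipDuplicates and i > 0 and charsToUse[i] == charsToUse[i-1]:
            continue
        candidate = prefix + charsToUse[i]
        if is_prefix(candidate):
            remaining = charsToUse[:i] + charsToUse[i+1:]
            results.extend(anagrams(remaining, candidate, skipDuplicates))
    return results

letters = ''.join(sorted('retrace'))                    # 'aceerrt'
print(len(anagrams(letters, skipDuplicates=False)))     # every answer repeated
print(anagrams(letters))                                # each answer once
```

    Comparing neighbors is only one option; any approach that picks each distinct character once per call has the same effect.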


  3. Allow anagrams function to consider multiword anagrams.

    Our original version assumed that all given letters must be used to form a single word. But it is interesting to try to find multiword anagrams. For example, the string 'use python' is an anagram for 'pushy note'. In fact, we will be willing to ignore spaces and consider anagrams such as 'editions' and 'it is done'. For this reason, our given program already strips all spaces out of the user's input before computing anagrams.

    Determining multiword anagrams efficiently will require more thoughtful code. In particular, consider the same style of recursion, with charsToUse and prefix, but with prefix representing a partial solution that may include spaces. For example, when evaluating anagrams('editions'), or more technically the sorted version anagrams('deiinost'), we might see an intermediate call to anagrams('eno', 'it is d').

    To implement this, you may use a relatively similar strategy to the original, in that any of the remaining characters to use can be added to the end of the partial solution. But there are two modifications. First, when doing a prefix search to prune impossible combinations, you should only consider the final partial word in the solution. Second, because we are willing to consider multiword solutions, you may also consider adding a space to the end of the partial solution, but only if the final word of that solution is a legitimate word in the language.
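    The two modifications might be combined roughly as follows. This is a sketch under stated assumptions, not the reference solution: WORDS and is_prefix are hypothetical stand-ins for the real lexicon, and trying the space before any letter is a choice made here so that results still come out in alphabetical order (a space compares as smaller than any letter in Python).

```python
# Hypothetical stand-in for the course lexicon.
WORDS = {'done', 'editions', 'is', 'it', 'sedition'}

def is_prefix(prefix):
    """Return True if at least one word in WORDS begins with prefix."""
    return any(word.startswith(prefix) for word in WORDS)

def anagrams(charsToUse, prefix=''):
    """Return multiword anagrams: prefix extended using all of sorted charsToUse."""
    lastWord = prefix.split(' ')[-1]        # final (possibly partial) word
    if not charsToUse:
        return [prefix] if lastWord in WORDS else []
    results = []
    # Modification 2: a space may follow only a complete, legitimate word.
    if lastWord in WORDS:
        results.extend(anagrams(charsToUse, prefix + ' '))
    for i in range(len(charsToUse)):
        if i > 0 and charsToUse[i] == charsToUse[i-1]:
            continue                         # one start per unique character
        # Modification 1: prune using only the final partial word.
        if is_prefix(lastWord + charsToUse[i]):
            remaining = charsToUse[:i] + charsToUse[i+1:]
            results.extend(anagrams(remaining, prefix + charsToUse[i]))
    return results

for phrase in anagrams(''.join(sorted('editions'))):
    print(phrase)
```

    Note that right after a space the final word is empty, so it is not a legitimate word and a second consecutive space is never added.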


Benchmarks

As a sanity check, the table below reports the number of solutions and the number of internal recursive calls when computing the anagrams of various words. The "Version #1" columns describe a program with only the first of the three required improvements (the same counts apply to the original program); the "Version #2" columns refer to a program with the first two improvements implemented; "Version #3" is the final program (unless the extra credit is implemented).

                   Version #1       Version #2               Version #3                    Extra
Word            #soln  #recur    #soln  #recur        #soln      #recur        #soln      #recur
trace               7     114        7     114            9         339            9         304
retrace            12     737        3     228           57       2,088           57       1,388
editing             6     387        3     214           71       2,818           71       1,514
editions            4   1,084        2     640          952      27,737          952      10,624
integral            6   1,211        6   1,211          712      35,951          712      17,208
diameters           6   2,907        3   1,676        6,977     186,602        6,977      62,069
coordinate          6   4,051        3   2,637       29,365   1,103,198       29,365     232,337
description         6   7,114        3   4,175      106,569   5,054,186      106,569     819,827
impersonated        4  15,390        2   9,167    2,998,908  89,063,550    2,998,908   9,728,898
disintegration     48  55,326        2   6,916   23,670,072           ?   23,670,072  31,783,356

As another point of reference, I have created a verbose version of my completed code (without extra credit), and have produced several complete traces of the execution.

Note that this debug code is a slightly different version than the one benchmarked in the above table, so the number of recursive calls may vary. But you might consider comparing the trace of the algorithm for my code to what your code does on that same example.

Submitting Your Assignment

Please submit a revised version of Anagram.py.

You should also submit a separate 'readme' text file, as outlined in the general webpage on programming assignments.

Please see details regarding the submission process from the general programming web page, as well as a discussion of the late policy.


Grading Standards

The assignment is worth 10 points.


Extra Credit

There is another improvement to consider. With multiword anagrams, once you find one way to rearrange letters into legitimate words (e.g., 'it is done'), there will certainly be other such anagrams that are formed by permuting those words (e.g., 'is it done', 'done it is'). Rather than allowing the original anagrams function to compute all of those variants, we can save computation time as follows.

We can force the original anagram recursion to find only one example of such a group of permuted anagrams by requiring that when multiple words are used in a solution, each word is alphabetically at least as great as its preceding word (e.g., as with 'done is it'). By pruning the recursion for any multiword partial solutions that violate this convention, we will save a great deal of time during the recursive computation.
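One way to test this convention inside the recursion is sketched below. The function name keeps_canonical_order and its complete flag are hypothetical, and the sketch assumes that all earlier words in the prefix were already checked when they were added. While the final word is still partial, it is compared only against the matching leading characters of the preceding word, so a branch is pruned only when no completion could satisfy the rule. Once the word is complete, the full comparison applies.

```python
def keeps_canonical_order(prefix, complete=False):
    """Return False if the final word of prefix cannot follow its predecessor.

    prefix   -- partial solution, words separated by single spaces
    complete -- True if the final word is finished, False if still growing
    """
    words = prefix.split(' ')
    if len(words) < 2:
        return True                      # a single word has no predecessor
    previous, current = words[-2], words[-1]
    if complete:
        return current >= previous       # exact test for a finished word
    # Partial word: compare against the same number of leading characters,
    # since later characters might still make the word large enough.
    return current >= previous[:len(current)]

print(keeps_canonical_order('done is i'))                   # may still become 'it'
print(keeps_canonical_order('is d'))                        # 'd...' can never follow 'is'
print(keeps_canonical_order('done is it', complete=True))   # canonical solution
```

A recursion using such a test would call it with complete=False before extending the final word, and with complete=True before adding a space or accepting a finished solution.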

Then, once the original recursion completes, we can use the canonical answers to re-generate the complete set of anagrams, for example rearranging 'done is it' into the six possible permutations of those three words. Those permutations can be computed with a separate function using a simple recursive approach (almost akin to the inefficient anagram finder from the chapter).

The only downside to this approach is that when combining all such solutions, we will no longer have the full list in alphabetical order, so we will need to do an explicit sort at the end.
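The regeneration step and the final sort could look roughly like this. The function names permutations and expand are hypothetical, and the input list stands for whatever canonical answers the pruned recursion produced. The check for a repeated word is an added precaution so that a solution using the same word twice does not yield duplicate phrases.

```python
def permutations(words):
    """Return a list of all distinct orderings of the given list of words."""
    if len(words) <= 1:
        return [list(words)]
    results = []
    for i in range(len(words)):
        if words[i] in words[:i]:
            continue                     # same word already tried in front
        rest = words[:i] + words[i+1:]
        for ordering in permutations(rest):
            results.append([words[i]] + ordering)
    return results

def expand(canonical):
    """Rebuild the full, alphabetized list of anagrams from canonical answers."""
    expanded = []
    for phrase in canonical:
        for ordering in permutations(phrase.split(' ')):
            expanded.append(' '.join(ordering))
    expanded.sort()                      # explicit sort restores alphabetical order
    return expanded

for phrase in expand(['done is it', 'editions']):
    print(phrase)
```

For the three-word answer 'done is it' this produces the six permutations, interleaved in sorted order with the single-word answers.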


Michael Goldwasser
Last modified: Tuesday, 03 May 2011