Course Home | Assignments | Data Sets/Tools | Python | Schedule | Git Submission | Tutoring

Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2019

Computer Science Department

Lab 07

Topic: Phylogenetic Cladograms
Collaboration Policy: The lab should be completed working in pairs
Submission Deadline:    11:50am Monday, 4 March 2019
11:59pm Tuesday, 5 March 2019

Contents:


Overview

In a series of labs, we explore the visualization of phylogetic trees, starting with a basic layout known as a cladogram. For example the tree we modeled in Python as

('A',
     ('B',(),()),
     ('C',
          ('D',
               ('E',(),()),
               ('F',(),())
          ),
          ('G', (), ())
     )
)
could be portrayed with the following diagram.

Such a phylogetic tree models the presumed relationship between species and their presumed common ancestors. However, in this basic view, there is no significance to the lengths of the edges drawn.

You will implement a recursive algorithm for drawing such trees, and using a simple graphics module in python known as Turtle Graphics. We will provide more documentation in a later section, but in short this module provides a virtual turtle that draws/writes on the canvas as it moves, and with the programmer instructing the turtle on how to move. However, before you begin doing any coding, we need you to better understand the recursive nature of the drawing algorithm, so we will ask you to do some pen-and-paper examples (but to later type up your answers within comments of your source code).


Algorithm Overview

Our drawings will be produced with a recursive algorithm. By convention, we will assure that when drawing a tree, the turtle begins facing rightward at the point that should become the left edge of the tree visualization, and with a vertical position that should be the vertical center of the eventual tree. Furthermore, we will assure that the leaves of the tree are to be laid out vertically at regular intervals (which we will denote as yScale within our later functions). For example, we might decide that leaves will be drawn at 10-pixel intervals on the vetical scale. By this convention, a generic schematic of a tree with eight leaves might appear as follows:

with the solid rectangle meant to portray the bounding-box of the tree, and the eight dashed horizontal lines designating the vertical location of where those eight leaves will eventually be drawn. Notice that the turtle starts precisely at the vertical center of the image.

With a recursive approach, we begin by moving rightward on a branch to a node, and then display its label as a string. If that node is a leaf, then we simply retrace our path back to where we began. More generally, if the tree has subtrees, rather than worrying about the full complexity of the tree, we focus only on drawing the "T" shape (possibly asymmetric) that will serve as the connector to the subtrees.

The key to the success of our algorithm (and avoiding overlap of subtrees) is in determining precisely how far upward/downward/rightward to move during the recursion. In determining how far upward/downward to move, we wish to maintain our convention that the eventual leaves of the tree be evenly spaced on the vertical scale. If we knew how many leaves are in the first and second subtree, we can use that information to determine the correct vertical offsets.

As a first example, consider the generic 8-leaf tree and presume that we knew that the first subtree had 3 of those leaves and the second subtree had 5 of those leaves. In this case, we should envision the recursive process as follows:

Notice that the top of the two subtree bounding boxes will cover three of the eventual leaves and we will bring the turtle precisely to the center of the left edge of that bounding box before starting the recursion. The bottom of the two subtrees will have five leaves, and thus we can determine where the left-center of that box should be.

Of course, if we had a different distibution of leaves, we would need our algorithm to "deliver" the turtle to other locations. For example, here is a schematic for an 8-leaf tree with 6 leaves in the first subtree and 2 in the second.

For convenience, the Python code we are providing you with already has a function with signture leafCount(tree) that returns the number of leaves within a given tree or subtree.

With that knowledge, the general algorithm will be implemented as follows:

  1. Move rightward.
  2. Draw the appropriate text label for a node.
  3. If the node is an internal node:
    1. test
    2. Turn and move the appropriate amount upward to the top point of the "T".
    3. Turn rightward and then start the recursive process again on the first subtree.
    4. Turn and move the appropriate amount downward to the bottom point of the "T".
    5. Turn rightward and then start the recursive process again on the second subtree.
    6. Retrace steps so that you are at the branch of the "T" facing rightward.
  4. Retreat leftward to the beginning of the "T" making sure to leave the turtle oriented rightward.


Warmup Questions

This brings us to your first part of the lab. We must eventually come up with a programatic way to determine the various distances for a new tree that we encounter. But before getting bogged down in Python code, you need to work out some cases by hand, and then hopefully determine a pattern that will allow you to generalize this to arbitrary size trees.

If we assume that the turtle starts at coordinate y=0, we wish to determine what the y value should be for starting the first subtree and what the y value should be for starting the second subtree. For the sake of these examples, let's assume that the leaves are rendered 10 pixels apart from each other. (In our real code, we'll make that yScale a parameter.) Also, unlike mathemticians, computer scientists tend to count pixels from the top of the screen downward, and thus we consider moving upward in the negative direction and downward in the positive direction.

Revisiting our first example of an 8-leaf tree with 5 leaves in its first subtree and 3 in its other, the first subtree should begin at height y = -25 and the second at height y = +15. In the second example, with 6 leaves in the top and 2 in the bottom, the starting heights for the recursions were y = -10 and y = +30 respectively.

You must complete the following table (which we've placed within the comments of the source code we are providing).
total
leaves
upper
leaves
lower
leaves
upper
y-value
lower
y-value
8 3 5 -25 +15
8 6 2 -10 +30
8 4 4
8 7 1
7 3 4
7 1 6
n a b
Note well that the last entry of this table is the most important, as you need to discover a formula that works for general parameters (presuming that a+b=n).


Turtle Graphics Primer

The turtle module is part of Python's standard libraries, to provide an illustrative way to generate some basic graphics. The turtle is effectively a virtual robot with a pen that draws lines as it moves. The only behaviors you will need to use are:

For convenience in testing, we have provided a function reset() within our code that clears the screen and returns the turtle to a starting position near the left edge of the screen. You should call reset() just before starting your draw function (but not from within!).


Time to Code

We are providing you with two files:

We have provided some utility functions to get you started and stubbed out the basic recursive function, draw, that has the following signature.

def draw(tree, xScale, yScale):
where xScale defines that horizontal offset to advance per level, and yScale is the vertical offset from leaf to leaf. The code follows the high-level algorithm enumerated above, distinguishing between a base case where you have a tree with empty subtrees and the general case in which the subtrees are nontrivial. However, we have stripped away the key geometric expressions determined from your warmup exercise, and we have left it to you to implement some utility functions advance and lateral by using the turtle graphics commands outlined above.

We have included a variety of sample trees within the Python code. Here are some renderings for you to match:


Submitting Your Assignment

One member of your partnership should electronically submit your modified file cladogram.py. The comments at the beginning of the file should clearly identify the member(s) of the partnernship and should include answers to the "warmup" questions.


Grading Standards

The assignment is worth 25 points, which will be assessed as follows:


Michael Goldwasser
CSCI 1020, Spring 2019
Last modified: Wednesday, 13 March 2019
Course Home | Assignments | Data Sets/Tools | Python | Schedule | Git Submission | Tutoring