Saint Louis University
Computer Science 1020
Computer Science Department

Topic: Phylogenetic Cladograms
Collaboration Policy: This lab should be completed in pairs.
Submission Deadline: 11:59pm Tuesday, 5 March 2019
In a series of labs, we explore the visualization of phylogenetic trees, starting with a basic layout known as a cladogram. For example, the tree we modeled in Python as

    ('A', ('B',(),()), ('C', ('D', ('E',(),()), ('F',(),()) ), ('G', (), ()) ) )

could be portrayed with the following diagram.

Such a phylogenetic tree models the presumed relationships between species and their presumed common ancestors. However, in this basic view, the lengths of the drawn edges carry no significance.
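To make the tuple format concrete, here is a small sketch that assumes, as in the example above, that every node is a triple (label, firstSubtree, secondSubtree), that an empty tuple () marks a missing subtree, and that every internal node has two nonempty subtrees. The helpers isLeaf and leafLabels are hypothetical names for illustration only, not part of the provided code. Because the first subtree is drawn above the second, leafLabels lists the leaves in the top-to-bottom order in which they will be stacked in the drawing.

```python
# The overview tree (named fourB in samples.py): each node is
# (label, firstSubtree, secondSubtree), and () means "no subtree".
fourB = ('A', ('B', (), ()),
              ('C', ('D', ('E', (), ()), ('F', (), ())),
                    ('G', (), ())))


def isLeaf(tree):
    """Hypothetical helper: True when both subtrees of the node are empty."""
    label, first, second = tree
    return first == () and second == ()


def leafLabels(tree):
    """Hypothetical helper: leaf labels, first (upper) subtree before second."""
    label, first, second = tree
    if isLeaf(tree):
        return [label]                             # base case: a single leaf
    return leafLabels(first) + leafLabels(second)  # general case: recurse


print(fourB[0])           # root label: A
print(leafLabels(fourB))  # ['B', 'E', 'F', 'G']
```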
You will implement a recursive algorithm for drawing such trees using a simple Python graphics module known as Turtle Graphics. We will provide more documentation in a later section, but in short, this module provides a virtual turtle that draws and writes on the canvas as it moves, with the programmer instructing the turtle how to move. However, before you begin coding, you need a solid understanding of the recursive nature of the drawing algorithm, so we will ask you to work through some pen-and-paper examples (and later type your answers into comments in your source code).
Our drawings will be produced with a recursive algorithm. By
convention, we will ensure that when drawing a tree, the turtle begins
facing rightward at the point that should become the left edge of the
tree visualization, and at a vertical position that should be the
vertical center of the eventual tree. Furthermore, we will ensure
that the leaves of the tree are laid out vertically at
regular intervals (which we will denote as yScale within our
later functions). For example, we might decide that leaves will be
drawn at 10-pixel intervals on the vertical scale. By this convention,
a generic schematic of a tree with eight leaves might appear as
follows:
with the solid rectangle meant to portray the bounding-box of the
tree, and the eight dashed horizontal lines designating the vertical
location of where those eight leaves will eventually be drawn. Notice
that the turtle starts precisely at the vertical center of the image.
With a recursive approach, we begin by moving rightward on a branch to a node, and then display its label as a string. If that node is a leaf, then we simply retrace our path back to where we began. More generally, if the tree has subtrees, rather than worrying about the full complexity of the tree, we focus only on drawing the "T" shape (possibly asymmetric) that will serve as the connector to the subtrees.
The key to the success of our algorithm (and to avoiding overlap of subtrees) is determining precisely how far upward, downward, and rightward to move during the recursion. In determining how far upward or downward to move, we wish to maintain our convention that the eventual leaves of the tree be evenly spaced on the vertical scale. If we know how many leaves are in the first and second subtrees, we can use that information to determine the correct vertical offsets.
As a first example, consider the generic 8-leaf tree and suppose
we know that the first subtree has 3 of those leaves and the second
subtree has 5 of those leaves. In this case, we should envision the
recursive process as follows:
Notice that the top of the two subtree bounding boxes will cover three
of the eventual leaves and we will bring the turtle precisely to the
center of the left edge of that bounding box before starting the
recursion. The bottom of the two subtrees will have five leaves, and
thus we can determine where the left-center of that box should be.
Of course, if we had a different distribution of leaves, we would need
our algorithm to "deliver" the turtle to other locations. For example,
here is a schematic for an 8-leaf tree with 6 leaves in the first
subtree and 2 in the second.
For convenience, the Python code we are providing already includes a function with signature leafCount(tree) that returns the number of leaves within a given tree or subtree.
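Since leafCount(tree) is already provided, you do not need to write it. Purely as an illustration of the recursion involved, one way such a function could be written for the tuple format is sketched below; this is our own sketch and may differ from the version in cladogram.py. On the overview tree (fourB), it reports a 1/3 split at the root: one leaf (B) in the first subtree and three (E, F, G) in the second.

```python
def leafCount(tree):
    """Number of leaves in a (label, first, second) tuple tree.

    Illustrative sketch only; cladogram.py provides its own leafCount.
    """
    if tree == ():
        return 0                    # an empty subtree has no leaves
    label, first, second = tree
    if first == () and second == ():
        return 1                    # base case: this node is itself a leaf
    return leafCount(first) + leafCount(second)


fourB = ('A', ('B', (), ()),
              ('C', ('D', ('E', (), ()), ('F', (), ())),
                    ('G', (), ())))

print(leafCount(fourB))                          # 4
print(leafCount(fourB[1]), leafCount(fourB[2]))  # 1 3
```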
With that knowledge, the general algorithm will be implemented as follows:
This brings us to the first part of the lab. We must eventually come up with a programmatic way to determine the various distances for any new tree we encounter. But before getting bogged down in Python code, you need to work out some cases by hand, and then hopefully identify a pattern that will allow you to generalize to trees of arbitrary size.
If we assume that the turtle starts at coordinate y=0, we wish to determine what the y-value should be for starting the first subtree and what the y-value should be for starting the second subtree. For the sake of these examples, let's assume that the leaves are rendered 10 pixels apart from each other. (In our real code, we'll make that yScale a parameter.) Also, unlike mathematicians, computer scientists tend to count pixels from the top of the screen downward, and thus we treat moving upward as the negative direction and moving downward as the positive direction.
Revisiting our first example of an 8-leaf tree with 3 leaves in
its first subtree and 5 in its other, the first subtree should begin
at height -25 and the second at height +15, as recorded in the first row of the table below.
You must complete the following table (which we've placed within the comments of the source code we are providing).
total leaves | upper leaves | lower leaves | upper y-value (px) | lower y-value (px) |
---|---|---|---|---|
8 | 3 | 5 | -25 | +15 |
8 | 6 | 2 | -10 | +30 |
8 | 4 | 4 | | |
8 | 7 | 1 | | |
7 | 3 | 4 | | |
7 | 1 | 6 | | |
n | a | b | | |
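If you would like to double-check individual entries by computer, one brute-force approach (a sketch, with hypothetical helper names) is to place the leaves explicitly under the conventions above: n leaves spaced yScale apart, centered on the turtle's starting height y = 0, with upward negative. Since each subtree's bounding box is centered on the evenly spaced leaves it covers, its starting height is the average of those leaves' y-coordinates. This reproduces the two completed rows; it is only a check on hand calculations, not the general pattern that the last row asks for.

```python
def leafPositions(n, yScale):
    """Y-coordinates of n evenly spaced leaves centered on y = 0.

    Upward is negative, so the first (top) leaf has the most negative y.
    """
    top = -(n - 1) * yScale / 2
    return [top + i * yScale for i in range(n)]


def subtreeCenters(upper, lower, yScale):
    """Starting heights of the upper and lower subtrees, found by averaging
    the y-coordinates of the leaves that each subtree covers."""
    ys = leafPositions(upper + lower, yScale)
    upperYs, lowerYs = ys[:upper], ys[upper:]
    return (sum(upperYs) / len(upperYs), sum(lowerYs) / len(lowerYs))


print(leafPositions(8, 10))      # [-35.0, -25.0, -15.0, -5.0, 5.0, 15.0, 25.0, 35.0]
print(subtreeCenters(3, 5, 10))  # (-25.0, 15.0)
```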
The turtle module is part of Python's standard library and provides an illustrative way to generate basic graphics. The turtle is effectively a virtual robot with a pen that draws lines as it moves. The only behaviors you will need to use are:
For convenience in testing, we have provided a function reset() within our code that clears the screen and returns the turtle to a starting position near the left edge of the screen. You should call reset() just before calling your draw function (but not from within draw!).
We are providing you with two files:
cladogram.py
This is the source code and the file that you must modify and
submit.
Note Well: the comment section at the beginning of the
source code provides a place to enter your names and
your answers to the warmup exercises.
samples.py
This file provides a variety of sample trees for use in
testing your software (many of which are described below as examples).
We have provided some utility functions to get you started and stubbed out the basic recursive function, draw, that has the following signature.
    def draw(tree, xScale, yScale):

where xScale defines the horizontal offset to advance per level, and yScale is the vertical offset from leaf to leaf. The code follows the high-level algorithm enumerated above, distinguishing between a base case, in which the tree has empty subtrees, and the general case, in which the subtrees are nontrivial. However, we have stripped away the key geometric expressions determined from your warmup exercise, and we have left it to you to implement the utility functions advance and lateral using the turtle graphics commands outlined above.
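The same base-case/general-case split that draw uses can also estimate how much horizontal room a drawing needs, which may help when choosing xScale. In the sketch below (hypothetical helper names), we assume that each node, the root included, sits at the end of one horizontal branch of length xScale, so the deepest leaf lands levels(tree) * xScale to the right of the starting point; the width of the labels is not counted.

```python
def levels(tree):
    """Nodes on the longest root-to-leaf path (hypothetical helper)."""
    label, first, second = tree
    if first == () and second == ():
        return 1                                   # base case: a leaf
    return 1 + max(levels(first), levels(second))  # general case


def drawingWidth(tree, xScale):
    """Horizontal distance from the start to the deepest leaf, assuming one
    xScale-long branch per level and ignoring the width of the labels."""
    return levels(tree) * xScale


fourB = ('A', ('B', (), ()),
              ('C', ('D', ('E', (), ()), ('F', (), ())),
                    ('G', (), ())))

print(levels(fourB))            # 4
print(drawingWidth(fourB, 50))  # 200
```

Vertically, by the evenly spaced convention, the leaves themselves span (leafCount(tree) - 1) * yScale from the top leaf to the bottom leaf.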
We have included a variety of sample trees within the Python code. Here are some renderings for you to match:
The tree from the overview is named fourB
and our image was parameterized as
The command
An example based loosely on page 118 of our textbook,
if drawn as
Our next example is a phylogenetic tree for some tree frogs. There are numeric labels on the internal nodes, which you may ignore for now. (We will use them in our next lab.)
This example, if drawn as
while the same tree drawn as
Our biggest example is named complex and can be
rendered as
One member of your partnership should electronically submit your modified cladogram.py file. The comments at the beginning of the file should clearly identify the member(s) of the partnership and should include answers to the "warmup" questions.
The assignment is worth 25 points, which will be assessed as follows: