Saint Louis University | Computer Science 1020 | Computer Science Department
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
For this homework, we are going to directly follow the project described in Chapter 8 of the text, and their supplemental website. Please make sure to read the chapter to understand the biological context of the task.
While we will provide some additional guidance on the rest of this page, all of the details of this project, and step-by-step instructions for how to proceed, can be found at the textbook companion website.
The primary description of this assignment will task you with implementing four different functions. We're planning on using four class periods to (hopefully) allow you to get much of the work done in class. In that spirit, we hope you can complete the following tasks successfully by the end of those class periods (although you're certainly welcome to catch up outside of class, or to work faster than this pace).
class period | function |
---|---|
Friday, 2 March 2018 | memoAlignScore(S1, S2, gap, subMat, memo) |
Monday, 5 March 2018 | allScores(geneList1, geneList2) |
Wednesday, 7 March 2018 | closestMatch(geneName, allScoresD) |
Friday, 9 March 2018 | printBRH(geneName, allScoresD) |
We will need to exercise some additional Python techniques to get the work done.
memoAlignScore(S1, S2, gap, subMat, memo)
Adapt the technique that the book demonstrates for LCS in
Chapter 7.6 to the new alignScore function given in Chapter 8.3.
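As a reminder of the general pattern, here is a minimal sketch of memoization applied to a recursive LCS-length function. The name memoLCS and the choice of the (S1, S2) tuple as the dictionary key are assumptions made for illustration; the book's version in Chapter 7.6 may differ in its details, so treat that as the authoritative reference.

```python
def memoLCS(S1, S2, memo):
    """Length of the longest common subsequence of S1 and S2, with memoization."""
    # Reuse a previously computed answer for this exact pair of strings.
    if (S1, S2) in memo:
        return memo[(S1, S2)]
    if S1 == '' or S2 == '':
        answer = 0                       # base case: nothing left to match
    elif S1[0] == S2[0]:
        answer = 1 + memoLCS(S1[1:], S2[1:], memo)
    else:
        answer = max(memoLCS(S1, S2[1:], memo),
                     memoLCS(S1[1:], S2, memo))
    memo[(S1, S2)] = answer              # record the result before returning
    return answer

memo = {}                                # start each computation with an empty memo
lcs_length = memoLCS('spam', 'sam!', memo)
print(lcs_length)                        # prints 3
```

The same three steps (check the memo, compute recursively, store the answer) carry over when the recursive function takes extra parameters such as gap and subMat.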
allScores(geneList1, geneList2)
This will require the use of nested loops, which is
when one loop is executed in the body of another, so that for
each pass of the outer loop, the entire inner loop is executed.
In particular, since you want to compare every gene a in the
first list to every gene b in the second list, your code may have
a structure such as

    for a in geneList1:
        for b in geneList2:
            # compute the score for the appropriate protein sequences for genes a and b
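To see the nested-loop pattern in action, here is a small sketch on made-up data. Everything in it is hypothetical: toyScore is a stand-in that simply counts matching positions (it is not the alignment score from the book), and the gene names and sequences are invented. The one detail taken from the assignment is that each result is stored in a dictionary under the key (a, b).

```python
def toyScore(seqA, seqB):
    # Hypothetical stand-in for a real alignment score:
    # counts the positions at which the two strings agree.
    count = 0
    for i in range(min(len(seqA), len(seqB))):
        if seqA[i] == seqB[i]:
            count += 1
    return count

# Hypothetical toy data: gene names mapped to made-up sequences.
toySequences = {'g1': 'MKV', 'g2': 'MRV', 'h1': 'MKL', 'h2': 'ARV'}
listA = ['g1', 'g2']
listB = ['h1', 'h2']

scores = {}
for a in listA:                 # outer loop: each gene in the first list
    for b in listB:             # inner loop: runs fully for every a
        scores[(a, b)] = toyScore(toySequences[a], toySequences[b])

print(len(scores))              # prints 4, one entry per (a, b) pair
```

With two genes in each list, the inner body runs 2 x 2 = 4 times, so the dictionary ends up with one score for every possible pairing.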
closestMatch(geneName, allScoresD)
This is a typical example of using a loop to look for the best of
something. In this case, you need to retrieve each (a,b) pair
from among the keys of the score dictionary and see if one of
a or b is the gene you are interested in.
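The general "best so far" loop can be sketched as follows on a toy dictionary. The function name bestPartner, the toy scores, and the assumption that a higher score means a closer match are all illustrative; check the project description for exactly what closestMatch should return and how ties should be handled.

```python
# Hypothetical toy scores keyed on (a, b) pairs.
toyScores = {('g1', 'h1'): 2, ('g1', 'h2'): 1,
             ('g2', 'h1'): 1, ('g2', 'h2'): 2}

def bestPartner(name, scoresD):
    """Return the gene paired with name in the highest-scoring key, or None."""
    bestScore = None
    best = None
    for (a, b) in scoresD:                   # examine every key of the dictionary
        if a == name or b == name:           # only pairs that involve our gene
            other = b if a == name else a    # the partner is whichever one is not name
            if bestScore is None or scoresD[(a, b)] > bestScore:
                bestScore = scoresD[(a, b)]  # remember the best seen so far
                best = other
    return best

print(bestPartner('g1', toyScores))          # prints h1
print(bestPartner('h2', toyScores))          # prints g2
```

Starting bestScore at None, rather than at 0, keeps the loop correct even if every score happens to be negative.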
printBRH(geneName, allScoresD)
The most challenging part of this might be that we haven't done
a lot with how to format strings to create the desired output.
In Python 2, when you use the print statement, if you give
many arguments separated by commas in your code, they will be
printed on the same line yet separated by spaces. So you might
produce this line with a syntax such as

    print chromeA, startA, nameA, '---', chromeB, startB, nameB

provided that you have already defined such variables with the information that you seek.
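If you would rather build the line as a string before printing it, one possible approach is sketched below. The values assigned to the variables are made up purely for illustration. Joining the pieces with a single space gives the same spacing as the comma-separated print statement, and this form also runs unchanged in Python 3.

```python
# Hypothetical values standing in for the information you would look up.
chromeA, startA, nameA = 'X', 1500, 'geneA'
chromeB, startB, nameB = '4', 27000, 'geneB'

fields = [chromeA, startA, nameA, '---', chromeB, startB, nameB]
line = ' '.join(str(f) for f in fields)   # str() converts the integers to text
print(line)                               # prints: X 1500 geneA --- 4 27000 geneB
```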
There are three files you will need, which you may get either by downloading and unpacking this zip file, by typing the following if working on hopper:

    cp -Rp /public/goldwasser/1020/homeworks/homology .

or by downloading each of the three individual Python files:
blosum62.py
A Python file that defines the variable blossum62 to be
a dictionary that maps amino acid pairs, such as
humanChickenProteins.py
A Python file that provides all of the underlying biological
data needed for this project.
SexChromosomeEvolution.py
This is the file you should edit and submit. While the
authors presume that you will start from scratch, we've taken
the liberty to start out the file and create the expected
function declarations, leaving the four
implementations to you.
At the bottom of this file, we've also automated the various tests that are described in the prose of the project description so that you can more easily test the various aspects of your functions along the way. See the project description for the expected results.
All of the following files should be submitted electronically:
SexChromosomeEvolution.py
This file should contain the complete source code for your
implementation.
SexChromosomeEvolution.txt
This text file should contain both your analysis as described at
the end of the project description, and a copy of the full
output of your final ortholog computation for the full
human/chicken data set.
Please also include the names of both partners at the top of this file.
To submit, please follow the instructions on our submit system, using the website password that you indicated when completing the course questionnaire. Please also note the late policy for homeworks.
The assignment is worth 40 points, which will be assessed as follows: