Saint Louis University | Computer Science 1020 | Computer Science Department

Topic: Python Control Structures
Techniques: Use of for loops and if statements
Reading: lecture notes
Collaboration Policy: The lab should be completed working in pairs
Submission Deadline: 11:50am Wednesday, 6 February 2019
Files you need:
We will again rely on guinea pig mDNA as our test sequence. This
time, however, you should write your own code in one file and import
the guinea pig DNA from a separate file. We suggest you download and
unpack the following zip file: lab04.zip
Example:
To get you started, we give our own solution to a first sample question: how often is a base followed immediately by the same base? If the sequence were completely random, we would expect a fraction of 0.25; however, we observe 0.284285714286. We compute this as follows:
count = 0
for k in range(len(dna)-1):        # N.B. stop value
    if dna[k] == dna[k+1]:
        count += 1
percent = count/(len(dna)-1)
Note that we use an index-based loop, but with range(len(dna)-1) rather than range(len(dna)), so that dna[k+1] never reaches past the end of the string. As a sanity check, if dna has length 5, we only need to test four pairs of neighbors, and thus compare dna[0] to dna[1], dna[1] to dna[2], dna[2] to dna[3], and dna[3] to dna[4].
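The sanity check for a strand of length 5 can be reproduced directly. The following is a minimal sketch, assuming Python 3; the helper name neighbor_pairs and the strand "ACGTA" are made up for illustration and are not part of the lab files.

```python
def neighbor_pairs(dna):
    """Return the (k, k+1) index pairs that the loop compares."""
    # Stopping at len(dna)-1 keeps k+1 a valid index.
    return [(k, k + 1) for k in range(len(dna) - 1)]

# Hypothetical strand of length 5, chosen only for illustration.
pairs = neighbor_pairs("ACGTA")
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

A strand of length 5 yields four pairs, matching the comparisons listed above.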
The following is a simulation of the code on a small strand of DNA, TCCACTTAAA.
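One way to follow the loop on this strand is to print its state at every step. This is a sketch assuming Python 3; the variable names match and fraction are our own additions for tracing and are not part of the sample solution.

```python
dna = "TCCACTTAAA"

count = 0
for k in range(len(dna) - 1):
    match = dna[k] == dna[k + 1]   # compare each base with its right neighbor
    if match:
        count += 1
    # Show the index, the two bases compared, the result, and the running count.
    print(k, dna[k], dna[k + 1], match, count)

fraction = count / (len(dna) - 1)
print(count, "matches out of", len(dna) - 1, "pairs")  # 4 matches out of 9 pairs
```

The matches occur at k = 1 (CC), k = 5 (TT), and k = 7 and k = 8 (the two overlapping AA pairs).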
To receive full credit for this lab, you must provide Python code to solve at least four of the following five questions. (hardcopy of response sheet)
1. What percentage of codons across all primary reading frames are ATG?
   If this were completely random, we'd expect 1/64 = 1.5625% of the triples.
   We observe 1.196% for guinea pig and 0.978% for human.

2. If two consecutive nucleotides match each other, how often is
   the next nucleotide that same nucleotide?
   If nucleotides were completely random, we'd expect 25%.
   We observe 28.392% in guinea pig and 30.620% in human.

3. How many times does a motif of the form CC?AT occur within the
   sequence (where ? could be anything)?
   For guinea pig, 111 times; for human, 132 times.

4. When the motif CC?AT does occur, what percentage of the
   time is the middle nucleotide an A (a so-called cat box, CCAAT)?
   For guinea pig, 27.027%; for human, 21.212%.

5. The pattern CCAAT is known as a "cat" box. What are the
   relative percentages of the bases immediately following the pattern
   CCAA in the dna?

   Guinea pig | A: 28.431% | C: 31.373% | G: 10.784% | T: 29.412%
   Human      | A: 39.416% | C: 29.197% | G: 10.949% | T: 20.438%
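Each of the questions above inspects a short run of neighboring bases. The following is a minimal sketch of pulling out such runs by slicing, shown on the small strand TCCACTTAAA from the example. It assumes Python 3; the helper name windows is hypothetical, and this is only one possible starting point, not a required approach.

```python
def windows(dna, size):
    """Return every run of `size` consecutive bases, from left to right."""
    # The last valid start index is len(dna) - size, hence the stop value.
    return [dna[k:k + size] for k in range(len(dna) - size + 1)]

triples = windows("TCCACTTAAA", 3)
print(triples)
# ['TCC', 'CCA', 'CAC', 'ACT', 'CTT', 'TTA', 'TAA', 'AAA']
```

As with the sample solution, the stop value of the range matters: a strand of length 10 has eight runs of length 3 and six runs of length 5.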
Rather than copying all of your source code to the paper handout, we ask that one member of your team electronically submit your source code through our git repository. Specifically, please submit the single file lab04.py to the lab04 folder in the submission system. Only one person in the pair should submit the file, but please make sure that you add a Python comment (i.e., a line starting with the # symbol) that identifies both members of the partnership as authors.