Saint Louis University
Computer Science 1020
Computer Science Department
Topic: Python Control Structures
Techniques: Use of for loops and if statements
Reading: lecture notes
Collaboration Policy: The lab should be completed working in pairs
Submission Deadline: 3:00pm Wednesday, 31 January 2018
Files you need:
We will again rely on guinea pig mDNA as our test sequence. This
time, however, you should write your code in its own file and import
the guinea pig DNA from a separate file. We suggest you download and
unpack the following zip file: lab02.zip
Advice:
All of this is new, so we know it will take some practice (hence
the lab). The goal of this lab is to get comfortable doing
appropriate "bookkeeping" with variables, using a single
for loop and some nested conditionals to capture the desired
logic. For each task, think about what information
you would keep track of if you were doing the task by hand on a
very long sequence that you can see only one character at a time.
Example:
To get you started, we are giving our own solution to a first sample question: how often is a base followed immediately by the same base? If the sequence were completely random, we would expect a fraction of 0.25; however, we observe 0.284285714286. We compute this as follows:
    count = 0
    prev = 'x'
    for base in dna:
        if base == prev:
            count += 1
        prev = base
    percent = count/float(len(dna)-1)
We want you to build enough intuition to imagine how that code would execute on a computer. At this stage, it may still help to see a simulation of this code on a small strand of DNA, TCCACTTAAA.
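As a rough stand-in for such a simulation, the sketch below runs the same loop on the short strand TCCACTTAAA and prints the bookkeeping variables after each step. The print line and the name fraction are our own additions for illustration; they are not part of the solution above.

```python
# Trace of the example loop on the short strand from the text.
dna = 'TCCACTTAAA'

count = 0
prev = 'x'      # placeholder that cannot match any real base
for base in dna:
    if base == prev:
        count += 1
    # show the previous base, the current base, and the running count
    print('%s %s %d' % (prev, base, count))
    prev = base

# fraction of adjacent pairs in which the base repeats
fraction = count / float(len(dna) - 1)
print(fraction)
```

On this strand the repeated pairs are CC, TT, AA, and AA, so the loop ends with count equal to 4 out of 9 adjacent pairs.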
To receive full credit for this lab, you must provide Python code to solve at least three of the following five questions. (hardcopy of response sheet)
What percentage of pairs of consecutive bases are the pattern 'AT'?
If this were completely random, we'd expect 1/16=0.0625 of the pairs,
yet we observe 0.094.
What are the relative percentages of the bases that immediately
follow an 'A'?
We find the following:
A: 0.3160823594880356
C: 0.23298089408273048
G: 0.15785568540159525
T: 0.29308106102763865
What percentage of the time is a base the same as the base that was TWO earlier?
If this were completely random, we'd expect 0.25; we
observe 0.264599083279.
How many times does the pattern CCAAT occur?
We want you to determine this WITHOUT use of the built-in count method.
Hint: Keep track of a sliding window of the most recent five characters.
What is the length of the longest run of a single repeated base, and which base is it?
As discovered in the first lab, there are 9 consecutive A's;
this is the longest such streak for any base.
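The sliding-window hint given for the CCAAT question can be sketched as follows. This is only one possible way to do the bookkeeping, not the required solution. The function name count_pattern and the toy strand are hypothetical, and the sketch assumes you are comfortable with string slicing such as window[1:].

```python
def count_pattern(seq, pattern):
    """Count occurrences of pattern in seq without the built-in count method."""
    window = ''     # the most recent len(pattern) characters seen so far
    count = 0
    for base in seq:
        window += base
        if len(window) > len(pattern):
            window = window[1:]     # drop the oldest character
        if window == pattern:
            count += 1
    return count

# toy strand made up for illustration (not the guinea pig data)
toy = 'GCCAATCCAATT'
total = count_pattern(toy, 'CCAAT')
print(total)
```

Note that this version counts overlapping matches, so it can differ from the built-in count method for patterns that overlap themselves, such as 'AA'. The pattern CCAAT cannot overlap itself, so the distinction does not matter for that question.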
Rather than copying all of your source code to the paper handout, we ask that one member of your team electronically submit your source code. To do so, you will need to use the webpage password that was chosen as part of the course questionnaire.
Specifically, please submit the single file lab02.py to the lab02 folder in the submission system. Only one person in the pair should submit the file, but please make sure that you add a Python comment (i.e., a line starting with the # symbol) that identifies both members of the partnership as authors.