Course Home | Assignments | Computing Resources | Data Sets | Lab Hours/Tutoring | Python | Schedule | Submit

Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2018

Computer Science Department

Lab 02

Topic: Python Control Structures
Techniques: Use of for loops and if statements
Reading: lecture notes
Collaboration Policy: The lab should be completed in pairs
Submission Deadline: 3:00pm Wednesday, 31 January 2018

Overview

Files you need:
We will again use guinea pig mDNA as our test sequence. This time, however, you should write your code in its own file and import the guinea pig DNA from a separate file. We suggest that you download and unpack the following zip file: lab02.zip


Advice:
All of this is new, so we know it will take some practice (hence the lab). The goal of this lab is to get comfortable doing appropriate "bookkeeping" with variables, using a single for loop and some nested conditionals to capture the desired logic. For each task, think about what information you would need to keep track of if you were doing the task by hand on a very long sequence that you could see only one character at a time.


Example:

To get you started, we are giving our own solution to a first sample question: how often is a base followed immediately by the same base? If the sequence were completely random (each base equally likely, regardless of its neighbors), we would expect this to happen 0.25 of the time; however, we observe 0.284285714286. We compute this as follows:

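count = 0
prev = 'x'                  # placeholder; matches no real base
for base in dna:
    if base == prev:        # same as the base just before it
        count += 1
    prev = base             # remember this base for the next pass

# a sequence has len(dna)-1 adjacent pairs; the result is a fraction from 0 to 1
percent = count/float(len(dna)-1)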
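
The 0.25 baseline can be checked empirically. The sketch below is not part of the lab materials. It assumes that "completely random" means each base is chosen independently with equal probability, and it builds a made-up strand before running the same loop on it. The name random_dna, the seed, and the strand length are arbitrary choices made for illustration.

```python
import random

random.seed(1020)    # arbitrary fixed seed so the run is repeatable

# made-up strand: each base picked independently and uniformly (an assumption)
random_dna = ''.join(random.choice('ACGT') for i in range(100000))

count = 0
prev = 'x'           # placeholder; matches no real base
for base in random_dna:
    if base == prev:
        count += 1
    prev = base

fraction = count/float(len(random_dna)-1)
print(fraction)      # should land close to 0.25
```

Whatever a given base is, the next base matches it with probability 1/4 under this assumption, which is why the fraction settles near 0.25.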

We want you to build intuition for how this code executes on a computer. At this stage, it may still help to see a simulation of the code on a small strand of DNA, TCCACTTAAA.
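
As a stand-in for that simulation, the sketch below runs the same loop on the small strand and prints the bookkeeping variables after each character is processed. The print statements and the name fraction are additions made for illustration only.

```python
dna = 'TCCACTTAAA'

count = 0
prev = 'x'           # placeholder; matches no real base
for base in dna:
    if base == prev:
        count += 1
    # show the bookkeeping after this character has been examined
    print('base =', base, ' prev =', prev, ' count =', count)
    prev = base

fraction = count/float(len(dna)-1)
print(count, 'matches among', len(dna)-1, 'adjacent pairs')
```

On this strand the loop finds 4 matches (CC, TT, and two overlapping AA pairs) among the 9 adjacent pairs.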


Your Task

To receive full credit for this lab, you must provide Python code to solve at least three of the following five questions. (hardcopy of response sheet)


  1. What percentage of pairs of consecutive bases match the pattern 'AT'?
    If this were completely random, we'd expect 1/16=0.0625 of the pairs, yet we observe 0.094.

     

  2. What are the relative percentages of the bases that immediately follow an 'A'?
    We find the following:
    A: 0.3160823594880356
    C: 0.23298089408273048
    G: 0.15785568540159525
    T: 0.29308106102763865

     

  3. What percentage of the time is a base the same as the base TWO positions earlier?
    If this were completely random, we'd expect 0.25; we observe 0.264599083279.

     

  4. How many times does the pattern CCAAT occur?
    We want you to determine this WITHOUT use of the built-in count method.
    Hint: Keep track of a sliding window of the most recent five characters.

     

  5. What is the length of the longest consecutive sequence of a repeated base, and which base is it?
    As discovered in the first lab, there are 9 consecutive A's; this is the longest such streak for any base.
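
The sliding-window hint from question 4 could be sketched as follows. This is one possible approach rather than the expected solution. The strand is a made-up toy example, not the guinea pig sequence, and the variable names window and pattern are chosen for illustration.

```python
dna = 'TCCAATGCCAATCCAAT'    # made-up toy strand
pattern = 'CCAAT'

count = 0
window = ''                  # the most recent characters seen so far
for base in dna:
    window = window + base
    if len(window) > len(pattern):
        window = window[1:]  # drop the oldest character
    if window == pattern:
        count += 1

print(count)
```

On the toy strand the pattern occurs three times. The slice window[1:] drops the oldest character, so window never holds more than the five most recent bases.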

     


Submitting Your Assignment

Rather than copying all of your source code to the paper handout, we ask that one member of your team electronically submit your source code. To do so, you will need to use the webpage password that was chosen as part of the course questionnaire.

Specifically, please submit the single file lab02.py to the lab02 folder in the submission system. Only one person in the pair should submit the file, but please make sure that you add a Python comment (i.e., a line starting with the # symbol) that identifies both members of the partnership as authors.


Michael Goldwasser
CSCI 1020, Spring 2018
Last modified: Thursday, 01 February 2018