Course Home | Assignments | Data Sets/Tools | Python | Schedule | Git Submission | Tutoring

Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2019

Computer Science Department

Lab 04

Topic: Python Control Structures
Techniques: Use of for loops and if statements
Reading: lecture notes
Collaboration Policy: The lab should be completed working in pairs
Submission Deadline:    11:50am Monday, 4 February 2019
11:50am Wednesday, 6 February 2019

Overview

Files you need:
We will again rely on guinea pig mDNA as our test sequence. This time, however, you should write your own code in one file and import the guinea pig DNA from a separate file. We suggest you download and unpack the following zip file: lab04.zip


Example:

To get you started, we are giving our own solution to a first sample question: How often is a base followed immediately by the same base? If the sequence were completely random, we'd expect a fraction of 0.25; however, we observe 0.284285714286. We compute this as follows:

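count = 0                      # number of positions whose right-hand neighbor matches
for k in range(len(dna)-1):    # N.B. stop value
    if dna[k] == dna[k+1]:
        count += 1

# Despite the name, this is a fraction between 0 and 1 (multiply by 100 for a percentage).
percent = count/(len(dna)-1)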

Note well that we use an index-based loop with range(len(dna)-1) rather than range(len(dna)), because the last base has no right-hand neighbor to compare against. As a sanity check, if the DNA has length 5, we only need to test four pairs of neighbors, and thus compare dna[0] to dna[1], dna[1] to dna[2], dna[2] to dna[3], and dna[3] to dna[4].
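The stop value can be checked directly on a toy strand. The following is a minimal sketch, assuming a made-up five-base string named toy (it is not part of the lab files), that lists exactly which index pairs the loop visits:

```python
# Hypothetical five-base strand, used only to illustrate the loop bounds.
toy = "ACGTA"

# range(len(toy)-1) yields 0, 1, 2, 3, so k+1 never goes past the last index, 4.
pairs = [(k, k + 1) for k in range(len(toy) - 1)]
print(pairs)    # [(0, 1), (1, 2), (2, 3), (3, 4)]

# With range(len(toy)) the final step would ask for toy[5], which does not exist.
try:
    for k in range(len(toy)):
        toy[k] == toy[k + 1]
except IndexError:
    print("range(len(toy)) goes one step too far")
```

Four pairs for five bases matches the count given in the sanity check.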

The following is a simulation of the code on a small strand of DNA, TCCACTTAAA.
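A comparable trace can be produced with a short script. This is only a sketch: the variable names strand and fraction and the print layout are our own choices for illustration, and the loop is the same one used in the example above.

```python
strand = "TCCACTTAAA"    # the small strand from the text, not the guinea pig data

count = 0
for k in range(len(strand) - 1):
    match = strand[k] == strand[k + 1]
    if match:
        count += 1
    # Show the index, the two neighbors being compared, and the running count.
    print(k, strand[k], strand[k + 1], match, count)

fraction = count / (len(strand) - 1)
print(count, "matches out of", len(strand) - 1, "pairs")    # 4 matches out of 9 pairs
```

The matching pairs are CC, TT, and the two overlapping AA pairs at the end, so the fraction is 4/9.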


Your Task

To receive full credit for this lab, you must provide Python code to solve at least four of the following five questions. (hardcopy of response sheet)

  1. What percentage of codons across all primary reading frames are ATG?
    If this were completely random, we'd expect 1/64=1.5625% of the triples. We observe 1.196% for guinea pig and 0.978% for human.

     

  2. If two consecutive nucleotides match each other, how often is the next nucleotide that same nucleotide?
    If nucleotides were completely random, we'd expect 25%.
    We observe 28.392% in guinea pig and 30.620% in human.

     

  3. How many times does a motif of the form CC?AT occur within the sequence? (where ? could be anything)
    For guinea pig, 111 times; for humans, 132 times.

     

  4. When the motif CC?AT does occur, what percentage of the time is the middle nucleotide an A (the so-called "cat" box, CCAAT)?
    For guinea pig, 27.027%; for humans, 21.212%.

     

  5. The pattern CCAAT is known as a "cat" box. What are the relative percentages of the bases immediately following the pattern CCAA in the DNA?
    Guinea Pig
    A: 28.431% C: 31.373% G: 10.784% T: 29.412%
    Human
    A: 39.416% C: 29.197% G: 10.949% T: 20.438%
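
The "completely random" baselines quoted in questions 1 and 2 come from assuming that each of the four bases is equally likely and independent of its neighbors. The following is a minimal sketch of that arithmetic, together with a simulated strand as a check; the seed, the strand length, and all variable names are illustrative assumptions and not part of the lab files.

```python
import random

# Assumption: four bases, each equally likely and independent of its neighbors.
p_base = 1 / 4
p_codon = p_base ** 3       # chance that a given triple is one specific codon, such as ATG
print(100 * p_codon)        # 1.5625
print(100 * p_base)         # 25.0

# Simulated strand to check the 25% baseline empirically.
random.seed(1020)
fake = "".join(random.choice("ACGT") for _ in range(100000))
same = sum(1 for k in range(len(fake) - 1) if fake[k] == fake[k + 1])
observed = same / (len(fake) - 1)
print(observed)             # should land close to 0.25
```

Comparing such a baseline to the values observed in real sequences shows how far the guinea pig and human data depart from the random model.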

     


Submitting Your Assignment

Rather than copying all of your source code to the paper handout, we ask that one member of your team electronically submit your source code through our git repository. Specifically, please submit the single file lab04.py to the lab04 folder in the submission system. Only one person in the pair should submit the file, but please make sure that you add a Python comment (i.e., a line starting with the # symbol) that identifies both members of the partnership as authors.


Michael Goldwasser
CSCI 1020, Spring 2019
Last modified: Monday, 04 February 2019