
Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2019

Computer Science Department

Homework Assignment 02

Inverted Repeats

Due: 11:59pm, Friday, 8 February 2019



Collaboration Policy

For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

Please make sure you adhere to the policies on academic integrity in this regard.


Overview

In this project, we explore inverted repeats, which occur when a particular motif is later followed in a DNA strand by its reverse complement. In three dimensions, such a single strand might recombine in a loop, with the motif binding to its inverted repeat, forming what is known as a transposable element. Such elements are often a source of mutations.

As an example, consider the single strand ATCCGAATACGGTTCGGGTA. Notice the motif CGAA that is later followed by the reverse complement TTCG. While this is only one strand of what is double-stranded DNA, during the replication process when it is single stranded, there is a chance that it binds with itself at the motif and its inverted repeat, forming a loop as follows:

...ATC GGTA...
     C G
     G C
     A T
     A T
    T   G
     ACG

If that loop were to be disconnected for some reason, the resulting mutation would leave only the remaining strand ATCGGTA.
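
To make the term "reverse complement" concrete, here is a minimal sketch of how it could be computed in Python. The helper name reverse_complement is an assumption made for illustration and is not part of the assignment's provided files; the sketch also assumes the strand contains only the uppercase letters A, C, G, and T.

```python
# Hypothetical helper (not from the assignment files).
# Assumes uppercase input containing only A, C, G, T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Complement each base, then reverse the order of the bases."""
    return motif.translate(COMPLEMENT)[::-1]


print(reverse_complement("CGAA"))  # the motif from the example strand
```

For the motif CGAA, complementing gives GCTT, and reversing that gives TTCG, which matches the inverted repeat in the example strand.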


Your Task

Your task is to write a function, mutate(dna, motif), that looks for an inverted repeat of the given motif and, if one is found, returns the DNA strand that would result after deleting the implicit loop that forms. More specifically, your function's high-level logic should be:

  1. Look for the leftmost occurrence of the original motif. If none is found, then return the original dna unmodified.

  2. Starting after the end of the original motif, look for the next subsequent occurrence of the reverse complement of that motif (the so-called inverted repeat). If no such occurrence is found, return the original dna unmodified.

  3. Compute and return the mutated DNA that results from removing the motif, its inverted repeat, and the zero or more characters that might be found between them.

Your function is only responsible for performing a mutation that occurs between the first occurrence of the pattern, and the next subsequent occurrence of the inverted repeat (that which is completely disjoint from the forward motif).
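
The three steps above can be read as the following sketch. It is one possible interpretation rather than a reference solution: the helper reverse_complement and all local variable names are assumptions, and the code assumes uppercase strands over A, C, G, and T with a non-empty motif.

```python
def reverse_complement(motif):
    """Hypothetical helper: complement each base and reverse the order."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(motif))


def mutate(dna, motif):
    # Step 1: leftmost occurrence of the motif (str.find returns -1 if absent).
    start = dna.find(motif)
    if start == -1:
        return dna

    # Step 2: search for the inverted repeat only at or beyond the motif's end,
    # so that the match is completely disjoint from the forward motif.
    motif_end = start + len(motif)
    inverted = reverse_complement(motif)
    repeat_start = dna.find(inverted, motif_end)
    if repeat_start == -1:
        return dna

    # Step 3: keep what precedes the motif and what follows the inverted repeat.
    return dna[:start] + dna[repeat_start + len(inverted):]


print(mutate("ATCCGAATACGGTTCGGGTA", "CGAA"))
```

On the strand from the overview, the motif starts at index 3 and the inverted repeat at index 12, so the sketch keeps ATC and GGTA.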


Testing

A very important aspect of software development is thorough testing. We gave one example in the introduction, but just because your code works correctly on that test does not mean that it works correctly in general. It is important that you consider the variety of allowable inputs that your program might encounter, and that you express your logic in a general enough way that it works for all such inputs.

For example, here are a few more well-formed test cases to consider:

original DNA             motif  resulting DNA        comments
ATCCGAATACGGTTCGGGTA     CGAA   ATCGGTA              first example
ATCCGAATACGGTTGGGTA      CGAA   ATCCGAATACGGTTGGGTA  no inverted repeat exists
ATCCAATACGGTTCGGGTA      CGAA   ATCCAATACGGTTCGGGTA  original motif not found
ATCCGAATACGGTTCGGGTTCGA  CGAA   ATCGGTTCGA           if multiple inverted repeats exist, match with the next subsequent one
AGTCACATGATCAGT          C      AGTATCAGT            the motif can be any non-empty string
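
A table like this can be turned into an automated check. The sketch below is only an illustration: the names TABLE_CASES, failing_cases, and do_nothing are assumptions, and do_nothing is a deliberately wrong stand-in for a real mutate function.

```python
# Each tuple is (original DNA, motif, expected resulting DNA), copied from the table.
TABLE_CASES = [
    ("ATCCGAATACGGTTCGGGTA", "CGAA", "ATCGGTA"),
    ("ATCCGAATACGGTTGGGTA", "CGAA", "ATCCGAATACGGTTGGGTA"),
    ("ATCCAATACGGTTCGGGTA", "CGAA", "ATCCAATACGGTTCGGGTA"),
    ("ATCCGAATACGGTTCGGGTTCGA", "CGAA", "ATCGGTTCGA"),
    ("AGTCACATGATCAGT", "C", "AGTATCAGT"),
]


def failing_cases(func, cases):
    """Return (dna, motif, expected, actual) for every case that func gets wrong."""
    failures = []
    for dna, motif, expected in cases:
        actual = func(dna, motif)
        if actual != expected:
            failures.append((dna, motif, expected, actual))
    return failures


def do_nothing(dna, motif):
    """Deliberately incorrect stand-in that never mutates anything."""
    return dna


failures = failing_cases(do_nothing, TABLE_CASES)
for dna, motif, expected, actual in failures:
    print(motif, "in", dna, "expected", expected, "but got", actual)
```

Note that do_nothing still passes the second and third rows, since those expect the strand to come back unchanged. A broken implementation can look correct on a subset of cases, which is one reason to run every row.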

Certainly if your program does not work correctly on these cases, you should reconsider your implementation. But even if you are correct on those cases as well, there may still be other combinations of factors that might cause a problem for some implementations. For this reason, in addition to having each team submit their Python source code, we are asking that each team also develop a series of additional test cases that you think might cause trouble for some implementations. We will then test each submitted program on each submitted test case and give credit in the grading standards both for how well your implementation does when faced with other students' tests, and for how well your tests do in exposing flaws in other students' implementations.
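
As one illustration of a potentially troublesome case, consider a strand in which the reverse complement appears only in a position that overlaps the motif. The task description asks for an inverted repeat that is completely disjoint from the forward motif, so under that reading no mutation should occur. The strand GAATTC and motif AAT below are made up for this sketch, which relies on the optional start argument of str.find.

```python
dna = "GAATTC"
motif = "AAT"
inverted = "ATT"  # reverse complement of AAT, written out by hand

start = dna.find(motif)      # motif occupies indices 1, 2, 3
end = start + len(motif)     # first index past the motif

# Searching from the beginning finds ATT at index 2, which overlaps the motif.
overlapping = dna.find(inverted)

# Searching from the end of the motif finds nothing, so str.find returns -1.
disjoint = dna.find(inverted, end)

print(start, end, overlapping, disjoint)
```

An implementation that searches for the inverted repeat from the start of the strand would wrongly match at index 2 here, whereas one that begins searching after the motif would leave the strand unchanged.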

To allow us to more easily execute such a competition, it is imperative that you provide your test cases in strict adherence to the following specifications. You are to submit your tests in a separate file named tests.txt, which must be saved in plain text format. (We suggest you simply use the IDLE text editor to create this as well.) The contents of the file should be as follows:

As an example, the test file that would correspond to the five cases in the above table should be submitted: tests.txt. But you are welcome to delete those cases and provide ten new cases (since presumably other students will have already tested their code on the five given examples).


Files You Need

To get you going, we are providing three files, which you may either download individually, or combined as this zip file to be unpacked. The three files are

If you are already working on our department's computer system and prefer to copy these files using command-line techniques, you may execute the following command from whatever working directory you'd like them placed in:

cp -Rp /public/goldwasser/1020/homeworks/mutate  .


Advice

This task can be accomplished with careful use of methods of the str class (and possibly the list class). There tend to be two different schools of thought for how to accomplish the underlying string manipulations: one based on the use of index and slicing, and the other based on the split method.

We suggest using the first of these approaches, and offer extra credit to anyone willing to explore by implementing both approaches.
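
The extra credit section names the two approaches as index with slicing and the split method. The sketch below contrasts them on a neutral task, cutting a short strand around a single marker, rather than on the full assignment. The variable names are assumptions chosen for illustration, and str.find is used here as the non-raising counterpart of str.index.

```python
strand = "ATCCGAATACGG"
marker = "CGAA"

# Approach based on locating a position and slicing around it.
pos = strand.find(marker)                  # index of the marker, or -1 if absent
before_slice = strand[:pos]                # everything before the marker
after_slice = strand[pos + len(marker):]   # everything after the marker

# Approach based on split: cut at the first occurrence only (maxsplit=1).
pieces = strand.split(marker, 1)
before_split, after_split = pieces

print(before_slice, after_slice)
print(before_split, after_split)
```

Both approaches yield the same two pieces here. One difference to keep in mind: when the marker is absent, find returns -1 while split returns a list with a single element, so the two-variable unpacking above would fail and each approach needs its own "not found" check.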


Submitting Your Assignment Electronically

All of the following files should be submitted electronically:

Please also note the late policy for homeworks.


Grading Standards

The assignment is worth 40 points, which will be assessed as follows:


Extra Credit

In the advice section we outline two different approaches for cutting apart the string: one based on the use of index and slicing, and the other based on the split method. For extra credit, give an alternative implementation that uses the opposite approach from the one you used for your official submission.

So as not to risk losing points on the required part of the assignment due to a failed extra credit attempt, please submit an original version of your assignment in a file mutate.py and the separate extra credit version in a file mutateExtra.py.


Michael Goldwasser
Last modified: Thursday, 31 January 2019