Saint Louis University | Computer Science 1300 | Computer Science Department
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
DNA can be modeled as a string of characters using the alphabet A, C, G, and T. One form of DNA mutation occurs when a substring of the DNA is reversed during the replication process. Usually, such a reversal occurs between what are termed inverted pairs, where some substring is followed later by its reversal.
As an example, consider the original DNA strand:
CGATTGAACATGTAAGTCCAATT
This example happens to have an inverted pair, with the original
marker being TGAA and its subsequent reversed pair being
AAGT as shown below.
CGATTGAACATGTAAGTCCAATT
    ^^^^     ^^^^
It is possible that the entire slice of DNA delimited by those
patterns could be inverted and reattached, since the bonds at each
end will be locally the same. In that case, the resulting mutated DNA
would appear as
CGATTGAATGTACAAGTCCAATT
In effect, the 13-character strand has been flipped. Note that the
middle 5 characters, now TGTAC, are reversed relative
to the original strand.
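As a rough illustration of the flip described above, the following Python sketch slices the example strand by hand. The indices 4 and 17 were worked out by eye for this one strand and are an assumption of the sketch, as are the variable names; a real program would have to compute the positions.

```python
original = "CGATTGAACATGTAAGTCCAATT"

# Assumed positions for this strand only: TGAA begins at index 4 and
# AAGT ends just before index 17.
start, stop = 4, 17

segment = original[start:stop]   # the 13-character strand TGAACATGTAAGT
flipped = segment[::-1]          # the same characters in reverse order

# Reattach the untouched ends around the flipped strand.
mutated = original[:start] + flipped + original[stop:]
print(mutated)
```

Because the strand begins with TGAA and ends with its reversal AAGT, the flipped strand still begins with TGAA and ends with AAGT. Only the characters between the markers appear changed.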
You are to design a program that works as follows. It should ask the user for an original DNA string as well as the particular pattern that is inverted. It should then locate the leftmost occurrence of that pattern, and the next subsequent and disjoint occurrence of the inverted pattern. The output should be the mutated DNA, with the segment including the inverted pair reversed. An example session might proceed as follows (the text following each of the first two prompts is the user's input):
Enter a DNA sequence: CGATTGAACATGTAAGTCCATT
Enter the pattern: TGAA
Mutated DNA sequence: CGATTGAATGTACAAGTCCATT
For the sake of this assignment, you may assume that the user enters valid input, defined as follows.
Your program is responsible for performing only the single mutation that occurs between the first occurrence of the pattern and the next subsequent occurrence of the reversed pattern (one that is completely disjoint from the forward pattern).
However, you need not concern yourself with how your program behaves if given errant input that does not meet these formal specifications. (We will need to learn about conditional statements in Ch. 4 in order to write a program that gracefully handles such situations.)
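One way the specification above might be sketched is with the index method of the str class, using its optional starting position to skip past the forward pattern. This is only a sketch under the assumption of valid input, not a definitive solution. The function name mutate and its parameter names are hypothetical.

```python
def mutate(dna, pattern):
    """Sketch: reverse the strand delimited by pattern and its reversal."""
    inverted = pattern[::-1]

    # Leftmost occurrence of the forward pattern.
    start = dna.index(pattern)

    # Next occurrence of the reversed pattern, searching only beyond the
    # end of the forward pattern so that the two are disjoint.
    stop = dna.index(inverted, start + len(pattern)) + len(inverted)

    # Flip everything from the start of the pattern to the end of its reversal.
    return dna[:start] + dna[start:stop][::-1] + dna[stop:]


print(mutate("CGATTGAACATGTAAGTCCATT", "TGAA"))
```

On errant input, index raises a ValueError. That is acceptable here only because the assignment does not require graceful handling of such input.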
A very important aspect of software development is thorough testing. We gave one example of a run of this program in the introduction, but just because your code works correctly on that test does not mean that it works correctly in general. It is important that you consider the variety of allowable inputs that your program might encounter, and that you express your logic in a general enough way that it works for all such inputs.
For example, here are a few more well-formed test cases to consider:
original DNA | forward pattern | mutated DNA | comments |
---|---|---|---|
CGATTGAACATGTAAGTCCATT | TGAA | CGATTGAATGTACAAGTCCATT | first example |
CGATTGAAGTTGTAAGTCCATT | TGAA | CGATTGAATGTTGAAGTCCATT | reverse pattern must be disjoint from forward pattern |
ATTGCACGTTACCTGCAT | CGT | ATTGCACGTCCATTGCAT | reverse pattern only relevant when after forward pattern |
CATGTTACATGTTA | AT | CATTGTACATGTTA | only perform one mutation |
Certainly if your program does not work correctly on those cases, you should reconsider your implementation. But even if you are correct on those cases as well, there may still be other combinations of factors that might cause a problem for some implementations. For this reason, in addition to having each team submit their Python source code, we are asking that each team also develop a series of additional test cases that you think might cause trouble for some implementations. We will then test each submitted program on each submitted test case and give credit in the grading standards both for how well your implementation does when faced with other students' tests, and how well your tests do in exposing flaws in other students' implementations.
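To illustrate how a test case can expose a flaw, the sketch below runs the four cases from the table above against a deliberately flawed implementation. Both failing_cases and overlapping_mutate are hypothetical names invented for this illustration. The flaw (searching for the reversed pattern too early) is just one assumed mistake among many possible ones.

```python
# Test cases copied from the table above:
# (original DNA, forward pattern, expected mutated DNA, comment).
CASES = [
    ("CGATTGAACATGTAAGTCCATT", "TGAA", "CGATTGAATGTACAAGTCCATT",
     "first example"),
    ("CGATTGAAGTTGTAAGTCCATT", "TGAA", "CGATTGAATGTTGAAGTCCATT",
     "reverse pattern must be disjoint from forward pattern"),
    ("ATTGCACGTTACCTGCAT", "CGT", "ATTGCACGTCCATTGCAT",
     "reverse pattern only relevant when after forward pattern"),
    ("CATGTTACATGTTA", "AT", "CATTGTACATGTTA",
     "only perform one mutation"),
]


def failing_cases(mutate, cases):
    """Return the comments of the cases that mutate gets wrong."""
    failures = []
    for original, pattern, expected, comment in cases:
        if mutate(original, pattern) != expected:
            failures.append(comment)
    return failures


def overlapping_mutate(dna, pattern):
    # Deliberately flawed: the search for the reversed pattern begins one
    # character after the forward pattern starts, so the two may overlap.
    start = dna.index(pattern)
    stop = dna.index(pattern[::-1], start + 1) + len(pattern)
    return dna[:start] + dna[start:stop][::-1] + dna[stop:]


print(failing_cases(overlapping_mutate, CASES))
```

This flawed version passes three of the four cases and is caught only by the one that targets overlapping patterns, which is why a varied set of tests matters.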
To allow us to more easily execute such a competition, it is imperative that you provide your test cases strictly adhering to the following specifications. You are to submit your tests in a separate file named tests.txt, which must be saved in plain text format. (We suggest you simply use the IDLE text editor to create this as well.) The contents of the file should be as follows:
This task can be accomplished with careful use of methods of the str class, and possibly the list class. There tend to be two different schools of thought for how to accomplish the underlying string manipulations:
Using either strategy, there is a low-level task of needing to reverse a string. This arises both to reverse the original marker to get its inverted form, and later to reverse the substring between the marker pair. In an ideal world, the str class would support a reverse() method - but alas, no such method exists. However, a slice with a negative step can be used to produce a reversal. See the "For the Guru" box on page 55.
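As a small sketch of the reversal idiom mentioned above, a slice with step -1 walks the string from its last character to its first. The variable names are illustrative only.

```python
marker = "TGAA"

# A slice with no start or stop and a step of -1 yields the reversed string.
inverted = marker[::-1]
print(inverted)            # AAGT

# Strings are immutable, so the original marker is unchanged.
print(marker)              # TGAA

# An equivalent, more verbose alternative using the built-in reversed().
also_inverted = "".join(reversed(marker))
print(also_inverted)       # AAGT
```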
All of the following files should be submitted electronically:
dna.py
This file should contain the complete source code for your
implementation.
tests.txt
This plain text file should contain the ten test cases
which you wish to apply to other students' submitted
implementations, as described in the above section
on testing.
readme.txt
This file should include the
standard information
requested for all projects. In addition, if you worked as a
pair, please make sure that both partners are clearly identified
and briefly describe the contributions of each person in the effort.
Please see the details regarding the submission process on the general programming web page, as well as the discussion of the late policy.
The assignment is worth 40 points, which will be assessed as follows:
In the advice section we outline two different approaches for cutting apart the string: one based on use of index and slicing, and the other based on the split method. For extra credit, give an alternative implementation that uses the opposite approach from the one you used for your official submission.
So as not to risk losing points on the required part of the assignment due to a failed extra credit attempt, please submit an original version of your assignment in a file dna.py and the separate extra credit version in a file dnaExtra.py.
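For comparison with an index-based solution, here is one possible reading of the split-based approach, again assuming valid input. The function name mutate_with_split is hypothetical, and the use of the optional second argument of split (a maximum number of cuts) is a choice made for this sketch rather than something the assignment prescribes.

```python
def mutate_with_split(dna, pattern):
    """Sketch: perform the single mutation using str.split."""
    inverted = pattern[::-1]

    # Cut once, at the leftmost occurrence of the forward pattern.
    before, rest = dna.split(pattern, 1)

    # Cut once more, at the first reversed pattern in what follows.
    # Since rest begins after the forward pattern, the two are disjoint.
    middle, after = rest.split(inverted, 1)

    # Reversing pattern + middle + inverted leaves the two markers looking
    # the same, so only the middle needs to be reversed explicitly.
    return before + pattern + middle[::-1] + inverted + after


print(mutate_with_split("CGATTGAAGTTGTAAGTCCATT", "TGAA"))
```

Limiting each split to a single cut is what keeps later occurrences of either pattern from triggering additional mutations.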