Saint Louis University | Computer Science 150 | Dept. of Math & Computer Science
One form of DNA mutation occurs when a substring of the DNA is reversed during the replication process. Most often, such a reversal occurs between what are termed inverted pairs.
As an example, consider an original DNA strand represented by the following sequence of nucleotides, AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT. In this example, there exists an inverted pair based upon the marker TCAT. That is, there is a substring which begins with TCAT and ends with the reversed form of that marker, namely TACT. Because of the similarity between the two halves of the inverted pair, it occasionally happens that the intermediate substring (here, GATTGAGTCTAGCGGA) is itself reversed. In the example above, such a reversal mutates the original DNA strand, AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT, into the resulting strand AAGTACCGTCATAGGCGATCTGAGTTAGTACTGAACGTAT.
Our goal in this assignment is to simulate one such DNA reversal.
Each assignment in this course will aim to introduce several new techniques, as well as to reinforce techniques introduced in earlier assignments.
With this being our first assignment, everything is new. Specific techniques to be used are:
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
When your program begins, the user is expected to identify both the marker that makes up the left part of the inverted pair and the full DNA strand. For this assignment, you are to assume that the user enters these two pieces of information on a single line, separated by a comma. Thus for the earlier example, the user would input:
TCAT,AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT
To read this initial input from the user, there is a Python function raw_input() which will return the user's input as a string object. You may assume that the specified marker indeed occurs in the given DNA, first in the original order, and later in the inverted order.
We ask that your program accomplish two separate tasks. The first is to print the overall DNA strand that results from performing the reversal described above.
We also ask that you perform one further computation on the resulting DNA strand. The "GC-content" of a DNA strand is the percentage of the characters which are either G or C (as opposed to A or T). Notice that the reversal does not alter the GC-content. For the above example, the GC-content is 42.5%, as there are seventeen occurrences of G or C out of the 40 overall characters. Your program should conclude by informing the user of the GC-content of the given DNA.
In some sense, this would be an easier assignment later in the course when we have more tools available in our repertoire. Yet we already know enough to accomplish this task if we carefully use the methods of both the str class and the list class.
Though the two relevant pieces of input are originally entered by the user as a single line, it is not too difficult to break that line apart into the two relevant pieces (the marker and the original DNA). This can be accomplished either by using the split method of the str class, or by locating the comma using the index method and then slicing out a portion of the string based on the desired starting and ending indices.
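Both ways of separating the input can be sketched as follows. This is only an illustration under our own assumptions: the variable names are hypothetical, and the line is hard-coded here where the actual program would obtain it from raw_input().

```python
# Hypothetical stand-in for the string that raw_input() would return.
line = 'TCAT,AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT'

# Option 1: split on the comma, which yields a list of two strings.
pieces = line.split(',')
marker = pieces[0]
dna = pieces[1]

# Option 2: locate the comma with index, then slice on either side of it.
comma = line.index(',')
marker2 = line[:comma]        # everything before the comma
dna2 = line[comma + 1:]       # everything after the comma

print(marker)   # prints TCAT
```

Either option leaves the marker and the strand in separate string variables; which one to use is largely a matter of taste.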
Another challenge is the need to reverse strings. You will need to reverse the original marker to get its inverted form and later to reverse the substring between the marker pair. In an ideal world, the str class would support a reverse() method - but alas, no such method exists. Yet we can accomplish such a reversal as follows. We can take a string and turn it into a list of individual characters by sending the string as a parameter to the list constructor. The list class does support a reverse() method, so we can reverse the list of characters, and then join them back together to get the reversed string.
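The three steps just described (convert to a list, reverse, join) might look like the sketch below. The variable names are our own choice. Note that reverse() changes the list in place and returns None, so its result should not be assigned to a variable.

```python
marker = 'TCAT'

chars = list(marker)        # ['T', 'C', 'A', 'T']
chars.reverse()             # reverses the list in place; returns None
inverted = ''.join(chars)   # join the characters with an empty separator

print(inverted)   # prints TACT
```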
The other big challenge is to properly identify the portion of the DNA between the marker pair. Again, there is more than one way to accomplish this, but the most likely approach is to use the index method of a string to find the locations of the markers and then to properly identify the strand between them. Once that is done, it should be possible to calculate the reversal of that intermediate strand, and to piece together the various portions of the DNA for the resulting output.
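One possible way to carry out this step is sketched below, using the example strand from the introduction. The variable names are hypothetical, and this is not the only valid arrangement. The sketch relies on the optional second argument of index, which tells the search where to begin, so that the inverted marker is only looked for after the end of the left marker.

```python
marker = 'TCAT'
inverted = 'TACT'   # the marker reversed
dna = 'AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT'

# Position just past the end of the left marker.
start = dna.index(marker) + len(marker)

# Position where the inverted marker begins, searching from start onward.
stop = dna.index(inverted, start)

# Reverse the intermediate strand via a list of characters.
chars = list(dna[start:stop])
chars.reverse()
reversed_middle = ''.join(chars)

# Reassemble: left part and marker, reversed middle, inverted marker and rest.
mutated = dna[:start] + reversed_middle + dna[stop:]

print(mutated)   # prints AAGTACCGTCATAGGCGATCTGAGTTAGTACTGAACGTAT
```

Because slices include the starting index and exclude the ending index, dna[:start] ends with the left marker and dna[stop:] begins with the inverted marker, so both markers stay in place.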
The calculation of the GC-content should be more straightforward, given the use of the count method of the str class. The biggest hurdle here may be in getting the percentage to appear nicely as output.
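As a rough sketch of this calculation, again with our own variable names and an output format that is only an assumption (the assignment does not prescribe one):

```python
dna = 'AAGTACCGTCATAGGCGATCTGAGTTAGTACTGAACGTAT'

# Count the two relevant characters separately and add the results.
gc = dna.count('G') + dna.count('C')

# Multiply by 100.0 (a float) so the division is not truncated
# to a whole number when dividing one integer by another.
percent = 100.0 * gc / len(dna)

# '%.1f' shows one digit after the decimal point; '%%' is a literal percent sign.
print('GC-content: %.1f%%' % percent)   # prints GC-content: 42.5%
```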
You should create a new file, dna.py, which contains all of your own code. This file must be submitted electronically.
You should also submit a separate 'readme' text file. If you worked as a pair, please make this clear and briefly describe the contributions of each person in the effort.
Please see the details regarding the submission process on the general programming web page, as well as the discussion of the late policy.
The assignment is worth 10 points.
We have suggested that the most natural solution to this assignment is based upon use of the index (or equivalently, the find) method of the str class.
Yet even without use of the index or find methods, it is still possible to accomplish the desired task based only upon the use of the other str and list methods we have discussed. For extra credit, give such a solution to this assignment.
So as not to risk losing points on the required part of the assignment due to a failed extra credit attempt, please submit an original version of your assignment in a file dna.py and the separate extra credit version in a file dnaExtra.py.
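As a hint toward the extra credit, one possible direction (by no means the only one) is to let split do the locating. The sketch below assumes the optional second argument of split, which limits the number of splits performed; the variable names are hypothetical.

```python
marker = 'TCAT'
inverted = 'TACT'   # the marker reversed
dna = 'AAGTACCGTCATGATTGAGTCTAGCGGATACTGAACGTAT'

# Split once at the first occurrence of the marker.
pieces = dna.split(marker, 1)
before = pieces[0]
rest = pieces[1]

# Split the remainder once at the first occurrence of the inverted marker.
pieces = rest.split(inverted, 1)
middle = pieces[0]
after = pieces[1]

# Reverse the middle portion via a list of characters.
chars = list(middle)
chars.reverse()

# split discards the separators, so the markers must be put back explicitly.
mutated = before + marker + ''.join(chars) + inverted + after

print(mutated)   # prints AAGTACCGTCATAGGCGATCTGAGTTAGTACTGAACGTAT
```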