Saint Louis University
Computer Science 1020
Computer Science Department
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
In this project, we explore inverted repeats, which occur when a particular motif in a DNA strand is later followed by its reverse complement. In three dimensions, such a single strand might recombine into a loop, with the motif binding to its inverted repeat, forming what is known as a transposable element; such elements are often a source of mutations.
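As an example, consider the single strand ATCCGAATACGGTTCGGGTA with the motif CGAA. The reverse complement of CGAA is TTCG, which appears later in the strand, so the segment CGAATACGGTTCG can fold into a loop in which the motif binds with its inverted repeat, leaving ...ATC and GGTA... on either side of the loop. If that loop were to be disconnected for some reason, the resulting mutation would leave the remaining strand ATCGGTA.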
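To make the example concrete, here is a small Python sketch that computes the reverse complement of CGAA and extracts the loop from the example strand. The reverse_complement below is our own stand-in built on str.maketrans and str.translate; it is an assumption, not necessarily the version provided in mutate.py.

```python
# Stand-in reverse complement (an assumption; the course-provided
# version in mutate.py may be written differently).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Complement each base, then reverse the order of the bases."""
    return strand.translate(COMPLEMENT)[::-1]

dna = "ATCCGAATACGGTTCGGGTA"
motif = "CGAA"
repeat = reverse_complement(motif)              # "TTCG"

start = dna.find(motif)                         # 3
stop = dna.find(repeat, start + len(motif))     # 12
loop = dna[start:stop + len(repeat)]            # "CGAATACGGTTCG"
remaining = dna[:start] + dna[stop + len(repeat):]

print(repeat, loop, remaining)  # TTCG CGAATACGGTTCG ATCGGTA
```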
Your task is to author a function, mutate(dna, motif), that behaves as follows:
1. Look for the leftmost occurrence of the original motif. If none is found, return the original dna unmodified.
2. Starting after the end of the original motif, look for the next occurrence of the reverse complement of that motif (the so-called inverted repeat). If no such occurrence is found, return the original dna unmodified.
3. Compute and return the mutated DNA that results from removing the motif, its inverted repeat, and the zero or more characters between them.
Your function is responsible only for the single mutation between the first occurrence of the motif and the next subsequent occurrence of its inverted repeat (one that is completely disjoint from the forward motif).
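The disjointness requirement matters most when a motif is its own reverse complement. As a minimal sketch (the strands GATCAT and GATC are our own illustrative inputs, not from the assignment), passing the end of the motif as the start argument of str.find ensures the inverted repeat cannot overlap the motif:

```python
# "AT" complements to "TA", which reversed is "AT" again,
# so this motif is its own reverse complement.
motif = "AT"

results = {}
for dna in ["GATCAT", "GATC"]:
    start = dna.find(motif)
    # Searching from `start` just rediscovers the motif itself.
    naive = dna.find(motif, start)
    # Searching from the end of the motif only finds a disjoint repeat.
    disjoint = dna.find(motif, start + len(motif))
    results[dna] = (start, naive, disjoint)
    print(dna, start, naive, disjoint)
# GATCAT 1 1 4
# GATC 1 1 -1
```

For GATCAT, the mutation removes indices 1 through 5 and leaves G; for GATC there is no disjoint inverted repeat, so the strand is returned unmodified.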
A very important aspect of software development is thorough testing. We gave one example in the introduction, but just because your code works correctly on that test does not mean that it works correctly in general. It is important that you consider the variety of allowable inputs that your program might encounter, and that you express your logic generally enough that it works for all such inputs.
For example, here are a few more well-formed test cases to consider:
original DNA | motif | resulting DNA | comments |
---|---|---|---|
ATCCGAATACGGTTCGGGTA | CGAA | ATCGGTA | first example |
ATCCGAATACGGTTGGGTA | CGAA | ATCCGAATACGGTTGGGTA | no inverted repeat exists |
ATCCAATACGGTTCGGGTA | CGAA | ATCCAATACGGTTCGGGTA | original motif not found |
ATCCGAATACGGTTCGGGTTCGA | CGAA | ATCGGTTCGA | if multiple inverted repeats exist, match with the next subsequent one |
AGTCACATGATCAGT | C | AGTATCAGT | The motif can be any non-empty string |
Certainly if your program does not work correctly on these cases, you should reconsider your implementation. But even if your program is correct on those cases, there may still be other combinations of factors that cause a problem for some implementations. For this reason, in addition to having each team submit their Python source code, we are asking each team to develop a series of additional test cases that you think might cause trouble for some implementations. We will then test each submitted program on each submitted test case, and the grading standards give credit both for how well your implementation does when faced with other students' tests, and for how well your tests do in exposing flaws in other students' implementations.
To allow us to more easily run such a competition, it is imperative that you provide your test cases strictly adhering to the following specifications. You are to submit your tests in a separate file named tests.txt, which must be saved in plain text format. (We suggest you simply use the IDLE text editor to create this as well.) The contents of the file should be as follows:
To get you going, we are providing three files, which you may either download individually or as a single zip file to be unpacked. The three files are:
mutate.py
This is the file in which you must place your code. For
convenience, we have already provided an implementation of the
reverse_complement function that was originally given
in lecture notes.
We have begun to define the required mutate(dna, motif)
function, but the body of that function is your responsibility.
runtests.py
This is a file we are providing to make it easy to run multiple
test cases and echo the results. This is the one you should
"Run" in IDLE to execute the program. (But you need not concern
yourself with understanding the source code. I'll explain if you
want, but I'd prefer not to!)
tests.txt
This is a model of the file format for providing test cases, and
it includes the original five test cases described in this
project description. You are eventually to edit this file and
provide ten of your own tests, with the goal of coming up with
the most devious tests that are likely to trip up other
students' implementations.
If you are already working on our department's computer system and prefer to copy these files using command-line techniques, you may execute the following command from whatever working directory you would like them placed in:
cp -Rp /public/goldwasser/1020/homeworks/mutate .
This task can be accomplished with careful use of methods of the str class (and possibly the list class). There tend to be two different schools of thought for how to accomplish the underlying string manipulations:
One approach is to make use of the find method of a string to locate the motif and inverted repeat, and then to properly splice out the mutation using appropriate indices to produce the result. Conditionals will be needed to recognize when the motif or its inverted repeat is not found, and to respond accordingly.
Another approach is to make extensive use of the split method of the string class, using the motif and later the inverted repeat as the pattern upon which you split. The resulting pieces can then be manipulated before reassembling the resulting DNA strand.
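A minimal sketch of the find-based approach, following the three steps listed for mutate(dna, motif). The reverse_complement here is again our own stand-in for the one provided in mutate.py (an assumption), and, as in the assignment, the motif is assumed to be non-empty. The loop at the end simply checks the five cases from the table of tests.

```python
# Stand-in for the reverse_complement provided in mutate.py (an assumption;
# the course version may be written differently but should agree on results).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Complement each base, then reverse the order of the bases."""
    return strand.translate(COMPLEMENT)[::-1]

def mutate(dna, motif):
    """Remove the leftmost motif, its next disjoint inverted repeat,
    and everything between them; otherwise return dna unchanged."""
    start = dna.find(motif)                   # step 1: leftmost motif
    if start == -1:
        return dna
    inverted = reverse_complement(motif)
    # Step 2: begin searching just past the motif so the two never overlap.
    stop = dna.find(inverted, start + len(motif))
    if stop == -1:
        return dna
    # Step 3: keep what precedes the motif and what follows the repeat.
    return dna[:start] + dna[stop + len(inverted):]

# The five cases from the table of tests: (original DNA, motif, expected).
cases = [
    ("ATCCGAATACGGTTCGGGTA", "CGAA", "ATCGGTA"),
    ("ATCCGAATACGGTTGGGTA", "CGAA", "ATCCGAATACGGTTGGGTA"),
    ("ATCCAATACGGTTCGGGTA", "CGAA", "ATCCAATACGGTTCGGGTA"),
    ("ATCCGAATACGGTTCGGGTTCGA", "CGAA", "ATCGGTTCGA"),
    ("AGTCACATGATCAGT", "C", "AGTATCAGT"),
]
for dna, motif, expected in cases:
    assert mutate(dna, motif) == expected, (dna, motif)
print("all table cases pass")
```

Slicing around the two indices, rather than calling str.replace, ensures that only this one region is removed even if the motif or its inverted repeat occurs elsewhere in the strand.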
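For comparison, a sketch of the split-based approach under the same assumptions (stand-in reverse_complement, non-empty motif). Passing a maxsplit of 1 to str.split cuts only at the leftmost occurrence, and searching for the inverted repeat only in the text after the motif keeps the two disjoint. The name mutate_split is hypothetical, chosen so this version can sit beside the find-based one.

```python
# Stand-in reverse complement (an assumption, as above).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Complement each base, then reverse the order of the bases."""
    return strand.translate(COMPLEMENT)[::-1]

def mutate_split(dna, motif):
    """Split-based variant (hypothetical name for this illustration)."""
    # Split only at the leftmost occurrence of the motif.
    pieces = dna.split(motif, 1)
    if len(pieces) == 1:          # motif not found
        return dna
    before, after = pieces
    # `after` begins just past the motif, so any match here is disjoint.
    rest = after.split(reverse_complement(motif), 1)
    if len(rest) == 1:            # no inverted repeat after the motif
        return dna
    # Drop the motif, the text between, and the repeat itself.
    return before + rest[1]

print(mutate_split("ATCCGAATACGGTTCGGGTTCGA", "CGAA"))  # ATCGGTTCGA
```

Without the maxsplit argument, a motif or repeat that occurs more than once would produce extra pieces, and the reassembly would silently drop or misplace text.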
We suggest using the first of these approaches, and offer extra credit to anyone willing to explore by implementing both approaches.
All of the following files should be submitted electronically:
mutate.py
This file should contain the complete source code for your
implementation.
tests.txt
This plain text file should contain the ten test cases
which you wish to apply to other students' submitted
implementations, as described in the above section
on testing.
readme.txt
With all of our programming projects, we ask that you submit one
additional text file, named readme.txt that allows you
to briefly discuss any successes or challenges you faced while
working on the project.
In addition, if you worked as a
pair, please make sure that both partners are clearly identified
and briefly describe the contributions of each person in the
effort.
Please also note the late policy for homeworks.
The assignment is worth 40 points, which will be assessed as follows:
In the advice section we outline two different approaches for cutting apart the string: one based on indices and slicing, and the other based on the split method. For extra credit, give an alternative implementation that uses the opposite approach from the one you used for your official submission.
So as not to risk losing points on the required part of the assignment due to a failed extra credit attempt, please submit an original version of your assignment in a file mutate.py and the separate extra credit version in a file mutateExtra.py.