Saint Louis University
Computer Science 1020
Computer Science Department
For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.
Please make sure you adhere to the policies on academic integrity in this regard.
While it would be great if biological sequences always matched precisely what we expected, there are many contexts in which we need to be willing to consider an approximate match for a motif pattern. For this assignment, we will have you develop an alternative version of the typical "find" behavior for Python strings. Instead of insisting on a perfect match, it will allow a match that has up to k replacements (known in genetics as Single Nucleotide Polymorphisms, or SNPs).
As an example, consider searching for the motif CAT in the strand GTACGTACATT. If we look for the leftmost occurrence of the exact pattern, we locate it beginning at index 7.

    01234567890
    GTACGTACATT
           CAT

However, if we were willing to allow up to 1 SNP, we would find the first approximate match starting at index 3.

    01234567890
    GTACGTACATT
       CAT

And if we were to allow up to 2 SNPs, then the first approximate match would start at index 1.

    01234567890
    GTACGTACATT
     CAT
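The alignments in this example can be double-checked by counting SNPs at every offset where the pattern fits inside the strand. The following minimal Python sketch does exactly that; the helper name count_snps is hypothetical and is not part of the provided starter code.

```python
def count_snps(strand, pattern, start):
    """Count the positions at which pattern differs from the slice of
    strand that begins at index start.  (Hypothetical helper; it assumes
    the whole pattern fits inside strand from that index.)"""
    mismatches = 0
    for offset in range(len(pattern)):
        if strand[start + offset] != pattern[offset]:
            mismatches += 1
    return mismatches


strand = "GTACGTACATT"
pattern = "CAT"

# Python's built-in exact search corresponds to the k = 0 case.
print(strand.find(pattern))  # 7

# Show the SNP count for every start index at which the pattern fits.
for start in range(len(strand) - len(pattern) + 1):
    window = strand[start:start + len(pattern)]
    print(start, window, count_snps(strand, pattern, start))
```

Reading down the last column (3, 2, 3, 1, 3, 2, 3, 0, 2), the first count of at most 0 appears at index 7, the first count of at most 1 at index 3, and the first count of at most 2 at index 1, which agrees with the three alignments.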
Your task is to implement a function
However, when writing code for more complex logic, it is often helpful to have a modular design in which you define additional functions that encapsulate certain subtasks. For this program, we wish to have you define two different functions.
The first function,
With that function completed, you should then have an easier time
implementing the desired
It will be important to test your code
on a variety of scenarios that could arise. For example, here are a
handful of cases to consider:
strand | pattern | k | result | comment |
---|---|---|---|---|
GTACGTACATT | CAT | 0 | 7 | |
GTACGTACATT | CAT | 1 | 3 | |
GTACGTACATT | CAT | 2 | 1 | |
GTCATTACAGT | CAT | 2 | 2 | okay to have fewer than k SNPs |
TACGTAAATT | CTAG | 2 | 3 | |
TACGTAAATT | CTAG | 1 | -1 | no sufficiently close match |
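As a point of comparison, here is a minimal sketch of the two-function design described earlier, run against the cases in this table. The parameter lists shown for isApproxMatch and findApprox are assumptions made for illustration; the starter file findApprox.py may define them differently, and its definitions take precedence.

```python
def isApproxMatch(strand, pattern, start, k):
    """Return True if pattern matches strand beginning at index start with
    at most k mismatched characters (SNPs).  The signature is an assumption."""
    if start < 0 or start + len(pattern) > len(strand):
        return False  # the pattern would run off the end of the strand
    mismatches = 0
    for offset in range(len(pattern)):
        if strand[start + offset] != pattern[offset]:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the SNP budget is exceeded
    return True


def findApprox(strand, pattern, k):
    """Return the leftmost index at which pattern approximately matches
    strand with at most k SNPs, or -1 if there is no such index."""
    for start in range(len(strand) - len(pattern) + 1):
        if isApproxMatch(strand, pattern, start, k):
            return start
    return -1


# The six cases from the table: (strand, pattern, k, expected result).
cases = [
    ("GTACGTACATT", "CAT", 0, 7),
    ("GTACGTACATT", "CAT", 1, 3),
    ("GTACGTACATT", "CAT", 2, 1),
    ("GTCATTACAGT", "CAT", 2, 2),
    ("TACGTAAATT", "CTAG", 2, 3),
    ("TACGTAAATT", "CTAG", 1, -1),
]
for strand, pattern, k, expected in cases:
    result = findApprox(strand, pattern, k)
    status = "ok" if result == expected else "MISMATCH"
    print(strand, pattern, k, result, status)
```

Stopping as soon as the mismatch count exceeds k avoids scanning the rest of a window that has already failed, and the bounds check keeps the helper from indexing past the end of the strand when called with a start index that is too large.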
As with the previous homework, we are providing a mechanism for you to automate a series of tests by specifying those tests in a simple text file, with lines of the form
    strand pattern k

For example, a test file that encodes the six tests given above is provided as tests.txt.
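Reading a test file in this format takes only a few lines of Python. The sketch below is a hypothetical reader, read_tests, and not the course's provided test driver; it assumes whitespace-separated fields and skips blank lines.

```python
import io


def read_tests(source):
    """Parse lines of the form 'strand pattern k' into (strand, pattern, k) tuples.

    Hypothetical reader: it assumes whitespace-separated fields and skips
    blank lines; a line with the wrong number of fields raises ValueError.
    """
    tests = []
    for line in source:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        strand, pattern, k = fields  # ValueError unless exactly three fields
        tests.append((strand, pattern, int(k)))
    return tests


# An in-memory stand-in for a test file; an open file object works the same way.
sample = io.StringIO("GTACGTACATT CAT 0\nTACGTAAATT CTAG 1\n")
print(read_tests(sample))
# [('GTACGTACATT', 'CAT', 0), ('TACGTAAATT', 'CTAG', 1)]
```

With a real file, passing the object returned by open("tests.txt") to read_tests yields the same kind of list, one tuple per test.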
As part of this homework assignment, you must submit not only your Python implementation but also your own set of up to ten tests. We will run each submitted program on each submitted test case, and the grading will give credit both for how well your implementation performs when faced with other students' tests and for how well your tests do at exposing flaws in other students' implementations.
To get you going, we are providing three files, which you may download either individually or combined as a single zip file to be unpacked. The three files are
findApprox.py
This is the file in which you must place your code.
We have begun to define both the isApproxMatch function
and the findApprox function.
tests.txt
This is a model of the file format for providing test cases, and
it includes the original six test cases described in this
project description. You are eventually to edit this file and
provide ten of your own tests, with the goal of coming up with
the most devious tests, those most likely to trip up other
students' implementations.
If you are already working on our department's computer system and prefer to copy these files using command-line techniques, you may execute the following command from whatever working directory you would like them placed in:
cp -Rp /public/goldwasser/1020/homeworks/findApprox .
All of the following files should be submitted electronically:
findApprox.py
This file should contain the complete source code for your
implementation.
tests.txt
This plain text file should contain the ten test cases
which you wish to apply to other students' submitted
implementations, as described in the above section
on testing.
readme.txt
With all of our programming projects, we ask that you submit one
additional text file, named readme.txt, that allows you
to briefly discuss any successes or challenges you faced while
working on the project.
In addition, if you worked as a
pair, please make sure that both partners are clearly identified
and briefly describe the contributions of each person in the
effort.
To submit, please follow the instructions on our submit system, using the website password that you indicated when completing the course questionnaire. Please also note the late policy for homeworks.
The assignment is worth 40 points, which will be assessed as follows: