Computer Science 1300
Introduction to Object-Oriented Programming

Programming Assignment 02

DNA Reversal

Due: 11:59pm, Monday, 6 February 2017

Collaboration Policy

For this assignment, you are allowed to work with one other student if you wish (in fact, we suggest that you do so). If any student wishes to have a partner but has not been able to locate one, please let the instructor know so that we can match up partners.

Please make sure you adhere to the policies on academic integrity in this regard.

Overview

DNA can be modeled as a string of characters using the alphabet A, C, G, and T. One form of DNA mutation occurs when a substring of the DNA is reversed during the replication process. Usually, such a reversal occurs between what are termed inverted pairs, where some substring is followed later by its reversal.

As an example, consider the original DNA strand:

CGATTGAACATGTAAGTCCAATT

This example happens to have an inverted pair, with the original marker being TGAA and its subsequent reversed pair being AAGT, as marked with brackets below.

CGAT[TGAA]CATGT[AAGT]CCAATT

It is possible that the entire slice of DNA delimited by those patterns could be inverted and reattached, since the bonds at each end will be locally the same. In that case, the resulting mutated DNA would appear as

CGATTGAATGTACAAGTCCAATT

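In effect, the 13-character slice TGAACATGTAAGT has been flipped. Because the two markers are mirror images of each other, the ends of the slice look unchanged, and only the middle 5 characters differ: CATGT in the original strand becomes TGTAC in the mutated strand.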
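
The arithmetic of this example can be checked with a few lines of Python. This is only an illustration: the slice boundaries 4 and 17 were counted by hand for this particular strand, the variable names are our own, and the [::-1] slice is just one way to reverse a string.

```python
# Worked example from the overview, with indices counted by hand.
original = 'CGATTGAACATGTAAGTCCAATT'

before = original[:4]       # 'CGAT', everything left of the marker TGAA
segment = original[4:17]    # 'TGAACATGTAAGT', the 13-character slice
after = original[17:]       # 'CCAATT', everything right of AAGT

# A slice with step -1 walks the string backward, producing its reversal.
mutated = before + segment[::-1] + after
print(mutated)              # CGATTGAATGTACAAGTCCAATT
```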

You are to design a program that works as follows. It should ask the user for an original DNA string as well as the particular pattern that is inverted. It should then locate the leftmost occurrence of that pattern, and the next subsequent occurrence of the inverted pattern. The output should be the mutated DNA, with the segment including the inverted pair reversed. An example session might proceed as follows (the user types the text that follows each of the first two prompts):

Enter a DNA sequence: CGATTGAACATGTAAGTCCATT
Enter the pattern: TGAA
Mutated DNA sequence: CGATTGAATGTACAAGTCCATT

Formal Specifications

For the sake of this assignment, you may assume that the user enters valid input, defined as follows.

The designated pattern will appear at least once within the original DNA sequence.
The original DNA strand will contain at least one occurrence of the reversed pattern that occurs completely after the first occurrence of the forward pattern. (Although there may be additional occurrences of the reversed pattern elsewhere.)

Your program is responsible for performing only the single mutation that occurs between the first occurrence of the pattern, and the next subsequent occurrence of the reversed pattern (that which is completely disjoint from the forward pattern).
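
The disjointness requirement can be illustrated with the optional start argument of the index method of the str class. The sketch below uses one of the test cases listed later in this section; the variable names are our own, and the reversed pattern is typed in by hand here to keep the focus on the search.

```python
dna = 'CGATTGAAGTTGTAAGTCCATT'
pattern = 'TGAA'
reverse_pattern = 'AAGT'          # the pattern written backward

# Leftmost occurrence of the forward pattern.
start = dna.index(pattern)        # 4

# A plain search finds an occurrence that overlaps the forward pattern.
overlapping = dna.index(reverse_pattern)   # 6

# Starting the search just past the forward pattern skips that overlap.
rev_start = dna.index(reverse_pattern, start + len(pattern))   # 13
print(start)
print(overlapping)
print(rev_start)
```

In this strand the forward pattern occupies indices 4 through 7, so the occurrence of AAGT at index 6 shares two characters with it and must be ignored.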

You should make sure to test your program on a variety of inputs, not just the one example given above. However, you need not concern yourself with how your program behaves if given errant input that does not meet these formal specifications. (We will need to learn about conditional statements in Ch. 4 in order to write a program that gracefully handles such situations.)

To aid you in your testing, here are some other well-formed test cases that your program should properly handle (though these are not the only tests to consider):

original DNA            forward pattern  mutated DNA             comments
CGATTGAACATGTAAGTCCATT  TGAA             CGATTGAATGTACAAGTCCATT  first example
CGATTGAAGTTGTAAGTCCATT  TGAA             CGATTGAATGTTGAAGTCCATT  reverse pattern must be disjoint from forward pattern
ATTGCACGTTACCTGCAT      CGT              ATTGCACGTCCATTGCAT      reverse pattern only relevant when after forward pattern
CATGTTACATGTTA          AT               CATTGTACATGTTA          only perform one mutation
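
One way to run all four rows at once is sketched below. It assumes the mutation logic is wrapped in a function, here given the hypothetical name mutate, and it relies on function definitions, loops, and an if statement, which go beyond what the course has covered so far. The body of mutate is one possible index-and-slice version, not a required design.

```python
def mutate(dna, pattern):
    """Hypothetical helper: reverse the slice running from the first
    occurrence of pattern through the next disjoint occurrence of its
    reversal."""
    reverse_pattern = pattern[::-1]
    start = dna.index(pattern)
    # Search for the reversed pattern only past the end of the forward one.
    stop = dna.index(reverse_pattern, start + len(pattern)) + len(pattern)
    return dna[:start] + dna[start:stop][::-1] + dna[stop:]

# (original DNA, forward pattern, expected mutated DNA) from the table.
cases = [
    ('CGATTGAACATGTAAGTCCATT', 'TGAA', 'CGATTGAATGTACAAGTCCATT'),
    ('CGATTGAAGTTGTAAGTCCATT', 'TGAA', 'CGATTGAATGTTGAAGTCCATT'),
    ('ATTGCACGTTACCTGCAT', 'CGT', 'ATTGCACGTCCATTGCAT'),
    ('CATGTTACATGTTA', 'AT', 'CATTGTACATGTTA'),
]

failures = []
for dna, pattern, expected in cases:
    actual = mutate(dna, pattern)
    if actual != expected:
        failures.append((dna, pattern, actual))

print('failing cases: ' + str(len(failures)))
```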

Advice

This task can be accomplished with careful use of methods of the str class, and possibly the list class. There tend to be two different schools of thought for how to accomplish the underlying string manipulations:

One approach is to make use of the index method of a string to find the location of the markers, and then to properly identify the indices of the strand between them using string slicing to extract it. Once that is done, it should be possible to calculate the reversal of that intermediate strand, and to piece together the various portions of the DNA for the resulting output.
Another approach is to make extensive use of the split method of the string class, using the marker and later the reverse marker as the pattern upon which you split. Then the pieces that result can be manipulated before reassembling the resulting DNA strand.
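
As a rough illustration of the second approach, the sketch below splits one of the sample strands by hand. The variable names are our own, and passing 1 as the second argument to split (so that at most one split is performed) is our assumption about how to keep later occurrences of a marker intact; other arrangements are possible.

```python
dna = 'CGATTGAACATGTAAGTCCATT'
marker = 'TGAA'
reverse_marker = marker[::-1]     # 'AAGT'

# Split once at the first marker: the text before it, and everything after.
pieces = dna.split(marker, 1)
before = pieces[0]                # 'CGAT'
rest = pieces[1]                  # 'CATGTAAGTCCATT'

# Split the remainder once at the reverse marker. Because rest begins just
# past the forward marker, the occurrence found cannot overlap it.
pieces = rest.split(reverse_marker, 1)
middle = pieces[0]                # 'CATGT'
after = pieces[1]                 # 'CCATT'

# Reversing marker + middle + reverse_marker leaves the two markers looking
# the same, so only the middle needs to be flipped.
mutated = before + marker + middle[::-1] + reverse_marker + after
print(mutated)                    # CGATTGAATGTACAAGTCCATT
```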

Using either strategy, there is the low-level task of reversing a string. This arises both to reverse the original marker to get its inverted form, and later to reverse the substring between the marker pair. In an ideal world, the str class would support a reverse() method, but alas, no such method exists. However, a slice with a negative step can be used to produce a reversal. See the "For the Guru" box on page 55.
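
Two ways of reversing a string are sketched below: a slice with a step of -1, and a detour through the list class, which does support reverse(). This is standard Python rather than a reproduction of the textbook's box, and the variable names are our own.

```python
marker = 'TGAA'

# Slice with a step of -1: start at the end and walk backward.
reversed_marker = marker[::-1]
print(reversed_marker)            # AAGT

# Alternative using the list class, whose reverse() works in place.
letters = list(marker)            # ['T', 'G', 'A', 'A']
letters.reverse()                 # letters is now ['A', 'A', 'G', 'T']
also_reversed = ''.join(letters)
print(also_reversed)              # AAGT
```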

Submitting Your Assignment

You should create a new file, dna.py, which contains all of your own code. This file must be submitted electronically.

You should also submit a separate 'readme' text file. If you worked as a pair, please make this clear and briefly describe the contributions of each person in the effort.

Please see the general programming web page for details regarding the submission process, as well as a discussion of the late policy.

Grading Standards

The assignment is worth 40 points, which will be assessed as follows:

(5 points) Program correctly takes its input from the user, including appropriate prompts.
(5 points) Program correctly locates the first occurrence of the marker.
(5 points) Program correctly locates the next subsequent occurrence of the reversed marker.
(5 points) Program correctly composes the appropriate mutated DNA sequence.
(5 points) Correctness of the program's behavior on a variety of test cases.
(10 points) Readability of the source code, including well-chosen variable names and appropriate inline comments.
(5 points) Quality of the required 'readme' file

Extra Credit

In the advice section we outline two different approaches for cutting apart the string: one based on use of index and slicing, and the other based on the split method. For extra credit, give an alternative implementation that uses whichever approach you did not use for your official submission.

So as not to risk losing points on the required part of the assignment due to a failed extra credit attempt, please submit an original version of your assignment in a file dna.py and the separate extra credit version in a file dnaExtra.py.

Michael Goldwasser

Last modified: Tuesday, 14 February 2017

Saint Louis University

Spring 2017

Dept. of Math & Computer Science
