
Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2019

Computer Science Department

Homework Assignment 04

Smith-Waterman Algorithm


Collaboration Policy

For this assignment, you must work alone. Please make sure you adhere to the policies on academic integrity in this regard.


Overview

Topic: Smith-Waterman Algorithm
Related Reading: pp. 50-51 and 57 of text and Wikipedia
Due: 11:59pm, Monday, 25 February 2019

The Needleman-Wunsch algorithm produces an optimal global pairwise alignment for a given metric, in that the entirety of the two sequences must be aligned. A downside of global alignment is that two sequences may have some strong commonalities (e.g., conserved genes) while one or both have mutations that caused large portions of the sequence to be inserted, duplicated, or otherwise changed. In that case the global alignment score might not reflect the local similarities strongly, because of the negative contributions from the rest of the sequence.

The Smith-Waterman algorithm is a variant that instead computes the optimal local pairwise alignment. The goal is to find the portions of the two original sequences that align most strongly with each other. While this might seem to require a different approach, since there are many possible portions to consider, it turns out that the optimal local alignment can be computed by an algorithm that is very similar in design to the Needleman-Wunsch algorithm.

The algorithmic changes are as follows:

  1. When completing the table, we do not allow for any entries to become negative. That is, while there still may be negative contributions from gap penalties or mismatch scores, if the overall entry were to become negative, it should be set to zero.

  2. In line with the above rule, the top row and leftmost column are set to all zeros (rather than to the negative gap penalties used in the Needleman-Wunsch algorithm).

  3. Rather than considering only the bottom-right entry of the completed table (which for Needleman-Wunsch represents the optimal global alignment score for the full sequences), we are interested in the largest entry that occurs anywhere within the table. That entry defines the optimal local alignment score.

  4. To reconstruct the actual alignment achieving the optimal score, we begin at the cell of the table in which that score is found. However, rather than necessarily tracing from that cell back to the top-left corner, we perform the traceback step only until the first time we reach a cell that has a value of zero.
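
The four changes above can be sketched in a few lines of Python. This is a minimal illustration and not the course's provided code: the function name smith_waterman, its parameters, and the default scores (match, mismatch, and a linear gap penalty) are all assumptions made for this example, and when several tracebacks are equally good it simply prefers the diagonal move.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Hypothetical helper for illustration; the scoring values are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Change 2: the top row and leftmost column stay at zero.
    table = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = table[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = table[i - 1][j] + gap          # a[i-1] against a gap
            left = table[i][j - 1] + gap        # b[j-1] against a gap
            # Change 1: an entry is never allowed to become negative.
            table[i][j] = max(0, diag, up, left)
            # Change 3: remember the largest entry anywhere in the table.
            if table[i][j] > best:
                best, best_pos = table[i][j], (i, j)

    # Change 4: trace back from the best cell until a zero entry is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while table[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if table[i][j] == table[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif table[i][j] == table[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1

    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))


if __name__ == '__main__':
    score, top, bottom = smith_waterman('TGTTACGG', 'GGTTGACTA',
                                        match=3, mismatch=-3, gap=-2)
    print(score)
    print(top)
    print(bottom)
```

With these illustrative scores, the call in the last block reports a score of 13 and aligns GTT-AC with GTTGAC. Two sequences with nothing in common produce a score of 0 and an empty alignment, since the traceback starts at a zero cell and stops immediately.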

Further discussion of the algorithm can be found in the Wikipedia article, which includes a detailed example for illustration.


Your Task

Your task is to adapt our original implementation of the Needleman-Wunsch algorithm to produce a working implementation of the Smith-Waterman algorithm. The hope is that if you understand how the original Python code implements the Needleman-Wunsch algorithm, you will be able to focus on the relatively few places where that code must be adjusted to produce the Smith-Waterman algorithm.

If you have any questions about how the original code works, feel free to ask!


Files You Need

We are providing two files (available individually or as a combined zip file):


Submitting Your Assignment Electronically

You should submit two files to the appropriate folder in our git repository. Because this assignment must be completed individually, each student submits their own files.


Grading Standards

This assignment is worth 40 points, apportioned as follows:


Michael Goldwasser
CSCI 1020, Spring 2019
Last modified: Tuesday, 12 March 2019