
Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2018

Computer Science Department

Lab 06

Topic: Sorting by Reversals
Collaboration Policy: The lab should be completed working in pairs
Submission Deadline:    11:59pm Friday, 27 April 2018


Overview

We are going to implement a simple version of an algorithm for sorting by reversals that is guaranteed to produce a solution using at most twice as many reversals as necessary. For convenience, our experiments will always build initial sequences that keep 0 at the far left and the maximum value at the far right. (This helps because you can then write loops that assume there is always an element before and after any meaningful location.)

We use the following form of the algorithm.

while there remain one or more breakpoints:
        locate a potential strip [a,b] whose reversal would remove the maximum number of breakpoints
        if that reversal would remove 1 or more breakpoints:
                perform the reversal
        else:
                (no decreasing strips can remain at this point)
                find the first increasing strip beyond the one starting with 0
                reverse that strip
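
The loop above can be sketched in Python as follows. This is only an illustration, not the code in reversals.py: every function name here is hypothetical, and the breakpoint definition (adjacent values that do not differ by exactly 1) is an assumption inferred from the counts in the example trace. Ranges [a,b] include both endpoints, as in the trace.

```python
def count_breakpoints(data):
    """Count adjacent pairs whose values are not consecutive (assumed definition)."""
    return sum(1 for k in range(len(data) - 1) if abs(data[k] - data[k + 1]) != 1)


def reverse_segment(data, a, b):
    """Reverse data[a..b] in place, with both endpoints included."""
    data[a:b + 1] = data[a:b + 1][::-1]


def naive_best_reversal(data):
    """Try every [a,b] with 1 <= a < b <= n-2; return (removed, a, b) for the best.

    Returns (0, None, None) when no reversal removes a breakpoint.
    """
    before = count_breakpoints(data)
    best = (0, None, None)
    for a in range(1, len(data) - 1):
        for b in range(a + 1, len(data) - 1):
            trial = data[:]                      # work on a copy
            reverse_segment(trial, a, b)
            removed = before - count_breakpoints(trial)
            if removed > best[0]:
                best = (removed, a, b)
    return best


def first_increasing_strip_after_zero(data):
    """Return (a, b) for the first increasing strip beyond the one starting with 0.

    Precondition: breakpoints remain and no reversal removes any, so every
    strip is increasing and this strip ends before the final element.
    """
    k = 0
    while data[k + 1] == data[k] + 1:            # skip the strip that starts with 0
        k += 1
    a = b = k + 1
    while data[b + 1] == data[b] + 1:            # extend while values rise by 1
        b += 1
    return a, b


def sort_by_reversals(data):
    """Apply the greedy loop in place; return the list of (a, b) reversals used."""
    steps = []
    while count_breakpoints(data) > 0:
        removed, a, b = naive_best_reversal(data)
        if removed == 0:
            a, b = first_increasing_strip_after_zero(data)
        reverse_segment(data, a, b)
        steps.append((a, b))
    return steps


# The scrambled pattern from the end of the setup phase in "An Example".
scrambled = [0, 15, 10, 2, 16, 4, 3, 9, 20, 19, 18, 6, 22,
             8, 13, 14, 11, 5, 7, 21, 12, 17, 1, 23, 24]
print(count_breakpoints(scrambled))              # the trace reports 19 for this pattern
steps = sort_by_reversals(scrambled)
print(scrambled == sorted(scrambled), len(steps))
```

Ties between equally good reversals are broken here by taking the first one found, which may or may not match the provided framework, so the number of reversals need not agree with the trace.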

We have already implemented this algorithm, along with a framework for creating random experiments and displaying a trace of the executing algorithm. However, we have stripped away the implementations of four key utility functions, which you must implement.


Your Tasks

The four functions you must implement are as follows:


Files You Need

We are providing you the framework as file reversals.py. You will need to add to the code in that file.


Submitting Your Assignment

One member of your partnership should electronically submit your modified file reversals.py. The comments at the beginning of the file should clearly identify the member(s) of the partnership.


Grading Standards

The assignment is worth 10 points, which will be assessed as follows:


An Example


For the sake of reference, here is the complete output of our program when executing experiment(25, 15, 8).
Creating initial random pattern:
performing inversion for a=5, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------========================================================---
data[k]:  0  1  2  3  4 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5 24
There are 2 breakpoints in this pattern

performing inversion for a=3, b=18 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------===============================================------------------
data[k]:  0  1  2 10 11 12 13 14 15 16 17 18 19 20 21 22 23  4  3  9  8  7  6  5 24
There are 4 breakpoints in this pattern

performing inversion for a=2, b=8 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------====================------------------------------------------------
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21 22 23  4  3  9  8  7  6  5 24
There are 6 breakpoints in this pattern

performing inversion for a=22, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------------------------------------------------=====---
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21 22 23  4  3  9  8  7  5  6 24
There are 7 breakpoints in this pattern

performing inversion for a=15, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------------------==============---------------
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21  9  3  4 23 22  8  7  5  6 24
There are 9 breakpoints in this pattern

performing inversion for a=10, b=17 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------------=======================---------------------
data[k]:  0  1 15 14 13 12 11 10  2 16  4  3  9 21 20 19 18 17 23 22  8  7  5  6 24
There are 10 breakpoints in this pattern

performing inversion for a=5, b=20 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------===============================================------------
data[k]:  0  1 15 14 13  8 22 23 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 12 breakpoints in this pattern

performing inversion for a=2, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------=================---------------------------------------------------
data[k]:  0  1 23 22  8 13 14 15 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 12 breakpoints in this pattern

performing inversion for a=1, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---====================---------------------------------------------------
data[k]:  0 15 14 13  8 22 23  1 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 13 breakpoints in this pattern

performing inversion for a=9, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------=========================================------
data[k]:  0 15 14 13  8 22 23  1 17  5  7 12 11 10  2 16  4  3  9 21 20 19 18  6 24
There are 15 breakpoints in this pattern

performing inversion for a=9, b=11 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------========---------------------------------------
data[k]:  0 15 14 13  8 22 23  1 17 12  7  5 11 10  2 16  4  3  9 21 20 19 18  6 24
There are 16 breakpoints in this pattern

performing inversion for a=6, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------=====================================================---
data[k]:  0 15 14 13  8 22  6 18 19 20 21  9  3  4 16  2 10 11  5  7 12 17  1 23 24
There are 16 breakpoints in this pattern

performing inversion for a=2, b=16 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------============================================------------------------
data[k]:  0 15 10  2 16  4  3  9 21 20 19 18  6 22  8 13 14 11  5  7 12 17  1 23 24
There are 18 breakpoints in this pattern

performing inversion for a=9, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------================================---------------
data[k]:  0 15 10  2 16  4  3  9 21  7  5 11 14 13  8 22  6 18 19 20 12 17  1 23 24
There are 19 breakpoints in this pattern

performing inversion for a=8, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------===================================---------------
data[k]:  0 15 10  2 16  4  3  9 20 19 18  6 22  8 13 14 11  5  7 21 12 17  1 23 24
There are 19 breakpoints in this pattern


==============================
Time to solve...
Able to remove 2 breakpoints
performing inversion for a=2, b=15 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------=========================================---------------------------
data[k]:  0 15 14 13  8 22  6 18 19 20  9  3  4 16  2 10 11  5  7 21 12 17  1 23 24
There are 17 breakpoints in this pattern

Able to remove 2 breakpoints
performing inversion for a=5, b=18 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------=========================================------------------
data[k]:  0 15 14 13  8  7  5 11 10  2 16  4  3  9 20 19 18  6 22 21 12 17  1 23 24
There are 15 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=9 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==========================---------------------------------------------
data[k]:  0  2 10 11  5  7  8 13 14 15 16  4  3  9 20 19 18  6 22 21 12 17  1 23 24
There are 14 breakpoints in this pattern

Able to remove 2 breakpoints
performing inversion for a=2, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------================================------------------------------------
data[k]:  0  2  3  4 16 15 14 13  8  7  5 11 10  9 20 19 18  6 22 21 12 17  1 23 24
There are 12 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=21 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==============================================================---------
data[k]:  0 17 12 21 22  6 18 19 20  9 10 11  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 11 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=5 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==============---------------------------------------------------------
data[k]:  0  6 22 21 12 17 18 19 20  9 10 11  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 10 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=11 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---================================---------------------------------------
data[k]:  0 11 10  9 20 19 18 17 12 21 22  6  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 9 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---====================---------------------------------------------------
data[k]:  0 17 18 19 20  9 10 11 12 21 22  6  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 8 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=1, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---=================================================================------
data[k]:  0  1  2  3  4 16 15 14 13  8  7  5  6 22 21 12 11 10  9 20 19 18 17 23 24
There are 7 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=11, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------=====------------------------------------
data[k]:  0  1  2  3  4 16 15 14 13  8  7  6  5 22 21 12 11 10  9 20 19 18 17 23 24
There are 6 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=5, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------=======================------------------------------------
data[k]:  0  1  2  3  4  5  6  7  8 13 14 15 16 22 21 12 11 10  9 20 19 18 17 23 24
There are 5 breakpoints in this pattern

Able to remove 2 breakpoints
performing inversion for a=13, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------------=============================------
data[k]:  0  1  2  3  4  5  6  7  8 13 14 15 16 17 18 19 20  9 10 11 12 21 22 23 24
There are 3 breakpoints in this pattern

Able to remove 0 breakpoints
performing inversion for a=9, b=16 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------=======================------------------------
data[k]:  0  1  2  3  4  5  6  7  8 20 19 18 17 16 15 14 13  9 10 11 12 21 22 23 24
There are 3 breakpoints in this pattern

Able to remove 1 breakpoint
performing inversion for a=9, b=20 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------===================================------------
data[k]:  0  1  2  3  4  5  6  7  8 12 11 10  9 13 14 15 16 17 18 19 20 21 22 23 24
There are 2 breakpoints in this pattern

Able to remove 2 breakpoints
performing inversion for a=9, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------===========------------------------------------
data[k]:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
There are 0 breakpoints in this pattern

Used 15 reversals

Extra Credit

There is an unnecessary inefficiency in our implementation of the overall algorithm for computing the series of reversals. In particular, our part of the code contains a function, find_best_reversal(data), that is tasked with finding a reversal [a,b] that fixes the greatest number of breakpoints for the current data set. Our naive implementation simply tries every possible [a,b] pair with a < b, and thus for a sequence of length n, this process requires time that is quadratic in n.

It is possible to reimplement this function so that it requires only linear time per call. Rather than consider every possible [a,b] pair, we can be more refined by noting that the only way a reversal can reduce the number of breakpoints is if it brings together some pair of consecutive values, v and v+1, that were not previously neighboring each other. So we can search for ranges [a,b] by instead considering every pair of consecutive values v and v+1, determining where those two values are currently located in the data, and then examining the two possible reversals that bring them next to each other (one that moves v next to v+1 and one that moves v+1 next to v).

So we look for the best possible [a,b] as follows.
for each value v:
        locate both v and v+1 in the data set.
        if not already neighboring:
                consider the two reversals that bring them together as neighbors.
                if either reversal is the best so far, remember it.
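
The "two reversals" for a given pair can be written down directly from the two positions. The sketch below uses a hypothetical helper name, with i the index of v and j the index of v+1, and returns inclusive ranges (a, b) as in the trace.

```python
def candidate_reversals(i, j):
    """Given index i of v and index j of v+1 (not adjacent), return two (a, b) ranges.

    Each range is inclusive; reversing it makes v and v+1 neighbors.
    """
    if i < j:
        # v sits left of v+1: bring v+1 back next to v, or carry v forward to v+1
        return [(i + 1, j), (i, j - 1)]
    # v+1 sits left of v: carry v+1 forward to v, or bring v back next to v+1
    return [(j, i - 1), (j + 1, i)]
```

Since 0 and the maximum value are meant to stay at the two ends, a caller would presumably discard any candidate with a < 1 or b > n-2, where n is the length of the data.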

Nominally, this algorithm is not immediately an improvement. Although there are only linearly many pairs of consecutive values to consider, rather than quadratically many ranges, if we need a loop each time we locate v and v+1 in the data set, the overall process of finding the best reversal will still be quadratic. The second improvement is that we can spend linear time preprocessing to build a reverse index identifying where each value v can be found. Specifically, we can build a second list so that the expression loc[v] is the index at which v is found within the current data. We build this with a single loop that goes through every index j and sets loc[data[j]] = j.
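
A minimal sketch of that reverse index, assuming data is a permutation of the values 0 through n-1 (the function name is hypothetical):

```python
def build_location_index(data):
    """Return a list loc such that loc[v] is the index of value v in data."""
    loc = [None] * len(data)
    for j in range(len(data)):
        loc[data[j]] = j          # value data[j] lives at index j
    return loc


data = [0, 3, 1, 2, 4]
loc = build_location_index(data)
print(loc)                        # [0, 2, 3, 1, 4]
```

Every reversal moves values around, so an index built this way reflects only the current data; rebuilding it at the start of each call still costs just one linear pass.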

Your goal is to take these strategies and provide a new implementation of find_best_reversal(data) that takes linear time rather than quadratic time.


Michael Goldwasser
CSCI 1020, Spring 2018
Last modified: Tuesday, 01 May 2018