Saint Louis University | Computer Science 1020 | Computer Science Department
Topic: | Sorting by Reversals |
Collaboration Policy: | The lab should be completed working in pairs |
Submission Deadline: | 11:59pm Monday, 29 April 2019 |
We are going to implement a simple version of an algorithm for sorting by reversals that is guaranteed to produce a solution using at most twice as many reversals as necessary. For convenience, our experiments will always build initial sequences that keep 0 at the far left and the maximum value at the far right. (This helps because you can then write loops that assume there is always an element before and after any meaningful location.)
We use the following form of the algorithm.
while there remain one or more breakpoints:
    locate a potential strip [a,b] whose reversal would remove the maximum number of breakpoints
    if that reversal would remove 1 or more breakpoints:
        perform the reversal
    else:
        there must not be any decreasing subsequences
        find the first increasing strip beyond the one starting with 0
        reverse that strip
We have already implemented this algorithm, along with a framework for creating random experiments and displaying a trace of the executing algorithm. However, we have stripped away the implementations of four key utility functions, which you must implement.
The four required functions you must implement are as follows:
perform_reversal(data, a, b) function implementation
Given a list of values, data, and indices a and
b, you must produce and return a new list that reflects
a reversal of all the values from index a to b
inclusive. For example, a call to
perform_reversal([0, 5, 4, 1, 3, 2, 6], 2, 5)
should return the list [0, 5, 2, 3, 1, 4, 6].
Note: We use your implementation of this function both during the algorithm and to produce the original shuffled input. So as soon as you have this implemented, you should be able to verify that it is working based on how it shuffles the input.
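As an illustration of the inclusive-index convention, here is a minimal sketch based on list slicing. It is only one possible approach, not necessarily the style the framework expects; the key detail is that Python slices exclude their end index, so the slice must run to b + 1.

```python
def perform_reversal(data, a, b):
    """Return a NEW list in which data[a..b] (inclusive) is reversed."""
    # data[a:b + 1] covers indices a through b; [::-1] reverses that piece.
    return data[:a] + data[a:b + 1][::-1] + data[b + 1:]


original = [0, 5, 4, 1, 3, 2, 6]
print(perform_reversal(original, 2, 5))  # [0, 5, 2, 3, 1, 4, 6]
print(original)                          # the input list is left unchanged
```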
count_breakpoints(data) function implementation
This should return the number of "breakpoints", which are
defined as numbers that are neighbors to each other but with
values that are 2 or more away from each other. For example,
a call to
count_breakpoints([0, 5, 4, 2, 3, 1, 6, 7])
should return 4 because of the breakpoint between 0 and 5,
between 4 and 2, between 3 and 1, and between 1 and 6.
Note: We will use this during the algorithm, mostly to track our progress, but also to know when we have reached a solution with zero breakpoints.
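The definition above can be sketched by comparing each element with its right-hand neighbor. This is a rough illustration that assumes the values are distinct integers, so "2 or more away" is the same as "not differing by exactly 1".

```python
def count_breakpoints(data):
    """Count adjacent pairs whose values differ by 2 or more."""
    count = 0
    # zip(data, data[1:]) yields every pair of neighboring values.
    for left, right in zip(data, data[1:]):
        if abs(left - right) >= 2:
            count += 1
    return count


print(count_breakpoints([0, 5, 4, 2, 3, 1, 6, 7]))  # 4
```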
count_breakpoints_fixed(data, a, b) function implementation
This should return the number of breakpoints that would be
removed if we were to perform a reversal of section [a,b].
For example, a call to
count_breakpoints_fixed([0, 5, 4, 2, 3, 1, 6, 7], 3, 4)
should return 2 because reversing the indicated section
(changing the 2,3 to 3,2) would resolve two different
breakpoints, as the 3 would now be next to the 4, and the 2 next
to the 1.
Note: This function is key in our loop that tries to identify which potential reversal fixes the most breakpoints. Formally, it might be that a reversal doesn't fix any breakpoints, but in fact creates new breakpoints that weren't there. While you could consider those and report a negative number of "fixed" breakpoints, it will suffice if you just count the truly fixed ones.
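One observation that may help: a reversal of [a,b] only changes the two adjacencies at the boundaries of the section, because neighbors inside the section remain neighbors after the reversal. The sketch below relies on that observation and on the assumption that 1 <= a <= b <= len(data) - 2, so that both boundary neighbors exist. The helper name is_breakpoint is hypothetical and not part of the framework.

```python
def is_breakpoint(x, y):
    """Hypothetical helper: True if neighboring values x and y form a breakpoint."""
    return abs(x - y) >= 2


def count_breakpoints_fixed(data, a, b):
    """Count breakpoints that reversing data[a..b] would remove.

    Assumes 1 <= a <= b <= len(data) - 2. Newly created breakpoints
    are ignored, so the result is 0, 1, or 2.
    """
    fixed = 0
    # Left boundary: (data[a-1], data[a]) becomes (data[a-1], data[b]).
    if is_breakpoint(data[a - 1], data[a]) and not is_breakpoint(data[a - 1], data[b]):
        fixed += 1
    # Right boundary: (data[b], data[b+1]) becomes (data[a], data[b+1]).
    if is_breakpoint(data[b], data[b + 1]) and not is_breakpoint(data[a], data[b + 1]):
        fixed += 1
    return fixed


print(count_breakpoints_fixed([0, 5, 4, 2, 3, 1, 6, 7], 3, 4))  # 2
```

Because only two adjacencies are inspected, each call takes constant time regardless of the length of the section.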
find_increasing_strip(data) function implementation
Under an assumption that the entire sequence is composed of
increasing strips, you are to identify the indices a and b that
define the second of those strips (that is, the one after that
which starts with 0).
For example, a call to
find_increasing_strip([0, 1, 2, 5, 6, 7, 3, 4, 8])
should return the index pair (3,5) because those indices identify
the extent of the strip holding values 5,6,7.
Note: This last function is only triggered in the special case when no reversal fixes a breakpoint (which implies that there are no decreasing strips). Since we pick random shuffles by default, some executions of the program may trigger this function while others do not. So you might get lucky and solve the problem even before implementing this function properly. But you should test on some cases that do trigger this function.
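A possible sketch of the scan is shown below. It assumes the stated precondition (the sequence consists only of increasing strips, the first of which starts with 0 at index 0) and that at least one breakpoint remains, so a second strip exists before the final element.

```python
def find_increasing_strip(data):
    """Return (a, b), the index range of the second increasing strip.

    Assumes the first strip starts at index 0 and that the sequence
    is not yet sorted (otherwise there is no second strip to report).
    """
    n = len(data)
    # Walk past the first strip: stop at the first value that does not
    # continue the run of consecutive values.
    a = 1
    while a < n and data[a] == data[a - 1] + 1:
        a += 1
    # Extend b for as long as the next value continues the second strip.
    b = a
    while b + 1 < n and data[b + 1] == data[b] + 1:
        b += 1
    return (a, b)


print(find_increasing_strip([0, 1, 2, 5, 6, 7, 3, 4, 8]))  # (3, 5)
```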
The (pseudo)random number generator for Python uses an algorithm
that is generally the same on all platforms across the same
Python versions, though there is no such guarantee. With that
said, for Python 3.6, the following parameters should cause the
experiment to trigger the special case (in fact, the following
each trigger that special case twice):
experiment(25, 15, 98)
experiment(25, 15, 335)
experiment(25, 15, 363)
experiment(25, 15, 631)
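To see why fixed parameters make a run repeatable, the sketch below seeds a private generator and draws index pairs from it. It is purely illustrative: the function random_reversal_pairs is hypothetical, it does not reproduce how reversals.py builds its shuffles, and it will not regenerate the traces of the experiments listed above.

```python
import random


def random_reversal_pairs(n, count, seed):
    """Hypothetical helper: draw `count` index pairs (a, b) with 1 <= a < b <= n - 2."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    pairs = []
    for _ in range(count):
        # Two distinct interior indices, ordered so that a < b.
        a, b = sorted(rng.sample(range(1, n - 1), 2))
        pairs.append((a, b))
    return pairs


# Within one Python installation, the same seed yields the same pairs.
print(random_reversal_pairs(25, 15, 98) == random_reversal_pairs(25, 15, 98))  # True
```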
We are providing you the framework as file reversals.py. You will need to add to the code in that file.
One member of your partnership should electronically submit your modified file reversals.py. The comments at the beginning of the file should clearly identify the member(s) of the partnership.
The assignment is worth 25 points, which will be assessed as follows:
Creating initial random pattern:
performing inversion for a=5, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------========================================================---
data[k]:  0  1  2  3  4 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9  8  7  6  5 24
There are 2 breakpoints in this pattern
performing inversion for a=3, b=18 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------===============================================------------------
data[k]:  0  1  2 10 11 12 13 14 15 16 17 18 19 20 21 22 23  4  3  9  8  7  6  5 24
There are 4 breakpoints in this pattern
performing inversion for a=2, b=8 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------====================------------------------------------------------
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21 22 23  4  3  9  8  7  6  5 24
There are 6 breakpoints in this pattern
performing inversion for a=22, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------------------------------------------------=====---
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21 22 23  4  3  9  8  7  5  6 24
There are 7 breakpoints in this pattern
performing inversion for a=15, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------------------==============---------------
data[k]:  0  1 15 14 13 12 11 10  2 16 17 18 19 20 21  9  3  4 23 22  8  7  5  6 24
There are 9 breakpoints in this pattern
performing inversion for a=10, b=17 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------------=======================---------------------
data[k]:  0  1 15 14 13 12 11 10  2 16  4  3  9 21 20 19 18 17 23 22  8  7  5  6 24
There are 10 breakpoints in this pattern
performing inversion for a=5, b=20 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------===============================================------------
data[k]:  0  1 15 14 13  8 22 23 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 12 breakpoints in this pattern
performing inversion for a=2, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------=================---------------------------------------------------
data[k]:  0  1 23 22  8 13 14 15 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 12 breakpoints in this pattern
performing inversion for a=1, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---====================---------------------------------------------------
data[k]:  0 15 14 13  8 22 23  1 17 18 19 20 21  9  3  4 16  2 10 11 12  7  5  6 24
There are 13 breakpoints in this pattern
performing inversion for a=9, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------=========================================------
data[k]:  0 15 14 13  8 22 23  1 17  5  7 12 11 10  2 16  4  3  9 21 20 19 18  6 24
There are 15 breakpoints in this pattern
performing inversion for a=9, b=11 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------========---------------------------------------
data[k]:  0 15 14 13  8 22 23  1 17 12  7  5 11 10  2 16  4  3  9 21 20 19 18  6 24
There are 16 breakpoints in this pattern
performing inversion for a=6, b=23 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------=====================================================---
data[k]:  0 15 14 13  8 22  6 18 19 20 21  9  3  4 16  2 10 11  5  7 12 17  1 23 24
There are 16 breakpoints in this pattern
performing inversion for a=2, b=16 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------============================================------------------------
data[k]:  0 15 10  2 16  4  3  9 21 20 19 18  6 22  8 13 14 11  5  7 12 17  1 23 24
There are 18 breakpoints in this pattern
performing inversion for a=9, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------================================---------------
data[k]:  0 15 10  2 16  4  3  9 21  7  5 11 14 13  8 22  6 18 19 20 12 17  1 23 24
There are 19 breakpoints in this pattern
performing inversion for a=8, b=19 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------------------------===================================---------------
data[k]:  0 15 10  2 16  4  3  9 20 19 18  6 22  8 13 14 11  5  7 21 12 17  1 23 24
There are 19 breakpoints in this pattern
==============================
Time to solve...
Able to remove 2 breakpoints
performing inversion for a=2, b=15 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------=========================================---------------------------
data[k]:  0 15 14 13  8 22  6 18 19 20  9  3  4 16  2 10 11  5  7 21 12 17  1 23 24
There are 17 breakpoints in this pattern
Able to remove 2 breakpoints
performing inversion for a=5, b=18 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------=========================================------------------
data[k]:  0 15 14 13  8  7  5 11 10  2 16  4  3  9 20 19 18  6 22 21 12 17  1 23 24
There are 15 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=9 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==========================---------------------------------------------
data[k]:  0  2 10 11  5  7  8 13 14 15 16  4  3  9 20 19 18  6 22 21 12 17  1 23 24
There are 14 breakpoints in this pattern
Able to remove 2 breakpoints
performing inversion for a=2, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ------================================------------------------------------
data[k]:  0  2  3  4 16 15 14 13  8  7  5 11 10  9 20 19 18  6 22 21 12 17  1 23 24
There are 12 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=21 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==============================================================---------
data[k]:  0 17 12 21 22  6 18 19 20  9 10 11  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 11 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=5 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---==============---------------------------------------------------------
data[k]:  0  6 22 21 12 17 18 19 20  9 10 11  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 10 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=11 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---================================---------------------------------------
data[k]:  0 11 10  9 20 19 18 17 12 21 22  6  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 9 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=7 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---====================---------------------------------------------------
data[k]:  0 17 18 19 20  9 10 11 12 21 22  6  5  7  8 13 14 15 16  4  3  2  1 23 24
There are 8 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=1, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---=================================================================------
data[k]:  0  1  2  3  4 16 15 14 13  8  7  5  6 22 21 12 11 10  9 20 19 18 17 23 24
There are 7 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=11, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------=====------------------------------------
data[k]:  0  1  2  3  4 16 15 14 13  8  7  6  5 22 21 12 11 10  9 20 19 18 17 23 24
There are 6 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=5, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------=======================------------------------------------
data[k]:  0  1  2  3  4  5  6  7  8 13 14 15 16 22 21 12 11 10  9 20 19 18 17 23 24
There are 5 breakpoints in this pattern
Able to remove 2 breakpoints
performing inversion for a=13, b=22 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------------------=============================------
data[k]:  0  1  2  3  4  5  6  7  8 13 14 15 16 17 18 19 20  9 10 11 12 21 22 23 24
There are 3 breakpoints in this pattern
Able to remove 0 breakpoints
performing inversion for a=9, b=16 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------=======================------------------------
data[k]:  0  1  2  3  4  5  6  7  8 20 19 18 17 16 15 14 13  9 10 11 12 21 22 23 24
There are 3 breakpoints in this pattern
Able to remove 1 breakpoint
performing inversion for a=9, b=20 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------===================================------------
data[k]:  0  1  2  3  4  5  6  7  8 12 11 10  9 13 14 15 16 17 18 19 20 21 22 23 24
There are 2 breakpoints in this pattern
Able to remove 2 breakpoints
performing inversion for a=9, b=12 results in
      k:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
         ---------------------------===========------------------------------------
data[k]:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
There are 0 breakpoints in this pattern
Used 15 reversals
There is an unnecessary inefficiency in our implementation of the overall algorithm for computing the series of reversals. In particular, our part of the code contains a function, find_best_reversal(data), that is tasked with finding a reversal [a,b] that fixes the greatest number of breakpoints for the current data set. Our naive implementation simply tries every possible [a,b] pair with a < b, and thus for a sequence of length n, this process requires time that is quadratic in n.
It is possible to reimplement this function so that it only requires linear time per call. Rather than consider every possible [a,b] pair, we can be more selective by noting that the only way a reversal will reduce the number of breakpoints is if it brings together some pair of consecutive values, v and v+1, that were not previously neighboring each other. So we can search for ranges [a,b] by instead considering every possible pair of consecutive values v and v+1, finding where those two values are currently located in the data, and then examining the two possible reversals that will bring them next to each other (one that moves v next to v+1 and one that moves v+1 next to v).
So we look for the best possible [a,b] as follows.
for each value v:
    locate both v and v+1 in the data set.
    if not already neighboring:
        consider the two reversals that bring them together as neighbors.
        if either reversal is the best so far, remember it.
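To make "the two reversals that bring them together" concrete, here is a small sketch. Given the indices i and j at which v and v+1 currently sit, it returns the candidate ranges [a,b]. The function name candidate_reversals is hypothetical, and the filter at the end reflects our convention that index 0 and the final index are never moved.

```python
def candidate_reversals(i, j, n):
    """Hypothetical helper: reversals that make the values at indices i and j adjacent.

    n is the length of the data. Ranges that would move index 0 or
    index n - 1 are discarded, since those ends are kept fixed.
    """
    lo, hi = min(i, j), max(i, j)
    if hi - lo == 1:
        return []  # the two values are already neighbors
    candidates = [
        (lo + 1, hi),  # carries the element at hi down to index lo + 1
        (lo, hi - 1),  # carries the element at lo up to index hi - 1
    ]
    return [(a, b) for (a, b) in candidates if a >= 1 and b <= n - 2]


# In [0, 3, 1, 2, 4], the value 3 is at index 1 and the value 2 is at index 3.
print(candidate_reversals(3, 1, 5))  # [(2, 3), (1, 2)]
```

Each pair of consecutive values contributes at most two candidates, so there are only linearly many ranges to evaluate in total.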
Nominally, this algorithm is not immediately an improvement. Although
there are only linearly many breakpoints to consider, instead of
quadratic, if we need to use a loop each time we have to locate
v and v+1 in the data set, the overall process of
finding the best reversal will still be quadratic. But the second
improvement is that we can spend linear time pre-processing to build a
reverse index identifying where each value v can be
found. Specifically, we can build a second list so that expression
loc[v] is the index of where v is found within the
current data. We build this with a single loop that goes through every
index j and sets loc[data[j]] = j.
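The reverse index described above might be built as in the following sketch, which assumes the data is a permutation of the values 0 through n-1. The function name build_location_index is hypothetical.

```python
def build_location_index(data):
    """Hypothetical helper: return loc such that loc[v] is the index of v in data.

    Assumes data is a permutation of 0 .. len(data) - 1.
    """
    loc = [0] * len(data)
    for j, value in enumerate(data):
        loc[value] = j  # the value found at index j lives at position j
    return loc


data = [0, 3, 1, 2, 4]
loc = build_location_index(data)
print(loc)     # [0, 2, 3, 1, 4]
print(loc[3])  # 1, because data[1] == 3
```

With loc available, looking up the positions of v and v+1 takes constant time, so the pre-processing pass is the only linear-time step.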
Your goal is to take these strategies and provide a new implementation of find_best_reversal(data) that takes linear time rather than quadratic time.