Saint Louis University | Computer Science 1020 | Computer Science Department
For this assignment, you must work alone. Please make sure you adhere to the policies on academic integrity in this regard.
Topic: Sequence Assembly Algorithms
Related Reading: notes from JHU (Overlap graphs, De Bruijn graphs)
Due: 2:10pm, Friday, 4 May 2018
(11 points)
The JHU notes on the OLC algorithm and this Langmead video
introduce the concept of an overlap graph for a set of reads.
(See, for example, the graph on page 20 of those notes; the pages
are not explicitly numbered.)
Illustrate the overlap graph that results from the following
reads, including every edge that represents an overlap of length 4 or greater.
{
AGCAGG,
AGGCAG,
CAGGCA,
GAGCAG,
GCAGCA,
GCAGGC,
GGCAGC
}
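As a way to check a hand-drawn graph, the overlap computation can be sketched in a few lines of Python. This is only a minimal sketch, not the method used in the JHU notes: the function names `overlap` and `overlap_graph` and the three sample reads are made up for illustration, and each ordered pair of reads is labeled with its longest suffix/prefix overlap only.

```python
from itertools import permutations


def overlap(a, b, min_length):
    """Return the length of the longest suffix of `a` that equals a prefix
    of `b`, provided it is at least `min_length`; otherwise return 0."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def overlap_graph(reads, min_length):
    """Map each ordered pair (a, b) of distinct reads to its overlap length,
    keeping only pairs whose overlap is at least `min_length`."""
    edges = {}
    for a, b in permutations(reads, 2):
        length = overlap(a, b, min_length)
        if length:
            edges[(a, b)] = length
    return edges


# Illustrative reads (not the reads from this assignment).
reads = ["ACGTAC", "GTACGG", "TACGGA"]
for (a, b), length in overlap_graph(reads, 3).items():
    print(a, "->", b, "overlap", length)
# ACGTAC -> GTACGG overlap 4
# ACGTAC -> TACGGA overlap 3
# GTACGG -> TACGGA overlap 5
```

Comparing every ordered pair is quadratic in the number of reads, which is fine for a handful of short reads but not how a real assembler would do it.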
(9 points)
Starting on page 21 of the JHU OLC slides is a discussion of how
some edges are redundant because they can be transitively inferred
from other edges. Although many edges in the previous graph
represent an overlap of 4, many of those are redundant by this definition.
Identify the three edges representing overlaps of 4 that are not redundant.
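The idea of a transitively inferable edge can be illustrated with a rough sketch. It assumes the simplest reading of the definition: an edge from a to c is treated as redundant when some other read b has edges a to b and b to c. The function name `remove_transitive_edges` and the sample edges are hypothetical, and the sketch ignores overlap lengths and longer chains, so treat it as an aid to intuition rather than the procedure from the slides.

```python
def remove_transitive_edges(edges):
    """Drop every edge (a, c) for which some intermediate node b has
    both (a, b) and (b, c) in the original edge set."""
    nodes = {node for edge in edges for node in edge}
    kept = {}
    for (a, c), length in edges.items():
        inferable = any(
            (a, b) in edges and (b, c) in edges
            for b in nodes
            if b not in (a, c)
        )
        if not inferable:
            kept[(a, c)] = length
    return kept


# Illustrative overlap graph (not the graph from this assignment).
edges = {
    ("ACGTAC", "GTACGG"): 4,
    ("ACGTAC", "TACGGA"): 3,  # implied by the other two edges
    ("GTACGG", "TACGGA"): 5,
}
print(remove_transitive_edges(edges))
# {('ACGTAC', 'GTACGG'): 4, ('GTACGG', 'TACGGA'): 5}
```

Every edge is tested against the original edge set rather than a partially reduced one, so the result does not depend on the order in which edges are visited.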
(10 points)
This Langmead video and what is implicitly page 5 of the JHU
notes on De Bruijn graphs show an example of a
De Bruijn graph that is built from a set of 3-mers.
Illustrate the De Bruijn graph that results from the
following 4-mers:
(10 points)
Although the pages are not visually numbered, what is implicitly
page 10 of the JHU notes on De Bruijn graphs defines an Eulerian
walk of a directed graph, and page 21 demonstrates how an Eulerian
walk of a De Bruijn graph implies a potential assembly of
the original k-mers by overlapping the pieces represented by the
nodes in order.
Your De Bruijn graph from the previous question should have 8 edges and two possible Eulerian walks. Give the assemblies that would be implied by each of those two Eulerian walks.
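The path from k-mers to an assembly can be sketched as follows. This is a minimal sketch under stated assumptions: each k-mer contributes one edge from its left (k-1)-mer to its right (k-1)-mer, an Eulerian walk is assumed to exist, and the walk is found with Hierholzer's algorithm, which is a standard technique but not necessarily the one presented in the notes. The function names and the sample 3-mers are illustrative only. When a graph admits several Eulerian walks, this code returns just one of them.

```python
from collections import defaultdict


def de_bruijn_edges(kmers):
    """One directed edge per k-mer: left (k-1)-mer -> right (k-1)-mer."""
    return [(kmer[:-1], kmer[1:]) for kmer in kmers]


def eulerian_walk(edges):
    """Return one Eulerian walk as a list of nodes (Hierholzer's algorithm).
    Assumes such a walk exists."""
    graph = defaultdict(list)
    balance = defaultdict(int)  # out-degree minus in-degree
    for src, dst in edges:
        graph[src].append(dst)
        balance[src] += 1
        balance[dst] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), edges[0][0])
    stack, walk = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())  # follow an unused edge
        else:
            walk.append(stack.pop())  # dead end: node is finished
    return walk[::-1]


def spell(walk):
    """Overlap consecutive nodes: first node plus last character of the rest."""
    return walk[0] + "".join(node[-1] for node in walk[1:])


# Illustrative 3-mers (not the 4-mers from this assignment).
kmers = ["AAA", "AAB", "ABB", "BBB", "BBA"]
walk = eulerian_walk(de_bruijn_edges(kmers))
print(walk)         # ['AA', 'AA', 'AB', 'BB', 'BB', 'BA']
print(spell(walk))  # AAABBBA
```

The sample graph has exactly one Eulerian walk, so the printed result does not depend on the order in which edges are followed.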