
Saint Louis University

Computer Science 1020
Introduction to Computer Science: Bioinformatics

Michael Goldwasser

Spring 2019

Computer Science Department

Homework Assignment 01

DNA Profiling


Overview

Topic: DNA Profiling
Related Reading: none
Due: 11:00am, Friday, 1 February 2019

Please make sure you adhere to the policies on academic integrity.


This homework is motivated by the topic of DNA profiling (see more details at Wikipedia, San Jose Tech Museum, NCBI). However, you do not need to read details on the topic to solve this homework, as we are using the motivation to craft a stand-alone logic puzzle for you to solve. The general framework is that analysis of an individual will produce a series of discrete "markers" and that we assume these markers are inherited using the following rules:

The challenge ahead of you is to serve as a detective and to analyze three different data sets (provided below) for the following settings.

  1. The first data set is from a hypothetical crime scene in which there is a sample of DNA evidence collected, and the DNA profile of several suspects. The evidence collected may contain DNA of several people, but the rule we presume is that if a suspect was at the scene, all markers found in that suspect's DNA sample will also be found at the scene. The goal is to determine which suspect(s) have profiles that are consistent with the collected evidence.

  2. The second data set is from a hypothetical paternity test, in which the DNA profile of the mother and child is given, along with some potential fathers. The goal is to determine which candidate(s) have profiles that are consistent with being the father, given that all of the child's features must be inherited from one or both of its parents.

  3. The third data set is from a hypothetical archaeology site, which is believed to contain data from a family unit. The goal is to reconstruct a family tree that is consistent with the collected DNA. Based on other factors, researchers are fairly confident that this extended family began with two individuals who had two children together, and that each of those children married and had two children with their spouse.

To download YOUR individual data set, you must use your official SLUnet id (e.g. goldwamh), which is not to be confused with your SLU email address or your Banner ID. The basic form of the URL to download your data set is http://cs.slu.edu/~goldwasser/1020/homeworks/profiling/data/username.pdf except with your SLUnet username in place of username, using entirely lowercase letters.
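If it helps, the URL can be built with a few lines of Python. This is only a convenience sketch: the function name data_set_url is our own invention for illustration, and the code simply lowercases whatever id it is given and substitutes it into the URL pattern described above.

```python
# URL pattern taken from the paragraph above; only the username part varies.
BASE_URL = "http://cs.slu.edu/~goldwasser/1020/homeworks/profiling/data/"


def data_set_url(username):
    """Return the data set URL for a SLUnet id (hypothetical helper).

    The id is stripped of surrounding whitespace and forced to lowercase,
    since the URL must use entirely lowercase letters.
    """
    return BASE_URL + username.strip().lower() + ".pdf"


print(data_set_url("GoldwaMH"))
```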


Problems to be Submitted (40 points)

  1. (8 points)

    Which suspect(s) does the crime scene DNA implicate? Briefly explain why.

  2. (8 points)

    Which potential father(s) are consistent with the paternity test data? Briefly explain why.

  3. (24 points)

    Draw a family tree that is consistent with the given data. If you are unable to completely reconstruct such a tree, identify as many consistent parent/parent/child triplets as you are able to find for partial credit.
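For the third problem, one possible way to hunt for parent/parent/child triplets is to try every pair of individuals as parents and test every remaining individual as their child. The sketch below assumes each profile is stored as a Python set of marker names and uses only the rule that every marker of a child must appear in at least one parent. The individuals P, Q, R, S and their markers are invented for illustration and are not from any real data set. Keep in mind that a triplet passing this test is only a candidate: the markers alone do not tell you which way a relationship runs.

```python
from itertools import combinations

# Hypothetical profiles, invented for illustration only.
profiles = {
    "P": {"a", "b"},
    "Q": {"c", "d"},
    "R": {"a", "c"},
    "S": {"e"},
}


def consistent_triplets(profiles):
    """Return all (parent, parent, child) triplets that satisfy the rule
    that every marker of the child appears in at least one parent."""
    triplets = []
    for p1, p2 in combinations(sorted(profiles), 2):
        combined = profiles[p1] | profiles[p2]  # markers of either parent
        for child in sorted(profiles):
            if child not in (p1, p2) and profiles[child] <= combined:
                triplets.append((p1, p2, child))
    return triplets


print(consistent_triplets(profiles))  # [('P', 'Q', 'R')]
```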


Examples

Again, please note that we have created an individual data set just for you, and you must download it as described above and answer the questions based on your data set. However, for the sake of illustration, we work through a sample data set below.

Crime Scene

Analysis: D is the presumed criminal. First note that all markers for D are found at the crime scene. Moreover, suspects A, B, C, and E are each cleared by this analysis, because each of them has at least one marker that is not found at the crime scene.
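The crime scene rule amounts to a subset test: a suspect is consistent with the evidence only if every one of that suspect's markers is found at the scene. A minimal sketch of that test in Python follows. The marker names and profiles here are made up for illustration (they are NOT the sample data set), and storing each profile as a set of marker names is our own assumption.

```python
# Hypothetical data, invented for illustration only.
scene = {"m1", "m2", "m3", "m5", "m8"}
suspects = {
    "A": {"m1", "m4"},
    "B": {"m2", "m6", "m8"},
    "C": {"m7"},
    "D": {"m2", "m3", "m8"},
    "E": {"m1", "m5", "m9"},
}


def missing_markers(profile, scene):
    """Return the suspect's markers that were NOT found at the scene."""
    return profile - scene


for name in sorted(suspects):
    missing = missing_markers(suspects[name], scene)
    if missing:
        print(name, "is cleared; not found at scene:", sorted(missing))
    else:
        print(name, "is consistent with the evidence")

# Suspects whose every marker appears in the scene sample.
consistent = [n for n in sorted(suspects) if not missing_markers(suspects[n], scene)]
```

Reporting the missing markers, rather than a bare yes or no, gives you the "briefly explain why" part of the answer for free.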

Paternity Test

Analysis: B is the only choice as father that is consistent with the data set. First note that all markers of the child could be found in one or both parents if Mom and B were the parents. Each of the other choices is inconsistent because the child has at least one marker that is found in neither Mom nor that candidate. (That said, this was not the most interesting of data sets.)
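The paternity rule can be phrased with sets as well: a candidate is consistent if no marker of the child is left unexplained once the markers of Mom and the candidate are combined. The sketch below assumes profiles are stored as sets of marker names, and all of the data in it is invented for illustration rather than taken from the sample data set.

```python
# Hypothetical data, invented for illustration only.
mom = {"m1", "m2", "m3"}
child = {"m2", "m3", "m4", "m5"}
candidates = {
    "A": {"m4", "m6"},
    "B": {"m4", "m5", "m7"},
    "C": {"m5", "m6"},
}


def unexplained(child, mom, candidate):
    """Return the child's markers found in neither Mom nor the candidate."""
    return child - (mom | candidate)


for name in sorted(candidates):
    leftover = unexplained(child, mom, candidates[name])
    if leftover:
        print(name, "is inconsistent; unexplained markers:", sorted(leftover))
    else:
        print(name, "is consistent with being the father")

# Candidates that leave none of the child's markers unexplained.
possible_fathers = [
    n for n in sorted(candidates) if not unexplained(child, mom, candidates[n])
]
```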

Family Tree Reconstruction

The following family tree is consistent with the data set.

      C   G
      |\ /|
      | X |
      |/ \|
  J   A   I   F
  |\ /|   |\ /|
  | X |   | X |
  |/ \|   |/ \|
  D   E   H   B
That is, A and I are both children of couple C+G. D and E are both children of couple J+A, and H and B are both children of couple I+F.

Note well: while your data set is different, we guarantee that there is a solution that is consistent with this same "shape" of a family tree.
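Once you have a candidate tree, it is worth double-checking every couple against both of its children. The following sketch does so for a tree with the shape shown above, assuming only the rule that each marker of a child must appear in at least one parent. The couples are those from the worked example, but the marker profiles are invented for illustration, and the dictionary layout is just one convenient choice.

```python
# Couples from the worked example, each mapped to its two children.
tree = {
    ("C", "G"): ("A", "I"),
    ("J", "A"): ("D", "E"),
    ("I", "F"): ("H", "B"),
}

# Hypothetical marker profiles, invented for illustration only.
profiles = {
    "C": {"m1", "m2"}, "G": {"m3", "m4"},
    "A": {"m1", "m3"}, "I": {"m2", "m4"},
    "J": {"m5", "m6"}, "F": {"m7", "m8"},
    "D": {"m1", "m5"}, "E": {"m3", "m6"},
    "H": {"m2", "m7"}, "B": {"m4", "m8"},
}


def tree_violations(tree, profiles):
    """Return (child, couple, unexplained markers) for every child whose
    markers are not all found in at least one of its proposed parents."""
    violations = []
    for (p1, p2), children in tree.items():
        combined = profiles[p1] | profiles[p2]
        for child in children:
            extra = profiles[child] - combined
            if extra:
                violations.append((child, (p1, p2), sorted(extra)))
    return violations


print(tree_violations(tree, profiles))  # [] means the tree is consistent
```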


Michael Goldwasser
CSCI 1020, Spring 2019
Last modified: Monday, 28 January 2019