Course Home | Assignments | Computing Resources | Data Sets | Lab Hours/Tutoring | Python | Schedule | Submit

Open Reading Frames (ORFs) and Genes


Additional Readings/Resources


Introduction

One important goal in genetic analysis is to take a successfully sequenced genome and better differentiate the "coding" and "noncoding" portions of its DNA. We are especially interested in protein-coding genes within a DNA sequence.

For prokaryotes, it is estimated that about 80% of the DNA is coding, but for eukaryotes often only 1-3% of the DNA is coding.


Open Reading Frames (ORFs)

While strings are convenient for representing a sequence of characters, Python allows for representation of sequences of arbitrary types of data as well. The primary structure for such a sequence is a Python list.

Recall from the central dogma that coding regions of DNA are transcribed to RNA and then translated to proteins, with each triple of nucleotides (a codon) specifying one amino acid in the protein sequence. In some cases, several distinct codons produce the same amino acid. One particular codon (ATG in the original DNA sequence) is known as the "start codon": it produces the amino acid methionine, and it is also key at the molecular level for getting the process rolling.

There are also three specific codons that serve as stop codons for the process, and these are TAA, TGA, and TAG.

The precise conversion from codons to amino acids can be given as a complete table, or is sometimes drawn as a wheel-like structure that is more convenient for tracing a codon as a three-letter sequence.
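As a rough illustration of the table form, the sketch below builds the standard genetic code as a Python dictionary and uses it to translate a DNA string. The names `CODON_TABLE` and `translate` are our own choices for this example, stop codons are marked with `*`, and translation simply begins at index 0 without looking for a start codon.

```python
from itertools import product

# Standard genetic code, listed with the bases in the order T, C, A, G.
# A '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

# Map each of the 64 codons (TTT, TTC, TTA, ...) to a one-letter amino acid.
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate dna codon by codon starting at index 0.

    Any trailing one or two nucleotides that do not form a full codon
    are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE[dna[i:i+3]])
    return "".join(protein)

print(translate("ATGCATGCATAA"))   # MHA*
```

Note that several codons share an amino acid here (for example, all four codons beginning with GG map to G, glycine), which matches the redundancy described earlier.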


Reading Frames

Because it matters where you start grouping three nucleotides into a codon, there are actually six different reading frames: three in the forward direction, and three more because coding regions may lie on the reverse complementary strand.
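To make the six frames concrete, here is a minimal sketch that lists the codons of each frame as a Python list. The function names (`reverse_complement`, `split_codons`, `six_frames`) are hypothetical helpers chosen for this example, and the code assumes the input contains only the letters A, C, G, and T.

```python
def reverse_complement(dna):
    """Return the reverse complementary strand, read 5' to 3'."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(dna))

def split_codons(dna, offset):
    """Group dna into codons starting at the given offset (0, 1, or 2).

    Leftover nucleotides at the end that do not fill a codon are dropped.
    """
    return [dna[i:i+3] for i in range(offset, len(dna) - 2, 3)]

def six_frames(dna):
    """Return six lists of codons: three forward frames, then three reverse."""
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(split_codons(strand, offset))
    return frames

for frame in six_frames("ATGAAATAG"):
    print(frame)
```

For the nine-letter string above, the first forward frame is `['ATG', 'AAA', 'TAG']`, while the first frame of the reverse complement (`CTATTTCAT`) is `['CTA', 'TTT', 'CAT']`.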


In-class Example

How many ORFs can you find in the following strand, including possible ORFs on the implicit complementary strand?

TTACCTATGCATGCATAACTGA
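Once you have worked the example by hand, one way to check your answer is with a short search like the sketch below. It rests on an assumption that these notes do not state precisely: an ORF is taken to run from an ATG through the first stop codon (TAA, TGA, or TAG) in the same reading frame, and an ATG that falls inside an ORF already in progress is not reported separately. The function names are our own.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def reverse_complement(dna):
    """Return the reverse complementary strand, read 5' to 3'."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(dna))

def orfs_in_strand(strand):
    """Return ORFs found in the three reading frames of a single strand.

    Assumption: an ORF starts at ATG and ends at the first in-frame stop
    codon; an ATG with no later in-frame stop is not counted.
    """
    found = []
    for offset in range(3):
        start = None                      # index of the pending ATG, if any
        for i in range(offset, len(strand) - 2, 3):
            codon = strand[i:i+3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                found.append(strand[start:i+3])
                start = None              # resume looking for the next ATG
    return found

def find_orfs(dna):
    """Search the given strand and then its reverse complement."""
    return orfs_in_strand(dna) + orfs_in_strand(reverse_complement(dna))

for orf in find_orfs("TTACCTATGCATGCATAACTGA"):
    print(orf)
```

A different definition of an ORF (for example, one with a minimum length, or one that reports every nested ATG) would change the count, so treat the output as a check on your reasoning rather than the single correct answer.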


Michael Goldwasser
Last modified: Wednesday, 14 February 2018