Notes: DNA Transcription in MATLAB


What is DNA Transcription?

Transcription is the process by which a cell produces RNA from DNA, and it is an essential part of life. Computationally, we can treat DNA and RNA as sequences of bases: adenine (A), guanine (G), cytosine (C), and thymine (T) in DNA, with uracil (U) taking the place of thymine in RNA. While the biological process of transcription is quite complex, the high-level description of what happens in terms of bases is quite simple.

For an input DNA sequence composed of A, G, C, and T, the output is an RNA sequence composed of A, G, C, and U, where each base is replaced as follows:

A -> U
T -> A
G -> C
C -> G

For example:

 
Input  DNA: ATTGCGAGTC
Output RNA: UAACGCUCAG
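
The example above implies the base mapping A -> U, T -> A, G -> C, C -> G. One way to apply it in MATLAB is a lookup table indexed by character code. This is only a minimal sketch: the variable names (dna, lut, rna) are illustrative, and marking unmapped characters with '?' is an assumption of the sketch, not part of the notes.

```matlab
% Sketch: transcribe a DNA character vector to RNA with a lookup table.
dna = 'ATTGCGAGTC';

% Table indexed by character code; any character other than A, T, G, C
% maps to '?' (an assumption made for this sketch).
lut = repmat('?', 1, 256);
lut(double('ATGC')) = 'UACG';   % A->U, T->A, G->C, C->G

% Index the table with the character codes of the input sequence.
rna = lut(double(dna));
disp(rna)   % displays UAACGCUCAG
```

Indexing the table with the whole sequence at once avoids an explicit loop over bases, which matters once sequences get long.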

Goal today:

There is one last complication, however. In practice, DNA sequences are quite long: the human genome, for example, is over three billion base pairs. Genome data is therefore stored in files rather than entered or worked with by hand. Our goal today is to write a function that takes two input arguments: the name of an input file and the name of an output file.

Our function will open a DNA input file for reading, convert the DNA bases there to RNA according to the mapping above, and then write the RNA to the output file. If our function encounters an error (for example, if the input or output file can't be opened), it will print a descriptive error message and quit gracefully rather than crashing.
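
The steps just described might be sketched as follows. This is one possible outline, not the finished solution: the function name transcribeFile, the wording of the error messages, and the choice to pass characters other than A, T, G, C (such as newlines) through unchanged are all assumptions. The base mapping is the one implied by the example (A -> U, T -> A, G -> C, C -> G).

```matlab
function transcribeFile(inName, outName)
% TRANSCRIBEFILE  Sketch: read DNA from inName, write RNA to outName.
% The function name and message text are illustrative assumptions.

    % fopen returns -1 on failure; its second output describes the cause.
    [fin, msg] = fopen(inName, 'r');
    if fin == -1
        fprintf('Error: could not open input file "%s": %s\n', inName, msg);
        return;
    end

    % Read the whole file as raw bytes into a row character vector.
    dna = fread(fin, [1 Inf], 'uint8=>char');
    fclose(fin);

    % Identity table over byte values 0..255 (index = code + 1), then
    % override the four bases: A->U, T->A, G->C, C->G.
    lut = char(0:255);
    lut(double('ATGC') + 1) = 'UACG';
    rna = lut(double(dna) + 1);

    [fout, msg] = fopen(outName, 'w');
    if fout == -1
        fprintf('Error: could not open output file "%s": %s\n', outName, msg);
        return;
    end

    % Write the result back out as raw bytes.
    fwrite(fout, rna, 'uint8');
    fclose(fout);
end
```

A call such as transcribeFile('DNA.txt', 'DNA_transcribed.txt') would then produce the output file. Note that this sketch reads the entire input into memory at once; a genome-scale file would likely need to be processed in chunks instead.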

You can see an example input file here: DNA.txt

And the corresponding output file is here: DNA_transcribed.txt