
Python Control Structures

Introduction

It is typically much easier to "read" a new language than to "write" in one. For that reason, we will introduce a variety of Python control structures in this lecture, motivated by some biological examples. I don't expect you to master all of these techniques yet; we will revisit them later with the mindset of an author.


Additional Readings


Function definition

We can define our own functions for a variety of tasks. As a simple example, just to get used to the syntax, a Python implementation of a mathematical function such as $f(x) = 3x^2 - 2x + 5$ might be written as


def f(x):
    """Return the quantity 3x2 - 2x + 5"""
    return 3*x*x - 2*x + 5

Formally, the first of those three lines states that we are defining a new function, that we want the function to be named f, and that the function takes a single parameter that we will name x. It is also important that the first line ends with a colon, a symbol that is used in a variety of control structures in Python.

The second line (starting and ending with the triple-quotes) is entirely optional, but it is a Pythonic way to provide documentation for what this function does. From within the Python interpreter, someone can issue the command, help(f), to see this documentation.

The final line serves as the body of the function, which is the code that should be executed when the function is called. In this case, the body is a single statement but more generally the body can have many statements. Python uses indentation to define the scope of the function body. The special return command is used to indicate the value that should be sent back to the caller of the function at the conclusion.
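As a quick check of the definition above, the following sketch calls f on a few inputs and inspects its documentation string. The particular inputs are arbitrary choices made for illustration.

```python
def f(x):
    """Return the quantity 3x^2 - 2x + 5"""
    return 3*x*x - 2*x + 5

print(f(0))        # 5
print(f(2))        # 3*4 - 4 + 5 = 13
print(f(-1))       # 3 + 2 + 5 = 10
print(f.__doc__)   # the same text that help(f) displays
```

Each call sends one value in as x and receives back whatever the return statement computes.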

In terms of the mathematical computation, notice that the * operator is used to indicate multiplication. Also note that Python follows algebraic convention for order of operations, and thus the multiplications, such as 2*x, will be done before the addition and subtraction. The spacing in our computation was purely for visual appeal. The same result would be computed even if we had written the function as


def f(x):
    return 3*x*x-2*x+5

For those who want an advanced lesson, Python uses the ** operator for exponentiation (which has precedence even over multiplication), so this could have been written as


def f(x):
    return 3*x**2 - 2*x + 5

This is probably not that important for x^2, but it can be helpful for higher powers.
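To make the precedence of ** concrete, here is a small sketch that can be typed into the interpreter. The value x = 2 is an arbitrary choice for illustration.

```python
x = 2
print(3*x**2)      # 12: the exponent is applied first, then the multiplication
print((3*x)**2)    # 36: parentheses force the multiplication to happen first
print(-x**2)       # -4: ** also binds more tightly than the unary minus
print(x**10)       # 1024: shorter than writing x*x*...*x with ten factors
```

When in doubt about the order of evaluation, adding parentheses never hurts.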


GC content

While the first example had a single numeric parameter, a function can have parameters of any type (and multiple parameters, if desired).

As a biological example, we consider computing the GC-content of a DNA string, which we will view as a floating-point number between 0.0 and 1.0: the number of bases that are either G or C divided by the total number of bases. That is, we want the fraction

$\frac{\#\mbox{G} \ +\ \#\mbox{C}}{\#\mbox{bases}}$

Here is a function that computes the GC-content of a given DNA string.


def gc_content(dna):
    """Return the quantity representing the fraction of bases that are G or C"""
    return (dna.count('C') + dna.count('G')) / float(len(dna))

With this example, we wish to highlight two more lessons about numeric computations in Python.
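The two lessons are not spelled out here, so the sketch below illustrates two points that the function above depends on and that are plausibly the ones intended. First, the parentheses around the sum are required, because division binds more tightly than addition. Second, the explicit float(...) conversion matters in Python 2, where / applied to two integers rounds down to an integer; in Python 3, / always produces a floating-point result and // is the rounding-down form. The example strand is made up for illustration, and the code is written for Python 3.

```python
dna = 'CCGAT'                      # made-up example strand
c = dna.count('C')                 # 2
g = dna.count('G')                 # 1

print((c + g) / float(len(dna)))   # 0.6: the fraction we want
print(c + g / float(len(dna)))     # 2.2: without parentheses, only g is divided
print((c + g) // len(dna))         # 0: floor division, as Python 2's / gives for two integers
```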


Loops and conditionals

The above function for computing the GC-content of a strand of DNA relies on the fact that strings in Python support their own count function, which does the step of looping through the entire string and keeping track of the number of matches. Here we provide a more homespun approach that demonstrates both the use of a construct known as a for loop and an if statement.


def gc_content(dna):
    match = 0                               # number of GC matches we find
    for base in dna:
        if base == 'G' or base == 'C':
            match += 1                      # shorthand for match = match + 1
    return float(match) / len(dna)


There are several new techniques in this code, so let's unpack each part.
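One way to see what the loop does is to print its state on each pass. The following sketch uses the same logic as the function above with a print call added. The name gc_content_traced and the sample strand are chosen only for this illustration.

```python
def gc_content_traced(dna):
    """Same computation as gc_content, printing the running count at each step."""
    match = 0
    for base in dna:                        # base takes on each character in turn
        if base == 'G' or base == 'C':      # true for a G or for a C, false otherwise
            match += 1                      # runs only when the condition holds
        print(base, match)                  # runs on every pass, match or not
    return float(match) / len(dna)

print(gc_content_traced('CCGAT'))
# C 1
# C 2
# G 3
# A 3
# T 3
# 0.6
```

The indentation determines what belongs where: the increment is inside the if, the print is inside the for but outside the if, and the return runs once, after the loop has finished. Note that an empty string would cause a ZeroDivisionError at the return statement, since len(dna) would be 0.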


If/elif construct

The above example relied on a compound boolean condition to test whether the base was C or G. We could have stated that more distinctly as two separate tests using an extension of the if-statement syntax using an "elif" clause, which is short for "else if".


def gc_content(dna):
    match = 0                               # number of GC matches we find
    for base in dna:
        if base == 'C':
            match += 1
        elif base == 'G':
            match += 1
    return float(match) / len(dna)

In the body of the for loop, this logic is akin to the following process. If the base character is C, then the match count is increased. Otherwise it performs a second test to see if the base character is G, in which case it also increases the match count.

In this particular example, notice that the body of the if block and the body of the elif block are the same. In general, those two blocks could perform different actions (as in our next example). When the actions are the same in both, the original version with a single compound condition is preferred, because it makes it clearer that there is only a single action that might be taken, but one that could be triggered by either of two conditions.


Computing a reverse complement strand of DNA

In our next example, we demonstrate that the result of a function need not be numeric. We design a function, reverse_complement(dna), that computes and returns the reverse complement sequence for a single strand of dna. As an example, a call to reverse_complement('CCGAT') should produce the string 'ATCGG' with the A of the result being the complement of the final T of the original, the T of the result being the complement of the A at the second-to-last location of the original, and so forth.


def reverse_complement(dna):
    """Return a string representing the reverse complement of the single dna strand."""
    other = ''              # start with an empty string
    for base in dna:
        if base == 'G':
            other += 'C'
        elif base == 'C':
            other += 'G'
        elif base == 'A':
            other += 'T'
        else:               # presumably, only other possibility is T
            other += 'A'
    return other[::-1]      # reverse the result


What is new in this example is a more general form of conditional in which we can have a variety of possible cases. As noted earlier, the elif keyword is shorthand for the phrase "else if". So the logic within the for loop could be phrased in English as

"if the base is G, add a C; else if it is a C, add a G; else if it is an A, add a T; else add an A."

For the final case, we could have again used an elif to explicitly check for a base of T, but if we presume the original dna was legitimate, then we needn't bother checking because, by process of elimination, if it was not G, C, or A, it must be T.
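If we would rather not presume that the input is legitimate, one possible variant checks for T explicitly and treats anything else as an error. This is only a sketch: the name reverse_complement_strict is hypothetical, and the raise statement (which signals an error to the caller) is a technique not otherwise covered in this lecture.

```python
def reverse_complement_strict(dna):
    """Return the reverse complement, rejecting characters other than G, C, A, T."""
    other = ''
    for base in dna:
        if base == 'G':
            other += 'C'
        elif base == 'C':
            other += 'G'
        elif base == 'A':
            other += 'T'
        elif base == 'T':                    # explicit test instead of a bare else
            other += 'A'
        else:                                # anything else is not a DNA base
            raise ValueError('unexpected base: ' + repr(base))
    return other[::-1]

print(reverse_complement_strict('CCGAT'))    # ATCGG
```

With the original version, a stray character such as N would silently be treated as if it were T; here the same input stops the function with an error message instead.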

Also, notice that our overall process was to construct a new string, other, piece by piece while converting each base of the original strand. Finally, since the goal was to produce the reverse complement, the return statement uses slicing notation with a step of negative one to produce the reversed string.
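The slicing notation can be explored on its own. In this small sketch, the string 'GGCTA' is the unreversed complement of the earlier 'CCGAT' example.

```python
s = 'GGCTA'
print(s[::-1])     # ATCGG: every character, walking backward from the end
print(s[::2])      # GCA: every second character, walking forward
print(s[1:4])      # GCT: the characters at indices 1, 2, and 3
print(s)           # GGCTA: slicing builds a new string; s itself is unchanged
```

The general form is s[start:stop:step]; leaving start and stop blank means "the whole string", so a step of -1 alone yields the reversal.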


Dictionaries


# While we could define the following dictionary within the function body,
# we might instead construct it once, outside the function, since it is
# always the same.
complement = {'G': 'C', 'C': 'G', 'A': 'T', 'T': 'A'}

def reverse_complement(dna):
    """Return a string representing the reverse complement of the single dna strand."""
    other = ''              # start with an empty string
    for base in dna:
        other += complement[base]
    return other[::-1]      # reverse the result
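A dictionary maps each key to an associated value, so the chain of if/elif tests is replaced by a single lookup. The sketch below pokes at the dictionary directly before using it in the function; the base 'N' is a made-up example of a character that has no entry.

```python
complement = {'G': 'C', 'C': 'G', 'A': 'T', 'T': 'A'}

print(complement['G'])          # C: the value stored under the key 'G'
print('T' in complement)        # True: 'T' is one of the keys
print('N' in complement)        # False: there is no entry for 'N'

def reverse_complement(dna):
    """Return a string representing the reverse complement of the single dna strand."""
    other = ''
    for base in dna:
        other += complement[base]   # one lookup replaces the if/elif chain
    return other[::-1]

print(reverse_complement('CCGAT'))   # ATCGG
```

One difference from the if/elif version is worth noting: asking for complement['N'] raises a KeyError, so an illegitimate base stops the function rather than being treated as a T.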


Transcribing DNA to RNA

Exercise: Use techniques as above to define a function dna2rna(seq) that transcribes a string of DNA, such as 'CCGAT' to the corresponding RNA sequence to which it binds ('GGCUA' in this example).


Michael Goldwasser
Last modified: Friday, 02 February 2018