Course Home | Assignments | Data Sets/Tools | Python | Schedule | Git Submission | Tutoring

For Loops

A critical aspect of computing is the ability to repeat a sequence of instructions using a control structure known as a loop. In Python, there are several forms of loops and we will begin with a form known as a for loop.

Typical Form

At its core, a for loop is used to repeat a block of code once for each element of a sequence. For example, we might loop over each character of a string, each element of a list, or each line of a file. The basic syntax of a for loop appears as follows.

for variableName in sequence:
    one or more commands
    that are to be repeated
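As a minimal sketch of that syntax in use, the following loops over the characters of a string and then over the elements of a list. The names letters, ch, total, and length, along with the sample values, are our own choices for illustration.

```python
# Loop once for each character of a string.
letters = []
for ch in 'ACGT':
    letters.append(ch)

# Loop once for each element of a list.
total = 0
for length in [3, 5, 7]:
    total = total + length

print(letters)   # ['A', 'C', 'G', 'T']
print(total)     # 15
```

In both loops, the variable named after the keyword for is reassigned to the next element of the sequence before each repetition of the indented block.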
As a biological example, we consider computing the GC-content of a string named dna, which we will view as a floating-point number between 0.0 and 1.0 that is the ratio of the number of bases that are either G or C to the total number of bases. We can perform such a computation with the following code.
match = 0
for base in dna:
    if base == 'G' or base == 'C':
        match = match + 1
gcContent = match/len(dna)
The following demonstrates the execution of this code on a small example.
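One way to trace the computation by hand is to print the running count after each base. The sketch below assumes a made-up five-base string (not taken from the course data sets) and adds a print call inside the loop purely for illustration.

```python
dna = 'ACGGT'   # hypothetical sample string, chosen only for this trace
match = 0
for base in dna:
    if base == 'G' or base == 'C':
        match = match + 1
    print(base, match)          # current base and the running count so far
gcContent = match / len(dna)
print(gcContent)                # 0.6, since 3 of the 5 bases are G or C
```

The loop body runs five times, once per character, and match increases only on the iterations for which base is C or G.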

As an aside, we will note that the count method of the string class would allow us to compute the same value as

gcContent = (dna.count('G') + dna.count('C')) / len(dna)
although that implementation would in fact execute two implicit loops, one for each call to count.
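To check that the two approaches agree, we might wrap each in a function and compare the results. The function names gc_by_loop and gc_by_count and the sample string are hypothetical, introduced only for this comparison.

```python
def gc_by_loop(dna):
    """GC-content computed with an explicit for loop."""
    match = 0
    for base in dna:
        if base == 'G' or base == 'C':
            match = match + 1
    return match / len(dna)

def gc_by_count(dna):
    """GC-content computed with two calls to the string count method."""
    return (dna.count('G') + dna.count('C')) / len(dna)

sample = 'GGCATTAC'          # 4 of the 8 bases are G or C
print(gc_by_loop(sample))    # 0.5
print(gc_by_count(sample))   # 0.5
```

Both versions assume a non-empty string, since an empty one would lead to division by zero.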

Index-based Loop

While looping directly over the elements of a sequence is the preferred syntax due to its simplicity, there are some situations in which it does not suffice. Sometimes it is important during each repetition to know not only the element of the sequence but also the index at which it occurs within the sequence. Knowing the index makes it easier to examine elements near the current one, or to examine the corresponding element at the same position of a different sequence.

In such situations, the technical approach is to use an index-based version of a for loop in which we formally iterate over a range of integer indices (rather than iterating directly over the original sequence).

As a motivating example, consider the goal of counting the number of mismatched base pairs between a reference sequence and an individual's allele. Assuming variables reference and allele are strings of the same length, we could compute this as follows.

errors = 0
for k in range(len(reference)):
    if allele[k] != reference[k]:
        errors = errors + 1
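
To see the index at work, here is a small runnable sketch of the same loop on two made-up strings of equal length. The sample values and the extra list named positions are assumptions added for illustration. They are not part of the original code.

```python
reference = 'ACGTACGT'   # hypothetical reference sequence
allele    = 'ACGAACGG'   # hypothetical allele of the same length

errors = 0
positions = []           # indices at which the two strings disagree
for k in range(len(reference)):
    if allele[k] != reference[k]:
        errors = errors + 1
        positions.append(k)

print(errors)      # 2
print(positions)   # [3, 7]
```

Because k is an index rather than a character, the same value can be used to look up the corresponding position in both strings.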

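The key to this approach is the use of another built-in function named range, which produces a sequence of integers. There are three forms of range: range(stop) counts from 0 up to but not including stop; range(start, stop) counts from start up to but not including stop; and range(start, stop, step) does the same but advances by step rather than by 1.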

You should notice a great similarity between the parameters of a range and the parameters used when describing a slice of a string, although the syntax is different (with commas separating range parameters, and colons separating slice parameters).
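
The following sketch shows the one-, two-, and three-parameter calls to range side by side with analogous slices. Each range is wrapped in list so that its values can be printed, and the ten-base string is a made-up sample.

```python
# range with one, two, and three parameters
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 5)))       # [2, 3, 4]
print(list(range(0, 10, 3)))   # [0, 3, 6, 9]

# slices with the same start, stop, and step values
dna = 'ACGTACGTAC'             # hypothetical sample string
print(dna[2:5])                # GTA   (indices 2, 3, 4)
print(dna[0:10:3])             # ATGC  (indices 0, 3, 6, 9)
```

In both notations the stop value itself is excluded.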

Returning to the use of the range function for an index-based for loop, consider a dna string with length 5. Since len(dna) is 5, range(len(dna)) produces the integers 0, 1, 2, 3, 4. So as we loop through that range, k takes on each valid index into the string, which we might use as dna[k].

However, we could alter the range for different purposes. For example, we might loop over range(0, len(dna), 3) if examining the codons of a coding sequence.
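
As a minimal sketch of such a codon loop, assume a made-up coding sequence whose length is a multiple of three. The list named codons and the use of a slice to extract each codon are our own illustrative choices.

```python
dna = 'ATGGCCTAA'   # hypothetical coding sequence of length 9

codons = []
for k in range(0, len(dna), 3):    # k takes the values 0, 3, 6
    codons.append(dna[k:k+3])      # the three bases starting at index k

print(codons)   # ['ATG', 'GCC', 'TAA']
```

If the length were not a multiple of three, the final slice would simply be shorter than three bases.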

As another example, we consider counting the number of times a dna base is immediately followed by the same base. We can implement that count as follows:

count = 0
for k in range(len(dna)-1):
    if dna[k] == dna[k+1]:
        count += 1
Note well that in this example, we chose to loop over range(len(dna)-1) rather than range(len(dna)). As a sanity check, assume that dna had length 5. Then we only need a loop that executes 4 times to compare the four pairs of neighbors. Our loop would be executing only over the sequence [0, 1, 2, 3], during which we end up comparing dna[0] to dna[1], dna[1] to dna[2], dna[2] to dna[3], and dna[3] to dna[4]. Had we looped over range(len(dna)), the final pass with k equal to 4 would evaluate dna[5], which is not a valid index.
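
The sketch below runs the neighbor count on a made-up string of length 5, and then shows what goes wrong if the loop is allowed to reach the last index. The sample string and the try/except wrapper are assumptions added for illustration.

```python
dna = 'AAGTT'   # hypothetical sample of length 5

count = 0
for k in range(len(dna) - 1):      # k takes the values 0, 1, 2, 3
    if dna[k] == dna[k+1]:
        count += 1
print(count)   # 2  (the pairs AA and TT)

# Looping over every index instead fails on the last pass.
try:
    for k in range(len(dna)):
        dna[k] == dna[k+1]         # dna[5] does not exist when k is 4
except IndexError:
    print('k =', k, 'has no right-hand neighbor')
```

The second loop raises an IndexError when k is 4, which is why the original code stops one index early.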


Michael Goldwasser
Last modified: Sunday, 27 January 2019