Course Home | Assignments | Computing Resources | Data Sets | Lab Hours/Tutoring | Python | Schedule | Submit

Lists and Index-based For Loops

Introduction


Additional Readings


Lists

While strings are convenient for representing a sequence of characters, Python allows for representation of sequences of arbitrary types of data as well. The primary structure for such a sequence is a Python list.
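As a small sketch of what such lists look like (the variable names and values below are illustrative only, not taken from the course materials), a list is written with square brackets and supports the same len, indexing, and for-loop syntax we used with strings:

```python
# Hypothetical example values, chosen only for illustration.
bases = ['A', 'C', 'G', 'T']    # a list of strings
lengths = [12, 7, 30]           # a list of integers
mixed = ['ACGT', 4, 0.25]       # elements need not share a type

print(len(bases))     # number of elements in the list
print(lengths[0])     # indexing works as it does for strings
for item in mixed:    # a for loop visits each element in order
    print(item)
```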


Constructing lists with the range function

Later in the course, we will see that there are many ways to construct and manipulate lists. For today, we focus specifically on Python's range function, which can be used to create regular sequences of integers.

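There are three forms of range: range(stop), range(start, stop), and range(start, stop, step). In each form the stop value itself is excluded, and start defaults to 0 and step to 1 when omitted.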

You should notice a great similarity between the parameters of range and the parameters used when describing a slice of a string, although the syntax differs: commas separate the parameters of range, while colons separate the arguments of a slice.
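The following sketch exercises each way of calling range and compares one of them to a slice. The sample string dna is a hypothetical value, and list(...) is wrapped around each call only so that the values are displayed explicitly (in Python 3, range by itself returns a lazy range object rather than a list).

```python
print(list(range(5)))          # stop only:          [0, 1, 2, 3, 4]
print(list(range(2, 7)))       # start, stop:        [2, 3, 4, 5, 6]
print(list(range(1, 10, 3)))   # start, stop, step:  [1, 4, 7]
print(list(range(5, 0, -1)))   # negative step:      [5, 4, 3, 2, 1]

dna = 'ACGTACGT'               # hypothetical sample string
print(dna[1:7:2])              # slice with colons:  CTC
for k in range(1, 7, 2):       # range with commas visits indices 1, 3, 5
    print(k, dna[k])
```

In both the slice and the range, the start is included and the stop is excluded.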


Index-based loops

The reason the range function is so important in Python is that it allows for a technique known as an index-based loop.

We have already seen a for loop that iterates through the characters of a string. While this is quite intuitive, its limitation is that within one pass of such a loop you have a name for the current element, but no information about where that element sits relative to the others.

Just as a loop can iterate over the characters of a string, it can iterate through the elements of a list (any list). When we use range to make a list, we can deliberately choose a range whose integers are exactly the indices of some other sequence. As a simple example, a direct for loop on a string might appear as


for base in dna:
    print(base)

A corresponding index-based loop instead defines a loop variable that is an integer index, which can then be used to index into the original sequence. Behavior equivalent to the previous loop can be expressed as

for k in range(len(dna)):
    print(dna[k])

For example, if the original dna string has length 5, then len(dna) is 5 and thus range(len(dna)) produces the sequence [0, 1, 2, 3, 4]. So as we loop through that range, k takes on each legitimate index into the string, which we can use as dna[k].
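To make this concrete, here is a minimal sketch using a hypothetical five-character string (the value 'GATTC' is an assumption chosen for illustration). Printing k alongside dna[k] shows that the loop variable is the position and dna[k] is the element at that position.

```python
dna = 'GATTC'                  # hypothetical sample string of length 5

print(len(dna))                # 5
print(list(range(len(dna))))   # [0, 1, 2, 3, 4]

for k in range(len(dna)):
    print(k, dna[k])           # index first, then the base at that index
```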

Clearly, for the above example, the more direct for loop is cleaner and more intuitive. We prefer the direct loop when we simply want to process each element once, without any need for context. But as we saw in lab02, we may sometimes want to examine the neighborhood around an element, and knowing the element's index makes that possible. For example, a warmup question on lab02 asked you to count how many times a dna base is immediately followed by the same base. We can implement that count as follows:


count = 0
for k in range(len(dna)-1):
    if dna[k] == dna[k+1]:
        count += 1

Note well that in this example, we chose to loop over range(len(dna)-1) rather than range(len(dna)). As a sanity check, assume that dna has length 5. Then we only need a loop that executes 4 times to compare the four pairs of neighbors. Our loop executes only over the sequence [0, 1, 2, 3], during which we end up comparing dna[0] to dna[1], dna[1] to dna[2], dna[2] to dna[3], and dna[3] to dna[4]. Had we looped over range(len(dna)) instead, the final pass would have k equal to 4 and the expression dna[k+1] would refer to dna[5], which does not exist and causes an IndexError.
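As a way to check this reasoning, the same loop can be wrapped in a function and run on a few short strings. The function name count_adjacent_repeats and the sample strings are hypothetical, introduced here only for illustration.

```python
def count_adjacent_repeats(dna):
    """Return how many positions k satisfy dna[k] == dna[k+1]."""
    count = 0
    for k in range(len(dna) - 1):     # stop one short so dna[k+1] stays valid
        if dna[k] == dna[k + 1]:
            count += 1
    return count

print(count_adjacent_repeats('GATTC'))   # 1 (only the pair TT)
print(count_adjacent_repeats('AAAA'))    # 3 (pairs at indices 0, 1, and 2)
print(count_adjacent_repeats('A'))       # 0 (range(0) is empty)
print(count_adjacent_repeats(''))        # 0 (range(-1) is also empty)
```

Note that the loop bound handles very short strings without special cases, since a range with a stop value of 0 or less produces no indices at all.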


Michael Goldwasser
Last modified: Monday, 05 February 2018