Lists and Index-based For Loops

Introduction

Additional Readings

Lists and the range function are introduced in sections 0.11 and 0.12 of Chapter 0.
Chapter 2 of the text introduces use of the range function to perform index-based for loops.

Lists

While strings are convenient for representing a sequence of characters, Python allows for representation of sequences of arbitrary types of data as well. The primary structure for such a sequence is a Python list.

Constructing lists with the `range` function

Later, we will see that there are many ways to construct and manipulate lists. For today, we'll focus specifically on Python's range function, which can be used to create regular sequences of integers.

There are three forms of range.

The first version uses a single parameter. The syntax range(k) produces the sequence of numbers [0, 1, 2, ..., k-1]. As is the case with many Python conventions, notice that the range starts at zero and goes up to but not including the given stop value. (Technically, in Python 2 range produces an actual list, while in Python 3 range produces a range object, which represents the sequence without storing every value. A range object is not a list, but it can be indexed and looped over like one, and list(range(k)) converts it to an actual list.)

We will see use of range(k) a lot because those integers from 0 to k-1 are precisely the indices of a list of k items.

The second version uses two parameters, which give a starting value and stopping value for the range. Specifically, a syntax such as range(j, k) produces the list of integers [j, j+1, j+2, ..., k-1]. Of course this range will be empty if j ≥ k.

The third version uses three parameters, with the third being the step size for the sequence. For example, we could get some even numbers with range(0, 10, 2) which produces [0, 2, 4, 6, 8].

A negative step size can be used to get a decreasing sequence, such as range(10, 5, -1), which produces the sequence [10, 9, 8, 7, 6]. The range still begins at the starting value and continues toward, but does not include, the designated stop value.
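
The forms of range described above can be checked with a short sketch. It assumes Python 3, where list() is needed to display the values of a range; the variable names are only for illustration.

```python
# Wrap each range in list() so the values can be inspected directly.
one_param = list(range(5))            # stop only: starts at 0
two_params = list(range(3, 8))        # start and stop
three_params = list(range(0, 10, 2))  # start, stop, and step
decreasing = list(range(10, 5, -1))   # negative step counts downward
empty = list(range(8, 3))             # start >= stop with a positive step

print(one_param)     # [0, 1, 2, 3, 4]
print(two_params)    # [3, 4, 5, 6, 7]
print(three_params)  # [0, 2, 4, 6, 8]
print(decreasing)    # [10, 9, 8, 7, 6]
print(empty)         # []
```

In every case the stop value itself is excluded from the result.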

You should notice a great similarity between the use of parameters for a range and the use of parameters when describing slices of a string, although the syntax is different (with commas separating range parameters, and colons separating those arguments for a slice).
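
As a small illustration of that similarity, the sketch below applies the same start, stop, and step values once as a slice and once as a range of indices. The sample string is an assumption chosen for the example.

```python
dna = "ACGTACGTAC"  # sample string, chosen only for illustration

# Slice syntax: colons separate start, stop, and step.
sliced = dna[2:8:2]

# Range syntax: commas separate the same three values.
indices = list(range(2, 8, 2))
picked = ""
for k in range(2, 8, 2):
    picked += dna[k]

print(indices)  # [2, 4, 6]
print(sliced)   # GAG
print(picked)   # GAG
```

Both approaches select the characters at indices 2, 4, and 6.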

Index-based loops

The reason the range function is so important in Python is that it allows for a technique known as an index-based loop.

We have already seen a for loop to iterate through the characters of a string. While this is quite intuitive, a problem is that when you are in one pass of such a loop, you have a name for the current element but you do not have any context for where that element is relative to others.

Just as a loop can iterate over characters of a string, it can be used to iterate through elements of a list (any list). When we use range to make a list, we can define a range to intentionally correspond to the integers that are indices of some other sequence. As a simple example, a direct for loop on a string might appear as

for base in dna:
    print(base)

A corresponding index-based loop instead formally defines a loop variable which is an integer index that can subsequently be used to index into the original sequence. An equivalent behavior to the previous loop might be expressed as

for k in range(len(dna)):
    print(dna[k])

For example, if the original dna string had length 5, then len(dna) is 5 and thus range(len(dna)) produces the sequence [0, 1, 2, 3, 4]. So as we loop through that range, we get each legitimate index into the string, which we can use as dna[k].
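
To see that the two loop styles visit the same elements, the following sketch collects the results of each into a list and compares them. The sample string and the names direct and indexed are assumptions made for the example.

```python
dna = "GATTC"  # sample string of length 5

# Direct loop: each pass names the current element.
direct = []
for base in dna:
    direct.append(base)

# Index-based loop: each pass names the current index.
indexed = []
for k in range(len(dna)):
    indexed.append(dna[k])

print(list(range(len(dna))))  # [0, 1, 2, 3, 4]
print(direct == indexed)      # True
```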

Clearly, for the above example, the more direct for loop is cleaner and more intuitive. We prefer that direct loop when we simply want to process each element once, without any need for context. But as we saw in lab02, we may sometimes want to better understand the neighborhood around an element, and knowledge of an element's index helps. For example, we had a warmup question on lab02 that asked us to count how many times a dna base is immediately followed by the same base. We can implement that count as follows:

count = 0
for k in range(len(dna)-1):
    if dna[k] == dna[k+1]:
        count += 1

Note well that in this example, we chose to loop over range(len(dna)-1) rather than range(len(dna)). As a sanity check, assume that dna had length 5. Then we only need a loop that executes 4 times to compare the four pairs of neighbors. Our loop executes only over the sequence [0, 1, 2, 3], during which we end up comparing dna[0] to dna[1], dna[1] to dna[2], dna[2] to dna[3], and dna[3] to dna[4]. Had we looped over range(len(dna)) instead, the final pass with k equal to 4 would try to access dna[5], which is out of range and raises an IndexError.
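
One way to try the counting loop on several inputs is to wrap it in a function. The function name count_adjacent_repeats and the sample strings are hypothetical, introduced here only for illustration.

```python
def count_adjacent_repeats(dna):
    """Count positions where a base is immediately followed by the same base."""
    count = 0
    # Stop one index early so that dna[k+1] is always a valid index.
    for k in range(len(dna) - 1):
        if dna[k] == dna[k + 1]:
            count += 1
    return count

print(count_adjacent_repeats("AACCG"))  # 2
print(count_adjacent_repeats("AAAA"))   # 3
print(count_adjacent_repeats("A"))      # 0
print(count_adjacent_repeats(""))       # 0
```

For a string of length 0 or 1, the range is empty, so the loop body never runs and the count stays at 0.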

Michael Goldwasser