Iteration

Iteration

Source: this section is heavily based on Chapter 7 of [ThinkCS].

Until now we have seen that the while statement can be used for repeatedly executing lines of code. Repeated execution of a set of statements is called iteration. Because iteration is so common, we are going to dive deeper in this topic in this chapter. Among others, we are going to introduce the for statement --- another way to have your program do iteration, useful in slightly different circumstances.

The while loop

Until now, we have seen the while statement for repeating code:

runs_scored = 0
while runs_scored < 5:
    print(runs_scored)
    runs_scored = runs_scored + 1

In this code, we first initialize the variable runs_scored, and then repeatedly increment this variable, until a stop condition is reached. We print the values 0, 1, 2, 3 and 4; then we stop the repetition when runs_scored reaches the value 5. We exploit here the fact that the assignment statement in the loop modifies the value of the variable runs_scored in the memory of the computer.

Note that overall, we repeat the 3rd line in this program 5 times: 1 time when runs_scored==0, one time when runs_scored==1, ..., and 1 time when runs_scored==4 -- 5 in total.

The while statement can always be used to repeat statements. However, it is not always the shortest approach. Python offers an alternative solution: the for loop.

The for loop

In general, we often wish to repeat statements by giving values to one variable. In our earlier example, we repeat the line print(runs_scored) for the values runs_scored==0, runs_scored==1, etc.

The for notation of Python allows us to write this more shortly as follows:

for runs_scored in [0,1,2,3,4]:
    print(runs_scored)

In this code, [0,1,2,3,4] is a list that specifies the values that are going to be assigned to the variable runs_scored; the block of code in the for loop is executed for each value in this list.

While in this example our lists consists of numbers, this need not be the case. We can also iterate over other types. For instance:

for f in ["Joe", "Zoe", "Brad", "Angelina", "Zuki", "Thandi", "Paris"]:
    invitation = "Hi " + f + ".  Please come to my party on Saturday!"
    print(invitation)

Running through all the items in a list is called traversing the list, or traversal.

When we run this, the output looks like this:

Hi Joe.  Please come to my party on Saturday!
Hi Zoe.  Please come to my party on Saturday!
Hi Brad.  Please come to my party on Saturday!
Hi Angelina.  Please come to my party on Saturday!
Hi Zuki.  Please come to my party on Saturday!
Hi Thandi.  Please come to my party on Saturday!
Hi Paris.  Please come to my party on Saturday!
  • The variable f in the for statement at line 1 is called the loop variable. We could have chosen any other variable name instead.
  • Lines 2 and 3 are the loop body. The loop body is always indented. The indentation determines exactly what statements are "in the body of the loop".
  • On each iteration or pass of the loop, we consider one of the elements in the list, and execute the body of the list for that value.
  • At the end of each execution of the body of the loop, Python returns to the for statement, to assign the next value to f.

As the program executes, the interpreter always keeps track of which statement is about to be executed. We call this the control flow, of the flow of execution of the program. When humans execute programs, they often use their finger to point to each statement in turn. So we could think of control flow as "Python's moving finger".

Flowchart of a for loop

Control flow is often easy to visualize and understand if we draw a flowchart. This shows the exact steps and logic of how the for statement executes.

/syllabus/info1-theory/assets/flowchart_for.png

Let us write code now to sum up all the elements in a list of numbers [1.25, 2.5, 1.75]. Do this by hand first, and try to isolate exactly what steps you take. You'll find you need to keep some "running total" of the sum so far, either on a piece of paper, in your head, or in your calculator. Remembering things from one step to the next is precisely why we have variables in a program: so we'll need some variable to remember the "running total". It should be initialized with a value of zero, and then we need to traverse the items in the list. For each item, we'll want to update the running total by adding the next number to it.

""" Sum all the numbers in a list, and print the total. """
running_total = 0
for x in [1.25, 2.5, 1.75]:
    running_total = running_total + x
print(running_total)

The range(n) notation

In the code above, we wrote

for runs_scored in [0,1,2,3,4]:
    print(runs_scored)

It is cumbersome to have to write all the numbers explicitly. Fortunately, Python has a shorthand notation for this, which is the following:

for runs_scored in range(5):
    print(runs_scored)

This code will produce exactly the same output as the previous code. Simple, isn't it?

Actually, not really; most beginning programmers have a very hard time using this notation correctly.

The issue is that we write range(5) with the value 5 to create a list that starts at 0 and ends at 4. There is a good reason for this: the list specified by range(5) contains 5 elements. These 5 elements are 0, 1, 2, 3 and 4.

Consequently, the following code:

for runs_scored in range(5):
    print(runs_scored + 1)

prints the values 1, 2, 3, 4, and 5.

If you are confused by this, you are not alone; you are almost certain to make mistakes with this. However, it is extremely important to understand this correctly for your programs to work correctly.

The Collatz 3n + 1 sequence

As we have now seen, we can iterate using both a for and a while statement. How to make the choice between the two?

Let's look at a sequence that has fascinated and foxed mathematicians for many years. They still cannot answer even quite simple questions about this.

The "computational rule" for creating the sequence is to start from some given n, and to generate the next term of the sequence from n, either by halving n, (whenever n is even), or else by multiplying it by three and adding 1. The sequence terminates when n reaches 1.

This Python function captures that algorithm, where we calculate the sequence for n = 19:

n = 19
""" Print the 3n+1 sequence from n,
    terminating when it reaches 1.
"""
while n != 1:
    print(n, end=", ")
    if n % 2 == 0:        # n is even
        n = n // 2
    else:                 # n is odd
        n = n * 3 + 1
print(n, end=".\n")

Notice first that the print function on line 6 has an extra argument end=", ". This tells the print function to follow the printed string with whatever the programmer chooses (in this case, a comma followed by a space), instead of ending the line. So each time something is printed in the loop, it is printed on the same output line, with the numbers separated by commas. The call to print(n, end=".\n") at line 11 after the loop terminates will then print the final value of n followed by a period and a newline character. (You'll cover the \n (newline character) later).

The condition for continuing with this loop is n != 1, so the loop will continue running until it reaches its termination condition, (i.e. n == 1).

Each time through the loop, the program outputs the value of n and then checks whether it is even or odd. If it is even, the value of n is divided by 2 using integer division. If it is odd, the value is replaced by n * 3 + 1. Here is the output of this program:

19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13,
                    40, 20, 10, 5, 16, 8, 4, 2, 1.

Since n sometimes increases and sometimes decreases, there is no obvious proof that n will ever reach 1, or that the program terminates. For some particular values of n, we can prove termination. For example, if the starting value is a power of two, then the value of n will be even each time through the loop until it reaches 1. The previous example ends with such a sequence, starting with 16.

See if you can find a small starting number that needs more than a hundred steps before it terminates.

Particular values aside, the interesting question was first posed by a German mathematician called Lothar Collatz: the Collatz conjecture (also known as the 3n + 1 conjecture), is that this sequence terminates for all positive values of n. So far, no one has been able to prove it or disprove it! (A conjecture is a statement that might be true, but nobody knows for sure.)

Think carefully about what would be needed for a proof or disproof of the conjecture "All positive integers will eventually converge to 1 using the Collatz rules". With fast computers we have been able to test every integer up to very large values, and so far, they have all eventually ended up at 1. But who knows? Perhaps there is some as-yet untested number which does not reduce to 1.

You'll notice that if you don't stop when you reach 1, the sequence gets into its own cyclic loop: 1, 4, 2, 1, 4, 2, 1, 4 ... So one possibility is that there might be other cycles that we just haven't found yet.

Wikipedia has an informative article about the Collatz conjecture. The sequence also goes under other names (Hailstone sequence, Wonderous numbers, etc.), and you'll find out just how many integers have already been tested by computer, and found to converge!

Choosing between for and while

Use a for loop if you know, before you start looping, the maximum number of times that you'll need to execute the body. For example, if you're traversing a list of elements, you know that the maximum number of loop iterations you can possibly need is "all the elements in the list". Or if you need to print 12 numbers, we know right away how many times the loop will need to run.

So any problem like "iterate this weather model for 1000 cycles", or "search this list of words", "find all prime numbers up to 10000" suggest that a for loop is best.

By contrast, if you are required to repeat some computation until some condition is met, and you cannot calculate in advance when (of if) this will happen, as we did in this 3n + 1 problem, you'll need a while loop.

We call the first case definite iteration --- we know ahead of time some definite bounds for what is needed. The latter case is called indefinite iteration --- we're not sure how many iterations we'll need --- we cannot even establish an upper bound!

Tracing a program

To write effective computer programs, and to build a good conceptual model of program execution, a programmer needs to develop the ability to trace the execution of a computer program. Tracing involves becoming the computer and following the flow of execution through a sample program run, recording the state of all variables and any output the program generates after each instruction is executed.

To understand this process, let's trace the program from the previous section, with starting value n=3 instead of n=19. At the start of the trace, we have a variable, n (the parameter), with an initial value of 3. Since 3 is not equal to 1, the while loop body is executed. 3 is printed and 3 % 2 == 0 is evaluated. Since it evaluates to False, the else branch is executed and 3 * 3 + 1 is evaluated and assigned to n.

To keep track of all this as you hand trace a program, make a column heading on a piece of paper for each variable created as the program runs and another one for output. Our trace so far would look something like this:

n               output printed so far
--              ---------------------
3               3,
10

Since 10 != 1 evaluates to True, the loop body is again executed, and 10 is printed. 10 % 2 == 0 is true, so the if branch is executed and n becomes 5. By the end of the trace we have:

n               output printed so far
--              ---------------------
3               3,
10              3, 10,
5               3, 10, 5,
16              3, 10, 5, 16,
8               3, 10, 5, 16, 8,
4               3, 10, 5, 16, 8, 4,
2               3, 10, 5, 16, 8, 4, 2,
1               3, 10, 5, 16, 8, 4, 2, 1.

Tracing can be a bit tedious and error prone (that's why we get computers to do this stuff in the first place!), but it is an essential skill for a programmer to have. From this trace we can learn a lot about the way our code works. We can observe that as soon as n becomes a power of 2, for example, the program will require log2(n) executions of the loop body to complete. We can also see that the final 1 will not be printed as output within the body of the loop, which is why we put the special print function at the end.

Tracing a program is, of course, related to single-stepping through your code and being able to inspect the variables. Using the computer to single-step for you is less error prone and more convenient. Also, as your programs get more complex, they might execute many millions of steps before they get to the code that you're really interested in, so manual tracing becomes impossible. Being able to set a breakpoint where you need one is far more powerful. So we strongly encourage you to invest time in learning using to use your programming environment (Thonny, in this course) to full effect.

There are also some great visualization tools becoming available to help you trace and understand small fragments of Python code. The one we recommend is at http://www.pythontutor.com/visualize.html .

Counting digits

Let us consider another example where a while loop is necessary instead of a for loop.

The following code counts the number of decimal digits in a positive integer n, specified by the user:

n = int(input("Give a number:" ))
count = 0
while n != 0:
    count = count + 1
    n = n // 10
print(count)

If the user enters 710, the code will print 3. Trace the execution of this function call (perhaps using the single step function in Thonny, or the Python visualizer, or on some paper) to convince yourself that it works.

This code demonstrates an important pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time the loop body is executed. When the loop exits, count contains the result --- the total number of times the loop body was executed, which is the same as the number of digits.

Note that even though we have a counter here, we cannot use a for loop! In a for loop, we need to specify in advance all the elements we are going to iterate over, but in this case we don't know the end value of count before the loop is executed!

If we wanted to only count digits that are either 0 or 5, adding a conditional before incrementing the counter will do the trick:

n = int(input("Give a number: "))
count = 0
while n > 0:
    digit = n % 10
    if digit == 0 or digit == 5:
        count = count + 1
    n = n // 10
print(count)

Confirm that when we enter 1055030250, as result 7 is printed. What happens when the user enters 0?

Abbreviated assignment

Incrementing a variable is so common that Python provides an abbreviated syntax for it:

>>> count = 0
>>> count += 1
>>> count
1
>>> count += 1
>>> count
2

count += 1 is an abreviation for count = count + 1 . We pronounce the operator as "plus-equals". The increment value does not have to be 1:

>>> n = 2
>>> n += 5
>>> n
7

There are similar abbreviations for -=, *=, /=, //= and %=:

>>> n = 2
>>> n *= 5
>>> n
10
>>> n -= 4
>>> n
6
>>> n //= 2
>>> n
3
>>> n %= 2
>>> n
1

These abbreviations are often used in while loops:

n = int(input("Give a number: "))
while n > 0:
    digit = n % 10
    if digit == 0 or digit == 5:
        count += 1
    n //= 10
print(count)

The break statement

Until now, we have seen that a while loop continues until its condition is no longer True, and a for loop considers all values in a list. Sometimes, however, we wish to stop earlier.

The break statement is used to immediately leave the body of its loop. The next statement to be executed is the first one after the body:

for i in [12, 16, 17, 24, 29]:
    if i % 2 == 1:  # If the number is odd
       break        #  ... immediately exit the loop
    print(i)
print("done")

This prints:

12
16
done

The pre-test loop --- standard loop behaviour

for and while loops do their tests at the start, before executing any part of the body. They're called pre-test loops, because the test happens before (pre) the body. break and return are our tools for adapting this standard behaviour.

/syllabus/info1-theory/assets/pre_test_loop.png

Other flavours of loops

Sometimes we'd like to have the middle-test loop with the exit test in the middle of the body, rather than at the beginning or at the end. Or a post-test loop that puts its exit test as the last thing in the body. Other languages have different syntax and keywords for these different flavours, but Python just uses a combination of while and if condition: break to get the job done.

A typical example is a problem where the user has to input numbers to be summed. To indicate that there are no more inputs, the user enters a special value, often the value -1, or the empty string. This needs a middle-exit loop pattern: input the next number, then test whether to exit, or else process the number:

The middle-test loop flowchart

/syllabus/info1-theory/assets/mid_test_loop.png
total = 0
while True:
    response = input("Enter the next number. (Leave blank to end)")
    if response == "":
        break
    total += int(response)
print("The total of the numbers you entered is ", total)

Convince yourself that this fits the middle-exit loop flowchart: line 3 does some useful work, lines 4 and 5 can exit the loop, and if they don't line 6 does more useful work before the next iteration starts.

The while bool-expr: uses the Boolean expression to determine whether to iterate again. True is a trivial Boolean expression, so while True: means always do the loop body again. This is a language idiom --- a convention that most programmers will recognize immediately. Since the expression on line 2 will never terminate the loop, (it is a dummy test) the programmer must arrange to break (or return) out of the loop body elsewhere, in some other way (i.e. in lines 4 and 5 in this sample). A clever compiler or interpreter will understand that line 2 is a fake test that must always succeed, so it won't even generate a test, and our flowchart never even put the diamond-shape dummy test box at the top of the loop!

Similarly, by just moving the if condition: break to the end of the loop body we create a pattern for a post-test loop. Post-test loops are used when you want to be sure that the loop body always executes at least once (because the first test only happens at the end of the execution of the first loop body). This is useful, for example, if we want to play an interactive game against the user --- we always want to play at least one game:

while True:
    play_the_game_once()
    response = input("Play again? (yes or no)")
    if response != "yes":
        break
print("Goodbye!")

Hint: Think about where you want the exit test to happen

Once you've recognized that you need a loop to repeat something, think about its terminating condition --- when will I want to stop iterating? Then figure out whether you need to do the test before starting the first (and every other) iteration, or at the end of the first (and every other) iteration, or perhaps in the middle of each iteration. Interactive programs that require input from the user or read from files often need to exit their loops in the middle or at the end of an iteration, when it becomes clear that there is no more data to process, or the user doesn't want to play our game anymore.

An example

The following program implements a simple guessing game:

import random                   # We cover random numbers in the
rng = random.Random()           #  modules chapter, so peek ahead.
number = rng.randrange(1, 1000) # Get random number between [1 and 1000).

guesses = 0
msg = ""

while True:
    guess = int(input(msg + "\nGuess my number between 1 and 1000: "))
    guesses += 1
    if guess > number:
        msg += str(guess) + " is too high.\n"
    elif guess < number:
        msg += str(guess) + " is too low.\n"
    else:
        break

input("\n\nGreat, you got it in {0} guesses!\n\n".format(guesses))

This program makes use of the mathematical law of trichotomy (given real numbers a and b, exactly one of these three must be true: a > b, a < b, or a == b).

At line 18 there is a call to the input function, but we don't do anything with the result, not even assign it to a variable. This is legal in Python. Here it has the effect of popping up the input dialog window and waiting for the user to respond before the program terminates. Programmers often use the trick of doing some extra input at the end of a script, just to keep the window open.

Also notice the use of the msg variable, initially an empty string, on lines 6, 12 and 14. Each time through the loop we extend the message being displayed: this allows us to display the program's feedback right at the same place as we're asking for the next guess.

The continue statement

This is a control flow statement that causes the program to immediately skip the processing of the rest of the body of the loop, for the current iteration. But the loop still carries on running for its remaining iterations:

for i in [12, 16, 17, 24, 29, 30]:
    if i % 2 == 1:      # If the number is odd
       continue         # Don't process it
    print(i)
print("done")

This prints:

12
16
24
30
done

Newton's method for finding square roots

Loops are often used in programs that compute numerical results by starting with an approximate answer and iteratively improving it.

For example, before we had calculators or computers, people needed to calculate square roots manually. Newton used a particularly good method (there is some evidence that this method was known many years before). Suppose that you want to know the square root of n. If you start with almost any approximation, you can compute a better approximation (closer to the actual answer) with the following formula:

better = (approx + n/approx)/2

Repeat this calculation a few times using your calculator. Can you see why each iteration brings your estimate a little closer? One of the amazing properties of this particular algorithm is how quickly it converges to an accurate answer --- a great advantage for doing it manually.

By using a loop and repeating this formula until the better approximation gets close enough to the previous one, we can write a function for computing the square root. (In fact, this is how your calculator finds square roots --- it may have a slightly different formula and method, but it is also based on repeatedly improving its guesses.)

This is an example of an indefinite iteration problem: we cannot predict in advance how many times we'll want to improve our guess --- we just want to keep getting closer and closer. Our stopping condition for the loop will be when our old guess and our improved guess are "close enough" to each other.

Ideally, we'd like the old and new guess to be exactly equal to each other when we stop. But exact equality is a tricky notion in computer arithmetic when real numbers are involved. Because real numbers are not represented absolutely accurately (after all, a number like pi or the square root of two has an infinite number of decimal places because it is irrational), we need to formulate the stopping test for the loop by asking "is a close enough to b"? This stopping condition can be coded like this:

if abs(a-b) < 0.001:  # Make this smaller for better accuracy
      break

Notice that we take the absolute value of the difference between a and b!

This problem is also a good example of when a middle-exit loop is appropriate:

n = float(input("Provide a floating point number: "))
approx = n/2.0     # Start with some or other guess at the answer
while True:
    better = (approx + n/approx)/2.0
    if abs(approx - better) < 0.001:
        return better
    approx = better

The output for input 25.0 is:

5.00000000002

See if you can improve the approximations by changing the stopping condition. Also, step through the algorithm (perhaps by hand, using your calculator) to see how many iterations were needed before it achieved this level of accuracy for sqrt(25).

Algorithms

Newton's method is an example of an algorithm: it is a mechanical process for solving a category of problems (in this case, computing square roots).

Some kinds of knowledge are not algorithmic. For example, learning dates from history or your multiplication tables involves memorization of specific solutions.

But the techniques you learned for addition with carrying, subtraction with borrowing, and long division are all algorithms. Or if you are an avid Sudoku puzzle solver, you might have some specific set of steps that you always follow.

One of the characteristics of algorithms is that they do not require any intelligence to carry out. They are mechanical processes in which each step follows from the last according to a simple set of rules. And they're designed to solve a general class or category of problems, not just a single problem.

Understanding that hard problems can be solved by step-by-step algorithmic processes (and having technology to execute these algorithms for us) is one of the major breakthroughs that has had enormous benefits. So while the execution of the algorithm may be boring and may require no intelligence, algorithmic or computational thinking --- i.e. using algorithms and automation as the basis for approaching problems --- is rapidly transforming our society. Some claim that this shift towards algorithmic thinking and processes is going to have even more impact on our society than the invention of the printing press. And the process of designing algorithms is interesting, intellectually challenging, and a central part of what we call programming.

Some of the things that people do naturally, without difficulty or conscious thought, are the hardest to express algorithmically. Understanding natural language is a good example. We all do it, but so far no one has been able to explain how we do it, at least not in the form of a step-by-step mechanical algorithm.

Glossary

algorithm
A step-by-step process for solving a category of problems.
body
The statements inside a loop.
breakpoint
A place in your program code where program execution will pause (or break), allowing you to inspect the state of the program's variables, or single-step through individual statements, executing them one at a time.
bump
Programmer slang. Synonym for increment.
continue statement
A statement that causes the remainder of the current iteration of a loop to be skipped. The flow of execution goes back to the top of the loop, evaluates the condition, and if this is true the next iteration of the loop will begin.
cursor
An invisible marker that keeps track of where the next character will be printed.
decrement
Decrease by 1.
development plan
A process for developing a program. In this chapter, we demonstrated a style of development based on developing code to do simple, specific things and then encapsulating and generalizing.
encapsulate
To divide a large complex program into components (like functions) and isolate the components from each other (by using local variables, for example).
escape sequence
An escape character, \, followed by one or more printable characters used to designate a nonprintable character.
generalize
To replace something unnecessarily specific (like a constant value) with something appropriately general (like a variable or parameter). Generalization makes code more versatile, more likely to be reused, and sometimes even easier to write.
initialization (of a variable)
To initialize a variable is to give it an initial value. Since in Python variables don't exist until they are assigned values, they are initialized when they are created. In other programming languages this is not the case, and variables can be created without being initialized, in which case they have either default or garbage values.
loop body
Any number of statements nested inside a loop. The nesting is indicated by the fact that the statements are indented under the for loop statement.
loop variable
A variable used as part of a for loop. It is assigned a different value on each iteration of the loop.
middle-test loop
A loop that executes some of the body, then tests for the exit condition, and then may execute some more of the body. We don't have a special Python construct for this case, but can use while and break together.
nested loop
A loop inside the body of another loop.
newline
A special character that causes the cursor to move to the beginning of the next line.
post-test loop
A loop that executes the body, then tests for the exit condition. We don't have a special Python construct for this, but can use while and break together.
pre-test loop
A loop that tests before deciding whether the execute its body. for and while are both pre-test loops.
range
A built-in function in Python for generating sequences of integers. It is especially useful when we need to write a for loop that executes a fixed number of times.
single-step
A mode of interpreter execution where you are able to execute your program one step at a time, and inspect the consequences of that step. Useful for debugging and building your internal mental model of what is going on.
tab
A special character that causes the cursor to move to the next tab stop on the current line.
trichotomy
Given any real numbers a and b, exactly one of the following relations holds: a < b, a > b, or a == b. Thus when you can establish that two of the relations are false, you can assume the remaining one is true.
trace
To follow the flow of execution of a program by hand, recording the change of state of the variables and any output produced.
terminating condition
A condition that occurs which causes a loop to stop repeating its body. In the for loops we saw in this chapter, the terminating condition has been when there are no more elements to assign to the loop variable.

References

[ThinkCS]How To Think Like a Computer Scientist --- Learning with Python 3

Page précédente Page suivante