Specifications and Tests

Source: this section is not based on external sources.

The main purpose of coding is to create an executable program. Code should however not only execute; it should also satisfy a number of additional requirements:

it should be correct: it should do what it is expected to do;
it should be readable: others should be able to easily understand it;
it should be maintainable: it should be easy to add features, remove features, correct problems.

To understand the importance of these requirements, consider the following sitations:

You wish to add a feature to a program that you wrote one year ago.
You write a program that a collaborator needs to modify later on.
You are contacted about a bug in a program that you wrote a year ago.
You wish to add a feature to a library written by somebody else.
You are asked to maintain source code that was written by somebody else.

In all these cases, it is not only important that code executes; it is also important that your code is of sufficiently good quality to support these other requirements. Writing code of good quality is not easy. Some programmers that understand Python very well, will still write code that is very hard to read. Consider this fragment of bad code:

f=type((lambda:(lambda:None for n in range(len(((((),(((),())))))))))().next())
u=(lambda:type((lambda:(lambda:None for n in range(len(zip((((((((())))))))))))).func_code))()
n=f(u(int(wwpd[4][1]),int(wwpd[7][1]),int(wwpd[6][1]),int(wwpd[9][1]),wwpd[2][1],
    (None,wwpd[10][1],wwpd[13][1],wwpd[11][1],wwpd[15][1]),(wwpd[20][1],wwpd[21][1]),
    (wwpd[16][1],wwpd[17][1],wwpd[18][1],wwpd[11][1],wwpd[19][1]),wwpd[22][1],wwpd[25][1],int(wwpd[4][1]),wwpd[0][1]),
    {wwpd[27][1]:__builtins__,wwpd[28][1]:wwpd[29][1]})
c=partial(n, [x for x in map(lambda i:n(i),range(int(0xbeef)))])
FIGHT = f(u(int(wwpd[4][1]),int(wwpd[4][1]),int(wwpd[5][1]),int(wwpd[9][1]),wwpd[3][1],
        (None, wwpd[23][1]), (wwpd[14][1],wwpd[24][1]),(wwpd[12][1],),wwpd[22][1],wwpd[26][1],int(wwpd[8][1]),wwpd[1][1]),
        {wwpd[14][1]:c,wwpd[24][1]:urlopen,wwpd[27][1]:__builtins__,wwpd[28][1]:wwpd[29][1]})
FIGHT(msg)

Maintaining, modifying or building on this code is very difficult, even for the most experienced programmer. Hence, it is important that while you learn how to program, you also pay attention to how to write good code. We don't want you to create code such as the one in this example!

In this chapter, we will introduce you to the practice of writing specifications for the functions used in a program, followed by tests that check whether the source code meets the specifications.

Specifications in Python

Consider the following code:

def f ( n ):
    for i in range(1,n):
        if n % i == 0:
            return False
    return True

We could try to read this code in order to understand it. However, this would require quite some effort from every programmer using this function. Better is to give the function an interpretable name and add documentation; as we have seen earlier in this syllabus, we can do so by adding a block of comments after the function definition:

def prime ( n ):
    """ Return whether the number n is prime """
    for i in range(1,n):
        if n % i == 0:
            return False
    return True

This code is already easier to understand. However, its documentation is still not very precise. For instance, the user of the code could wonder whether it is possible to execute this code for n=0, =-1, or n=0.5. In many projects it is desirable to make the specification of each parameter as precise as possible. One way of doing this is as follows:

def prime ( n ):
    """ pre: n an integer >= 1
        post: true if the number n is prime
    """
    for i in range(1,n):
        if n % i == 0:
            return False
    return True

In this case we have used the docstring to make the specification more precise.

In principle, we are completely free to write anything we want in a docstring; by default, Python does not look at the content of the docstring. It is hence possible to write anything that helps to understand the code.

In most parts of this course, we will use preconditions and postconditions to specify the functionality of a function in more detail. Specifications of preconditions and postconditions are precise, but not too long either.

In practice, larger software projects impose even more structure on the content of docstrings. One approach that is often used is that of Google Style docstrings, which were originally used by Google in its Python projects, but which are now also used in many other projects. The following code illustrates a Google Style docstring:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True

The structure of a Google Style Python docstring is always similar:

def function name ( arguments ):
    """ One short line describing the function.

    Longer text to describe the function.

    Args:
        argument name 1: short type description, with a short description
        ... (all arguments are described) ...
        argument name n: short type description, with a short description

    Returns:
        A short description of what the function returns.
    """
    function code

Here is another example:

def zipper ( l1, l2 ):
    """ Zips two lists, such that elements of l1 and l2 are interleaved.

    This function zips two given lists l1 and l2 into a new list
    [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]],
    where l1 and l2 must be lists of equal length n.

    For example, zipper ( [1,2], [3,4] ) will return [1,3,2,4].

    Args:
        l1: a list of length n
        l2: a list of the same length as l1

    Returns:
        A new list with elements [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]]
    """
    new_list = []
    for i in range ( len ( l1 ) ):
        new_list.append ( l1[i] )
        new_list.append ( l2[i] )

    return new_list

Note that the docstring in this example is very verbose. In practice, one will not encounter such long docstrings for many short functions. Still, many programmers consider such long documentation the best approach. In this course we will from time to time ask you explicitly to provide such long Google Style docstrings; where this is not indicated explicitly, you may write shorter docstrings, such as using pre- and postconditions. However, you are always expected to provide an informative docstring.

Tests in Python

Now that we know how to write a proper specification for a function in Python, the next question is how we ensure that code satisfies the requirements specified in the docstring. A very common approach (that we will also use during the exams) is that of running unit tests. An unit test is a piece of code that tests whether a function operates as intended. This is an example of one unit test, for the prime function provided above:

if prime ( 10 ) == True:
    print ( "Error: 10 is not prime" )

This code runs the prime function and evaluates its return value: if the function returns True it has made an error, and we display an error message.

Note that there are many implementations of the prime function that will pass this test, while they are incorrect. For instance, this code will not give an error for the test case:

def prime ( n ):
    return False

Clearly, one test does not suffice to prove that an implementation of prime is correct.

The most straightforward solution to reduce the chances that an incorrect implementation is considered to be correct, is to write multiple tests:

if prime ( 10 ) == True:
    print ( "Error: 10 should not be prime" )
if prime ( 8 ) == True:
    print ( "Error: 8 should not be prime" )
if prime ( 5 ) == False:
    print ( "Error: 5 should be prime" )
if prime ( 3 ) == False:
    print ( "Error: 3 should be prime" )
if prime ( 997 ) == False:
    print ( "Error: 997 should be prime" )

How many test cases do we need to provide to test a function?

To answer this question, you will need to know more about theoretical computer science; you will need to study questions of computability. This is the subject of another course and will not be discussed in detail here. However, important to know is that in general, it is impossible to determine a finite set of tests to determine the correctness of a function that accepts a large number of different inputs. Hence, while tests can provide strong evidence that a function is implemented correctly, they are never sufficient evidence. In practice if often happens that you start with a small set of tests, while you discover later on that the code is still incorrect. In this case, the proper approach is to test cases.

Writing tests is so common in Python, that Python provides special notation to simplify their specification. A common approach is the one offered by the unittest module; however, you will need to know more about Python before you will be able to use this module. For most of this course, we will therefore rely on a more simple approach that is provided by the assert statement. Using this statement, the earlier set of tests can be written more compactly:

assert prime ( 10 ) == False, "10 should not be prime"
assert prime ( 8 ) == False, "8 should not be prime"
assert prime ( 5 ) == True, "5 should be prime"
assert prime ( 3 ) == True, "3 should be prime"
assert prime ( 997 ) == True, "997 should be prime"

Or even shorter:

assert not prime ( 10 ), "10 should not be prime"
assert not prime ( 8 ), "8 should not be prime"
assert prime ( 5 ), "5 should be prime"
assert prime ( 3 ), "3 should be prime"
assert prime ( 997 ), "997 should be prime"

If we execute this code for an incorrect impementation of prime, Python will give a message such as this one:

Traceback (most recent call last):
File "test.py", line 20, in <module>
assert prime ( 5 ) == True, "5 should be prime"
AssertionError: 5 should be prime

The execution of the code will stop immediately at the test case that fails. Hence, for any given assert statement, Python will test whether the condition provided is satisfied; if not, it will print the message provided and will stop the execution immediately.

For our zipper function we can now write test cases in a similar fashion:

assert zipper ( [1,2], [3,4] ) == [1,3,2,4], "[1,2], [3,4] not zipped correctly"
assert zipper ( [1,1,1], [2,2,2] ) == [1,2,1,2,1,2], "[1,2,1,2,1,2] not zipped correctly"
assert zipper ( [100,300], [400,200] ) == [100,400,300,200], "[100,300], [400,200] not zipped correctly"

Note that these cases only test lists of lengths smaller than 3; it could be good to add some larger test cases as well to reduce the chances that an incorrect implementation still passes all the tests.

We could put a function and its tests in one file, as follows:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    return True

assert not prime ( 10 ), "10 should not be prime"
assert not prime ( 8 ), "8 should not be prime"
assert prime ( 5 ), "5 should be prime"
assert prime ( 3 ), "3 should be prime"
assert prime ( 997 ), "997 should be prime"

If we execute this code, we will get one error message before the code terminates. For a correct implementation of the prime function, the code will execute without printing a message.

Testing becomes slightly more complex if we wish to separate a program and its tests in separate files. Suppose we have this program, stored in a file prime.py:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    return True

for i in range ( 100 ):
    print ( i, prime ( i ) )

How can we test the prime function in a separate program? We could consider writing a second program, as follows:

from prime import *

assert not prime ( 10 ), "10 should not be prime"
assert not prime ( 8 ), "8 should not be prime"
assert prime ( 5 ), "5 should be prime"
assert prime ( 3 ), "3 should be prime"
assert prime ( 997 ), "997 should be prime"

However, this code has undesirable behavior: when executing the import statement, it will also execute the print statements in the prime.py program. The reason for this is that when executing the import statement, Python will execute all code in the prime.py file, including the print statements. How can we avoid that the print statements are executed? The standard solution in Python for this is to modify the prime.py file as follows:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    return True

if __name__ == "__main__":
    for i in range ( 100 ):
        print ( i, prime ( i ) )

In this code, the code below __name__ == "__main__" is only executed when the code is not imported from another file. This allows a second program to test the functions in the program without executing the code printing statements in the file.

Adding Assertions in Code

As we have seen in the earlier section, the assert statement is very useful to test whether a function has been implemented correctly. However,this is not the only way that assert can be used. Reconsider our implementation of the prime function:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True

Note that in the specification we indicated that this function should only be executed on parameters that represent positive integers. While a perfect programmer would hence not use this function in any other context, no programmer is perfect. The incorrect use of a function can sometimes cause bugs that are very hard to track down. To help a programmer, it can be good to check that the function is used correctly. We can use assert here as well:

def prime ( n ):
    """ Determines whether n is a prime number.

    This function determines whether the positive integer n is a prime number.

    Args:
        n: a positive integer

    Returns:
        A boolean indicating whether n is a prime number.
    """
    assert type ( n ) == int and n >= 1, "n should be a positive integer"
    for i in range(2,n):
        if n % i == 0:
            return False
    return True

With this added line, every time a programmer uses the function with an argument of an incorrect type, or with a smaller than 1, the code will stop and give an error.

In this course, we will not require you to add asserts to all your functions; however, from time to time we will explicitly ask you to do so.

For the example of lists, the full code now becomes:

def zipper ( l1, l2 ):
    """ Zips two lists, such that elements of l1 and l2 are interleaved.

    This function zips two given lists l1 and l2 into a new list
    [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]],
    where l1 and l2 must be lists of equal length n.

    For example, zipper ( [1,2], [3,4] ) will return [1,3,2,4].

    Args:
        l1: a list of length n
        l2: a list of the same length as l1

    Returns:
        A new list with elements [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]]
    """
    assert type ( l1 ) == list and type ( l2 ) == list and len ( l1 ) == len ( l2 ), "l1 and l2 must be two lists of equal length"
    new_list = []
    for i in range ( len ( l1 ) ):
        new_list.append ( l1[i] )
        new_list.append ( l2[i] )

    return new_list

Note that the asserts that are added in the code should correspond closely to the information provided in the docstring.

Other Considerations

At this point you may have the impression that adding specifications, tests and asserts is sufficient to write code of good quality. Certainly it helps. However, it is not sufficient. During this course and during your study you will encounter additional approaches for writing code of good quality. We wish to mention a number of recommendations here:

Make sure that all functions and variables have names that are easy to understand.
Use a proper layout for your code, including white spaces. The gold standard for this is the so-called PEP 8 [Pep8] specification. It is highly recommended that your code confirms to PEP 8 standards, so click on this link to check its contents.
Make sure that your functions are not too long, and that each function has a clearly defined task.
Avoid copy-pasting code: if you need to copy a piece of code, consider whether it would make sense to put that piece of code in a separate function.

References

[Pep8]

PEP 8 -- Style Guide for Python Code