Source: this section is heavily based on Chapter 15 of [ThinkCS], though adapted to better fit the contents, terminology and notations of this particular course.
Python is an object-oriented programming language, which means that it supports many of the features of the [object_oriented_programming] paradigm.
Object-oriented programming (OOP) has its roots in the 1960s, but it wasn't until the mid-1980s that it became a mainstream [programming_paradigm] used in the creation of new software. It was developed as a way to handle the rapidly increasing size and complexity of software systems, and to make it easier to modify and maintain these large and complex systems over time.
Up to now, most of the programs we have written in this course used a [procedural_programming] style. In the procedural programming paradigm, the focus is on writing functions or procedures that operate on data. In object-oriented programming, the focus is on creating objects that group together both data and the functions (methods) that operate on that data. We have already seen examples of objects, such as turtles and strings. An object definition often corresponds to some object or concept in the real world, and the functions (methods) that operate on (the data encapsulated in) that object correspond to the ways those real-world objects can interact.
Objects are created from classes. Classes describe what methods an object understands and what data it contains. We've already seen classes like str, int, float and Turtle. We are now ready to create our first user-defined class: the Point.
Consider the concept of a mathematical point. In two dimensions, a point can be considered as a pair of two numbers (the point's coordinates) that are treated collectively as a single object. Points are often written in parentheses, with a comma separating the coordinates. For example, (0,0) represents the origin, and (x,y) represents the point x units to the right (or left, if negative) and y units up (or down, if negative) from the origin.
Some of the typical operations that one associates with points might be calculating the distance of a point from the origin or from another point, finding the midpoint of two points, or asking whether a point falls within a given rectangle or circle. We'll shortly see how we can organise these operations together with the data.
A natural way to represent a point in Python is with two numeric values. The question, then, is how to group these two values into a compound object. The quick and dirty solution is to use a tuple; for example, we could write p = (0, 0) and q = (1, 1), and for some applications that might be a good choice. But we would still need to define dedicated procedures to do something useful with these tuples representing points.
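As a sketch of what such a dedicated procedure could look like with the tuple approach (the function name distance_between is our own, hypothetical choice), consider:

```python
import math

# Points represented as plain tuples (x, y)
p = (0, 0)
q = (3, 4)

def distance_between(a, b):
    """ Hypothetical helper: Euclidean distance between two (x, y) tuples. """
    dx = b[0] - a[0]    # index 0 is assumed to hold the x coordinate
    dy = b[1] - a[1]    # index 1 is assumed to hold the y coordinate
    return math.sqrt(dx * dx + dy * dy)

print(distance_between(p, q))    # 5.0
```

The function works, but it relies on every caller remembering that index 0 means x and index 1 means y, and it will happily accept any other pair of numbers as well, whether or not that pair represents a point.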
An alternative is to define a new class. This approach involves a bit more effort, but its many advantages will become apparent soon. Since we want each of our points to have an x and a y value, our first class definition looks like this:
```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        """ Create a new point at the origin """
        self.x = 0
        self.y = 0
```
Although class definitions like the one above can appear anywhere in a program, they are usually put near the beginning (after the import statements). Some programmers and languages prefer to put every class in a file or module of its own; we won't do that for now. The syntax rules for a class definition are the same as for other compound statements: there is a header that begins with the keyword class, followed by the name of the class, and ends with a colon. Indentation levels tell us where the class ends.
If the first line after the class header is a string, it becomes the [docstring] of the class, and will be recognised by various tools. (This is also the way docstrings work in functions.)
Every class should have an initialiser method which is automatically called whenever a new object (also known as instance) of that class is created (in what follows we will use the terms object and instance interchangeably). This initialiser method has a special name __init__ (with a double underscore character before and after the name). For the class Point, the __init__ method sets the x and y coordinates of the created object to zero. In general, the __init__ method gives a programmer the opportunity to set up the attributes required within a new instance of the class by giving them their initial state/values. The self parameter (we could choose any other name, but self is the convention) is automatically set to reference the newly created object that needs to be initialised. So, for example, self.x = 0 will assign the value of 0 to the x attribute of the newly created point object itself.
We can use our new Point class now to create two Point objects:
```python
p = Point()    # Instantiate an object of type Point
q = Point()    # Make a second point object
print(p.x, p.y, q.x, q.y)    # Each point object has its own x and y
```
This program prints:
0 0 0 0
because during the initialisation of the objects p and q, we created two attributes called x and y for each, and gave them both the value 0.
This way of creating objects should look familiar to you. We've used classes before to create multiple Turtle objects:
```python
from turtle import Turtle

tess = Turtle()    # Instantiate an object of type Turtle
alex = Turtle()    # Instantiate a second object of type Turtle
```
The variables p and q above are assigned references to two new Point objects. A function like Turtle() or Point() that creates a new object instance from its corresponding class is called a constructor. Every class automatically provides a constructor function which is named the same as the class.
It may be helpful to think of a class as a factory for making objects. The class itself isn't an instance of a point, but it contains the machinery to make point instances. Every time we call the constructor, we're asking the factory to make us a new object. As the object comes off the production line, its initialisation method is executed to get the object properly set up with its default factory settings.
The combined process of "construct me a new object" and "get its settings initialised to the factory default settings" is called instantiation.
Object instances have both attributes (the data contained in the instance) and methods (the operations that act on that data). Whereas the methods are the same for all objects of the same class (we will see below how to define such methods), the attribute values are specific to each particular instance of that class. For that reason, the attributes are sometimes also referred to as instance variables. Of course, initially they are initialised to the same factory default settings, but once an object has been created, we can modify its attribute values by using the following dot notation:
```python
>>> p.x = 3
>>> p.y = 4
```
This sets the x attribute of the object instance p to the value 3 and its y attribute to the value 4.
Both modules and instances create their own namespaces, and the syntax for accessing names contained in each, called attributes, is the same. In this case the attribute we are selecting is a data item from an instance.
The following memory diagram shows the result of these assignments:
The variable p refers to a Point object, which contains two attributes x and y. Each attribute contains a number.
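Because attribute values belong to each individual instance, assigning to p.x and p.y leaves any other Point untouched. A quick sketch, using the first, parameterless version of the Point class:

```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        """ Create a new point at the origin """
        self.x = 0
        self.y = 0

p = Point()
q = Point()
p.x = 3    # changes only the x attribute of p
p.y = 4    # changes only the y attribute of p
print(p.x, p.y, q.x, q.y)    # 3 4 0 0
```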
We can access the value of an attribute using the same syntax:
```python
>>> print(p.y)
4
>>> x = p.x
>>> print(x)
3
```
The expression p.x means, "Go to the point object that p refers to and get the value of its attribute named x". In this case, we assign that value to a global variable named x. There is no conflict between the variable named x (in the global namespace) and the attribute named x (in the namespace belonging to the instance). The purpose of the dot notation is to specify unambiguously which variable we are referring to.
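The two namespaces can be made visible in a small sketch. Here a global x and the attribute p.x hold different values without interfering, the built-in vars function shows the instance's own namespace as a dictionary, and math.pi shows the same dot notation selecting a name from a module:

```python
import math

class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        """ Create a new point at the origin """
        self.x = 0
        self.y = 0

p = Point()
p.x = 3
p.y = 4
x = 100          # a global variable that happens to be called x as well

print(x, p.x)    # 100 3: two different names in two different namespaces
print(vars(p))   # {'x': 3, 'y': 4}: the namespace belonging to the instance
print(math.pi)   # 3.141592653589793: a name selected from the math module
```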
We can use dot notation as part of any expression, so the following statements are legal:
```python
print("(x={0}, y={1})".format(p.x, p.y))
distance_from_origin = pow(p.x * p.x + p.y * p.y, 1/2)
print(distance_from_origin)
```
The first line outputs (x=3, y=4). (Note that the first line is equivalent to writing print("(x="+str(p.x)+", y="+str(p.y)+")") but uses the [format] method, which supports advanced string formatting.) The second line calculates the distance of p from the origin: raising 3*3 + 4*4 = 25 to the power 1/2 takes its square root, giving the float 5.0. The third line prints this calculated value.
To create a new point object at position (7, 6) we currently need three lines of code:
```python
p = Point()    # Create a new instance of class Point
p.x = 7        # Set its x attribute to the value 7
p.y = 6        # Set its y attribute to the value 6
```
We can make our class constructor more general by placing extra parameters into the __init__ method, as shown in this example:
```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self, x=0, y=0):
        """ Create a new point at coordinates x, y.
        @pre: x and y are numbers (if not supplied, the number 0 will be used)
        @post: the attributes x and y of this point instance have been
               initialised to the values x and y passed as arguments
        """
        self.x = x
        self.y = y

# Other statements outside the class continue below here.
```
The x and y parameters here are both optional. If the caller does not supply any arguments for x and y, they'll get the default values of 0. Here is our improved class in action:
```python
>>> p = Point(4, 2)
>>> q = Point(6, 3)
>>> r = Point()    # r represents the origin (0, 0)
>>> print(p.x, q.y, r.x)
4 3 0
```
Below you can find another memory diagram depicting the three objects that have been created in the computer's memory.
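Because both parameters have default values, a caller can also supply just one of them by name, using keyword arguments. A small sketch, with the Point class reduced to its __init__ method:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

a = Point(y=5)         # x falls back to its default value 0
b = Point(x=2)         # y falls back to its default value 0
c = Point(y=1, x=9)    # keyword arguments may appear in any order
print(a.x, a.y, b.x, b.y, c.x, c.y)    # 0 5 2 0 9 1
```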
Technically speaking ...
If we are really fussy, we would argue that the __init__ method's docstring is inaccurate. Indeed, __init__ doesn't create the object (i.e. set aside memory for it; the constructor does that); it just initialises the object to its factory-default settings after its creation.
But programming tools such as PyScripter understand that instantiation (creation and initialisation) happens in one go, and they choose to display the initialiser's docstring as the tooltip that guides the programmer calling the class constructor.
So we write the docstring in the way that makes the most sense when it pops up to help the programmer who is using our Point class.
The key advantage of using a class like Point rather than a simple tuple (6, 7) now becomes apparent. We can add methods to the Point class that are sensible operations for points, but which may not be appropriate for other tuples like (25, 12), which might represent, say, a day and a month (e.g. Christmas Day). So being able to calculate the distance from the origin is sensible for points, but not for (day, month) data. For (day, month) data, we'd like different operations, perhaps to find what day of the week it will fall on in 2050.
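To illustrate that a different kind of data calls for different operations, here is a sketch of a hypothetical DayMonth class (the class and method names are our own invention) whose weekday_in method uses the standard datetime module to find the day of the week for a given year:

```python
import datetime

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

class DayMonth:
    """ Hypothetical class representing a (day, month) pair such as (25, 12). """

    def __init__(self, day, month):
        self.day = day
        self.month = month

    def weekday_in(self, year):
        """ Return the name of the weekday on which this day falls in the given year. """
        # date.weekday() returns 0 for Monday up to 6 for Sunday
        index = datetime.date(year, self.month, self.day).weekday()
        return DAY_NAMES[index]

christmas = DayMonth(25, 12)
print(christmas.weekday_in(2050))    # Sunday
```

Calculating a distance from the origin would make no sense for such an object, just as weekday_in would make no sense for a point.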
Creating a class like Point brings an exceptional amount of "organisational power" to our programs, and to our thinking. We can group together the sensible operations, and the kinds of data they apply to, and each instance of the class can have its own individual state.
A method behaves like a function except that it is invoked on a specific instance, e.g. t.right(90) which turns a Turtle object t 90 degrees to the right. Like data attributes, methods are accessed using the dot notation.
instance methods versus class methods
Technically speaking, there exist two kinds of methods in Python: instance methods, which can be invoked on specific instances (i.e., objects), and class methods, which can be invoked on a class itself without having to create an instance of that class first. Since most of the methods you will encounter will be instance methods, for now, when we use the term method, we mean instance method. We will not explain the notion of class methods yet, in order not to confuse you more than necessary.
Let's add another method, distance_from_origin, to our class Point to see better how methods work:
```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self, x=0, y=0):
        """ Create a new point at coordinates x, y.
        @pre: x and y are numbers (if not supplied, the number 0 will be used)
        @post: the attributes x and y of this point instance have been
               initialised to the values x and y passed as arguments
        """
        self.x = x
        self.y = y

    def distance_from_origin(self):
        """ Compute my distance from the origin
        @pre: -
        @post: returns the Euclidean distance of this point to the origin (0,0)
        """
        return pow((self.x ** 2) + (self.y ** 2), 1/2)
```
When defining a method, it must always have a first parameter that refers to the instance being manipulated, i.e. the object itself. For that reason it is customary to name this parameter self.
Now let's create a few point instances, look at their attributes, and call our new distance calculation method on them. (Note that we must execute our new class definition above first, to make our modified Point class available to the interpreter.)
```python
>>> p = Point(3, 4)
>>> p.x
3
>>> p.y
4
>>> p.distance_from_origin()
5.0
>>> q = Point(5, 12)
>>> q.x
5
>>> q.y
12
>>> q.distance_from_origin()
13.0
>>> r = Point()
>>> r.x
0
>>> r.y
0
>>> r.distance_from_origin()
0.0
```
Notice that, although the method distance_from_origin(self) was defined with a first parameter self, the caller of distance_from_origin() does not explicitly supply an argument to match this self parameter; nevertheless, Python automatically binds this parameter to the instance on which the method is invoked, behind our back. Remember: when you define a method in a class, you should add a first parameter self representing the instance being manipulated; when calling the method, you should drop that argument, since it will be filled in automatically.
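One way to see what happens behind our back: calling the method on an instance is equivalent to looking the function up in the class and passing the instance explicitly as the self argument. A trimmed-down sketch of the Point class to check this:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return pow((self.x ** 2) + (self.y ** 2), 1/2)

p = Point(3, 4)
print(p.distance_from_origin())        # 5.0: self is bound to p automatically
print(Point.distance_from_origin(p))   # 5.0: the same call, with p passed explicitly as self
```

In everyday code we always use the first form; the second one only serves to show where the self argument comes from.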
We can pass any object as an argument in the usual way. We've already seen this in some of the turtle examples, where we passed the turtle to some function, so that the function could control and use whatever turtle instance we passed to it. Be aware that a variable only holds a reference to an object, so passing a turtle object into a function creates an alias: both the caller and the called function now have a reference to that turtle, but there is only one turtle!
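The same aliasing happens with our own Point objects. In the sketch below, the function move_right (a hypothetical name, not part of the course's Point class) receives a reference to the caller's point and changes its attribute, so the change is visible to the caller afterwards:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

def move_right(pt, dx):
    """ Hypothetical helper: shift the given point dx units to the right. """
    pt.x = pt.x + dx    # modifies the one object that both pt and p refer to

p = Point(3, 4)
move_right(p, 10)
print(p.x, p.y)    # 13 4
```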
Here is a simple function involving our new Point objects:
```python
def print_point(pt):
    print("({0}, {1})".format(pt.x, pt.y))
```
print_point takes a point as argument and formats the output in whichever way we choose. If we call print_point(p) with point p as defined previously, the output is (3, 4).
However, an object-oriented programmer would not do what we've just done with print_point. Rather than having a globally defined print function outside of the class definition, when working with classes and objects, a preferred alternative is to add a new method to the class definition. And we don't like chatterbox methods that call print. A better approach is to have a method so that every instance can produce a string representation of itself. This string representation can then easily be printed from the outside. Let's call this method that produces a string representation of an object to_string:
```python
class Point:
    # ... same as before ...

    def to_string(self):
        return "({0}, {1})".format(self.x, self.y)
```
Again, observe how the method to_string takes self as its first parameter. Also observe how the point's attributes are accessed within that method by referring to self using the dot notation.
(As a reminder, the statement "({0}, {1})".format(self.x, self.y) is equivalent to writing "("+str(self.x)+", "+str(self.y)+")".)
Now we can say:
```python
>>> p = Point(3, 4)
>>> print(p.to_string())
(3, 4)
```
But doesn't there already exist a str type converter that can turn an object into a string? Yes! And doesn't print automatically use this when printing things? Yes again! But these automatic mechanisms do not (yet) seem to do exactly what we want:
```python
>>> str(p)
'<__main__.Point object at 0x01F9AA10>'
>>> print(p)
<__main__.Point object at 0x01F9AA10>
```
Rather than printing the contents of the object, they print a default representation that only shows the object's class and an address identifying the object in memory.
Luckily, Python has a clever trick to fix this. If we call our new method __str__ (with a double underscore character before and after the method name) instead of to_string, the Python interpreter will use our code instead of the default conversion whenever it needs to convert a Point to a string. Let's redo our class with this change:
```python
class Point:
    # ... same as before ...

    def __str__(self):    # All we have done is renamed the method
        return "({0}, {1})".format(self.x, self.y)
```
and now things are looking great!
```python
>>> str(p)    # Python now magically uses the __str__ method that we wrote.
'(3, 4)'
>>> print(p)
(3, 4)
```
Special methods such as __str__ (and the __init__ method introduced before) are called [magic_methods] in Python. Whenever you define a new class of your own, it is usually a good idea to implement an __str__ method for it, so that you can easily inspect objects of that class by printing them.
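The __str__ method is used by other conversions to a string as well, not just by str and print. For instance, when a Point is inserted into a string with the format method using a plain {0} placeholder, Python falls back on str(), and therefore on our __str__. A small sketch:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)

p = Point(3, 4)
message = "The point is {0}".format(p)    # uses __str__ behind the scenes
print(message)                            # The point is (3, 4)
print(str(p) == "(3, 4)")                 # True
```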
Functions and methods can return object instances. For example, given two Point objects, find their midpoint. First we'll write this as a regular function:
```python
def midpoint(p1, p2):
    """ @pre: p1 and p2 are instances of class Point
        @post: returns the midpoint of points p1 and p2
    """
    mx = (p1.x + p2.x)/2
    my = (p1.y + p2.y)/2
    return Point(mx, my)
```
This function creates and returns a new Point object:
```python
>>> p = Point(3, 4)
>>> q = Point(5, 12)
>>> r = midpoint(p, q)
>>> print(r)
(4.0, 8.0)
```
However, as mentioned before, an object-oriented programmer would prefer to define this as a method defined on the class, rather than as a globally defined function. So, let us try to write this function as a method instead. Suppose we have a point object, and wish to find the midpoint halfway between itself and some other target point:
```python
class Point:
    # ...

    def halfway(self, target):
        """ @pre: target is an instance of class Point
            @post: returns a new instance of class Point representing
                   the halfway point between myself and the target
        """
        mx = (self.x + target.x)/2
        my = (self.y + target.y)/2
        return Point(mx, my)
```
This method is almost identical to the function, aside from some renaming. Its usage might be like this:
```python
>>> p = Point(3, 4)
>>> q = Point(5, 12)
>>> r = p.halfway(q)
>>> print(r)
(4.0, 8.0)
```
While this example assigns each point to a variable, this need not be done. Just as function calls are composable, method calls and object instantiation are also composable, leading to this alternative that uses no variables:
```python
>>> print(Point(3, 4).halfway(Point(5, 12)))
(4.0, 8.0)
```
The function-call syntax we used earlier, print_point(p), suggests that the function is the active agent. It says something like, "Hey, print_point! Here's an object for you to print."
In object-oriented programming, objects are considered the active agents instead. An invocation like p.to_string() says "Hey p! Please give me a string representation of yourself!"
In our early introduction to turtles, we used an object-oriented style, so that we said t.forward(100), which asks the turtle t to move itself forward by the given number of steps.
This change in perspective might be more polite, but it may not initially be obvious that it is useful. But sometimes shifting responsibility from the functions onto the objects makes it possible to write more versatile functions, and makes it easier to maintain and reuse code.
The most important advantage of the object-oriented style is that it fits our mental chunking and real-life experience more accurately. In real life, our cook method is part of our microwave oven: we don't have a cook function sitting in the corner of the kitchen, into which we pass the microwave! Similarly, we use the cellphone's own methods to send an SMS, or to change its state to silent. The functionality of real-world objects tends to be tightly bound up inside the objects themselves. [object_oriented_programming] allows us to accurately mirror this when we organise our programs.
Objects are most useful when we also need to keep some state that is updated from time to time. Consider a turtle object. Its state consists of things like its position, its heading, its colour, and its shape. A method like left(90) updates the turtle's heading, forward changes its position, and so on.
For a bank account object, a main component of the state would be the current balance, and perhaps a log of all transactions. The methods would allow us to query the current balance, deposit new funds, or make a payment. Making a payment would include an amount, and a description, so that this could be added to the transaction log. We'd also want a method to show the transaction log.
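As a sketch of how such a bank account could be organised as a class (the class name BankAccount and all method names are our own, hypothetical choices; a real banking application would need much more, such as input validation), consider the following. Following the advice given earlier, the transaction log is returned as a string rather than printed inside the method:

```python
class BankAccount:
    """ Hypothetical bank account keeping a balance and a transaction log. """

    def __init__(self, balance=0):
        self.balance = balance    # the current balance: the main part of the state
        self.log = []             # list of (amount, description) tuples

    def get_balance(self):
        return self.balance

    def deposit(self, amount):
        self.balance = self.balance + amount
        self.log.append((amount, "deposit"))

    def pay(self, amount, description):
        """ Make a payment; it is logged as a negative amount (money going out). """
        self.balance = self.balance - amount
        self.log.append((-amount, description))

    def log_to_string(self):
        """ Return the transaction log as a string, one transaction per line. """
        lines = []
        for amount, description in self.log:
            lines.append("{0}: {1}".format(description, amount))
        return "\n".join(lines)

account = BankAccount()
account.deposit(100)
account.pay(30, "groceries")
print(account.get_balance())     # 70
print(account.log_to_string())   # two lines: "deposit: 100" and "groceries: -30"
```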
- attribute
- One of the named data items that makes up an object. Another word for attribute is instance variable.
- class
- A user-defined compound type. A class can also be thought of as a template or factory for the objects that are instances of it.
- constructor
- A class can also be seen as a "factory" for making objects of a certain kind. Every class thus provides a constructor, which has the same name as the class, for making new instances of this kind. If the class has an initialiser method, this method is used to get the attributes (i.e. the state) of the new instance properly set up.
- initialiser method
- A special method in Python (called __init__) that is invoked automatically to set a newly created object's attributes to their initial (factory-default) state.
- instance
- An object whose type is of some class. The words instance and object are used interchangeably.
- instance variable
- Since the attribute values of an object are specific to that particular object (i.e., another object of the same class may have another value for that attribute), they are sometimes also referred to as instance variables.
- instantiate
- To create an instance of a class, and to run its initialiser method.
- instance method
- A function that is defined inside a class definition and is invoked on instances of that class.
- magic method
- Magic methods are special methods like __init__ or __str__ that you can define to add some magic to your classes. For example, Python magically knows that when a new object gets constructed it should call the __init__ method to initialise the attributes of the newly created object, and that when you print an object it should call the __str__ method to get a printable string representation of the object. Magic methods are always surrounded by double underscores.
- method
- If it is clear from the context, we will often refer to an instance method simply as a method. (We will learn later that there are also class methods, which are not the same as instance methods.)
- object
- A compound data type that is often used to model a thing or concept in the real world. It bundles together the data and the operations that are relevant for that kind of data. The words instance and object are used interchangeably.
- object-oriented programming
- A powerful style of programming in which data and the operations that manipulate it are organized into objects.
- object-oriented language
- A language that provides features, such as user-defined classes and inheritance, that facilitate object-oriented programming.
[ThinkCS] How To Think Like a Computer Scientist: Learning with Python 3
[object_oriented_programming] http://en.wikipedia.org/wiki/Object-oriented_programming
[programming_paradigm] http://en.wikipedia.org/wiki/Programming_paradigm
[procedural_programming] http://en.wikipedia.org/wiki/Procedural_programming
[magic_methods] https://rszalski.github.io/magicmethods/
[docstring] https://www.python.org/dev/peps/pep-0257/
[format] https://www.python.org/dev/peps/pep-3101/#id16