Using classes to de-couple code.
Overview
Teaching: 30 min
Exercises: 45 minQuestions
What is de-coupled code?
When is it useful to use classes to structure code?
How can we make sure the components of our software are reusable?
Objectives
Understand the object-oriented principle of polymorphism and interfaces.
Be able to introduce appropriate abstractions to simplify code.
Understand what decoupled code is, and why you would want it.
Be able to use mocks to replace a class in test code.
Introduction
When we’re thinking about units of code, one important thing to consider is whether the code is decoupled (as opposed to coupled). Two units of code can be considered decoupled if changes in one don’t necessitate changes in the other. While two connected units can’t be totally decoupled, loose coupling allows for more maintainable code:
- Loosely coupled code is easier to read as you don’t need to understand the detail of the other unit.
- Loosely coupled code is easier to test, as one of the units can be replaced by a test or mock version of it.
- Loose coupled code tends to be easier to maintain, as changes can be isolated from other parts of the code.
Introducing abstractions is a way to decouple code. If one part of the code only uses another part through an appropriate abstraction then it becomes easier for these parts to change independently.
Exercise: Decouple the file loading from the computation
Currently the function is hard coded to load all the files in a directory Decouple this into a separate function that returns all the files to load
Solution
You should have written a new function that reads all the data into the format needed for the analysis:
def load_inflammation_data(dir_path): data_file_paths = glob.glob(os.path.join(dir_path, 'inflammation*.csv')) if len(data_file_paths) == 0: raise ValueError(f"No inflammation csv's found in path {dir_path}") data = map(models.load_csv, data_file_paths) return list(data)
This can then be used in the analysis.
def analyse_data(data_dir): data = load_inflammation_data(data_dir) daily_standard_deviation = compute_standard_deviation_by_data(data) ...
This is now easier to understand, as we don’t need to understand the the file loading to read the statistical analysis, and we don’t have to understand the statistical analysis when reading the data loading. Ensure you re-run our regression test to check this refactoring has not changed the output of
analyse_data
.
Even with this change, the file loading is coupled with the data analysis.
For example, if we wave to support reading JSON files or CSV files
we would have to pass into analyse_data
some kind of flag indicating what we want.
Instead, we would like to decouple the consideration of what data to load
from the analyse_data
` function entirely.
One way we can do this is to use a language feature called a class.
Using Python Classes
A class is a way of grouping together data with some specific methods. In Python, you can declare a class as follows:
class Circle:
pass
They are typically named using UpperCase
.
You can then construct a class elsewhere in your code by doing the following:
my_circle = Circle()
When you construct a class in this ways, the classes construtor is called. It is possible to pass in values to the constructor that configure the class:
class Circle:
def __init__(self, radius):
self.radius = radius
my_circle = Circle(10)
The constructor has the special name __init__
(one of the so called “dunder methods”).
Notice it also has a special first parameter called self
(called this by convention).
This parameter can be used to access the current instance of the object being created.
A class can be thought of as a cookie cutter template, and the instances are the cookies themselves. That is, one class can have many instances.
Classes can also have methods defined on them.
Like constructors, they have an special self
parameter that must come first.
import math
class Circle:
...
def get_area(self):
return math.pi * self.radius * self.radius
...
print(my_circle.get_area())
Here the instance of the class, my_circle
will be automatically
passed in as the first parameter when calling get_area
.
Then the method can access the member variable radius
.
Exercise: Use a class to configure loading
Put the
load_inflammation_data
function we wrote in the last exercise as a member method of a new class calledCSVDataSource
. Put the configuration of where to load the files in the classes constructor. Once this is done, you can construct this class outside the the statistical analysis and pass the instance in toanalyse_data
.Hint
When we have completed the refactoring, the code in the
analyse_data
function should look like:def analyse_data(data_source): data = data_source.load_inflammation_data() daily_standard_deviation = compute_standard_deviation_by_data(data) ...
The controller code should look like:
data_source = CSVDataSource(os.path.dirname(InFiles[0])) analyse_data(data_source)
Solution
You should have created a class that looks something like this:
class CSVDataSource: """ Loads all the inflammation csvs within a specified folder. """ def __init__(self, dir_path): self.dir_path = dir_path def load_inflammation_data(self): data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.csv')) if len(data_file_paths) == 0: raise ValueError(f"No inflammation csv's found in path {self.dir_path}") data = map(models.load_csv, data_file_paths) return list(data)
We can now pass an instance of this class into the the statistical analysis function. This means that should we want to re-use the analysis it wouldn’t be fixed to reading from a directory of CSVs. We have “decoupled” the reading of the data from the statistical analysis.
def analyse_data(data_source): data = data_source.load_inflammation_data() daily_standard_deviation = compute_standard_deviation_by_data(data) ...
In the controller, you might have something like:
data_source = CSVDataSource(os.path.dirname(InFiles[0])) analyse_data(data_source)
While the behaviour is unchanged, how we call
analyse_data
has changed. We must update our regression test to match this, to ensure we haven’t broken the code:... def test_compute_data(): from inflammation.compute_data import analyse_data path = Path.cwd() / "../data" data_source = CSVDataSource(path) result = analyse_data(data_source) expected_output = [0.,0.22510286,0.18157299,0.1264423,0.9495481,0.27118211 ...
Interfaces
Another important concept in software design is the idea of interfaces between different units in the code. One kind of interface you might have come across are APIs (Application Programming Interfaces). These allow separate systems to communicate with each other - such as a making an API request to Google Maps to find the latitude and longitude of an address.
However, there are internal interfaces within our software that dictate how different parts of the system interact with each other. Even if these aren’t thought out or documented, they still exist!
For example, our Circle
class implicitly has an interface:
you can call get_area
on it and it will return a number representing its area.
Exercise: Identify the interface between
CSVDataSource
andanalyse_data
What is the interface that CSVDataSource has with
analyse_data
. Think about what functionsanalyse_data
needs to be able to call, what parameters they need and what it will return.Solution
The interface is the
load_inflammation_data
method.It takes no parameters.
It returns a list where each entry is a 2D array of patient inflammation results by day Any object we pass into
analyse_data
must conform to this interface.
Polymorphism
It is possible to design multiple classes that each conform to the same interface.
For example, we could provide a Rectangle
class:
class Rectangle(Shape):
def __init__(self, width, height):
self.width = width
self.height = height
def get_area(self):
return self.width * self.height
Like Circle
, this class provides a get_area
method.
The method takes the same number of parameters (none), and returns a number.
However, the implementation is different.
When classes share an interface, then we can use an instance of a class without knowing what specific class is being used. When we do this, it is called polymorphism.
Here is an example where we create a list of shapes (either Circles or Rectangles)
and can then find the total area.
Note how we call get_area
and Python is able to call the appropriate get_area
for each of the shapes.
my_circle = Circle(radius=10)
my_rectangle = Rectangle(width=5, height=3)
my_shapes = [my_circle, my_rectangle]
total_area = sum(shape.get_area() for shape in my_shapes)
This is an example of abstraction - when we are calculating the total area, the method for calculating the area of each shape is abstracted away to the relevant class.
How polymorphism is useful
As we saw with the Circle
and Square
examples, we can use common interfaces and polymorphism
to abstract away the details of the implementation from the caller.
For example, we could replace our CSVDataSource
with a class that reads a totally different format,
or reads from an external service.
All of these can be added in without changing the analysis.
Further - if we want to write a new analysis, we can support any of these data sources
for free with no further work.
That is, we have decoupled the job of loading the data from the job of analysing the data.
Exercise: Introduce an alternative implementation of DataSource
Create another class that supports loading JSON instead of CSV. There is a function in
models.py
that loads from JSON in the following format:[ { "observations": [0, 1] }, { "observations": [0, 2] } ]
It should implement the
load_inflammation_data
method. Finally, at run time construct an appropriate instance based on the file extension.Solution
You should have created a class that looks something like:
class JSONDataSource: """ Loads all the inflammation JSON's within a specified folder. """ def __init__(self, dir_path): self.dir_path = dir_path def load_inflammation_data(self): data_file_paths = glob.glob(os.path.join(self.dir_path, 'inflammation*.json')) if len(data_file_paths) == 0: raise ValueError(f"No inflammation JSON's found in path {self.dir_path}") data = map(models.load_json, data_file_paths) return list(data)
Additionally, in the controller will need to select the appropriate DataSource to provide to the analysis:
_, extension = os.path.splitext(InFiles[0]) if extension == '.json': data_source = JSONDataSource(os.path.dirname(InFiles[0])) elif extension == '.csv': data_source = CSVDataSource(os.path.dirname(InFiles[0])) else: raise ValueError(f'Unsupported file format: {extension}') analyse_data(data_source)
As you have seen, all these changes were made without modifying the analysis code itself.
Testing using Mock Objects
We can use this abstraction to also make testing more straight forward. Instead of having our tests use real file system data, we can instead provide a mock or dummy implementation instead of one of the real classes. Providing what we substitute conforms to the same interface, the code we are testing will work just the same. This dummy implementation could just returns some fixed example data.
An convenient way to do this in Python is using Python’s mock object library. These are a whole topic to themselves - but a basic mock can be constructed using a couple of lines of code:
from unittest.mock import Mock
mock_version = Mock()
mock_version.method_to_mock.return_value = 42
Here we construct a mock in the same way you’d construct a class. Then we specify a method that we want to behave a specific way.
Now whenever you call mock_version.method_to_mock()
the return value will be 42
.
Exercise: Test using a mock or dummy implementation
Complete this test for analyse_data, using a mock object in place of the
data_source
:from unittest.mock import Mock def test_compute_data_mock_source(): from inflammation.compute_data import analyse_data data_source = Mock() # TODO: configure data_source mock result = analyse_data(data_source) # TODO: add assert on the contents of result
Create a mock for to provide as the
data_source
that returns some fixed data to test theanalyse_data
method. Use this mock in a test.Don’t forget you will need to import
Mock
from theunittest.mock
package.Solution
from unittest.mock import Mock def test_compute_data_mock_source(): from inflammation.compute_data import analyse_data data_source = Mock() data_source.load_inflammation_data.return_value = [[[0, 2, 0]], [[0, 1, 0]]] result = analyse_data(data_source) npt.assert_array_almost_equal(result, [0, math.sqrt(0.25) ,0])
Object Oriented Programming
Using classes, particularly when using polymorphism, are techniques that come from
object oriented programming (frequently abbreviated to OOP).
As with functional programming different programming languages will provide features to enable you
to write object oriented code.
For example, in Python you can create classes, and use polymorphism to call the
correct method on an instance (e.g when we called get_area
on a shape, the appropriate get_area
was called).
Object oriented programming also includes information hiding. In this, certain fields might be marked private to a class, preventing them from being modified at will.
This can be used to maintain invariants of a class (such as insisting that a circles radius is always non-negative).
There is also inheritance, which allows classes to specialise the behaviour of other classes by inheriting from another class and overriding certain methods.
As with functional programming, there are times when object oriented programming is well suited, and times where it is not.
Good uses:
- Representing real world objects with invariants
- Providing alternative implementations such as we did with DataSource
- Representing something that has a state that will change over the programs lifetime (such as elements of a GUI)
One downside of OOP is ending up with very large classes that contain complex methods. As they are methods on the class, it can be hard to know up front what side effects it causes to the class. This can make maintenance hard.
Classes and functional programming
Using classes is compatible with functional programming. In fact, grouping data into logical structures (such as three numbers into a vector) is a vital step in writing readable and maintainable code with any approach. However, when writing in a functional style, classes should be immutable. That is, the methods they provide are read-only. If you require the class to be different, you’d create a new instance with the new values. (that is, the functions should not modify the state of the class).
Don’t use features for the sake of using features. Code should be as simple as it can be, but not any simpler. If you know your function only makes sense to operate on circles, then don’t accept shapes just to use polymorphism!
Key Points
Classes can help separate code so it is easier to understand.
By using interfaces, code can become more decoupled.
Decoupled code is easier to test, and easier to maintain.