Monkey Patching in Python
- Importance of Dynamic Languages in Monkey Patching
- Implement a Monkey Patch in Python
- Use Monkey Patch for Unit Testing in Python
- Conclusion
Code is written to achieve a specific outcome, such as sending user data to a database. During testing, however, its behavior often needs to be adjusted, for example to check that it runs correctly or to isolate a bug.
Monkey patching is the practice of replacing a piece of code at runtime with a stub or a similar substitute so that its default behavior changes. This article covers different ways of monkey patching in Python.
Importance of Dynamic Languages in Monkey Patching
Monkey patching is most natural in dynamic languages, of which Python is an excellent example, because classes and modules remain mutable objects at runtime. In static languages, where definitions are fixed at compile time, it is generally impractical.
In practice, monkey patching means adding or replacing attributes (whether methods or variables) at runtime rather than altering the object's definition. It is frequently used with modules whose source code is unavailable or hard to change, which makes it difficult to update the object definitions directly.
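As a minimal sketch of adding attributes at runtime, the snippet below patches a method and a variable onto a class after it has been defined. The Greeter class and the shout function are hypothetical names used only for illustration; they stand in for code whose source cannot be edited.

```python
class Greeter:
    """Hypothetical class standing in for third-party code we cannot edit."""

    def __init__(self, name):
        self.name = name


def shout(self):
    # An ordinary function; `self` is the Greeter instance once patched in.
    return f"HELLO, {self.name.upper()}!"


greeter = Greeter("ada")  # instance created before the patch

# Add a method and a class-level variable at runtime.
Greeter.shout = shout
Greeter.punctuation = "!"

print(greeter.shout())      # HELLO, ADA!
print(greeter.punctuation)  # !
```

Note that the instance created before the patch also sees the new attributes, because attribute lookup falls back to the class at call time.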
Instead of altering an existing object or class, it can be safer to build a new version of it, with the patched-in members, inside a decorator.
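One possible reading of the decorator approach mentioned above is a class decorator that returns a subclass carrying the extra members, leaving the original class untouched. The names with_word_filter, WordList, and words_containing are assumptions made for this sketch, not part of any library.

```python
def with_word_filter(cls):
    """Hypothetical class decorator: return a subclass with one extra method."""

    class Patched(cls):
        def words_containing(self, fragment):
            # Keep only the words that contain the given fragment.
            return [w for w in self.words if fragment in w]

    return Patched


class WordList:
    def __init__(self, words):
        self.words = list(words)


# Build the patched version without modifying WordList itself.
PatchedWordList = with_word_filter(WordList)

words = PatchedWordList(["Arm", "tomorrow", "phantom", "tommy"])
print(words.words_containing("tom"))          # ['tomorrow', 'phantom', 'tommy']
print(hasattr(WordList, "words_containing"))  # False
```

The trade-off is that only code using the new class gets the extra method, whereas a direct monkey patch changes behavior for every user of the original class.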
Implement a Monkey Patch in Python
The following program demonstrates monkey patching in Python. A function defined outside any class is assigned to a new attribute of an existing class, which patches it in as a method at runtime.
Code:
import pandas as pd


def word_counter(self):
    """Return the column names that contain the substring 'tom'."""
    return [i for i in self.columns if "tom" in i]


pd.DataFrame.word_counter_patch = word_counter  # monkey-patch the DataFrame class

df = pd.DataFrame([list(range(4))], columns=["Arm", "tomorrow", "phantom", "tommy"])
print(df.word_counter_patch())
Output:
['tomorrow', 'phantom', 'tommy']
Let’s break down the code to understand monkey patching in Python.
The first line of code imports the pandas library, which is used to create data frames in the program.
import pandas as pd
Next, because Python 3 makes no meaningful distinction between a function and an unbound method, a plain function is defined outside the scope of any class. Its first parameter, self, will receive the instance once the function is attached to a class:
def word_counter(self):
    """Return the column names that contain the substring 'tom'."""
    return [i for i in self.columns if "tom" in i]
The function is then assigned to a new attribute, word_counter_patch, on the pd.DataFrame class. No new class is created; the existing DataFrame class simply gains an extra method that points to word_counter.
pd.DataFrame.word_counter_patch = word_counter # monkey-patch the DataFrame class
Once the method is attached to the class, a data frame is created whose column names are the words to search. It is assigned to the variable df.
df = pd.DataFrame([list(range(4))], columns=["Arm", "tomorrow", "phantom", "tommy"])
Lastly, the patched method is called on the data frame df, and the result is printed. When df.word_counter_patch() runs, Python looks the attribute up on the DataFrame class, finds word_counter, and passes df to it as self.
Because classes and functions are ordinary objects in dynamic languages, a function defined in one place can be attached to a class defined elsewhere in this way.
print(df.word_counter_patch())
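To illustrate how the instance reaches the patched function, here is a small sketch that avoids the pandas dependency. The Table class is a hypothetical stand-in for pd.DataFrame; only its columns attribute matters for the example.

```python
class Table:
    """Hypothetical stand-in for a class such as pd.DataFrame."""

    def __init__(self, columns):
        self.columns = columns


def word_counter(self):
    return [c for c in self.columns if "tom" in c]


Table.word_counter_patch = word_counter  # attach the function to the class

table = Table(["Arm", "tomorrow", "phantom", "tommy"])

# Calling through the instance passes `table` as `self` automatically ...
via_instance = table.word_counter_patch()
# ... which is equivalent to calling the plain function with the instance.
via_function = word_counter(table)

print(via_instance == via_function)                       # True
print(table.word_counter_patch.__func__ is word_counter)  # True

# The patch can be undone by deleting the attribute from the class.
del Table.word_counter_patch
print(hasattr(table, "word_counter_patch"))               # False
```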
Use Monkey Patch for Unit Testing in Python
So far, we've seen how to monkey patch a method onto a class. This section examines how to monkey patch a global variable in a unit test.
A small pipeline is used to demonstrate this. For readers new to the term, a pipeline here is a process for training and testing machine learning models.
The pipeline has two modules: a training module that collects data, such as text or images, and a testing module.
The pipeline searches the data directory for files that match a pattern. In the test.py file, the test creates a temporary directory containing a single file and checks how many files are found in that directory.
Train the Pipeline for Unit Testing
The program creates a pipeline that collects data from two plain text files stored inside a directory named data. To recreate this process, create the pipeline.py and test.py files in the parent directory that contains the data folder.
The pipeline.py file:
from pathlib import Path

DATA_DIR = Path(__file__).parent / "data"


def collect_files(pattern):
    return list(DATA_DIR.glob(pattern))
Let’s break down the code:
The Path class is imported from the pathlib module for use in the code.
from pathlib import Path
DATA_DIR is a global variable that stores the location of the data files. Path(__file__).parent is the directory containing pipeline.py, so DATA_DIR points to the data directory inside it.
DATA_DIR = Path(__file__).parent / "data"
A function, collect_files, is created that takes one parameter: the string pattern to search for.
The DATA_DIR.glob method searches the data directory for paths that match the pattern, and the function returns the matches as a list.
def collect_files(pattern):
    return list(DATA_DIR.glob(pattern))
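The pattern matching can be tried in isolation. The sketch below uses a throwaway directory from the standard tempfile module rather than the article's data folder, and the file names are made up for the example. Path.glob yields matches lazily, which is why collect_files wraps the result in list.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)  # stand-in for DATA_DIR
    for name in ("a.txt", "b.txt", "notes.md"):
        (data_dir / name).touch()

    # Sort the names because glob does not guarantee any particular order.
    txt_files = sorted(p.name for p in data_dir.glob("*.txt"))
    all_files = sorted(p.name for p in data_dir.glob("*"))

print(txt_files)  # ['a.txt', 'b.txt']
print(all_files)  # ['a.txt', 'b.txt', 'notes.md']
```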
How can collect_files be tested properly when it depends on a global variable?
A new file, test.py, needs to be created to hold the code that tests the pipeline module.
The test.py file:
import pipeline


def test_collect_files(tmp_path):
    # given
    temp_data_directory = tmp_path / "data"
    temp_data_directory.mkdir(parents=True)
    temp_file = temp_data_directory / "file1.txt"
    temp_file.touch()
    expected_length = 1

    # when
    files = pipeline.collect_files("*.txt")
    actual_length = len(files)

    # then
    assert expected_length == actual_length
The first line of code imports the pipeline module. pytest does not need to be imported here; it supplies the tmp_path fixture automatically when it runs the test. Next, a test function called test_collect_files is created.
This function has a parameter, tmp_path, which pytest fills with the path to a temporary directory.
def test_collect_files(tmp_path):
The test is divided into three sections: Given, When, and Then.
In the Given section, a new variable named temp_data_directory is created, which is a path to a data directory inside the temporary directory. This is possible because the tmp_path fixture returns a path object.
Next, the data directory needs to be created. This is done with the mkdir method, and parents=True ensures that any missing parent directories in the path are created as well.
Next, a path for a single text file named file1.txt is built inside this directory, and the file is created using the touch method.
A new variable, expected_length, holds the number of files expected inside the data directory. It is given a value of 1 because only one file is expected there.
temp_data_directory = tmp_path / "data"
temp_data_directory.mkdir(parents=True)
temp_file = temp_data_directory / "file1.txt"
temp_file.touch()
expected_length = 1
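The effect of parents=True and touch can be checked outside pytest. This sketch assumes a directory from tempfile as a stand-in for the tmp_path fixture and adds a hypothetical intermediate folder, project, to make the difference visible.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)  # stand-in for pytest's tmp_path fixture

    nested = tmp_path / "project" / "data"
    try:
        nested.mkdir()  # fails because "project" does not exist yet
    except FileNotFoundError:
        created_without_parents = False
    else:
        created_without_parents = True

    nested.mkdir(parents=True)  # creates both "project" and "data"
    temp_file = nested / "file1.txt"
    temp_file.touch()           # creates an empty file

    file_exists = temp_file.is_file()
    file_size = temp_file.stat().st_size

print(created_without_parents)  # False
print(file_exists, file_size)   # True 0
```

In the article's test, tmp_path already exists, so parents=True acts as a safeguard rather than a requirement for creating the single data folder.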
Now the program enters the When section.
When the pipeline.collect_files function is invoked, it returns a list of files matching the pattern *.txt, where * is a wildcard that matches any sequence of characters. The list is assigned to the variable files.
The number of files is obtained with len(files), which returns the length of the list, and is stored in the variable actual_length.
files = pipeline.collect_files("*.txt")
actual_length = len(files)
In the Then section, an assert statement checks that expected_length is equal to actual_length. assert verifies that a given expression is true and raises an AssertionError if it is not.
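The behavior of assert can be seen in a few lines. The check_lengths helper below is hypothetical and exists only to catch the error so that both outcomes can be printed; in a real test the AssertionError is left to propagate so pytest can report it.

```python
def check_lengths(expected_length, actual_length):
    """Hypothetical helper that mirrors the final line of the test."""
    try:
        assert expected_length == actual_length
    except AssertionError:
        return "failed"
    return "passed"


print(check_lengths(1, 1))  # passed
print(check_lengths(1, 0))  # failed
```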
Now the pipeline is ready for testing. Head over to the terminal and run the test.py file using the command:
pytest test.py
When the test is run, it fails.
assert expected_length == actual_length
E assert 1 == 0
test.py:23: AssertionError
=============================== short test summary info ============================================
FAILED test.py::test_collect_files - assert 1 == 0
The test fails because the expected length is 1, but the actual length is different: the output above reports 0, and a data directory holding the two text files described earlier would give 2. At this point, the program is not using the temporary directory; collect_files still reads DATA_DIR, which points to the real data directory.
The test creates a single file inside the temporary directory and expects the search to find it, but nothing in the test tells the pipeline module to look there, so the search goes back to the original directory.
That is why expected_length, which is given a value of 1, does not match actual_length, and the test fails.
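The cause of the failure can be reproduced in a self-contained sketch. It builds a throwaway module object that mimics pipeline.py (a simulation, not the real file) and uses two temporary directories as stand-ins for the real and temporary data folders. The point it illustrates is that collect_files looks up DATA_DIR on its module each time it is called.

```python
import tempfile
import types
from pathlib import Path

# Simulated pipeline module; the function body matches the article's collect_files.
pipeline = types.ModuleType("pipeline")
exec(
    "def collect_files(pattern):\n"
    "    return list(DATA_DIR.glob(pattern))\n",
    pipeline.__dict__,
)

with tempfile.TemporaryDirectory() as real, tempfile.TemporaryDirectory() as temp:
    real_dir, temp_dir = Path(real), Path(temp)
    (real_dir / "a.txt").touch()
    (real_dir / "b.txt").touch()
    (temp_dir / "file1.txt").touch()

    pipeline.DATA_DIR = real_dir  # what the module would set at import time

    # Creating temp_dir changes nothing: the function still reads pipeline.DATA_DIR.
    before = len(pipeline.collect_files("*.txt"))

    # Rebinding the module attribute is what redirects the search.
    pipeline.DATA_DIR = temp_dir
    after = len(pipeline.collect_files("*.txt"))

print(before, after)  # 2 1
```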
We can patch the global variable to solve this issue using a monkey patch.
First, a monkeypatch parameter needs to be added to the test function test_collect_files, like this:
def test_collect_files(tmp_path, monkeypatch):
Next, the global variable is patched using the monkeypatch fixture:
def test_collect_files(tmp_path, monkeypatch):
    # given
    temp_data_directory = tmp_path / "data"
    temp_data_directory.mkdir(parents=True)
    temp_file = temp_data_directory / "file1.txt"
    temp_file.touch()
    monkeypatch.setattr(pipeline, "DATA_DIR", temp_data_directory)  # Monkey Patch
    expected_length = 1
The monkeypatch fixture provides a setattr method, which assigns a new value to the DATA_DIR variable inside the pipeline module. Here, DATA_DIR is set to temp_data_directory, and pytest restores the original value once the test finishes.
If the test is executed again, it passes because the global variable is patched and collect_files uses temp_data_directory instead.
platform win32 -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: C:\Users\Win 10\PycharmProjects\Monkey_Patch
collected 1 item
test.py . [100%]
================================== 1 passed in 0.02s ====================================
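The idea behind monkeypatch.setattr, setting an attribute for the duration of a test and putting the original back afterwards, can be sketched with a small context manager. This is a rough illustration of the concept, not pytest's implementation; patched_attr and the settings object are hypothetical names.

```python
import types
from contextlib import contextmanager


@contextmanager
def patched_attr(target, name, value):
    """Hypothetical helper: set target.name to value, then restore the original."""
    original = getattr(target, name)
    setattr(target, name, value)
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks.
        setattr(target, name, original)


settings = types.SimpleNamespace(DATA_DIR="real/data")  # stand-in for a module

with patched_attr(settings, "DATA_DIR", "tmp/data"):
    inside = settings.DATA_DIR
outside = settings.DATA_DIR

print(inside)   # tmp/data
print(outside)  # real/data
```

Restoring the value matters because a patch that outlives its test can change the result of every test that runs after it. Outside pytest, the standard library's unittest.mock.patch.object offers a comparable set-and-restore mechanism.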
Conclusion
This article covered monkey patching in Python, from attaching a method to an existing class to patching a module-level global variable in a pytest unit test. With these examples, the reader should be able to apply monkey patching in their own code and tests.