The Undocumented SWIG
Building High Performance Integrated Python Extensions
If you have never ventured into the dark underworld of the Python C‑Extension API, you may believe that it is as fluid and rewarding as the rest of the Python ecosystem. I regret to inform you that this is not the case. Line 13 of The Zen of Python says:
There should be one – and preferably only one – obvious way to do it.
The C‑Extension API is an excellent example of what happens when one completely ignores that advice. There are two (incompatible) ways to export a module, a half-dozen documented ways to parse an argument list, and no fewer than nine options for calling a method. The documentation for this mess is excellent by C library standards, but falls woefully short of the gold standard set by the rest of the Python docs.
Thankfully there is a better way: the excellent Simplified Wrapper and Interface Generator, better known as SWIG. In this three-part series we’ll take a crash course in typical SWIG usage, discuss some advanced features like typemaps, templates, and ownership semantics, and then do a deep dive into using the SWIG runtime header to allow for tight, seamless integration of C/C++ code written specifically to accelerate Python modules.
Introducing SWIG
SWIG is the 8th Wonder of the Software World: it takes an incredibly complicated job and makes it a transparent part of your build process. SWIG cleanly integrates C and C++ routines into any of a dozen target languages using their native ABIs and foreign function interfaces. For many real-world use cases, not just trivial example code, SWIG can do this out-of-the-box with barely any configuration whatsoever.
Unlike the C‑Extension API, SWIG has top-notch documentation full of example code and extensive reference material. This post is not a substitute for that documentation; it’s here to get the reader rapidly up to speed with the bare minimum required to follow along with the other posts in the series. With that said, let’s begin our journey.
Figure 1 contains two files. The first is a simple C++ header containing a POD struct. All C‑esque code for these examples will be C++, but the exact same principles hold when working with pure C. The second file is called the interface file, and it’s how we’re going to instruct SWIG to build all the necessary code to interact with Python.
All code examples can also be found in this companion repository, along with build files.
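Figure 1 is not reproduced in this text, so here is a minimal sketch of what a header like Agent.hpp could contain. Only the file name and the AgentUpdate struct name come from this post; every field name and type below is an assumption chosen for illustration, and the real header in the companion repository may well differ. The fields use fixed-width integers and std::string because those are the types that the stdint.i and std_string.i includes in the interface file exist to support.

```cpp
// Agent.hpp (hypothetical sketch, not the file from the companion repo)
#ifndef AGENT_HPP
#define AGENT_HPP

#include <cstdint>
#include <string>

// A plain struct with public data members only. This is "POD" in the loose
// sense: std::string is not trivially copyable, so it is not strictly POD.
struct AgentUpdate {
    std::uint64_t agent_id;  // hypothetical: which agent this update is for
    std::int32_t  x;         // hypothetical: new position, x coordinate
    std::int32_t  y;         // hypothetical: new position, y coordinate
    std::string   status;    // hypothetical: free-form status message
};

#endif  // AGENT_HPP
```

Because the struct has no constructors or methods, there is nothing here for SWIG to wrap beyond getters and setters for each field.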
Before we explore the interface file further, try building these files to see what happens. For Python the SWIG command to use is:
swig -c++ -python -py3 Agent.i
The switches here do what you expect, configuring SWIG to accept C++ as input (instead of C), and produce a Python extension (specifically Python 3) as output. The extension consists of two files, CAgent.py and Agent_wrap.cxx.
If you start Python and try to import CAgent right now, you’ll get an import error for a module called _CAgent. CAgent.py is a proxy for the actual extension, Agent_wrap.cxx, which we still need to build. How you choose to build this is up to you and your workflow; the following command will build the extension if you’re using the CMakeLists.txt included with the companion repo:
cmake . && cmake --build .
Now open a Python REPL in the same folder where you built the extension and import CAgent. You can create an AgentUpdate object and play with it, just like a native Python class. Do AgentUpdate objects act the way you expect them to? What are the differences from a normal Python object? Hint: Check out the __dict__.
Interrogating the Interface
Now we’ll explore that interface file in more depth. SWIG interface files are typically quite trivial, and our interface file is barely going to change at all in this entire series, but that doesn’t mean they’re not powerful. Rather, SWIG itself is so powerful that we rarely need to lean on the many capabilities of interface files.
Starting with the first line:
%module CAgent
The module directive gives the resulting Python module its name. I typically prefix SWIG-generated modules with “C” to make them easy to tell apart at a glance and easy to add to .gitignore.
%{
#include "Agent.hpp"
%}
All code between the %{ and %} directives is included literally in the generated wrapper. This is typically used to include headers necessary to build the wrapper, which we do here.
%include <stdint.i>
%include <std_string.i>
The %include directive in SWIG works the same way #include does in C/C++: the preprocessor places a copy of the included file into the unit. Here we’re including standard SWIG typemaps for interacting with C++ strings and the standard integer types.
This raises the awkward question of “What is a typemap?” For now, I’m going to quote SWIG’s documentation:
Let’s start with a short disclaimer that “typemaps” are an advanced customization feature that provide direct access to SWIG’s low-level code generator. Not only that, they are an integral part of the SWIG C++ type system (a non-trivial topic of its own).
Suffice it to say that typemaps are outside the scope of this crash course. We need these two includes because they allow us to transparently interact with standard integers and C++ strings, but that’s as far as this post is going to explore them.
%include "Agent.hpp"
The final %include takes all the declarations from Agent.hpp and places them in our interface file. SWIG parses these declarations and generates wrapper code based on them.
As mentioned earlier, there are far more powerful directives available than the ones explored here. Additionally, SWIG has a library of support files that build yet more advanced functionality on top of those directives. Rather than trying to learn all of SWIG in one fell swoop, it’s best to just learn on the go.
Getting Classy
Normally this is the part of the tutorial where we add a second layer of complexity to the material introduced in the first couple of sections. But thanks to SWIG, there is no additional complexity. Classes, methods, and functions all work identically to the basic POD we’re already familiar with.
Figure 2 creates a proper class with a method; it also adds an implementation file, Agent.cpp, as a matter of good C++ practice but not necessity. SWIG only needs to see declarations, not definitions, so it doesn’t care about this file. The result builds and acts the way you expect it to without any changes to the interface file.
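Figure 2 is likewise not reproduced here. As a rough sketch of the declaration/definition split described above, the block below shows a SecretAgent class as it might be declared in Agent.hpp, followed by the definitions that would live in Agent.cpp. The class name and the two file names come from this post; the constructor arguments, the introduce() and clearance() methods, and the member names are all hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// --- Agent.hpp: declarations only; this is the part SWIG parses ---
class SecretAgent {
public:
    // Hypothetical constructor: a name and a numeric clearance level.
    SecretAgent(std::string name, std::uint32_t clearance);

    // Hypothetical methods, declared here and defined out of line.
    std::string   introduce() const;
    std::uint32_t clearance() const;

private:
    std::string   name_;
    std::uint32_t clearance_;
};

// --- Agent.cpp: definitions; compiled and linked, never shown to SWIG ---
SecretAgent::SecretAgent(std::string name, std::uint32_t clearance)
    : name_(std::move(name)), clearance_(clearance) {}

std::string SecretAgent::introduce() const {
    return "The name is " + name_ + ".";
}

std::uint32_t SecretAgent::clearance() const {
    return clearance_;
}
```

In a layout like this, only the first half would be pulled in by the interface file; the second half just needs to be compiled and linked into the extension alongside Agent_wrap.cxx.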
Again I encourage you to play with the resulting CAgent module, or even to modify the SecretAgent’s C++ source code. For code that only wants to call into C/C++, and does not need to call back into Python, this is as complicated as SWIG gets for most use cases.
As a fun exercise, Figure 3 uses these same techniques to add a combat function to the SecretAgent class, with an enum return type.
Of note, the combat_result member enum is translated to a set of member variables for the Python class, which are mapped to globals defined in the underlying shared library. This means they’re accessed almost identically to the enum members in C++.
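Since Figure 3 is also missing from this text, here is one way a member enum and a combat function could look. The names SecretAgent, combat, and combat_result come from this post; the enumerator names, the skill field, and the "higher skill wins" rule are assumptions made up purely for illustration.

```cpp
#include <cstdint>

class SecretAgent {
public:
    // Member enum used as the return type of combat().
    // Enumerator names are hypothetical.
    enum combat_result { DEFEAT, DRAW, VICTORY };

    // Hypothetical constructor: a single numeric skill rating.
    explicit SecretAgent(std::uint32_t skill) : skill_(skill) {}

    // Hypothetical rule: the agent with the higher skill rating wins,
    // and equal ratings produce a draw.
    combat_result combat(const SecretAgent& other) const {
        if (skill_ > other.skill_) return VICTORY;
        if (skill_ < other.skill_) return DEFEAT;
        return DRAW;
    }

private:
    std::uint32_t skill_;
};
```

In C++ a result is spelled SecretAgent::VICTORY. Per the description above, the Python side should look nearly the same, along the lines of CAgent.SecretAgent.VICTORY, assuming the hypothetical enumerator names used here.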
What’s Next
None of the techniques discussed in this post are Python-specific; they can be applied to any of the target languages that SWIG supports. In the next part we’ll talk a little more about typemaps and using them to interact with more complex types than integers and strings. This involves calling Python.h-specific functions, and will begin our descent into the less traveled corners of SWIG usage.
The images used in this post are public domain, made available thanks to the invaluable work of Liam Quin at fromoldbooks.org