Python and C

Vincent Bernat November 1, 2010

Note

This article was published in GNU/Linux Magazine n° 132 in 2010 in French. It is translated here to English with an LLM.

Python is a rich language with an extensive standard library covering a wide range of needs. Yet it is not uncommon to reach for an external module. For example, if your application needs a PostgreSQL database, you can use the third-party module psycopg2. Whether or not they are in the standard library, modules fall into two families: native modules, written entirely in Python, and extensions, written in another language, typically C. This article looks at the different ways to write a Python extension.

An extension or a native module?
The turtle
Writing the Python extension
Conclusion

An extension or a native module?#

First, why choose to write an extension rather than a native module? There are advantages and disadvantages to both approaches.

A native module requires no compilation and no additional tools. It is portable and works identically regardless of the OS: on Microsoft Windows as on a GNU/Linux distribution. There is no need to learn another language! By writing your module in Python, you benefit from its qualities: richness, dynamism, and the many existing modules and extensions to build upon.

But it is not always possible or desirable to write a native module. The most common case, and the one we explore later, is using a library written in another language such as C: you need a bridge between this library and Python. Another reason is speed. The most common Python interpreter, CPython, is not particularly fast. If speed matters, you may need to rewrite certain parts in a more static but faster language like C. Finally, you may need to access low-level functions that are hard to reach from Python, for example to control an electronic device on an uncommon interface. This last case is becoming less relevant as Python gains extensions to fill this need, such as USB device control.

The turtle#

To illustrate this, imagine you just received a robot turtle. It is a small robot designed to reproduce the behavior of the LOGO language turtle. The robot can interpret a very large number of commands. You can ask it to move forward, turn left, lower the pen, change the pen, and so on. It interfaces with a PC to receive the appropriate commands. The interface uses a command bus over a serial port.

C library#

This turtle comes with a C library for controlling it with a few functions. It is a simple library. We use this library rather than trying to reimplement it.

The C interface#

Here is the interface provided with the library:

#ifndef _TORTUE_H
#define _TORTUE_H

/* Error codes */
#define TURTLE_ERROR_NO_ERROR      0
#define TURTLE_ERROR_COMMUNICATION 1
#define TURTLE_ERROR_INVALID_VALUE 2
#define TURTLE_ERROR_NOT_PRESENT   3

/* Opaque object representing a turtle */
struct turtle;

/* Available functions */
struct turtle* turtle_init(int port);
int            turtle_send(struct turtle *, const char *);
int            turtle_close(struct turtle *);
int            turtle_error(struct turtle *);
const char*    turtle_model(struct turtle *);
long int       turtle_status(struct turtle *);

#endif

The library can control multiple turtles simultaneously. They are connected on a bus and identified by a number. The first turtle gets number 1, the next one number 2, and so on.

The turtle_init() function initializes a turtle. It returns an opaque structure that other functions use to identify which turtle to work with. If the requested turtle does not exist, this function returns NULL. The interface does not allow obtaining more information about the error at this level.

The other functions return -1 or NULL on error. You can then get more information with the turtle_error() function, which returns one of the error codes defined above. There are only 3 possible errors.

The turtle_send() function sends a command to the turtle. It returns 0 on success. You provide it with a string like FORWARD 10 to move it forward or LEFT 50 to make it turn. The robot interprets the command itself.

To shut down the turtle and free all associated resources, use the turtle_close() function. It returns 0 on success.

The turtle_model() function returns a string indicating the turtle’s model. Finally, the turtle_status() function returns an integer encoding the turtle’s state. This state depends on the model, and the C library provides no indication of its meaning. For more details on this value, you need to read the specifications of the turtle model you are using.
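On the Python side, the error codes from the header map naturally onto a small lookup table. The following is only a sketch: the names `TURTLE_ERRORS` and `describe_error` are hypothetical, and the messages are paraphrased from the constant names rather than taken from the library.

```python
# Hypothetical helper: the numeric values come from tortue.h, the
# messages are paraphrased from the names of the constants.
TURTLE_ERRORS = {
    0: "no error",
    1: "communication error",
    2: "invalid value",
    3: "turtle not present",
}


def describe_error(code):
    """Return a readable message for a value returned by turtle_error()."""
    return TURTLE_ERRORS.get(code, "unknown error %d" % code)


print(describe_error(1))
```

A table like this could later feed the message of the exception raised by the extension.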

The implementation#

Unless you have a Turtle 3000 dealership nearby, it is not easy to find the turtle described above. To facilitate experimentation, we provide a minimalist implementation of this library to use throughout the article.

This implementation can control only 3 turtles. An error is returned if you try to control another one. The third turtle is also defective: you cannot send commands to it. Each turtle is a different model.

#include <stdio.h>
#include <stdlib.h>
#include "tortue.h"

FILE *output = NULL;

struct turtle {
  int index;  /* Turtle index */
  int error;  /* Last error for this turtle */
};

struct turtle*
turtle_init(int port)
{
  struct turtle *t;
  if ((port < 1) || (port > 3))
    return NULL;
  t = malloc(sizeof(struct turtle));
  if (!t) return NULL;
  t->index = port;
  t->error = 0;
  fprintf(output, "Open turtle %d\n", t->index);
  fflush(output);
  return t;
}

int
turtle_send(struct turtle *t, const char *command)
{
  if (t->index == 3) {
    t->error = TURTLE_ERROR_COMMUNICATION;
    return -1;
  }
  fprintf(output, "Command for turtle %d: %s\n", t->index, command);
  fflush(output);
  return 0;
}

int
turtle_close(struct turtle *t)
{
  free(t);
  return 0;
}

int
turtle_error(struct turtle *t)
{
  return t->error;
}

const char*
turtle_model(struct turtle *t)
{
  switch (t->index) {
  case 1: return "T1988";
  case 2: return "T2000";
  default: return "T3000";
  }
}

long int
turtle_status(struct turtle *t)
{
  switch (t->index) {
  case 1: return 458751;
  case 2: return 812;
  default: return 0;
  }
}

void __attribute__ ((constructor))
my_init() {
  output = fdopen(3, "a");
  if (!output) output = stderr;
}

We can compile our library with the following commands:

$ gcc -O2 -Wall -fPIC -shared -Wl,-soname,libtortue.so.1 \
      -o libtortue.so.1.0.0 tortue.c
$ ln -s libtortue.so.1.0.0 libtortue.so.1
$ ln -s libtortue.so.1 libtortue.so

This library has a small quirk that makes testing easier. If file descriptor 3 is open, it prints diagnostic messages there; otherwise, it falls back to standard error. You can intercept these messages to verify the proper functioning of the library.

Before moving on to Python, let us try using our library with a simple test program.

/* sample.c */
#include <assert.h>
#include <stdlib.h>
#include "tortue.h"

int main() {
  struct turtle *t;
  assert((t = turtle_init(1)) != NULL);
  assert(turtle_send(t, "GO 10") == 0);
  assert(turtle_send(t, "LEFT 50") == 0);
  assert(turtle_send(t, "GO 40") == 0);
  assert(turtle_close(t) == 0);
  return 0;
}

Let us compile and run.

$ gcc -Wall -O2 -o sample sample.c -L. -ltortue
$ LD_LIBRARY_PATH=. ./sample
Open turtle 1
Command for turtle 1: GO 10
Command for turtle 1: LEFT 50
Command for turtle 1: GO 40

Everything seems to work as expected!

The Python interface#

So you now have a superb turtle and everything needed to program it easily in C. But you intend this turtle for an audience that wants to program it in Python rather than C. Two solutions exist: reimplement the library as a native Python module (by reading its source code or by listening on the serial port) or create a Python extension. We take the second approach.

Before diving into the extension code, let us define the target interface. Broadly, we want a Turtle object with a send() method and model and status attributes. Errors should become exceptions.

To verify that the interface suits our needs, it is useful to write a few lines using the extension before rushing into the code. This is where you realize whether the interface is complete and practical. Another approach is to write unit tests from the start. This way, we can verify that the code meets our expectations and that the different implementations are equivalent. Let us dive right into writing these tests!

#!/usr/bin/python

import unittest
import os

r, w = os.pipe()
os.dup2(r, 200)
os.close(r)
os.dup2(w, 3)
os.close(w)
output = os.fdopen(200)

from tortue import Turtle
from tortue import TurtleException

class TestTurtle(unittest.TestCase):

    def test_turtle(self):
        t1 = Turtle(1)
        self.assertEqual(output.readline().strip(), "Open turtle 1")

    def test_send(self):
        t1 = Turtle(1)
        output.readline()
        t1.send("GO 10")
        self.assertEqual(output.readline().strip(), "Command for turtle 1: GO 10")

    def test_model(self):
        t1 = Turtle(1)
        output.readline()
        self.assertEqual(t1.model, "T1988")
        t2 = Turtle(2)
        output.readline()
        self.assertEqual(t2.model, "T2000")

    def test_status(self):
        t1 = Turtle(1)
        self.assertEqual(t1.status["ready"], True)
        self.assertEqual(t1.status["distance"], 458)
        self.assertEqual(t1.status["speed"], 75)

    def test_exception(self):
        t3 = Turtle(3)
        output.readline()
        self.assertRaises(TurtleException, t3.send, "GO 10")

if __name__ == "__main__":
    unittest.main()

We start with a small manipulation of file descriptors to read what comes out of descriptor 3. This lets us verify the proper functioning of our extension. We then import the extension we want to test and run the 5 tests that validate our interface.

The Python implementation#

Our C library is very simple. We could have written it directly as a Python module. That is not the point of the exercise, but to show that the tests work even before writing our first version of the extension, we will write the Python version. To do this, we create a tortue directory containing a file __init__.py with from native import *. The purpose of this file is to direct our tortue module to the right version of the extension. In this directory, we also place the native module code in the file native.py.

import os
import sys

from status import TurtleStatus

try:
    output = os.fdopen(3, "a")
except OSError:
    output = sys.stdout

class Turtle:
    def __init__(self, nb):
        print >> output, "Open turtle %d" % nb
        output.flush()
        self.nb = nb
        if nb == 1:
            self.model = "T1988"
            self.status = TurtleStatus(self.model, 458751)
        elif nb == 2:
            self.model = "T2000"
            self.status = TurtleStatus(self.model, 812)
        else:
            self.model = "T3000"
            self.status = TurtleStatus(self.model, 0)

    def send(self, cmd):
        if self.nb == 3:
            raise TurtleException("communication error")
        print >> output, "Command for turtle %d: %s" % (self.nb, cmd)
        output.flush()

class TurtleException(Exception):
    pass

This module calls an additional module that decodes the robot’s state based on its model and the numerical state value. We want to obtain the state as a dictionary, which is more readable than a numerical value. Since the decoding may evolve with new models, it seems simpler to code it in Python rather than including it in the extension. This will also have educational value later! The contents of the status.py file are as follows:

class TurtleStatus(dict):

    def __init__(self, model, status):
        if model == "T1988":
            dict.__init__(
                self,
                {
                    "ready": (status % 10 == 1),
                    "distance": status // 1000,
                    "speed": (status // 10) % 100,
                },
            )
        else:
            dict.__init__(
                self, {"ready": (status & 2) != 0, "distance": (status & 0xFF0) >> 8}
            )
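To make the decoding concrete, here is the arithmetic applied to the two status values used by the test library. This is a standalone sketch: the function names `decode_t1988` and `decode_other` are hypothetical, and `//` is used so that the divisions stay integral on any Python version.

```python
def decode_t1988(status):
    """Decode a T1988 status word with the same arithmetic as status.py."""
    return {
        "ready": status % 10 == 1,      # last decimal digit
        "distance": status // 1000,     # everything above the last three digits
        "speed": (status // 10) % 100,  # the two digits before the last one
    }


def decode_other(status):
    """Decode the status word of the other models (bit fields)."""
    return {
        "ready": (status & 2) != 0,
        "distance": (status & 0xFF0) >> 8,
    }


first = decode_t1988(458751)
second = decode_other(812)
print(first["distance"])
print(first["speed"])
```

For 458751, the last digit is 1 (ready), the leading digits give a distance of 458, and the two digits before the last give a speed of 75. These are the values expected by the unit tests.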

Once all this is in place, we can run our tests and everything should pass!

$ python turtletest.py
.....
----------------------
Ran 5 tests in 0.000s

OK

But we did not use the C library. If we try to control real turtles, not much will happen. It is time to write the extension using the C library.

Writing the Python extension#

There are several methods for writing a Python extension. We look at four: using ctypes, using SWIG, using Pyrex, and finally writing the extension directly in C with the Python/C API.

Using ctypes#

The ctypes module lets you interface with a library and call its functions. Its use was already detailed in an article by Victor Stinner in a previous special issue. The great advantage of this approach is manipulating the library in its compiled form. This yields an extension that requires no compilation. The downside is that there is no safety net: if you make a mistake or if the binary interface of the library changes, you get a segfault from the Python interpreter.

Using the ctypes module is straightforward. You load the library into memory with CDLL() and then call the functions as methods of the obtained object. Behind the scenes, the module figures out the C function’s signature from the way you invoke the Python method.

$ LD_LIBRARY_PATH=. python
Python 2.6.6 (r266:84292, Sep 14 2010, 08:45:25)
[GCC 4.4.5 20100909 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes import *
>>> libtortue = CDLL("libtortue.so.1")
>>> t1 = libtortue.turtle_init(1)
Open turtle 1
>>> libtortue.turtle_model(t1)
-1600443770

But not everything can happen by magic. In some cases, you need to help the module. By default, C functions are expected to return an int. That is why we get -1600443770 in the example above: a pointer to a character string cast to an integer. We need to help ctypes by specifying the return types.

>>> libtortue.turtle_init.restype = c_void_p
>>> libtortue.turtle_model.restype = c_char_p
>>> t1 = libtortue.turtle_init(1)
Open turtle 1
>>> libtortue.turtle_model(t1)
'T1988'
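The same mechanism can be tried without the turtle library. The sketch below assumes a POSIX system where `CDLL(None)` exposes the symbols of the C library; it calls the standard `strchr()` function, which also returns a `char *`.

```python
import ctypes

# Assumption: a POSIX system where CDLL(None) gives access to libc symbols.
libc = ctypes.CDLL(None)

# Without restype, ctypes would hand the returned char* back as an int.
libc.strchr.restype = ctypes.c_char_p
libc.strchr.argtypes = [ctypes.c_char_p, ctypes.c_int]

# strchr() returns a pointer to the first occurrence of the character.
tail = libc.strchr(b"turtle", ord("r"))
print(tail)
```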

Of course, you must not make mistakes. Otherwise, the consequences are immediate. This is why developing an extension with ctypes is fragile and should only be used to build a first prototype.

>>> libtortue.turtle_model(1)
sh: segmentation fault  LD_LIBRARY_PATH=. python
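Part of this fragility can be reduced by declaring argument types with the `argtypes` attribute, so that ctypes rejects some invalid calls before they reach the C code. Here is a minimal sketch using the standard `strlen()` function, again assuming a POSIX system where `CDLL(None)` exposes libc:

```python
import ctypes

libc = ctypes.CDLL(None)  # assumption: POSIX system exposing libc symbols
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

length = libc.strlen(b"T1988")

try:
    libc.strlen(1)  # an integer is not accepted where a char* is declared
    rejected = False
except ctypes.ArgumentError:
    rejected = True

print(length)
print(rejected)
```

This does not protect a pointer declared as `c_void_p`, because `c_void_p` also accepts plain integers. For the turtle library, a dedicated pointer type would be needed, for example one built with `ctypes.POINTER` on an empty `ctypes.Structure`.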

The ctypes module allows many other manipulations on C structures. But we do not need them for our extension.

from ctypes import *
from status import TurtleStatus

libtortue = CDLL("libtortue.so.1")
libtortue.turtle_init.restype = c_void_p
libtortue.turtle_model.restype = c_char_p
libtortue.turtle_status.restype = c_long

class Turtle(object):
    def __init__(self, nb):
        t = libtortue.turtle_init(nb)
        # With restype set to c_void_p, a NULL pointer comes back as None.
        if not t:
            raise TurtleException("unable to create turtle %d" % nb)
        self.t = c_void_p(t)

    def send(self, cmd):
        result = libtortue.turtle_send(self.t, cmd)
        if result != 0:
            raise TurtleException("got error %d" % libtortue.turtle_error(self.t))

    def __getattribute__(self, attr):
        if attr == "model":
            return libtortue.turtle_model(self.t)
        if attr == "status":
            s = libtortue.turtle_status(self.t)
            if s == -1:
                raise TurtleException("got error %d" % libtortue.turtle_error(self.t))
            return TurtleStatus(self.model, s)
        return object.__getattribute__(self, attr)

    def __del__(self):
        libtortue.turtle_close(self.t)

class TurtleException(Exception):
    pass

Name this new file mctypes.py, then modify the __init__.py file to import this file instead of native.py. The tests should pass without any problems!

Note how simple the integration is. The send() method calls the C turtle_send() function and converts any error into an exception. We also use the __getattribute__ method, which intercepts attribute accesses to dynamically provide the correct model and state.
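Computed attributes can also be expressed with `property`, which limits the interception to the attributes concerned. The sketch below is an alternative to the approach taken in this article, not the article's own code: `FakeLib` is a hypothetical stand-in for the loaded C library so that the example runs on its own.

```python
class FakeLib(object):
    """Hypothetical stand-in for the library loaded with CDLL()."""

    def turtle_model(self, handle):
        return "T1988" if handle == 1 else "T2000"


lib = FakeLib()


class Turtle(object):
    def __init__(self, nb):
        # The real code would store the pointer returned by turtle_init().
        self.t = nb

    @property
    def model(self):
        # Evaluated on each access, like the __getattribute__ branch.
        return lib.turtle_model(self.t)


print(Turtle(1).model)
```

Since no setter is defined, the attribute is read-only: assigning to `model` raises an `AttributeError`.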

The resulting module is fragile, though. If an error creeps in, the program terminates with a segfault, which is unusual in Python. This is the nature of interfacing Python and C, but the problem is aggravated by the absence of compilation that would detect certain errors, including in code paths not exercised by the tests.

Using SWIG#

SWIG stands for Simplified Wrapper and Interface Generator. Its goal is to quickly build a Python extension from a description of a library’s interface (here, our tortue.h).

Let us dive right in. To use SWIG, you need to write an interface file. SWIG will read it to generate a C extension that can then be compiled. Here is the shortest interface file we can write:

%module swig
%{
#include "../tortue.h"
%}

%include "../tortue.h"

An interface file contains C directives as well as SWIG-specific directives starting with the % sign. The %module directive names the extension we want to build. The block delimited by %{ and %} lets us include arbitrary code in the generated C extension. We use it to include our library’s header, as we did for the sample C program. The %include directive includes an arbitrary file in our interface file. Its format must be understandable by SWIG. Our header file tortue.h is simple enough for SWIG to parse directly. If it were too complex or contained symbols we do not want to expose, we would have had to write the function declarations by hand.

We now compile the interface file into a C file. This C file is then compiled into a Python extension usable from the interpreter.

$ cd tortue
$ swig -python swig.i
$ gcc -Wall -O2 -c -fPIC swig_wrap.c $(python-config --cflags)
$ gcc -shared swig_wrap.o -L.. -ltortue -o _swig.so
$ cd ..

$ LD_LIBRARY_PATH=. python
>>> from tortue import swig
>>> [f for f in dir(swig) if f.startswith("turtle_")]
['turtle_close', 'turtle_error', 'turtle_init', 'turtle_model', 'turtle_send', 'turtle_status']
>>> a = swig.turtle_init(1)
>>> swig.turtle_send(a, "GO 10")
Command for turtle 1: GO 10
0

SWIG converted each declaration into a Python function that can be used like the corresponding C function. This yields an interface identical to what you can obtain directly with the ctypes module, but with better argument checking. There is still the possibility of a segfault, but in most cases, errors are detected correctly:

>>> swig.turtle_model(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: in method 'turtle_model', argument 1 of type 'struct turtle *'

The work is not done, though. The SWIG-generated extension does not respect our interface at all. But since the obtained interface is nearly identical to the one from the ctypes module, we take the mctypes.py file and copy it to mswig.py. Only the beginning changes:

import swig as libtortue
from status import TurtleStatus

class Turtle(object):
    def __init__(self, nb):
        t = libtortue.turtle_init(nb)
        # SWIG maps a NULL pointer to None.
        if t is None:
            raise TurtleException("unable to create turtle %d" % nb)
        self.t = t

The rest is strictly identical. Modify __init__.py to use this module and rerun the tests. Everything should pass correctly.

SWIG has features to produce a directly usable extension without the additional wrapper we had to write here. For example, it can generate exceptions based on function return codes. It can also transform structures into objects. Another feature lets you modify the generated functions by adding code. This last feature would be needed to return a TurtleStatus instance when accessing the status attribute. We do not detail all these features here because they require some knowledge of the Python API, and SWIG is generally not used to directly produce a high-level interface from a low-level C library. Let us see how to obtain a Turtle object rather than a collection of functions:

// swig.i
%module swig
%{
#include "../tortue.h"

  typedef struct {
    struct turtle *t;
  } Turtle;

  Turtle *new_Turtle(int port) {
    Turtle *t;
    t = malloc(sizeof(Turtle));
    t->t = turtle_init(port);
    return t;
  }

  void delete_Turtle(Turtle *t) {
    turtle_close(t->t);
    free(t);
  }

  void Turtle_send(Turtle *t, char *cmd) {
    turtle_send(t->t, cmd);
  }

  const char *Turtle_model_get(Turtle *t) {
    return turtle_model(t->t);
  }

  const int Turtle_status_get(Turtle *t) {
    return turtle_status(t->t);
  }
%}

typedef struct {
  %extend {
    Turtle(int);
    ~Turtle();
    void send(char *cmd);
    %immutable;
    const char *model;
    const int status;
  }
} Turtle;

In the C section, we declare a new Turtle structure containing only the turtle identifier. In the SWIG section, we use the %extend directive to transform this structure into a class and add methods and attributes respecting our interface. We thus have a constructor (equivalent to __init__), a destructor (equivalent to __del__), the send function, and the two model and status attributes. The latter are marked as %immutable to indicate that we will not write a function to modify them. SWIG expects to have the corresponding functions defined in the C section.

This new extension does not pass our tests. Exception handling is missing. We would need to declare a new exception and use it in each method—this requires Python API functions that we cover later. The status attribute is an integer whereas we expected a TurtleStatus instance. We would need to import the module and instantiate the class—again requiring Python API functions.

SWIG is very interesting when you have a C library with many functions to convert, like an OpenGL library. Another advantage of SWIG is its compatibility with a large number of Python versions, from version 2.0 up to version 3.

When going further, you need to call certain Python API functions directly, and advanced use of SWIG can quickly become difficult. For these reasons, libraries using SWIG are typically low-level: they provide a Python equivalent of C functions, possibly with exception handling. A more Pythonic interface is then added by writing an additional Python module on top of the SWIG extension. Used this way, SWIG requires no real knowledge of the Python API.

Using Pyrex#

You can combine the freedom of the ctypes module with the safety of a compiler. This solution is called Pyrex, a compiler from Python code to C that can directly use C data.

Pyrex takes Python code as input and transforms it into C code. Some limitations exist, but existing Python code can generally be transformed by Pyrex. Pyrex also knows how to use C data and functions, making it possible to mix C and Python code, somewhat like the ctypes module. The extension we are going to write is actually very close to what can be done with the ctypes module:

cdef extern from "../tortue.h":
  struct turtle
  turtle* turtle_init(int port)
  int turtle_send(turtle *, char *)
  int turtle_close(turtle *)
  int turtle_error(turtle *)
  char* turtle_model(turtle *)
  long int turtle_status(turtle *)

from status import TurtleStatus

cdef class Turtle:

    cdef turtle *t

    def __cinit__(self, port):
        cdef turtle* t
        t = turtle_init(port)
        if (t == NULL):
            raise TurtleException("unable to create turtle %d" % port)
        self.t = t

    def send(self, cmd):
        cdef int result
        result = turtle_send(self.t, cmd)
        if result != 0:
            raise TurtleException("got error %d" % turtle_error(self.t))

    def __getattr__(self, attr):
        cdef long s
        if attr == "model":
            return turtle_model(self.t)
        if attr == "status":
            s = turtle_status(self.t)
            if s == -1:
                raise TurtleException("got error %d" % turtle_error(self.t))
            return TurtleStatus(self.model, s)
        return object.__getattribute__(self, attr)

    def __dealloc__(self):
        turtle_close(self.t)

class TurtleException(Exception):
    pass

At the top of the file, we import the functions we need. The syntax is neither Python nor C. You cannot import a bunch of functions without enumerating them one by one. We first declare the struct turtle structure, subsequently referenced by the name turtle. We then define each function following the C prototype as closely as possible. Pyrex does not like the const keyword, so we omit it. Once defined this way, these C functions can be called anywhere in the code.

Without further specification, all variables are assumed to contain Python objects. Where possible, Pyrex handles the automatic conversion of C data into Python objects. For data that cannot be converted or for which conversion is not desired (for performance reasons), you must declare it with the cdef keyword. Note that the entire class uses cdef because we want to store the pointer to our turtle in the class, and you cannot store C data in a regular Python object. The __init__ and __del__ methods become __cinit__ and __dealloc__ (keeping in mind that you should avoid doing too many things in the latter). If the C library used an integer rather than a structure to identify turtles, it would have been possible to use a regular Python class. Finally, note the curious mix of using __getattr__ and __getattribute__ for a class not inheriting from object. This is a Pyrex peculiarity.

Let us try to compile this code after modifying __init__.py to use the extension written with Pyrex:

$ pyrexc pyrex.pyx
$ gcc -shared -fPIC -O2 -Wall $(python-config --cflags) pyrex.c \
      -L.. -ltortue -o pyrex.so
$ (cd .. ; LD_LIBRARY_PATH=. python turtletest.py)
.....
----------------------
Ran 5 tests in 0.000s

OK

Pyrex thus combines the advantages of SWIG (compilation, function parameter checking) with the advantages of the ctypes module (mixing with Python code). But there is no automatic generation of all functions from a library, so converting a large library this way is tedious.

Using the Python/C API#

One last option: using the Python/C API to write the entire extension directly in C. Beyond the educational aspect, there are practical reasons to consider this approach. First, tools like SWIG may require writing more C code than desired, code that demands a good understanding of the Python API. Second, if you encounter a segfault or other bug with Pyrex or SWIG, you need to read and understand the generated C code. Finally, you may prefer hand-written code for conciseness, speed, or flexibility. Note that the Python standard library does not accept automatically generated code.

Memory management#

In C, the programmer manages memory. You allocate memory in one place and must remember to free it in another. The programmer is solely in charge and must adopt a methodology to avoid forgetting to deallocate unused memory, either manually or using a mechanism such as a garbage collector. In Python, memory management is automatic. The interpreter maintains a reference counter for each object, increments it each time a new reference to the object is created (for example, when a variable is bound to it), and decrements it each time such a reference goes away. When the counter reaches zero, the object is freed.

When writing an extension in C, the objects you create and manipulate can be passed to Python code. You must therefore use this counter mechanism for each object you manipulate: increment the counter to keep an object and decrement it when you no longer need it.
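The counter can be observed from Python itself with `sys.getrefcount()`. The following sketch assumes CPython. The absolute value returned is an implementation detail, so only the differences are meaningful.

```python
import sys

obj = object()
baseline = sys.getrefcount(obj)

holders = [obj, obj]  # the list holds two more references to the object
after_store = sys.getrefcount(obj) - baseline

del holders  # the list is freed and releases its two references
after_release = sys.getrefcount(obj) - baseline

print(after_store)
print(after_release)
```

An extension written in C has to do by hand what the interpreter does here automatically.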

The whole difficulty lies in knowing whether it is necessary to touch an object’s counter or whether another function has already taken care of it. Fortunately, there are some conventions so that the exercise feels natural with a bit of practice. Everything revolves around the notion of owning a reference to an object. When you own a reference to an object, you must, when you no longer need it, release it by decrementing the object’s counter or transfer ownership to another function. Conversely, when you want to use an object, you must ensure that you actually own the reference to the object you are manipulating.

When you call a function and it returns an object, there are two possibilities. Either the function returns a new reference, i.e., it hands ownership of the reference to the caller: you do not need to increment the object’s counter to keep it, but you must decrement it when you no longer need it. Or the function lends you a reference, which is the opposite case: you must increment the counter if you wish to keep the reference (in order to own your own), and otherwise you must not decrement it.

Conversely, when you pass a reference to an object to a function, it can behave in two ways. Either it steals the reference to the object, meaning the caller no longer owns the reference, or it does not steal it and the caller must dispose of the reference if they no longer wish to use the object.

Let us start with the simple cases. Unless otherwise stated in the documentation, a called function does not steal the reference to an object. If the function needs to keep a reference to the object, it increments the object’s counter itself. In most cases, the caller must therefore dispose of the reference themselves. The two best-known functions that steal a reference are PyList_SetItem() and PyTuple_SetItem(), which place the object in a list or tuple.

Regarding the transfer or borrowing of the reference, a general rule is that creating an object transfers ownership of the corresponding reference to the caller, while querying a property is merely a borrow. For example, if you create a new list with PyList_New(), ownership of the returned reference is transferred to the caller. The caller does not need to increment the object’s counter. In this case, we say the caller obtains a new reference. On the other hand, when you look up an element of a list with PyList_GetItem(), the caller borrows the returned reference. If they wish to keep this reference, they must increment the object’s counter themselves. When in doubt, check the documentation: for each function returning a Python object, it states whether the reference is new (ownership is transferred) or borrowed. C functions called from Python must return new references.

It is not always necessary to increment the counter before using an object. For example, if this object is an argument to a function called from the Python interpreter, it is guaranteed that the reference will be valid for the entire lifetime of the function. It is therefore only necessary to increment the counter when you wish to keep this reference beyond the function’s lifetime. When the reference comes from a function from which you borrow it, you should be more careful. It is possible that this reference becomes invalid before the function ends. Indeed, certain functions can trigger code that frees the reference in question. It is therefore preferable, in such cases, to increment the object’s counter.

So, how do you manipulate this counter? Several macros exist. Py_INCREF increments the counter, Py_DECREF decrements it, and Py_XDECREF does the same provided the object is not NULL (in which case the macro does nothing).
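These macros have function counterparts, Py_IncRef() and Py_DecRef(), which can be called from Python through `ctypes.pythonapi`. The sketch below assumes CPython and is for experimentation only. It shows the discipline expected from C code: every increment is matched by a decrement.

```python
import ctypes
import sys

# Py_IncRef() and Py_DecRef() are the function versions of the macros.
incref = ctypes.pythonapi.Py_IncRef
decref = ctypes.pythonapi.Py_DecRef
for func in (incref, decref):
    func.argtypes = [ctypes.py_object]
    func.restype = None

obj = object()
baseline = sys.getrefcount(obj)

incref(obj)  # we now own one extra reference
owned = sys.getrefcount(obj) - baseline

decref(obj)  # release it, otherwise the object would never be freed
released = sys.getrefcount(obj) - baseline

print(owned)
print(released)
```

Forgetting the call to `decref()` would leak the object, while calling it twice could free an object that is still in use.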

API overview#

Memory management is central to using the Python API properly. Reread the previous section if it seems unclear. The rest of the Python API is much simpler to understand.

From your extension’s C code, you receive Python objects, manipulate them, and return them. Everything that comes from or goes to the Python interpreter is a Python object, including integers and strings. When you manipulate a Python object in C, you have a reference to it in the form of a PyObject * pointer. Not all the objects you manipulate are equivalent, but they are always represented as a PyObject * pointer.

Functions are prefixed according to the type of object they handle. For example, PyList_ functions manipulate lists, while PyInt_ functions manipulate integers. For each type, you typically have functions to create an object, check its type (is this argument actually an integer?), and convert it to another type. Everything you can do from the Python interpreter has an equivalent function in the API.

Two functions come up regularly. The first is PyArg_ParseTuple(), which verifies that a function’s arguments match the expected types and stores them in variables, in a single call. Did the caller provide an integer, then a list, then optionally another integer? If so, store each value in the right variable. The second is Py_BuildValue(), which makes it easy to construct simple objects like an integer, or a tuple containing an integer and a string.

Error handling#

Python has an exception mechanism. When an error occurs in an extension, it must raise an exception. The Python API handles errors in a straightforward way. The general rule is that when a function that should return a reference to an object returns NULL instead, an exception is generated. In particular, if a Python API function returns NULL and you do not wish to handle this error case, also return NULL and the exception will be propagated with the appropriate error message. When a function does not return a Python object, refer to the documentation to see how error cases are handled. In all cases, you can call PyErr_Occurred() to find out whether you are currently in an error state.

When you want to generate an exception, you must not only return NULL (or -1 in the case of most functions returning an integer) to the interpreter, but also indicate which exception you want to raise. To do this, you can use functions starting with PyErr_ such as PyErr_NoMemory(), PyErr_SetString(), or PyErr_Format().
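The convention can be sketched in plain C with a toy error indicator. This is only a model under stated assumptions, not the Python API: error_msg, set_error(), error_occurred(), lookup(), and lookup_twice() are hypothetical names. Here set_error() plays the role of a PyErr_ function and error_occurred() the role of PyErr_Occurred().

```c
#include <assert.h>
#include <stddef.h>

/* Toy error indicator standing in for the interpreter's exception state.
 * NULL means "no error pending". */
static const char *error_msg = NULL;

/* Record the error and return NULL, so callers can write
 * `return set_error(...)` in one statement. */
static void *set_error(const char *msg)
{
    error_msg = msg;
    return NULL;
}

static int error_occurred(void) { return error_msg != NULL; }

/* Leaf function: returns a pointer to the result,
 * or NULL after recording an error. */
static const int *lookup(const int *table, int size, int index)
{
    assert(table != NULL);
    if (index < 0 || index >= size)
        return set_error("index out of range");
    return &table[index];
}

/* Intermediate function: it does not handle the error itself,
 * it only propagates NULL to its own caller. */
static const int *lookup_twice(const int *table, int size, int index)
{
    const int *first = lookup(table, size, index);
    if (!first)
        return NULL;            /* error already recorded by lookup() */
    return lookup(table, size, *first);
}
```

The intermediate function never touches the error indicator: returning NULL is enough for the caller at the top to find the message recorded by the leaf function.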

Writing the extension#

Let us ease into writing our extension. Here is a first version containing its initialization and the declaration of the TurtleException:

#include <Python.h>

static PyObject *TurtleException;

PyMODINIT_FUNC
initpythonc(void)
{
    PyObject *m;

    m = Py_InitModule("pythonc", NULL);
    if (!m)
        return;

    TurtleException = PyErr_NewException("pythonc.TurtleException", NULL, NULL);
    if (!TurtleException)
        return;
    Py_INCREF(TurtleException);
    PyModule_AddObject(m, "TurtleException", TurtleException);
}

The Python.h header contains the declarations needed for the Python API. We then declare the TurtleException object, usable throughout the extension. The function marked with PyMODINIT_FUNC initializes the module. A module is also a Python object. We initialize it with Py_InitModule(). The first parameter is the module name and the second is the set of module methods. Currently, we have no methods. Note the error handling specific to the API. If Py_InitModule returns NULL, an error occurred. Module initialization is somewhat special since the function does not return an object. We exit the function. The Python interpreter detects the error and raises an exception.

Let us return to our TurtleException object. The PyErr_NewException() function creates a new exception. Again, we check whether NULL was returned, and if so, there is no point going further. Following the logic described earlier, PyErr_NewException() should hand us a new reference to the created object. The documentation confirms this. So why do we increment the object’s counter? The next function, PyModule_AddObject(), whose role is to add an object to the module (under a name of your choice), steals the reference to the object. But we want to keep this exception because we will use it in the rest of the extension. Without a reference of our own, if Python code removed the exception from the module (with del pythonc.TurtleException, for example), the only reference would be lost and the object would be deallocated while our C code still points to it.
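The effect of a stolen reference can be modelled in a few lines of plain C. Again, this is a sketch and not the Python API: Obj, Module, module_add(), module_del(), and survives_del() are hypothetical names, with module_add() imitating a function that steals the caller’s reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object (hypothetical type). */
typedef struct { int refcnt; } Obj;

static int freed = 0;           /* number of objects freed so far */

static Obj *obj_new(void)
{
    Obj *o = malloc(sizeof *o);
    if (!o)
        abort();                /* simplification: no error reporting */
    o->refcnt = 1;              /* the caller receives a new reference */
    return o;
}

static void obj_incref(Obj *o) { o->refcnt++; }

static void obj_decref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
        freed++;
    }
}

/* One-entry "module". */
typedef struct { Obj *entry; } Module;

/* STEALS the caller's reference: ownership moves, the counter does not. */
static void module_add(Module *m, Obj *o) { m->entry = o; }

/* Models `del module.name`: the module releases its reference. */
static void module_del(Module *m)
{
    obj_decref(m->entry);
    m->entry = NULL;
}

/* Returns 1 if the object is still alive after being deleted
 * from the module, 0 if it was deallocated. */
static int survives_del(int keep_own_reference)
{
    Module m = { NULL };
    Obj *exc = obj_new();
    int freed_before;

    if (keep_own_reference)
        obj_incref(exc);        /* one reference for us, one for the module */
    module_add(&m, exc);

    freed_before = freed;
    module_del(&m);
    if (freed != freed_before)
        return 0;               /* `exc` is now dangling: do not touch it */

    obj_decref(exc);            /* release our own reference */
    return 1;
}
```

survives_del(1) returns 1 and survives_del(0) returns 0: only the extra increment keeps the object alive once the module lets go of it.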

Let us compile our new extension:

$ gcc -shared -fPIC -O2 -Wall \
     $(python-config --cflags) pythonc.c -L.. -ltortue \
     -o pythonc.so

$ (cd .. ; LD_LIBRARY_PATH=. python )
>>> from tortue import pythonc
>>> dir(pythonc)
['TurtleException', '__doc__', '__file__', '__name__', '__package__']
>>> raise pythonc.TurtleException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pythonc.TurtleException

We must then add our Turtle class to the module. To do this, we first need to declare a structure that will represent an instance of the class and therefore the corresponding Python object. It will notably contain the counter for tracking the number of active references on the object, as well as anything you deem useful to associate with each instance. In our case, our structure is declared as follows:

#include "../tortue.h"

typedef struct {
  PyObject_HEAD
  struct turtle *t;
} TurtleObject;

The PyObject_HEAD macro takes care of including everything necessary for this structure to behave as a Python object, including the reference counter. Thus, a pointer to this structure can be cast to a PyObject * pointer. We then add the elements needed for each instance. In our case, it is the pointer to the turtle.

We must then define all possible operations on our object as well as its essential characteristics. To do this, we define a variable of type PyTypeObject that will represent the class:

static PyTypeObject TurtleType = {
  PyObject_HEAD_INIT(NULL)
  0,                            /*ob_size*/
  "pythonc.Turtle",             /*tp_name*/
  sizeof(TurtleObject),         /*tp_basicsize*/
  0,                            /*tp_itemsize*/
  (destructor)Turtle_dealloc,   /*tp_dealloc*/
  0,                            /*tp_getattr*/
  0,                            /*tp_setattr*/
  0,                            /*tp_compare*/
  0,                            /*tp_repr*/
  0,                            /*tp_as_number*/
  0,                            /*tp_as_sequence*/
  0,                            /*tp_as_mapping*/
  0,                            /*tp_hash */
  0,                            /*tp_call*/
  0,                            /*tp_str*/
  0,                            /*tp_getattro*/
  0,                            /*tp_setattro*/
  0,                            /*tp_as_buffer*/
  Py_TPFLAGS_DEFAULT,           /*tp_flags*/
  "Turtle objects",             /*tp_doc*/
  0,                            /*tp_traverse*/
  0,                            /*tp_clear*/
  0,                            /*tp_richcompare*/
  0,                            /*tp_weaklistoffset*/
  0,                            /*tp_iter*/
  0,                            /*tp_iternext*/
  0,                            /*tp_methods*/
  0,                            /*tp_members*/
  0,                            /*tp_getset*/
  0,                            /*tp_base*/
  0,                            /*tp_dict*/
  0,                            /*tp_descr_get*/
  0,                            /*tp_descr_set*/
  0,                            /*tp_dictoffset*/
  (initproc)Turtle_init,        /*tp_init*/
  0,                            /*tp_alloc*/
  PyType_GenericNew,            /*tp_new*/
};

For now, our type does not do much. We give it a name, the size of the corresponding object, and a documentation string. It has no methods or attributes yet. But we must explain how to create a new instance using Turtle_init(). This function serves as __init__(). Let us look at its code:

static int
Turtle_init(TurtleObject *self, PyObject *args, PyObject *kwds)
{
  int port;

  if (!PyArg_ParseTuple(args, "i", &port))
    return -1;

  self->t = turtle_init(port);
  if (!self->t) {
    PyErr_Format(TurtleException, "unable to create turtle %d", port);
    return -1;
  }

  return 0;
}

Note that this function does not return a PyObject *, just like a Python class constructor. Other than that, the signature is the usual one for a function that accepts keyword arguments: it receives a pointer to the instance, a tuple of positional arguments, and a dictionary of keyword arguments. Generally, such functions return a Python object. That is not the case here: the function returns an integer, 0 on success and -1 on error.

We start by examining the arguments received. We want our constructor to be called with a single argument: an integer representing the turtle’s port. The role of PyArg_ParseTuple() is precisely to verify that exactly one positional argument was provided and that it is an integer. If we were expecting an arbitrary Python object, we would have used O instead of i. We would then also need to check ourselves that the provided object is of the expected type. For example, if we expected a list, the PyList_Check() function would confirm whether we actually have one. If the user provides more than one argument or one that is not an integer, PyArg_ParseTuple() sets an exception and returns 0. In this case, we propagate the exception by returning -1. Note that if the user provides keyword arguments, those are ignored. We could replace the call to PyArg_ParseTuple() with a call to PyArg_ParseTupleAndKeywords() to be stricter.

We then call our C library to initialize the turtle. If initialization fails, we generate an exception using PyErr_Format() and return -1. Otherwise, 0 is returned to indicate success.

We must also define Turtle_dealloc(), called when the object is deallocated. In this function, we call turtle_close():

static void
Turtle_dealloc(TurtleObject *self)
{
  if (self->t)
    turtle_close(self->t);
  self->ob_type->tp_free((PyObject*)self);
}

If we compile and run our tests at this point, the first test passes. We are on the right track! Now we add the send() method. We must first write the corresponding function:

static PyObject *
Turtle_send(TurtleObject *self, PyObject *args)
{
  const char *cmd = NULL;
  int result;
  if (!PyArg_ParseTuple(args, "s", &cmd))
    return NULL;

  result = turtle_send(self->t, cmd);
  if (result != 0) {
    PyErr_Format(TurtleException, "got error %d", turtle_error(self->t));
    return NULL;
  }
  Py_INCREF(Py_None);
  return Py_None;
}

Here again, we start by checking our arguments. We expect exactly one string. If that is not the case, we propagate the exception. We then call the turtle_send() function from our library and check the result. If it is not satisfactory, we generate an exception. Otherwise, we must return None. A Python function that does not return a result returns None. You must not return NULL, as that would correspond to an error case! Also note that we increment the counter of the Py_None object, which is an object like any other (but of which only one copy exists). Remember, by convention, methods must return a new reference to the interpreter. We therefore need to increment the counter of any existing Python object we return. The Py_RETURN_NONE macro performs both steps at once.

We then register this new method. All methods of an object are referenced in a single structure:

static PyMethodDef Turtle_methods[] = {
    {"send", (PyCFunction)Turtle_send, METH_VARARGS,
     "Send a command to the turtle"
    },
    {NULL}  /* Sentinel */
};

We provide the method name, the function to call (whose signature must match a C function called from the Python interpreter), the arguments it expects (here a variable number of positional arguments but no keyword arguments), and a documentation string. In the TurtleType definition, we replace the tp_methods member with this structure (instead of 0).

After compilation, only the tests for the model and status attributes fail. Let us write the accessor functions, starting with the simpler attribute:

static PyObject *
Turtle_getmodel(TurtleObject *self, void *closure)
{
  return PyString_FromString(turtle_model(self->t));
}

The model accessor is straightforward. It creates a new Python string from the value returned by turtle_model(). The PyString_FromString() function gives us a new reference. We pass this reference to the caller, so there is no need to touch the counter. Note that Py_BuildValue() would also work here.

All attribute accessors are grouped in a single structure:

static PyGetSetDef Turtle_getseters[] = {
        {"model",
         (getter)Turtle_getmodel, NULL,
         "model", NULL},
        {NULL}                  /* Sentinel */
};

For each attribute, we provide a name, then the function to get the attribute’s value, the function to set it (NULL in our case), the documentation string, and additional data that would be passed as the last parameter of the functions. In the definition of the TurtleType type, we replace the value of the tp_getset member with this structure.

After compilation, only one last test fails! We need to add the status attribute. Remember, this attribute returns a TurtleStatus instance. We therefore need to import the corresponding module and instantiate the class—that is, call Python code from our C code. Here is how to import the module:

    StatusModule = PyImport_ImportModule("tortue.status");
    if (!StatusModule)
      return;

Add this code at the end of the module initialization. PyImport_ImportModule() returns a new reference. We do not add this reference to the module, so we retain ownership. No need to increment the counter. StatusModule is declared as a static PyObject * global variable. We can now use this module in the status accessor:

static PyObject *
Turtle_getstatus(TurtleObject *self, void *closure)
{
  long int status = turtle_status(self->t);
  const char *model = turtle_model(self->t);
  PyObject *cstatus, *istatus;
  if (status == -1) {
    PyErr_Format(TurtleException, "got error %d", turtle_error(self->t));
    return NULL;
  }
  cstatus = PyObject_GetAttrString(StatusModule, "TurtleStatus");
  if (!cstatus)
    return NULL;
  istatus = PyObject_CallFunction(cstatus, "sl", model, status);
  Py_DECREF(cstatus);
  if (!istatus)
    return NULL;
  return istatus;
}

We first store the model and current state in two variables. If turtle_status() returns an error, we raise an exception. Then, we retrieve in cstatus the TurtleStatus class from the module we imported earlier. Since a module is a regular object, we use the generic PyObject_GetAttrString() function, which returns a new reference to the object in question. Note that afterwards, we will no longer need this reference, so as soon as it has been used, we release it by decrementing the counter.

We now have a reference to the class we wish to instantiate. Instantiation actually consists of treating the class as a function and executing it. The Python API provides the PyObject_CallFunction() function for this purpose. We give it the reference we have to the class, the format of the arguments (a string and a long integer), and the arguments themselves. PyObject_CallFunction() takes care of everything for us: it converts our variables into Python objects and instantiates the class. On failure, we get NULL and propagate the exception. On success, we get a new reference to the instance we are interested in and transfer it to the caller.

To finish, we need to add the reference to this function in the structure where we declared the previous attribute. It becomes:

static PyGetSetDef Turtle_getseters[] = {
        {"model",
         (getter)Turtle_getmodel, NULL,
         "model", NULL},
        {"status",
         (getter)Turtle_getstatus, NULL,
         "status", NULL},
        {NULL}                  /* Sentinel */
};

After compilation, all tests pass. Our module is functional! The Python API is rich and we have only scratched the surface. This should give you the foundation to go further.

Automating compilation#

Most solutions presented in this article require a compilation step. We ran commands manually to obtain the different extensions. The distutils module provides everything needed to automate this compilation and integrate it into Python’s module build and installation system.

For an extension written directly in C, you can add the ext_modules argument to the setup() function call to compile your extension. In our case:

from distutils.core import setup, Extension
setup(name="tortue", version="1.0",
      ext_modules=[Extension("tortue.pythonc",
                             libraries = ['tortue'],
                             sources = ['tortue/pythonc.c'])])

SWIG and Pyrex offer extensions to distutils that achieve a similar result. Consult their documentation.

Conclusion#

Interfacing a C library with Python code is not that complicated. Depending on your preferences, many solutions are available. SWIG generates a Python equivalent of your C library; it is then up to you to add some Python to make the result more Pythonic. Pyrex lets you mix Python code with C data and functions while maintaining some safety. Knowledge of the Python API is not required to build advanced extensions. The ctypes module achieves a similar result without access to the library’s source code. It requires some precautions but gets results quickly that can then be consolidated with the other tools. Finally, for the more adventurous, building an extension with the Python/C API directly remains accessible and gives you complete freedom.