Misadventures in Python Packaging: Optional C Extensions

I began an unlikely adventure into Python packaging this week when I made what I thought were some innocuous modifications to the source distribution and setup.py script for the peewee database library. Over the course of a day, the setup.py more than doubled in size and underwent five major revisions as I worked to fix problems arising from differences in users' environments. This was tracked in issue #1676, may it always bear witness to the complexities of Python packaging!

In this post I'll explain what happened, the various things I tried, and how I ended up resolving the issue.

What happened?

Peewee is a Python library that contains three optional C extensions, written in an intermediate language (Cython), which are converted to C and then compiled into shared libraries. Prior to 3.6.0, Peewee did not ship with the C source files generated from the Cython code, and so the build process would only attempt to compile the C extensions if you had already installed Cython. Presumably this was a good arrangement for most people, as I don't recall receiving many reports of issues.

Shortly after releasing 3.6.0, which included the C sources, I received a ticket (#1676) indicating a user was seeing a fatal error from the compiler when attempting to install peewee:

fatal error: sqlite3.h: No such file or directory

My first attempt at fixing this was to use the ctypes.util.find_library() function to detect whether libsqlite3 was available, and only then would the SQLite-specific extensions be built. After this, I pushed a new release and let everyone know the bug had been fixed.
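A minimal sketch of that kind of check is shown here. The function name have_libsqlite3 is my own placeholder, not necessarily what the setup.py used. Note that find_library() only reports whether a shared library can be located; it says nothing about header files.

```python
from ctypes.util import find_library


def have_libsqlite3():
    """Return True if a shared sqlite3 library can be located."""
    # find_library() returns the library's name or path as a string,
    # or None when nothing matching is found.
    return find_library('sqlite3') is not None


if __name__ == '__main__':
    print('libsqlite3 found:', have_libsqlite3())
```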

What do you mean you don't have a compiler?

I received a new report from another user saying that they were seeing a different error because they don't have a C compiler installed. I dug around in the distutils code searching for a way to detect if a compiler was installed, but ended up not finding anything that I felt confident would work on Linux, Mac and Windows. I ended up temporarily adding the following code, which checks for the existence of a compiler by compiling a small C source file:

def have_compiler():
    import os
    import tempfile
    import warnings
    from distutils.ccompiler import new_compiler
    from distutils.errors import CompileError
    from distutils.errors import DistutilsExecError

    # Write a trivial C program to a temporary file.
    fd, fname = tempfile.mkstemp('.c', text=True)
    with os.fdopen(fd, 'w') as f:
        f.write('int main(int argc, char** argv) { return 0; }')

    compiler = new_compiler()
    try:
        # If this succeeds, a working C compiler is available.
        compiler.compile([fname])
    except (CompileError, DistutilsExecError):
        warnings.warn('compiler not installed')
        return False
    except Exception as exc:
        warnings.warn('unexpected error encountered while testing if compiler '
                      'available: %s' % exc)
        return False
    else:
        return True

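Now there were two layers of checks: one for a working C compiler, and one for libsqlite3 before building the SQLite-specific extensions.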

I pushed a new release and once again informed everyone that the bug had been fixed.

If at first you fail, try again

It didn't take long before a new report came up: a user was reporting that they had a C compiler but they didn't have the Python headers, and so couldn't compile the extension. I also received a fresh report that the original "missing sqlite3.h" error was still occurring for some users. I had assumed that if Python were installed, the headers would be as well. Similarly, if libsqlite3 were available, then the sqlite3.h would be present. Apparently this is not the case on many distributions. Back to the drawing board... How to detect the presence of a header file as well?
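For the Python headers specifically, one way to answer that question is to ask the interpreter where its include directory is and look for Python.h there. This is only a sketch, not something peewee's setup.py does, and the function name have_python_headers is hypothetical. It also does nothing for third-party headers such as sqlite3.h, which can live anywhere on the compiler's search path.

```python
import os
import sysconfig


def have_python_headers():
    """Return True if Python.h exists in the interpreter's include dir."""
    # sysconfig.get_paths() maps names such as 'include' to directories
    # for the running interpreter.
    include_dir = sysconfig.get_paths().get('include')
    if not include_dir:
        return False
    return os.path.exists(os.path.join(include_dir, 'Python.h'))


if __name__ == '__main__':
    print('Python headers found:', have_python_headers())
```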

I got inspiration from the simplejson project, which, like Peewee, has an optional C extension. It does a very simple thing: first it tries to build the project with the C extensions, and if that fails, it falls back to a pure-Python installation. Given all the problems I was having, this seemed like the best approach, so I removed the have_compiler() function and just wrapped the setup() in a conditional.

The first attempt looked something like this:

def _do_setup(c_extensions, sqlite_extensions):
    if c_extensions:
        ext_modules = [speedups_ext_module]
        if sqlite_extensions:
            ext_modules.extend([sqlite_udf_module, sqlite_ext_module])
    else:
        ext_modules = None

    setup(
        name='peewee',
        # ... other arguments ...
        ext_modules=cythonize(ext_modules))


if extension_support:
    try:
        _do_setup(extension_support, sqlite_extension_support)
    except (CompileError, DistutilsExecError, LinkError):
        print('#' * 75)
        print('Error compiling C extensions, C extensions will not be built.')
        print('#' * 75)
        _do_setup(False, False)
else:
    _do_setup(False, False)

When I went to test this on a Docker image that didn't have a compiler installed (I was getting smarter by this point), I found that the installation aborted if the first call to _do_setup() failed. I thought I had been catching the appropriate exceptions, but it turns out that the distutils build_ext command raises a SystemExit exception upon failure, so I'd have to catch that if I wanted to try again.
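Why the except clause never fires can be shown without distutils at all. The sketch below uses a made-up FakeCompileError and fake_setup() as stand-ins (neither exists in distutils) and assumes, as described above, that the real error has been turned into a SystemExit by the time it reaches the caller.

```python
class FakeCompileError(Exception):
    """Hypothetical stand-in for a distutils compile error."""


def fake_setup():
    # Mimic the behaviour described above: the underlying error is
    # converted to SystemExit before the caller ever sees it.
    try:
        raise FakeCompileError('command "gcc" failed')
    except FakeCompileError as exc:
        raise SystemExit('error: %s' % exc)


def try_build():
    try:
        fake_setup()
    except FakeCompileError:
        # Never reached: the original exception has been replaced.
        return 'caught compile error'
    except SystemExit:
        return 'caught SystemExit'
    return 'built'


if __name__ == '__main__':
    print(try_build())
```

SystemExit derives from BaseException rather than Exception, so even a broad "except Exception" would not have caught it.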

Catching a SystemExit seemed extreme. Referring back to simplejson, I saw that it implemented a custom build_ext command class which raised a custom error class. I had wondered why they did this the first time I looked at the code and now it made sense: this allowed them to circumvent distutils raising a SystemExit.

The code now looked like this:

class BuildFailure(Exception): pass

class _PeeweeBuildExt(build_ext):
    def run(self):
        try:
            build_ext.run(self)
        except DistutilsPlatformError:
            raise BuildFailure()

    def build_extension(self, ext):
        try:
            build_ext.build_extension(self, ext)
        except (CCompilerError, DistutilsExecError, DistutilsPlatformError):
            raise BuildFailure()

def _do_setup(c_extensions, sqlite_extensions):
    # everything the same except for the inclusion of my custom command class.
    setup(
        # ...
        cmdclass={'build_ext': _PeeweeBuildExt})

if extension_support:
    try:
        _do_setup(...)
    except BuildFailure:  # NOW we can catch the build failure!
        # ...

I tested this new script and finally it appeared to be working!
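The retry logic that the snippet above elides can be sketched on its own. Here do_setup is a hypothetical callable standing in for _do_setup(), simplified to take a single flag, and fake_do_setup exists only to exercise the fallback path; none of this is the exact code from the setup.py.

```python
class BuildFailure(Exception):
    """Raised when the C extensions cannot be built."""


def setup_with_fallback(do_setup, extension_support=True):
    """Try a build with C extensions, then fall back to pure Python.

    do_setup is a hypothetical callable taking one boolean argument.
    Returns True if the build with extensions succeeded.
    """
    if extension_support:
        try:
            do_setup(True)
            return True
        except BuildFailure:
            print('Error compiling C extensions, '
                  'C extensions will not be built.')
    # Either extensions were never requested or the build failed.
    do_setup(False)
    return False


def fake_do_setup(with_extensions):
    # Stand-in for a machine with no compiler: the extension build fails.
    if with_extensions:
        raise BuildFailure()


if __name__ == '__main__':
    print('extensions built:', setup_with_fallback(fake_do_setup))
```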

Everyone's happy but me

At this point we were at release 3.6.4, and people were reporting that the project was installing successfully again. I am very grateful to them for their persistence in uncovering these issues and their patience while I fixed them. The end result, though, isn't very aesthetically satisfying.

In a fit of pique, I decided to make one final addition to the script. I was bothered by the fact that my SQLite3 detection was flawed and wanted a more robust way to differentiate whether a build failure was due to a general inability to compile Python C extensions, or specifically missing the SQLite headers.

So I removed the ctypes.util.find_library() check (which didn't work anyway) and replaced it with a small function that actually attempts to include "sqlite3.h" and link against libsqlite3. The function looks like this (inspired by a Stack Overflow answer):

def _have_sqlite_extension_support():
    import os
    import shutil
    import tempfile
    from distutils.ccompiler import new_compiler
    from distutils.errors import CCompilerError
    from distutils.errors import DistutilsExecError
    from distutils.errors import DistutilsPlatformError
    from distutils.sysconfig import customize_compiler

    libraries = ['sqlite3']
    c_code = ('#include <sqlite3.h>\n\n'
              'int main(int argc, char **argv) { return 0; }')
    tmp_dir = tempfile.mkdtemp(prefix='tmp_pw_sqlite3_')
    bin_file = os.path.join(tmp_dir, 'test_pw_sqlite3')
    src_file = bin_file + '.c'
    with open(src_file, 'w') as fh:
        fh.write(c_code)

    compiler = new_compiler()
    customize_compiler(compiler)
    success = False
    try:
        # Compile the test program, then link it against libsqlite3.
        compiler.link_executable(
            compiler.compile([src_file], output_dir=tmp_dir),
            bin_file,
            libraries=libraries)
    except CCompilerError:
        print('unable to compile sqlite3 C extensions - missing headers?')
    except DistutilsExecError:
        print('unable to compile sqlite3 C extensions - no c compiler?')
    except DistutilsPlatformError:
        print('unable to compile sqlite3 C extensions - platform error')
    else:
        success = True
    finally:
        # Remove the scratch directory whether or not the build worked.
        shutil.rmtree(tmp_dir)
    return success

You can view the full setup.py in all its baroque glory on GitHub.

What have we learned?

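Here are some assumptions you might want to check if you're packaging a library with optional C extensions: that your users have a C compiler installed; that the Python headers are present wherever Python is; that a shared library being available means its header files are too; and that a failed build raises an exception you are actually catching, rather than a SystemExit.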
