How we improved Python Packaging & Distribution

Published on January 27, 2014, by Sebastian


Over the past 5 months, we invested a lot of time in making the installation of RhodeCode Enterprise as stable, simple and fast as possible. With this blog post I want to share in more detail what we learned and what we changed. I hope it helps other Python developers improve the way they bundle and distribute their applications, especially when their audience does not consist entirely of experienced Python developers but also includes the busy developer, project manager or team lead who just wants to install, upgrade and run a Python application that makes life easier for them and their team.

But let me start at the beginning: summer 2013. Marcin and I had just decided to release RhodeCode Enterprise, and from then on he and his team were busy adding requested features, improving the UI, and so on. On the other side of the table was me, the more business-and-UX-oriented guy. Over the past 2 years I had become comfortable with Python and had developed several projects with it or led their development in the CTO role. On the other hand, I am still less than 20% as skilled in Python as Marcin (RhodeCode’s creator & CTO) is, and that started to become a problem as I dove into making RhodeCode a commercial success.

Python Packaging is Confusing

The main issue for me was not RhodeCode itself; it was that I could not even install it, compile dependencies, add packages and adjust settings reliably under Linux! Not to mention my desperate attempts to install RhodeCode under Windows: in the best case an adventure, in reality a nightmare. If I, the not-so-technically-unskilled CEO of the company, fail to install our own product, how should a busy new corporate user with perhaps no Python skills at all get started with RhodeCode Enterprise? And then even pay for it?!

As I dug deeper, I learned that the issues were even more complex. When I started with Python I was confused by all the different packaging options and tools such as PyPI, Pip, setuptools, virtualenv, bdist, eggs, and easy_install. Stack Overflow was full of cries for help, and it seemed to me that the whole Python packaging landscape was in flux: some tools had become deprecated, some seemed to have merged, and others were just too hard for me to learn within a reasonable amount of time.

Marcin and I decided that we needed dramatic improvements in the stability, simplicity and performance of the RhodeCode installation and upgrade process on all major platforms, and so I started the RhodeCode Installer project. It iteratively tackled the Python-related installation and upgrade issues of RhodeCode.

PyPI Availability

Until August 2013, RhodeCode was distributed purely as an installable package on PyPI, the Python Package Index. PyPI is, in the end, a bunch of servers to which you upload a Python package/application.

I think PyPI is run as a non-profit, and maybe that is the reason why, in my experience, the PyPI servers were unavailable at least once a day (I often ran into connection issues, so it may have been more often). This is not great if you are in the middle of an installation and some of the package downloads just fail. Additionally, PyPI had some SSL issues and other technical problems. Marcin fixed the availability issues by moving away from the official PyPI server towards hosting our own PyPI mirror server, pyramidpypi. This mirror held cached versions of all the necessary dependencies of RhodeCode Enterprise, and from then on the Installer downloaded from there more reliably.

Pip & Sandboxing

PyPI is accompanied by the tool Pip which can be used to download, compile and install packages which are hosted on PyPI, a mirror server or the file system.

Here are 3 example Pip calls:

 pip install SomePackage==1.0
 pip install --upgrade SomePackage
 pip uninstall SomePackage

Looks easy, right? Unfortunately, that process has some major flaws which you only run into when you have many thousands of new and existing users who express their frustration about the installation/upgrade in a lot of support tickets. The second most common installation issue our users ran into was incompatible Pip versions on their systems.

In the past, RhodeCode was installed into a Python application sandbox by a tool called Virtualenv, which put all necessary packages, including Pip itself, into a subfolder of the project. As more and more RhodeCode users started to upgrade their installations, they ran into odd compatibility issues with Pip, and to me, as an outsider, it felt as if Pip changed too much with every new release (1.2, 1.3, 1.4). We also had issues with old Virtualenv versions, and in general our users often ended up with broken or totally mangled sandboxes after upgrading their RhodeCode installations.

We fixed these compatibility issues in a drastic way: we delete the whole application sandbox!

Starting with RhodeCode Installer version 0.6.0 we are doing the following steps on each(!) installation or upgrade of RhodeCode Enterprise:

  1. delete the existing virtualenv application sandbox
  2. download a fully tested pip and virtualenv installer from our own servers
  3. create a new sandbox
  4. download RhodeCode and all dependencies from our PyPI mirror server
  5. compile and install all dependencies for the platform
  6. optionally run database migrations, etc.
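
The first steps of this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual RhodeCode Installer code: the function name `rebuild_sandbox`, its parameters and the mirror URL in the usage note are hypothetical, and the sketch assumes that virtualenv is already importable by the running interpreter (step 2, the download of a tested pip and virtualenv, is left out). The `run` parameter exists only so that the commands can be inspected without being executed.

```python
import os
import shutil
import subprocess
import sys


def rebuild_sandbox(sandbox_dir, index_url, packages, run=subprocess.check_call):
    """Throw away the old sandbox and install everything again from a mirror."""
    # Step 1: delete the existing sandbox, including its own pip and packages.
    if os.path.isdir(sandbox_dir):
        shutil.rmtree(sandbox_dir)

    # Step 3: create a new, empty sandbox (assumes virtualenv is installed).
    run([sys.executable, "-m", "virtualenv", sandbox_dir])

    # Steps 4 and 5: let the sandbox's own pip download, compile and install
    # the packages from the mirror instead of the official PyPI server.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    pip = os.path.join(sandbox_dir, bin_dir, "pip")
    run([pip, "install", "--index-url", index_url] + list(packages))
```

A call could look like `rebuild_sandbox("/opt/rhodecode/venv", "https://mirror.example.invalid/simple", ["rhodecode"])`, where both the path and the URL are placeholders. Because the sandbox is always rebuilt from nothing, it does not matter which Pip or Virtualenv version created the old one.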

Since we are now removing any old, incompatible version of Pip or Virtualenv, we no longer run into these issues. In effect, every upgrade of RhodeCode Enterprise is now treated like a fresh installation of all packages; only the database migration is done in addition.

Introducing Wheel

By introducing the 2 major changes above, we had already dramatically improved the stability and simplicity of the installation of RhodeCode Enterprise. The RhodeCode Installer automates all necessary steps and hides all the complexity behind them. But we were still not fully satisfied, since the installation still sometimes failed due to undocumented changes in 3rd party packages. Also, the installation under Linux still took 12 to 15 minutes, which was unacceptable in my opinion.

What we needed was a better way of packaging and shipping fixed package versions that we really knew RhodeCode Enterprise would work with. In the meantime, I had spent a lot of time improving our Windows installation (see below) and had become spoiled by the speed and reliability of our new way of installing under Windows. For Linux, I wanted something similar: I wanted to remove the need to download uncompiled packages and then let users wait 15 minutes until everything was compiled on their platform.

Enter wheels.

With RhodeCode Installer 0.7.0 we are introducing the installation of packages by using pre-compiled packages called wheels. Wheel is a relatively new project in the Python world which makes the distribution of pre-compiled packages far, far easier.

We learned over the last months that the vast majority of our users use either Windows or Linux, with Linux running on 32-bit and 64-bit servers that have Python 2.6 or Python 2.7 installed. For Windows we already have an even better solution than wheels, so we "just" needed to pre-compile/build all required packages on the 4 Linux platform combinations.
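
To make the 4 combinations concrete, here is a small sketch of how an installer could detect which combination it is running on. The function name `platform_key` and the key format are made up for this illustration; the post does not say how the Installer names or selects its wheel bundles.

```python
import struct
import sys

# The 4 Linux combinations named above, in a hypothetical key format.
SUPPORTED = {"py26-32bit", "py26-64bit", "py27-32bit", "py27-64bit"}


def platform_key(version_info=None, pointer_bits=None):
    """Return a key such as 'py27-64bit' for the given or running interpreter."""
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # The size of a C pointer tells us whether this Python is 32 or 64 bit.
        pointer_bits = struct.calcsize("P") * 8
    return "py%d%d-%dbit" % (version_info[0], version_info[1], pointer_bits)
```

An installer could check `platform_key() in SUPPORTED` and fall back to compiling from source when no pre-built bundle matches.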

Starting with RhodeCode Installer version 0.7.0 we are doing the following steps on each installation or upgrade of RhodeCode Enterprise:

  1. delete the existing virtualenv application sandbox
  2. download a fully tested pip and virtualenv installer from our own servers
  3. create a new sandbox
  4. download RhodeCode and all dependencies as pre-compiled packages (wheels) from our server
  5. install all packages
  6. optionally run database migrations, etc.

By removing the need for compiling on most platforms we could reduce the whole installation process from 15 minutes to under 1 minute, which is a time saving of more than 90%! Just as important, the installation on these platforms now works reliably, since every bundle was fully tested by us. Optionally you can still install RhodeCode Enterprise in build mode by running the Installer with the "--build" flag.

But let's dive more into the details of wheels. Here is the way we are bundling our package. We are running this (or a similar) procedure on each of the 4 platform combinations:

 # example of Wheels creation for 32bit Python 2.7 platform
 # ---------------------------------------------------------

 pip --version  # upgrade if necessary
 pip install wheel

 # install RhodeCode Enterprise to get all compile dependencies (break after apt-get calls)
 mkdir ~/rhodecode
 cd ~/rhodecode
 curl -O https://rhodecode.com/dl/rhodecode-installer.py
 sudo python rhodecode-installer.py

 # create pre-compiled wheels of all stable online packages
 sudo pip wheel --wheel-dir=/tmp/wheels_rce psycopg2 MySQL-python python-ldap
 sudo pip wheel --wheel-dir=/tmp/wheels_rce https://path_to_our_stable_source_code
 sudo pip wheel --wheel-dir=/tmp/wheels_rce https://path_to_our_stable_tools

 # download the wheels into the shared wheels_rce_linux folder for all platforms
 scp root@1.2.3.4:/tmp/wheels_rce/* ~/wheels_rce

 # create a zip from the whole folder and upload it to our server
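
The last step, zipping the collected wheels, could be scripted roughly as follows. This is a sketch and not our actual build script: the function name `bundle_wheels` and the archive name in the usage note are assumptions, and the upload to the server is deliberately left out.

```python
import os
import shutil


def bundle_wheels(wheel_dir, archive_base):
    """Zip every file in wheel_dir and return the path of the new archive."""
    # make_archive appends ".zip" to archive_base; root_dir makes the paths
    # inside the archive relative to the wheel folder itself.
    return shutil.make_archive(archive_base, "zip", root_dir=wheel_dir)
```

For example, `bundle_wheels(os.path.expanduser("~/wheels_rce"), "/tmp/wheels_rce_linux")` would produce `/tmp/wheels_rce_linux.zip`. The archive should be written outside the wheel folder so that it does not end up inside itself.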

And this is what the RhodeCode Installer is running to install the wheels after having downloaded and unzipped them and after having created a sandbox with a fully tested Pip with added wheels support:

 PATH_TO_SANDBOX/pip install --use-wheel --no-index --find-links=/tmp/wheels_rce rhodecode psycopg2 MySQL-python python-ldap rhodecode-tools

This line installs all required packages for the current platform into the RhodeCode Enterprise application sandbox without any need for compiling!
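
For readers who drive Pip from their own installer script, the same call can be assembled in Python. This is a sketch only: the function name `wheel_install_command` and the `use_wheel_flag` switch are illustrative and not part of the RhodeCode Installer. The switch is there because newer Pip releases use wheels by default, so `--use-wheel` is only needed on the Pip versions of this era.

```python
import os


def wheel_install_command(sandbox_dir, wheel_dir, packages, use_wheel_flag=True):
    """Build the pip call that installs packages from a local wheel folder only."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    cmd = [os.path.join(sandbox_dir, bin_dir, "pip"), "install"]
    if use_wheel_flag:
        cmd.append("--use-wheel")
    # --no-index stops pip from contacting any package index, and
    # --find-links points it at the folder with the unzipped wheels.
    cmd += ["--no-index", "--find-links=" + wheel_dir]
    return cmd + list(packages)
```

The returned list can be passed directly to `subprocess.check_call`. Since `--no-index` is set, the installation fails loudly if a required wheel is missing from the folder instead of silently downloading and compiling a source package.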

Windows

The installation under Windows was a tedious, error-prone task and required a lot of patience. The user needed to download and install Visual Studio 2008, compile all dependencies, install many 3rd party packages by hand and then wire everything together to make RhodeCode run.

We completely re-thought this installation process and switched to a fully bundled offline installation. We started to use Innosetup, which creates the typical Windows installation wizards, to distribute all requirements and packages, pre-compiled for Windows, in a single bundle. Innosetup even lets us adjust the Windows PATH environment variable, which is necessary for Python to run properly.

Thanks to Innosetup and our pre-compiling we could reduce the 30 minute installation to a 1 minute installation where you just click "Next" 5 times to install all dependencies! RhodeCode Enterprise itself is included in the bundle, and the Installer is only needed to initially set up the database.

Summary

Packaging and reliably distributing a complex Python application to a multi-platform audience is tricky, but with the help of our own PyPI server, fixed versions of Pip and Virtualenv, Python wheels, an Installer which hides the complexity, and Innosetup bundles under Windows, we achieved a far more stable, simple and fast installation and upgrade of the Python application RhodeCode Enterprise.

Thanks to everyone who sent us Installer feature requests and issue reports for certain platforms, stacks and network settings. We will continue to improve the Installer and already have some new ideas for making it faster and even more convenient.

Please keep on sending us your feedback and thanks again for your ongoing support!

Sebastian