- An Introduction to the Virtualenv Sandbox
- Distutils: Packaging, Metadata and Pushups
- Roadmap of Topics
- What is the Problem?
- A Bit of History
- What is a Distribution?
- What is a Distribution? (cont'd)
- Distribution: Source versus Binary
- distutils: The Intended Audience
- distutils: Command-Line Invocation
- distutils: Usage Defaults Configuration
- Invocation: Installing from Source
- Invocation: Building a Binary Distribution
- Invocation: Building a Binary (formats)
- Invocation: Building a Source Distribution
- Applying distutils to your software
- Project Directory Layout - Single File
- Project Directory Layout - Package
- Project Directory Layout - Multi-Package
- setup.py - Basic Metadata
- setup.py - Source Module(s)
- setup.py - Extension Module(s)
- setup.py - Executable Scripts
- setup.py - Package-Relative Datafiles
- setup.py - Arbitrary Datafiles
- setup.py - Extra Metadata (ownership)
- setup.py - Extra Metadata (descriptive)
- setup.py - Extra Metadata (classifiers)
- setup.py - Expressing Dependencies
- Sources and Uses of Metadata
- MANIFEST for Source Distributions
- About Package Servers in General
- Package Servers: About The Cheeseshop
- Package Servers: Posting Metadata
- Package Servers: Stashing Credentials
- Package Servers: Posting Distributions
- Features Missing from Distutils
- For More Information
- Buildout: Precision Assembly, Repeatability, Islands
- Roadmap to Talk
- The Benefits of Buildout
- What is Buildout?
- How does Buildout Work?
- The Concepts of Buildout
- Concepts: What is a Specification?
- Concepts: What is a Part?
- Concepts: What is a Recipe?
- Getting Started with Buildout
- Getting Started: Installing Buildout
- Getting Started: Project Directory Structure
- Getting Started: Project Directory Structure (cont'd)
- Getting Started: Specification File (syntax)
- Getting Started: Specification File (references)
- Getting Started: Specification File (includes)
- Getting Started: Controlling Versions
- Getting Started: About Caches
- Getting Started: How Distributions Are Found
- Getting Started: The Command-Line
- Getting Started: Configuration of Defaults
- About Recipes
- Recipes: Egg Installation and Scripts
- Recipes: Customizing the Building of Eggs
- Recipes: Generating Scripts for Eggs
- Recipes: How Eggs Are Activated Within Scripts
- Recipes: Baking Paths into Python Interpreter Scripts
- Recipes: Pulling Subversion Files into Buildout
- Recipes: Using a Non-Egg Archive of Python Source
- Use Case: Starting a New Project
- Use Case: Picking Up a Buildout Project
- Use Case: Buildout Around Other Projects
- Use Case: Making Your Project Buildout-Aware
- Use Case: Distributing Project as Egg
- Use Case: Production Deployment (non-RPMs)
- Use Case: Production Deployment (RPMs)
- For More Information
An Introduction to the Virtualenv Sandbox
| Author: | Jeff Rush <jeff@taupro.com> |
|---|---|
| Copyright: | 2008 Tau Productions Inc. |
| License: | Creative Commons Attribution-ShareAlike 3.0 |
| Date: | March 13, 2008 |
| Series: | Python Eggs and Buildout Deployment - PyCon 2008 Chicago |
A short intro to sandboxing your Python development work using the virtualenv tool and, as a prerequisite, the steps for setting up EasyInstall to download it.

Installation Concerns
In this class we'll use virtualenv to create sandboxes for our exercises, to avoid disrupting the system installation of Python.
To simplify the installation of virtualenv, we'll first install EasyInstall, bundled with setuptools, which I'll cover in much more depth in the section on eggs.
You need a relatively modern version of Python, and access to the Internet for retrieving the necessary files. Because we'll be installing this system-wide (i.e. the sandboxing tools cannot themselves be inside the sandbox), you also need administrator privileges on your system.
These two packages, EasyInstall and Virtualenv, are installed into the system site-packages directory of the Python instance with which you invoke them. This is true of a lot of tools we'll use today.
For those with multiple versions of Python on their system, the tools are installed with a suffix naming the version of Python used, to distinguish between them.
About Virtualenv
Written by Ian Bicking, virtualenv allows you to set up an isolated Python environment whose libraries do not affect programs outside it, making it a good choice for experimenting with new packages or for deploying different programs with conflicting library requirements.
To maintain the isolation, Python programs must be run from the "bin/" subdirectory ("Scripts/" under Windows).
Normally your virtualenv sandbox is isolated from your system Python, but requests for modules not found within the sandbox flow through to the system Python. This means if you install extra software into the system Python it will automatically become available in all virtualenv sandboxes.
There is an option, --no-site-packages, that changes this behavior to exclude the system Python from all failing searches, for stronger isolation at the expense of having to explicitly install more dependencies.
Steps: Installing EasyInstall then Virtualenv
Using your favorite tool, download ez_setup.py into a temporary directory and run it. This will download and install the appropriate setuptools egg for your Python version, and create a new system command easy_install.
Note to Windows users: do NOT put ez_setup.py inside your Python installation. Use a temporary directory elsewhere.
The "sudo" command is the Unix way of running the "easy_install" command with administrator privileges.
Creating and Using Sandboxes
To create a sandbox, run the virtualenv command and pass to it the pathname of a new directory in which you want it to reside.
If you want stronger isolation from the system Python, use the --no-site-packages option to omit the system packages from the search path for your sandbox.
To run within the sandbox, run the Python interpreter in the bin directory. If you prefer to avoid typing the "bin/" prefix, virtualenv provides the "activate" command to rewrite the search paths for your shell session. The command "deactivate" reverses these changes.
Note: the "activate" command for virtualenv has nothing to do with the concept of activating or de-activating a Python egg, which means placing it onto the sys.path using a .pth file.
Note: On Windows you shouldn't create a virtualenv sandbox in a path that contains a space.
Directory Layout of a Sandbox
Note: On Windows, the directory names are slightly different, with "Scripts/" being used for "bin/", and "Lib/" being used for "lib/".
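As a rough sketch of the naming difference just described, the helper below computes where the script and library directories sit for a given sandbox root. The function name and the example paths are hypothetical and not part of virtualenv, and the Python-version subdirectory that normally sits under the library directory is left out.

```python
import ntpath
import posixpath


def sandbox_dirs(root, windows=False):
    """Return the script and library directories of a sandbox at `root`.

    Hypothetical helper for illustration: it encodes only the naming
    difference noted above ("Scripts/" and "Lib/" on Windows).
    """
    path = ntpath if windows else posixpath
    scripts = "Scripts" if windows else "bin"
    lib = "Lib" if windows else "lib"
    return {"scripts": path.join(root, scripts), "lib": path.join(root, lib)}


print(sandbox_dirs("/home/me/sandbox"))
print(sandbox_dirs(r"C:\work\sandbox", windows=True))
```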
Warning: Virtualenv is currently incompatible with a system-wide distutils.cfg and per-user ~/.pydistutils.cfg. If you have either of these files, virtualenv will put the easy_install command into the bin/ directory specified in that config file, rather than into the sandbox where it belongs.
Distutils: Packaging, Metadata and Pushups
| Author: | Jeff Rush <jeff@taupro.com> |
|---|---|
| Copyright: | 2008 Tau Productions Inc. |
| License: | Creative Commons Attribution-ShareAlike 3.0 |
| Date: | March 13, 2008 |
| Series: | Python Eggs and Buildout Deployment - PyCon 2008 Chicago |
An introduction to the distutils module that provides a standard way of building, distributing and installing one or a group of Python modules, across platforms. distutils has shipped as part of the Python standard library since version 1.6.

What is the Problem?
We want a mechanism standardized within the Python community for building, packaging, distributing, and installing one or more modules that may consist of Python source, compiled C source and bundled data files.
A solution should do all this in a manner that works across operating system platforms, plays nicely with existing packaging technologies, and provides for an easily extensible set of distribution file formats and special processing commands.
A Bit of History
Scattered work on a solution to meet these requirements had been underway for quite a few years but in 1998 started to come together under Greg Ward at IPC7. This led to a version of distutils shipping with Python 1.6.
There is an old version linked to from the distutils-sig page but it is out-of-date. The official version is that which ships with Python.
The setuptools module that is the basis for eggs and the buildout deployment tool both make heavy use of distutils and leverage its concepts.
What is a Distribution?
distutils is a set of tools for working with distributions, so let's define what one is.
A "distribution" is a single downloadable resource made up of one or more modules, each of which may be a single .py file, a directory of modules organized into a Python package, or a C/C++ extension. These modules may be independent or unrelated sibling packages - no relationship is assumed.
It may contain executable Python scripts which get installed as additional system commands, depending upon the operating system.
And it may contain data files, such as icons or reference material.
All of these are intended to be installed or uninstalled together.
Note: Collections of unrelated modules are given their own directory by distutils, and a path configuration (.pth) file to add it to sys.path at run-time.
What is a Distribution? (cont'd)
A distribution always has a project name, version number and a controlling setup.py file in the root of the distribution source tree.
Internally a distribution may be pure or platform-neutral, indicating it can be used unchanged across operating systems. If it has at least one C/C++ extension module, it is considered non-pure.
distutils generates distributions in two flavors, source distributions for sharing with other developers, and binary distributions for non-developers.
A binary distribution is an actual or potential sys.path entry, along with its metadata. Distributions can be activated or deactivated, by being placed onto sys.path or not, usually by use of .pth files. We'll cover this mechanism in a lot more detail.
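To make the activation idea concrete, here is a small sketch using the standard library's site module, which is the part of Python that processes .pth files. The directory and file names are invented for the demo, and a throwaway temporary directory stands in for site-packages.

```python
import os
import site
import sys
import tempfile

# Build a throwaway "site" directory holding one distribution directory
# and a .pth file that names it (all names here are made up).
site_dir = tempfile.mkdtemp()
dist_dir = os.path.join(site_dir, "demo_dist")
os.mkdir(dist_dir)
with open(os.path.join(site_dir, "demo.pth"), "w") as pth:
    pth.write("demo_dist\n")   # one path per line, relative to site_dir

# addsitedir() appends site_dir to sys.path and processes its .pth files,
# which is what "activates" demo_dist.
site.addsitedir(site_dir)
activated = os.path.abspath(dist_dir) in sys.path
print(activated)
```

Deleting the line from the .pth file and starting a fresh interpreter would leave demo_dist on disk but off sys.path, which is the deactivated state.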
Distribution: Source versus Binary
Binary distributions are intended for installation into environments that lack development tools such as a C/C++ compiler, and usually come in the form of existing package technologies, such as RPMs, .debs, .msi, .dpg.
Besides its project name and version, a binary distribution is also identified by the Python version used to build it, because binary APIs differ between Python versions.
Binary distributions by their nature get "installed"; they are past the "build" and "package" stages.
Metadata about a distribution's characteristics, ownership and dependencies is retained past the build process into the installation itself, so that other packages that depend upon it can know of its presence and version.
Source distributions are targeted at developers who will typically have development tools. While pure distributions won't require them, non-pure ones will.
Source distributions are basically some form of compressed archive containing all the necessary source to build and install the project.
Being not yet compiled, they can be identified solely by the project name and version, and are fed into the "build" stages to get either installed or packaged.
They are a source of metadata about distributions, which originates from keywords passed to the setup() function in setup.py and, for RPMs, from entries in the setup.cfg file.
distutils: The Intended Audience
distutils is intended to be used by several different audiences.
The non-developer who just wants to install some software, either on an individual desktop or perhaps corporate-wide using automated tools.
A packager who collects and organizes useful software, builds it for specific environments and makes it available in repositories.
The original developer who wants to make his work available in as easy-to-use a form as possible.
An installer can install a binary distribution directly (for formats such as RPMs, distutils is out of the picture), or can act as packager by building and then installing from a source distribution, which lets him control where it gets installed. He may also require administrator privileges to install into system-wide locations.
A developer writes the setup.py script, which supplies the options that the installer cannot know: distribution metadata, which modules and extensions are present in the distribution, and where they go in the space of Python modules.
distutils: Command-Line Invocation
distutils makes its appearance in a project via a customized Python source file named setup.py placed in the project root directory.
A minimal such file looks like this. To interactively follow along with me, you may want to create such a file if you don't have one handy from an existing project.
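A minimal sketch of such a file, assuming a single-module project named hello at version 1.0 (both placeholders), might look like the following. The keyword arguments are held in a dict and the setup() call is wrapped in a function so the metadata can be inspected without starting a build; a real setup.py would usually just call setup() at module level.

```python
# setup.py: minimal sketch; "hello" and "1.0" are placeholder values.
SETUP_ARGS = dict(
    name="hello",            # project name
    version="1.0",           # version number
    py_modules=["hello"],    # ships the single module hello.py
)


def main():
    # distutils left the standard library in Python 3.12; setuptools
    # offers a compatible setup() there.
    try:
        from distutils.core import setup
    except ImportError:
        from setuptools import setup
    setup(**SETUP_ARGS)

# A real setup.py ends by calling main() (or calls setup() directly),
# so that "python setup.py <command>" has something to run.
```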
Invocation of distutils is command-line driven as follows. You change into the directory of the project and pass setup.py explicitly to the Python interpreter.
Some of the key global options, their meaning pretty much self-explanatory, are --verbose, --quiet, --dry-run, and --help.
distutils accepts one or more commands per invocation, each with a variable number of arguments. The boundaries are found by matching arguments against commands in the mapping.
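The boundary rule can be pictured with a simplified sketch. This is an illustration only, not the parser distutils actually uses: it starts a new group whenever an argument matches a known command name, and the set of command names is assumed for the demo.

```python
def split_commands(args, known_commands):
    """Group command-line arguments as [(command, [its arguments]), ...].

    Simplified illustration: a new group starts whenever an argument
    matches a known command name. Global options that precede the first
    command are returned separately.
    """
    global_opts, groups = [], []
    for arg in args:
        if arg in known_commands:
            groups.append((arg, []))
        elif groups:
            groups[-1][1].append(arg)
        else:
            global_opts.append(arg)
    return global_opts, groups


# Arguments as in: python setup.py --quiet build --force install --prefix=/tmp/demo
known = {"build", "install", "sdist", "bdist"}
print(split_commands(
    ["--quiet", "build", "--force", "install", "--prefix=/tmp/demo"], known))
```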
To obtain a list of defined commands, use the --help-commands option. This list can vary since distutils supports extension through addition of new commands.
Each command has its own help which can be accessed like this.
To get internal processing details while distutils is working, set the DISTUTILS_DEBUG environment variable (for example, DISTUTILS_DEBUG=yes) and invoke setup.py. Any error will now give you a traceback telling you more about the who and why.
distutils: Usage Defaults Configuration
distutils reads configuration related to command invocation from up to three optional locations, in the following order, with the last options read taking precedence.
system-wide: a distutils.cfg file within the distutils module directory; no such file is provided by default
per-user: a .pydistutils.cfg (POSIX) or pydistutils.cfg (non-POSIX) file in the user's $HOME directory
per-project: a setup.cfg file in the same directory as the setup.py file
The format of configuration files is that accepted by the ConfigParser module, with named sections and name = value assignments. Besides default options both globally and for particular commands, there are powerful things you can do such as creating aliases to sets of commands and options.
The command to build an RPM, the bdist_rpm command, needs additional metadata that is specific to RPMs. The setup.cfg file is used in this case to supply that data:
    distribution-name = Red Hat Linux
    group = Development/Libraries
    etc.
You can provide default arguments in your config file.
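The read order and the file format can be sketched with the configparser module (the Python 3 name for ConfigParser). The three strings below stand in for the system-wide, per-user and per-project files; the install prefix values are invented, and merging everything into one parser is a simplification used here only to show that later files win.

```python
import configparser

# Contents of the three optional files, as strings for the demo.
# The prefix values are invented for illustration.
SYSTEM_CFG = "[install]\nprefix = /usr/local\n"
USER_CFG = "[install]\nprefix = /home/me/python\n"
PROJECT_CFG = (
    "[bdist_rpm]\n"
    "distribution-name = Red Hat Linux\n"
    "group = Development/Libraries\n"
)

config = configparser.ConfigParser()
for text in (SYSTEM_CFG, USER_CFG, PROJECT_CFG):   # read order = precedence
    config.read_string(text)

print(config["install"]["prefix"])        # the per-user value wins
print(dict(config["bdist_rpm"]))
```

Each section is named after the command whose options it supplies, so the [install] values act as defaults for the install command and [bdist_rpm] carries the RPM-specific metadata.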
Invocation: Installing from Source
Invocation: Building a Binary Distribution
Invocation: Building a Source Distribution
The sdist command extracts metadata and writes it to a file named PKG-INFO in the top directory of the generated zipfile or tarball. This file is a single set of RFC822 headers parseable by the rfc822.py module; its format is defined in PEP 241 (http://www.python.org/dev/peps/pep-0241/).
This PKG-INFO file is what gets POST'd to the index server.
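As a sketch of how such a file can be read, the example below parses a made-up PKG-INFO. The rfc822 module no longer exists in Python 3, so email.parser from the standard library is used in its place; the field values are placeholders, while the field names follow PEP 241.

```python
from email.parser import Parser

# A made-up PKG-INFO for illustration; the field names follow PEP 241.
PKG_INFO = """\
Metadata-Version: 1.0
Name: hello
Version: 1.0
Summary: A placeholder project
Author: A. Developer
License: MIT
"""

headers = Parser().parsestr(PKG_INFO)
print(headers["Name"], headers["Version"])
print(sorted(headers.keys()))
```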
setup.py - Basic Metadata
The setup.py file is just Python source that invokes a single function, setup(), with a variety of keyword arguments. This allows you to bring to bear on the configuration problem the full power of Python, without having to learn another configuration-specific language.
The setup.py file should not be marked executable as it is generally invoked with explicit reference to a Python interpreter. This is because the binary formats generated and the directory locations written to are specific to that instance of the interpreter.
In general it is best to obtain the version of your distribution from within your source code, to avoid duplication of information.
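One way to do this, sketched under the assumption that the module stores its version in a __version__ string (a common convention, not something distutils requires), is to read the source text and extract the value rather than importing the module. The helper name and the sample source are hypothetical.

```python
import re

# Stand-in for the text of your module, e.g. open("hello.py").read().
MODULE_SOURCE = '''
"""A placeholder module."""
__version__ = "1.2.3"
'''


def find_version(source):
    """Return the string assigned to __version__ in `source`."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', source, re.M)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


print(find_version(MODULE_SOURCE))
```

The result can then be passed as the version keyword to setup(), so the number lives in exactly one place.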
setup.py - Source Module(s)
The "py_modules" keyword provides a list of modules, specified NOT by filename but module name.
Such names are relative to the setup.py file itself. To locate modules within non-package subdirectories, use the "package_dir" keyword, which is a mapping of package names to directory paths. These paths are written in the Unix convention, i.e. slash-separated. An empty string as the package name stands for the root package of all Python packages.
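A sketch of the relevant keyword arguments, assuming a hypothetical project that keeps all of its code under a src/ directory. Every module, package and directory name here is a placeholder.

```python
# Keyword arguments for setup(); every name and path is a placeholder.
SOURCE_ARGS = dict(
    py_modules=["hello"],                # src/hello.py, named as imported
    packages=["app", "app.plugins"],     # src/app/ and src/app/plugins/
    package_dir={"": "src"},             # "" is the root package; slash-separated
)

print(sorted(SOURCE_ARGS))
```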
All packages must be explicitly listed; distutils will not recursively scan your source tree or package hierarchy looking for any directory with an __init__.py file. However, the setuptools module provides a find_packages() function that does.
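To show what such a recursive scan involves, here is a rough stand-in written with os.walk. It is not the setuptools implementation, and the function name and the demo directory tree are invented.

```python
import os
import tempfile


def discover_packages(root):
    """List dotted names of directories under `root` holding __init__.py.

    A rough stand-in for what a recursive scan does; it is not the
    setuptools implementation.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue
        if "__init__.py" not in filenames:
            dirnames[:] = []      # not a package: do not descend further
            continue
        rel = os.path.relpath(dirpath, root)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


# Demo tree: app/ and app/plugins/ are packages, docs/ is not.
root = tempfile.mkdtemp()
for sub in ("app", os.path.join("app", "plugins"), "docs"):
    os.makedirs(os.path.join(root, sub))
for pkg in ("app", os.path.join("app", "plugins")):
    open(os.path.join(root, pkg, "__init__.py"), "w").close()

print(discover_packages(root))
```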
setup.py - Extension Module(s)
setup.py - Executable Scripts
Scripts are files containing Python source code, intended to be started from the command line. The "scripts=" keyword specifies a list of paths to the scripts.
These scripts get installed in the system command area and, because they are not renamed by version, may conflict if other distributions use the same name. This makes it difficult to install multiple versions of a distribution.
distutils takes care of marking the scripts executable for POSIX.
If the first line of the script starts with #! and contains the word "python", distutils will adjust the first line to refer to the current interpreter location. This value can be overridden with the --executable option to the setup.py invocation.
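The first-line rule can be illustrated with a short sketch. This is not distutils' own code: the function name is hypothetical, and unlike a careful implementation it does not preserve interpreter options that follow the interpreter path.

```python
import sys


def adjust_first_line(script_text, executable=sys.executable):
    """Point a Python script's #! line at `executable`.

    Illustration of the rule described above, not distutils' own code:
    only a first line that starts with "#!" and mentions "python" is
    rewritten; anything else is returned unchanged. Interpreter options
    on the original line are dropped in this simplified version.
    """
    first, sep, rest = script_text.partition("\n")
    if first.startswith("#!") and "python" in first:
        first = "#!" + executable
    return first + sep + rest


print(adjust_first_line("#!/usr/bin/env python\nprint('hi')\n",
                        executable="/opt/py/bin/python"))
```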
setup.py - Package-Relative Datafiles
The package_data keyword is a mapping from package name to a list of pathnames (relative to the package) of files to copy into the package.
The path names may contain directory portions; any necessary directories will be created in the installation.
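A sketch of the keyword in use, with a hypothetical package named app and invented file names:

```python
# Keyword arguments for setup(); package and file names are placeholders.
DATA_ARGS = dict(
    packages=["app"],
    package_data={
        # Paths are relative to the package directory and may include
        # subdirectories, which are created on installation.
        "app": ["templates/index.html", "icons/logo.png"],
    },
)

print(DATA_ARGS["package_data"]["app"])
```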
setup.py - Arbitrary Datafiles
The data_files keyword is for placement of datafiles unrelated to any specific Python package.
You can specify any destination directory for a file, but no mechanism is provided to rename it.
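A sketch of the keyword's shape, using invented paths: data_files takes a list of (destination directory, list of source files) pairs, and a relative destination is interpreted relative to the installation prefix.

```python
# Keyword arguments for setup(); all paths are placeholders.
DATA_FILES_ARGS = dict(
    data_files=[
        # (destination directory, [source files]); file names are kept as-is
        ("share/hello/docs", ["docs/manual.txt"]),
        ("etc", ["conf/hello.conf"]),
    ],
)

for destination, sources in DATA_FILES_ARGS["data_files"]:
    print(destination, sources)
```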
setup.py - Extra Metadata (descriptive)
setup.py - Extra Metadata (classifiers)
The classifiers keyword is a list of strings, representing official tags used by the Cheeseshop, derived from the trove concept of discrimination. This list of classification values has been merged from FreshMeat and SourceForge (with their permission).
The official list at any time can be retrieved from the Cheeseshop with the --list-classifiers option to the register command.
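A sketch of the keyword in use. The four strings are taken from the published classifier list, but the particular selection is arbitrary and stands in for whatever applies to your own project.

```python
# Keyword arguments for setup(); the selection of tags is illustrative.
CLASSIFIER_ARGS = dict(
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Programming Language :: Python",
    ],
)

# Each tag is a path through the classification tree, split on " :: ".
for tag in CLASSIFIER_ARGS["classifiers"]:
    print(tag.split(" :: "))
```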
About Package Servers in General
To make the existence of distributions visible to others in an automated form, suitable for dependency resolution, there are package servers.
For Perl and PHP, respectively, there are the CPAN and PEAR package servers. The one for Python is named the Cheeseshop or sometimes the Package Index (PyPI).
A package server may serve one or more kinds of information:
An index server holds records containing metadata about many different distributions, including a URL where the actual distribution files may be found. Index servers are cross-project in nature.
An upload server receives distribution files, both source and binary, along with metadata, from developers and makes them available for download from a centralized place.
A link server holds HTML about a specific or related set of projects, which has links sprinkled on it that point to actual distribution files. Link servers are usually browseable project sites, and are used in connection with buildout.
Package Servers: About The Cheeseshop
The PyPI server is both an index server and an upload server. It is at the developer's discretion whether to keep actual distributions on it or just rely upon the URL in the metadata to point to a project website.
distutils only knows how to push up to PyPI, both metadata with the "register" command and distributions (source and binary) with the "upload" command. Later we'll see how setuptools and buildout add the capability to pull from PyPI.
User accounts on PyPI can be obtained by visiting the site in a browser, or by pushing up metadata for a project and being prompted by setup.py.
The source to PyPI is available for running your own, such as behind a corporate firewall.
Package Servers: Posting Metadata
Here we see use of the register command for pushing metadata for a distribution up to a package server, by default PyPI.
Sending data to a package server requires a username and password. distutils will prompt you to use an existing login, to create a new one, or to reset your password and have PyPI email a new, random one to you.
Within PyPI, entries are uniquely identified by the (projectname, version) pair.
A package server such as PyPI, besides checking the correctness of metadata, enforces a minimum set of fields.
Package Servers: Stashing Credentials
Upon completion, the register command asks if you want to save the username and password entered into a local configuration file.
This file is NOT one of the distutils configuration files but one specific to PyPI.
Or you can place the information in the file yourself.
Oddly, the "repository" field is only used with the upload command which we cover next, not the register command. To convince register to use a repository other than PyPI, add something like this to one of the distutils configuration files.
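A sketch of both files, parsed here with configparser to show their structure. It assumes the credentials file is ~/.pypirc with a [server-login] section, as used by the distutils of that era; the username, password and repository URL are placeholders, and the URL is hypothetical.

```python
import configparser

# Sketch of a hand-written credentials file; values are placeholders
# and the repository URL is hypothetical.
PYPIRC = """\
[server-login]
username = myname
password = secret
repository = http://pypi.example.com/
"""

# Sketch of the entry for a distutils configuration file (for example
# setup.cfg) that points the register command at the same server.
DISTUTILS_CFG = """\
[register]
repository = http://pypi.example.com/
"""

creds = configparser.ConfigParser()
creds.read_string(PYPIRC)
cfg = configparser.ConfigParser()
cfg.read_string(DISTUTILS_CFG)
print(creds["server-login"]["username"], cfg["register"]["repository"])
```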
Package Servers: Posting Distributions
The "upload" command is used to upload actual distribution files. Since there are many potential distribution flavors and formats, the choice of what to upload is made by the commands given earlier on the same command line.
Uploads can be signed with a GnuPG key by adding the --sign option. There is also an --identity option that supplies a user ID to pass to the GnuPG tools.
To upload a binary distribution instead of or in addition to a source distribution, use the bdist command.
Uploads can also be performed manually by visiting the PyPI website.
Submitting a distribution file automatically submits the metadata.
A new release of a package hides all previous releases with respect to listings and searches. You can manually override this by visiting the PyPI website; the override lasts until the next submission of metadata.
For More Information
the Distutils-SIG and Mailing List
"Distributing Python Modules" (guide for developers)
"Installing Python Modules" (guide for sysadmins)
Distutils Cookbook - Collection of Recipes
Community Wiki for Distutils (links to useful info)
Source to Python Package Index
"Cleaning Up PyBlosxom Using Cheesecake"
Buildout: Precision Assembly, Repeatability, Islands
| Author: | Jeff Rush |
|---|---|
| Copyright: | 2008 Tau Productions Inc. |
| License: | Creative Commons Attribution-ShareAlike 3.0 |
| Date: | March 13, 2008 |
| Series: | Python Eggs and Buildout Deployment - PyCon 2008 Chicago |
A follow-on to the setuptools talk introducing the buildout tool that uses parts specifications to repeatably bring together specific combinations and versions of Python eggs, along with non-Python elements, into controlled islands of development and deployment.

The Benefits of Buildout
In the prior talks on distutils and then setuptools, we focused on creating distributions of reusable modules, more in the sense of libraries. Buildout takes us in a different direction, using those packaging capabilities to bring together sets of distributions into whole applications in a controlled manner.
These applications can be deployed as self-contained source releases and RPMs in ways that facilitate operation by experienced Unix system administrators. Prior to deployment however, buildout is a useful tool in the development phase as well.
The buildout tool is based on the premise that installing distributions into the system instance of Python is, for a developer, a bad thing that leads to conflicts and unknown interactions with packages not under control of buildout. For this reason, buildout relies upon sandboxes or "islands of development", similar to how virtualenv works. In fact it can be used along with virtualenv.
Buildout is based on the idea of engineering blueprints; that an architect can rigorously specify the parts that go into an assembly and construct a product in a repeatable fashion. The word "buildout" comes from the manufacturing industry and refers to a specification of a set of parts and instructions on how to assemble them.
Note that parts could still behave differently due to changes in parts of the environment, such as system libraries, not controlled by the buildout.
Unlike the packaging tools covered previously, buildout encompasses not just Python software but non-Python elements such as configuration sets, multiple programs, Apache instances, database servers and so forth.
As a result of this, it is NOT necessary to eggify your software base to use buildout.
And buildout, while relying upon a package repository such as the Cheeseshop, is also able to function offline, drawing on collections of parts held in a cache directory.
What is Buildout?
buildout was conceived by Jim Fulton of Zope Corporation in 2006 and, while often used with the Zope web framework, is completely independent of it.
It draws from an extensible collection of recipes in driving the assembly process and leverages setuptools and eggs in managing Python packages.
Because of its architectural focus, the audience for buildout is more toward developers than the end-user.
buildout is a coarse-grained build system, differing from fine-grained approaches such as Make, scons and distutils. Those systems focus on individual files and use rules to determine how to compute one from another. Buildout works with larger elements such as applications, configuration files and databases, and uses configuration instead of rules to fit them together.
Rule systems are better used where the sheer number of low-level elements requires taking advantage of regularities to reduce complexity. Configuration systems are better at specifying one-off relationships when you have relatively few high-level elements.
In one sense it is a better Make but works at a higher level than Make, dealing with large components rather than individual files.
buildout is not ideal for informal experimentation, in that it requires explicit specification of the parts used in an application. This is done in a declarative manner, in that the architect says "what to use", not "how to do it". This makes tweaking to control the low-level process difficult.
Part of using buildout is understanding that anything built by buildout is controlled by it. Temporary hacks to created files will be thrown away on the next build. To make a permanent change, it is necessary to update the buildout configuration.
There can be exceptions to this, such as the recipes that manage checkouts. They don't remove checkouts, to avoid losing user data. Similarly, the zc.recipe.filestorage recipe doesn't remove the data directories it creates when it is uninstalled.
How does Buildout Work?
buildout is a tool that, each time it is run, ...
To accomplish this, buildout ...
Concepts: What is a Specification?
A specification is a text file that itemizes the parts that go into an assembly, names the recipes used by the various parts, and provides to those recipes an open-ended form of configuration.
When stored in a version control repository, it can reproduce an exact deployment or development scenario once it is checked out and a build operation is invoked upon it.
The format of a specification corresponds to that accepted by ConfigParser, a standard Python module. A specification can be given in a single such file or factored into multiple ones.
Most commonly, there will be multiple specifications for a project, say one for development, one for testing and another for field deployment.
Specifications do more than just list the parts involved. They can place constraints on acceptable versions and provide details on where to automatically download them from, whether the Cheeseshop or project-specific websites.
If a part is removed from a specification, it is uninstalled from the deployment tree. If a part's recipe or configuration changes, the part is uninstalled and reinstalled.
Concepts: What is a Part?
A part is simply something to be managed by a buildout.
It can be almost anything, such as a Python package, a program, a directory, or even a configuration file.
It has a name unique within the specification and its own directory within which it can scribble anything. These scribbles can be referenced by other parts.
Within buildout, a part is an object with an open-ended set of attributes.
It may be installed, updated and uninstalled, over a series of builds. If a part reference is removed from a specification, upon the next invocation of Buildout, it is uninstalled from the deployment tree. If a part's recipe or configuration changes, the part is uninstalled and reinstalled.
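The install, reinstall and uninstall decisions described above can be pictured as a comparison between the state recorded from the previous build and the current specification. The following is only a simplified sketch of that idea, not buildout's actual implementation; the function name, the dictionary layout and the sample part names are assumptions made for illustration.

```python
def plan_build(installed, wanted):
    """Compare the previously recorded parts with the current specification.

    Both arguments map a part name to a dict of its options, including
    its "recipe". The names and structure here are hypothetical.
    """
    plan = {"uninstall": [], "install": [], "reinstall": [], "unchanged": []}
    # Parts recorded earlier but no longer listed are uninstalled.
    for name in installed:
        if name not in wanted:
            plan["uninstall"].append(name)
    for name, options in wanted.items():
        if name not in installed:
            plan["install"].append(name)
        elif installed[name] != options:
            # A changed recipe or option means uninstall, then install again.
            plan["reinstall"].append(name)
        else:
            plan["unchanged"].append(name)
    return plan


previous = {
    "libfoo": {"recipe": "zc.recipe.cmmi", "url": "http://example.com/libfoo-1.0.tar.gz"},
    "docs": {"recipe": "zc.recipe.egg", "eggs": "docutils"},
}
current = {
    "libfoo": {"recipe": "zc.recipe.cmmi", "url": "http://example.com/libfoo-1.1.tar.gz"},
    "app": {"recipe": "zc.recipe.egg", "eggs": "Pygments"},
}
plan = plan_build(previous, current)
print(plan)
```

In this sample, "docs" has been dropped from the specification, "app" is new, and "libfoo" has a changed option, so each lands in a different bucket of the plan.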
A part can reference other parts within the same specification, accessing their attributes, configuration and private directory.
Each part is defined by a recipe, which contains the logic to manage it, along with some data specific to that part that the recipe uses.
Concepts: What is a Recipe?
buildout itself is constructed out of recipes, which are objects that know how to install, update and uninstall a type of part.
Recipes are themselves eggs, and when one is referenced in a specification, buildout will automatically locate and install the recipe in the buildout environment.
A recipe can contain multiple sub-recipes, accessible as distinct egg entrypoints.
A set of starter recipes ships with the buildout, in the egg named zc.recipe.egg.
The Cheeseshop contains many add-on recipes, if you search for "recipe" in the name or keyword field.
Getting Started: Installing Buildout
The buildout software can be installed system-wide using easy_install, or locally under a project by running the "bootstrap.py" script that is bundled with most existing projects.
The "bootstrap.py" command will:
- create support directories, like bin, eggs, and work, as needed,
- download and install the zc.buildout and setuptools eggs,
Here is an example of setting up an existing project that uses buildout. Note that it takes a while to download and build everything it needs.
The full URL for the example is:
svn://svn.zope.org/repos/main/Sandbox/baijum/z3hello/trunk z3hello
Getting Started: Project Directory Structure
A project directory usually contains a "bootstrap.py" script to help a new developer set up the tree after checking out a project. The file is optional.
The specification for the entire project defaults to "buildout.cfg" but there are often others, such as "deployment.cfg" and "production.cfg".
In the "bin/" directory are the executable scripts that buildout generates from entrypoints within distributions.
The "develop-eggs/" directory holds egg links for software being developed in the buildout. We separate "develop-eggs/" and "eggs/" to allow egg cache directories to be shared across multiple buildouts. For example, a common developer technique is to define a common eggs directory in their home that all non-develop eggs are stored in. This allows larger buildouts to be set up much more quickly and saves disk space.
And the "parts/" directory contains code and data managed by buildout, or more precisely by the recipes that make it up.
If you look hard, you will also find a hidden file named ".installed.cfg", which is where buildout keeps its state of what is currently installed. Do not tamper with it.
And if you did not change the default locations of the cache directories for eggs and tarballs, there will be an "eggs/" and "downloads/" directory. A difference between the two is that those in "eggs/" will be referenced "in-place" while those in "downloads/" will be unpacked into a subdirectory of "parts/".
Getting Started: Project Directory Structure (cont'd)
And of course there are the other files and directories about which buildout is not concerned.
There is usually a "README.txt" file because several tools complain if it is not there. If the build is itself an egg (and not all are), there will also be "setup.py" and "setup.cfg" files.
And there is often a "src/" directory under which the source of your own eggs or checkouts resides.
If the build represents a Zope instance, there may also be a "var/" directory to hold the instance data such as a ZODB, and a "products/" directory to contain Zope Products, which are used in Zope 2.
A question that usually arises with a project is which parts to check into a version control system and which are automatically generated and managed by buildout.
Obviously the two distribution cache directories should not be checked in.
Nor should the "bin/" directory into which buildout places generated scripts, the "develop-eggs/" directory, which is really just a collection of egg-links that point into your "src/" directory for work under development, or the "parts/" directory, under which recipes store somewhat transient data belonging to the parts they manage.
And if you're running Zope, it is not common to check the "var/" directory in, unless your policy is to store frozen ZODB databases.
And last, the ".installed.cfg" file that buildout uses to keep track of the state of parts should not be checked in. buildout will generate it as needed upon the next build operation.
Getting Started: Specification File (syntax)
Specification files are in the format accepted by the ConfigParser Python module, with variable-definition and substitution extensions. Such a file is broken into sections, each introduced by a "[name]" header, and each part has its own named section.
Within sections are "option = value" lines. A value can be spread across multiple lines by indenting it.
The "[buildout]" header introduces the buildout section, which is the only required section in the specification file. Options in this section determine which other sections are used.
The "parts = <space-delimited names>" option lists the parts that go into an assembly. Parts that depend on other parts not listed here will automatically be identified and pulled in as well.
Each part is then further described under its section. The first option described for every part is "recipe=", which identifies the plugin used to manage it. All other options under a part description are dependent upon what that recipe accepts. For the curious, options are passed as keyword arguments to recipe objects.
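As a rough illustration of the syntax described so far, the sketch below reads a small specification with Python's standard configparser module. The specification text is a made-up example (the part names, the URL and the egg names are assumptions), and buildout itself layers its own extensions on top of this format, so treat this as a way to see the structure rather than as buildout's own parser.

```python
import configparser

# Hypothetical specification: part names, URL and egg names are illustrative.
SPEC = """\
[buildout]
parts = libfoo app

[libfoo]
recipe = zc.recipe.cmmi
url = http://example.com/libfoo-1.0.tar.gz

[app]
recipe = zc.recipe.egg
eggs =
    docutils
    Pygments
"""


def read_parts(text):
    """Return {part name: options} for the parts listed in [buildout]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # "parts" is a whitespace-delimited list of section names.
    names = parser["buildout"]["parts"].split()
    return {name: dict(parser[name]) for name in names}


parts = read_parts(SPEC)
for name, options in parts.items():
    # Every part names the recipe that manages it.
    print(name, "->", options["recipe"])

# An indented, multi-line value comes back as one string with newlines.
print(parts["app"]["eggs"].split())
```

Each part's remaining options are simply handed to its recipe, which is why the set of valid options differs from one recipe to the next.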
The recipe "zc.recipe.cmmi" is one that understands how to download a tarball and perform the common sequence of commands: "./configure; make; make install". That installation occurs in the "parts/" directory, in a subdirectory named after the part. The recipe takes a "url=" option that tells it where to download the archive from.
Notice that installation and configuration are treated as separate operations. This is a good policy to follow for buildouts; among other things, it enhances specification reusability in different environments (development, testing, deployment).
The recipe "tau.recipe.odbc" accepts a multiline value and writes it into a file of the same name as the option. The value can contain any text, as long as it is indented in the specification.
Getting Started: Specification File (references)
Within a specification file, parts can reference attributes of other parts, such as the "location" of their parts directory. Any "option = value" field can be referenced in this way.
Parts declarations are processed in the order they appear in the specification file, so avoid circular references.
Parts referenced in this manner automatically become dependencies of the reading part. It is the same as putting its name in the buildout parts= option.
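The "${part:option}" reference form can be tried out with the standard library, because configparser's ExtendedInterpolation uses the same spelling. The sketch below is an analogy only: buildout performs its own substitution, and a part's "location" is normally supplied by buildout rather than written by hand as it is here. The part names, the path and the helper function are assumptions for illustration.

```python
import configparser
import re

# Hypothetical specification; "location" is written out by hand here so
# that the standard-library parser has something to substitute.
SPEC = """\
[buildout]
parts = pyext

[libfoo]
recipe = zc.recipe.cmmi
location = /srv/project/parts/libfoo

[pyext]
recipe = zc.recipe.egg:custom
include-dirs = ${libfoo:location}/include
"""

parser = configparser.ConfigParser(
    interpolation=configparser.ExtendedInterpolation()
)
parser.read_string(SPEC)

# Reading the option resolves the reference to the other part.
include_dirs = parser["pyext"]["include-dirs"]
print(include_dirs)

REFERENCE = re.compile(r"\$\{([^:}]+):[^}]+\}")


def referenced_parts(parser, part):
    """Names of the parts that the given part refers to in its raw options."""
    names = set()
    for _, raw_value in parser.items(part, raw=True):
        names.update(REFERENCE.findall(raw_value))
    return names


print(referenced_parts(parser, "pyext"))
```

The second function shows why "libfoo" does not need to appear in "parts=": the reference alone is enough to discover that "pyext" depends on it.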
Getting Started: Specification File (includes)
Getting Started: Controlling Versions
buildout offers several degrees of control over the versions of parts used for assemblies. These options can be specified either in the per-user $HOME/.buildout/default.cfg or in a per-project buildout specification file. Some policies make more sense in one than the other.
The default mode of operation for buildout is to always try to find the latest distributions that match requirements. Often going over the network, this lookup operation can be very time consuming. The newest option can disable this, so that buildout will use the currently installed eggs as long as they meet the requirements. It also lends a certain stability to the development environment. The -N command-line option also disables it.
When searching for new releases is enabled, the newest available release is used. This isn't usually ideal, as you may get a development release or alpha releases not ready to be widely used. The prefer-final option controls whether to only use the latest final or stable releases.
In buildout version 2, final releases will be preferred by default. You will then need to use a false value for prefer-final to get the newest releases.
In order to give more control over the precise version of distributions used, a versions option can be specified in the buildout section that points to a section that itemizes the versions to be used.
To populate this section, running buildout in verbose mode will print the versions selected of the various distributions.
To ensure that no versions slip past and are picked automatically, the allow-picked-versions option can be set to false. This disables the automatic process and generates an error instead, giving absolute control over version selection.
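The relationship between the versions section and allow-picked-versions can be sketched as a simple check: every requested distribution must have an entry in the section that "versions" points to. This is a simplification that ignores version constraints and dependencies of the listed eggs; the section name, the pins and the function are assumptions for illustration and not buildout's real logic.

```python
import configparser

# Hypothetical specification; the section name and pins are illustrative.
SPEC = """\
[buildout]
parts = app
versions = release-1
allow-picked-versions = false

[release-1]
docutils = 0.4
Pygments = 0.8.1

[app]
recipe = zc.recipe.egg
eggs =
    docutils
    Pygments
    roman
"""


def check_pins(text, part):
    """Return the unpinned eggs of a part, or raise if picking is disallowed."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    buildout = parser["buildout"]
    pins = parser[buildout["versions"]]
    # Option names are matched case-insensitively by configparser.
    missing = [egg for egg in parser[part]["eggs"].split() if egg not in pins]
    allow = buildout.getboolean("allow-picked-versions", fallback=True)
    if missing and not allow:
        raise LookupError("no version pinned for: " + ", ".join(missing))
    return missing


try:
    check_pins(SPEC, "app")
except LookupError as error:
    message = str(error)
    print(message)
```

Here "roman" has no pin, so with allow-picked-versions set to false the check fails instead of silently picking a release.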
Getting Started: About Caches
Normally, when distributions are installed, if any processing is needed, they are downloaded from the internet to a temporary directory and then installed from there. A download cache can be used to avoid the download step. This can be useful to reduce network access and to create source distributions of an entire buildout.
buildout supports two cache locations: one for eggs, and one for tarball archives. Without specifying these options, the default is to use directories "eggs" and "downloads" within each project directory tree.
A cache can be used as the basis of application source releases. In an application source release, we want to distribute an application that can be built without making any network accesses. In this case, we distribute a buildout with download cache and tell the buildout to install from the download cache only, without making network accesses. The buildout install-from-cache option can be used to signal that packages should be installed _only_ from the download cache.
The offline option is related, in that it tells buildout whether it is allowed to search distribution repositories on the network.
Getting Started: How Distributions Are Found
To find distributions, buildout uses the search mechanism built into setuptools, and allows specification of places, in addition to the Cheeseshop, in which to look.
To use an index server other than the Cheeseshop, specify its URL with the --index-url (or index-url = URL) configuration option. There is no provision to have multiple index servers.
NOTE: buildout searches those sites given with --find-links after it searches an index server like the Cheeseshop. setuptools searches in the opposite order.
For installing on non-networked machines, a link server can be represented as simply a directory of eggs or source packages, pointed to with the --find-links command-line option.
Getting Started: The Command-Line
Any option you can set in the configuration file, you can set on the command-line. Option settings specified on the command line override settings read from configuration files.
| -c config_file | Specify path to the buildout configuration file to be used. This defaults to the file named "buildout.cfg" in the current working directory. |
| -o | Run in off-line mode. This is equivalent to the assignment "buildout:offline=true". |
| -n | Run in newest mode. This is equivalent to the assignment "buildout:newest=true". With this setting, which is the default, buildout will try to find the newest versions of distributions available that satisfy its requirements. |
| -D | Debug errors. If an error occurs, then the post-mortem debugger will be started. This is especially useful for debugging recipe problems. |
Getting Started: Configuration of Defaults
buildout always looks for an initial configuration file under the $HOME directory and loads it before the assembly specification file. The syntax of the two files is identical; anything that can go into a specification file can go into a defaults file.
Notice from this that there are no system-wide settings, as there were with setuptools.
Besides parts information, buildout settings can also go into the per-project assembly specification.
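The order of precedence can be sketched as successive layers: the per-user defaults file is read first, the project specification next, and command-line assignments last, with each layer overriding the one before it. That the project file overrides the defaults file is an assumption drawn from the load order described above. The file contents, the path and the function below are hypothetical, and the sketch uses configparser rather than buildout's own loader.

```python
import configparser

# Hypothetical contents of $HOME/.buildout/default.cfg
USER_DEFAULTS = """\
[buildout]
eggs-directory = /home/dev/.buildout/eggs
newest = true
"""

# Hypothetical contents of the project's buildout.cfg
PROJECT_SPEC = """\
[buildout]
parts = app
newest = false
"""


def effective_settings(sources, assignments=()):
    """Layer configuration texts, then "section:option=value" assignments."""
    parser = configparser.ConfigParser(interpolation=None)
    for text in sources:
        # Options read later replace options of the same name read earlier.
        parser.read_string(text)
    for assignment in assignments:
        target, value = assignment.split("=", 1)
        section, option = target.split(":", 1)
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, option, value)
    return parser


settings = effective_settings(
    [USER_DEFAULTS, PROJECT_SPEC], ["buildout:offline=true"]
)
for option, value in settings["buildout"].items():
    print(option, "=", value)
```

The per-user eggs-directory survives because nothing later redefines it, while "newest" takes the project's value and "offline" comes from the command line.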
About Recipes
zc.recipe.egg
Installs one or more eggs, along with their dependencies. It installs their console-script entry points with the eggs needed included in their paths.
zc.recipe.testrunner
Generates scripts to run project-specific unit tests over a collection of eggs. The eggs must already be installed (using the zc.recipe.egg recipe).
zc.recipe.zope3checkout
Installs a checkout from the Zope 3 repository.
zc.recipe.zope3instance
Sets up a server instance for running Zope 3.
zc.recipe.filestorage
Creates an empty instance of ZODB filestorage and generates a configuration clause in the style of ZConfig for using it.
Recipes: Egg Installation and Scripts
The zc.recipe.egg recipe installs one or more eggs, with their dependencies. It has four sub-recipes that can be referenced by adding a colon and their name to the recipe= line. The default sub-recipe is "scripts".
The eggs option accepts one or more distribution requirements, one per line. Acceptable versions can be specified. Any dependencies of the named eggs will also be installed.
It is also possible to specify a part-specific "find-links=" list of places to look for distributions, as well as the location of a specific index server such as the Cheeseshop.
Recipes: Customizing the Building of Eggs
The ":custom" sub-recipe of zc.recipe.egg provides for custom building of an egg from its source distribution. Sometimes a distribution has extension modules that need to be compiled with special options, such as the location of include files and libraries.
In this example, we have a part representing a non-Python library that needs to be built using the "./configure; make; make install" dance.
And then a part that uses that library to build a Python extension module. Notice how the second part references the location into which the first part was installed.
There is a ":develop" sub-recipe that is similar to ":custom", except that it operates upon develop-eggs that you may be working on. The resulting eggs are placed in the develop-eggs directory because the eggs are buildout specific.
Recipes: Generating Scripts for Eggs
The zc.recipe.egg:scripts recipe scans those eggs specified with "eggs=" for entrypoints of the group "console_scripts" and, for each one found that appears in "scripts=", generates a script, usually in the "bin/" directory, that invokes it. If there is no "scripts=" option, all found entrypoints have a script generated for them.
The "eggs=" option also controls the set of distributions that will be "baked into" or activated within those specific scripts.
The "scripts=" option also permits aliasing a script, by providing an alternate name, after the second '=', for the script file itself. In this case the "rst2" entrypoint will be invoked from a script file named "s5".
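The selection and aliasing rules for "scripts=" can be sketched as a small function: with no option, every console-script entry point gets a script of the same name, and otherwise only the listed ones do, optionally under an alternate file name. The function and the sample entry point names are assumptions for illustration, not the recipe's actual code.

```python
def select_scripts(entry_points, scripts_option=None):
    """Map each chosen console-script entry point to its script file name.

    entry_points: names found in the "console_scripts" group.
    scripts_option: the whitespace-delimited value of "scripts=", where an
    item may be "name" or "name=alias"; None means the option is absent.
    """
    if scripts_option is None:
        # No filter: generate a script for every entry point found.
        return {name: name for name in entry_points}
    chosen = {}
    for item in scripts_option.split():
        name, _, alias = item.partition("=")
        if name in entry_points:
            chosen[name] = alias or name
    return chosen


# Hypothetical entry points discovered in the eggs named by "eggs=".
found = ["rst2", "rst2html"]

print(select_scripts(found))
print(select_scripts(found, "rst2=s5"))
```

With "scripts = rst2=s5", only the "rst2" entry point is kept and its script file is named "s5".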
The "extra-paths=" option provides directories to be added onto the sys.path for the particular scripts.
If a distribution referenced doesn't use setuptools, it may not have declared in its metadata any entry points. In that case, entry points can be specified in the recipe data, using the "entry-points=" option.
Recipes: How Eggs Are Activated Within Scripts
This is an example of a script generated by buildout, showing how it bakes specific distributions into each script and then invokes code within the egg.
Notice how it differs from a script generated by setuptools, which is more declarative with __requires__ and version constraints, and defers to the entrypoint lookup mechanism.
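To make "baking in" concrete, the sketch below renders a script of roughly that shape: the egg locations are written as literal paths at the front of sys.path, followed by a direct import and call of the entry point. It approximates the general form rather than reproducing the exact text buildout emits, and the interpreter path, egg path, module and function names are all hypothetical.

```python
SCRIPT_TEMPLATE = '''\
#!%(python)s

import sys
sys.path[0:0] = [
%(paths)s
    ]

import %(module)s

if __name__ == '__main__':
    %(module)s.%(function)s()
'''


def render_script(python, egg_paths, module, function):
    """Return script text with the given egg paths written in literally."""
    paths = "\n".join("    %r," % path for path in egg_paths)
    return SCRIPT_TEMPLATE % {
        "python": python,
        "paths": paths,
        "module": module,
        "function": function,
    }


# All values below are made up for illustration.
text = render_script(
    "/usr/bin/python",
    ["/srv/project/eggs/docutils-0.4-py2.4.egg"],
    "docutils_tools.rst2",
    "main",
)
print(text)
```

Because the paths are fixed in the file, the script activates exactly those distributions each time it runs, with no lookup of requirements at start-up.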
Recipes: Baking Paths into Python Interpreter Scripts
buildout provides for the generation of scripts to provide an interactive Python prompt with the specified eggs and their dependencies already activated, which is very useful for debugging specific programming scenarios.
This is similar to a script, but uses the "interpreter=" option instead of the "scripts=" option.
Recipes: Pulling Subversion Files into Buildout
This is an example of how to pull into a buildout non-egg content stored under version control. The "iw.recipe.subversion" recipe accepts a list of URLs from which to checkout files and a destination directory name. Those directories are placed under the "parts/clipart-svn/" directory.
In the second part we see a server of some type that knows how to deliver those files to a client, and how it manages to reference those checked-out files.
Recipes: Using a Non-Egg Archive of Python Source
Sometimes a needed distribution comes as a zipfile of just .pyc files, particularly for a proprietary package such as mxODBC. It is not an actual egg, just a directory tree of files to be used as-is.
The "hexagonit.recipe.download" recipe downloads archives in a variety of compression formats and unpacks them underneath the "parts/mxODBC_installation/" directory.
The second part can then reference this directory tree explicitly.
Use Case: Starting a New Project
The first case shows how to start a new buildout, without using virtualenv.
It is suggested that every project that makes use of buildout come bundled with a bootstrap.py file to make it easier for the next developer to get started. bootstrap.py installs the setuptools and buildout distributions into the project directory.
The actual URL for fetching the "bootstrap.py" file is:
http://svn.zope.org/checkout/zc.buildout/trunk/bootstrap/bootstrap.py
The second case shows how to start a buildout within a virtualenv sandbox, with complete isolation from the system site-packages.
Use Case: Picking Up a Buildout Project
This is an example of picking up a project that is already packaged for use with buildout.
This particular project already has a "develop=" line in its buildout.cfg that points to the setup.py in the project root. This means that, within the buildout, the package will already be a develop-egg, so that one can begin making changes to the source immediately and have it reflected in the runtime behavior without having to build/install it each time.
The actual URL for the example project is:
svn://svn.zope.org/repos/main/grokapps/Adder Adder
Use Case: Buildout Around Other Projects
Often you run across a package or two that are not buildout-aware but you want to experiment with them inside a buildout sandbox.
This example sets up a sandbox and then brings the outside package into it under the "src/" directory. It may be a checkout or if this buildout is itself going to be stored under version control, the outside package can be a Subversion "extern" checkout. In this manner, checking out the buildout will pull down all the developmental pieces.
Within our buildout, we tell buildout to treat the outside package as a "develop-egg", and reference its distribution name as "modulex".
Use Case: Making Your Project Buildout-Aware
To package your project so that it is buildout-aware, drop a minimal "buildout.cfg" file in the project root, next to the setup.py file.
A common usage of buildout is to support development of a single package along with running tests.
The "develop = ." says to find the "setup.py" file in the current directory and activate it as a development egg, so that I can make changes to it and re-run the tests as I work.
The value of a "develop=" option can be more than one directory, each of which has its own setup.py file.
The location, name and such of this package are provided in that setup.py file, and could be any number of Python packages arranged in any directory structure I choose.
To experiment with an example of this pattern of usage:
$ svn co http://svn.zope.org/zc.ngi/trunk/ zc.ngi
Use Case: Distributing Project as Egg
Use Case: Production Deployment (non-RPMs)
This example uses the "zc.sourcerelease" recipe to cause an entire buildout, including dependencies, to be bundled into a tarball.
Note that the tarball does NOT include an actual Python interpreter, which must already be installed on the destination system to run the "install.py" script.
Use Case: Production Deployment (RPMs)
This example shows how to produce an RPM for installation. It uses the "zc.sourcerelease" recipe to first produce a tarball, and then a hand-made RPM .spec file to turn that into an RPM.
A key part of this for an application like Zope or ZODB is separating a build into software parts and configuration parts.
The software parts are assembled when the source release/rpm is built.
The configuration is done post-install, by invoking scripts within the %build section of the RPM .spec file. The ZODB and Zope 3 recipes were specifically designed to support this separation.
For More Information
Primary Buildout Home Page
EuroPython 2007: Philipp v. Weitershausen
Minitutorial: Introduction to zc.buildout