A Quick Diff between Distutils and Distutils2

Distutils2, also known as Packaging in the Python 3.3 standard library, is
an improved fork of Distutils. Here’s a short review of the main
differences between the two codebases.

User Interface

setup.py vs. setup.cfg

All the information that used to be given as parameters to the
distutils.core.setup function in a setup.py script is now moved to the
setup.cfg file. This includes metadata like name, version and author,
Python modules including packages and C/C++ extensions, and data files
accompanying the code.

This important change in favor of static information was done in order to let
a number of packaging tools get access to the distribution information without
having to run untrusted Python code. There are a number of setup scripts in
the wild doing unholy things, which is clearly a hazard for a tool that just
wants to get the project name and version from the distribution. Another way
of putting this is to say that each distribution had its own installer, which
isn’t optimal. This long-awaited change also makes Packaging more similar to
other static info-based packaging tools.
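For illustration, here is a minimal sketch of what a tool gains from static information. The name and version are read from setup.cfg as plain INI data, and nothing from the project is imported or executed. It uses Python 3's configparser rather than Packaging's own code, and the [metadata] keys shown are an assumption for this example.

```python
import configparser


def read_name_and_version(cfg_text):
    """Return (name, version) from the [metadata] section of setup.cfg text.

    The file is parsed as plain INI data: no project code is run.
    """
    # interpolation=None so that a stray "%" in a value is not an error
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("metadata", "name"), parser.get("metadata", "version")


SAMPLE_CFG = """\
[metadata]
name = example-project
version = 1.0
author = Jane Doe
"""
```

Calling read_name_and_version(SAMPLE_CFG) gives ('example-project', '1.0'). A malicious or broken setup.cfg can at worst fail to parse; it cannot run arbitrary code.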

For projects that used to do complicated things in their setup scripts,
hooks can be used. They are Python functions that can be located in an
already-installed module (a collection of build helpers) or in a
project-specific private module and that can customize command, metadata
or distribution objects before or after a command is run. For the
transitional years before Packaging is widespread and used as a baseline by
common packaging tools instead of distutils/setuptools, authors of complicated
setup.py scripts will have to refactor their code so that the customization
they want to perform is usable as hooks as well as setup.py code.

setup.py vs. pysetup

Since Packaging-based projects no longer have setup.py scripts, a new
command-line tool is needed. This tool is a Python command-line script named
pysetup. Instead of running python setup.py sdist, you’ll now execute
pysetup run sdist. If you’re asking why run is necessary, it’s because
pysetup provides actions in addition to commands, and therefore needs a way
to avoid name conflicts between actions and commands. Commands are the usual
distutils blocks used to build, distribute and install a project; actions are
a new interface to new Packaging functionality: listing and removing installed
distributions, searching for a project in PyPI or another index, downloading
and installing a project and its dependencies, and so on.
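The role of run can be sketched as a tiny dispatcher. This is a hypothetical simplification, not pysetup's code: the ACTIONS set is an illustrative subset and dispatch() is an invented name. It shows why a word like install can name an action while run install_dist still reaches the command of that name without any ambiguity.

```python
# Illustrative subset of action names; not pysetup's actual list.
ACTIONS = frozenset(["create", "install", "remove", "list", "search"])


def dispatch(argv):
    """Classify a pysetup-style command line.

    Returns ("command", name, args) for "run NAME ..." and
    ("action", name, args) for a bare action name.
    """
    if not argv:
        raise ValueError("no action given")
    head, rest = argv[0], list(argv[1:])
    if head == "run":
        # everything after "run" lives in the command namespace
        if not rest:
            raise ValueError("run needs a command name")
        return ("command", rest[0], rest[1:])
    if head in ACTIONS:
        return ("action", head, rest)
    raise ValueError("unknown action: %r" % head)
```

Because commands only ever appear after run, new actions can be added without worrying about clashing with existing or third-party command names.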

File-level Changes

Without going over all the changes done to the codebase in one year, the
following is an overview of the movement of files between the distutils and
distutils2 codebases.

Changed Modules

Modules with compiler classes are moved to a compiler subpackage, where the
extension module has also been moved. The cmd module has moved to the
command subpackage.

The core module was changed in depth and renamed to run to reflect its new
duties, namely implementing pysetup. One could say that there is no core
anymore: packaging is a collection of public modules as well as a packaging
tool; the fact that pysetup uses run and dist heavily does not make them
more important or more cared for than the other modules.

The filelist module was renamed to manifest and extended to be more useful
when dealing with manifest files.

The version and versionpredicate modules were consolidated into one module,
version. It implements PEP 386, a specification built upon common
setuptools-based practice.
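The ordering rules are easier to see in code. The following is a simplified sketch modeled on the ordering shown in PEP 386's examples, not the version module itself. It only maps rc to c and does no other normalization, and the "f" marker that makes final releases sort after pre-release and dev tags is this sketch's own device.

```python
import re

# Simplified PEP 386 pattern: N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]
_VERSION_RE = re.compile(r"""
    ^(?P<version>\d+\.\d+)          # at least N.N
    (?P<extra>(?:\.\d+)*)           # optional further .N segments
    (?:(?P<prerel>[abc]|rc)         # pre-release marker
       (?P<prerelversion>\d+(?:\.\d+)*))?
    (?:\.post(?P<post>\d+))?
    (?:\.dev(?P<dev>\d+))?
    $""", re.VERBOSE)

FINAL = ("f",)  # "f" sorts after "a", "b", "c" and "dev"


def version_key(text):
    """Return a sort key for a PEP 386-style version string."""
    m = _VERSION_RE.match(text)
    if m is None:
        raise ValueError("not a PEP 386 version: %r" % text)
    release = tuple(int(n) for n in (m.group("version") + m.group("extra")).split("."))
    if m.group("prerel"):
        tag = "c" if m.group("prerel") == "rc" else m.group("prerel")
        prerel = (tag,) + tuple(int(n) for n in m.group("prerelversion").split("."))
    else:
        prerel = FINAL
    post, dev = m.group("post"), m.group("dev")
    if post is None and dev is None:
        postdev = FINAL
    else:
        postdev = ()
        if post is not None:
            postdev += ("f", "post", int(post))
            if dev is None:
                postdev += ("f",)  # a post release sorts after its own .dev
        if dev is not None:
            postdev += ("dev", int(dev))
    return (release, prerel, postdev)
```

Sorting with key=version_key puts 1.0c1 before 1.0.dev456, and 1.0.dev456 before 1.0, as in the PEP's example list.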

The install command was renamed to install_dist to avoid conflicts with the
new install module. (This may be changed back, as the conflict no longer exists:
the files live in different directories and don’t conflict on the command
line thanks to the run action.)

The build, build_py and build_scripts commands support build-time 2to3
conversion of Python files and doctests. This was taken from distribute and improved.

Removed Modules

archive_util, config, debug, dep_util, dir_util, emxccompiler,
log, spawn, sysconfig and text_file have been removed. Some of them
were replaced with Python 2.4+ standard components such as subprocess and
logging; in other cases, one or two still useful functions were moved to the
util module. sysconfig is a top-level module in the 2.7 and 3.2+ standard
libraries. Similarly, the shutil module in these versions also hosts the
functions formerly present in archive_util for everyone to use.
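As a sketch of those replacements (assuming Python 3.2+ for tempfile.TemporaryDirectory): shutil.make_archive now builds the archives that archive_util used to, and the top-level sysconfig module reports installation paths. The helper names archive_tree, install_dirs and demo are made up for this example.

```python
import os
import shutil
import sysconfig
import tempfile
import zipfile


def archive_tree(src_dir, base_name, fmt="zip"):
    """Archive the contents of src_dir; return the path of the created file."""
    # shutil.make_archive(base_name, format, root_dir) appends the extension
    return shutil.make_archive(base_name, fmt, root_dir=src_dir)


def install_dirs():
    """Install locations, as reported by the top-level sysconfig module."""
    paths = sysconfig.get_paths()
    return paths["purelib"], paths["scripts"]


def demo():
    """Archive a one-file project and list the archive's members."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "proj")
        os.mkdir(src)
        with open(os.path.join(src, "mod.py"), "w") as f:
            f.write("x = 1\n")
        archive = archive_tree(src, os.path.join(tmp, "proj-1.0"))
        with zipfile.ZipFile(archive) as zf:
            return os.path.basename(archive), zf.namelist()
```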

fancy_getopt is due to be replaced with an optparse-based solution. The
API for defining command classes will be preserved; this change is just an
internal refactoring to remove a maintenance burden.
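One way such an optparse-based layer could work, as a hypothetical sketch rather than the planned implementation: each entry of a command's user_options list (long name, short name, help text, with a trailing = meaning the option takes a value) becomes an optparse option, so command classes keep declaring options exactly as before. The example_sdist class and build_parser function are invented for illustration.

```python
import optparse


class example_sdist(object):
    """Hypothetical command class declaring options the distutils way."""
    user_options = [
        ("formats=", None, "formats for the source distribution"),
        ("keep-temp", "k", "keep the distribution tree around"),
    ]


def build_parser(command_class):
    """Translate a command's user_options list into an optparse parser."""
    parser = optparse.OptionParser(prog=command_class.__name__)
    for long_name, short_name, help_text in command_class.user_options:
        takes_value = long_name.endswith("=")
        name = long_name.rstrip("=")
        flags = ["--" + name]
        if short_name:
            flags.insert(0, "-" + short_name)
        dest = name.replace("-", "_")  # attribute name on the parsed options
        if takes_value:
            parser.add_option(*flags, dest=dest, help=help_text)
        else:
            parser.add_option(*flags, dest=dest, action="store_true", help=help_text)
    return parser
```

The command class itself does not change; only the code that reads user_options does, which is what keeps the public API stable.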

The bdist_rpm command was removed and started a new life as a new project,
py2rpm, so that it can evolve quicker than Python to follow the
modifications of the policies and toolchains of the various RPM-based systems.
The install_egg_info command is gone; it implemented a de facto standard for
installed distributions, which is now a de jure standard, PEP 376,
implemented by the superseding install_distinfo command.

New Modules

The create module is an interactive helper that can create setup.cfg
files (pysetup create) and distutils-compatible setup.py scripts (pysetup
generate_setup) that read their information from setup.cfg at runtime. It started its life
as mkpkg and has seen a lot of improvements since then.
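A sketch of the runtime half of such a generated setup.py, under stated assumptions: the [metadata] and [files] section layout, the multi-line field names and the modules to py_modules renaming are assumptions for illustration, not the exact format produced by pysetup generate_setup.

```python
import configparser

# Assumed multi-line fields and distutils renames, for illustration only.
MULTI_VALUED = ("packages", "modules", "scripts")
DISTUTILS_NAMES = {"modules": "py_modules"}


def setup_kwargs_from_cfg(cfg_text):
    """Turn setup.cfg text into keyword arguments for a distutils setup() call."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    kwargs = dict(parser.items("metadata"))
    if parser.has_section("files"):
        for key, value in parser.items("files"):
            if key in MULTI_VALUED:
                # one entry per non-blank line of a multi-line value
                value = [line.strip() for line in value.splitlines() if line.strip()]
            kwargs[DISTUTILS_NAMES.get(key, key)] = value
    return kwargs


SAMPLE_CFG = """\
[metadata]
name = example
version = 0.1

[files]
packages =
    example
    example.sub
modules = helper
"""

# A generated setup.py would end with something like:
#     from distutils.core import setup
#     with open("setup.cfg") as f:
#         setup(**setup_kwargs_from_cfg(f.read()))
```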

A flexible test command is finally one of the standard commands. It
defaults to using unittest/unittest2 discovery if available, but can be hooked
with any test runner and test suite as needed. It was taken from setuptools
and improved.
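The discovery-plus-pluggable-runner idea can be sketched with the standard unittest API. run_tests and demo are hypothetical names; any object with a run(suite) method can be passed as runner, which is the kind of hook described above.

```python
import io
import os
import tempfile
import unittest


def run_tests(start_dir, runner=None, pattern="test*.py"):
    """Discover tests under start_dir and hand the suite to a runner.

    Any object with a run(suite) method can be plugged in as the runner.
    """
    suite = unittest.TestLoader().discover(start_dir, pattern=pattern)
    if runner is None:
        # default runner; its output is captured instead of printed
        runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


SAMPLE_TEST = '''\
import unittest


class SampleTest(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)
'''


def demo():
    """Write one throwaway test module and run discovery on it."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "test_sketch_sample.py"), "w") as f:
            f.write(SAMPLE_TEST)
        return run_tests(tmp)
```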

The upload_docs command was taken from setuptools and integrated into
distutils2 with a few improvements.

A new config module (which shares its name, but not its code, with the
former distutils config module) has been added to find and parse
configuration files.
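Finding and parsing configuration files mostly comes down to reading several files in order of increasing priority. The sketch below assumes the user-level .pydistutils.cfg and project-level setup.cfg names that distutils uses on POSIX systems (distutils also reads a system-wide distutils.cfg, left out here). configparser's read() skips missing files and lets later files override earlier ones. The function names are hypothetical.

```python
import configparser
import os
import tempfile


def candidate_config_files(project_dir, home_dir):
    """Config files in increasing order of priority (system-wide file omitted)."""
    return [
        os.path.join(home_dir, ".pydistutils.cfg"),  # per-user settings
        os.path.join(project_dir, "setup.cfg"),      # per-project settings
    ]


def load_config(paths):
    """Parse all existing files; values from later files override earlier ones."""
    parser = configparser.ConfigParser(interpolation=None)
    found = parser.read(paths)  # nonexistent files are silently skipped
    return parser, found


def demo():
    with tempfile.TemporaryDirectory() as home, tempfile.TemporaryDirectory() as project:
        with open(os.path.join(home, ".pydistutils.cfg"), "w") as f:
            f.write("[sdist]\nformats = gztar\nkeep-temp = 1\n")
        with open(os.path.join(project, "setup.cfg"), "w") as f:
            f.write("[sdist]\nformats = zip\n")
        parser, found = load_config(candidate_config_files(project, home))
        return parser.get("sdist", "formats"), parser.get("sdist", "keep-temp"), len(found)
```

The project file wins for formats, while keep-temp, set only in the user file, still comes through.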

The following modules are designed as building blocks for other packaging and
installation tools.

database implements PEP 376, together with the new install_distinfo
command. This code was originally intended to land in the pkgutil module,
but this was no longer necessary once Packaging was included in the
standard library.
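To show what PEP 376 standardizes, here is a sketch that lists installed distributions by scanning a directory for .dist-info folders and parsing their METADATA files. It is not the database module's API, and the function names are hypothetical. On modern Python, importlib.metadata (3.8+) provides this kind of lookup in the standard library.

```python
import email.parser
import os
import tempfile


def installed_distributions(site_dir):
    """Return sorted (name, version) pairs for each .dist-info directory."""
    found = []
    for entry in os.listdir(site_dir):
        metadata = os.path.join(site_dir, entry, "METADATA")
        if entry.endswith(".dist-info") and os.path.isfile(metadata):
            # METADATA uses RFC 822-style headers
            with open(metadata, encoding="utf-8") as f:
                msg = email.parser.Parser().parse(f)
            found.append((msg["Name"], msg["Version"]))
    return sorted(found)


def demo():
    """Build a fake site directory with one distribution and scan it."""
    with tempfile.TemporaryDirectory() as site:
        info = os.path.join(site, "example-0.1.dist-info")
        os.mkdir(info)
        with open(os.path.join(info, "METADATA"), "w", encoding="utf-8") as f:
            f.write("Metadata-Version: 1.2\nName: example\nVersion: 0.1\n")
        os.mkdir(os.path.join(site, "not_a_dist"))  # ignored by the scan
        return installed_distributions(site)
```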

depgraph provides a class representing a graph of dependencies between
releases. It does not solve conflicts, but provides base infrastructure for
an installer to solve them.
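A minimal, hypothetical dependency graph illustrates the kind of infrastructure meant here. It records which release requires which and computes an installation order with a depth-first topological sort, but it does not resolve version conflicts. The class and method names are invented for this sketch and do not reflect depgraph's actual API.

```python
class DependencyGraph(object):
    """Hypothetical minimal graph of releases; edges point to requirements."""

    def __init__(self):
        self.requires = {}

    def add_release(self, name):
        self.requires.setdefault(name, set())

    def add_dependency(self, release, requirement):
        self.add_release(release)
        self.add_release(requirement)
        self.requires[release].add(requirement)

    def install_order(self):
        """Depth-first topological sort: requirements come before dependents.

        Nodes are plain names, so version conflicts are not even represented;
        deciding what to do about them is left to the installer.
        """
        order, state = [], {}

        def visit(node):
            if state.get(node) == "done":
                return
            if state.get(node) == "visiting":
                raise ValueError("dependency cycle involving %s" % node)
            state[node] = "visiting"
            for requirement in sorted(self.requires[node]):
                visit(requirement)
            state[node] = "done"
            order.append(node)

        for node in sorted(self.requires):
            visit(node)
        return order
```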

The former dist.DistributionMetadata has a new name and an improved API as
metadata.Metadata. It supports all three versions of the metadata
specification, can read and write METADATA/PKG-INFO files, can convert its
contents for use with setup.cfg or PyPI, and supports a convenient mapping
interface. Its companion module markers deals with PEP 345 environment
markers.
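Metadata files use an RFC 822-style header format, so the standard email parser already gives a mapping-like view of them. The sketch below combines that with a deliberately tiny evaluator for PEP 345 markers that supports only a single == or != comparison (no and/or, no other operators). The function names are hypothetical, and this is not the metadata or markers module.

```python
import email.parser
import operator
import os
import platform
import sys

_OPERATORS = (("==", operator.eq), ("!=", operator.ne))


def default_environment():
    """A few PEP 345 marker variables for the running interpreter."""
    return {
        "python_version": "%d.%d" % sys.version_info[:2],
        "sys.platform": sys.platform,
        "os.name": os.name,
        "platform.machine": platform.machine(),
    }


def marker_matches(marker, env):
    """Evaluate one `variable op 'value'` marker; and/or are not supported."""
    for symbol, compare in _OPERATORS:
        if symbol in marker:
            name, value = (part.strip() for part in marker.split(symbol, 1))
            return compare(env[name], value.strip("'\""))
    raise ValueError("unsupported marker: %r" % marker)


def read_requirements(pkg_info, env):
    """Parse PKG-INFO text; return (name, version, requirements active in env)."""
    msg = email.parser.Parser().parsestr(pkg_info)
    active = []
    for field in msg.get_all("Requires-Dist") or []:
        requirement, _, marker = field.partition(";")
        if not marker.strip() or marker_matches(marker.strip(), env):
            active.append(requirement.strip())
    return msg["Name"], msg["Version"], active


PKG_INFO = """\
Metadata-Version: 1.2
Name: example
Version: 0.1
Requires-Dist: foo (>=1.0)
Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
"""
```

Passing default_environment() evaluates markers against the current interpreter; passing a hand-built dict lets an installer ask about another platform.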

pypi connects to project indexes such as PyPI to get information and
download distributions.
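PyPI's “simple” index serves plain HTML pages of links, so the core of an index crawler can be sketched with html.parser. The sample page below is invented data, and distribution_links is a hypothetical helper that keeps only links ending in a few archive extensions, ignoring any #md5= fragment.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def distribution_links(html, extensions=(".tar.gz", ".zip")):
    """Return links that look like downloadable archives."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    # drop a trailing checksum fragment before checking the extension
    return [link for link in collector.links
            if link.split("#", 1)[0].endswith(extensions)]


SAMPLE_PAGE = """<html><body>
<a href="../../packages/source/e/example/example-0.1.tar.gz#md5=0123abcd">example-0.1.tar.gz</a>
<a href="/homepage/">home page</a>
<a href="example-0.2.zip">example-0.2.zip</a>
</body></html>"""
```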

install uses depgraph and pypi to provide basic functions to download,
install and remove distributions.


In a future article, I’ll explain how distutils and distutils2 work on the inside.

Posted in Python development

Final Report: A Distutils2 Summer of Code

Summary of my Google Summer of Code

Adding a configure Command

This first task gave me the opportunity to study the distutils2 codebase. I saw a lot of good ideas in 4Suite’s DistExt modules and took some inspiration from their config command. At the end of GSoC, the configure command is written, tested and has basic documentation. This command accepts every option that build and install do (failing if there are conflicts in the install options, as install itself does), and writes them to a cache file. The build and install commands take options from this file, which saves typing. It’s also possible to run configure build install in the same command line (the internal behavior will get cleaner thanks to some changes I have in mind; Alexis keeps telling me I should document the four or five successive designs I went through for this command). There is still a nasty bug to fix.
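The save-then-reuse behavior of configure can be sketched like this. The JSON file name and format and the helper names are assumptions (the real cache format is not described here), and the conflict check only mirrors the spirit of install refusing to combine home and prefix.

```python
import json
import os
import tempfile

CACHE_NAME = "configure.json"  # hypothetical cache file name


def check_install_options(opts):
    # Same spirit as install's own check: home and prefix can't be combined
    if opts.get("home") is not None and opts.get("prefix") is not None:
        raise ValueError("must supply either home or prefix, not both")


def configure(project_dir, opts):
    """Validate options and save them for later build/install runs."""
    check_install_options(opts)
    path = os.path.join(project_dir, CACHE_NAME)
    with open(path, "w") as f:
        json.dump(opts, f, sort_keys=True, indent=2)
    return path


def options_for(project_dir, command_line):
    """Cached options, overridden by any option given on the command line."""
    path = os.path.join(project_dir, CACHE_NAME)
    opts = {}
    if os.path.exists(path):
        with open(path) as f:
            opts = json.load(f)
    opts.update((k, v) for k, v in command_line.items() if v is not None)
    return opts


def demo():
    """Configure once, then run a later command with one override."""
    with tempfile.TemporaryDirectory() as project:
        configure(project, {"prefix": "/opt/example", "compiler": "unix"})
        return options_for(project, {"compiler": "mingw32", "force": None})
```

Options left unset on the later command line (None) fall back to the cached values, which is what saves the typing.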

An additional goal is to include the cache file created by the configure command in source or built distributions, so that, for example, plugins can get information about the build of the project they extend. It would be something similar to the sysconfig module, which gives information about the compilation of the Python interpreter, but for Python projects. It would also make it possible to build with another tool and install with distutils, as long as the cache format contains enough information for both parties. (This feature was not implemented during GSoC, for lack of both a written description and time.)

Moving Setup Information to a Declarative Format

This second task was as motivating as the first, since it shared the goal of making users’ and developers’ lives easier. The first step was to make it possible to describe the metadata in setup.cfg instead of setup.py. The proof of concept was ridiculously easy to write, thanks to the nice distutils2.dist module. For the second step, replacing setup(modules=…, scripts=…) with new sections and fields in setup.cfg, we lacked a specification, so I took some time to write a design document describing it all. I have received useful feedback, and peer review is still ongoing.

As is now the rule in distutils2, I’ve started refactoring useful functions into a new module, which will be used by our higher-level code and can also be useful for third-party programs. The support functions that are not yet written are mostly string operations and a bit of I/O, so they are not hard, but they are still important enough to deserve thorough tests and good documentation.

With the implementation of those new config file sections, setup scripts will effectively be obsolete. A new module or script shipped with distutils2 (therefore in the standard library when we merge back) will be the new way to run commands. This script is a two-liner that I haven’t committed anywhere yet since it’s, well, two lines.

The resources section was out of scope for this GSoC, since Tarek had said it would require a new PEP. I will volunteer to write the sample implementation required by the PEP when the time comes.

Looking Back

In those twelve weeks, I have learned a great deal about distutils2’s internals, most importantly the commands and the Distribution class, but also the standalone modules like version or metadata. Looking into 4Suite’s DistExt was also useful; there are gems there that we should take, polish and include. Reading other students’ changesets helped me learn about things I don’t know well, like dependency graphs or networking. Teamwork was also an opportunity to learn tools like code review sites, to use email effectively, and to dive into the arcana of merge tools.

I really enjoyed test-driven coding, either with unit tests or “run and look at the output” manual tests. The big refactoring I did to move code from the build and install commands to configure would not have been possible without all the unit tests for those commands to give me confidence. Python also helps here, since it’s natural to write a docstring right after the function declaration, so you write documentation before the code without thinking about it or having to make it a religion. Last, I enjoyed doing lots of miscellaneous fixes, cleanups and improvements throughout the code base. Fixing pep8 or pyflakes warnings is easy and gives me an opportunity to glance at modules I don’t know.

Unplanned Tasks

Some time before midterm, Tarek gave Alexis and me push rights to his main repository, to have more manpower for reviews and pulls. I found myself in the position of repository gatekeeper, answering review requests, replying to nearly every thread on the mailing list, and pulling from other students’ clones when I judged something ready. It was strange at first to have this power and duty, but I grew confident, tried to be responsive and fair, and it seems I did a good job. I particularly liked sprinting remotely with the nice people from Montreal Python, being able to answer their questions, and rewarding their volunteer work with a push into the main repo right at the end of the sprint.

Synchronizing the documentation was not the easiest task. In the standard library, some improvements to the documentation of distutils had been removed because of the feature revert and freeze. Ali Aafshar did a bit of Subversion exploring, I completed his work with some missing bits, and we had a starting point. Alexis and I had chatted about the many goals of distutils2 and made plans to reorganize some parts of the docs to address the needs of users wanting to install something, developers wanting to distribute their project, and packaging developers wanting to reuse or extend parts of distutils2. Some moves have taken place and there is still a lot to do, but a solid foundation is in place.

This extra work I did explains why my mentor was satisfied with me before the final deadline even though my tasks were not finished.

The Future

We have work ahead of us, new features to add, documentation to improve, and I am staying in the project as a volunteer. Reaching out to the Python communities and advocating distutils2 won’t be our easiest task, so the more people the better. While working on distutils2, I’ve also been following Python’s bug tracker, doing triage and reviews, and reading a number of Python mailing lists, with the occasional reply. The feeling of being a part of the Python community and being able to give back is warm and fuzzy, especially when you’re given acknowledgment and trust.

I’m still awed and thrilled to be a Python core developer. I have started making patches to fix bugs in distutils, freeing Tarek to focus on distutils2. Remote teamwork was good, and now that I’ve met Alexis and other fellow Pythonistas in real life and had such good times, I know that even though university will have to take priority this year, I will stay involved.

Thanks

First, I want to thank Google for this incredible experience. Their Open Source department, and especially Carol Smith, are doing a great job and deserve warm applause.

I am also deeply grateful to the Python Software Foundation. This summer has been amazing, and I’m very happy to be part of a project with such excellent people, to work on such an excellent programming language.

Tarek Ziadé has been a terrific over-mentor. He took the time to phone me to answer a ton of questions, he helped us keep on track with the weekly meetings, he gave trust and praise. Thank you for everything.

C. Titus Brown is modest and says that he hasn’t done much to mentor me. Facts don’t matter; I needed guidance more than once, and he was there to give it. Sincere thanks to you.

Many people have supported me: Shashwat Anand, without whom I would not have applied at all, roommates, friends and family. The job has not been easy every week, but it was great to be given a chance even without a formal computer science or engineering background, and to be evaluated on my work. It also meant something at a personal level; I’ll talk about that in another blog post.

I’ve saved the place of honor for the other distutils2 students: Alexis, Josip, Konrad, Zubin, and Mouad (not in the distutils2 group but keeping in touch weekly). I’ve found the atmosphere in our group (students, volunteers, mentors, experts) excellent, without personal conflicts or flamewars. We worked and had fun, we helped each other, we cared about each other. Sharing this experience with you all was a pleasure and an honor.

Posted in GSoC

Weekly Report #11: Twice Awesome

Weekly report for the twelfth week of the Google Summer of Code

This week I’ve kept on working on several things.

I have pushed the new docs organization and some improvements to the main repo. Some mkpkg improvements as well as changesets from Konrad are also pushed.

The proposal for the new config file has been sent and is quietly collecting feedback and being revised in response. I haven’t started the new config module as I had planned, but it will not take much time or effort to do; it’s mostly string manipulation and a bit of refactoring in dist. I have been reading the distutils-sig archives from last year; there has been a lot of interesting discussion about setup.cfg.

I sent a report about each of my tasks to my mentor last Monday, and asked him if I should focus on tests and docs as advised by Google or if I could just do a normal work week. He answered that I was doing fine, that what I did besides coding (reviews, merges, doc sync) more than balanced the unfinished code, and that I could do whatever I wanted with no fear about the final evaluation :) So I’ve had a cool week with no stress, I’ve worked as usual and slept a bit more. The deadline is just the end of the paid work, but certainly not the end of my involvement in distutils2. I already have plans for sprints and a nice todo list for August and September. GSoC is sort of already over in my head.

The configure command has some tests that pass and one that fails. I’ll look into it shortly.

My blog is still lagging; for some reason I prefer replying to email and writing code, but I really have to update the blog Really Soon Now™. [edit: done!]

Tarek has proposed to python-dev that I get commit rights to Python’s Subversion repository, to help him maintain the old distutils so that he can focus on distutils2. This week I’ve got the reply. I am now a Python core developer and will co-maintain distutils. Awesome!

Posted in GSoC

Weekly Report #10: Efficient Multitasking

Weekly report for the eleventh week of the Google Summer of Code

This week I’ve been working on several things, catching up on nearly everything that I was late for.

I attended the Montreal-Python sprint, giving comments and feedback while working on my config file document, and pushed their changes upstream at the end.

I refreshed and pushed most of my changes that had been waiting for three weeks: various cleanups, touchups and fixes. I also switched to using named branches for configure and killsetup, which is a very convenient way to work.

I finished my document about new sections in config files. I’ve had some feedback from Alexis and Fred about the form, so I’ll put another half hour [edit: how optimistic I was!] on it and it’ll be time to request votes and let a thousand arguments blossom.

I replied to review requests from Josip, Alexis, Konrad and Zubin, and pulled the changes that were ready upstream. I’m becoming confident with reviews (thanks Dan!) and push rights, and since Tarek has not shouted yet I guess I’m doing well :)

I decided not to do the docs import and reorganization to prevent merge headaches for my fellow students (even if in the end I did most of the merges ;), which helped a bit. Now that the big pypi rename is done, docs are top priority.

I worked again on configure and moved code from build and install into configure, as discussed with Tarek on the phone all those weeks earlier. This was not as easy as it sounds, but it was a fun debugging experience! Work is ongoing.

I’m starting this week with another sprint, with this task list:

  • update my blog [edit: heh]
  • send an email to call for feedback and votes on the new config file proposal (static metadata and friends)
  • add distutils1 docs to d2 (with the same organization as seen on distutils2.notmyidea.org)
  • add todos in the docs to make sure we update them
  • start working on the new config module (utility functions to support existing and new sections in config files, i.e. implementation of my document)
  • continue working on configure (I need to talk with Tarek)
  • continue improving mkpkg
Posted in GSoC

Weekly Report #9: Feeling Low

Weekly report for the tenth week of the Google Summer of Code

Report Email

I’m starting to get depressed with those reports, since I have to say that I’ve had problems with concentration/focus again and did not do much last week. I’ve made progress on my document describing new sections in config files, I wanted to have it done Monday but failed.

I’ve made other code proposals on the ML, without much feedback.

I’m sprinting tonight, and I’ll have a precise task list to catch up and do some overdue things (finish the static metadata doc, add the docs to the main repo, review changesets).

Reactions

During the weekly meeting, other students cheered me up. I’m also thankful to Fred Drake for a bit of useful private discussion. I decided to have my email client shut off for long periods and be careful with IRC, and also talked about organizational techniques with the Montreal people during the sprint. My mentor offered helpful suggestions in email and told me that writing a design document instead of code was not wasted time—to which I agreed, my point being only that I shouldn’t have been blocked two weeks on that. A few days after the email, I said that the document was ready and that I’d need just a few days to do everything that was late.

Posted in GSoC

Weekly Report #8: Midterm

Weekly report for the ninth week of the Google Summer of Code

This week I made very little progress on the tasks listed in my last report.

I took part virtually in the Montreal sprint, answered questions, made suggestions, reviewed and pushed their changesets to the main repo.

I have produced more words than code, hopefully not useless words (reviews and discussions about fellow students’ tasks, discussions with the wider fellowship about various important subjects), but my focus should shift more to code.

I applied the distinction Alexis and I had discussed for the documentation (see preview at http://distutils2.notmyidea.org/). This will make the doc easier to navigate for everyone and will also ease merging with Python. If I don’t get negative feedback, I’ll apply it to Tarek’s clone.

In short, I have not been idle, but clearly not focused and productive enough.

I passed the midterm evaluation. The automatic email from Google said “Please contact your mentor to discuss the results of your evaluation and to plan your goals and development plan for the rest of the program.” Well, I suggested “Same thing, continued, but better. :)”, to which Titus replied “yep ;)”. I like our understanding :)

Posted in GSoC

Weekly Report #7: Ecstatic Metadata

Weekly report for the eighth week of the Google Summer of Code

What I did

  • Continued improving documentation and minor things
  • Added code to support the metadata section in setup.cfg
  • Started to add tests for that
  • Had good chats about static metadata with clever people, on IRC and the ML
  • Included Alexis in my discussion with Dan Buch, which led to them getting our Review Board instance up and running again (kudos!)
  • Worked on the proposals in Carl Meyer’s sample-distutils2-project (completes PEP 345, and got reviewed and approved by lots of people at PyCon)
  • Got added to the authorized pushers for Tarek’s repo. Alexis did too.
  • Pushed changesets from Alexis and Jeremy to the main repo
  • Surprise: pulled and merged documentation from all students’ and
    mentors’ clones into one location, hosted by Alexis on http://distutils2.notmyidea.org/

What I failed

  • My tests use helper functions which have bugs :) I have to fix them before I can actually benefit from the tests
  • I have written much less code than I wanted to
  • I haven’t edited the history of my changesets of last week to make clean diffs out of them and push them (I have more than 30 changesets, I should collapse some of them and reorder them for easier reading)

What I will do next

  • Take part in tonight’s Montréal Hacking sprint
  • Write a ton of tests for static metadata (setup.py-setup.cfg equivalence)
  • Review/add tests for the global section in setup.cfg
  • Write those damn emails and blog posts I should have done two weeks ago
  • Not forget that configure is not finished yet
Posted in GSoC

Weekly Report #6: New Energy, Old Issues

Weekly report for the seventh week of the Google Summer of Code

This week started with a phone call with Tarek, which answered most of the open questions I had about a lot of things. Thank you for being so helpful and clear!

I have reevaluated my schedule, since I was not sure whether the tasks listed on the wiki were for the whole summer or only for the start. Tarek confirmed that the list could be changed, but in my case I think it won’t: configure is supposed to do a lot more than was explained in the bug report, and killsetup is of course not trivial, so my initial estimate of six weeks for both has become twelve. :)

The work I’ve been doing this week is mostly small stuff:

  • fixed build of hashlib (all backported modules (hashlib, _hashlib,
    _md5, etc.) now live under d2._backport, preventing C API version
    mismatch warnings with 2.5+)
  • added try/finally blocks to make sure all file handles are closed as
    soon as possible in all Python VMs
  • improved documentation of d2.tests.support (you should use it in your
    tests!)
  • fixed some tests that wouldn’t run directly (python test_thing.py)
  • coordinated fixes to other parts of the tests with Alexis (esp. 2.7
    compatibility)
  • lost sanity trying to find why test_upload_docs does not remove all
    the temporary directories it creates.
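The temporary-directory problem is the kind of thing a test mixin can handle. Below is a sketch in the spirit of the tests support helpers, not their actual code: every directory created through mkdtemp() is recorded and removed in tearDown(), and write_file() closes its handle with try/finally so it is released promptly even on VMs without reference counting. Apart from the unittest API, the names are hypothetical.

```python
import io
import os
import shutil
import tempfile
import unittest


class TempdirManager(object):
    """Mixin: remember every temporary directory a test creates, then remove them."""

    def setUp(self):
        super(TempdirManager, self).setUp()
        self.tempdirs = []

    def tearDown(self):
        super(TempdirManager, self).tearDown()
        while self.tempdirs:
            shutil.rmtree(self.tempdirs.pop(), ignore_errors=True)

    def mkdtemp(self):
        d = tempfile.mkdtemp()
        self.tempdirs.append(d)
        return d

    def write_file(self, path, content="xxx"):
        # try/finally closes the handle right away, even without refcounting GC
        f = open(path, "w")
        try:
            f.write(content)
        finally:
            f.close()


def run_example():
    """Run one test using the mixin; return the result and the dirs it created."""
    created = []

    class ExampleTest(TempdirManager, unittest.TestCase):
        def test_write(self):
            d = self.mkdtemp()
            created.append(d)
            path = os.path.join(d, "setup.cfg")
            self.write_file(path, "[metadata]\n")
            self.assertTrue(os.path.exists(path))

    suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTest)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result, created
```

Listing the mixin before unittest.TestCase lets its setUp() and tearDown() chain to TestCase through super().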

I had the great pleasure of welcoming Alexis for a mini-sprint. Code is fun, code with people is better! I look forward to more face-to-face meetings and coordinated work.

I have to confess that I’m late. Apart from being sick yesterday, I have no excuse for that, especially given the fact that all the questions that were blocking me are answered. Actually, I am not good at working alone. When in a team or with a boss, I do stuff, but alone I slack off and can’t force myself to work regularly. I have to fix that.

Posted in GSoC

Weekly Report #5: Slowdown

Weekly report for the sixth week of the Google Summer of Code

We’ve been sending weekly emails and having weekly IRC meetings, but my weblog has not been updated; I’m fixing that now by copying the emails.

This week I’ve continued editing docs for the metadata-in-setup.cfg feature. I am not technically late, since I had planned to spend a week or more on this, but I’ve been slow this week. I attribute it to a lack of self-discipline, which is not helped by the heat, and, in small part, to disappointment at not getting answers to all my questions. That said, I know how busy you all are, and I remain enthusiastic :) I’ll just put the parts where I need feedback on standby and make progress on killsetup.

I’ve also made various changes in my general repo and have already asked for them to be pulled. Alexis has gotten a few changesets from me too.

Posted in GSoC

Mini-Sprint

Alexis traveled through my city last weekend and came to say hi yesterday. We wanted to hold a mini-sprint with people on IRC; here’s a write-up.

We started with a task list and did not finish it :)

First item was fixing the merge between Alexis and upstream. Three-way merges are tricky, and Alexis unintentionally left out some things when he merged. I shared some hard-earned knowledge of vimdiff and we both learned things, e.g. why it is useful that “hg merge” does not commit, and why the “base” pane in three-way merges is not useless. Josip said that merge conflicts are causing him pain too, so I’ll write another piece to share all of that.

Second item was moving _hashlib to _backport._hashlib. It’s just a nicety, not something really important, but we’d gain some knowledge about Extension and build_ext in the process. He or I will do it soon.

Third item was fixing the bug with unittest from Python 2.7. This was caused by a nasty version issue: Alexis thought unittest2 was still monkey-patching os.path to support relpath, but this has been fixed thanks to Konrad, and the fix was straightforward. I shall finish it and ask Tarek to pull it soon; I just have a nasty loader/makeSuite compatibility kerfuffle to unravel.

Alexis said that we should not only post weekly reports on our weblogs, but also talk about things we’ve learned and explain choices we make. I definitely agree: such posts would be great to get knowledge out of brains and into documentation. How many people understand how distutils2.core.setup() does its work of parsing config files and command-line options, configuring commands and running them? We all started our summer by reading the code and trying to understand it. It’s totally obvious what Command.set_undefined_options does once you’ve browsed the code to understand it, but then we forget how much time it took. Turning our weeks of learning into documents that would take minutes to read would be a huge win for future distutils2 hackers. Luckily, we still have people in our community who have followed distutils since its inception, so we have a lot of gathered knowledge to turn into docs. Let’s kill the idea that the code is impossible to maintain and improve!
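As an example of the kind of explanation meant here, this is a simplified re-implementation of what Command.set_undefined_options does, modeled on distutils. The tiny Distribution, Command, build and build_py classes are stand-ins written for this sketch, not the real ones. Each (source option, destination option) pair copies a value from another, finalized command, but only if this command left that option as None.

```python
class Distribution(object):
    """Tiny stand-in that creates and caches command objects."""

    def __init__(self, command_classes):
        self.command_classes = command_classes
        self.command_objs = {}

    def get_command_obj(self, name):
        if name not in self.command_objs:
            self.command_objs[name] = self.command_classes[name](self)
        return self.command_objs[name]


class Command(object):
    def __init__(self, dist):
        self.distribution = dist
        self.finalized = False
        self.initialize_options()

    def initialize_options(self):
        raise NotImplementedError

    def finalize_options(self):
        raise NotImplementedError

    def ensure_finalized(self):
        if not self.finalized:
            self.finalize_options()
            self.finalized = True

    def set_undefined_options(self, src_cmd, *option_pairs):
        """Copy src_cmd's option values into options this command left as None."""
        src = self.distribution.get_command_obj(src_cmd)
        src.ensure_finalized()  # the source's own defaults are computed first
        for src_option, dst_option in option_pairs:
            if getattr(self, dst_option) is None:
                setattr(self, dst_option, getattr(src, src_option))


class build(Command):
    def initialize_options(self):
        self.build_base = None
        self.build_lib = None

    def finalize_options(self):
        if self.build_base is None:
            self.build_base = "build"
        if self.build_lib is None:
            self.build_lib = self.build_base + "/lib"


class build_py(Command):
    def initialize_options(self):
        self.build_lib = None

    def finalize_options(self):
        # build_lib comes from "build" unless the user set it for build_py
        self.set_undefined_options("build", ("build_lib", "build_lib"))


def demo(user_build_lib=None):
    dist = Distribution({"build": build, "build_py": build_py})
    cmd = dist.get_command_obj("build_py")
    if user_build_lib is not None:
        cmd.build_lib = user_build_lib  # as if given on the command line
    cmd.ensure_finalized()
    return cmd.build_lib
```

With no user value, build_py inherits build's computed "build/lib"; a value set on build_py itself is left alone. Using None as the "not given" marker is what makes this precedence work.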

We were hardly off-topic at all :) We spoke about operating systems, vim configuration, pythonicity, default arguments, boolean contexts, good and bad English, fun advanced Python features, the pressure of the midterm, the fear that distutils2 won’t be ready when it is merged into the stdlib and its development slowed down, Monty Python and the Holy Grail, mentors, window managers, terminals and consoles…

It was not really a sprint since we did not code much, and neither did we talk much on the IRC room, but it was a great learning and bonding time! Looking forward to more face-to-face meetings with fellow Pythonistas.

Read Alexis’ summary for great plans for the documentation!

Posted in GSoC