May 31, 2008

SOC Report: Week 1

Well, the first week of Summer of Code has finished. This week I spent my time evaluating and testing the various options available to (semi-)automatically wrap C code (libsyncml) so that it is accessible from Python. My priorities when evaluating the options go something like this:

  1. Capability - the tool should be able to (semi-)automatically wrap a large majority of the libsyncml API. Any customizations required in order to make the wrapping more complete should be readable and maintainable by people other than myself.

  2. Documentation availability. Follows from #1: can I actually learn and use the tool within the SOC duration?

  3. The wrapping tool is actively developed.

  4. Does not introduce additional runtime dependencies other than the library being wrapped.

  5. Minimal compile time dependencies when creating the bindings.

  6. Community service value (i.e., does the selection and use of the tool bring a positive benefit to the FOSS ecosystem greater than the actual library being wrapped?).

The following is a list of the available options I looked at (see Cython for more explanation):

  • Pyrex Produces a very nice and clean C file, which you just compile to a .so and that's it. Allows you to wrap almost any C and C++ code. The IDL is Python-ish.

  • Cython The same as Pyrex, but with some nice new features added.

  • SWIG The de facto standard, I guess. SWIG is one of the oldest and most mature methods of wrapping C or C++ code into Python (SWIG works for other target languages as well). SWIG produces a C file from an IDL, which gets compiled to a .so, but then it also produces a Python wrapper on top of this. Because the Python wrappers are written for you, if their design is not exactly what you want, you end up doing more work to create your final Python API.

  • SIP Similar to SWIG, but only aimed at wrapping C and C++ to Python. Unlike SWIG there is no Python wrapper. Used by PyQt and PyKDE.

  • Boost.Python The wrappers are written in C++. Not evaluated due to the additional dependencies required.

  • Ctypes Ctypes is included as standard in Python 2.5. The IDL is typically a Python class hiding the ctypes calls, making the API more pythonic. It allows one to call library functions defined in shared object libraries directly from interpreted Python code.

  • Py++ It generates Boost.Python wrappers. Not evaluated.

  • f2py It's mostly for wrapping Fortran files, but it can also wrap C files, even though that's not a very well-known feature. Not evaluated.

  • PyD This works like Boost.Python, but for the D language. Not evaluated.

  • Interrogate This works similarly to SWIG. It creates dynamic link libraries that can be used both from Python and C++ via the Python C API. No other files are needed. It's not very well documented, but it is used in several commercial MMORPGs and is native to the Panda3D engine. Not evaluated.

  • Robin Insufficient documentation to evaluate. Similar approach to SWIG, sans the intermediate IDL.

  • PyBindgen The IDL is itself Python, and it generates clean, readable, dependency-free C code. Designed for wrapping C++, but has some support for wrapping C libs.

  • pygobject (codegen.py and h2defs.py) The GObject way, and the way with which I am most familiar. Unfortunately, in order to wrap the libsyncml library I would first need to wrap it in GObject.
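To make the ctypes entry above a little more concrete, here is a minimal sketch of the "Python function hiding the ctypes calls" idea. It wraps strlen from the C standard library rather than anything from libsyncml, and the name c_strlen is my own, purely for illustration. It assumes a POSIX system where the C library can be located.

```python
import ctypes
import ctypes.util

# Load the C standard library. If it cannot be located by name, fall back to
# the symbols already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Hypothetical pythonic wrapper that hides the ctypes call."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("libsyncml"))
```

Nothing is compiled here, which is the appeal, but every function signature has to be declared by hand in Python.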

Conclusions

The libsyncml library uses the GObject mainloop and custom error types. In order to integrate this with pygtk applications, it would need to link to PyGObject/C and propagate the error types to exceptions.

Somewhat unsurprisingly, the weak point in almost all of these approaches is their documentation. While I like the look of PyBindgen, it is a nightmare to build, and the docs are sparse. The SWIG IDL is hairy, and one must also maintain pythonic wrappers to make a nice library. Pyrex and friends do not seem suited to the integration of libsyncml and pygobject without additional C glue.

At this stage I am leaning towards SWIG for its community service value (others can come along afterwards and make C# wrappers, for instance), the availability of its documentation, and the fact that, even if the IDL is quirky, others are familiar with it.
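As a rough illustration of what "propagating errors to exceptions" means, here is a sketch using the errcheck hook in ctypes. It does not touch libsyncml or its error type at all: the C function is access() from libc, standing in for any call that reports failure through a return code, and the names WrappedError and check_exists are hypothetical. It assumes a POSIX system.

```python
import ctypes
import ctypes.util
import os

# use_errno=True makes ctypes keep a private copy of errno for each call.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None, use_errno=True)


class WrappedError(Exception):
    """Hypothetical exception type standing in for a library-specific error."""


def _raise_on_failure(result, func, arguments):
    # ctypes calls this hook after the C function returns.
    if result != 0:
        code = ctypes.get_errno()
        raise WrappedError(code, os.strerror(code))
    return result


# int access(const char *pathname, int mode);
libc.access.argtypes = [ctypes.c_char_p, ctypes.c_int]
libc.access.restype = ctypes.c_int
libc.access.errcheck = _raise_on_failure


def check_exists(path):
    """Return True if path exists, otherwise raise WrappedError."""
    libc.access(path.encode("utf-8"), os.F_OK)
    return True
```

A wrapper generated with any of the tools above would need an equivalent translation layer, written once, so that callers see Python exceptions instead of C return codes.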

Distributed Version Control Systems and visibility of development

My opinion on the 'best' DVCS is not relevant. What I am concerned about is that if GNOME does not pick one, and/or provide some sort of hosting or method to track other people's development branches, then the visible activity level, and subsequently the health of the whole project, will suffer.

The premise here is that centralized version control systems make it easy to follow what developers are working on, and the activity level of development, via the svn-commits mailing list for example.

I can only offer anecdotal evidence here, but I think that the visibility of a project's development is just as important as the actual rate of development being done.

  • If developers cannot see what other people are hacking on, then there is the potential for duplication of work, or conflicting implementations.

  • If users do not see people actually doing work, then there is a tendency to assume the project is 'abandoned' or dead. The only thing worse than a 'dead' project is being proclaimed as such when one is not.

I consider the plethora of ways one can follow what developers are doing part of the problem, not part of the solution.

Who has time to follow planet, IRC, github, repo.or.cz, freedesktop git, launchpad.net bzr, mailing lists, twitter, $COMPANY gitweb, $PERSONAL gitweb, $DISTRO viewvc and gnome.org/$USER_HOME_DIR to see what people are working on?

This post is not meant to be a reductio ad absurdum; it's just a slight generalization of why I read planet.gnome.org/svn-commits, etc., etc.

  • Part of the reason is to see what other hackers are up to.

  • Part is academic, to learn techniques and design from some of the great hackers on here.

  • Part is flagrant procrastination.

  • The small remaining part is to keep the voice in my head that says "you should be using KDE, it appears to be more actively developed" at bay.

Conclusions, if any;

  • planet.ubuntu.com seems to have excellent visibility of active development, even if it doesn't have as many developers as other distributions.

  • freedesktop (via planet.freedesktop.org and http://gitweb.freedesktop.org/) seems to have excellent visibility of development (many people put git branches in their home directories, which are subsequently picked up by gitweb).

  • I am not advocating activity over productivity (obviously we are all free to use the tools which allow us to be the most productive, not just appear the most active). I just think that public FOSS development is an interesting space: in many ways the developers of the products are also their marketers.

  • GNOME used to have the balance of visibility about right, but I think we are losing that with all this dilution.

Change scares me. That is all.