Friday, November 16, 2007

The Apache Cloud

The heavy concentration of Apache developers here at the ApacheCon US in Atlanta got me thinking about how the various Apache projects are related and whether we could come up with some ways to visualize the existing (and emerging) community patterns.

As a quick first step I just took the committer lists of all Apache projects (excluding meta-projects like Jakarta or Incubator) and ran them through an ad-hoc Perl script that identified any pairs of projects that have five or more committers in common. Running those relationships through Graphviz produced the following diagram:

Interesting stuff... Of course, the committer lists are not a very accurate source of information, as many committers are no longer active in the projects they once contributed to, so perhaps I should be looking at svn commit logs instead. Still, as a first approximation the above diagram is already quite nice.
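
For anyone who wants to try the same idea, here is a minimal sketch of the pairing step in Python. It is not the Perl script used for the diagram above; the function names, the dictionary input format, and the sample data are all assumptions made for illustration. It finds every pair of projects that shares at least `threshold` committers and prints the result as an undirected graph in Graphviz DOT syntax.

```python
from itertools import combinations


def shared_committer_edges(committers_by_project, threshold=5):
    """Return (project_a, project_b, shared_count) for each pair of projects
    that has at least `threshold` committers in common.

    `committers_by_project` is a hypothetical input format: a dict mapping
    a project name to an iterable of committer ids.
    """
    edges = []
    # Sort the project names so the output order is stable between runs.
    for a, b in combinations(sorted(committers_by_project), 2):
        shared = len(set(committers_by_project[a]) & set(committers_by_project[b]))
        if shared >= threshold:
            edges.append((a, b, shared))
    return edges


def to_dot(edges):
    """Render the edges as an undirected graph in Graphviz DOT syntax."""
    lines = ["graph apache {"]
    for a, b, shared in edges:
        # Label each edge with the number of shared committers.
        lines.append(f'  "{a}" -- "{b}" [label="{shared}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, not real committer lists.
    sample = {
        "alpha": {"u1", "u2", "u3", "u4", "u5", "u6"},
        "beta": {"u1", "u2", "u3", "u4", "u5"},
        "gamma": {"u5", "u6", "u7"},
    }
    print(to_dot(shared_committer_edges(sample, threshold=5)))
```

Saving the output to a file and running it through one of the Graphviz layout tools (for example `neato -Tpng`) should give a picture along the same lines. Keeping the threshold as a parameter makes it easy to see how the graph changes with a lower or higher cutoff.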


  1. It would be better to write another script that generates an asf-authorization file for a date range. *makes a mental note to do such a thing if no one else does*

  2. (1) I wonder how the graph would change if you made it using a threshold of fewer than 5 committers.

    (2) I think that constraining the list to active committers tells a different story than the complete history. The complete history shows how people have moved around. Also, if limited to current committers, the dormant projects would simply disappear.

    Rick Hillegas's talk on "Saucer Separation" at ApacheCon showed an SVN log reader as part of a toolkit in the next Java DB release. The really cool part is that his toolkit allows SQL queries directly on these logs...

  3. Jackrabbit is missing too, because it doesn't have 5+ committers in common with any other project.

    There's also a version of the graph with 4+ common committers as the threshold.