Jukka Zitting: Jackrabbit


Friday, May 14, 2010

Buzzword conference in June

Like the Lucene conference I mentioned earlier, Berlin Buzzwords 2010 is a new conference that fills in the space left by the decision not to organize an ApacheCon in Europe this year. Going beyond the Apache scope, Berlin Buzzwords is a conference for all things related to scalability, storage and search. Some of the key projects in this space are Hadoop, CouchDB and Lucene.

I'll be there to make a case for hierarchical databases (including JCR and Jackrabbit) and to present the Apache Tika project. The abstracts of my talks are:

The return of the hierarchical model

After its introduction the relational model quickly replaced the network and hierarchical models used by many early databases, but the hierarchical model has lived on in file systems, directory services, XML and many other domains. There are many cases where the features of the hierarchical model fit the needs of modern use cases and distributed deployments better than the relational model, so it's a good time to reconsider the idea of a general-purpose hierarchical database.

The first part of this presentation explores the features that differentiate hierarchical databases from relational databases and NoSQL alternatives like document databases and distributed key-value stores. Existing hierarchical database products like XML databases, LDAP servers and advanced filesystems are reviewed and compared.

The second part of the presentation introduces the Content Repository for Java Technology (JCR) standard as a modern take on standardizing generic hierarchical databases. We also look at Apache Jackrabbit, the open source JCR reference implementation, and how it implements the hierarchical model.

and:

Text and metadata extraction with Apache Tika

Apache Tika is a toolkit for extracting text and metadata from digital documents. It's the perfect companion to search engines and any other applications where it's useful to know more than just the name and size of a file. Powered by parser libraries like Apache POI and PDFBox, Tika offers a simple and unified way to access content in dozens of document formats.

This presentation introduces Apache Tika and shows how it's being used in projects like Apache Solr and Apache Jackrabbit. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs. The presentation also summarizes the key characteristics of the more widely used file formats and metadata standards, and shows how Tika can help deal with that complexity.

I hear there are still some early bird tickets available. See you in Berlin!

Wednesday, September 23, 2009

Some graphics work for a change

I've recently spent some effort in improving the look of the Apache Jackrabbit website. I'm no designer, so the results aren't that great, but it's been a nice break from the regular project work. And I got to brush up my Photoshop and Gimp skills.

One part of the effort was creating an icon for the site. Previously the site used the feather icon used as the default on all Apache project sites, but I wanted a Jackrabbit-specific icon that helps me to quickly identify and access Jackrabbit pages among the numerous tabs I usually have open in my browser. The work is a good example of incremental improvements in action:

I started with a copy of the Jackrabbit logo with a nice alpha-layered transparent background. It looked great until I noticed that some browsers lost the smooth alpha layer and instead rendered the rather badly aliased icon seen above.

The straightforward solution was to add a white background as can be seen in step 2. That already worked pretty well in all browsers.

After a few days of watching the icon I found it a bit too blocky for my taste, so I tried to restore some of the nice transparency effect by rounding the corners a bit. I'm pretty happy with the result.

Of course, if you have design talent and think you can do better, go for it!

Tuesday, August 11, 2009

Apache Jackrabbit 1.6.0 released

The Apache Jackrabbit project has just released Jackrabbit version 1.6.0. This release will most likely be the last JCR 1.0-based Jackrabbit 1.x minor release before the upcoming Jackrabbit 2.0 and the upgrade to JCR version 2.0. The goal of this release is to push out as many of the recent Jackrabbit trunk improvements as possible so that the number of new things in Jackrabbit 2.0 remains manageable.

The most notable changes and new features in this release are:

The RepositoryCopier tool makes it easy to back up and migrate repositories (JCR-442). There is also improved support for selectively copying content and version histories between repositories (JCR-1972).

A new WebDAV-based JCR remoting layer has been added to complement the existing JCR-RMI layer (JCR-1877, JCR-1958).

Query performance has been further optimized (JCR-1820, JCR-1855 and JCR-2025).

Added support for Ingres and MaxDB/SapDB databases (JCR-1960, JCR-1527).

Session.refresh() can now be used to synchronize a cluster node with changes from the other nodes in the cluster (JCR-1753).

Unreferenced version histories are now automatically removed once all the contained versions have been removed (JCR-134).

Standalone components like the JCR-RMI layer and the OCM framework have been moved to a separate JCR Commons subproject of Jackrabbit, and are not included in this release. Updates to those components will be distributed as separate releases.

Development preview: There are even more JSR 283 features in Jackrabbit 1.6 than were included in the 1.5 version. These new features are accessible through special "jsr283" interfaces in the Jackrabbit API. Note that none of these features are ready for production use, and will be replaced with final JCR 2.0 versions in Jackrabbit 2.0.

This release is the result of contributions from quite a few people. Thanks to everyone involved, this is open source in action!

Saturday, July 18, 2009

JCR 2.0 implementation progress

The JCR 2.0 API specified by JSR 283 has been in Proposed Final Draft (PFD) stage since March, and Apache Jackrabbit developers have been busy implementing all the specified new features and adding compliance test cases for them.

Both the Reference Implementation (RI) and the Technology Compatibility Kit (TCK) of JSR 283 will be based on Jackrabbit code, and we expect the final version of the specification to be released shortly after Jackrabbit trunk becomes feature-complete and the API coverage of the TCK reaches 100%. The following two graphs illustrate our progress on both these fronts.

First a track of all the JCR 2.0 implementation tasks we've filed under the JCR-1104 collection issue. The amount of work per each sub-task is not uniform, so this graph only shows the general trend and does not suggest any specific completion date.

JCR 2.0 implementation progress

The second graph tracks the TCK API coverage. We started with the JCR 1.0 TCK, so the first 300-400 method signatures were already covered with few changes to existing test code. Based on Julian's API coverage reports in JCR-2085, this graph tracks progress in covering the 100+ new method signatures introduced in JCR 2.0. Again, the graph is meant to show just a general trend and should not be used to extrapolate future progress.

JCR 2.0 TCK API coverage

Want to see JCR 2.0 in action? The latest Jackrabbit 2.0 alpha releases are available for download!

Thursday, January 22, 2009

Apache JCR Commons

In the Apache Jackrabbit project we've decided to create a new JCR Commons subproject for developing and managing the set of generic JCR tools that has grown over time around the core Jackrabbit content repository implementation.

The JCR Commons subproject will to some extent resemble the Apache Commons project, and I'm hoping to use some of the ideas put forward by Henri in his blog post about a "federated commons".

I'm hoping to flesh out the details of this new subproject over the next month or two. It would be nice to have releases of all the new JCR Commons components ready to be used as dependencies for the upcoming Jackrabbit 1.6 release.

Tuesday, December 16, 2008

Changing IT landscape

Here's the latest top five list of countries with the most monthly visits to the jackrabbit.apache.org web site:

United States

Germany

China

France

India

India just replaced the United Kingdom in fifth place, and while China is still far from the United States and Germany, it's rapidly closing the gap.

It was very interesting to hear tidbits from the recent Apache Meet Up and BarCamp Beijing events. I heard rumours about a potential followup event next year. I hope I'll find a good excuse to attend...

Sunday, December 7, 2008

Apache Jackrabbit 1.5.0 released

Apache Jackrabbit 1.5.0, the latest and greatest release of the best content repository I know, is now available! Get it from the Jackrabbit web site or through the central Maven repository while it's hot!

The most notable changes since version 1.4 are:

The standalone Jackrabbit server component. The runnable
jackrabbit-standalone jar makes it very easy to start and run
Jackrabbit as a standalone server with WebDAV and RMI access.

Search performance improvements. The performance of certain kinds
of hierarchical XPath queries has improved notably.

Simple Google-style query language. The new GQL query syntax
makes it very easy to express simple full text queries.

Transaction-safe versioning. Mixing transactions and versioning
operations has traditionally been troublesome in Jackrabbit.
This release contains a number of improvements in this area and
has specifically been reviewed against potential deadlock issues.

Clustered workspace creation. A new workspace created in one
cluster node will now automatically appear also in the other
nodes of the cluster.

SPI improvements. The SPI layer introduced in Jackrabbit 1.4
has seen a lot of improvements and bug fixes, and is shaping
up as a solid framework for implementing JCR connectors.

Development preview: JSR 283 features. We have implemented
a number of new features defined in the public review draft of
JCR 2.0, created in JSR 283. These new features are accessible
through special "jsr283" interfaces in the Jackrabbit API. Note
however that none of these features are ready for production use,
and will be replaced with final JCR 2.0 versions in Jackrabbit 2.0.

See the release notes for all the details.

Wednesday, April 16, 2008

File system on steroids

Last week at ApacheCon EU I made a case for content repositories as a general solution for applications that are currently forced to fragment their storage needs due to the different limitations of traditional storage methods, mostly file systems and databases plus more recently cloud services on the network. See below for the presentation:

[slideshare id=352816&doc=file-system-on-steroids-1208198602056184-9&w=425]

It seems like the message was well received. After the presentation I got a lot of positive feedback from people who had previously thought of content repositories as something you'd only use for storing content in a content management system. Instead I see a content repository as a unifying storage layer that can be used for almost anything ranging from traditional content and data to configuration files, user account information, preferences, templates and scripts, source code and binaries, ad-hoc annotations, etc.

Tuesday, January 15, 2008

Apache Jackrabbit 1.4 is available!

I just announced the release of Apache Jackrabbit 1.4. The release is the result of about nine months of development since the 1.3 release, and contains 220 new features, improvements, and bug fixes (plus the 75 bug fixes that had already been backported to 1.3.x patch releases). This is by far the biggest Jackrabbit release to date.

The 1.4 release contains some cool new features:

Friendlier Jackrabbit webapp. The jackrabbit-webapp component now comes with a more polished user interface, better error handling, and improved repository connectivity for local and remote clients.

Object/content mapping framework. The jackrabbit-ocm component maps Java objects to JCR nodes and vice versa, making it possible to persist normal Java objects in a content repository.

Service provider interface for JCR. The jackrabbit-spi component defines an architectural layer below the JCR API. The SPI layer is designed specifically for remote access and outlines a way for us to avoid the performance limitations of JCR-RMI that works on top of JCR.

Optimized storage for binary content. The new DataStore feature in jackrabbit-core avoids all unnecessary copying of binary content and promises huge performance increases for versioning and copying operations. DataStore is a beta-level feature in Jackrabbit 1.4 and disabled by default.

Improved query engine. The jackrabbit-core component has been extended with new features like configurable indexing, synonym and similarity queries, and spell checking. Many typical queries are now noticeably faster than before thanks to numerous performance improvements.

Many thanks to the Jackrabbit development team and the entire community! I'm really proud and excited to be a member of the Apache Jackrabbit project.

PS. Interestingly enough, I built the final 1.4 release candidate exactly two years after I first volunteered to be the release manager for Apache Jackrabbit. The past two years have certainly been an interesting time. :-)

Saturday, January 13, 2007

ApacheCon proposals

ApacheCon US 2006 was the first ApacheCon I attended, and I went there mostly to look around and get a feel for the event. Encouraged by the good reception of my ad-hoc presentations there, I wanted to step up and propose some real sessions for the next ApacheCon. Thus, my proposals for ApacheCon Europe 2007 are:

Up to Speed with Java Content Repository API and Jackrabbit
Joint session with Alexandru Popescu. Targeted for people interested in content management with JCR and Jackrabbit.

Structure and Implementation of Apache Jackrabbit
A walkthrough of the Jackrabbit internals. Not just for Jackrabbit developers but for anyone who is interested in seeing a reasonably complex codebase explained using various analysis and diagramming methods (like DSM).

I also proposed a half-day tutorial on JCR content management, and we'll probably arrange an informal Jackrabbit BOF during the event.

Friday, October 13, 2006

Introducing Apache Jackrabbit

The past week at the ApacheCon has been a great opportunity to introduce the Apache Jackrabbit project to the Apache community. I've spoken with a number of people about the project, found areas of possible co-operation, and met with potential Jackrabbit users from various projects and companies. A very successful week.

The Media & Analyst Training tutorial on Tuesday and the Breaking Through the Noise: Visibility for your Open Source Project session on Thursday, both by Sally Khudairi, gave a lot of ideas about better introducing the project to a large audience. The first step in applying those ideas in practice was making the introductory paragraph on the Jackrabbit web site a bit more accessible to first-time visitors. The updated introduction is:

Apache Jackrabbit is a fully conforming implementation of the Content Repository for Java Technology API (JCR). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more. Typical applications that use content repositories include content management, document management, and records management systems.

I got scheduled for a Feather Cast interview with David Reid. The podcast is available as the Feather Cast Episode 17. I also gave two talks on Jackrabbit, one at the Incubator 15 minute talks (even though Jackrabbit already graduated) on Thursday and one at an ad-hoc press session on Friday. Check out my introductory Apache Jackrabbit slideshow.

Saturday, August 5, 2006

Apache logo guidelines

The Apache Jackrabbit website was recently given a nice face-lift, but the customized "Apache Software Foundation" logo image caused some controversy and was considered unacceptable for two reasons. The first complaint was that it doesn't contain the "The" of "The Apache Software Foundation", and second that the feather was not the standard ASF feather.

Both are valid concerns, and while the designer did add the "The", the feather issue is still not resolved. So for now we're back to using the ASF logo image (shown above) we used before. Unfortunately this image is considerably larger, and causes trouble for the new website layout. Instead of sending the site layout back to the drawing board I set out to find out the acceptable alternatives. Is it OK if we just replace the custom feather with a scaled and rotated version of the official feather? What are the branding and logo usage guidelines? What exactly is the official ASF feather?

Unfortunately there are no existing ASF logo guidelines, and in practice the logo usage varies wildly between Apache projects. In fact I couldn't even find out what actually is the official trademarked ASF feather, since the one in the above logo image is quite different from the version (shown right) used in a customized format for example on the "canonical" Apache HTTPD web site.

Apache Geronimo logo

There's also great variance in the ways different Apache projects display their affiliation with the ASF. The most notable examples are the iBatis project, which mentions the foundation only at the very bottom of its front page, and mod_perl, which mentions the foundation but doesn't display any version of the feather logo. The Xerces and Geronimo projects hide the ASF feather in a blue background, and projects like Lenya, SpamAssassin, and TCL customize the ASF feather in various ways in their project logos. This variance is not necessarily bad; in fact it shows a good level of innovation in using the ASF brand :-), but I'm not sure if this wild practice is too good either.

Apache Lenya logo

It has been noted on various forums that Apache is a strong brand that really brings value and recognition to individual Apache projects. Thus I think it is in the foundation's interest to guard and strengthen that brand, including the visual identity bound to the Apache feather. The reaction to the custom feather introduced by the Jackrabbit site face-lift is a sign of such a drive, but the effort falls short if there are no guidelines to direct projects and external parties to the correct use of the feather logo.

Apache SpamAssassin logo

I think the following are the key questions to answer when creating logo usage guidelines for Apache projects:

Should an Apache project display the Apache Software Foundation logo?

If yes, what are the standard logo images to use?

Is it acceptable to customize the standard logo images?

If yes, what are the accepted customizations (scaling, rotating, drop shadow, text overlays, background color, background image, etc.)?

Apache TCL logo

There are also unanswered questions on whether and how an Apache project should incorporate the ASF feather into a project logo:

May an Apache project logo contain the Apache feather?

If yes, should an Apache project logo contain the Apache feather?

If yes, what is (are) the standard feather(s) to use?

Is it acceptable to customize the standard feather(s)?

If yes, what are the acceptable customizations?

Sunday, July 30, 2006

JUnit tests for known issues, part 3

My quest for a way to handle known issues as JUnit tests seemed already finished, when Marcel Reutegger, a committer of the Apache Jackrabbit project and a developer of the Technology Compatibility Kit (TCK) of JSR 170, displayed some serious JUnit-fu by pointing to the TestResult class in JUnit.

It turns out that JUnit creates a TestResult instance for each test being run and uses that instance to store the test results. It is possible to customize the TestResult being used by overriding the TestCase.createResult() method. You can then decide in TestResult.addFailure(Test, AssertionFailedError) and TestResult.addError(Test, Throwable) whether to skip some failure or error reports. This is what we ended up doing in Jackrabbit.

Digging deeper along these lines I found out that you could actually implement similar functionality also directly in a TestCase subclass, thus avoiding the need to override TestCase.createResult(). The best way to do this is to override the TestCase.runBare() method that gets invoked by TestResult to run the actual test sequence. The customized method can check whether to skip the test and just return without doing anything in such cases.

I implemented this solution as a generic JUnit 3.x ExcludableTestCase class, which you are free to copy and use under the Apache License, version 2.0. The class uses system properties named junit.excludes and junit.includes to determine whether a test should be excluded from the test run. Normally all tests are included, but a test can be excluded by including an identifier of the test in the junit.excludes system property. An exclusion can also be cancelled by including a test identifier in the junit.includes system property. Both system properties can contain multiple whitespace-separated identifiers. See the ExcludableTestCase javadocs for more details.

You can use this class by subclassing your test cases from ExcludableTestCase instead of directly from TestCase:

package my.test.pkg;

public class MyTestCase extends ExcludableTestCase {
    public void testSomething() {
        // your test code
    }
}

You can then exclude the test case with -Djunit.excludes=my.test.pkg, -Djunit.excludes=MyTestCase, or -Djunit.excludes=testSomething or a combination of these identifiers. If you've for example excluded all tests in my.test.pkg, you can selectively enable this test class with -Djunit.includes=MyTestCase.

You can also add custom identifiers to your test cases. For example, if your test case was written for Bugzilla issue #123, you can identify the test in the constructor like this:

    public MyTestCase(String name) {
        super(name);
        addIdentifier("#123");
    }

Then you can exclude tests for this issue with -Djunit.excludes=#123.
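The include/exclude rule described above can be sketched in plain Java without any JUnit dependency. This is only an illustration and not the actual ExcludableTestCase code: the class name ExclusionRules, its methods, and the package name org.example.tests are hypothetical, and the precedence rule (a match in junit.includes cancels a match in junit.excludes) is my reading of the description above.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; a sketch of the rule, not the real ExcludableTestCase. */
final class ExclusionRules {

    /** Splits a whitespace-separated property value into a set of identifiers. */
    static Set<String> parse(String value) {
        Set<String> ids = new HashSet<>();
        if (value != null) {
            for (String id : value.trim().split("\\s+")) {
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    /** Returns true if any identifier of the test appears in the listed set. */
    static boolean matches(Collection<String> testIds, Set<String> listed) {
        for (String id : testIds) {
            if (listed.contains(id)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumed rule: a test is excluded when one of its identifiers is listed
     * in the excludes value, unless one of them is also listed in includes.
     */
    static boolean isExcluded(Collection<String> testIds, String excludes, String includes) {
        return matches(testIds, parse(excludes)) && !matches(testIds, parse(includes));
    }

    public static void main(String[] args) {
        // Identifiers of one test: package, class, method and a custom issue id.
        List<String> ids = Arrays.asList("org.example.tests", "MyTestCase", "testSomething", "#123");
        boolean excluded = isExcluded(ids,
                System.getProperty("junit.excludes"),
                System.getProperty("junit.includes"));
        System.out.println("excluded: " + excluded);
    }
}
```

A runBare() override along the lines described earlier could call a check like isExcluded() and simply return when it yields true.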

Wednesday, July 19, 2006

JUnit tests for known issues, part 2

A few days ago I considered different options for including known issue test cases (ones that you expect to fail) in a JUnit test suite in a way that wouldn't make the full test suite fail. I decided to adopt a solution that uses system properties to selectively enable such known issue test cases. Here's how I implemented it for Apache Jackrabbit using Maven 1 (we're currently working on migrating to Maven 2, so I'll probably post Maven 2 instructions later on).

The first thing to do is to make the known issue tests check for a system property used to enable a test. The example class below illustrates two ways of doing this: either make the full known issue test code conditional, or add an early conditional return to skip the known issue. You can either use a single property like "test.known.issues" or different properties to allow fine grained control over which tests are run and which are skipped. I like to use the known issue identifier from the issue tracker as the controlling system property, so I can selectively enable the known issue tests for a single reported issue.

public class ExampleTest extends TestCase {

    public void testFoo() {
        if (Boolean.getBoolean("ISSUE-foo")) {
            // test code for "foo"
        }
    }

    public void testBar() {
        if (!Boolean.getBoolean("ISSUE-bar")) {
            return;
        }
        // test code for "bar"
    }

}

Once this instrumentation is in place, the build system needs to be configured to pass the identified system properties to the code when requested. In Maven 1 this happens through the maven.junit.sysproperties setting in project.properties:

maven.junit.sysproperties=ISSUE-foo ISSUE-bar
ISSUE-foo=false
ISSUE-bar=false

This way the known issue tests will be skipped when normally running "maven test", but can be selectively enabled either on the command line ("maven -DISSUE-foo=true test") or by modifying project.properties or build.properties.
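The two styles of property check could also be combined in one small helper, so that a test runs either when its own issue property is set or when a catch-all property is set. The sketch below is hypothetical: the KnownIssues class and the isEnabled method are names I made up, and using "test.known.issues" as the catch-all switch is an assumption based on the single-property option mentioned above. It relies only on the standard Boolean.getBoolean call, which returns true only when the named system property exists and equals "true" ignoring case.

```java
/** Hypothetical helper for gating known issue tests on system properties. */
final class KnownIssues {

    /** Assumed catch-all property that enables every known issue test. */
    static final String ALL = "test.known.issues";

    /**
     * Returns true if the test for the given issue should run, that is, if
     * either the catch-all property or the issue-specific property is "true".
     */
    static boolean isEnabled(String issueId) {
        // Boolean.getBoolean is false for unset or non-"true" properties.
        return Boolean.getBoolean(ALL) || Boolean.getBoolean(issueId);
    }

    public static void main(String[] args) {
        // Prints the state of each issue identifier given on the command line.
        for (String id : args) {
            System.out.println(id + " enabled: " + isEnabled(id));
        }
    }
}
```

A test method would then start with a guard such as if (!KnownIssues.isEnabled("ISSUE-bar")) return; and the catch-all property would need to be listed in maven.junit.sysproperties like the others.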

Sunday, July 16, 2006

Volunteering to mentor Graffito

Raphaël Luta, mentor of the incubating Apache Graffito project, recently asked for volunteers to help him mentor the Graffito project. Encouraged by my recent ASF membership and the fact that I had been keeping an eye on the project for quite a while, I decided to volunteer.

Before formal appointment as a mentor I've introduced myself on the Graffito mailing list and started getting more familiar with the project. I'm especially interested in the Object/Content Mapping (OCM) framework JCR-mapping that is included as a Graffito subproject, but also in the other parts of the project. Having worked on various content management systems for over ten years since our first pre-Midgard site experiments (including a custom SGML vocabulary mapped to early HTML with DSSSL) with the Greywolves, I'm still eager to learn new approaches like Graffito.

Graffito is both a framework for building content applications and a set of existing portlet components. Although the project is related to Apache Jetspeed-2, the portlets should work on any compliant portal engine. The framework components can also be used outside the portlet model. The project also aims for independence from the underlying content storage, using a generic "Graffito Core Model" and its derivatives as the abstract content model and a "Content Server" abstraction for the content storage layer.

The current default content server implementation is based on Apache OJB and runs on top of relational databases. The JCR-mapping subproject is planned to be used for a similar task on top of JCR content repositories, especially Apache Jackrabbit. Just like OJB is not a subproject of Graffito, we've had initial discussion about possibly moving the JCR-mapping project outside Graffito. Making it a Jackrabbit subproject is an interesting alternative, especially if we want to aim for an eventual JCR federation within the ASF, but for now I think it's best to get the communities together and see what patterns emerge before making final decisions on what to do with the subproject.

Monday, July 10, 2006

UMLet 7

org.apache.jackrabbit.extractor class diagram

Ever since learning UML back in 1998 I've been looking for decent UML tools that best suit my rather ad-hoc diagramming style. Even though I've occasionally used them, I've never really enjoyed the heavyweight, round-tripping, IDE-integrated (even IDE-embedding!) modelling monoliths that most of the UML tools seem to evolve into sooner or later. My reasons for using UML are documenting existing code and discussing new ideas, almost never to actually implement anything. I usually also work in highly heterogeneous settings with co-developers using a wide variety of tools and development environments. Adapting to a do-all-be-all UML tool is in many cases simply impossible or at least quite difficult.

org.apache.jackrabbit.core.query TextFilter class diagram

Thus I've actively stayed away from the high-end offerings and focused more on the low-end alternatives like Dia and the most popular UML tool in the world, MS PowerPoint. However they never felt really natural, being either too inflexible or requiring too much manual work especially when rearranging diagrams. Luckily a few years ago, while doing my yearly lookout for better development tools, I stumbled upon UMLet, a lightweight open source UML diagram editor that has a rather original but very flexible and convenient user interface. It even works as a drop-in plugin for Eclipse.

org.apache.jackrabbit.core.query.lucene TextExtractor class diagram

A few weeks ago, after upgrading to Eclipse 3.2, I went looking for a UMLet upgrade and was happy to find version 7 available for download. The new version has nice new features like color and transparency support, new diagram types, and various user interface improvements like improved mouse selection support. Warmly recommended.

The attached class diagrams were quickly created using UMLet 7 to describe the structure of a mid-sized patch I sent for consideration as part of the Jackrabbit issue JCR-415.

Saturday, July 8, 2006

JUnit tests for known issues

A few months ago I started working on the Jackrabbit improvement issue JCR-325. Following a good practice, the first thing I did was create a test case for the missing functionality. However, this breaks another good practice of always passing 100% of the test suite. This and a few other known issue tests are currently causing the Jackrabbit test suite to fail even without any real problems, making it more difficult to check whether a recent change or an experimental patch breaks things.

To fix the situation I started wondering if there was a JUnit version of the TODO blocks in Perl's Test::More. The problem is that JUnit can only report tests as successful or failing (or erroneous if they throw an exception); there is no way to easily mark test failures as TODOs. Googling around and asking the Jackrabbit mailing list produced some workarounds:

Use a system property to determine whether to perform or skip the known issue test cases.

Put the known issues tests in separate test case classes and exclude them from the test suite.

Use a JUnit addon to ignore marked test cases as explained in an article that discusses this same issue.

Use an alternative test framework like TestNG, that has this functionality built-in.

I didn't want to start changing the entire test suite or even tweaking the build environment, so the last two options were out. I also wanted to make the setup easily configurable so a developer can selectively enable testing for a known issue, thus the first alternative of using a system property looks like the best solution. It seems that the Apache OJB project has reached the same solution.

Thursday, June 29, 2006

ASF membership

A few days ago I was invited to become a member of the Apache Software Foundation! I'm greatly honored by the invitation that acknowledges my "commitment to collaborative open-source software development, through sustained participation and contributions within the Foundation's projects".

This is definitely one of the highlights of my career. :-)

Wednesday, February 15, 2006

The road to Jackrabbit 0.9

I recently volunteered to work as the release manager for the 0.9 and 1.0 releases of the incubating Apache Jackrabbit project. The 0.9 part of this task was finished yesterday when I officially announced the release of Apache Jackrabbit version 0.9. The process leading up to the release announcement has been an interesting and educational one.

I first got involved with the Jackrabbit project in November 2004 when I announced an initial version of the JCR-RMI layer I had been working on. Based on positive feedback I started getting more and more active on the mailing list and continued working on JCR-RMI. My work was rewarded with committer status almost exactly a year ago in February 2005. Since then I've been working on various parts of Jackrabbit, although I have yet to touch the deep internals of Jackrabbit core. I got another reward for my work when I was invited to join the expert group of JSR 283: Content Repository for Java™ Technology API Version 2.0.

Talk of making a Jackrabbit 1.0 release started sometime in summer 2005 after the JSR 170 had stabilized the JCR API. There was however not too much activity towards a release so I decided to start pushing things in late September. This sparked some discussion and concerns about some of the open issues. The Jira roadmap for 1.0 started to grow bigger and bigger as more issues were filed. I finally solved this problem in November by setting up a separate 1.1 goal for the more complex issues.

Despite these efforts and a lot of ongoing development work there still wasn't too much happening toward an initial release. This situation changed in January when Roy suggested that a 1.0 release plan might be a good thing to have. I volunteered to act as the release manager and based on the resulting discussions I published the Jackrabbit 0.9 release plan in late January. The idea (as suggested by Roy) was to first make a 0.9 release to streamline the release process for a 1.0 release soon after 0.9.

After spending a while making the build environment better fit the release needs, I announced a 0.9 release candidate. Making the release candidate had me reading through a number of release guidelines on the various Apache websites and refreshing my GnuPG skills. All Apache releases are digitally signed, so I needed to set up my own key. I also got my key signed by Roy, so I expect my key to show up in the Apache web of trust in a few days.

There was only one issue that got fixed between the first release candidate and the version I posted for the official release vote. The release vote passed with no objections, but then Roy suggested a few packaging changes before endorsing the release as required by the Incubator policy. I made the changes, and since no source files were modified there was no need to restart the release vote. Roy then proceeded to request and receive approval for the release from the Incubator PMC.

After the approval we placed the release under cvs.apache.org where incubating projects are allowed to publish their releases. I then updated the Jackrabbit web site and sent the release announcement. The next step is the more challenging 1.0 release, on which I'll start focusing tomorrow.

Saturday, January 7, 2006

Analyzing the Jackrabbit architecture with Lattix LDM

Tim Bray pointed to David Berlind who pointed to the Lattix company. Lattix makes a tool called Lattix LDM that uses a Dependency Structure Matrix to work with software architecture. I watched the nice Lattix demo and decided to try the software out.

After receiving my Community license and struggling for a while to get the software running on Linux (need to include both the jars and the Lattix directory in the Java classpath!) I loaded the latest Jackrabbit jar file for analysis. The dependency matrix of the top-level packages after an initial partitioning is shown below:

The matrix contains all the package dependencies. A number in a cell of the matrix tells how many dependencies the package on the vertical column has on the package on the horizontal row. You can tell how widely a package is used by reading the cells on the package row. The package column identifies the other packages that the selected package uses or depends on. In general a healthy architecture only contains dependencies located below the diagonal.
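To make the reading rule concrete, here is a small sketch of how such a matrix could be built and checked. It follows the convention described above (the cell at a given row and column counts the dependencies that the column package has on the row package). The `DsmSketch` class and the dependency list in `main` are made up for illustration; they are not Lattix output or real Jackrabbit data.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of a Dependency Structure Matrix over an ordered list of packages. */
final class DsmSketch {

    private DsmSketch() {
    }

    /**
     * Builds the matrix. Each dependency is a pair {user, used}.
     * matrix[row][col] counts the dependencies that the package at
     * index col has on the package at index row.
     */
    static int[][] build(List<String> packages, String[][] dependencies) {
        int n = packages.size();
        int[][] matrix = new int[n][n];
        for (String[] dependency : dependencies) {
            int user = packages.indexOf(dependency[0]);
            int used = packages.indexOf(dependency[1]);
            if (user < 0 || used < 0) {
                throw new IllegalArgumentException(
                        "Unknown package in " + Arrays.toString(dependency));
            }
            matrix[used][user]++;
        }
        return matrix;
    }

    /** Counts the dependencies above the diagonal, i.e. the unhealthy ones. */
    static int countAboveDiagonal(int[][] matrix) {
        int count = 0;
        for (int row = 0; row < matrix.length; row++) {
            for (int col = row + 1; col < matrix.length; col++) {
                count += matrix[row][col];
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Invented example data: the last entry points "upwards".
        List<String> packages = Arrays.asList("jndi", "core", "util");
        String[][] dependencies = {
            {"jndi", "core"}, {"core", "util"}, {"util", "core"}
        };
        int[][] matrix = build(packages, dependencies);
        for (int[] row : matrix) {
            System.out.println(Arrays.toString(row));
        }
        System.out.println("Above diagonal: " + countAboveDiagonal(matrix));
    }
}
```

With the packages ordered so that users come before the packages they use, every entry that ends up above the diagonal marks a lower-level package reaching back up into a higher-level one.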

The packages 2-6 form the general purpose Jackrabbit commons module, while the more essential parts of the Jackrabbit architecture are found within the core module. I grouped the commons packages and expanded the core module to get a more complete view of the Jackrabbit internals:

There was no immediate structure appearing, so I used the automatic DSM partitioning tool on the core module to sort out the package dependencies:

Jackrabbit core after initial partitioning

The value, config, fs, and util packages form a lower utility layer and the jndi package a higher deployment tool layer. The most interesting part lies between those layers, in the large interdependent section in the middle. The key to the architecture seems to be the main core package that both uses and is used by other packages in this section. I opened a separate view for examining the contents of the main core package:

Partitioning classes within the main core package

The partitioning suggests that it might make sense to split the package in two parts. Without concern for semantic grouping, I just grouped the classes in the upper half as core.A and the classes in the lower half as core.B. This seems useful as the core.B package seems to be a bit better in terms of external dependencies:

Jackrabbit core after splitting the main core package

Running the package partitioning again, I got a bit more balanced results although the main section still is heavily interdependent:

Jackrabbit core partitioning after splitting the main core package

Looking at the vertical columns it seems like the main culprits for the interdependencies are the nodetype, state, version, and the virtual core.A packages. Both the nodetype and state packages contain subpackages, so I wanted to see if the dependencies could be localized to just a part of those packages:

Contents of the Jackrabbit state and nodetype packages

This is interesting: the interdependencies for the state package are for the main state package, while the nodetype interdependencies only affect the nodetype.virtual subpackage. I split both packages along those dependency relations, and partitioned the core module again:

Jackrabbit core partitioning after splitting the state and nodetype packages

The persistence managers in the state subpackages are now outside the main section, just like the non-virtual nodetype classes. After a short while of further research on the dependencies I found that the partitioning of the main state package would suggest that the item state managers be split into a separate package:

After creating a new statemanager package for containing the item state managers, the partitioning of the core module starts to look better. The only remaining circular dependencies are for the virtual core.A and core.B packages:

Jackrabbit core partitioning after moving the state managers into a new statemanager package

Looking at the virtual core.B package we find that only the NodeId, PropertyId, and ItemId classes depend on the state package:

In fact it seems that it might make sense to move those classes to the state package. After doing that the core module partitioning looks even better:

Jackrabbit core partitioning after moving the ItemId classes to the state package

The only remaining source of cyclic dependencies is the virtual core.A package, into which I won't be going any deeper at this moment. Even now the analysis seems to have provided a number of suggestions for reducing the amount of cyclic dependencies and thus improving the design quality of the Jackrabbit core:

Split the main core package into subpackages

Move the nodetype.virtual package to a higher level

Move the state subpackages to a separate package

Make a separate package for the item state managers

Move the NodeId, PropertyId, and ItemId classes to the state package

Note that these suggestions are just initial ideas based on a quick walkthrough of the Jackrabbit architecture using a Dependency Structure Matrix as a tool. As such the approach only gives structural insight into the architecture, and for this short analysis I didn't take much account of the semantic roles and relationships of the Jackrabbit classes and packages.
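The cyclic dependencies that the analysis kept running into can also be found without a DSM tool. The sketch below groups packages that can reach each other through their dependencies, using a plain transitive closure. This is a simple stand-in for illustration and not necessarily how Lattix LDM partitions a matrix. The `CycleGroups` class and the example dependencies are invented for this purpose.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: find groups of packages that are caught in dependency cycles. */
final class CycleGroups {

    private CycleGroups() {
    }

    /**
     * uses[i][j] is true when package i depends directly on package j.
     * Returns the groups (as lists of package indexes) whose members all
     * depend on each other, directly or indirectly.
     */
    static List<List<Integer>> find(boolean[][] uses) {
        int n = uses.length;
        boolean[][] reach = new boolean[n][];
        for (int i = 0; i < n; i++) {
            reach[i] = uses[i].clone();
        }
        // Warshall's algorithm: reach[i][j] becomes true when j is
        // reachable from i through any chain of dependencies.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (reach[i][k] && reach[k][j]) {
                        reach[i][j] = true;
                    }
                }
            }
        }
        List<List<Integer>> groups = new ArrayList<>();
        boolean[] assigned = new boolean[n];
        for (int i = 0; i < n; i++) {
            if (assigned[i]) {
                continue;
            }
            assigned[i] = true;
            List<Integer> group = new ArrayList<>();
            group.add(i);
            for (int j = i + 1; j < n; j++) {
                if (!assigned[j] && reach[i][j] && reach[j][i]) {
                    group.add(j);
                    assigned[j] = true;
                }
            }
            if (group.size() > 1) {
                groups.add(group);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Invented example: 0=core, 1=state, 2=nodetype, 3=util
        boolean[][] uses = new boolean[4][4];
        uses[0][1] = true; // core -> state
        uses[1][2] = true; // state -> nodetype
        uses[2][0] = true; // nodetype -> core
        uses[0][3] = true; // core -> util
        System.out.println(find(uses));
    }
}
```

The cubic closure is fine for a few dozen packages; for class-level analysis of a large code base a linear-time strongly connected components algorithm would be the more usual choice.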