Jukka Zitting: 2006

Tuesday, December 12, 2006

Excluding the publish date from Maven 2 sites

The Maven 2 site plugin includes by default a "Last Published" note on each HTML page. This is a nice feature for sites that are directly deployed to the server, but a bit troublesome for projects like Jackrabbit that first commit the generated site to version control before publishing it. The "Last Published" note tags all generated pages as modified even if no content changes are made.

To solve this issue I looked at the <publishDate/> configuration option in the site descriptor, but the only documented options for specifying the date location are left, right, navigation-top, navigation-bottom, and bottom. There is no option to simply disable the date being inserted in the document.

Looking at the default page template, however, I noticed that it's possible to trick the site into omitting the publish date by using a dummy location like "none":

<publishDate position="none"/>

Works for me. :-) I also requested documenting this to avoid the hack getting broken in some future Maven version.

Friday, October 13, 2006

Introducing Apache Jackrabbit

The past week at the ApacheCon has been a great opportunity to introduce the Apache Jackrabbit project to the Apache community. I've spoken with a number of people about the project, found areas of possible co-operation, and met with potential Jackrabbit users from various projects and companies. A very successful week.

The Media & Analyst Training tutorial on Tuesday and the Breaking Through the Noise: Visibility for your Open Source Project session on Thursday, both by Sally Khudairi, gave a lot of ideas about better introducing the project to a large audience. The first step in applying those ideas in practice was making the introductory paragraph on the Jackrabbit web site a bit more accessible to first-time visitors. The updated introduction is:

Apache Jackrabbit is a fully conforming implementation of the Content Repository for Java Technology API (JCR). A content repository is a hierarchical content store with support for structured and unstructured content, full text search, versioning, transactions, observation, and more. Typical applications that use content repositories include content management, document management, and records management systems.

I got scheduled for a Feather Cast interview with David Reid. The podcast is available as the Feather Cast Episode 17. I also gave two talks on Jackrabbit, one at the Incubator 15 minute talks (even though Jackrabbit already graduated) on Thursday and one at an ad-hoc press session on Friday. Check out my introductory Apache Jackrabbit slideshow.

Saturday, August 5, 2006

Apache logo guidelines

The Apache Jackrabbit website was recently given a nice face-lift, but the customized "Apache Software Foundation" logo image caused some controversy and was considered unacceptable for two reasons. The first complaint was that it doesn't contain the "The" of "The Apache Software Foundation", and second that the feather was not the standard ASF feather.

Both are valid concerns, and while the designer did add the "The", the feather issue is still not resolved. So for now we're back to using the ASF logo image (shown above) we used before. Unfortunately this image is considerably larger, and causes trouble for the new website layout. Instead of sending the site layout back to the drawing board I set out to find the acceptable alternatives. Is it OK if we just replace the custom feather with a scaled and rotated version of the official feather? What are the branding and logo usage guidelines? What exactly is the official ASF feather?

Unfortunately there are no existing ASF logo guidelines, and in practice the logo usage varies wildly between Apache projects. In fact I couldn't even find out what actually is the official trademarked ASF feather, since the one in the above logo image is quite different from the version (shown right) used in a customized format for example on the "canonical" Apache HTTPD web site.

Apache Geronimo logo

There's also great variance in the ways different Apache projects display their affiliation with the ASF. The most notable examples are the iBatis project, which mentions the foundation only at the very bottom of its front page, and mod_perl, which mentions the foundation but doesn't display any version of the feather logo. The Xerces and Geronimo projects hide the ASF feather in a blue background, and projects like Lenya, SpamAssassin, and TCL customize the ASF feather in various ways in their project logos. This variance is not necessarily bad, and in fact it shows a good level of innovation in using the ASF brand :-), but I'm not sure this wild practice is a good thing either.

Apache Lenya logo

It has been noted on various forums that Apache is a strong brand that really brings value and recognition to individual Apache projects. Thus I think it is in the foundation's interest to guard and strengthen that brand, including the visual identity bound to the Apache feather. The reaction to the custom feather introduced by the Jackrabbit site face-lift is a sign of such a drive, but the effort falls short if there are no guidelines to direct projects and external parties to the correct use of the feather logo.

Apache SpamAssassin logo

I think the following are the key questions to answer when creating logo usage guidelines for Apache projects:

Should an Apache project display the Apache Software Foundation logo?

If yes, what are the standard logo images to use?

Is it acceptable to customize the standard logo images?

If yes, what are the accepted customizations (scaling, rotating, drop shadow, text overlays, background color, background image, etc.)?

Apache TCL logo

There are also unanswered questions on whether and how an Apache project should incorporate the ASF feather into a project logo:

May an Apache project logo contain the Apache feather?

If yes, should an Apache project logo contain the Apache feather?

If yes, what is (are) the standard feather(s) to use?

Is it acceptable to customize the standard feather(s)?

If yes, what are the acceptable customizations?

Sunday, July 30, 2006

JUnit tests for known issues, part 3

My quest for a way to handle known issues as JUnit tests seemed already finished, when Marcel Reutegger, a committer of the Apache Jackrabbit project and a developer of the Technology Compatibility Kit (TCK) of JSR 170, displayed some serious JUnit-fu by pointing to the TestResult class in JUnit.

It turns out that JUnit creates a TestResult instance for each test being run and uses that instance to store the test results. It is possible to customize the TestResult being used by overriding the TestCase.createResult() method. You can then decide in TestResult.addFailure(Test, AssertionFailedError) and TestResult.addError(Test, Throwable) whether to skip some failure or error reports. This is what we ended up doing in Jackrabbit.

Digging deeper along these lines I found out that you could actually implement similar functionality also directly in a TestCase subclass, thus avoiding the need to override TestCase.createResult(). The best way to do this is to override the TestCase.runBare() method that gets invoked by TestResult to run the actual test sequence. The customized method can check whether to skip the test and just return without doing anything in such cases.

I implemented this solution as a generic JUnit 3.x ExcludableTestCase class, which you are free to copy and use under the Apache License, version 2.0. The class uses system properties named junit.excludes and junit.includes to determine whether a test should be excluded from the test run. Normally all tests are included, but a test can be excluded by including an identifier of the test in the junit.excludes system property. An exclusion can also be cancelled by including a test identifier in the junit.includes system property. Both system properties can contain multiple whitespace-separated identifiers. See the ExcludableTestCase javadocs for more details.

You can use this class by subclassing your test cases from ExcludableTestCase instead of directly from TestCase:

// "package" is a reserved word in Java, so the example package is named my.test.pkg
package my.test.pkg;
public class MyTestCase extends ExcludableTestCase {
    public void testSomething() {
        // your test code
    }
}

You can then exclude the test case with -Djunit.excludes=my.test.pkg, -Djunit.excludes=MyTestCase, or -Djunit.excludes=testSomething, or a combination of these identifiers. If you've for example excluded all tests in my.test.pkg, you can selectively enable this test class with -Djunit.includes=MyTestCase.

You can also add custom identifiers to your test cases. For example, if your test case was written for Bugzilla issue #123, you can identify the test in the constructor like this:

    public MyTestCase(String name) {
        super(name);
        addIdentifier("#123");
    }

Then you can exclude tests for this issue with -Djunit.excludes=#123.
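The include/exclude rules described above can be sketched in plain Java. This is only an illustration, not the actual ExcludableTestCase code: the ExclusionRules class and its method names are hypothetical, and the rule that a single matching include cancels any exclusion is my reading of the description above.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper illustrating the junit.excludes / junit.includes rules. */
final class ExclusionRules {

    /** Splits a whitespace-separated property value into a set of identifiers. */
    static Set<String> parse(String value) {
        Set<String> ids = new HashSet<>();
        if (value != null) {
            for (String id : value.trim().split("\\s+")) {
                if (!id.isEmpty()) {
                    ids.add(id);
                }
            }
        }
        return ids;
    }

    /**
     * Returns true if at least one identifier is listed in excludes and
     * none of the identifiers is listed in includes (assumed precedence).
     */
    static boolean isExcluded(Collection<String> identifiers,
                              String excludes, String includes) {
        Set<String> excluded = parse(excludes);
        Set<String> included = parse(includes);
        boolean skip = false;
        for (String id : identifiers) {
            if (included.contains(id)) {
                return false; // an include cancels any exclusion
            }
            if (excluded.contains(id)) {
                skip = true;
            }
        }
        return skip;
    }
}
```

A TestCase subclass could collect its package name, class name, method name, and any custom identifiers, and then consult such a helper with System.getProperty("junit.excludes") and System.getProperty("junit.includes") from runBare() before running the test.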

Thursday, July 20, 2006

Jira tips and tricks

Having just filed a number of Graffito issues in the Jira installation at the ASF, I figured it might be useful if I shared some of the tips and tricks I've learned as a relatively heavy user of Jira. Here comes:

You can personalize your Dashboard page to show all sorts of custom reports and statistics. I've configured mine to show summary statistics for all the projects I'm involved with and to list all the open issues assigned to me along with some generally useful links.

When navigating back and forth over a number of related issues, click the "History" link on the top-right corner to access all the recently visited issues.

You can configure the number of issues listed per page by editing your preferences on your profile page. I've set my preferences to 100 issues per page to avoid having to page back and forth over long issue lists.

On the same preference editor you can also disable the feature that sends you an email notification of all the actions you've made on the web interface. I usually get those notifications anyhow through the issue mailing lists, so there's no need for duplicates.

The "Road Map" tab on a project page gives a nice overview of the TODOs for an upcoming release.

There's a "Subversion commits" tab on the issue pages that lists all commits whose commit message includes the key of the respective issue. There are even links to the ViewVC interface. Very useful! The "All" tab shows you the issue comments, metadata changes, and Subversion commits on a single time-line.

When opening an issue from the Issue Navigator, the upper right corner of the issue contains a small box that allows you to "Return to search" or to jump directly to the "Previous" or "Next" issue on the list.

You can "Configure" the Issue Navigator to customize the set of columns it displays. I've replaced the default "Bugzilla Id" column with the "Components" column.

You can customize any issue search by clicking "Edit" in the top left corner of the Issue Navigator. You can also save such customized filters for future use.

You can subscribe to the results of any query in your favourite feed aggregator.

There are probably a ton of other nice features that I haven't yet noticed.

Wednesday, July 19, 2006

JUnit tests for known issues, part 2

A few days ago I considered different options for including known issue test cases (ones that you expect to fail) in a JUnit test suite in a way that wouldn't make the full test suite fail. I decided to adopt a solution that uses system properties to selectively enable such known issue test cases. Here's how I implemented it for Apache Jackrabbit using Maven 1 (we're currently working on migrating to Maven 2, so I'll probably post Maven 2 instructions later on).

The first thing to do is to make the known issue tests check for a system property used to enable a test. The example class below illustrates two ways of doing this; either to make the full known issue test code conditional, or to add an early conditional return to skip the known issue. You can either use a single property like "test.known.issues" or different properties to allow fine grained control over which tests are run and which skipped. I like to use the known issue identifier from the issue tracker as the controlling system property, so I can selectively enable the known issue tests for a single reported issue.

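import junit.framework.TestCase;

public class ExampleTest extends TestCase {

    // Option 1: make the whole test body conditional on the system property
    public void testFoo() {
        if (Boolean.getBoolean("ISSUE-foo")) {
            // test code for "foo"
        }
    }

    // Option 2: return early unless the system property is set to "true"
    public void testBar() {
        if (!Boolean.getBoolean("ISSUE-bar")) {
            return;
        }
        // test code for "bar"
    }

}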

Once this instrumentation is in place, the build system needs to be configured to pass the identified system properties to the code when requested. In Maven 1 this happens through the maven.junit.sysproperties setting in project.properties:

maven.junit.sysproperties=ISSUE-foo ISSUE-bar
ISSUE-foo=false
ISSUE-bar=false

This way the known issue tests will be skipped when normally running "maven test", but can be selectively enabled either on the command line ("maven -DISSUE-foo=true test") or by modifying project.properties or build.properties.
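If many tests need the same check, the property lookup could be factored into a small helper. The sketch below is only one way to do it: the KnownIssues class is hypothetical, and combining a per-issue property with the global "test.known.issues" switch is an assumption on my part rather than what Jackrabbit does.

```java
/** Hypothetical helper for deciding whether a known issue test should run. */
final class KnownIssues {

    /** Global switch that enables all known issue tests at once. */
    static final String ALL = "test.known.issues";

    /**
     * Returns true if the given issue, or all known issues, are enabled.
     * Boolean.getBoolean(name) is true only when the system property
     * exists and equals "true" (ignoring case).
     */
    static boolean isEnabled(String issueKey) {
        return Boolean.getBoolean(ALL) || Boolean.getBoolean(issueKey);
    }
}
```

A test method would then start with "if (!KnownIssues.isEnabled("ISSUE-bar")) return;" just like the testBar example above.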

Sunday, July 16, 2006

Volunteering to mentor Graffito

Raphaël Luta, mentor of the incubating Apache Graffito project, recently asked for volunteers to help him mentor the Graffito project. Encouraged by my recent ASF membership and the fact that I had been keeping an eye on the project for quite a while, I decided to volunteer.

Before formal appointment as a mentor I've introduced myself on the Graffito mailing list and started getting more familiar with the project. I'm especially interested in the Object/Content Mapping (OCM) framework JCR-mapping that is included as a Graffito subproject, but also in the other parts of the project. Having worked on various content management systems for over ten years since our first pre-Midgard site experiments (including a custom SGML vocabulary mapped to early HTML with DSSSL) with the Greywolves, I'm still eager to learn new approaches like Graffito.

Graffito is both a framework for building content applications and a set of existing portlet components. Although the project is related to Apache Jetspeed-2, the portlets should work on any compliant portal engine. The framework components can also be used outside the portlet model. The project also aims for independence from the underlying content storage, using a generic "Graffito Core Model" and its derivatives as the abstract content model and a "Content Server" abstraction for the content storage layer.

The current default content server implementation is based on Apache OJB and runs on top of relational databases. The JCR-mapping subproject is planned to be used for a similar task on top of JCR content repositories, especially Apache Jackrabbit. Just like OJB is not a subproject of Graffito, we've had initial discussions about possibly moving the JCR-mapping project outside Graffito. Making it a Jackrabbit subproject is an interesting alternative, especially if we want to aim for an eventual JCR federation within the ASF, but for now I think it's best to get the communities together and see what patterns emerge before making final decisions on what to do with the subproject.

Missing API details in Java: Null references

Even though Java APIs tend to be well documented thanks to the Javadoc, there are some details that are quite often missing, causing developers to program by coincidence. One of the main issues is the handling of null references.

Although there are guaranteed to be no dangling references on the Java platform, a reference can still be null and cause the infamous NullPointerException (aka NPE) when passed to an unwary piece of code. Null references are very convenient for expressing the absence of something, but these special cases are often not well documented. There are three main cases of null references that are commonly used but seldom documented: optional arguments, member variables, and return values.

Optional arguments

Instead of overloading a method name to cover the case where one or more of the arguments are unavailable, the method can allow some of its arguments to be null. This is especially common for constructors that allow optional configuration options. This practice is otherwise very convenient, but disturbingly often not documented, leaving the client developer to wonder whether it is OK to pass a null reference as a seemingly optional method argument. Often the solution is to just pass the null reference and rely on coincidence to keep it working.

A good example is the DocumentBuilder.parse(InputStream stream, String systemId) method in JAXP. It is explicitly documented that an IllegalArgumentException is thrown when the stream argument is null, but the systemId argument is just documented as "Provide a base for resolving relative URIs". There is also an overloaded DocumentBuilder.parse(InputStream stream) method, and incidentally it happens that calling the former method with a null systemId is equivalent to calling the latter method.

Now a JAXP client developer that has an InputStream and system identifier string that might be null, could either do the right thing and program defensively:

if (systemId == null) {
    builder.parse(stream);
} else {
    builder.parse(stream, systemId);
}

or rely on the coincidence that a null system identifier is actually allowed:

builder.parse(stream, systemId);

The latter case is in my experience what most of the developers would do, and thus the JAXP implementation is in practice required to keep allowing null system identifiers. The systemId argument should therefore be documented as "Optional base for resolving relative URIs" or even more explicitly as "Base for resolving relative URIs, or null".
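For comparison, here is a minimal sketch of what an explicitly documented optional argument could look like when the defensive variant is wrapped in a helper. The NullSafeParse class and its method names are hypothetical; only the DocumentBuilder and DocumentBuilderFactory calls are real JAXP API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

/** Hypothetical wrapper that states its null contract explicitly. */
final class NullSafeParse {

    /**
     * Parses the given stream.
     *
     * @param builder  the builder to use, must not be null
     * @param stream   the XML content, must not be null
     * @param systemId base for resolving relative URIs, or null
     */
    static Document parse(DocumentBuilder builder, InputStream stream,
                          String systemId) throws SAXException, IOException {
        if (systemId == null) {
            // Do not rely on the two-argument method accepting null
            return builder.parse(stream);
        }
        return builder.parse(stream, systemId);
    }

    /** Convenience method for trying the helper on an in-memory document. */
    static Document parseString(String xml, String systemId)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputStream stream =
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        return parse(builder, stream, systemId);
    }
}
```

With the contract spelled out in the @param tag, a caller no longer needs to read the implementation to know that null is acceptable.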

Member variables

A good practice in Java is to keep all member variables private or at least protected. Unfortunately this allows the developer to be lazy in documenting the permitted states of the variable. After all, a private member is not a part of the public interface of a class, so why bother documenting it? A member variable can be null either by having explicitly been set so or by having been passed as null to a constructor or a setter. Often you need to explicitly search through the sources of a class to determine the possible states of a member variable. This is especially important when using the JavaBean conventions where a private member variable is often exposed through a getter method with a template javadoc that contains no mention of the valid states of the underlying variable.

The JavaBean case is actually especially troublesome as the common pattern for JavaBean properties is:

private Object something;

/** Returns something */
public Object getSomething() {
    return something;
}

/** Sets something */
public void setSomething(Object something) {
    this.something = something;
}

It is most often not documented whether null references are allowed in the setter or if the client is required to explicitly set the property before doing anything with a bean instance. This is in my experience the main cause of NullPointerExceptions in component-based systems.
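As a contrast to the template javadocs, here is a small sketch of a bean that documents and enforces its null contract. The DocumentedBean class and its two properties are made up for illustration; Objects.requireNonNull is the standard java.util helper.

```java
import java.util.Objects;

/** Hypothetical bean with one required and one optional property. */
final class DocumentedBean {

    /** Title of the bean, never null. */
    private String title = "";

    /** Optional description, null if none has been set. */
    private String description;

    /** Returns the title, never null. */
    String getTitle() {
        return title;
    }

    /**
     * Sets the title.
     *
     * @param title the new title, must not be null
     * @throws NullPointerException if the title is null
     */
    void setTitle(String title) {
        this.title = Objects.requireNonNull(title, "title");
    }

    /** Returns the description, or null if none has been set. */
    String getDescription() {
        return description;
    }

    /** Sets the description. Pass null to clear it. */
    void setDescription(String description) {
        this.description = description;
    }
}
```

Failing fast in the setter moves the NullPointerException to the place where the invalid value is introduced, instead of some later point where the bean is used.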

Return values

Null references are commonly used to represent the absence of some value. For example the Map.get(Object key) method returns null when an entry for the given key is not found in the map. Such cases are usually well documented (the Map.get method returns "the value to which this map maps the specified key, or null if the map contains no mapping for this key"), but in some cases it is just implicitly assumed that a client developer will expect a null return value.

The most common causes of undocumented null return values are the JavaBean getters described above, but sometimes a genuine processing method forgets to mention that the return value might be null. A good example is the ZipInputStream.getNextEntry() method, that returns "the ZipEntry just read" but fails to mention that the "ZipEntry just read" is null if no more Zip entries are available. A clever developer will of course assume that this is the case, since the method doesn't throw a NoSuchElementException like the Iterator.next() method does, but the only way to know for sure is to read the ZipInputStream sources and even then you are left with the bad feeling that the implementation might well be changed in a future release.

The return value of the getNextEntry() method should therefore be documented as "the ZipEntry just read, or null if no more entries are available".
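The usual client loop depends directly on that null return value. The sketch below lists the entry names of a Zip archive held in memory; the ZipEntryNames class and its sampleZip helper are hypothetical and exist only to make the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical example of iterating over Zip entries. */
final class ZipEntryNames {

    /** Returns the names of all entries in the given Zip archive. */
    static List<String> list(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip =
                new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            // The loop ends when getNextEntry() returns null
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** Builds a small in-memory archive with one entry per given name. */
    static byte[] sampleZip(String... names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(name.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }
}
```

If a future release started throwing an exception at the end of the stream instead of returning null, every loop written like this would break, which is why the null return value belongs in the documented contract.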

Monday, July 10, 2006

UMLet 7

org.apache.jackrabbit.extractor class diagram

Ever since learning UML back in 1998 I've been looking for decent UML tools that best suit my rather ad-hoc diagramming style. Even though I've occasionally used them, I've never really enjoyed the heavyweight, round-tripping, IDE-integrated (even IDE-embedding!) modelling monoliths that most of the UML tools seem to evolve into sooner or later. My reasons for using UML are documenting existing code and discussing new ideas, almost never to actually implement anything. I usually also work in highly heterogeneous settings with co-developers using a wide variety of tools and development environments. Adapting to a do-all-be-all UML tool is in many cases simply impossible or at least quite difficult.

org.apache.jackrabbit.core.query TextFilter class diagram

Thus I've actively stayed away from the high-end offerings and focused more on the low-end alternatives like Dia and the most popular UML tool in the world, MS PowerPoint. However they never felt really natural, being either too inflexible or requiring too much manual work especially when rearranging diagrams. Luckily a few years ago, while doing my yearly lookout for better development tools, I stumbled upon UMLet, a lightweight open source UML diagram editor that has a rather original but very flexible and convenient user interface. It even works as a drop-in plugin for Eclipse.

org.apache.jackrabbit.core.query.lucene TextExtractor class diagram

A few weeks ago, after upgrading to Eclipse 3.2, I went looking for a UMLet upgrade and was happy to find version 7 available for download. The new version has nice new features like color and transparency support, new diagram types, and various user interface improvements like improved mouse selection support. Warmly recommended.

The attached class diagrams were quickly created using UMLet 7 to describe the structure of a mid-sized patch I sent for consideration as part of the Jackrabbit issue JCR-415.

Saturday, July 8, 2006

JUnit tests for known issues

A few months ago I started working on the Jackrabbit improvement issue JCR-325. Following a good practice, the first thing I did was create a test case for the missing functionality. However, this breaks another good practice of always passing 100% of the test suite. This and a few other known issue tests are currently causing the Jackrabbit test suite to fail even without any real problems, making it more difficult to check whether a recent change or an experimental patch breaks things.

To fix the situation I started wondering if there was a JUnit version of the TODO blocks in Perl's Test::More. The problem is that JUnit can only report tests as successful or failing (or erroneous if they throw an exception); there is no way to easily mark test failures as TODOs. Googling around and asking the Jackrabbit mailing list produced some workarounds:

Use a system property to determine whether to perform or skip the known issue test cases.

Put the known issues tests in separate test case classes and exclude them from the test suite.

Use a JUnit addon to ignore marked test cases as explained in an article that discusses this same issue.

Use an alternative test framework like TestNG, that has this functionality built-in.

I didn't want to start changing the entire test suite or even tweaking the build environment, so the last two options were out. I also wanted to make the setup easily configurable so a developer can selectively enable testing for a known issue, thus the first alternative of using a system property looks like the best solution. It seems that the Apache OJB project has reached the same solution.

Monday, July 3, 2006

Analyzing sleep EEG

I'm about to start playing with some sleep EEG data from my girlfriend's research group. She's studying neurobiology, and spends this summer working for the Sleep Research Group at Biomedicum Helsinki. They are looking for a way to automatically detect (score) different sleep stages from the raw EEG data. They currently have some semi-automated tools that use a Fourier transformation and some heuristics to detect the different stages, but the tools are not nearly accurate enough. Thus they need to spend countless dull hours manually scoring the data from their rat experiments.

I promised my girlfriend that I'd take a look at the data and see if I could come up with something useful. It's quite likely that I'll just end up admitting defeat (smarter people have tried before...), but it's still a very interesting problem which also gives me a chance to learn something new. She promised to provide me with some real data and any required background information. Here's a list of things I plan to do with the data:

Find or write a simple tool for displaying the time and frequency graphs of a selected window of the EEG data. This will allow me to get familiar with the data and the possible common patterns.

Apply tools like genetic algorithms or neural networks (for example self-organizing maps) on the data and let the computer come up with the heuristics for detecting the sleep stages.

Validate the effectiveness of these tools against the manual scoring and the existing semi-automated tools.

Let's see how this turns out...

Thursday, June 29, 2006

Remote office

Last Friday we traveled to the Finnish archipelago to spend the mid-summer holiday with my family at my father's summer cottage on the Holma island. I decided to stay for this whole week, working remotely over a GPRS connection. The picture on the right shows my desk with our younger cat, Möllis, sleeping cozily on top of the warm laptop power source.

In related news, two weeks ago Yukatan got its first office. Since starting my own company in early 2005 I've been working from home, but from day one that was considered a temporary solution. Getting the office took a little bit longer than expected, but once I get back in town next week, I'll be working from an office room in Olarinluoma, just a 15 minute walk from our home.

ASF membership

A few days ago I was invited to become a member of the Apache Software Foundation! I'm greatly honored by the invitation that acknowledges my "commitment to collaborative open-source software development, through sustained participation and contributions within the Foundation's projects".

This is definitely one of the highlights of my career. :-)

Wednesday, February 15, 2006

The road to Jackrabbit 0.9

I recently volunteered to work as the release manager for the 0.9 and 1.0 releases of the incubating Apache Jackrabbit project. The 0.9 part of this task was finished yesterday when I officially announced the release of Apache Jackrabbit version 0.9. The process leading up to the release announcement has been an interesting and educating one.

I first got involved with the Jackrabbit project in November 2004 when I announced an initial version of the JCR-RMI layer I had been working on. Based on positive feedback I started getting more and more active on the mailing list and continued working on JCR-RMI. My work was rewarded with committer status almost exactly a year ago in February 2005. Since then I've been working on various parts of Jackrabbit, although I have yet to touch the deep internals of Jackrabbit core. I got another reward for my work when I was invited to join the expert group of JSR 283: Content Repository for Java™ Technology API Version 2.0.

Talk of making a Jackrabbit 1.0 release started sometime in summer 2005 after the JSR 170 had stabilized the JCR API. There was however not too much activity towards a release so I decided to start pushing things in late September. This sparked some discussion and concerns about some of the open issues. The Jira roadmap for 1.0 started to grow bigger and bigger as more issues were filed. I finally solved this problem in November by setting up a separate 1.1 goal for the more complex issues.

Despite these efforts and a lot of ongoing development work there still wasn't too much happening toward an initial release. This situation changed in January when Roy suggested that a 1.0 release plan might be a good thing to have. I volunteered to act as the release manager and based on the resulting discussions I published the Jackrabbit 0.9 release plan in late January. The idea (as suggested by Roy) was to first make a 0.9 release to streamline the release process for a 1.0 release soon after 0.9.

After working a while with making the build environment better fit the release needs I announced a 0.9 release candidate. Making the release candidate had me reading through a number of release guidelines on the various Apache websites and refreshing my GnuPG skills. All Apache releases are digitally signed so I needed to set up my own key. I also got my key signed by Roy, so I expect my key to show up in the Apache web of trust in a few days.

There was only one issue that got fixed between the first release candidate and the version I posted for the official release vote. The release vote passed with no objections, but then Roy suggested a few packaging changes before endorsing the release as required by the Incubator policy. I made the changes, and since no source files were modified there was no need to restart the release vote. Roy then proceeded to request and receive approval for the release from the Incubator PMC.

After the approval we placed the release under cvs.apache.org where incubating projects are allowed to publish their releases. I then updated the Jackrabbit web site and sent the release announcement. The next step is the more challenging 1.0 release, on which I'll start focusing tomorrow.

Sunday, February 12, 2006

The joy of troubleshooting

How do you troubleshoot a problem in a computer system? There are a number of standard troubleshooting methodologies like trial-and-error, bottom-up, top-down, divide-and-conquer, and even root cause analysis, but none of them is a silver bullet for all possible problems. Often the most effective way is to dynamically combine these approaches based on experience and understanding of the underlying technologies. This is a story of a particularly hard problem with an unexpected cause that I troubleshot recently.

The problem I was facing was related to a Midgard staging/live setup using the Exorcist tool. The staging site was working just fine but the live site was having weird problems on some pages. It appeared as if the MidCOM components of the troublesome pages were losing a part of their configuration parameters.

Step 1. Because everything was working fine on the staging site I figured that the problem is most likely related to the staging/live replication step performed using the Exorcist. Were the parameters not being replicated? None of the common causes for such situations (unapproved content, parameters in a different sitegroup than the main object, etc.) seemed to apply however so I had to inspect whether the replication dump actually contained the missing parameters. It did!

Step 2. So the parameters were being replicated, but was there some trouble in how they were being imported in the live database? There was nothing strange in the import log file, so I opened a MySQL prompt to inspect the live database. The troublesome parameters seemed to exist just where they should be, and they even had all the correct GUID mappings and other details that had previously caused problems with the Exorcist.

Step 3. It seemed that the live database was in perfect condition, so the problem must be caused by something else. The problem was affecting just some MidCOM components, so I figured that there might be some difference in the component versions used on the staging and live hosts. No, the MidCOM source trees on both the staging and live hosts were identical.

Step 4. If the problem wasn't with MidCOM, then it must be a problem with the Midgard framework. Adding debug prints to one of the troublesome MidCOM components I was able to narrow down the problem to a $object->listparameters($component) call in the MidCOM configuration class. The configuration class seemed to be working just fine everywhere else, so the problem must be in the listparameters method! For some object/component combinations the method just returned nothing and didn't log any errors even though I double- and triple-checked that the parameters existed in the database.

Step 5. I added a set of debug prints to the Midgard-PHP listparameters method, compiled and installed the modified module, and reloaded the troublesome pages. They still didn't work properly, but now I got a detailed trace of what was happening under the hood. The method actually did execute the correct SQL query but received no results from the database. This was certainly weird, as I was very certain that the parameters in question actually did exist.

Step 6. Could there be something wrong with the MySQL client library used by the Midgard core to send queries to and receive results from the database? The normal MySQL command line tools use the same library so I started the MySQL prompt and entered the SQL statement copied from the Midgard log file. No results! The SQL statement was:

SELECT id,domain,name,value FROM record_extension WHERE (tablename='topic' AND oid=27 AND domain='de.linkm.sitemap' AND lang=0) AND (record_extension.sitegroup in (0,2)) ORDER BY name

Step 7. There are some autogeneration artifacts in the query, but no real reason why it shouldn't work. I removed the domain='de.linkm.sitemap' constraint and got a list of parameters, including ones with the domain column set to de.linkm.sitemap! Even worse, the exact same query with domain='midcom' or domain='midcom.helper.nav' was returning results. For some totally unknown reason MySQL seemed to not like the domain='de.linkm.sitemap' constraint. I also tried some other variations of the query and found out that even just removing the ORDER BY clause made the query return the correct rows. This should ring heavy warning bells for anyone who knows SQL, as the ORDER BY clause is only supposed to order the returned rows, never to change which rows are returned.
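The rule being broken here can be written down as a small check: running the same query with and without ORDER BY must give the same set of rows. The following is only a minimal sketch of that check. It uses Python's built-in sqlite3 module rather than MySQL 4.1, so it does not reproduce the bug, and the table layout and sample rows are made up for illustration rather than taken from a real Midgard database.

```python
import sqlite3

# The query from the Midgard log, split so that ORDER BY can be toggled.
BASE_QUERY = (
    "SELECT id,domain,name,value FROM record_extension "
    "WHERE (tablename='topic' AND oid=27 "
    "AND domain='de.linkm.sitemap' AND lang=0) "
    "AND (record_extension.sitegroup in (0,2))"
)


def make_sample_db():
    """Build an in-memory table with invented rows (not real Midgard data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record_extension ("
        "id INTEGER PRIMARY KEY, tablename TEXT, oid INTEGER, "
        "domain TEXT, name TEXT, value TEXT, lang INTEGER, sitegroup INTEGER)"
    )
    rows = [
        (1, "topic", 27, "de.linkm.sitemap", "zeta", "1", 0, 2),
        (2, "topic", 27, "de.linkm.sitemap", "alpha", "2", 0, 2),
        (3, "topic", 27, "midcom", "component", "x", 0, 2),
        (4, "topic", 28, "de.linkm.sitemap", "alpha", "3", 0, 2),
    ]
    conn.executemany(
        "INSERT INTO record_extension VALUES (?,?,?,?,?,?,?,?)", rows
    )
    return conn


def order_by_preserves_rows(conn):
    """Return True if adding ORDER BY leaves the set of rows unchanged."""
    unordered = conn.execute(BASE_QUERY).fetchall()
    ordered = conn.execute(BASE_QUERY + " ORDER BY name").fetchall()
    # Compare as sorted lists so that only the row contents matter.
    return sorted(unordered) == sorted(ordered)
```

On a healthy database the check returns True. On the live host described here it would have returned False, since the ORDER BY variant came back empty.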

Step 8. Could this be some weird character set and collation issue like the ones we've been facing recently? No, it works the same way (rows are returned or not returned depending on whether the ORDER BY clause is used) regardless of the character encoding being used.

Step 9. Has the database been corrupted? I backed up and restored the entire database but the query was still misbehaving. I then copied the correctly behaving database from the staging host to the live host. The same error appeared in that database as well!

Step 10. The staging host was running MySQL version 4.1.12 while the live host had MySQL 4.1.7. Both were stable and tested releases installed from the standard RHEL packages. I was recalling ugly memories from the MySQL 3.x times, when weird database errors seemed to be much more common.

Step 11. OK, it seemed that the problem was some MySQL bug that got fixed somewhere between versions 4.1.7 and 4.1.12. I didn't want to risk messing with the database installation so I figured I'd better find some workaround rather than trying to upgrade the database version.

Step 12. I used the EXPLAIN statement trying to find out what could be triggering the MySQL bug. The output suggested that the query planner was using some of the indexes on the record_extension table. The indexes actually look quite weird:

KEY record_extension_oid_idx(oid), KEY record_extension_tablename_idx(tablename(10)), KEY record_domain_tablename_idx(domain(10)), KEY record_extension_sitegroup_idx(sitegroup), KEY record_extension_name_idx(name(10))

The indexes don't seem to be reasonable. Why the arbitrary ten character limit, and why use only a single column per index? This seems anomalous enough to trigger an obscure bug in MySQL, so I just dropped the record_domain_tablename_idx index. And it worked!

The entire troubleshooting session took about three hours with a couple of breaks in between.

Saturday, January 7, 2006

Analyzing the Jackrabbit architecture with Lattix LDM

Tim Bray pointed to David Berlind who pointed to the Lattix company. Lattix makes a tool called Lattix LDM that uses a Dependency Structure Matrix to work with software architecture. I watched the nice Lattix demo and decided to try the software out.

After receiving my Community license and struggling for a while to get the software running on Linux (need to include both the jars and the Lattix directory in the Java classpath!) I loaded the latest Jackrabbit jar file for analysis. The dependency matrix of the top-level packages after an initial partitioning is shown below:

The matrix contains all the package dependencies. A number in a cell of the matrix tells how many dependencies the package on the vertical column has on the package on the horizontal row. You can tell how widely a package is used by reading the cells on the package row. The package column identifies the other packages that the selected package uses or depends on. In general a healthy architecture only contains dependencies located below the diagonal.
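To make the reading rules concrete, here is a minimal sketch of such a matrix, following the convention described above: a cell holds the number of dependencies that the column package has on the row package. The package names and counts are invented for illustration and are not taken from the Jackrabbit analysis.

```python
def build_dsm(packages, dependencies):
    """Build a dependency structure matrix.

    packages is the ordered list of package names. dependencies is a list
    of (user, used, count) tuples. matrix[row][col] holds the number of
    dependencies that the column package has on the row package.
    """
    index = {name: i for i, name in enumerate(packages)}
    matrix = [[0] * len(packages) for _ in packages]
    for user, used, count in dependencies:
        matrix[index[used]][index[user]] += count
    return matrix


def above_diagonal(packages, matrix):
    """List the (user, used, count) entries that sit above the diagonal."""
    found = []
    for row in range(len(packages)):
        for col in range(row + 1, len(packages)):
            if matrix[row][col]:
                found.append((packages[col], packages[row], matrix[row][col]))
    return found
```

With the hypothetical packages ordered as app, core, util and the dependencies app -> core (5), core -> util (3), and util -> core (1), only the last one lands above the diagonal, which marks it as the dependency that points in the wrong direction.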

The packages 2-6 form the general purpose Jackrabbit commons module, while the more essential parts of the Jackrabbit architecture are found within the core module. I grouped the commons packages and expanded the core module to get a more complete view of the Jackrabbit internals:

There was no immediate structure appearing, so I used the automatic DSM partitioning tool on the core module to sort out the package dependencies:

Jackrabbit core after initial partitioning
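The algorithm Lattix uses for partitioning is not described here. As a rough sketch of the general idea, and purely as an assumption about how such a tool could work, packages that depend on each other in a cycle can be grouped together and the groups ordered so that every remaining dependency falls below the diagonal. The version below uses Tarjan's strongly connected components algorithm, and the package graph in the usage note is invented.

```python
def partition(graph):
    """Group packages into cycles and order the groups users-first.

    graph maps each package name to the set of packages it uses.
    Returns a list of groups; a group with more than one package is a
    set of packages that depend on each other in a cycle.
    """
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    groups = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for used in sorted(graph.get(node, ())):
            if used not in index:
                visit(used)
                lowlink[node] = min(lowlink[node], lowlink[used])
            elif used in on_stack:
                lowlink[node] = min(lowlink[node], index[used])
        if lowlink[node] == index[node]:
            # node is the root of a group; pop the group off the stack.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            groups.append(sorted(group))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    # Tarjan's algorithm emits the most-used groups first. Reverse the list
    # so that users come first and their dependencies fall below the diagonal.
    groups.reverse()
    return groups
```

For a made-up graph where app uses core, core and state use each other, and both use util, the result is the three groups app, then core together with state, then util. The middle group is the kind of interdependent block that shows up as a square around the diagonal.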

The value, config, fs, and util packages form a lower utility layer and the jndi package a higher deployment tool layer. The most interesting part lies between those layers, in the large interdependent section in the middle. The key to the architecture seems to be the main core package that both uses and is used by other packages in this section. I opened a separate view for examining the contents of the main core package:

Partitioning classes within the main core package

The partitioning suggests that it might make sense to split the package in two parts. Without concern for semantic grouping, I just grouped the classes in the upper half as core.A and the classes in the lower half as core.B. This seems useful as the core.B package seems to be a bit better in terms of external dependencies:

Jackrabbit core after splitting the main core package

Running the package partitioning again, I got a bit more balanced results although the main section still is heavily interdependent:

Jackrabbit core partitioning after splitting the main core package

Looking at the vertical columns it seems like the main culprits for the interdependencies are the nodetype, state, version, and the virtual core.A packages. Both the nodetype and state package contain subpackages so I wanted to see if the dependencies could be localized to just a part of those packages:

Contents of the Jackrabbit state and nodetype packages

This is interesting: the interdependencies for the state package are for the main state package, while the nodetype interdependencies only affect the nodetype.virtual subpackage. I split both packages along those dependency relations, and partitioned the core module again:

Jackrabbit core partitioning after splitting the state and nodetype packages

The persistence managers in the state subpackages are now outside the main section just like the non-virtual nodetype classes. After a short while of further research on the dependencies I found that the partitioning of the main state package would suggest that the item state managers be split to a separate package:

After creating a new statemanager package for containing the item state managers, the partitioning of the core module starts to look better. The only remaining circular dependencies are for the virtual core.A and core.B packages:

Jackrabbit core partitioning after moving the state managers into a new statemanager package

Looking at the virtual core.B package we find that only the NodeId, PropertyId, and ItemId classes depend on the state package:

In fact it seems that it might make sense to move the classes there. After doing that the core module partitioning looks even better:

Jackrabbit core partitioning after moving the ItemId classes to the state package

The only remaining source of cyclic dependencies is the virtual core.A package, into which I won't be going any deeper at this moment. Even now the analysis seems to have provided a number of suggestions for reducing the amount of cyclic dependencies and thus improving the design quality of the Jackrabbit core:

Split the main core package into subpackages

Move the nodetype.virtual package to a higher level

Move the state subpackages to a separate package

Make a separate package for the item state managers

Move the NodeId, PropertyId, and ItemId classes to the state package

Note that these suggestions are just initial ideas based on a quick walkthrough of the Jackrabbit architecture using a Dependency Structure Matrix as a tool. As such the approach only gives structural insight into the architecture, and for this short analysis I mostly left out knowledge about the semantic roles and relationships of the Jackrabbit classes and packages.

Monday, January 2, 2006

Implementing mRFC 0024

Today I wrote the mRFC 0024: Full text indexing in Midgard proposal for adding full text and content tree support to the Midgard Query Builder. Like Torben did for the MidCOM indexer, I'm planning to use Apache Lucene as the underlying full text engine. The search indexer process shall be based on the Lucene Java library, but I haven't yet decided what I should use for the query part. On the surface the best option would seem to be either the Lucene4C or the CLucene library, but both options have drawbacks. The Lucene4C seems like the best match for the midgard-core environment, but it doesn't seem to be too actively developed and there's even been talk of abandoning it for a gcj-compiled version of Lucene Java. The CLucene library is more mature, but it's written in C++ and might therefore cause some unexpected build issues for midgard-core. One option would of course be to actually try linking midgard-core with a gcj-compiled Lucene Java! I'll prototype with all these options tomorrow while the mRFC 0024 vote takes place.

Another interesting issue in mRFC 0024 is the introduction of the parent cache, or actually a global content tree structure. Currently Midgard supports a sort of a tree model for all content, but it is mostly accessible only as limited views like for example the topic, page and snippet trees. Functions like is_in_tree or list_..._all have also required major scope limitations or other performance hacks to be useful. This is a bit troublesome for many use cases like searching and access control. The proposed parent cache would greatly simplify such content tree operations.
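As a rough illustration of why a single parent link per object makes these operations simple, here is a minimal sketch. It is not the mRFC 0024 design or any Midgard API: the dictionary stands in for the parent cache, and the function names and GUID values are hypothetical.

```python
def is_in_tree(parent_of, root_guid, guid):
    """Return True if guid is root_guid or one of its descendants.

    parent_of maps each object's GUID to its parent's GUID
    (None for a root object).
    """
    seen = set()
    current = guid
    # Walk up the parent links; the seen set guards against broken cycles.
    while current is not None and current not in seen:
        if current == root_guid:
            return True
        seen.add(current)
        current = parent_of.get(current)
    return False


def list_all(parent_of, root_guid):
    """Return the GUIDs of all descendants of root_guid, sorted."""
    return sorted(
        guid for guid in parent_of
        if guid != root_guid and is_in_tree(parent_of, root_guid, guid)
    )
```

With one uniform parent link the same two functions work for topics, articles, events, and attachments alike, instead of needing a separate tree walk per object type.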

If the proposed content tree model catches on, then a natural migration path for Midgard 2.0 would be to make the proposed parent_guid field the official parent link in all Midgard records. This would both simplify the object model and allow for much flexibility in organizing the content tree. It would for example be possible to create an event calendar topic that has all the event objects as direct descendants instead of having to use an explicit link to a separate root event. The only problem with this approach is that it is a major backwards compatibility issue...

Sunday, January 1, 2006

Network is the computer

I am sick and tired of doing backups, synchronizing settings, and having trouble accessing information. These are all symptoms of keeping your data locally on multiple computers. As a new year's resolution I have decided to get rid of all these problems.

So far I've tried to solve these issues by maintaining my own mail and web servers and keeping my data mostly on the servers. The problem with this approach is that I've never had (and probably never will have) the time to set up and maintain all the network services I'd need.

Thus I've decided to fully embrace the famous "Network is the computer" slogan by moving to fully external network applications for most of my daily information management.

As of today my Internet toolset will consist of the following:

Gmail for email - jukka.zitting@gmail.com

Wordpress.com for blogging - jukkaz.wordpress.com

Flickr for images - flickr.com/photos/jlz

del.icio.us for bookmarks - del.icio.us/jz

The main reason why I haven't done this before is the question of data access and ownership. There is always the chance of one of the service providers going down and taking the service with them. The external service interface also limits the things you can do with your data. Luckily with the recent emergence of various programmable APIs (REST, SOAP, etc.) these problems have become much less pressing. I can now write my own tools to import, export, and manipulate the externally stored data as easily (or even more easily!) as local data. This, I believe, is one of the cornerstones of the network as a computer.