Monday, January 2, 2006

Implementing mRFC 0024

Today I wrote mRFC 0024: Full text indexing in Midgard, a proposal for adding full text and content tree support to the Midgard Query Builder. As Torben did for the MidCOM indexer, I'm planning to use Apache Lucene as the underlying full text engine. The search indexer process will be based on the Lucene Java library, but I haven't yet decided what to use for the query part. On the surface the best option would seem to be either the Lucene4C or the CLucene library, but both have drawbacks. Lucene4C seems like the best match for the midgard-core environment, but it doesn't appear to be very actively developed, and there has even been talk of abandoning it in favor of a gcj-compiled version of Lucene Java. CLucene is more mature, but it's written in C++ and might therefore cause unexpected build issues for midgard-core. One option would of course be to actually try linking midgard-core with a gcj-compiled Lucene Java! I'll prototype all of these options tomorrow while the mRFC 0024 vote takes place.

Another interesting issue in mRFC 0024 is the introduction of the parent cache, or rather a global content tree structure. Midgard currently supports a kind of tree model for all content, but it is mostly accessible only through limited views such as the topic, page and snippet trees. Functions like is_in_tree or list_..._all have also required major scope limitations or other performance hacks to be useful. This is troublesome for many use cases, such as searching and access control. The proposed parent cache would greatly simplify such content tree operations.
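To illustrate why a global parent cache helps, here is a minimal sketch, assuming the cache is simply a map from each object's GUID to its parent's GUID (None for a root object). With that single structure, an is_in_tree check becomes a walk up the ancestor chain, regardless of object type. The names ParentCache, ancestors and the sample GUIDs are hypothetical and not part of the Midgard API or the mRFC text.

```python
from typing import Dict, List, Optional

# Hypothetical parent cache: object GUID -> parent GUID (None for roots).
# This is an illustration only, not Midgard's actual data structure.
ParentCache = Dict[str, Optional[str]]


def ancestors(cache: ParentCache, guid: str) -> List[str]:
    """Return the GUIDs of all ancestors, nearest parent first."""
    chain: List[str] = []
    seen = {guid}
    parent = cache.get(guid)
    while parent is not None:
        if parent in seen:  # guard against cycles caused by bad data
            raise ValueError(f"cycle detected at {parent}")
        chain.append(parent)
        seen.add(parent)
        parent = cache.get(parent)
    return chain


def is_in_tree(cache: ParentCache, root: str, guid: str) -> bool:
    """True if guid is root itself or any descendant of root."""
    return guid == root or root in ancestors(cache, guid)


# Sample content tree mixing object types under one parent link.
cache: ParentCache = {
    "site": None,
    "topic-news": "site",
    "article-1": "topic-news",
    "snippet-a": None,
}

print(is_in_tree(cache, "site", "article-1"))        # True
print(is_in_tree(cache, "topic-news", "snippet-a"))  # False
```

A real implementation would presumably keep this mapping in the database and index it, so that the walk costs one lookup per tree level rather than a type-specific query per level.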

If the proposed content tree model catches on, a natural migration path for Midgard 2.0 would be to make the proposed parent_guid field the official parent link in all Midgard records. This would both simplify the object model and allow much more flexibility in organizing the content tree. For example, it would be possible to create an event calendar topic that has all the event objects as direct descendants, instead of having to use an explicit link to a separate root event. The only problem with this approach is that it is a major backwards compatibility issue...
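As a rough illustration of the event calendar case, the sketch below assumes every record carries one generic parent_guid field regardless of its type. Listing a calendar's events is then just a filter on direct children of the topic, with no separate root event in between. The record layout, the children helper and the GUIDs are hypothetical and only meant to show the idea.

```python
from typing import Dict, List, Optional

# Hypothetical record layout: every object, whatever its type, has a
# single generic parent_guid pointing at its parent in the content tree.
Record = Dict[str, Optional[str]]


def children(records: List[Record], parent_guid: str,
             type_name: Optional[str] = None) -> List[str]:
    """Return GUIDs of direct children, optionally filtered by type."""
    return [
        r["guid"]
        for r in records
        if r["parent_guid"] == parent_guid
        and (type_name is None or r["type"] == type_name)
    ]


records: List[Record] = [
    {"guid": "topic-cal", "type": "topic", "parent_guid": None},
    {"guid": "event-1", "type": "event", "parent_guid": "topic-cal"},
    {"guid": "event-2", "type": "event", "parent_guid": "topic-cal"},
    {"guid": "article-1", "type": "article", "parent_guid": "topic-cal"},
]

# Events hang directly under the calendar topic.
print(children(records, "topic-cal", "event"))  # ['event-1', 'event-2']
```

The trade-off is the one noted above: existing records that use type-specific parent links would need to be migrated to the generic field.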
