- The incubating Apache PDFBox project is just about to publish its eagerly anticipated 0.8.0 release. I'm expecting to see the release announcement on Tuesday next week. PDFBox is a Java library for working with PDF documents.
- Another incubating project, Apache UIMA, is working towards the 2.3.0 release. I'm looking forward to seeing both UIMA and PDFBox graduate from the Apache Incubator shortly after their respective releases. UIMA is a framework and a set of components for analyzing large volumes of unstructured information.
- The Apache Sling project is a component-based project like Apache Felix, so there is no clear project-wide release cycle. Instead, Sling is about to start releasing new versions of most of the components that have changed since the all-inclusive incubator releases. Sling is a JCR-based web framework.
- Apache Tika uses PDFBox for extracting text content from PDF documents. I'm hoping to see a Tika 0.5 release soon with the latest PDFBox dependency and the design improvements I've been working on. Tika is a toolkit for extracting text and metadata from all kinds of documents.
- Apache Solr is about to enter code freeze in preparation for the 1.4 release, which will include the "Solr Cell" feature based on Tika. Solr is a search server based on Lucene.
- The Commons IO project has been upgraded to use Java 5 features, and I'm starting to push it towards a 2.0 release. Commons IO is a library of Java IO utilities.
- Lucene Java is gearing up for the 2.9 release, and will soon follow up with the 3.0 release. The trie range feature is an especially welcome addition for many use cases. Lucene is a feature-rich, high-performance search engine.
- And last but not least, Apache Jackrabbit is getting ready to release the 2.0 version based on the recently approved JCR 2.0 standard. Jackrabbit is a feature-complete JCR content repository implementation.
I'm hoping to see most of these releases happening in time for the ApacheCon US 2009 conference in early November.