tag:blogger.com,1999:blog-18737785749732014332024-03-08T02:25:19.194-08:00Jukka ZittingSoftware craftmanshipAnonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.comBlogger81125tag:blogger.com,1999:blog-1873778574973201433.post-88634227847756603992010-11-24T14:56:00.000-08:002014-10-29T18:02:43.750-07:00The case for the digital Babel fish<a href="http://manning.com/mattmann/"><img class="alignright size-medium wp-image-382" title="Tika in Action" src="http://jukkaz.files.wordpress.com/2010/11/cover.jpg?w=239" alt="" width="239" height="300" /></a>"<em>Just like Arthur Dent, who after inserting a Babel fish in his ear could understand Vogon poetry, a computer program that uses Tika can understand Microsoft Word documents.</em>" This is how <a href="http://manning.com/mattmann/">Tika in Action</a>, our book on <a href="http://tika.apache.org/">Apache Tika</a>, introduces its subject. Download the freely available <a href="http://manning.com/mattmann/MEAP_CH01.pdf">first chapter</a> to read the full introduction.<br/><br/><a href="http://sunset.usc.edu/~mattmann/">Chris Mattmann</a> and I started writing the Tika in Action book for <a href="http://manning.com/">Manning</a> at the beginning of this year, and we're now well past the halfway point. If we keep up this pace, the book should be out in print by next summer! And thanks to the <a href="http://manning.com/about/meap.html">Manning Early Access Program</a> (MEAP), you can already pre-order and access an early access edition of the book at the <a href="http://manning.com/mattmann/">Tika in Action MEAP page</a>.<br/><br/>If you're interested, use the "tika50" code to get a 50% early access discount when purchasing the MEAP book. You'll still receive updates on all new chapters and of course the full book when it's finished. 
Note that this discount code is valid only until December 17th, 2010.<br/><br/>We're also very interested in all comments and other feedback you may have about the book. Use the <a href="http://www.manning-sandbox.com/forum.jspa?forumID=678">online forum</a> or contact us directly, and we'll do our best to make the book more useful to you!<br/><br/><em>Update: The book is out in print now! Use the "tika37com" code for a 37% discount on the final book.</em>Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com3tag:blogger.com,1999:blog-1873778574973201433.post-60236819057083873332010-11-11T17:58:00.000-08:002014-10-29T18:02:43.660-07:00Open sourcing made easyOpen sourcing a closed codebase can be difficult. The typical approach is to decide that you'll go open source, make big news about it and then try to figure out how to proceed. It's no wonder many open source transitions end up being more painful than expected and fail to generate as much community interest and involvement as hoped. How can you do better?<br/><br/><strong>0. Start small</strong><br/><br/>Even though your marketing people will be eager to use a good story, you should avoid the temptation to make a big deal about your shiny new open source project. Instead, start with small, reversible steps that allow you to get comfortable with the new way of developing software before making public commitments. In other words, learn to walk before you try to run. The next sections outline how to do this.<br/><br/><strong>1. Clean up the codebase</strong><br/><br/>Do you really know what's inside your existing codebase? Do you have rights to use and redistribute all the included intellectual property? Are there trade secrets or other bits in the codebase that you'd rather not show everyone? 
Do you wish to keep parts of the codebase closed so you can keep selling them as add-on components on top of the open source offering?<br/><br/>Answering these questions should be your first task. You'll need to spend some time auditing and possibly refactoring your code to prepare it for the public eye. Depending on the codebase this could be anything from a trivial exercise to a significant project. The nice thing is that the increased understanding and potential modularity you gain from this work will be quite valuable even if you never take the next step.<br/><br/><strong>2. Open up your tools</strong><br/><br/>Now that your codebase is clean and ready for the public view, you can (and should!) start using public tools to develop the code. You can either make your existing version control, issue tracking and other tools public, or migrate to a new set of public tools. There are plenty of excellent free hosting services for open source projects, so you have a good opportunity to both lower your maintenance costs and improve your productivity through better tooling!<br/><br/>There's no need yet to worry about external users or contributors. In fact the fewer people you attract at this stage, the better! The main purpose of this step is to make your developers comfortable with the idea that anyone could come and see all their code and all the mistakes they are making. This is a <em>big cultural change</em> for many developers, and you'll want to start small to give them time to adapt in peace.<br/><br/><strong>3. Engage the community</strong><br/><br/>If you've followed the steps so far, you've actually already open sourced your codebase. Are you and your developers comfortable with the situation? It's still possible to switch back to closed source with minimal disruption and no lost reputation if you're having second thoughts. 
But if you are willing to move forward, now is the time to start enjoying the benefits of open development!<br/><br/>Call in your marketing people to do their magic. Tell the world about the code you're sharing, and invite everyone to participate! If your product is in any way useful to someone, you'll start seeing people come in, ask questions, submit bug reports and perhaps even contribute fixes. At this point it is useful to have a few people ready to help such new users and contributors, but it's surprising how quickly the community can become self-sufficient. More on that in a later post...Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com5tag:blogger.com,1999:blog-1873778574973201433.post-12690009412884086812010-11-07T17:33:00.000-08:002014-10-29T18:02:43.563-07:00Models of corporate open sourceThere are many different ways and reasons for companies to develop their software as open source. Here's some brief commentary on the main approaches you'll encounter in practice.<br/><br/><strong>0. Closed source</strong><br/><br/>Well, closed source is obviously not open, but I should mention it as not all software can or should be open. The main benefit of closed source software is that you can sell it. If you are working for profit, then you should only consider open sourcing your software if the benefits of doing so outweigh the lost license revenue.<br/><br/><strong>1. Open releases</strong><br/><br/>Also known as code drops. You develop the software internally, but you make your release available as open source to everyone who's interested. Allows you to play the "open source" card in marketing, and makes for a great loss leader for a "pro" or "enterprise" version with a higher price tag. And no changes are needed from more traditional closed source development processes. 
Unfortunately your users don't have much of an incentive to get involved in the development unless they decide to fork your codebase, which usually isn't what you'd want.<br/><br/><strong>2. Open development</strong><br/><br/>Making it easy for your users to get truly involved in your project requires changes in the way you approach development. You'll need to open up your source repositories, issue trackers and other tools, and make it easy for people to interact directly with your developers instead of going through levels of support personnel. Do that, and you'll start receiving all sorts of contributions like bug reports, patches, new ideas, documentation, support, advocacy and sales leads for free. You can even allow trusted contributors to commit their changes directly to your codebase without losing control of the project.<br/><br/><strong>3. Open community</strong><br/><br/>Control, or the illusion of it, is a double-edged sword. If you're the "owner" of the project, why should others invest heavily in developing or supporting "your" code? To avoid this inherent limitation and to unlock the full potential of the open source community, you'll need to let go of the idea of the project being yours. Instead you're just as much a user and a contributor to the project as everyone else, with no special privileges. The more you contribute, the more you get to influence the direction of the project. This is the secret sauce of most truly successful and sustainable open source projects, and it's also a key ingredient of the <a href="http://www.apache.org/foundation/how-it-works.html">Apache Way</a>.<br/><br/><strong>So what's the right way?</strong><br/><br/>There's no single best way to do open (or closed) source, and the right model for your project depends on many factors like your business strategy and environment. The right model can even vary between different codebases within the same company. 
For example in the "open core" model you increase the level of innovation in and adoption of your core technologies by open sourcing them (or using existing open source components), but you make money and maintain your competitive edge through closed source add-ons or full layers on top of the open core. This is the model we've been using quite successfully at <a href="http://www.day.com/">Day</a> (now a part of <a href="http://www.adobe.com/">Adobe</a>).<br/><br/>If you've decided to go open source and you don't have a strong need to maintain absolute control over your codebase (like I suppose <a href="http://www.oracle.com/">Oracle</a> now has over the <a href="http://openjdk.java.net/">OpenJDK</a>!), I would recommend going all the way to the open community model. It can be a tough cultural change and often requires changes in your existing development processes and practices, but the payback can be huge. In military terms the community can act as a force multiplier not just for your developers, but also for the QA and support personnel and often even your sales and marketing teams!<br/><br/>If you're interested in pursuing the open community model as described above, the <a href="http://incubator.apache.org/">Apache Incubator</a> is a great place to start!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com6tag:blogger.com,1999:blog-1873778574973201433.post-37320304405313661462010-11-01T15:54:00.000-07:002014-10-29T18:02:43.539-07:00Chongqing on the rise"The largest city you've never heard about." That's how the <a href="http://www.foreignpolicy.com/">Foreign Policy magazine</a> labeled <a href="http://en.wikipedia.org/wiki/Chongqing">Chongqing</a> in a <a href="http://www.foreignpolicy.com/articles/2010/08/16/chicago_on_the_yangtze">recent story</a> about the city. 
Today the Finnish television showed an <a href="http://yle.fi/ohjelmat/834652">interesting documentary</a> that centered on the same city, and I recall seeing it mentioned also in the <a href="http://www.economist.com/">Economist</a> recently. A sign of things to come?<br/><br/>I find it interesting that many of the above stories give the impression of Chongqing as a megacity of 30+ million people, when in fact (or at least according to Wikipedia) the urban population is "just" 5+ million people and a majority of the rest are farmers living in the surrounding areas that are administratively part of the city.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0tag:blogger.com,1999:blog-1873778574973201433.post-26083999314338209872010-08-26T10:12:00.000-07:002014-10-29T18:02:19.450-07:00Age discrimination with Clojure<a href="http://michid.wordpress.com/">Michael Dürig</a>, a colleague of mine and big fan of <a href="http://www.scala-lang.org/">Scala</a>, wrote a <a href="http://michid.wordpress.com/2010/08/24/so-scala-is-too-complex/" title="So Scala is too complex?">nice post</a> about the relative complexity of Scala and Java.<br/><br/>Such comparisons are of course highly debatable, as seen in the comments that Michi's post sparked, but for the fun of it I wanted to see what the equivalent code would look like in <a href="http://clojure.org/">Clojure</a>, my favourite post-Java language.<br/><br/>[sourcecode language="clojure"]<br/>(use '[clojure.contrib.seq :only (separate)])<br/><br/>(defstruct person :name :age)<br/><br/>(def persons<br/> [(struct person "Boris" 40)<br/> (struct person "Betty" 32)<br/> (struct person "Bambi" 17)])<br/><br/>(let [[minors majors] (separate #(<= (% :age) 18) persons)]<br/> (println minors)<br/> (println majors))<br/>[/sourcecode]<br/><br/>The output is:<br/><br/>[sourcecode language="clojure"]<br/>({:name Bambi, :age 17})<br/>({:name Boris, :age 40} {:name Betty, :age 32})<br/>[/sourcecode]<br/><br/>I 
guess the consensus among post-Java languages is that features like JavaBean-style structures and functional collection algorithms should either be a built-in part of the language or at least trivially implementable in supporting libraries.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com5tag:blogger.com,1999:blog-1873778574973201433.post-61153335133035601372010-07-28T06:54:00.000-07:002014-10-29T18:02:19.332-07:00Open Source at Adobe?The <a title="Adobe to Acquire Day Software" href="http://www.adobe.com/aboutadobe/pressroom/pressreleases/201007/072810AdobetoAcquireDaySoftware.html">news</a> is just in about <a href="http://www.adobe.com/">Adobe</a> being set to acquire <a href="http://www.day.com/">Day Software</a> (see also the <a href="http://www.day.com/day/en/company/adobefaq.html">FAQ</a>). Assuming the deal goes through, it looks like I'll be working for Adobe by the end of this year. I'm an open source developer, so I'm looking forward to <a title="Will Adobe See the Light (of Day)?" href="http://www.computerworlduk.com/community/blogs/index.cfm?entryid=3098&blogid=14">finding out</a> how committed Adobe is to supporting the <a title="Day Awarded for Open Source Support" href="http://www.day.com/day/en/company/news_events/press_releases/day_awarded_for_opensourcesupport.html">open development model</a> we're using for many parts of Day products.<br/><br/>The <a href="http://www.computerworlduk.com/community/blogs/index.cfm?entryid=3098&blogid=14#tsb">first comments</a> from <a href="http://twitter.com/erikdlarson">Erik Larson</a>, a senior director of product management and strategy at Adobe, seem promising and he also <a href="http://twitter.com/erikdlarson/status/19729843366">asked</a> what the deal should mean for open source. 
This is my response from the perspective of the open source projects I'm involved in.<br/><br/>First and foremost I'm looking forward to continuing the open and standards-based development of our key technologies like <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> and <a href="http://sling.apache.org/">Apache Sling</a>. There's no way we'd be able to maintain the current level of <a href="http://grep.codeconsult.ch/2010/07/02/open-innovation-in-software-means-open-source-2/">innovation and productivity</a> in these key parts of our product infrastructure without our symbiotic relationship with the open source community.<br/><br/>Second, I'm hoping that our experience and involvement with open source projects will help Adobe better interact with the various open source efforts that leverage Adobe standards and technologies like <a title="Extensible Metadata Platform" href="http://www.adobe.com/products/xmp/">XMP</a>, <a href="http://www.adobe.com/devnet/pdf/">PDF</a> and <a href="http://www.adobe.com/devnet/flv/">Flash</a>. The Apache Software Foundation is a home to a <a title="Digital Media at Apache" href="http://jukkaz.wordpress.com/2007/11/16/digital-media-at-apache/">growing collection</a> of digital media projects like <a title="Apache PDFBox" href="http://pdfbox.apache.org/">PDFBox</a>, <a title="Apache FOP" href="http://xmlgraphics.apache.org/fop/">FOP</a>, <a title="Apache Tika" href="http://tika.apache.org/">Tika</a>, <a title="Apache Batik" href="http://xmlgraphics.apache.org/batik/">Batik</a> and <a title="Apache Commons Sanselan" href="http://commons.apache.org/sanselan/">Sanselan</a>, all of which are in one way or another related to Adobe's business. For example as a committer and release manager of the <a href="http://pdfbox.apache.org/">Apache PDFBox</a> project I'd much appreciate better access to Adobe's deep technical PDF know-how. 
Similarly, in <a href="http://tika.apache.org/">Apache Tika</a> we're considering using XMP as our metadata standard, and better access to and co-operation with the people behind Adobe's XMP toolkit SDK (see more below) would be highly valuable.<br/><br/>It would be great to see Adobe becoming more proactive in reaching out and supporting such grass-roots efforts that leverage their technologies. I've <a href="http://markmail.org/message/zwobxjatnngs2slt">dealt with</a> Adobe lawyers on such cases before with good results but it did take some time before I found the correct people to contact. Another area of improvement would be to make freely redistributable Adobe IP more easily accessible for external developers by pushing them out to central repositories like Maven Central, RubyGems or CPAN, for example like I did when making <a href="http://jira.codehaus.org/browse/MAVENUPLOAD-2485">PDF core font information</a> available on Maven Central.<br/><br/>Finally, it would be great to see Adobe going further in embracing an open development model for some of their codebases like the <a href="http://sourceforge.net/adobe/adobexmp/home/">XMP toolkit SDK</a> that they already release under open source licenses. I'd love to champion or mentor the effort, should Adobe be willing to bring the XMP toolkit to the <a href="http://incubator.apache.org/">Apache Incubator</a>!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com7tag:blogger.com,1999:blog-1873778574973201433.post-13984248171470565142010-05-27T15:48:00.000-07:002014-10-29T18:02:19.170-07:00Forking a JVMThe <a title="java.lang.Thread" href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Thread.html">thread model</a> of Java is pretty good and works well for many use cases, but every now and then you need a separate process for better isolation of certain computations. 
For example in <a href="http://tika.apache.org/">Apache Tika</a> we're <a title="TIKA-416: Out-of-process text extraction" href="https://issues.apache.org/jira/browse/TIKA-416">looking</a> for a way to avoid <a title="PDFBOX-209: java.lang.OutOfMemoryError while parsing pdf file" href="https://issues.apache.org/jira/browse/PDFBOX-209">OutOfMemoryErrors</a> or <a title="PDFBOX-511: JVM crash in PDColorSpaceInstance.createColor()" href="https://issues.apache.org/jira/browse/PDFBOX-511">JVM crashes</a> caused by <a title="Crash in the libcmm" href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6645513">faulty libraries</a> or <a title="TIKA-259: Safe parsing of droste.zip" href="https://issues.apache.org/jira/browse/TIKA-259">troublesome input data</a>.<br/><br/>In C and many other programming languages the straightforward way to achieve this is to <a href="http://www.opengroup.org/onlinepubs/000095399/functions/fork.html">fork</a> separate processes for such tasks. Unfortunately Java doesn't support the concept of a fork (i.e. creating a copy of a running process). Instead, all you can do is to start up a completely new process. To create a mirror copy of your current process you'd need to start a new JVM instance with a recreated classpath and make sure that the new process reaches a state where you can get useful results from it. This is quite complicated and typically depends on predefined knowledge of what your classpath looks like. Certainly not something for a simple library to do when deployed somewhere inside a complex application server.<br/><br/>But there's another way! The latest Tika trunk <a href="http://svn.apache.org/viewcvs?view=rev&rev=944945">now</a> <a href="http://svn.apache.org/viewcvs?view=rev&rev=948081">contains</a> an early version of a fork feature that allows you to start a new JVM for running computations with the classes <em>and</em> data that you have in your current JVM instance. 
This is achieved by copying a few supporting class files to a temporary directory and starting the "child JVM" with only those classes. Once started, the supporting code in the child JVM establishes a simple communication protocol with the parent JVM using the standard input and output streams. You can then send serialized data and processing agents to the child JVM, where they will be deserialized using a special class loader that uses the communication link to access classes and other resources from the parent JVM.<br/><br/>My code is still far from production-ready, but I believe I've already solved all the tricky parts and everything seems to work as expected. Perhaps this code should go into an <a href="http://commons.apache.org/">Apache Commons</a> component, since it seems like it would be useful also to other projects beyond Tika. Initial searching didn't bring up other implementations of the same idea, but I wouldn't be surprised if there are some out there. Pointers welcome.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com13tag:blogger.com,1999:blog-1873778574973201433.post-75339741480412593692010-05-25T18:12:00.000-07:002014-10-29T18:02:19.031-07:00Apache meritocracy vs. architectsCeki Gülcü recently wrote an <a title="The forces and vulnerabilities of the Apache model" href="http://ceki.blogspot.com/2010/05/forces-and-vulnerabilites-of-apache.html">interesting post</a> on the Apache community model and its vulnerability in cases where consensus can not be reached with reasonable effort. Also the discussion in the comments is interesting.<br/><br/>Ceki's done some amazing work especially on Java logging libraries, and his design vision shines through the code he's written. He's clearly at the high edge of the talent curve even among a community of highly qualified open source developers, which is why I'm not surprised that he dislikes the conservative nature of the consensus-based development model used at Apache. 
And the <a href="http://logging.apache.org/log4j/">log4j</a> history certainly is a sorry example of conservative forces more or less killing active development. In hindsight Ceki's decision to start the <a href="http://www.slf4j.org/">slf4j</a> and <a href="http://logback.qos.ch/">logback</a> projects may have been the best way out of the deadlock.<br/><br/>Software development is a complex task where best results are achieved when a clear sense of architecture and design is combined with hard work and attention to details. A consensus-based development model is great for the latter parts, but can easily suffer from the design-by-committee syndrome when dealing with architectural changes or other design issues. From this perspective it's no surprise that the Apache Software Foundation is considered a great place for maintaining stable projects. Even the <a href="http://incubator.apache.org/">Apache Incubator</a> is geared towards established codebases.<br/><br/>Even fairly simple refactorings like the one I'm <a href="https://issues.apache.org/jira/browse/JCR-890">currently proposing</a> for <a title="Apache Jackrabbit" href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> can require quite a bit of time-consuming consensus-building, which can easily frustrate people who are proposing such changes. In Jackrabbit I'm surrounded by highly talented people so I treat the consensus-building time as a chance to learn more and to challenge my own assumptions, but I can easily envision cases where this would just seem like extra effort and delay.<br/><br/>More extensive design work is almost always best performed mainly by a single person based on reviews and comments by other community members. Most successful open and closed source projects can trace their core architectures back to the work of a single person or a small tightly-knit team of like-minded developers. 
This is why many projects recognize such a "benevolent dictator" as the person with the final word on matters of project architecture.<br/><br/>The Apache practices for resolving vetos and other conflicts work well when dealing with localized changes where it's possible to objectively review two or more competing solutions to a problem, but in my experience they don't scale that well to larger design issues. The best documented practice for such cases that I've seen is the "<a href="http://incubator.apache.org/learn/rules-for-revolutionaries.html">Rules for revolutionaries</a>" post, but it doesn't cover the case where there are multiple competing visions for the future. Any ideas on how such situations should best be handled in Apache communities?Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com11tag:blogger.com,1999:blog-1873778574973201433.post-91943590929770266212010-05-14T04:47:00.000-07:002014-10-29T18:02:18.991-07:00Buzzword conference in JuneLike the <a href="http://www.lucene-eurocon.com/">Lucene conference</a> I <a href="http://jukkaz.wordpress.com/2010/04/21/lucene-conference-in-may/">mentioned</a> earlier, <a href="http://berlinbuzzwords.de/">Berlin Buzzwords 2010</a> is a new conference that fills in the space left by the decision not to organize an ApacheCon in Europe this year. Going beyond the Apache scope, Berlin Buzzwords is a conference for all things related to scalability, storage and search. 
Some of the key projects in this space are <a title="Apache Hadoop" href="http://hadoop.apache.org/">Hadoop</a>, <a title="Apache CouchDB" href="http://couchdb.apache.org/">CouchDB</a> and <a title="Apache Lucene" href="http://lucene.apache.org/">Lucene</a>.<br/><br/><a href="http://berlinbuzzwords.de/"><img class="aligncenter size-full wp-image-315" title="Berlin Buzzwords 2010" src="http://jukkaz.files.wordpress.com/2010/05/buzzpoint_logo.png" alt="" width="450" height="118" /></a><br/><br/>I'll be there to make a case for hierarchical databases (including <a title="JSR 283: Content Repository for Java Technology API Version 2.0" href="http://jcp.org/en/jsr/detail?id=283">JCR</a> and <a title="Apache Jackrabbit" href="http://jackrabbit.apache.org/">Jackrabbit</a>) and to present the <a href="http://tika.apache.org/">Apache Tika</a> project. The abstracts of my talks are:<br/><p style="padding-left:30px;"><strong>The return of the hierarchical model</strong></p><br/><p style="padding-left:30px;">After its introduction the relational model quickly replaced the network and hierarchical models used by many early databases, but the hierarchical model has lived on in file systems, directory services, XML and many other domains. There are many cases where the features of the hierarchical model fit the needs of modern use cases and distributed deployments better than the relational model, so it's a good time to reconsider the idea of a general-purpose hierarchical database.</p><br/><br/><p style="padding-left:30px;">The first part of this presentation explores the features that differentiate hierarchical databases from relational databases and NoSQL alternatives like document databases and distributed key-value stores. 
Existing hierarchical database products like XML databases, LDAP servers and advanced filesystems are reviewed and compared.</p><br/><br/><p style="padding-left:30px;">The second part of the presentation introduces the Content Repository for Java Technology (JCR) standard as a modern take on standardizing generic hierarchical databases. We also look at Apache Jackrabbit, the open source JCR reference implementation, and how it implements the hierarchical model.</p><br/><br/>and:<br/><br/><p style="padding-left:30px;"><strong>Text and metadata extraction with Apache Tika</strong></p><br/><p style="padding-left:30px;">Apache Tika is a toolkit for extracting text and metadata from digital documents. It's the perfect companion to search engines and any other applications where it's useful to know more than just the name and size of a file. Powered by parser libraries like <a href="http://poi.apache.org/">Apache POI</a> and <a title="Apache PDFBox" href="http://pdfbox.apache.org/">PDFBox</a>, Tika offers a simple and unified way to access content in dozens of document formats.</p><br/><p style="padding-left:30px;">This presentation introduces Apache Tika and shows how it's being used in projects like Apache Solr and Apache Jackrabbit. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs. The presentation also summarizes the key characteristics of the more widely used file formats and metadata standards, and shows how Tika can help deal with that complexity.</p><br/>I hear there are still some early bird <a href="http://twitter.com/hadoopberlin/status/13911230201">tickets available</a>. 
See you in Berlin!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com2tag:blogger.com,1999:blog-1873778574973201433.post-30010795423555099082010-05-14T02:58:00.000-07:002014-10-29T18:02:18.926-07:00Commit early, commit often!A <a href="http://svn.apache.org/viewvc?view=revision&revision=943816">huge commit</a> was made in a log4j branch yesterday. The followup discussion:<br/><br/><a href="http://markmail.org/message/qv7g2rsqe55gbikw">Comment</a>:<br/><blockquote><br/><p style="padding-left:30px;"><em>"I haven't had a chance to review the rest of the commit, but it seems like a substantial amount of work that was done in isolation. While things are still fresh, can you walk through the whats in this thing and the decisions that you made."</em></p><br/></blockquote><br/><a href="http://markmail.org/message/lrlqa6hqfaypfage">Reply</a>:<br/><blockquote><br/><p style="padding-left:30px;"><em>"I didn't want to commit code until I had the core of something that actually functioned. I struggled for a couple of weeks over how to attack XMLConfiguration. [...] See below for what I came up with."</em></p><br/></blockquote><br/>Followed by ten bullet points about the changes made. Unfortunately the only thing our version control system now knows about these changes is "First version".Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com4tag:blogger.com,1999:blog-1873778574973201433.post-28977045250210856122010-04-21T15:23:00.000-07:002014-10-29T18:02:18.872-07:00Lucene conference in MayThis year there is no <a href="http://eu.apachecon.com/c/aceu2009/">ApacheCon Europe,</a> but a number of more focused events related to projects at Apache and elsewhere are showing up to fill the space.<br/><br/>The first one is <a href="http://lucene-eurocon.org/">Apache Lucene EuroCon</a>, a dedicated Lucene and Solr user conference on 18-21 May in Prague. 
That's the place to be if you're in Europe and interested in Lucene-based search technology (or want to stop by for the <a href="http://www.ceskypivnifestival.cz/en/">beer festival</a>). I'll be there presenting <a href="http://lucene.apache.org/tika/">Apache Tika</a>, and the abstract of my presentation is:<br/><blockquote><br/><p style="padding-left:30px;"><em>Apache Tika is a toolkit for extracting text and metadata from digital documents. It's the perfect companion to search engines and any other applications where it's useful to know more than just the name and size of a file. Powered by parser libraries like <a href="http://poi.apache.org/">Apache POI</a> and <a href="http://pdfbox.apache.org/">PDFBox</a>, Tika offers a simple and unified way to access content in dozens of document formats.</em></p><br/><p style="padding-left:30px;"><em>This presentation introduces Apache Tika and shows how it's being used in projects like <a href="http://lucene.apache.org/solr/">Apache Solr</a> and <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a>. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs. The presentation also summarizes the key characteristics of the more widely used file formats and metadata standards, and shows how Tika can help deal with that complexity.</em></p><br/></blockquote><br/>The rest of the <a href="http://lucene-eurocon.org/agenda.html">conference program</a> is also now available. See you there!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com3tag:blogger.com,1999:blog-1873778574973201433.post-37884542352118076632010-04-12T09:06:00.000-07:002014-10-29T18:02:18.834-07:00"SIMPLE".toLowerCase() is simple, right?It turns out that <code>"SIMPLE".toLowerCase().equals("simple")</code> is not true if your default locale is Turkish, but your code is written in English. 
Turkish has two distinct "i" letters, one dotted (İ/i) and one dotless (I/ı), so in a Turkish locale the uppercase "I" is lowercased to the dotless "ı" and the comparison above fails. The fix is to write the expression either as <code>"SIMPLE".toLowerCase(Locale.ENGLISH).equals("simple")</code> or even better as <code>"SIMPLE".equalsIgnoreCase("simple")</code>.<br/><br/>I just stumbled on this issue with <a href="http://lucene.apache.org/tika/">Apache Tika</a> (see <a href="https://issues.apache.org/jira/browse/TIKA-404">TIKA-404</a>), and it seems like I'm <a href="http://java.sys-con.com/node/46241">not</a> <a href="http://www.mattryall.net/blog/2009/02/the-infamous-turkish-locale-bug">the</a> <a href="http://jira.atlassian.com/browse/CONF-5931">only</a> <a href="https://issues.apache.org/jira/browse/COLLECTIONS-294">one</a>.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com2tag:blogger.com,1999:blog-1873778574973201433.post-55835998175457735192010-03-29T17:31:00.000-07:002014-10-29T18:02:18.818-07:00True size of FinlandWhenever you see a map, chances are that it uses the <a href="http://en.wikipedia.org/wiki/Mercator_projection">Mercator projection</a>. It's a fine enough projection, especially on a local scale, but I've always disliked the way it makes places that are far from the equator seem much larger than they really are. Since I've lived most of my life in Finland (i.e. above 60° N or as high up north as Alaska), I find that this distortion heavily affects my ability to accurately estimate distances in other parts of the world even when I'm well aware of this problem.<br/><br/>To illustrate this issue, I've constructed the image below that shows how Finland compares to Central Europe and Southern China (the areas I'm most interested in) in the Mercator projection and the <a href="http://en.wikipedia.org/wiki/Goode_homolosine_projection">Goode homolosine projection</a>, which accurately represents the relative areas of any two places on the earth. 
The difference is really quite striking:<br/><br/><img class="aligncenter size-full wp-image-296" title="Mercator vs Goode homolosine" src="http://jukkaz.files.wordpress.com/2010/03/size.png" alt="" width="430" height="416" /><br/><br/>I'm considering purchasing a poster with such an <a href="http://en.wikipedia.org/wiki/Map_projection#Equal-area">equal-area</a> world map and hanging it on a wall somewhere I can see it every day. That way I could perhaps overcome the systematic error that the Mercator projection has taught me.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com1tag:blogger.com,1999:blog-1873778574973201433.post-5948376237512371272010-01-29T15:40:00.000-08:002014-10-29T18:02:18.804-07:00The new BASICI'm seeing <a title="Mark Pilgrim: Tinkerer's Sunset" href="http://diveintomark.org/archives/2010/01/29/tinkerers-sunset">many</a> <a title="Tim Bray: Nothing Creative" href="http://www.tbray.org/ongoing/When/201x/2010/01/27/iPad">posts</a> that worry about computing devices like iPhones and the new iPad preventing people from having direct control over the hardware. Mark is telling us about a Ctrl+Reset and a BASIC prompt. Nowadays you get started with the following on an HTML page:<br/><pre> <script type="text/javascript"><br/> document.write("Hello, World!");<br/> </script></pre><br/>And you can do <em>anything</em>! 
Don't tell me the days of tinkering are over.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0tag:blogger.com,1999:blog-1873778574973201433.post-37869493081324593082009-12-07T02:17:00.000-08:002014-10-29T18:02:18.789-07:00Daily Shoot, week 3Another week of <a href="http://twitter.com/dailyshoot">@dailyshoot</a>:<br/><br/><p style="text-align:center;"><a href="http://www.flickr.com/photos/jlz/4147320799/"><img class="alignnone" style="vertical-align:middle;" title="Home Alone" src="http://farm3.static.flickr.com/2602/4147320799_99dd07b1e5_m.jpg" alt="" width="160" height="240" /></a> <a href="http://www.flickr.com/photos/jlz/4151305182/"><img class="alignnone" style="vertical-align:middle;" title="My Door" src="http://farm3.static.flickr.com/2598/4151305182_9714bd8a99_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4153338693/"><img class="alignnone" style="vertical-align:middle;" title="Playing with Food" src="http://farm3.static.flickr.com/2508/4153338693_5194ec93ec_m.jpg" alt="" width="160" height="240" /></a> <a href="http://www.flickr.com/photos/jlz/4155752365/"><img class="alignnone" style="vertical-align:middle;" title="Velo" src="http://farm3.static.flickr.com/2575/4155752365_a7d4467902_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4158486726/"><img class="alignnone" style="vertical-align:middle;" title="Walking in the Air" src="http://farm3.static.flickr.com/2627/4158486726_5ebd746f1a_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4161185154/"><img class="alignnone" style="vertical-align:middle;" title="Big Wheel" src="http://farm3.static.flickr.com/2485/4161185154_019562aa99_m.jpg" alt="" width="160" height="240" /></a> <a href="http://www.flickr.com/photos/jlz/4164652474/"><img class="alignnone" style="vertical-align:middle;" title="Recycled" 
src="http://farm5.static.flickr.com/4003/4164652474_3eaa68c975_m.jpg" alt="" width="240" height="160" /></a></p><br/><br/>PS. Check out the updated <a href="http://dailyshoot.com/">dailyshoot.com</a> web site.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0tag:blogger.com,1999:blog-1873778574973201433.post-80372578475956039502009-11-29T09:38:00.000-08:002014-10-29T18:02:18.774-07:00Daily Shoot, week 2As I <a href="http://jukkaz.wordpress.com/2009/11/23/daily-shoot-week-1/">mentioned</a> last week, I've been following <a href="http://twitter.com/dailyshoot">@dailyshoot</a> for a series of daily photo assignments. Here's what I shot this week:<br/><br/><p style="text-align:center;"><a href="http://www.flickr.com/photos/jlz/4127988621/"><img class="alignnone" title="Rugged Terrain" src="http://farm3.static.flickr.com/2643/4127988621_1ff9e7f58a_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4131789816/"><img class="alignnone" title="Droplets" src="http://farm3.static.flickr.com/2526/4131789816_bbcb836d58_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4133945303/"><img class="alignnone" title="The Final Frontier" src="http://farm3.static.flickr.com/2501/4133945303_78bcf77558_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4137339865/"><img class="alignnone" title="Tools of Distance" src="http://farm3.static.flickr.com/2789/4137339865_5774d6e308_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4138818696/"><img class="alignnone" title="Our Separate Ways" src="http://farm3.static.flickr.com/2579/4138818696_4caec3921c_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4141391776/"><img class="alignnone" title="Learning to Dance" src="http://farm3.static.flickr.com/2544/4141391776_99f809a30b_m.jpg" alt="" width="240" height="160" /></a> <a 
href="http://www.flickr.com/photos/jlz/4143462625/"><img class="alignnone" title="The Green Field" src="http://farm3.static.flickr.com/2561/4143462625_f698920c52_m.jpg" alt="" width="240" height="160" /></a></p>Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0tag:blogger.com,1999:blog-1873778574973201433.post-90119031202181900942009-11-28T14:18:00.000-08:002014-10-29T18:02:18.759-07:00Sling over HTTPA few days ago I <a href="http://jukkaz.wordpress.com/2009/11/24/jackrabbit-over-http/">posted</a> about <a href="http://jackrabbit.apache.org/" title="Apache Jackrabbit">Jackrabbit</a>, and now it's time to follow up with <a href="http://sling.apache.org/" title="Apache Sling">Sling</a> as a means of accessing a <a title="Content Repository over HTTP" href="http://jukkaz.wordpress.com/2009/11/18/content-repository-over-http/">content repository over HTTP</a>. Apache Sling is a web framework based on JCR content repositories like Jackrabbit, and among other things it adds some pretty nice ways of accessing and manipulating content over HTTP.<br/><br/>The easiest way to get started with Sling is to download the "Sling Standalone Application" from the <a href="http://sling.apache.org/site/downloads.cgi">Sling downloads page</a>. Unpack the distribution package and start the Sling application with "java -jar org.apache.sling.launchpad.app-5-incubator.jar". Like Jackrabbit, Sling can by default be accessed at <a href="http://localhost:8080/">http://localhost:8080/</a>. There's a <a href="http://sling.apache.org/site/discover-sling-in-15-minutes.html">15-minute tutorial</a> that you can check out to learn more about Sling.<br/><br/>Since Sling comes with an embedded Jackrabbit repository, it also supports much of the WebDAV functionality covered in my previous post. 
Instead of rehashing those points, this post takes a look at the additional HTTP content access features in Sling.<br/><br/><strong>CR1: Create a document</strong><br/><br/>Like with Jackrabbit, all documents in Sling have a path that is used to identify and locate the document. Sling solves the problem of having to come up with the document name by supporting a virtual "star resource" that'll automatically generate a unique name for a new document. Thus instead of having to think of a URL like "http://localhost:8080/hello" in advance, the new document can be created by simply posting to the star resource at "http://localhost:8080/*".<br/><br/>The <a href="http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html">Sling POST servlet</a> is a pretty versatile tool, and can be used to perform many content manipulation operations using normal HTTP POST requests and the application/x-www-form-urlencoded format used by normal HTML forms. With the POST servlet, the example document can be created like this:<br/><br/><pre style="margin:20px;">$ curl --data 'title=Hello, World!' --data 'date=2009-11-17T12:00:00.000Z' \<br/> --data 'date@TypeHint=Date' --user admin:admin \<br/> http://localhost:8080/*<br/></pre><br/><br/>The 201 Created response will contain a Location header that points to the newly created document. In this case the returned URL is "http://localhost:8080/hello_world_" based on some document title heuristics included in Sling. 
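<br/><br/>For clients that can't shell out to curl, the same exchange can be sketched in Java. This is only an illustration under a few assumptions: Java 11's <code>java.net.http</code> client, the default admin:admin credentials used above, and made-up class and method names. Note that curl's <code>--data</code> sends the string as given, whereas this sketch percent-encodes the field names and values.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SlingStarResourceSketch {

    // Builds an application/x-www-form-urlencoded body from the given fields.
    static String formEncode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // POSTs the fields to the star resource and returns the Location header
    // of the 201 Created response. Hypothetical helper, not a Sling API.
    static String createDocument(String starUrl, String userAndPassword,
            Map<String, String> fields) throws IOException, InterruptedException {
        String credentials = Base64.getEncoder()
                .encodeToString(userAndPassword.getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(starUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Authorization", "Basic " + credentials)
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        HttpResponse<Void> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.discarding());
        if (response.statusCode() != 201) {
            throw new IllegalStateException("Expected 201 Created, got " + response.statusCode());
        }
        return response.headers().firstValue("Location")
                .orElseThrow(() -> new IllegalStateException("No Location header in response"));
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Hello, World!");
        fields.put("date", "2009-11-17T12:00:00.000Z");
        fields.put("date@TypeHint", "Date");
        System.out.println(formEncode(fields));
        // Only contact a server when a star resource URL is given,
        // e.g. http://localhost:8080/* for a local Sling instance.
        if (args.length > 0) {
            System.out.println(createDocument(args[0], "admin:admin", fields));
        }
    }
}
```

Without arguments the program only prints the encoded form body, so it can be tried without a running Sling instance.<br/><br/>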
If you run the command again you'll get a different URL since the Sling star resource will automatically avoid overwriting existing content.<br/><br/>Pros:<br/><div><br/><ul><br/> <li>A single standard POST request is enough</li><br/> <li>The HTML form format is used for the POST body</li><br/> <li>Automatically generated clean and readable document URL</li><br/></ul><br/></div><br/><div>Cons:</div><br/><div><br/><ul><br/> <li>The star resource URL pattern is fixed and creates an unnecessarily tight binding between the client and the server</li><br/></ul><br/></div><br/><strong>CR2: Read a document</strong><br/><br/>Sling contains multiple ways of accessing the document content in different renderings. In fact much of the power of Sling comes from the extensive support for rendering underlying content in various different and easily customizable ways.<br/><br/>Unfortunately at least the latest 5-incubator version of the Sling Application doesn't support any reasonable default rendering at the previously returned document URL. 
The client needs to explicitly know to add a ".json" or ".xml" suffix to the document URL to get a JSON or XML rendering of the document.<br/><br/><pre style="margin:30px;">$ curl http://localhost:8080/hello_world_.json<br/>{<br/> "title": "Hello, World!",<br/> "date": "Tue Nov 17 2009 12:00:00 GMT+0100",<br/> "jcr:primaryType": "nt:unstructured"<br/>}<br/>$ curl http://localhost:8080/hello_world_.xml<br/><?xml version="1.0" encoding="UTF-8"?><br/><hello_world_ xmlns:fn="http://www.w3.org/2005/xpath-functions"<br/> xmlns:fn_old="http://www.w3.org/2004/10/xpath-functions"<br/> xmlns:xs="http://www.w3.org/2001/XMLSchema"<br/> xmlns:jcr="http://www.jcp.org/jcr/1.0"<br/> xmlns:mix="http://www.jcp.org/jcr/mix/1.0"<br/> xmlns:sv="http://www.jcp.org/jcr/sv/1.0"<br/> xmlns:sling="http://sling.apache.org/jcr/sling/1.0"<br/> xmlns:rep="internal"<br/> xmlns:nt="http://www.jcp.org/jcr/nt/1.0"<br/> jcr:primaryType="nt:unstructured"<br/> date="2009-11-17T12:00:00.000+01:00"<br/> title="Hello, World!"/><br/></pre><br/><br/>The JCR document view format is used for the XML rendering.<br/><br/>Pros:<br/><ul><br/> <li>A single GET request is enough</li><br/> <li>Both the JSON and XML formats are easy to consume</li><br/></ul><br/>Cons:<br/><ul><br/> <li>Simply GETting the document URL doesn't return anything useful</li><br/> <li>The ".json" and ".xml" URL patterns create an unnecessary binding between the client and the server</li><br/> <li>Neither rendering contains property type information</li><br/> <li>The XML rendering contains unnecessary namespace declarations</li><br/></ul><br/><br/><strong>CR3: Update a document</strong><br/><br/>The Sling POST servlet also supports document updates, so we can just POST the updated properties to the document URL:<br/><br/><pre style="margin:30px;">$ curl --data 'history=Document date updated' \<br/> --data 'date=2009-11-18T12:00:00.000Z' \<br/> --data 'date@TypeHint=Date' --user admin:admin \<br/> 
http://localhost:8080/hello_world_</pre><br/><br/>Pros:<br/><ul><br/> <li>A single standard POST request is enough</li><br/> <li>The HTML form format is used for the POST body</li><br/></ul><br/>Cons:<br/><ul><br/> <li>None.</li><br/></ul><br/><br/><strong>CR4: Delete a document</strong><br/><br/>You can either use the special ":operation=delete" feature of the Sling POST servlet or a standard DELETE request to delete a document:<br/><br/><pre style="margin:30px;">$ curl --data ':operation=delete' --user admin:admin \<br/> http://localhost:8080/hello_world_<br/>$ curl --request DELETE --user admin:admin \<br/> http://localhost:8080/hello_world_</pre><br/><br/>Pros:<br/><ul><br/> <li>A standard DELETE or POST request is all that's needed</li><br/></ul><br/>Cons:<br/><ul><br/> <li>None.</li><br/></ul>Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com1tag:blogger.com,1999:blog-1873778574973201433.post-29933455687268290302009-11-23T18:25:00.000-08:002014-10-29T18:02:18.635-07:00Jackrabbit over HTTPLast week I <a title="Content Repository over HTTP" href="http://jukkaz.wordpress.com/2009/11/18/content-repository-over-http/">posted</a> a simple set of operations that a "RESTful content repository" should support over HTTP. Here's a quick look at how <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> meets this challenge.<br/><br/>To get started I first downloaded the standalone jar file from the <a href="http://jackrabbit.apache.org/downloads.html">Jackrabbit downloads page</a>, and started it with "java -jar jackrabbit-standalone-1.6.0.jar". This is a quick and easy way to get a Jackrabbit repository up and running. Just point your browser to <a href="http://localhost:8080/">http://localhost:8080/</a> to check that the repository is there.<br/><br/>Jackrabbit comes with a built-in advanced WebDAV feature that gives you pretty good control over your content. 
The root URL for the default workspace is <a href="http://localhost:8080/server/default/jcr:root/">http://localhost:8080/server/default/jcr:root/</a> and by default Jackrabbit grants full write access if you specify any username and password.<br/><br/>Note that Jackrabbit also has another, filesystem-oriented WebDAV feature that you can access at <a href="http://localhost:8080/repository/default/">http://localhost:8080/repository/default/</a>. This entry point is great for dealing with simple things like normal files and folders, but for more fine-grained content you'll want to use the advanced WebDAV feature as outlined below.<br/><br/><strong>CR1: Create a document</strong><br/><br/>All documents (nodes) in Jackrabbit have a pathname just like files in a normal file system. Thus to create a new document, we first need to come up with a name and a location for it. Let's call the example document "hello" and place it at the root of the default workspace, so we can later address it at the path "/hello". The related WebDAV URL is http://localhost:8080/server/default/jcr:root/hello/.<br/><br/>You can use the MKCOL method to create a new node in Jackrabbit. An MKCOL request without a body will create a new empty node, but you can specify the initial contents of the node by including a snippet of JCR system view XML that describes your content. In our case we want to specify the "title" and "date" properties. 
Note that JCR does not support date-only properties, so we need to store the date value as a more accurate timestamp.<br/><br/>The full request looks like this:<br/><br/><pre style="margin:20px;">$ curl --request MKCOL --data @- --user name:pass \<br/> http://localhost:8080/server/default/jcr:root/hello/ <<END<br/><sv:node sv:name="hello" xmlns:sv="http://www.jcp.org/jcr/sv/1.0"><br/> <sv:property sv:name="message" sv:type="String"><br/> <sv:value>Hello, World!</sv:value><br/> </sv:property><br/> <sv:property sv:name="date" sv:type="Date"><br/> <sv:value>2009-11-17T12:00:00.000Z</sv:value><br/> </sv:property><br/></sv:node><br/>END</pre><br/><br/>The resulting document is available at the URL we already constructed above, i.e. http://localhost:8080/server/default/jcr:root/hello/.<br/><br/>Pros:<br/><div><br/><ul><br/> <li>A single standard WebDAV MKCOL request is enough</li><br/> <li>The standard JCR system view XML format is used for the MKCOL body</li><br/> <li>The XML format is easy to produce</li><br/></ul><br/></div><br/><div>Cons:</div><br/><div><br/><ul><br/> <li>We need to decide the name and location of the document before it can be created</li><br/> <li>The name of the document is duplicated, once in the URL and once in the sv:name attribute</li><br/> <li>The date property must be specified down to the millisecond</li><br/> <li>While standardized, the MKCOL method is not as well known as PUT or POST</li><br/> <li>While standardized, the JCR system view format is not as well known as JSON, Atom or generic XML</li><br/> <li>The system view XML format is quite verbose</li><br/></ul><br/></div><br/><strong>CR2: Read a document</strong><br/><br/>Now that the document is created, we can read it with a standard GET request:<br/><br/><pre style="margin:30px;">$ curl --user name:pass http://localhost:8080/server/default/jcr:root/hello/<br/><?xml version="1.0" encoding="UTF-8"?><br/><sv:node sv:name="hello"<br/> 
xmlns:fn="http://www.w3.org/2005/xpath-functions"<br/> xmlns:fn_old="http://www.w3.org/2004/10/xpath-functions"<br/> xmlns:xs="http://www.w3.org/2001/XMLSchema"<br/> xmlns:jcr="http://www.jcp.org/jcr/1.0"<br/> xmlns:mix="http://www.jcp.org/jcr/mix/1.0"<br/> xmlns:sv="http://www.jcp.org/jcr/sv/1.0"<br/> xmlns:rep="internal"<br/> xmlns:nt="http://www.jcp.org/jcr/nt/1.0"><br/> <sv:property sv:name="jcr:primaryType" sv:type="Name"><br/> <sv:value>nt:unstructured</sv:value><br/> </sv:property><br/> <sv:property sv:name="date" sv:type="Date"><br/> <sv:value>2009-11-17T12:00:00.000Z</sv:value><br/> </sv:property><br/> <sv:property sv:name="message" sv:type="String"><br/> <sv:value>Hello, World!</sv:value><br/> </sv:property><br/></sv:node></pre><br/><br/>Note that the result includes the standard jcr:primaryType property that is always included in all JCR nodes. Also all namespaces registered in the repository are included even though strictly speaking they add little value to the response.<br/><br/>Pros:<br/><ul><br/> <li>A single GET request is enough</li><br/> <li>The XML format is easy to consume</li><br/></ul><br/>Cons:<br/><ul><br/> <li>The system view format is a bit verbose and generally not that well known</li><br/></ul><br/><br/><strong>CR3: Update a document</strong><br/><br/>The WebDAV feature in Jackrabbit does not support setting multiple properties in a single request, so we need to use separate requests for each property change. The easiest way to update a property is to PUT the new value to the property URL. The only tricky part is that unless the node type explicitly says otherwise the new value is by default stored as a binary stream. You need to specify a custom jcr-value/... 
content type to override that default.<br/><br/><pre style="margin:30px;">$ curl --request PUT --header "Content-Type: jcr-value/date" \<br/> --data "2009-11-18T12:00:00.000Z" --user name:pass \<br/> http://localhost:8080/server/default/jcr:root/hello/date<br/>$ curl --request PUT --header "Content-Type: jcr-value/string" \<br/> --data "Document date updated" --user name:pass \<br/> http://localhost:8080/server/default/jcr:root/hello/history</pre><br/><br/>GETting the document after these changes will give you the updated property values.<br/><br/>Pros:<br/><ul><br/> <li>Standard PUT requests are used</li><br/> <li>No XML or other wrapper format needed, just send the raw value as the request body</li><br/></ul><br/>Cons:<br/><ul><br/> <li>More than one request needed</li><br/> <li>Need to use non-standard jcr-value/... media types for non-binary values</li><br/></ul><br/><br/><strong>CR4: Delete a document</strong><br/><br/>Deleting a document is easy with the DELETE method:<br/><pre style="margin:30px;">$ curl --request DELETE --user name:pass \<br/> http://localhost:8080/server/default/jcr:root/hello/</pre><br/>That's it. Trying to GET the document after it's been deleted gives a 404 response, just as expected.<br/><br/>Pros:<br/><ul><br/> <li>A standard DELETE request is all that's needed</li><br/></ul><br/>Cons:<br/><ul><br/> <li>None.</li><br/></ul>Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com6tag:blogger.com,1999:blog-1873778574973201433.post-22677095851852781032009-11-22T16:12:00.000-08:002014-10-29T18:02:18.579-07:00Daily Shoot, week 1A week ago <a href="http://duncandavidson.com/">James Duncan Davidson</a> and <a href="http://clarkware.com/about.html">Mike Clark</a> launched <a href="http://twitter.com/dailyshoot">@dailyshoot</a>, a Twitter feed that posts daily photo assignments. 
The idea is to encourage people who want to learn photography to practice it every day with the help of a simple assignment that fits a single tweet. I'm following Duncan's blog, so I <a title="The Daily Shoot" href="http://blog.duncandavidson.com/2009/11/the-daily-shoot.html">found out</a> about Daily Shoot the day it was launched.<br/><br/>So far I've completed all the assignments and I've already learned quite a bit doing so. It's very interesting to see how other people interpret the same assignments. I avoid looking at other responses before completing an assignment so that I don't end up just copying someone else's approach. Once I'm done I look at what others have done for some nice insight into what I could have done differently. The process is quite educational.<br/><br/>Here's what I've shot this week:<br/><p style="text-align:center;"><a href="http://www.flickr.com/photos/jlz/4110418300/"><img class="alignnone" title="The Red Desert" src="http://farm3.static.flickr.com/2804/4110418300_4368b4b7dd_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4113281186/"><img class="alignnone" title="Martinsgasse" src="http://farm3.static.flickr.com/2762/4113281186_b80c2b5c50_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4115683738/"><img class="alignnone" title="Into the Unknown" src="http://farm3.static.flickr.com/2750/4115683738_6902197e9a_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4117124465/"><img class="alignnone" title="Satrap" src="http://farm3.static.flickr.com/2591/4117124465_f3250abbb6_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4119973735/"><img class="alignnone" title="Four" src="http://farm3.static.flickr.com/2773/4119973735_af899a27e9_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4122393762/"><img class="alignnone" title="Fier Franken 
fünfundachtzig" src="http://farm3.static.flickr.com/2622/4122393762_46642d7b60_m.jpg" alt="" width="240" height="160" /></a> <a href="http://www.flickr.com/photos/jlz/4125156460/"><img class="alignnone" title="Der Basler" src="http://farm3.static.flickr.com/2755/4125156460_2976ac927d_m.jpg" alt="" width="240" height="160" /></a></p><br/>You can click on the pictures for more background on each assignment and how I approached it. For more information on Daily Shoot, see the recently launched <a href="http://dailyshoot.com/">website</a>.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com1tag:blogger.com,1999:blog-1873778574973201433.post-17296280485558116292009-11-17T18:45:00.000-08:002014-10-29T18:02:18.436-07:00Content Repository over HTTPTwo weeks ago during the <a href="http://us.apachecon.com/c/acus2009/schedule/barcamp">BarCamp</a> at the <a href="http://us.apachecon.com/c/acus2009/">ApacheCon US</a> I chaired a short session titled "The RESTful Content Repository". The idea of the session was to discuss the various ways that existing content repositories support RESTful access over HTTP and to perhaps find some common ground from which a generic content repository protocol could be formulated.<br/><br/>The <a title="Representational State Transfer" href="http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm">REST architectural style</a> was generally accepted as a useful set of constraints for the architecture of distributed content-based applications, but as an architectural style it doesn't define what the bits on the wire should look like. This is what we set out to define with the <a title="Hypertext Transfer Protocol -- HTTP/1.1" href="http://tools.ietf.org/html/rfc2616">HTTP protocol</a> as a baseline. 
We didn't get too far, but see below for some collected thoughts and a useful set of "test cases" that I hope to use to further investigate this idea.<br/><br/><strong>Existing solutions</strong><br/><br/>Many existing content repositories and related products already support one or more HTTP-based access patterns: <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> exposes two slightly different WebDAV-based access points. <a href="http://sling.apache.org/">Apache Sling</a> adds the <a href="http://sling.apache.org/site/manipulating-content-the-slingpostservlet-servletspost.html">SlingPostServlet</a> and default JSON and XML renderings of content. <a href="http://couchdb.apache.org/">Apache CouchDB</a> uses JSON over HTTP as the primary access protocol. <a href="http://lucene.apache.org/solr/">Apache Solr</a> uses XML over HTTP. <a href="http://www.midgard-project.org/">Midgard</a> doesn't have a built-in HTTP binding for content, but makes it very easy to implement such bindings. This list just scratches the surface...<br/><br/>There are even existing generic protocols that match at least parts of what we wanted to achieve. <a href="http://tools.ietf.org/html/rfc2518">WebDAV</a> has been around for ten years already, but the way it extends HTTP with extra methods makes it harder to use with existing HTTP clients and libraries. The <a href="http://tools.ietf.org/html/rfc5023">AtomPub protocol</a> solves that issue, but being based on the <a href="http://tools.ietf.org/html/rfc4287">Atom format</a> and leaving much of the server behaviour undefined, AtomPub may not be the best solution for generic content repositories.<br/><br/><strong>Content repository operations over HTTP</strong><br/><br/>To better understand the needs and capabilities of existing solutions, we should come up with a simple set of content operations and find out if and how different systems support those operations over HTTP. The most basic such set of operations is CRUD, i.e. 
how to create, read, update, and delete a document, so let's start with that. I'm giving each operation a key (CRn, as in "Content Repository operation N") and a brief description of what's expected. In later posts I hope to explore how these operations can be implemented with <a href="http://curl.haxx.se/">curl</a> or some other simple HTTP client accessing various kinds of content repositories. I'm also planning to extend the set of required operations to cover features like search, linking, versioning, transactions, etc.<br/><br/><strong>CR1: Create a document</strong><br/><br/>Documents with simple properties like strings and dates are basic building blocks of all content applications. How can I create a new document with the following properties?<br/><ul><br/> <li>title = "Hello, World!" (string)</li><br/> <li>date = 2009-11-17 (date)</li><br/></ul><br/>At the end of this operation I should have a URL that I can use to access the created document.<br/><br/><strong>CR2: Read a document</strong><br/><br/>Given the URL of a document (see CR1), how do I read the properties of that document?<br/><br/>The retrieved property values should match the values given when the document was created.<br/><br/><strong>CR3: Update a document</strong><br/><br/>Given the URL of a document (see CR1), how do I update the properties of that document? 
For example, I want to update the existing date property and add a new string property:<br/><ul><br/> <li>date = 2009-11-18 (date)</li><br/> <li>history = "Document date updated" (string)</li><br/></ul><br/>When the document is read (see CR2) after this update, the retrieved information should contain the original title and the above updated date and history values.<br/><br/><strong>CR4: Delete a document</strong><br/><br/>Given the URL of a document (see CR1), how do I delete that document?<br/><br/>Once deleted, it should no longer be possible to read (see CR2) or update (see CR3) the document.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com12tag:blogger.com,1999:blog-1873778574973201433.post-11549675334735684912009-10-27T04:01:00.000-07:002014-10-29T18:02:18.375-07:00NoSQL interests<a href="http://www.nosqloakland.org/"><img class="alignright size-full wp-image-247" title="NoSQL Oakland" src="http://jukkaz.files.wordpress.com/2009/10/nosqloakland-small.png" alt="NoSQL Oakland" width="203" height="44" /></a>We're organizing a <a href="http://www.nosqloakland.org/">NoSQL meetup in Oakland</a> on Monday next week. In addition to helping set the meetup agenda, the "Topics you are interested in" question in the <a href="http://spreadsheets.google.com/viewform?formkey=dENwRmlTMlhGZ3lfclJqYW9hVGlkTHc6MA">sign up form</a> provides some interesting insight on the current interests of the NoSQL community. Here's a quick breakdown of the key terms distilled from the 88 signups we've received so far.<br/><br/>Note that the data is biased towards Apache projects due to the meetup being organized at <a href="http://us.apachecon.com/c/acus2009/">ApacheCon US 2009</a>.<br/><h2>Projects</h2><br/>The following open source projects were mentioned. 
The list is in alphabetical order, as the data set is too small to support any reasonable ordering by popularity.<br/><ul><br/> <li><a href="http://incubator.apache.org/cassandra/">Cassandra</a></li><br/> <li><a href="http://couchdb.apache.org/">CouchDB</a></li><br/> <li><a href="http://hadoop.apache.org/">Hadoop</a></li><br/> <li><a href="http://hadoop.apache.org/hbase/">HBase</a></li><br/> <li><a href="http://hadoop.apache.org/hdfs/">HDFS</a></li><br/> <li><a href="http://jackrabbit.apache.org/">Jackrabbit</a></li><br/> <li><a href="http://lucene.apache.org/">Lucene</a></li><br/> <li><a href="http://lucene.apache.org/mahout/">Mahout</a></li><br/> <li><a href="http://www.danga.com/memcached/">memcached</a></li><br/> <li><a href="http://www.mongodb.org/">MongoDB</a></li><br/> <li><a href="http://code.google.com/p/redis/">Redis</a></li><br/> <li><a href="http://riak.basho.com/">Riak</a></li><br/> <li><a href="http://code.google.com/p/scalaris/">Scalaris</a></li><br/> <li><a href="http://sling.apache.org/">Sling</a></li><br/> <li><a href="http://1978th.net/tokyocabinet/">Tokyo Cabinet</a></li><br/> <li><a href="http://project-voldemort.com/">Voldemort</a></li><br/></ul><br/><h2>Topics</h2><br/>Many responses were about the "big data" aspect of the NoSQL movement. Some frequent keywords: distributed storage, large transactional data, consistency, failover, availability, reliability, stability, failure detection, failed node replacement, (petabyte) scalability, consistency levels, storage technology, performance, benchmarks, optimization, backup and recovery, map/reduce.<br/><br/>Another common theme was the various database types and the NoSQL "development model". 
Keywords: document stores, key/value stores, consistent hashing, graph databases, object databases, persistent queues, content modeling, migration from the relational model, social graphs, streaming, software as a service, offline applications, full text search, natural language processing<br/><br/>Beyond the above big themes, I found it interesting that the following technologies were specifically named: Erlang, Java, WebSimpleDB, WebDAV<br/><br/>In addition to specific topics, many people were asking for case studies or "lessons learned"-type presentations.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com4tag:blogger.com,1999:blog-1873778574973201433.post-85024532675823287282009-10-16T16:39:00.000-07:002014-10-29T18:02:18.318-07:00Putting POI on a dietThe <a href="http://poi.apache.org/">Apache POI</a> team is doing an amazing job at making <a href="http://office.microsoft.com/">Microsoft Office</a> file formats more accessible to the open source Java world.
One of the projects that benefits from their work is <a href="http://lucene.apache.org/tika/">Apache Tika</a>, which uses POI to extract text content and metadata from all sorts of Office documents.<br/><br/><a href="http://poi.apache.org/"><img class="aligncenter size-full wp-image-240" title="Apache POI" src="http://jukkaz.files.wordpress.com/2009/10/poi.jpg" alt="Apache POI" width="138" height="126" /></a><br/><br/>However, there's one problem with POI that I'd like to see fixed: It's too big.<br/><br/>More specifically, the <a href="http://www.jarvana.com/jarvana/archive-details/org/apache/poi/ooxml-schemas/1.0/ooxml-schemas-1.0.jar">ooxml-schemas jar</a> used by POI for the pre-generated <a title="Apache XMLBeans" href="http://xmlbeans.apache.org/">XMLBeans</a> bindings for the <a title="ECMA-376: Office Open XML File Formats" href="http://www.ecma-international.org/publications/standards/Ecma-376.htm">Office Open XML</a> schemas is taking up over 50% of the 25MB size of the current Tika application. The pie chart below illustrates the relative sizes of the different parser library dependencies of Tika:<br/><br/><img class="aligncenter size-full wp-image-241" title="Relative sizes of Tika parser dependencies" src="http://jukkaz.files.wordpress.com/2009/10/tika-pie.png" alt="Relative sizes of Tika parser dependencies" width="500" height="200" /><br/><br/>Both PDF and the Microsoft Office formats are pretty big and complex, so one can expect the relevant parser libraries to be large.
But the 14MB size of the ooxml-schemas jar seems excessive, especially since the standard OOXML schema package from which the ooxml-schemas jar is built is only 220KB in size.<br/><br/>Does anyone have good ideas on how to best trim down this OOXML dependency?Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com3tag:blogger.com,1999:blog-1873778574973201433.post-57233506575589943962009-09-23T11:11:00.000-07:002014-10-29T18:02:18.288-07:00Some graphics work for a changeI've recently spent <a title="JCRSITE-20: Site readability improvements" href="https://issues.apache.org/jira/browse/JCRSITE-20">some</a> <a title="fancy download button" href="http://twitter.com/jukkaz/status/3243226733">effort</a> on improving the look of the <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> website. I'm no designer, so the results aren't that great, but it's been a nice break from the regular project work. And I got to brush up on my <a title="Adobe Photoshop" href="http://www.adobe.com/products/photoshop/photoshop/">Photoshop</a> and <a title="The GNU Image Manipulation Program" href="http://www.gimp.org/">Gimp</a> skills.<br/><br/>One part of the effort was <a title="JCRSITE-24: Jackrabbit favicon" href="https://issues.apache.org/jira/browse/JCRSITE-24">creating an icon</a> for the site. Previously the site used the feather icon that is the default on all Apache project sites, but I wanted a Jackrabbit-specific icon that helps me quickly identify and access Jackrabbit pages among the numerous tabs I usually have open in my browser. The work is a good example of incremental improvements in action:<br/><br/><img class="aligncenter size-full wp-image-236" title="Jackrabbit icon steps" src="http://jukkaz.files.wordpress.com/2009/09/jackrabbit-icon-work.png" alt="Jackrabbit icon steps" width="372" height="218" /><br/><br/>I started with a copy of the Jackrabbit logo with a nice alpha-layered transparent background.
It looked great until I noticed that some browsers dropped the smooth alpha layer and instead rendered the rather badly aliased icon seen above.<br/><br/>The straightforward solution was to add a white background as can be seen in step 2. That already worked pretty well in all browsers.<br/><br/>After a few days of watching the icon I found it a bit too blocky for my taste, so I tried to restore some of the nice transparency effect by rounding the corners a bit. I'm pretty happy with the result.<br/><br/>Of course, if you have design talent and think you can do better, go for it!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com1tag:blogger.com,1999:blog-1873778574973201433.post-44518701015665571022009-09-19T05:02:00.000-07:002014-10-29T18:02:18.274-07:00Release timeThere's a lot of upcoming release activity at the Apache projects I'm more or less involved with:<br/><ul><br/> <li><a href="http://incubator.apache.org/pdfbox/"><img class="alignright" title="Apache PDFBox" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/5981/logo_med.png" alt="" width="64" height="64" /></a>The incubating <a href="http://incubator.apache.org/pdfbox/">Apache PDFBox</a> project is just about to publish the <a title="Apache PDFBox status update" href="http://jukkaz.wordpress.com/2009/01/23/apache-pdfbox-status-update/">eagerly anticipated</a> 0.8.0 release. I'm expecting to see the release announcement on Tuesday next week.
PDFBox is a Java library for working with PDF documents.</li><br/> <li><a href="http://incubator.apache.org/uima/"><img class="alignright" title="Apache UIMA" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/23695/uima-icon_med.png" alt="" width="64" height="64" /></a>Another incubating project, <a href="http://incubator.apache.org/uima/">Apache UIMA</a>, is <a title="Marshall Schor: Making good progress to 2.3.0 release, next checkpoint at close of next week" href="http://markmail.org/message/p3463oyvgohknxev">working towards</a> the 2.3.0 release. I'm looking forward to seeing both UIMA and PDFBox graduating from the <a href="http://incubator.apache.org/">Apache Incubator</a> shortly after the respective releases. UIMA is a framework and a set of components for analyzing large volumes of unstructured information.</li><br/> <li><a href="http://sling.apache.org/"><img class="alignright" title="Apache Sling" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/8717/sling_med.png" alt="" width="64" height="64" /></a>The <a href="http://sling.apache.org/">Apache Sling</a> project is a component-based project like <a href="http://felix.apache.org/">Apache Felix</a>, so there is no clear project-wide release cycle. Instead Sling is about to start releasing new versions of most of the components changed since the all-inclusive incubator releases. Sling is a JCR-based web framework.</li><br/> <li><a href="http://lucene.apache.org/tika/"><img class="alignright" title="Apache Tika" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/8697/tikaNoText_med.png" alt="" width="64" height="64" /></a><a href="http://lucene.apache.org/tika/">Apache Tika</a> uses PDFBox for extracting text content from PDF documents. I'm hoping to see a Tika 0.5 release soon with the latest PDFBox dependency and the <a title="TIKA-275: Parse context" href="https://issues.apache.org/jira/browse/TIKA-275">design improvements</a> I've been working on. 
Tika is a toolkit for extracting text and metadata from all kinds of documents.</li><br/> <li><a href="http://lucene.apache.org/solr/"><img class="alignright" title="Apache Solr" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/13297/solr_FC_med.jpg" alt="" width="64" height="35" /></a><a href="http://lucene.apache.org/solr/">Apache Solr</a> is about to enter <a title="Yonik Seeley: solr 1.4 release schedule" href="http://markmail.org/message/6mb442fxjtq3dt6m">code freeze</a> in preparation for the 1.4 release that will include the "<a title="ExtractingRequestHandler" href="http://wiki.apache.org/solr/ExtractingRequestHandler">Solr Cell</a>" feature based on Tika. Solr is a search server based on Lucene.</li><br/> <li>The <a href="http://commons.apache.org/io/">Commons IO</a> project has been upgraded to use Java 5 features and I'm starting to <a title="Jukka Zitting: [io] Towards the 2.0 release" href="http://markmail.org/message/poqipa3yisit53wt">push it</a> towards a 2.0 release. Commons IO is a library of Java IO utilities.</li><br/> <li><a href="http://lucene.apache.org/java/"><img class="alignright" title="Lucene Java" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/23787/lucene_med.png" alt="" width="64" height="64" /></a><a href="http://lucene.apache.org/java/">Lucene Java</a> is <a title="Yonik Seeley: Re: Lucene 2.9 RC4 now available for testing" href="http://markmail.org/message/5vez3yxwhznxaylv">gearing up</a> for the 2.9 release, and will soon <a title="DM Smith: Lucene 3.0 and Java 5 (was Re: Finishing Lucene 2.9)" href="http://markmail.org/message/3v7aj5yhizggoikm">follow up</a> with the 3.0 release. The <a title="LUCENE-1470: Add TrieRangeFilter to contrib" href="https://issues.apache.org/jira/browse/LUCENE-1470">trie range</a> feature is an especially welcome addition for many use cases.
Lucene is a feature-rich, high-performance search engine.</li><br/> <li><a href="http://jackrabbit.apache.org/"><img class="alignright" title="Apache Jackrabbit" src="https://s3.amazonaws.com/bits.ohloh.net/attachments/2233/jlogo64_med.png" alt="" width="64" height="64" /></a>And last but not least, <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> is <a title="Jukka Zitting: Re: Jackrabbit 2.0 release plan" href="http://markmail.org/message/ced7d6kgdnos6atw">getting ready</a> to release the 2.0 version based on the <a title="JSR 283 Final Approval Ballot" href="http://jcp.org/en/jsr/results?id=4979">recently approved</a> JCR 2.0 standard. Jackrabbit is a feature-complete JCR content repository implementation.</li><br/></ul><br/>I'm hoping to see most of these releases happening in time for the <a href="http://us.apachecon.com/c/acus2009/">ApacheCon US 2009</a> conference in early November.Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0tag:blogger.com,1999:blog-1873778574973201433.post-66140094082991880052009-08-11T09:11:00.000-07:002014-10-29T18:02:18.261-07:00Apache Jackrabbit 1.6.0 releasedThe <a href="http://jackrabbit.apache.org/">Apache Jackrabbit</a> project has just released Jackrabbit version 1.6.0. This release will most likely be the last JCR 1.0-based Jackrabbit 1.x minor release before the upcoming Jackrabbit 2.0 and the upgrade to JCR version 2.0.
The goal of this release is to push out as many of the recent Jackrabbit trunk improvements as possible so that the number of new things in Jackrabbit 2.0 remains manageable.<br/><p style="text-align:center;"><a href="http://jackrabbit.apache.org/downloads.html"><img class="size-full wp-image-226" title="Download Apache Jackrabbit 1.6.0" src="http://jukkaz.files.wordpress.com/2009/08/jackrabbit-download-1-6-0.png" alt="Download Apache Jackrabbit 1.6.0" width="368" height="95" /></a></p><br/><br/>The most notable changes and new features in this release are:<br/><ul><br/> <li>The RepositoryCopier tool makes it easy to back up and migrate repositories (<a href="https://issues.apache.org/jira/browse/JCR-442">JCR-442</a>). There is also improved support for selectively copying content and version histories between repositories (<a href="https://issues.apache.org/jira/browse/JCR-1972">JCR-1972</a>).</li><br/> <li>A new WebDAV-based JCR remoting layer has been added to complement the existing JCR-RMI layer (<a href="https://issues.apache.org/jira/browse/JCR-1877">JCR-1877</a>, <a href="https://issues.apache.org/jira/browse/JCR-1958">JCR-1958</a>).</li><br/> <li>Query performance has been further optimized (<a href="https://issues.apache.org/jira/browse/JCR-1820">JCR-1820</a>, <a href="https://issues.apache.org/jira/browse/JCR-1855">JCR-1855</a> and <a href="https://issues.apache.org/jira/browse/JCR-2025">JCR-2025</a>).</li><br/> <li>Added support for Ingres and MaxDB/SapDB databases (<a href="https://issues.apache.org/jira/browse/JCR-1960">JCR-1960</a>, <a href="https://issues.apache.org/jira/browse/JCR-1527">JCR-1527</a>).</li><br/> <li>Session.refresh() can now be used to synchronize a cluster node with changes from the other nodes in the cluster (<a href="https://issues.apache.org/jira/browse/JCR-1753">JCR-1753</a>).</li><br/> <li>Unreferenced version histories are now automatically removed once all the contained versions have been removed (<a
href="https://issues.apache.org/jira/browse/JCR-134">JCR-134</a>).</li><br/> <li>Standalone components like the JCR-RMI layer and the OCM framework have been moved to a separate <a href="http://jackrabbit.apache.org/commons/">JCR Commons</a> subproject of Jackrabbit, and are not included in this release. Updates to those components will be distributed as separate releases.</li><br/> <li>Development preview: There are even more <a href="http://jcp.org/en/jsr/summary?id=283">JSR 283</a> features in Jackrabbit 1.6 than were included in the 1.5 version. These new features are accessible through special "jsr283" interfaces in the Jackrabbit API. Note that none of these features are ready for production use, and will be replaced with final JCR 2.0 versions in Jackrabbit 2.0.</li><br/></ul><br/>This release is the result of contributions from <a title="Jira contribution report for Jackrabbit 1.6.0" href="https://issues.apache.org/jira/secure/ConfigureReport.jspa?versionId=12313459&issueStatus=all&selectedProjectId=10591&reportKey=com.sourcelabs.jira.plugin.report.contributions:contributionreport&Next=Next">quite a few people</a>. Thanks to everyone involved, this is open source in action!Anonymoushttp://www.blogger.com/profile/06324831355629436046noreply@blogger.com0