Monday, November 23, 2009

Jackrabbit over HTTP

Last week I posted a simple set of operations that a "RESTful content repository" should support over HTTP. Here's a quick look at how Apache Jackrabbit meets this challenge.

To get started I first downloaded the standalone jar file from the Jackrabbit downloads page, and started it with "java -jar jackrabbit-standalone-1.6.0.jar". This is a quick and easy way to get a Jackrabbit repository up and running. Just point your browser to http://localhost:8080/ to check that the repository is there.

Jackrabbit comes with a built-in advanced WebDAV feature that gives you pretty good control over your content. The root URL for the default workspace is http://localhost:8080/server/default/jcr:root/ and by default Jackrabbit grants full write access if you specify any username and password.

Note that Jackrabbit also has another, filesystem-oriented WebDAV feature that you can access at http://localhost:8080/repository/default/. This entry point is great for dealing with simple things like normal files and folders, but for more fine-grained content you'll want to use the advanced WebDAV feature as outlined below.

CR1: Create a document

All documents (nodes) in Jackrabbit have a pathname just like files in a normal file system. Thus to create a new document, we first need to come up with a name and a location for it. Let's call the example document "hello" and place it at the root of the default workspace, so we can later address it at the path "/hello". The related WebDAV URL is http://localhost:8080/server/default/jcr:root/hello/.
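
The mapping from a node path to its WebDAV URL is mechanical, so it can be wrapped in a small shell helper. This is only a sketch: the function name node_url and the JCR_BASE and JCR_WORKSPACE variables are made up for illustration, and it assumes the default standalone server address and node names that need no URL escaping.

```shell
# Hypothetical helper: map a JCR node path to its WebDAV URL.
# Assumes the default standalone server address used in this post.
JCR_BASE="${JCR_BASE:-http://localhost:8080/server}"
JCR_WORKSPACE="${JCR_WORKSPACE:-default}"

node_url() {
  # Strip leading and trailing slashes so "/hello", "hello" and "/hello/" all work.
  node_path="${1#/}"
  node_path="${node_path%/}"
  if [ -z "$node_path" ]; then
    # The root node of the workspace.
    printf '%s/%s/jcr:root/\n' "$JCR_BASE" "$JCR_WORKSPACE"
  else
    printf '%s/%s/jcr:root/%s/\n' "$JCR_BASE" "$JCR_WORKSPACE" "$node_path"
  fi
}
```

With this in place, node_url /hello prints the URL used in the rest of this post.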

You can use the MKCOL method to create a new node in Jackrabbit. An MKCOL request without a body will create a new empty node, but you can specify the initial contents of the node by including a snippet of JCR system view XML that describes your content. In our case we want to specify the "message" and "date" properties. Note that JCR does not support date-only properties, so we need to store the date value as a full timestamp.

The full request looks like this:

$ curl --request MKCOL --data @- --user name:pass \
http://localhost:8080/server/default/jcr:root/hello/ <<END
<sv:node sv:name="hello" xmlns:sv="http://www.jcp.org/jcr/sv/1.0">
<sv:property sv:name="message" sv:type="String">
<sv:value>Hello, World!</sv:value>
</sv:property>
<sv:property sv:name="date" sv:type="Date">
<sv:value>2009-11-17T12:00:00.000Z</sv:value>
</sv:property>
</sv:node>
END


The resulting document is available at the URL we already constructed above, i.e. http://localhost:8080/server/default/jcr:root/hello/.
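
Since the system view XML is verbose but regular, the request body can be generated instead of typed by hand. The helpers below are a rough sketch, not part of Jackrabbit: all function names are invented here, only the String and Date properties from the example are covered, and noon UTC is assumed when a date-only value is expanded to a timestamp.

```shell
# Hypothetical helpers for building a JCR system view document.

# Escape the characters that are special in XML text and attribute values.
xml_escape() {
  printf '%s' "$1" |
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Print one <sv:property> element: sv_property NAME TYPE VALUE
sv_property() {
  printf '<sv:property sv:name="%s" sv:type="%s">\n' "$(xml_escape "$1")" "$2"
  printf '<sv:value>%s</sv:value>\n' "$(xml_escape "$3")"
  printf '</sv:property>\n'
}

# Expand a date-only value such as 2009-11-17 to a full timestamp.
# Noon UTC is an arbitrary choice that matches the example above.
date_to_timestamp() {
  printf '%sT12:00:00.000Z\n' "$1"
}

# Print a complete <sv:node> with "message" and "date" properties:
# hello_node NAME MESSAGE DATE
hello_node() {
  printf '<sv:node sv:name="%s" xmlns:sv="http://www.jcp.org/jcr/sv/1.0">\n' \
    "$(xml_escape "$1")"
  sv_property message String "$2"
  sv_property date Date "$(date_to_timestamp "$3")"
  printf '</sv:node>\n'
}
```

The output can be piped into the same MKCOL request as before, for example hello_node hello 'Hello, World!' 2009-11-17 | curl --request MKCOL --data @- --user name:pass http://localhost:8080/server/default/jcr:root/hello/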

Pros:


  • A single standard WebDAV MKCOL request is enough

  • The standard JCR system view XML format is used for the MKCOL body

  • The XML format is easy to produce



Cons:



  • We need to decide the name and location of the document before it can be created

  • The name of the document is duplicated, once in the URL and once in the sv:name attribute

  • The date property must be specified down to the millisecond

  • While standardized, the MKCOL method is not as well known as PUT or POST

  • While standardized, the JCR system view format is not as well known as JSON, Atom or generic XML

  • The system view XML format is quite verbose



CR2: Read a document

Now that the document is created, we can read it with a standard GET request:

$ curl --user name:pass http://localhost:8080/server/default/jcr:root/hello/
<?xml version="1.0" encoding="UTF-8"?>
<sv:node sv:name="hello"
xmlns:fn="http://www.w3.org/2005/xpath-functions"
xmlns:fn_old="http://www.w3.org/2004/10/xpath-functions"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:jcr="http://www.jcp.org/jcr/1.0"
xmlns:mix="http://www.jcp.org/jcr/mix/1.0"
xmlns:sv="http://www.jcp.org/jcr/sv/1.0"
xmlns:rep="internal"
xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
<sv:property sv:name="jcr:primaryType" sv:type="Name">
<sv:value>nt:unstructured</sv:value>
</sv:property>
<sv:property sv:name="date" sv:type="Date">
<sv:value>2009-11-17T12:00:00.000Z</sv:value>
</sv:property>
<sv:property sv:name="message" sv:type="String">
<sv:value>Hello, World!</sv:value>
</sv:property>
</sv:node>


Note that the result includes the standard jcr:primaryType property that every JCR node has. All namespaces registered in the repository are also included, even though strictly speaking they add little value to the response.
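
If you only need a single property value, it can be picked out of the system view response with standard tools. The following is a quick sketch using awk rather than a real XML parser: the function name sv_property_value is invented here, it returns only the first value of a property, and it does not decode XML entities such as &amp;amp; in the result.

```shell
# Hypothetical helper: print the first value of the named property
# from a system view XML document read on stdin.
sv_property_value() {
  awk -v name="$1" '
    # Split the input at every "<" so each record starts with a tag name,
    # regardless of how the XML is broken into lines.
    BEGIN { RS = "<" }
    # Remember whether the most recent <sv:property> is the one we want.
    /^sv:property[ \t\n]/ {
      found = (index($0, "sv:name=\"" name "\"") > 0)
      next
    }
    # Print the text of the first <sv:value> inside that property.
    found && !printed && /^sv:value>/ {
      sub(/^sv:value>/, "")
      print
      printed = 1
    }
  '
}
```

For example, curl --silent --user name:pass http://localhost:8080/server/default/jcr:root/hello/ | sv_property_value message would print just the message text.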

Pros:

  • A single GET request is enough

  • The XML format is easy to consume


Cons:

  • The system view format is a bit verbose and generally not that well known



CR3: Update a document

The WebDAV feature in Jackrabbit does not support setting multiple properties in a single request, so we need to use a separate request for each property change. The easiest way to update a property is to PUT the new value to the property URL. The only tricky part is that, unless the node type explicitly says otherwise, the new value is stored as a binary stream by default. You need to specify a custom jcr-value/... content type to override that default.

$ curl --request PUT --header "Content-Type: jcr-value/date" \
--data "2009-11-18T12:00:00.000Z" --user name:pass \
http://localhost:8080/server/default/jcr:root/hello/date
$ curl --request PUT --header "Content-Type: jcr-value/string" \
--data "Document date updated" --user name:pass \
http://localhost:8080/server/default/jcr:root/hello/history


GETting the document after these changes will give you the updated date value and the new history property.
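
Because every property change is its own PUT, a small wrapper keeps the content type handling in one place. This is a sketch under a few assumptions: the function names are invented, the credentials and server address are the ones used throughout this post, and only the two jcr-value/... types shown above are mapped.

```shell
# Hypothetical helper: map a JCR property type to its jcr-value/... media type.
# Only the two types used in this post are listed; anything else is rejected.
jcr_value_content_type() {
  case "$1" in
    Date)   printf 'jcr-value/date\n' ;;
    String) printf 'jcr-value/string\n' ;;
    *)      printf 'unsupported property type: %s\n' "$1" >&2; return 1 ;;
  esac
}

# PUT a single property value: put_property PROPERTY_PATH TYPE VALUE
# Note that curl treats a --data value starting with "@" as a file name.
put_property() {
  content_type="$(jcr_value_content_type "$2")" || return 1
  curl --request PUT --header "Content-Type: $content_type" \
       --data "$3" --user name:pass \
       "http://localhost:8080/server/default/jcr:root/${1#/}"
}
```

The two updates above then become put_property /hello/date Date 2009-11-18T12:00:00.000Z and put_property /hello/history String "Document date updated".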

Pros:

  • Standard PUT requests are used

  • No XML or other wrapper format needed, just send the raw value as the request body


Cons:

  • More than one request needed

  • Need to use non-standard jcr-value/... media types for non-binary values



CR4: Delete a document

Deleting a document is easy with the DELETE method:

$ curl --request DELETE --user name:pass \
http://localhost:8080/server/default/jcr:root/hello/

That's it. Trying to GET the document after it's been deleted gives a 404 response, just as expected.
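
The 404 check can also be scripted, which is handy in a test or cleanup script. The helper names below are made up for this sketch, and the credentials and server address are again the ones assumed in this post.

```shell
# Hypothetical helper: print the HTTP status code of a GET on a node path.
node_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
       --user name:pass \
       "http://localhost:8080/server/default/jcr:root/${1#/}"
}

# Succeeds only if a GET on the node path returns 404.
node_is_deleted() {
  [ "$(node_status "$1")" = "404" ]
}
```

After the DELETE request above, node_is_deleted /hello/ && echo "gone" should print "gone".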

Pros:

  • A standard DELETE request is all that's needed


Cons:

  • None.

6 comments:

  1. I agree that exposing JCR internals over WebDAV is flexible, but cumbersome.

    The right way to address this IMHO is to actually extend WebDAV in a way that more JCR related features become available.

    Such as

    - mapping JCR node types to WebDAV resourcetypes
    - exposing the JCR node type registry as WebDAV REPORT

    If that was done, the message exchanges would become much simpler.

    One more note: the "client needs to pick the name" issue is already being addressed; please see http://greenbytes.de/tech/webdav/draft-reschke-webdav-post-05.html.

  2. [...] General on 2009-11-28 by Jukka Zitting A few days ago I posted about Jackrabbit, and now it’s time to follow up with Sling as a means of accessing a content [...]

  3. That's a nice, succinct explanation of using DAV to access a JCR repository served with Jackrabbit.

    I wonder how the user agent can edit a versioned node? JSR-283 section 15 describes node versioning, which Jackrabbit supports with the VersionManager API, but how does it translate to DAV?

  4. Ugh so I got the email fields backwards, well darn. Waiting for the spam flood, shortly - still a nice article, though.

  5. [I fixed the name/email field mixup above.]

    The WebDAV support in Jackrabbit should also cover versioning, see http://www.ietf.org/rfc/rfc3253.txt.

  6. I have an existing Perl application that implements WebDAV. We are currently migrating to Java and plan to use Jackrabbit to handle the WebDAV functionality.
    I was able to successfully implement a server and client and to upload and download files.
    My question is how I can access existing files that were uploaded with the earlier version,
    since in this case there would not be any repository mappings.
    I am currently using the Jackrabbit standalone server.
    Thanks
