Tuesday, November 17, 2009

Content Repository over HTTP

Two weeks ago during the BarCamp at the ApacheCon US I chaired a short session titled "The RESTful Content Repository". The idea of the session was to discuss the various ways that existing content repositories support RESTful access over HTTP and to perhaps find some common ground from which a generic content repository protocol could be formulated.

The REST architectural style was generally accepted as a useful set of constraints for the architecture of distributed content-based applications, but as an architectural style it doesn't define what the bits on the wire should look like. This is what we set out to define with the HTTP protocol as a baseline. We didn't get too far, but see below for some collected thoughts and a useful set of "test cases" that I hope to use to further investigate this idea.

Existing solutions

Many existing content repositories and related products already support one or more HTTP-based access patterns: Apache Jackrabbit exposes two slightly different WebDAV-based access points. Apache Sling adds the SlingPostServlet and default JSON and XML renderings of content. Apache CouchDB uses JSON over HTTP as the primary access protocol. Apache Solr uses XML over HTTP. Midgard doesn't have a built-in HTTP binding for content, but makes it very easy to implement such bindings. This list just scratches the surface...

There are even existing generic protocols that match at least parts of what we wanted to achieve. WebDAV has been around for ten years already, but the way it extends HTTP with extra methods makes it harder to use with existing HTTP clients and libraries. The AtomPub protocol solves that issue, but being based on the Atom format and leaving much of the server behaviour undefined, AtomPub may not be the best solution for generic content repositories.

Content repository operations over HTTP

To better understand the needs and capabilities of existing solutions, we should come up with a simple set of content operations and find out if and how different systems support those operations over HTTP. The most basic such set of operations is CRUD, i.e. how to create, read, update, and delete a document, so let's start with that. I'm giving each operation a key (CRn, as in "Content Repository operation N") and a brief description of what's expected. In later posts I hope to explore how these operations can be implemented with curl or some other simple HTTP client accessing various kinds of content repositories. I'm also planning to extend the set of required operations to cover features like search, linking, versioning, transactions, etc.

CR1: Create a document

Documents with simple properties like strings and dates are basic building blocks of all content applications. How can I create a new document with the following properties?

  • title = "Hello, World!" (string)

  • date = 2009-11-17 (date)


At the end of this operation I should have a URL that I can use to access the created document.
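As a minimal sketch of what CR1 might look like, assume a hypothetical repository that accepts a JSON body POSTed to a collection URL and answers with the new document's URL in a Location header. The URL, the JSON encoding, and the helper name below are illustrative assumptions, not the behaviour of any particular product. The request is only built, not sent, so the snippet runs without a server.

```python
import json
import urllib.request

# Hypothetical collection URL; no real repository is implied.
COLLECTION_URL = "http://localhost:8080/documents"


def build_create_request(collection_url, properties):
    """Build (but do not send) a POST request that creates a document."""
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        collection_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# JSON has no date type, so the date travels as an ISO 8601 string (assumption).
create_request = build_create_request(
    COLLECTION_URL,
    {"title": "Hello, World!", "date": "2009-11-17"},
)

# Sending needs a live server, so it is left commented out:
# with urllib.request.urlopen(create_request) as response:
#     document_url = response.headers["Location"]
```

Whether a given repository prefers POST to a collection or PUT to a client-chosen URL is exactly the kind of difference this test case is meant to surface.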

CR2: Read a document

Given the URL of a document (see CR1), how do I read the properties of that document?

The retrieved property values should match the values given when the document was created.
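A rough sketch of CR2, again assuming a hypothetical repository that returns the document as a flat JSON object with dates encoded as ISO 8601 strings. The document URL and both helper names are made up for illustration. Only the parsing step is exercised here; the GET request itself is built but not sent.

```python
import datetime
import json
import urllib.request

# Hypothetical document URL, e.g. the one returned when the document was created.
DOCUMENT_URL = "http://localhost:8080/documents/hello"


def build_read_request(document_url):
    """Build (but do not send) a GET request asking for a JSON rendering."""
    return urllib.request.Request(
        document_url,
        method="GET",
        headers={"Accept": "application/json"},
    )


def parse_document(body):
    """Decode a JSON response body and restore the typed date property."""
    properties = json.loads(body.decode("utf-8"))
    # Assumption: the server encodes the date as an ISO 8601 string.
    properties["date"] = datetime.date.fromisoformat(properties["date"])
    return properties


read_request = build_read_request(DOCUMENT_URL)

# A response body such a server might return (illustrative, not captured output).
sample_body = b'{"title": "Hello, World!", "date": "2009-11-17"}'
document = parse_document(sample_body)
```

The explicit date conversion shows why property types matter for this test case: a client can only confirm that the values match if it knows which strings are really dates.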

CR3: Update a document

Given the URL of a document (see CR1), how do I update the properties of that document? For example, I want to update the existing date property and add a new string property:

  • date = 2009-11-18 (date)

  • history = "Document date updated" (string)


When the document is read (see CR2) after this update, the retrieved information should contain the original title and the above updated date and history values.
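One possible mapping of CR3, sketched under the assumption that the hypothetical repository replaces the whole document on PUT: the client merges its changes into the properties it read earlier and sends the full result back. The URL and helper names are illustrative, and a server that accepts partial updates (for example through POST) would not need the client-side merge.

```python
import json
import urllib.request

# Hypothetical document URL.
DOCUMENT_URL = "http://localhost:8080/documents/hello"


def merge_properties(current, changes):
    """Return a copy of the current properties with the changes applied."""
    merged = dict(current)
    merged.update(changes)
    return merged


def build_update_request(document_url, properties):
    """Build (but do not send) a PUT request carrying the full document."""
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        document_url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


current = {"title": "Hello, World!", "date": "2009-11-17"}
changes = {"date": "2009-11-18", "history": "Document date updated"}

merged = merge_properties(current, changes)
update_request = build_update_request(DOCUMENT_URL, merged)
```

The merged dictionary keeps the original title, carries the new date, and gains the history property, which is the state a later read is expected to return.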

CR4: Delete a document

Given the URL of a document (see CR1), how do I delete that document?

Once deleted, it should no longer be possible to read (see CR2) or update (see CR3) the document.
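A small sketch of CR4, assuming the hypothetical repository maps deletion to the HTTP DELETE method and answers later requests for the same URL with 404 Not Found or 410 Gone. The URL and helper names are illustrative only, and the request is built without being sent.

```python
import urllib.request

# Hypothetical document URL.
DOCUMENT_URL = "http://localhost:8080/documents/hello"


def build_delete_request(document_url):
    """Build (but do not send) a DELETE request for the document."""
    return urllib.request.Request(document_url, method="DELETE")


def is_gone(status_code):
    """Treat 404 Not Found and 410 Gone as confirmation of the deletion."""
    return status_code in (404, 410)


delete_request = build_delete_request(DOCUMENT_URL)
```

A client could check the second half of this test case by repeating the read and update requests and passing the resulting status codes to is_gone.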

12 comments:

  1. CMIS falls within the same category as AtomPub, as it's somewhat bound to a specific domain and leaves server behavior quite open. I'm looking for something that supports a more generic content model with stricter requirements on server behaviour.

    But let's see, perhaps CMIS or AtomPub does support all my test cases sufficiently well or alternatively could be extended to do that. Starting with an existing protocol is certainly better than inventing a new one unless there are clear reasons for doing that.

  2. As far as I can tell, WebDAV works just fine for this use case.

    You say:

    "There are even existing generic protocols that match at least parts of what we wanted to achieve. WebDAV has been around for ten years already, but the way it extends HTTP with extra methods makes it harder to use with existing HTTP clients and libraries. The AtomPub protocol solves that issue, but being based on the Atom format and leaving much of the server behaviour undefined, AtomPub may not be the best solution for generic content repositories."


    AtomPub requires PUT and DELETE. If you have a library that supports those then you *likely* can use WebDAV as well.

  3. Write clients that need things like PUT or DELETE could easily enough also do MKCOL and friends.

    However, PROPFIND (and SEARCH from DASL) is a big problem for many read clients that can only do GET. The most notable case is a user with nothing but a browser and a URL.

    Nothing stops a WebDAV server from exposing much of the PROPFIND (and SEARCH) data also through GET, but that's already going beyond the spec. As with AtomPub and CMIS, that's certainly a valid approach to take but needs more investigation.

  4. The Neutron Protocol is an introspection layer on top of WebDAV. That could again be a good step onwards:

    http://bergie.iki.fi/blog/neutron_protocol-separating_ui_from_the_cms/

  5. Alexander Klimetschek, November 18, 2009 at 12:20 PM

    Jukka:
    > The most notable case is a user with nothing but a browser and a URL.

    Right. The ultimate goal is to have it work with XML HTTP requests in the browser,
    where only GET and POST are safely supported cross-browser. No PUT, DELETE...
    and not talking about PROPFIND or MKCOL.

    Yes, it is ugly from an HTTP perspective, but the possibilities you get when you can
    write clients in an on-demand application based on JavaScript are just huge.

  6. XHR implementations that do not support extension methods in general violate a SHOULD level requirement in the work-in-progress XHR specification.

    Firefox allows all extension methods, and so does IE when falling back to the ActiveX implementation of XHR; that should cover something like 90% of the users.

  7. [...] General on 2009-11-24 by Jukka Zitting Last week I posted a simple set of operations that a “RESTful content repository” should support over [...]

  8. [...] I posted about Jackrabbit, and now it’s time to follow up with Sling as a means of accessing content repositories over HTTP. Apache Sling is a web framework based on JCR content repositories like Jackrabbit and among other [...]

  9. "However, PROPFIND (and SEARCH from DASL) is a big problem for many read clients that can only do GET. The most notable case is a user with nothing but a browser and a URL."

    Yes, and we should fix that. There's a proposal at http://greenbytes.de/tech/webdav/draft-reschke-http-get-location-latest.html for this...

  10. @Henri Bergius: I like that term Henri, an "introspection protocol." That's very much how I view Atom, in relation to AtomPub. AtomPub defines some RESTy behaviors, Atom defines some ways to view that history, that collection of activities.

  11. Well, restricting the HTTP verbs kinda goes against the idea of REST. I also do not feel like it makes sense. There are plenty of browser plugins that give you access to verbs other than GET/POST. Furthermore, many frameworks support emulating other verbs via a _method parameter. And finally there is always curl, which is perfectly suited if, for example, you want to give a user an example.
