Azure DocumentDB Primer

Microsoft has just announced Azure DocumentDB, their fully managed document-oriented database. You can read the announcement here.

DocumentDB is similar to MongoDB in that both effectively store JSON documents, provide a rich query API, and have deep JavaScript integration.

Like MongoDB, DocDB is organized into a hierarchy of Databases, Collections, and Documents.

Let’s look at some key differences.

DocumentDB-flavoured SQL

SQL is the query language of choice for many developers. DocumentDB provides a SQL-like query language which includes hierarchical querying and the ability to execute JavaScript.

Simple predicate query:

Sub-query:

Query with user-defined JavaScript function:

Queries can also apply inline JavaScript projections using the evalJS function.
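The query listings that belonged under the three labels above are not reproduced here, so the following are hypothetical stand-ins rather than the original examples. They assume a collection of family documents with `address`, `children`, and `lastName` properties and a registered UDF called `startsWith`; none of those names come from the announcement. The `udf.` prefix follows current documentation and may differ from the preview syntax. The queries are held as JavaScript strings, the way you would hand them to a client library.

```javascript
// Hypothetical DocumentDB SQL queries. The document shape (address, children,
// lastName) and the UDF name (startsWith) are assumptions for illustration.
const queries = {
  // Simple predicate: filter on a nested property.
  simplePredicate:
    'SELECT * FROM Families f WHERE f.address.city = "Seattle"',

  // Hierarchical query: iterate over the children array inside each document.
  nestedChildren:
    "SELECT c.givenName FROM Families f JOIN c IN f.children WHERE c.grade > 5",

  // Query that calls a user-defined JavaScript function.
  withUdf:
    'SELECT f.id FROM Families f WHERE udf.startsWith(f.lastName, "And")',
};

for (const [name, sql] of Object.entries(queries)) {
  console.log(name + ": " + sql);
}
```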

See the Query Using DocumentDB SQL tutorial for more.

Scale-out & High Availability

This is the most important difference in our book. We’ve written before about MongoDB’s replication model. In short, MongoDB was designed for ease of use, not horizontal scalability or high availability.

DocDB is architected with high availability and scalability as primary goals. It is built on a thoroughly battle-tested distributed systems framework that uses Paxos for distributed consensus. This foundation lets the DocDB team focus on building a solid database system rather than on the intricacies of distributed systems.

Transactions

DocDB supports ACID-compliant transactions. All queries are executed as a transaction. Semantics that aren’t easily conveyed via the query language can be written in JavaScript and executed directly, as a stored procedure, as a user-defined function, or as a trigger (see below).

JavaScript Integration

DocDB lets you store JavaScript scripts within collections for later execution. There are three kinds of scripts which can be stored: triggers, stored procedures, and user-defined functions (UDFs). Which type of script to use depends on when you want the script to execute.

Triggers

You can hook into query execution via pre-triggers and post-triggers: JavaScript functions that execute before or after a query. Triggers are great for performing validation, accounting, and notification tasks. Triggers execute in the context of the transaction, so throwing an exception from a trigger aborts the transaction.
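As a minimal sketch of a validating pre-trigger: the function below rejects documents that lack an `email` field and stamps the rest with a `createdAt` value. Both field names and the function name are hypothetical. `getContext()` is supplied by the DocumentDB script runtime, so the function only runs as-is on the server (or against a stub).

```javascript
// Pre-trigger sketch. "email" and "createdAt" are hypothetical field names;
// getContext() is provided by the DocumentDB server-side runtime.
function validateAndStamp() {
  var request = getContext().getRequest();
  var doc = request.getBody(); // The document about to be written.

  if (typeof doc.email !== "string" || doc.email.length === 0) {
    // Throwing aborts the surrounding transaction, so nothing is written.
    throw new Error("Document must have a non-empty email field.");
  }

  doc.createdAt = new Date().toISOString();
  request.setBody(doc); // Write the modified document instead of the original.
}
```

Because the trigger shares the transaction with the write it guards, the validation failure and the rejected write are a single atomic outcome.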

Stored Procedures

Stored procedures (sprocs) can be executed directly by clients. They are referenced by name and can be passed parameters as well as return results to clients. Of course, sprocs can also perform queries on the collection they reside in.
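A sketch of what such a sproc might look like, assuming documents carry a `status` field (a made-up name, as is `findByStatus`). It takes one parameter, queries the collection it lives in, and returns the matches to the caller through the response body.

```javascript
// Stored procedure sketch. "status" is a hypothetical document field;
// getContext() is provided by the DocumentDB server-side runtime.
function findByStatus(status) {
  var context = getContext();
  var collection = context.getCollection();

  // Interpolated for brevity; JSON.stringify quotes and escapes the value.
  var query = "SELECT * FROM docs d WHERE d.status = " + JSON.stringify(status);

  var accepted = collection.queryDocuments(
    collection.getSelfLink(),
    query,
    {},
    function (err, documents) {
      if (err) throw err; // Aborts the transaction.
      context.getResponse().setBody(documents); // Returned to the client.
    }
  );

  // queryDocuments returns false when the request could not be queued.
  if (!accepted) throw new Error("Query was not accepted by the server.");
}
```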

User-Defined Functions (Custom Query Operators)

User-defined functions behave like query operators and are used during query execution.
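A UDF is just a JavaScript function that takes values and returns a value. As a small sketch, here is a hypothetical `startsWith` helper; once registered under that name, a query could call it in a predicate, for instance as `udf.startsWith(f.lastName, "And")` in current syntax (the registration step and the property name are assumptions).

```javascript
// UDF sketch: a pure function with no access to the collection.
// The name startsWith is hypothetical.
function startsWith(value, prefix) {
  if (typeof value !== "string" || typeof prefix !== "string") {
    return false; // Treat missing or non-string values as a non-match.
  }
  return value.indexOf(prefix) === 0;
}

console.log(startsWith("Andersen", "And")); // true
```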

Consistency Model

DocDB has a sane tunable consistency model with four modes:

  • Strong: The operation will not return until the query has been made durable.
  • Bounded Staleness: Guarantees the order of propagation of writes, but with reads lagging up to K prefixes behind the writes, where K is the staleness bound.
  • Session: Strong consistency scoped to a single client session. This consistency level is usually sufficient.
  • Eventual: The weakest form of consistency, where reads are of unbounded staleness, but eventually converge.
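To make the bounded staleness guarantee concrete, here is a toy in-memory model. It illustrates the guarantee only and says nothing about how DocumentDB actually replicates data; `ToyStore` and its members are invented for this example. A replica always sees a prefix of the write log (so order is preserved) and is never more than K writes behind.

```javascript
// Toy model of bounded staleness; not DocumentDB's implementation.
class ToyStore {
  constructor(k) {
    this.k = k;        // Staleness bound: max writes the replica may lag.
    this.log = [];     // Writes in commit order.
    this.applied = 0;  // Number of writes the replica has applied.
  }

  write(value) {
    this.log.push(value);
    // Enforce the bound: catch the replica up if it would lag more than k writes.
    var lag = this.log.length - this.applied;
    if (lag > this.k) this.applied = this.log.length - this.k;
  }

  // A replica read returns a prefix of the log, so write order is preserved.
  readFromReplica() {
    return this.log.slice(0, this.applied);
  }
}

const store = new ToyStore(2);
["a", "b", "c", "d", "e"].forEach(function (v) { store.write(v); });
console.log(store.readFromReplica()); // [ 'a', 'b', 'c' ]
```

In this model, Strong corresponds to K = 0 and Eventual to removing the bound entirely.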

RESTful API

DocDB provides a RESTful interface over HTTP. Their .NET library is a fairly thin wrapper around the HTTP API. Queries must include an authorization header. If there is interest, I might publish a follow-up post with code samples for generating the authorization token. Reach me on Twitter @reubenbond if you’re interested.

EDIT: there was interest, so here’s a gist with sample code. Note that the date and x-ms-date headers are both in RFC 1123 format. In C#, that’s DateTime.UtcNow.ToString("R").
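The gist itself is not reproduced here. As a rough Node.js sketch of master-key token generation, following the scheme in the service's REST documentation as best understood (HMAC-SHA256 over the lowercased verb, resource type, resource ID, and date), something like the following should be close. The function name, parameter names, and the placeholder key are all made up; treat the gist and the official documentation as authoritative.

```javascript
const crypto = require("crypto");

// Sketch of master-key token generation for the REST API.
// Names are illustrative; the key passed below is a made-up placeholder.
function buildAuthToken(verb, resourceType, resourceId, date, masterKey) {
  const key = Buffer.from(masterKey, "base64");

  // String to sign: verb, resource type, and date are lowercased; fields are
  // newline-separated and followed by an empty line.
  const payload =
    verb.toLowerCase() + "\n" +
    resourceType.toLowerCase() + "\n" +
    resourceId + "\n" +
    date.toLowerCase() + "\n" +
    "" + "\n";

  const signature = crypto
    .createHmac("sha256", key)
    .update(payload, "utf8")
    .digest("base64");

  // The header value is URL-encoded.
  return encodeURIComponent("type=master&ver=1.0&sig=" + signature);
}

// toUTCString() produces an RFC 1123 date such as "Tue, 01 Nov 1994 08:12:31 GMT".
const date = new Date().toUTCString();
const token = buildAuthToken("GET", "dbs", "", date, "c2VjcmV0LWtleQ==");
console.log({ "x-ms-date": date, authorization: token });
```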

Asynchronous LINQ in DocumentDB

If you’re using the .NET API, you’ll likely be interested to know that the DocDB LINQ provider allows for asynchronous querying. Simply cast your query to IDocumentQuery<ResultType> and call the ExecuteNextAsync method on it.

For example:

Support for async queries via LINQ is scheduled for the MongoDB C# driver’s 2.0 release.

@reubenbond

EDIT: Thanks to Aravind Ramachandran from the DocDB team for some fixes to the examples above – how sloppy of me :)

EDIT: The DocDB team hosted a nice tutorial here.