Monday, 9 August 2010

Running RavenDB on Azure

Microsoft’s Azure platform provides excellent data storage facilities in the form of the Windows Azure Storage service, with Table, Blob and Queue stores, and SQL Azure, which is a near-complete SQL Server-as-a-service offering. But one thing it doesn’t provide is a “document database”, in the NoSQL sense of the term.

I saw Captain Codeman’s excellent post on running MongoDB on Azure with CloudDrive, and wondered if Ayende’s new RavenDB database could be run in a similar fashion; specifically, on a Worker role providing persistence for a Web role.

The short answer was, with the current build, no. RavenDB uses the .NET HttpListener class internally, and apparently that class will not work on worker roles, which are restricted to listening on TCP only.

I’m not one to give up that easily, though. I’d already downloaded the source for Raven so I could step-debug through a few other blockers (to do with config, mainly), and I decided to take a look at the HTTP stack. Turns out Ayende, ever the software craftsman, has abstracted his HTTP classes and provided interfaces for them. I forked the project and hacked together implementations of those interfaces built on the TcpListener, with my own HttpContext, HttpRequest and HttpResponse types. It’s still not perfect, but I have an instance running at http://ravenworker.cloudapp.net:8080.
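
For anyone who hasn't worked directly with TcpListener, the sketch below shows the general shape of serving HTTP over a raw TCP socket, which is the kind of thing a worker role endpoint does allow. It's a toy illustration of the idea, not the code in my fork; the real implementations sit behind Raven's abstractions and deal with routing, headers and streaming properly.

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

public static class TcpHttpSketch
{
    public static void Serve(IPEndPoint endPoint)
    {
        var listener = new TcpListener(endPoint);
        listener.Start();

        while (true)
        {
            using (var client = listener.AcceptTcpClient())
            using (var stream = client.GetStream())
            using (var reader = new StreamReader(stream))
            using (var writer = new StreamWriter(stream))
            {
                // Read the request line, e.g. "GET /docs/1 HTTP/1.1", then skip the headers.
                var requestLine = reader.ReadLine();
                Console.WriteLine(requestLine);
                string headerLine;
                while (!string.IsNullOrEmpty(headerLine = reader.ReadLine()))
                {
                    // A real server would parse these into a request object.
                }

                // Write a minimal, hand-rolled HTTP response.
                const string body = "{\"hello\":\"world\"}";
                writer.Write("HTTP/1.1 200 OK\r\n");
                writer.Write("Content-Type: application/json\r\n");
                writer.Write("Content-Length: " + body.Length + "\r\n");
                writer.Write("\r\n");
                writer.Write(body);
                writer.Flush();
            }
        }
    }
}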

My fork of RavenDB is at http://github.com/markrendle/RavenDB – the Samples solution contains an Azure Cloud Service and a Worker Role project. My additions to the core code are mainly within the Server.Abstractions namespace if you want to poke around.

HOWTO

My technique for getting Raven running on a worker role differs from the commonly-used methods for hosting third-party software on Azure, which generally rely on running the application as a separate process and getting it to listen on a specified TCP port. With Raven this is unnecessary, since it consists of .NET assemblies which can be referenced directly from the worker project, so that's how I did it.

The role uses an Azure CloudDrive for Raven’s data files. A CloudDrive is a VHD disk image that is held in Blob storage, and can be mounted as a drive within Azure instances.

Mounting a CloudDrive requires some fairly straightforward, boilerplate code:

private void MountCloudDrive()
{
    var localCache = RoleEnvironment.GetLocalResource("RavenCache");

    CloudDrive.InitializeCache(localCache.RootPath.TrimEnd('\\'), localCache.MaximumSizeInMegabytes);

    var ravenDataStorageAccount =
        CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageAccount"));
    var blobClient = ravenDataStorageAccount.CreateCloudBlobClient();
    var ravenDrives = blobClient.GetContainerReference("ravendrives");
    ravenDrives.CreateIfNotExist();
    var vhdUrl = ravenDrives.GetPageBlobReference("RavenData.vhd").Uri.ToString();

    _ravenDataDrive = ravenDataStorageAccount.CreateCloudDrive(vhdUrl);

    try
    {
        _ravenDataDrive.Create(localCache.MaximumSizeInMegabytes);
    }
    catch (CloudDriveException)
    {
        // This exception is thrown if the drive exists already, which is fine.
    }

    _ravenDrivePath = _ravenDataDrive.Mount(localCache.MaximumSizeInMegabytes, DriveMountOptions.Force);
}

(This code has been trimmed for size; the actual code involves more exception handling and logging.)

Once the drive is mounted, we can start the server:

private void StartTheServer()
{
    var ravenConfiguration = new RavenConfiguration
    {
        AnonymousUserAccessMode = AnonymousUserAccessMode.All,
        Port = _endPoint.Port,
        ListenerProtocol = ListenerProtocol.Tcp,
        DataDirectory = _ravenDrivePath
    };

    _documentDatabase = new DocumentDatabase(ravenConfiguration);
    _documentDatabase.SpinBackgroundWorkers();

    _ravenHttpServer = new HttpServer(ravenConfiguration, _documentDatabase);
    _ravenHttpServer.Start();
}

Again, trimmed for size.

A few points on the configuration properties:

  • the Port is obtained from the Azure Endpoint, which specifies the internal port that the server should listen on, rather than the external endpoint which will be visible to clients;
  • I added a new enum, ListenerProtocol, which tells the server whether to use the Http or Tcp stack;
  • AnonymousUserAccessMode is set to All. My intended use for this project will only expose the server internally, to other Azure roles, so I have not yet implemented authentication in the TCP-based HTTP classes;
  • The DataDirectory is set to the path that the CloudDrive Mount operation returned.
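
Putting those pieces together, the worker role's entry point ends up looking roughly like this. Treat it as an outline rather than the exact code in my fork: the class name, the internal endpoint name ("Raven") and the OnStop handling are illustrative.

using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;
using Microsoft.WindowsAzure.StorageClient;

public class RavenWorkerRole : RoleEntryPoint
{
    private CloudDrive _ravenDataDrive;
    private string _ravenDrivePath;
    private IPEndPoint _endPoint;
    // The _documentDatabase and _ravenHttpServer fields used by StartTheServer are omitted here.

    public override bool OnStart()
    {
        // The internal endpoint declared for the role; "Raven" is whatever name you gave it.
        _endPoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["Raven"].IPEndpoint;

        MountCloudDrive();  // shown above
        StartTheServer();   // shown above

        return base.OnStart();
    }

    public override void Run()
    {
        // Raven runs in-process on its own threads, so the role just has to stay alive.
        while (true)
        {
            Thread.Sleep(10000);
        }
    }

    public override void OnStop()
    {
        // Unmount the CloudDrive cleanly when the role shuts down.
        if (_ravenDataDrive != null)
        {
            _ravenDataDrive.Unmount();
        }
        base.OnStop();
    }

    private void MountCloudDrive() { /* body shown earlier */ }
    private void StartTheServer() { /* body shown earlier */ }
}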

I have to sign a contribution agreement, and do some more extensive testing, but I hope that Ayende is going to pull my TCP changes into the RavenDB trunk so that this deployment model is supported by the official releases. I’ll keep you posted.

Friday, 6 August 2010

Half-truths and sleight of hand

Tweet by Brad Wilson:

“The Simple.Data blog post is built upon half-truths and sleight of hand. Microsoft.Data supports parameters. Your stuff is vuln to concat.”

Apology by me:

I didn’t know that Microsoft.Data supported parameters. Sorry.

And yes, Simple.Data is also vulnerable to SQL injection through concatenated SQL, as is anything that lets developers execute text SQL statements against a database.

However, what I’m trying to do with this project is explore other, non-text-based ways of interacting with a database that are not as complicated as full-blown ORMs.

Thursday, 5 August 2010

Simple.Data with proper types

One of the comments on my previous post asked about working with “real types”. It was something I intended to try, and I had a couple of ideas about how it might work. But one of the issues is that any value returned from a method or property on a dynamic object is also dynamic. So to get, for example, a User object back, the developer would need to use a cast or “as” conversion.

While I was working on the DynamicTable class in the Simple.Data code, I noticed that one of the methods available to override from the DynamicObject base class is TryConvert. It turns out that this method is called when an implicit or explicit cast is attempted on the object.

ExpandoObject, which is what the Find methods were returning, doesn't implement this method, so I created my own expando-style type called DynamicRecord and added the TryConvert override. So now, using the Simple.Data.Database class, you can do this:

var db = Database.Open(); // Connection string in config.
User user = db.Users.FindByName("Bob");

And the dynamic object will be magically implicitly cast to the User type, with compile-time checking and IntelliSense intact.

I’ve only proved this with happy-path tests so far (exact matches for all properties), and obviously it’s going to rely on the database table having column names which match the type’s property names, but it’s a very neat solution and extremely new-developer-friendly.
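
For anyone curious about the mechanics, here is a minimal sketch of what a TryConvert override along these lines can look like. It's an approximation rather than the actual DynamicRecord code, and it assumes the record's values are held in a dictionary keyed by column name and that the target type has a parameterless constructor and writable properties.

using System;
using System.Collections.Generic;
using System.Dynamic;

// A sketch, not the actual Simple.Data DynamicRecord implementation.
public class DynamicRecord : DynamicObject
{
    private readonly IDictionary<string, object> _data =
        new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _data.TryGetValue(binder.Name, out result);
    }

    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        _data[binder.Name] = value;
        return true;
    }

    // Called when the dynamic object is cast, implicitly or explicitly, to another type.
    public override bool TryConvert(ConvertBinder binder, out object result)
    {
        // Create an instance of the target type and copy matching values onto its properties.
        result = Activator.CreateInstance(binder.Type);
        foreach (var property in binder.Type.GetProperties())
        {
            object value;
            if (property.CanWrite && _data.TryGetValue(property.Name, out value))
            {
                property.SetValue(result, value, null);
            }
        }
        return true;
    }
}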

Updated code at http://github.com/markrendle/Simple.Data

Introducing Simple.Data

Update – 06-Aug-2010

This post was inaccurate, in that it failed to acknowledge that Microsoft.Data supports parameters in its text querying. I have made changes to address this inaccuracy.

What is it?

It’s a very lightweight data access framework that I wouldn’t dream of calling an ORM. It uses the dynamic features of Microsoft .NET 4 to provide very expressive ways of retrieving data. It provides non-SQL ways of doing things, inspired by the Ruby ActiveRecord library. It also provides SQL ways of doing things, but with features to protect against SQL injection.

Why?

Because Microsoft have identified a gap in the data access toolkit ecosystem which they think scares casual web developers who are used to environments like PHP. These developers neither want nor need a full-blown ORM like NHibernate or Entity Framework, and they don’t want to deal with the complexity of ADO.NET connections, commands and readers.

Fair enough.

Microsoft have recently released a preview of a new library, Microsoft.Data.dll, which aims to serve these developers. It provides various simple ways of opening up a database, and then lets you run SQL against it, which is how casual developers are used to working with MySQL in the PHP world. They build up their SQL strings and then they execute them.

The problem is, that’s wrong. And I believe that attempting to attract people to your stack by giving them a really easy way to carry on doing things wrong is also wrong. The right thing to do is give them a way to do things right that is as easy as, if not easier than, what they have been using.

Simple.Data is my attempt at that.

Example?

Using Microsoft.Data, you should query a Users table like this:

var db = Database.Open(); // Connection specified in config.
string sql = "select * from users where name = @0 and password = @1";
var user = db.Query(sql, name, password).FirstOrDefault();

But you could query it like this:

var db = Database.Open(); // Connection specified in config.
string sql = "select * from users where name = '" + name
             + "' and password = '" + password + "'";
var user = db.Query(sql).FirstOrDefault();

A lot of people are quite cross about this. One of the main problems they have is that building query strings in this way opens your application up to SQL injection attacks. This is a common pattern in PHP applications written by new developers.

To achieve the same task with Simple.Data, you can do this:

var db = Database.Open(); // Connection specified in config.
var user = db.Users.FindByNameAndPassword(name, password);

That’s pretty neat, right? So, did we have to generate the Database class and a bunch of table classes to make this work?

No.

In this example, the type returned by Database.Open() is dynamic. It doesn’t have a Users property, but when that property is referenced on it, it returns a new instance of a DynamicTable type, again as dynamic. That instance doesn’t actually have a method called FindByNameAndPassword, but when that method is called, it sees “FindBy” at the start of the method name, so it pulls apart the rest of the name, combines it with the arguments, and builds an ADO.NET command which safely encapsulates the name and password values inside parameters. The FindBy* methods only return a single record; there are FindAllBy* methods which return result sets. This approach is used by the Ruby/Rails ActiveRecord library; Ruby’s metaprogramming nature encourages stuff like this.
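
To make that a bit more concrete, here's a heavily simplified sketch of the kind of TryInvokeMember override that can drive this. It is not the Simple.Data implementation: the direct SqlConnection usage, the naive "And" split and the dictionary result are all stand-ins for illustration.

using System;
using System.Data;
using System.Data.SqlClient;
using System.Dynamic;
using System.Linq;

// A sketch of dynamic FindByXAndY dispatch, not the real DynamicTable.
public class DynamicTableSketch : DynamicObject
{
    private readonly string _tableName;
    private readonly string _connectionString;

    public DynamicTableSketch(string tableName, string connectionString)
    {
        _tableName = tableName;
        _connectionString = connectionString;
    }

    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        if (!binder.Name.StartsWith("FindBy", StringComparison.Ordinal))
        {
            result = null;
            return false;
        }

        // "FindByNameAndPassword" -> ["Name", "Password"] (naive split; enough for a sketch).
        var columns = binder.Name.Substring("FindBy".Length)
                            .Split(new[] { "And" }, StringSplitOptions.RemoveEmptyEntries);

        // Build "Name = @p0 AND Password = @p1"; the values only ever travel as parameters.
        var criteria = string.Join(" AND ",
            columns.Select((column, i) => column + " = @p" + i).ToArray());

        using (var connection = new SqlConnection(_connectionString))
        using (var command = connection.CreateCommand())
        {
            command.CommandText = "SELECT * FROM " + _tableName + " WHERE " + criteria;
            for (var i = 0; i < args.Length; i++)
            {
                var parameter = command.CreateParameter();
                parameter.ParameterName = "@p" + i;
                parameter.Value = args[i] ?? DBNull.Value;
                command.Parameters.Add(parameter);
            }

            connection.Open();
            using (var reader = command.ExecuteReader())
            {
                // Return the first record as a dictionary; the real thing returns a dynamic object.
                result = reader.Read()
                    ? Enumerable.Range(0, reader.FieldCount)
                                .ToDictionary(reader.GetName, i => reader.GetValue(i))
                    : null;
            }
        }

        return true;
    }
}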

[A section which wrongly implied that Microsoft.Data did not support parameterized queries has been removed here.]

More examples…

Inserting using named parameters:

db.Users.Insert(Name: "Steve", Password: "Secret", Age: 21);

Executing an update:

db.Execute("update Stock set Current = Current - 1 where ProductId = ?", productId);
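
The positional “?” markers in that Execute call can be handled safely by swapping them for ordinary ADO.NET parameters, so the values never end up concatenated into the SQL text. Here’s a rough sketch of the idea (an illustration, not the actual Simple.Data code, and assuming SQL Server-style “@p0” parameter names):

using System;
using System.Data;
using System.Text.RegularExpressions;

public static class PositionalParameterSketch
{
    // Replaces each "?" with a named parameter and attaches the matching value.
    // Naive on purpose: it would also replace a "?" inside a string literal.
    public static IDbCommand CreateCommand(IDbConnection connection, string sql, params object[] args)
    {
        var command = connection.CreateCommand();

        var index = 0;
        command.CommandText = Regex.Replace(sql, @"\?", match => "@p" + index++);

        for (var i = 0; i < args.Length; i++)
        {
            var parameter = command.CreateParameter();
            parameter.ParameterName = "@p" + i;
            parameter.Value = args[i] ?? DBNull.Value;
            command.Parameters.Add(parameter);
        }

        return command;
    }
}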

(Planned) inserting/updating using an object:

db.Stock.Insert(newStockItem);
db.Stock.UpdateByProductId(stockItem);

Roadmap

This project is more of a constructive criticism of Microsoft’s preview than anything else. I’m going to develop it a bit further (for example, the “planned” syntax support above), mainly as an exercise. One thing I’d maybe like to explore further is whether it can be used as a layer over NoSQL stores as well as RDBMS.

The project is hosted at http://github.com/markrendle/Simple.Data for anybody who wants to download and play with it.

I’d really appreciate feedback, so please do use the comments if you’ve got any criticisms, suggestions or encouragement to express.
