the codeface

Monday, 9 August 2010

Running RavenDB on Azure

Microsoft’s Azure platform provides excellent data storage facilities in the form of the Windows Azure Storage service, with Table, Blob and Queue stores, and SQL Azure, which is a near-complete SQL Server-as-a-service offering. But one thing it doesn’t provide is a “document database”, in the NoSQL sense of the term.

I saw Captain Codeman’s excellent post on running MongoDB on Azure with CloudDrive, and wondered if Ayende’s new RavenDB database could be run in a similar fashion; specifically, on a Worker role providing persistence for a Web role.

The short answer was, with the current build, no. RavenDB uses the .NET HttpListener class internally, and apparently that class will not work on worker roles, which are restricted to listening on TCP only.

I’m not one to give up that easily, though. I’d already downloaded the source for Raven so I could step-debug through a few other blockers (to do with config, mainly), and I decided to take a look at the HTTP stack. Turns out Ayende, ever the software craftsman, has abstracted his HTTP classes and provided interfaces for them. I forked the project and hacked together implementations of those interfaces built on the TcpListener, with my own HttpContext, HttpRequest and HttpResponse types. It’s still not perfect, but I have an instance running at http://ravenworker.cloudapp.net:8080.
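
To give a flavour of the approach (this is just a rough sketch of serving HTTP over a raw TcpListener, not the actual code in the fork, and the class name here is made up), the core idea looks something like this:

using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;

public class TcpHttpListener
{
    private readonly TcpListener _listener;

    public TcpHttpListener(int port)
    {
        _listener = new TcpListener(IPAddress.Any, port);
    }

    public void Run()
    {
        _listener.Start();
        while (true)
        {
            // Accept a raw TCP connection and speak just enough HTTP over it.
            using (var client = _listener.AcceptTcpClient())
            using (var stream = client.GetStream())
            {
                var reader = new StreamReader(stream, Encoding.ASCII);
                var requestLine = reader.ReadLine(); // e.g. "GET /docs/1 HTTP/1.1"

                // A real implementation reads the headers and body, dispatches to the
                // database, and writes a proper response; this just echoes the request line.
                var body = "You asked for: " + requestLine;
                var response = "HTTP/1.1 200 OK\r\n" +
                               "Content-Type: text/plain\r\n" +
                               "Content-Length: " + Encoding.UTF8.GetByteCount(body) + "\r\n" +
                               "\r\n" + body;
                var bytes = Encoding.UTF8.GetBytes(response);
                stream.Write(bytes, 0, bytes.Length);
            }
        }
    }
}

The fork wraps this sort of plumbing behind the abstracted context/request/response types mentioned above, so the rest of Raven doesn't care which stack it's running on.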

My fork of RavenDB is at http://github.com/markrendle/RavenDB – the Samples solution contains an Azure Cloud Service and a Worker Role project. My additions to the core code are mainly within the Server.Abstractions namespace if you want to poke around.

HOWTO

My technique for getting Raven running on a worker role differs from the commonly used methods for running third-party software. Generally those rely on running the application as a separate process and getting it to listen on a specified TCP port. With Raven this is unnecessary, since it consists of .NET assemblies which can be referenced directly from the worker project, so that's how I did it:

[Image: the worker role project referencing the RavenDB assemblies directly]

The role uses an Azure CloudDrive for Raven’s data files. A CloudDrive is a VHD disk image that is held in Blob storage, and can be mounted as a drive within Azure instances.

Mounting a CloudDrive requires some fairly straightforward, boilerplate code:

private void MountCloudDrive()
{
    // Initialise the local cache that backs the CloudDrive.
    var localCache = RoleEnvironment.GetLocalResource("RavenCache");
    CloudDrive.InitializeCache(localCache.RootPath.TrimEnd('\\'), localCache.MaximumSizeInMegabytes);

    // Get a reference to the page blob that holds the VHD, creating the container if necessary.
    var ravenDataStorageAccount =
        CloudStorageAccount.Parse(RoleEnvironment.GetConfigurationSettingValue("StorageAccount"));
    var blobClient = ravenDataStorageAccount.CreateCloudBlobClient();
    var ravenDrives = blobClient.GetContainerReference("ravendrives");
    ravenDrives.CreateIfNotExist();
    var vhdUrl = ravenDrives.GetPageBlobReference("RavenData.vhd").Uri.ToString();

    _ravenDataDrive = ravenDataStorageAccount.CreateCloudDrive(vhdUrl);

    try
    {
        _ravenDataDrive.Create(localCache.MaximumSizeInMegabytes);
    }
    catch (CloudDriveException)
    {
        // This exception is thrown if the drive already exists, which is fine.
    }

    _ravenDrivePath = _ravenDataDrive.Mount(localCache.MaximumSizeInMegabytes, DriveMountOptions.Force);
}

(This code has been trimmed for size; the actual code involves more exception handling and logging.)

Once the drive is mounted, we can start the server:

private void StartTheServer()
{
    var ravenConfiguration = new RavenConfiguration
    {
        AnonymousUserAccessMode = AnonymousUserAccessMode.All,
        Port = _endPoint.Port,
        ListenerProtocol = ListenerProtocol.Tcp,
        DataDirectory = _ravenDrivePath
    };

    _documentDatabase = new DocumentDatabase(ravenConfiguration);
    _documentDatabase.SpinBackgroundWorkers();

    _ravenHttpServer = new HttpServer(ravenConfiguration, _documentDatabase);
    _ravenHttpServer.Start();
}

Again, trimmed for size.

A few points on the configuration properties:

  • The Port is obtained from the Azure endpoint, which specifies the internal port that the server should listen on, rather than the external endpoint which will be visible to clients;
  • I added a new enum, ListenerProtocol, which tells the server whether to use the Http or Tcp stack;
  • AnonymousUserAccessMode is set to All. My intended use for this project will only expose the server internally, to other Azure roles, so I have not implemented authentication on the TCP HTTP classes yet;
  • The DataDirectory is set to the path that the CloudDrive Mount operation returned.

I have to sign a contribution agreement, and do some more extensive testing, but I hope that Ayende is going to pull my TCP changes into the RavenDB trunk so that this deployment model is supported by the official releases. I’ll keep you posted.

Friday, 6 August 2010

Half-truths and sleight of hand

Tweet by Brad Wilson:

“The Simple.Data blog post is built upon half-truths and sleight of hand. Microsoft.Data supports parameters. Your stuff is vuln to concat.”

Apology by me:

I didn’t know that Microsoft.Data supported parameters. Sorry.

And yes, Simple.Data is also vulnerable to SQL injection through concatenated SQL, as is anything that lets developers execute text SQL statements against a database.

However, what I’m trying to do with this project is explore other, non-text-based ways of interacting with a database that are not as complicated as full-blown ORMs.

Thursday, 5 August 2010

Simple.Data with proper types

One of the comments on my previous post asked about working with “real types”. It was something I intended to try, and I had a couple of ideas about how it might work. But one of the issues is that any value returned from a method or property on a dynamic object is also dynamic. So to get, for example, a User object back, the developer would need to use a cast or “as” conversion.

While I was working on the DynamicTable class in the Simple.Data code, I noticed that one of the methods available to override from the DynamicObject base class was TryConvert. It turns out that this method is called when an implicit or explicit cast is attempted on the object.

ExpandoObject, which is what the Find methods were returning, doesn't implement this method, so I created my own equivalent, DynamicRecord, and added the TryConvert override. So now, using the Simple.Data.Database class, you can do this:

var db = Database.Open(); // Connection string in config.
User user = db.Users.FindByName("Bob");

And the dynamic object will be magically implicitly cast to the User type, with compile-time checking and IntelliSense intact.
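
For the curious, the trick looks roughly like this (a simplified sketch, not the actual DynamicRecord code; the reflection and type-conversion details are illustrative assumptions):

using System;
using System.Collections.Generic;
using System.Dynamic;

public class DynamicRecord : DynamicObject
{
    private readonly IDictionary<string, object> _data;

    public DynamicRecord(IDictionary<string, object> data)
    {
        _data = data;
    }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return _data.TryGetValue(binder.Name, out result);
    }

    // Called when the dynamic object is implicitly or explicitly cast to another type.
    public override bool TryConvert(ConvertBinder binder, out object result)
    {
        result = Activator.CreateInstance(binder.Type);
        foreach (var property in binder.Type.GetProperties())
        {
            object value;
            if (_data.TryGetValue(property.Name, out value))
            {
                // A real implementation would also convert value types here (e.g. long to int).
                property.SetValue(result, value, null);
            }
        }
        return true;
    }
}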

I’ve only proved this with happy-path tests so far (exact matches for all properties), and obviously it’s going to rely on the database table having column names which match the type, but it’s a very neat solution and extremely new-developer-friendly.

Updated code at http://github.com/markrendle/Simple.Data

Introducing Simple.Data

Update – 06-Aug-2010

This post was inaccurate, in that it failed to acknowledge that Microsoft.Data supports parameters in its text querying. I have made changes to address this inaccuracy.

What is it?

It’s a very lightweight data access framework that I wouldn’t dream of calling an ORM. It uses the dynamic features of Microsoft .NET 4 to provide very expressive ways of retrieving data. It provides non-SQL ways of doing things, inspired by the Ruby ActiveRecord library. It also provides SQL ways of doing things, but with features to protect against SQL injection.

Why?

Because Microsoft have identified a gap in the data access toolkit ecosystem which they think scares casual web developers who are used to environments like PHP. These developers neither want nor need a full-blown ORM like NHibernate or Entity Framework, and they don’t want to deal with the complexity of ADO.NET connections, commands and readers.

Fair enough.

Microsoft have recently released a preview of a new library, Microsoft.Data.dll, which aims to serve these developers. It provides various simple ways of opening up a database, and then lets you run SQL against it, which is how casual developers are used to working with MySQL in the PHP world. They build up their SQL strings and then they execute them.

The problem is, that’s wrong. And I believe that attempting to attract people to your stack by giving them a really easy way to carry on doing things wrong is also wrong. The right thing to do is give them a way to do things right that is as easy as, if not easier than, what they have been using.

Simple.Data is my attempt at that.

Example?

Using Microsoft.Data, you should query a Users table like this:

var db = Database.Open(); // Connection specified in config.
string sql = "select * from users where name = @0 and password = @1";
var user = db.Query(sql, name, password).FirstOrDefault();

But you could query it like this:

var db = Database.Open(); // Connection specified in config.
string sql = "select * from users where name = '" + name
+ "' and password = '" password + "'";
var user = db.Query(sql).FirstOrDefault();

A lot of people are quite cross about this. One of the main problems they have is that building query strings in this way opens your application up to SQL injection attacks. This is a common pattern in PHP applications written by new developers.

To achieve the same task with Simple.Data, you can do this:

var db = Database.Open(); // Connection specified in config.
var user = db.Users.FindByNameAndPassword(name, password);

That’s pretty neat, right? So, did we have to generate the Database class and a bunch of table classes to make this work?

No.

In this example, the type returned by Database.Open() is dynamic. It doesn’t have a Users property, but when that property is referenced on it, it returns a new instance of a DynamicTable type, again as dynamic. That instance doesn’t actually have a method called FindByNameAndPassword, but when it’s called, it sees “FindBy” at the start of the method, so it pulls apart the rest of the method name, combines it with the arguments, and builds an ADO.NET command which safely encapsulates the name and password values inside parameters. The FindBy* methods will only return one record; there are FindAllBy* methods which return result sets. This approach is used by the Ruby/Rails ActiveRecord library; Ruby’s metaprogramming nature encourages stuff like this.
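
To make the mechanism concrete, here is a simplified sketch of the idea (this is not the actual Simple.Data code, and the class and helper names are mine): a DynamicObject override intercepts the unknown method call, splits the name, and builds a parameterised command:

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Dynamic;
using System.Linq;

public class DynamicTableSketch : DynamicObject
{
    private readonly string _tableName;
    private readonly string _connectionString;

    public DynamicTableSketch(string tableName, string connectionString)
    {
        _tableName = tableName;
        _connectionString = connectionString;
    }

    public override bool TryInvokeMember(InvokeMemberBinder binder, object[] args, out object result)
    {
        if (!binder.Name.StartsWith("FindBy"))
        {
            result = null;
            return false;
        }

        // "FindByNameAndPassword" -> ["Name", "Password"]
        var columns = binder.Name.Substring("FindBy".Length)
            .Split(new[] { "And" }, StringSplitOptions.RemoveEmptyEntries);

        // Only the column names come from the method name; the values all go into parameters.
        var criteria = string.Join(" and ",
            columns.Select((column, index) => column + " = @p" + index).ToArray());
        var sql = "select * from " + _tableName + " where " + criteria;

        using (var connection = new SqlConnection(_connectionString))
        using (var command = new SqlCommand(sql, connection))
        {
            for (var i = 0; i < args.Length; i++)
            {
                command.Parameters.AddWithValue("@p" + i, args[i]);
            }
            connection.Open();
            using (var dataReader = command.ExecuteReader())
            {
                if (!dataReader.Read())
                {
                    result = null;
                    return true;
                }
                // Copy the first row into a dictionary keyed by column name.
                var row = new Dictionary<string, object>();
                for (var i = 0; i < dataReader.FieldCount; i++)
                {
                    row[dataReader.GetName(i)] = dataReader.GetValue(i);
                }
                result = row;
            }
        }
        return true;
    }
}

In Simple.Data the result comes back as a dynamic record rather than a plain dictionary, and naively splitting on "And" would trip over column names that contain the word, which is one reason this is only a sketch.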

[A section which wrongly implied that Microsoft.Data did not support parameterized queries has been removed here.]

More examples…

Inserting using named parameters:

db.Users.Insert(Name: "Steve", Password: "Secret", Age: 21);

Executing an update:

db.Execute("update Stock set Current = Current - 1 where ProductId = ?", productId);

(Planned) inserting/updating using an object:

db.Stock.Insert(newStockItem);
db.Stock.UpdateByProductId(stockItem);

Roadmap

This project is more of a constructive criticism of Microsoft’s preview than anything else. I’m going to develop it a bit further (for example, the “planned” syntax support above), mainly as an exercise. One thing I’d maybe like to explore further is whether it can be used as a layer over NoSQL stores as well as RDBMS.

The project is hosted at http://github.com/markrendle/Simple.Data for anybody who wants to download and play with it.

I’d really appreciate feedback, so please do use the comments if you’ve got any criticisms, suggestions or encouragement to express.

Thursday, 11 March 2010

RDBMS vs NoSQL

(Yes, again.)

There's an awful lot of buzz around new-fangled "NoSQL" or "non-relational" databases at the moment. I'm hearing a lot of people asking why they'd choose one of these new-fangled storage models over the relational databases they know and (mostly) like. This post aims to shed some light on the choices available, and to maybe give some guidance if you're trying to make a decision.

Let's define some terms

I find it always helps to make sure you're talking about the same thing if you think you might be arguing.

Relational database management system (RDBMS)

When I say "relational database", I'm referring to an application or service which stores information in tables. Tables are divided into columns, and columns have a type, such as integer, date, text etc. The data held in the tables is divided into rows, and each row contains a value (possibly including NULL) for each of the columns. The RDBMS can maintain and enforce "master-detail" relationships between rows in different tables using primary and foreign keys.

The big commercial relational database systems are Microsoft SQL Server and Oracle; the main open-source alternatives are MySQL, PostgreSQL and SQLite.

New-fangled data stores (NoSQL)

I'm going to use the term NoSQL to cover everything else, which is a massive generalisation and therefore inevitably inaccurate (several new-fangled data stores actually support a SQL-like syntax). However, in the interests of a bit of clarity, I'll try to outline some of the distinct types here, although the distinctions can be a bit vague.

Key/Value Stores are the most basic NoSQL databases, and are essentially hashes with disk persistence. In some cases the value is a single binary or text object; in others it is a tuple with named properties, sometimes with searching supported against those properties. These solutions are schema-less: entities within a collection, domain or table may have completely different properties. Microsoft's Azure Table Storage and Amazon's SimpleDB are both cloud-based, pay-as-you-go Key/Value stores. Open-source implementations include BerkeleyDB, MemcacheDB, Redis and TokyoCabinet.

Wide Column, Tabular or Column Family Stores are scalable, distributed systems that support a defined schema for each "table", but not relationships between tables. Google's BigTable database, which is the storage solution for their AppEngine cloud platform, is a Wide Column store. Open-source alternatives include Cassandra and Hypertable.

Document Stores work with data structured using Object Notation, usually JSON or BSON. Not to be confused with SharePoint, Domino etc. This format allows complex data structures, including arrays and hashes, to be represented within a single entity that can be retrieved in one read operation; they also often support complex update operations such as adding elements to an array within a document. Some popular Document Stores are CouchDB, MongoDB and Riak.

Object Databases are superficially similar to Document Stores, but have predefined schema and inheritance hierarchies. In an Object Database, graphs are persisted using pointers between objects rather than by embedding graphs within other graphs. The most popular Object Database is db4o.

So which is best?

<SocraticDialectic>

Hello.

Hello.

Right, so. Why should I use any of these new-fangled "stores" instead of a relational database?

Mu.

Beg pardon?

I said "mu". I keep hearing this question, and I submit that the answer is "mu", which means "your question cannot be answered because it depends on incorrect assumptions."

What incorrect assumption is that?

The assumption that the relational database is axiomatically the correct storage solution in all situations, and that the onus is on all other options to prove that they are better.

Is that not right then?

No, it isn't. Have you heard of doing the simplest thing that could possibly work? Or YAGNI? It's possible to add unnecessary features to a system without coding them yourself. The first thing you should try to rule out as a storage medium is the file-system.

What, text files? Isn't that a bit 1980s?

Not at all. Let me ask you a question: what requirements or properties of your application preclude using one or more flat text files to persist your data?

I need structured data.

What's wrong with tab-delimited, CSV or even fixed-length field files?

I need master-detail structured data.

In which case, you could use XML, YAML or even JSON. Let's say we're using XML, even though it's quite the worst of those three options.

I need details with many masters.

You mean lookups? Why on earth would you want to use those?

To save space.

Right, because storage space is so expensive these days. You can buy a 2-terabyte hard disk and get change from £120 (or $150US). Keeping a gigabyte in the cloud'll cost you 15 cents a month.

OK, I don't need to save space. But I still need one copy of each piece of data, so I only have to update it in one place.

That's fine, but does that need to be supported by your persistence layer? Are you going to re-read those lookup descriptions every time you pull a record? Add multiple joins to every SELECT statement just to read back those varchar(40) descriptions?

As opposed to…?

As opposed to reading those lookups into memory once the first time they're needed, and maybe refreshing them in background every now and then if you're in a multi-user smart-client system and they're particularly dynamic, which they're probably not. How often does somebody add a new country to the system? Or a new type of customer, or method of payment, or whatever?

OK. I accept that lookups are not a valid reason to use a relational database.

Good. So, next requirement or property?

I need many-to-many relationships. For example, in my library application, each customer can borrow many books, and each book (over time) can be borrowed by many customers.

Well, we're still using XML, so you can have a list of child elements in your customer record detailing the books they've borrowed, and a list of child elements in the book record detailing the customers who have borrowed it.

So when somebody borrows a book, I have to update two records? That's bad, isn't it?

I don't know; it's your application. Do both records have to get updated in the same split second? Is it likely that another user will do a read that could get inaccurate information while the two updates are happening? Or is it enough that the customer and book records should be eventually consistent (for a given value of eventually)?

But then I'd have to read a bunch of files individually to get the details of those books, or the names of those customers.

Not at all.

Well how would you display sensible data on the screen? Are you going to cache the entire book catalogue and customer list in memory as well?

It's an option. But no, I would probably just decide which details about the book I need in the customer views, and which details about the customer I need in the book views, and then I'd include those details in the child records.

You'd duplicate the data?

Oh, yes.

[Shocked gasps from enthralled onlookers]

You're mad.

Not a bit of it. Books don't change their titles, or authors. Customers don't change their names very often; certainly not so often that it's beyond reason to bulk-update a bunch of book records when they do.

So you just duplicate the data?

Absolutely. Saves a whole bunch of extra disk reads, and those join tables you keep using in many-to-many relationships, well, let's be honest: they're a kludge, and they're one of the most common causes of performance issues due to volume of data.

I remain unconvinced.

OK. Let's take your library example. Let's say your library is in a particularly literate part of the world and has 10,000 active customers, and 100,000 books. And let's say that each customer borrows two books a week. After one year, your many-to-many join table has 1,000,000 rows in it. Meanwhile, each of my customers' records has 100 book record keys, titles, author names, publishers, and loan start- and end-dates, and each of my book records has, on average, 10 customer record keys, names, maybe addresses, and loan start- and end-dates. In storage terms, let's say you've used 32 bytes per row (two 32-bit keys and two 64-bit date/times, plus overhead), so that's 32MB, plus the same again for indexing. Meanwhile, let's just hazard a guess and say I've used a whole gigabyte with all my duplicated data.

Exactly. Look at all that storage you're wasting.

My goodness, you're right! That's 0.05% of a £120 hard-disk I've just frittered away! Or nearly $2 a year for the cloud storage.

There's no need to be sarcastic.

I wasn't; I was being ironic. Anyway, my point is, pulling a customer record and displaying the complete list of books they've borrowed is, for me, a single read operation; for you it's a three-table join with one of those tables containing a million records for each year your software's been in use.

I-

Actually, scratch that.

Really?

Yes, I was wrong.

I thought you were.

I forgot that you'll have normalised out your book catalogue into a book table with a many-to-many join to an author table, and another many-to-many join to a publisher table, so you've actually got seven tables totalling a few million rows that you need to reference to get the same information that I can get in a single read operation. How's your EXPLAIN output looking, by the way?

I'm not sure. What does DEPENDENT SUBQUERY mean?

No idea, but it sounds nasty. You can fix it later. You're supposed to be giving me reasons you need a relational database.

So I am. OK, well, I need to search the data.

Aha! Very good. Much closer. That would be a reason for a database management system, but not necessarily a relational one. There are lots of NoSQL solutions that will let you run queries against your datasets, with varying degrees of performance. I've written queries in MongoDB that run against massive datasets so fast it'd make your eyes bleed.

[Suddenly, a heavenly light shines down upon the pupil. Or some scales fall from his eyes.]

I've been such a fool! I drank the relational database kool-aid and no mistake! Thank you for showing me the error of my ways. I'm going NoSQL all the way for my next project.

What? No, wait, come back.

Why? I'm convinced. You made your case.

No, I haven't. I've just stopped you from being one kind of wrong and you've immediately started being another, completely different, but equally bad kind of wrong.

No need to be rude.

Sorry. But flat text files and NoSQL databases aren't the solution to every persistence problem. There are perfectly good reasons to use a SQL database.

Such as?

You might need the whole ACID thing: Atomicity, Consistency, Isolation and Durability. If your system deals with financial data, or medical records, or other sensitive information, you're going to want a robust transaction system.

True.

And if you've got a lot of concurrent users who might try to update the same bit of data at the same time, like an entry in a Stock table in an e-commerce system, then isolation and locking could be very handy.

And NoSQL systems can't do ACID and locking and so on?

Oh, they can, one way or another. But a lot of the time you end up writing extra code in your data access layer to do stuff that a decent RDBMS will do for you. And if you've got a room-full of developers who are dead good with SQL Server or Entity Framework or whatever, why force them to learn a new technology?

I would have a hard time justifying the time and money required for our developers to learn the new skills.

Then again, in the cloud the NoSQL storage can be much, much cheaper than the RDBMS. For example, Windows Azure storage costs $0.15 per gigabyte per month, whereas a 1GB SQL Azure instance is $9.99 per month; 10GB is $99.99 per month.

So Windows Azure storage is cheaper than SQL Azure storage?

Not necessarily. With Windows Azure Storage you also pay for transactions, at the rate of $0.01 for every ten thousand. SQL Azure doesn't charge for transactions. So if you're storing less than a gigabyte but running more than ten million transactions per month, SQL Azure would be cheaper. Or if you're storing less than ten gigabytes but running more than a hundred million transactions-

Stop! I'm all confused. Please, just tell me, what should I use?

You want a definite answer?

Yes.

OK.

There really is one?

There really is one.

And are you going to tell me what it is?

I am.

Now?

Now.

Wow.

Although I don't think you're going to like it*.

It doesn't matter! I must know!

Now?

Yes, now!

Alright.

Well?

You're really not going to like it.

Tell me!

The answer is…

Yes?

The answer is: It All Depends.

I want the last 5 minutes of my life back.

5 minutes? I've spent hours writing this.

</SocraticDialectic>

Anyway, the point is…

…every system is different. Which persistence paradigm is best depends on the project and its requirements and properties. All I want to say is that you shouldn't start with an RDBMS and then look for reasons not to use it; at the same time, don't start with Project Voldemort or CouchDB just because they're new and exciting. Approach every project with no preconceptions about what you're going to use, or at most with "flat text files" as the default choice.

A non-exhaustive list of things to consider:

  • If immediate (rather than eventual) consistency is a key requirement, then you should probably use an RDBMS.
  • If horizontal scalability is important and eventual consistency is good enough, then a NoSQL solution might be better.
  • If you're deploying an application to Windows Azure or Amazon Web Services, then you should probably compare the costs of Windows Azure Storage Services or SimpleDB to SQL Azure or Amazon RDS for your particular application.
  • If you're writing an application for a mobile platform, you're back to limited storage space so you might get back to RDBMS and normalising for efficient use of that space.
  • If you're writing a small application you want to knock out quickly using Entity Framework or NHibernate or ActiveRecord then use SQL Server Compact or SQLite, or skip that stuff altogether and serialise object graphs to files on the hard-disk.

In fact, if you really design your architecture with an open mind (or beginner's mind), you'll come up with a combination of technologies and paradigms that properly meet different parts of the requirements; maybe use SQL Server 2008 R2 for stock control, invoicing and geo-spatial search operations, with Azure Table Storage for auditing and text files for logging. And why not evaluate distributed Document Stores for the awesome customer review and discussion features that you're adding in phase two?

Most important of all, before making any decisions, take the time to do some apples-to-apples comparisons of potential solutions, with use cases, data volumes and loads that are representative of your system. I recently chose MongoDB for a personal project, but I started out comparing SQL Server, MySQL and PostgreSQL and finding them all wanting, which is how I ended up looking at NoSQL databases in the first place.

(Just for the record, the most important and most frequently-run query for the application took nearly 15 seconds on the fastest relational database (even after some violent denormalisation); MongoDB can retrieve the same information in under half a second.)

Further reading

A comprehensive – and very balanced – list of articles on NoSQL databases: http://nosql-database.org/links.html

 

*All apologies to Douglas Adams.

Tuesday, 9 March 2010

XmlElementAsDictionary: A very simple wrapper for LINQ to XML

Why?

I've been forced into engaging with XML in all its gory details recently, as part of my AzureKit project (more on which in another post). Being a big fan of all things LINQ, I started out trying to use the LINQ-to-XML framework. I had problems. This is not a fault of the framework; as a fundamental part of the .NET Framework it must provide a way to achieve everything, and that's not the same as providing a way to do things simply.

The REST API for Windows Azure uses a hybrid of the Atom syndication format and the ADO.NET Data Services format. Here is the XML required for an Insert to an Azure Table:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<entry xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
xmlns="http://www.w3.org/2005/Atom">
<title />
<updated>2008-09-18T23:46:19.3857256Z</updated>
<author><name /></author>
<id />
<content type="application/xml">
<m:properties>
<d:Address>Mountain View</d:Address>
<d:PartitionKey>mypartitionkey</d:PartitionKey>
<d:RowKey>myrowkey1</d:RowKey>
<d:Timestamp m:type="Edm.DateTime">0001-01-01T00:00:00</d:Timestamp>
</m:properties>
</content>
</entry>

I can't post the LINQ-to-XML code you need to use to create that small document, because I deleted it. Then I formatted the hard drive that it had been on. Then I burned it, just to be on the safe side. The code was not clean. The code was feculent.

The quickest example that I can give is that to use a prefixed namespace, you don't use new XElement("m:properties"); you use new XElement(parentElement.GetNamespaceForPrefix("m") + "properties"). Seriously. I'm just not doing that.
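
For comparison, here is roughly what the raw LINQ-to-XML version looks like (my reconstruction for illustration, not the deleted original, and using XNamespace objects rather than GetNamespaceForPrefix):

using System.Xml.Linq;

XNamespace atom = "http://www.w3.org/2005/Atom";
XNamespace d = "http://schemas.microsoft.com/ado/2007/08/dataservices";
XNamespace m = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata";

var entry = new XElement(atom + "entry",
    new XAttribute(XNamespace.Xmlns + "d", d),
    new XAttribute(XNamespace.Xmlns + "m", m),
    new XElement(atom + "title"),
    new XElement(atom + "updated", "2008-09-18T23:46:19.3857256Z"),
    new XElement(atom + "author", new XElement(atom + "name")),
    new XElement(atom + "id"),
    new XElement(atom + "content",
        new XAttribute("type", "application/xml"),
        new XElement(m + "properties",
            new XElement(d + "Address", "Mountain View"),
            new XElement(d + "PartitionKey", "mypartitionkey"),
            new XElement(d + "RowKey", "myrowkey1"),
            new XElement(d + "Timestamp",
                new XAttribute(m + "type", "Edm.DateTime"),
                "0001-01-01T00:00:00"))));

It does the job, but every element needs the right XNamespace prepended, which is exactly the noise the wrapper below is trying to hide.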

If you don't like an interface, wrap it

For these purposes, an XML tree is a collection of named elements, some of which have values, some of which have sub-elements, and each of which may have a number of attributes.

More importantly, a single Atom entry does not involve "lists"; that is, a parent element never has multiple children with the same name, like this:

<numbers>
  <number>One</number>
  <number>Two</number>
</numbers>

In fact, an Azure Atom entry is a lot like a dictionary, except with nesting.

I like dictionaries. In some other languages, I like hashes. But we're talking about .NET and C# here, so I like dictionaries.

XmlElementAsDictionary

I've created a very simple class which wraps around an XElement object and provides access to that element's children, attributes and value using the Dictionary paradigm. Into this class I have baked support for namespaces which does not violate the Principle of Least Astonishment.

Using this class to create the Azure/Atom XML element above looks like this:

var entry = new XmlElementAsDictionary("entry", "http://www.w3.org/2005/Atom");

entry.AddPrefixedNamespace("d", "http://schemas.microsoft.com/ado/2007/08/dataservices");
entry.AddPrefixedNamespace("m", "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata");

entry["title"].Clear();
entry["updated"].Value = DateTime.Now.ToString("s");
entry["author"]["name"].Clear();
entry["id"].Clear();
entry["content"].Attributes["type"] = "application/xml";
entry["content"]["m:properties"]["d:Address"].Value = "Mountain View";
entry["content"]["m:properties"]["d:PartitionKey"].Value = "mypartitionkey";
entry["content"]["m:properties"]["d:RowKey"].Value = "myrowkey1";
entry["content"]["m:properties"]["d:Timestamp"].Value = DateTime.MinValue.ToString("s");
entry["content"]["m:properties"]["d:Timestamp"].Attributes["m:type"] = "Edm.DateTime";

The class will also accept an XElement in its constructor, and so can be used for reading XML in this manner as well as writing it.
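
As a quick sketch of the reading side (xmlText here is assumed to hold the Atom entry shown earlier, and I'm assuming the prefixed keys resolve the same way on the way in as they do on the way out):

var parsed = new XmlElementAsDictionary(XElement.Parse(xmlText));
var address = parsed["content"]["m:properties"]["d:Address"].Value;                        // "Mountain View"
var timestampType = parsed["content"]["m:properties"]["d:Timestamp"].Attributes["m:type"]; // "Edm.DateTime"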

Once complete, the element can be converted back into XML using ToElement() or ToDocument(), or to plain text using ToString() (which gives the same result as calling ToElement().ToString()).

Limitations

As noted above, this approach also imposes the Dictionary limitation of unique keys, so you can't have lists. But you can construct elements using this wrapper and then compose them into lists.

Availability

I'm collecting the various helpful utility classes that get created in support of other projects into an open-source project called NExtLib. You can download the source from CodePlex. Releases will happen in sync with other, higher-level projects, including AzureKit, the first version of which I'm releasing later this month and hope to present at user groups in the near future.

Feedback

I'd love to hear any thoughts, ideas for improvement, criticisms, abuse or death threats.

Thursday, 28 January 2010

[meta: new blog location]

My regular reader (hello Mike) may have noticed that I've moved to Blogger; it's part of a streamlining exercise regarding hosting packages, which were mounting up. Hopefully I'll retrieve the older content and re-upload it soon.
