Your Web Service Might Not Be RESTful If…

The other day, I gave a brief talk about our HTTP library, Resourceful. After a few minutes of going over the features, it became apparent to me that very few people have taken the time to appreciate the finer points of HTTP. Everyone who calls themselves a web application developer needs to take a few hours to read RFC2616: Hypertext Transfer Protocol -- HTTP/1.1. It’s not very long, and incredibly readable for a spec. Print it out, and read a few sections when you go for your morning “reading library” break. Unfortunately, a great many people got confused by it, and ended up reimplementing a lot of HTTP in another layer, and that’s how we ended up with SOAP and XML-RPC. There’s a good parable about how this all went off the rails for a while, until some people rediscovered a section in Roy T. Fielding’s dissertation, “Representational State Transfer (REST)”.

Needless to say, REST is making a huge comeback, at least in the agile startup communities. It’s fast, lightweight, and easy to put together. Ruby on Rails even has excellent support for getting up and running quickly. Sadly, though, it’s not quite right, and as a result, developers have misconstrued REST yet again. That makes things harder than they really need to be, and leads them down a path to lots of headaches in the future. If you’re interested in learning more about REST, there are plenty of excellent resources on the REST Wiki, particularly REST In Plain English.

For some of my examples, I’m going to pick on the Pivotal Tracker “RESTful” API. Sorry guys, I needed to pick someone, and I love your product (I use it every day), but you’re part of the reason for this post. I wanted to write a client for your service, but it’s really much harder than it needs to be. The service violates many of the constraints of REST, and therefore naming it “RESTful” is incorrect. You’re not the only ones, though, so don’t feel bad: nearly EVERY API that claims to be RESTful isn’t. For a look at one that gets it (mostly) right, check out Netflix.

If Your Web Services Do Any of These Things, You’re Doing it Wrong

  1. Clients have to read documentation to know the locations of top-level resources.
  2. Clients have to concatenate strings to get to the next resource.
  3. You have an “API/Key/Token” in a header or a url.
  4. You have a version string in a url.

1. Have a Minimum of Starting Points

If you look at the Available Actions on Pivotal Tracker’s API page, you’ll see they list several actions that can be performed. This isn’t REST, this is XML-RPC. Nearly everybody gets this one wrong. Due to the amount of confusion, Roy Fielding published a post to stop people abusing the term “RESTful” and to try and clarify what a real RESTful API is. His final point is:

A REST API should be entered with no prior knowledge beyond the initial URI (bookmark) and set of standardized media types that are appropriate for the intended audience (i.e., expected to be understood by any client that might use the API).

The point here is that there should be only one resource that is the starting point for any interaction with the service. This is called a “well-known” resource, and is never, ever allowed to change locations. If it does change, you break every single client out there. By publishing a dozen or more well-known resources in their API docs, Tracker is no longer permitted to change any of them. This increases the maintenance burden, because now they have to maintain all these resources for the lifetime of the application, or deprecate any third-party clients.

If they had instead added a single resource that described the locations of these other resources, they would have much more flexibility in the future. An example of the content of such a resource:

<?xml version="1.0" encoding="UTF-8"?>
<services>
  <service>
    <name>AllProjects</name>
    <href>http://www.pivotaltracker.com/services/projects</href>
  </service>
  <service>
    <name>AllActivities</name>
    <href>http://www.pivotaltracker.com/services/activities</href>
  </service>
</services>

Note: Yes, they list several other actions on their API. However, each of them violates another one of the REST constraints, so I have omitted them for the time being.

Now every client just needs to know the name of the resource they’re looking for, e.g. “AllActivities”, and they can continue as before. If, for some perfectly valid reason, Pivotal decides to rename “Activities” to, say, “Actions”, they only have to modify the href of the “AllActivities” service description and add an “AllActions” service. Every single client that looks the resource up by name instead of using a hardcoded href continues to work flawlessly, or at least as well as it did before. That means less maintenance burden on the service developers, and no burden at all for the developer of a well-written client.
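To make the lookup-by-name idea concrete, here is a minimal sketch of what a client might do with such a service document. The document itself is the hypothetical one from this post (Tracker does not actually publish it), and service_href is a name I made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical service document, based on the example in the text.
SERVICES_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <services>
    <service>
      <name>AllProjects</name>
      <href>http://www.pivotaltracker.com/services/projects</href>
    </service>
    <service>
      <name>AllActivities</name>
      <href>http://www.pivotaltracker.com/services/activities</href>
    </service>
  </services>
XML

# Return the href advertised for the named service, or nil if it is not listed.
# The client hardcodes only the service name, never the location.
def service_href(xml, name)
  doc = REXML::Document.new(xml)
  doc.elements.each('services/service') do |service|
    return service.elements['href'].text if service.elements['name'].text == name
  end
  nil
end
```

A client would call service_href(body, 'AllActivities') on the body it fetched from the one well-known resource, and then GET whatever href comes back.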

2. Don’t Make a Client Construct URIs

In that very same bullet point, Roy continues…

From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations…

If you look at the Tracker API docs’ Available API Actions for projects, you’ll see “Single project” and “All my projects”. We already covered how to handle the “AllProjects” resource, and in the example above, we removed the “Single project” resource entirely. So how do you get to the resource for a single project? Simple, you follow its link in the “AllProjects” resource.

    <?xml version="1.0" encoding="UTF-8"?>
    <projects type="array">
      <project>
        <href>http://www.pivotaltracker.com/services/v2/projects/1</href>

        <id>1</id>
        <name>Sample Project</name>
        <iteration_length type="integer">2</iteration_length>
        <week_start_day>Monday</week_start_day>
        <point_scale>0,1,2,3</point_scale>

        <stories_href>http://www.pivotaltracker.com/services/v2/projects/1/stories?{-join|&amp;|filter,limit,offset}</stories_href>
        <iterations_href>http://www.pivotaltracker.com/services/v2/projects/1/iterations</iterations_href>
        <activities_href>http://www.pivotaltracker.com/services/v2/projects/1/activities</activities_href>
      </project>
      <!-- ... -->
    </projects>

For a client to find a single project, they would need to know its name. They would GET the list of services, find “AllProjects” by name, GET the “href” provided, and look for the project “Sample Project” by name. They could then use the href element to obtain the single resource for the project. Additionally, we also have links to all the actions in the docs that required a PROJECT_ID in the url. To get the iterations or activities for a project, a client only has to locate the project and follow the links.
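Here is a rough sketch of that "locate the project, then follow its links" step. It uses a trimmed copy of the hypothetical projects document above, and project_link is an illustrative helper name, not part of any real client.

```ruby
require 'rexml/document'

# Trimmed copy of the hypothetical "AllProjects" representation from the text.
PROJECTS_XML = <<~XML
  <projects type="array">
    <project>
      <href>http://www.pivotaltracker.com/services/v2/projects/1</href>
      <name>Sample Project</name>
      <iterations_href>http://www.pivotaltracker.com/services/v2/projects/1/iterations</iterations_href>
    </project>
  </projects>
XML

# Find a project by name and return the link held in the given child element.
# Returns nil when no project has that name.
def project_link(xml, project_name, link = 'href')
  doc = REXML::Document.new(xml)
  project = doc.elements.to_a('projects/project').find do |node|
    node.elements['name'].text == project_name
  end
  project && project.elements[link].text
end
```

The client never builds a URL from an id; project_link(body, 'Sample Project', 'iterations_href') hands back whatever location the server chose to advertise.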

You should also notice the part of the stories_href enclosed in {braces}. This is known as a URI Template, and is very handy. (The syntax shown here comes from an early URI Template draft; RFC 6570 would write the same template as {?filter,limit,offset}.) If you noticed in Pivotal’s API docs, they had three ways of getting stories: all stories, stories by a filter, and stories by a limit and offset. I took the liberty of combining these into a single href, using the template to describe the query parameters. A Ruby client, using the Addressable library, could fill out that uri like this:

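require 'addressable/template'

# Template.new builds the template; expand returns an Addressable::URI.
template = Addressable::Template.new(stories_href)
uri = template.expand({
  "filter" => 'label:"needs feedback" type:bug'
})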

All these extra requests might seem like a rather long way of going about it. However, the advantages are immense:

Should Tracker become huge, and everybody and their grandmother starts using it to keep track of their development projects, Tracker could outgrow the capacity of a single database. Since it appears they are using AUTOINCREMENT id columns for the project id, sharding the projects table is going to be hard. If they were to start using UUID columns for project ids, sharding would be a whole lot less complicated. The catch is that if they change the project id in the API, everyone’s clients break. If clients were to instead follow the href, Tracker can do whatever they want to the id, and existing clients will have no trouble at all following.

But wait, it gets better. What happens if the service still isn’t fast enough, for any number of perfectly plausible reasons? Because they’re using hrefs, they can put anything they want there. Say they decide to shard the application servers, so every project with an odd-numbered id goes to www1.pivotaltracker.com, and everything even-numbered goes to www2.pivotaltracker.com. They just have to update the links, and everyone’s client continues working.

If all resources are specified like this, then a client can get to every resource from that one starting point. You are free to move, rename, and add resources as you desire, without making things complicated for your API clients. Less maintenance burden on you, and none on your users.

3. Don’t put an “API Token” in a custom header, or in the URIs

While there’s nothing technically un-RESTful about this, it’s still annoying to your clients. And unless you have a full-time security expert on your staff, you probably did it wrong, and it’s not nearly as secure as you think it is. It’s also vulnerable to man-in-the-middle attacks and replay attacks, unless you use SSL. And if you do use SSL, then you’ve thrown away one of the major advantages of HTTP, which is caching. Just about every HTTP server and proxy is able to handle caching, but intermediaries can’t see inside an SSL connection, so they can’t cache anything that passes through it. I’ll get more into caching in a future blog post; just realize that it can be immensely beneficial to the performance of your application, and you’re going to want to do everything you can to facilitate that.

Luckily, you have a third option: HTTP Digest Authentication. It’s been vetted by security professionals and time, and is almost certainly more secure than some secret key you’ve come up with. There are many varieties of Digest auth. The one most useful for RESTful web services uses an algorithm of “MD5-sess” and a Quality of Protection (qop) of “auth”. The MD5-sess algorithm allows for 3rd-party authentication services, and does not require the server to maintain a plaintext copy of the users’ passwords. A qop of “auth” protects against chosen-plaintext cryptanalysis attacks, by having a counter incremented by the client, and a client-generated nonce. For a quick overview, Wikipedia has a good article, and be sure to check out the spec, [RFC2617]. Here’s a simple example to see what’s going on. Client requests are denoted by >, with server responses <. This obviously isn’t the whole content, just the interesting bits.

> GET /

< HTTP/1.1 401 Authorization Required
< WWW-Authenticate: Digest 
                    qop="auth", 
                    realm="My RESTful Application", 
                    opaque="55dd3242dd79740cefb67528b983bc8e", 
                    algorithm=MD5-sess, 
                    nonce="MjAwOS0wNy0xOSAyMDozMToyOToxODQ2NjA6MjAxZjRiMjVjZjRiYTc0MDEwNWIwY2U2NWIxMGNjNj"

> GET /
> Authorization: Digest 
                 username="admin", 
                 qop="auth", 
                 realm="My RESTful Application", 
                 algorithm="MD5-sess",
                 opaque="55dd3242dd79740cefb67528b983bc8e", 
                 nonce="MjAwOS0wNy0xOSAyMDozMToyOToxODQ2NjA6MjAxZjRiMjVjZjRiYTc0MDEwNWIwY2U2NWIxMGNjNj", 
                 uri="/", 
                 nc=00000001, 
                 cnonce="Mjg5MDIz", 
                 response="1b8e5cdcd8d49ca65e3d6142567e44cf"

< HTTP/1.1 200 OK
< Authentication-Info: qop=auth, 
                       nc=00000001, 
                       cnonce="Mjg5MDIz", 
                       nextnonce=00000002

Digest auth begins when the client makes an initial request without any authentication info. The server responds with a 401, and provides a few parameters to the client in the WWW-Authenticate header. The realm is a string used to identify the application. The client uses MD5 to hash together their username, the realm and their password. This is referred to as HA1. When the user was created, the server did the same, and HA1 is what is stored in the database.

The client then generates a random string (the “client nonce”, or cnonce) and increments a counter (the “nonce counter”, or nc). It hashes the method as an uppercase string (“GET”) and the URI (“/”) together to produce HA2. Finally, it hashes HA1, the nonce, nc, cnonce, qop, and HA2 all together to arrive at response. It packages this all up into the Authorization header, and makes the request again. The server has all the information it needs (it stored the HA1 instead of the plaintext password) to hash the same parameters itself. If it arrives at the same response, then it knows the client knows the password for the user, and allows it to proceed.
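The hashing steps above can be sketched in a few lines of Ruby. This is only an illustration of the RFC 2617 arithmetic for qop="auth", not a complete client; digest_response and its keyword names are my own, and for MD5-sess I am assuming the common reading of the spec where the hex form of the first hash is re-hashed with the nonce and cnonce.

```ruby
require 'digest/md5'

# Hash the given parts joined with colons, as RFC 2617 does throughout.
def md5(*parts)
  Digest::MD5.hexdigest(parts.join(':'))
end

# Compute the "response" value for qop="auth".
# session: true switches HA1 to the MD5-sess variant (assumed hex re-hash).
def digest_response(username:, realm:, password:, http_method:, uri:,
                    nonce:, nc:, cnonce:, qop: 'auth', session: false)
  ha1 = md5(username, realm, password)
  ha1 = md5(ha1, nonce, cnonce) if session
  ha2 = md5(http_method.to_s.upcase, uri)
  md5(ha1, nonce, nc, cnonce, qop, ha2)
end
```

Fed the sample values printed in RFC 2617 (user “Mufasa”, plain MD5), this should reproduce the response shown in that RFC, which is a handy sanity check before pointing it at a real server.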

Optionally, the server can provide an Authentication-Info header attached to the response. This provides enough information for the client to automatically authenticate for the next request, without having to get a 401 again. An alternative would be to just keep using the same nonce over and over, but this may be subject to replay attacks. The downside of handing out a fresh nonce with every response, though, is that the client cannot pipeline requests.

4. Don’t put the API version in the URI

Several web services (including Tracker’s) have URIs that look like http://myapp.com/v1/projects or http://myapp.com/projects?v=2. While this is perfectly RESTful, it seems a bit odd. From a pedantic REST point of view, /v1/projects/1234 and /v2/projects/1234 are the locations of totally different resources, when, in fact, they are simply different representations of the same resource. From a more practical standpoint, say a client is written when only version one of a service is available, and it stores (“bookmarks”) some of these resources. Some time later, the application team decides they need to release some incompatible changes to their API, so they increment the version. Some time after that, the client upgrades to support the new version. However, the upgrade is not as clean as it might be, because the client still has saved locations pointing to the old version. The client’s developers either need to support both versions, or write a tool that does, so it can migrate the urls to their new locations. They could munge the urls, but if one of the incompatible changes was going from integer ids to UUIDs, munging won’t work, and they are stuck supporting both.

Luckily, HTTP has a built-in solution to this problem: Content Negotiation. It makes use of two headers, Accept on the client side, and Content-Type on the server side. The Tracker services serve everything back with a Content-Type of application/xml. It’s not just any old XML, however; it is a specific form of XML, the schema of which is described in their API docs. This is the situation for which MIME types are intended. If every form of image out there just used a MIME type of image, we’d have a much harder time of things. Luckily, there’s more than that, with image/gif, image/png, and image/jpeg, which all represent different encodings of images. Following the same idea, Tracker could instead use something like application/vnd.pivotal.tracker.v1+xml. Yes, it’s still XML, but it’s the Pivotal Tracker Version 1 flavor of XML. Then when Pivotal decides it’s time for incompatible changes, they only have to add an additional content type, application/vnd.pivotal.tracker.v2+xml.

Following this idea, now a project always lives at /projects/1234. This is better, because while v1 and v2 of a project probably aren’t different, their representations are. When a client updates versions, their links don’t break, nor do they have to support two or more versions.
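As a rough sketch of the server side of this, here is one way to pick a representation version from an Accept header. The media type names are the hypothetical ones suggested above, negotiate_version is an illustrative name, and the sketch deliberately ignores q-values and wildcards, which a real implementation would need to honor.

```ruby
# Hypothetical vendor media types, following the naming suggested in the text.
REPRESENTATIONS = {
  'application/vnd.pivotal.tracker.v1+xml' => :v1,
  'application/vnd.pivotal.tracker.v2+xml' => :v2
}.freeze

# Return the version for the first media type in the Accept header that we
# know how to serve, or nil if none match (the caller could answer 406).
# Simplification: parameters such as ;q=0.9 are stripped, not weighed.
def negotiate_version(accept_header)
  accept_header.to_s.split(',').each do |entry|
    media_type = entry.split(';').first.to_s.strip
    return REPRESENTATIONS[media_type] if REPRESENTATIONS.key?(media_type)
  end
  nil
end
```

The resource stays at /projects/1234 either way; only the Accept header the client sends, and the Content-Type the server answers with, change between versions.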

I’ve only just brushed the surface of this topic. For more, Peter Williams has an excellent discussion of it here, here, and here. (Disclaimer: Peter is a former coworker and personal friend. This section and his posts are about a solution we came up with for a project.)

Now You Don’t Have Any Excuses

I hope that this post serves as a good description of why you shouldn’t be designing web services the way everybody else does. It seems that everyone is just copying everyone else, without really understanding the pros and cons of the implementations. I hope this sparks some discussion, because I don’t know that these are even the best ways to be doing it. I just know, from the experience of writing both applications and consumers, that the way everyone is doing it now is much more difficult than it needs to be.

Writing DataMapper Adapters - A Tutorial

Introduction

The adapter API for DataMapper has been in a bit of flux recently. When I submitted my proposal for a talk at MountainWest, adapters were irritatingly complex to write. You just needed to know too much about DataMapper’s internals to be able to write one. A week before the conference began, I started a significant effort to re-write the API to make it easier. I succeeded, a little too well; my 30 minute talk only took 15. Since then, I’ve written a couple more adapters from scratch, and refined the API further. This post will serve as notes on the changes that I’ve made, and a tutorial on writing adapters.

The API changes are currently only in my branch, but they will be merged into the DataMapper/next branch. For now, you’ll need to use my adapters_1.0 branch.

This tutorial will follow my process as I make a DataMapper adapter for TokyoTyrant. You can grab the code from my github repo, paul/dm-tokyotyrant-adapter.

Setup

I’ll assume you know how to build a gem, and get it all set up using your favorite gem builder, so I’m going to skip all that. To begin, we only need a couple files. First (of course!), the spec:

spec/dm-tokyotyrant-adapter_spec.rb

require File.dirname(__FILE__) + '/spec_helper'

require 'dm-core/spec/adapter_shared_spec'

describe DataMapper::Adapters::TokyoTyrantAdapter do
  before :all do
    @adapter = DataMapper.setup(:default, :adapter   => 'tokyo_tyrant',
                                          :hostname  => 'localhost',
                                          :port      => 1978)
  end

  it_should_behave_like 'An Adapter'

end

And that’s all there is to it. We make an @adapter instance var, which gets returned from DataMapper.setup, and then run the adapter shared spec. As of now, the shared spec is fairly thorough, but it’s far from comprehensive. If we run this now, we’ll get some errors about not finding the TokyoTyrantAdapter. So, let’s go make it.

Initialization

lib/dm-tokyotyrant-adapter.rb

require 'dm-core'
require 'dm-core/adapters/abstract_adapter'       # 1

require 'tokyotyrant'

module DataMapper::Adapters

  class TokyoTyrantAdapter < AbstractAdapter      # 2
    include TokyoTyrant

    def initialize(name, options)
      super                                       # 3

      @options[:hostname] ||= 'localhost'         # 4
      @options[:port]     ||= 1978

      @db = RDB.new
    end
  end

end

Some of this is pretty TokyoTyrant-specific. Since the Ruby API isn’t very Rubyish, I’m going to skip over a lot of it, and just talk about the DataMapper/adapter specific stuff. Referencing the comments in the code above:

  1. require the abstract adapter explicitly, since it’s not require’d as part of requiring dm-core.
  2. Make a class that follows the naming convention #{AdapterName}Adapter so that DataMapper can find it when we use the :adapter => 'adapter_name' option. Inherit from AbstractAdapter as well, as it will provide us with many helpers we’ll be using.
  3. Make an initialize method, and call super. This will turn any provided options into a Mash (a Hash that can use a string and a symbol as the same key). It handles a little other setup for you, as well.
  4. The rest is Tyrant-specific, but useful to know. We set some default connection options, and initialize a @db object.

If we run the spec now, it connects, and we get a bunch of pending specs, saying we need to implement #read, #create, etc…

dm-tokyotyrant-adapter/master % rake spec
(in /home/rando/dev/dm-tokyotyrant-adapter)
*****

Pending:

DataMapper::Adapters::TokyoTyrantAdapter needs to support #create (Not Yet Implemented)
/usr/lib/ruby/gems/1.8/gems/dm-core-0.10.0/lib/dm-core/spec/adapter_shared_spec.rb:52

DataMapper::Adapters::TokyoTyrantAdapter needs to support #read (Not Yet Implemented)
/usr/lib/ruby/gems/1.8/gems/dm-core-0.10.0/lib/dm-core/spec/adapter_shared_spec.rb:75

DataMapper::Adapters::TokyoTyrantAdapter needs to support #update (Not Yet Implemented)
/usr/lib/ruby/gems/1.8/gems/dm-core-0.10.0/lib/dm-core/spec/adapter_shared_spec.rb:107

DataMapper::Adapters::TokyoTyrantAdapter needs to support #delete (Not Yet Implemented)
/usr/lib/ruby/gems/1.8/gems/dm-core-0.10.0/lib/dm-core/spec/adapter_shared_spec.rb:129

DataMapper::Adapters::TokyoTyrantAdapter needs to support #read and #create to test query matching (Not Yet Implemented)
/usr/lib/ruby/gems/1.8/gems/dm-core-0.10.0/lib/dm-core/spec/adapter_shared_spec.rb:289

Finished in 0.005982 seconds

5 examples, 0 failures, 5 pending

Create

def create(resources)                                     # 1
  db do |db|                                              # 2
    resources.each do |resource|                          # 3
      initialize_identity_field(resource, rand(2**32))    # 4
      save(db, key(resource), serialize(resource))        # 5
    end
  end
end
  1. resources is an Array of DataMapper Resource objects.
  2. #db is a helper to make TokyoTyrant’s api a little more friendly. It handles connecting to the ttserver, and yields the connection to the block. When finished, it closes the connection.
  3. Some adapters might be able to support bulk creates, like SQL INSERT. This one doesn’t, so we’ll loop over every resource.
  4. We’ll need to set the identity field. More on this later.
  5. Put the resource into the database. #key and #serialize are helpers, I’ll explain them in a bit.
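
The #db and #save helpers aren’t shown in this post, so here is a guess at their general shape rather than the adapter’s real code. FakeRDB is a stand-in for TokyoTyrant::RDB so the sketch runs without a ttserver, and SketchAdapter is a made-up name; I’m assuming only that the connection responds to open(host, port), put(key, value) and close.

```ruby
# Stand-in for TokyoTyrant::RDB (assumption: open/put/close are all we need).
class FakeRDB
  attr_reader :store

  def initialize
    @store = {}
    @open = false
  end

  def open(_host, _port)
    @open = true
  end

  def close
    @open = false
    true
  end

  def put(key, value)
    raise 'not connected' unless @open
    @store[key] = value
    true
  end

  def open?
    @open
  end
end

class SketchAdapter
  def initialize(connection, options = {})
    @db = connection
    @options = { :hostname => 'localhost', :port => 1978 }.merge(options)
  end

  # Open the connection, yield it to the block, and always close it afterwards.
  def db
    @db.open(@options[:hostname], @options[:port])
    yield @db
  ensure
    @db.close
  end

  # Store one serialized value under its key, raising if the store refuses it.
  def save(db, key, value)
    db.put(key, value) or raise "could not store #{key}"
  end
end
```

The ensure clause is the important part: the connection gets closed even if saving one of the resources raises.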

Something useful to note here: The resources being passed in to this method are the actual resources in use by DataMapper. That means that any modifications you make to them will also be automatically available to anything using DataMapper. This is extremely useful for any data store that can provide a representation of the created object. If the data store set some fields as a result of creation, e.g. a created_at timestamp, or an href linking to the location of the resource, you can update the resource right here, and not have to have DataMapper perform a #read to update the resource object.

If you’re coming from an RDBMS world, you’ll be familiar with sequences. Since you’re here, learning how to write adapters, I’m going to assume you’re not going to be talking to a relational database. If that’s the case, and you don’t need to support these kinds of sequences, you should probably use UUIDs or something similar for your identity fields. Sequences are not scalable or distributable; they’re a relic of the big RDBMSs. I only have this #initialize_identity_field line in there to show how it’s done. As you can see, I’m not even picking it sequentially, but choosing a random number instead, because I don’t have a reasonable way to keep track of sequences. The method won’t try to overwrite a value if one is already set, so take the opportunity to use a UUID instead, and save everyone involved a bunch of trouble.</soapbox>

Because TokyoCabinet & Tyrant are key-value stores, I’ve written a couple helpers to try and coerce resources into a single key and value. First, I choose a key from the model name, and keys in the model, like so:

def key(resource)
  model = resource.model
  key = resource.key.join('/')
  "#{model}/#{key}"
end

We get the model and the key from the resource. One thing to keep in mind is that DataMapper assumes composite keys for every model, so even if a model has only a single key, Resource#key will always return an array. We use that to build a string, like Article/1234. I chose a slash as the delimiter, because TokyoTyrant has a RESTful interface, and it will make for pretty urls.

We also need to serialize the resource. I chose to serialize it as JSON, because it’s cross-platform, and lightweight. YAML or even XML would also be ok choices, depending on what you may be interoperating with.

def serialize(resource)
  resource.attributes(:field).to_json
end

Resource#attributes normally returns a Hash of {:property_name => value} pairs. DataMapper properties also can take an option, :field, which is used to indicate the name of the field used by the data store. Because we’re writing an adapter to a data store, that’s what we want. #attributes can take an optional argument to indicate what we want to use as keys. Here, I used :field, meaning I want the field attribute of the property. It will then return a Hash of the form {"field_name" => value}. There usually won’t be a difference, but it’s important that adapters use the field instead of the name, so that someone writing a model can rely on the :field option of a property working correctly.

Let’s run the spec again, and see how we did:

dm-tokyotyrant-adapter/master % rake spec
(in /home/rando/dev/dm-tokyotyrant-adapter)
/usr/lib/ruby/gems/1.8/gems/rake-0.8.3/lib/rake/gempackagetask.rb:13:Warning: Gem::manage_gems is deprecated and will be removed on or after March 2009.
****..

Finished in 0.009957 seconds

6 examples, 0 failures, 4 pending

Read

def read(query)
  model = query.model

  db do |db|
    keys = db.fwmkeys("#{model}/")
    records = []
    keys.each do |key|
      value = db.get(key)
      records << deserialize(value) if value
    end
    filter_records(records, query)
  end
end

#read takes a DataMapper::Query object, which has everything needed to filter, sort, and limit records. For simple adapters that don’t have a native query language, you don’t need to care. The #filter_records helper in AbstractAdapter will take care of everything for you. All you need to do is provide it an Array of Hashes, using the field name of the property as the key. Since we use JSON to serialize the value, here we deserialize it back into a hash. We used field names as the keys, so no further translation is needed. TokyoTyrant provides the #fwmkeys method as a way to search for a key prefix, so we pass the model name in, because the model name is the first part of the key we used. We pass all the records we found in to #filter_records, which performs the filtering, and we then return the result.
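The #deserialize helper isn’t shown here, so this is a small sketch of the prefix-then-deserialize step using a plain Hash in place of the ttserver. STORE and records_for are made-up names for illustration; the real adapter asks TokyoTyrant for the matching keys instead of scanning a Hash.

```ruby
require 'json'

# Stand-in for the ttserver contents: keys are "Model/key", values are JSON.
STORE = {
  'Article/1' => '{"id":1,"title":"Test"}',
  'Article/2' => '{"id":2,"title":"Other"}',
  'Comment/1' => '{"id":1,"body":"Hi"}'
}.freeze

# Turn a stored JSON string back into a Hash keyed by field name.
def deserialize(value)
  JSON.parse(value)
end

# Collect every record whose key starts with "ModelName/".
# Including the slash keeps "Article" from also matching "ArticleDraft/...".
def records_for(store, model_name)
  prefix = "#{model_name}/"
  store.select { |key, _| key.start_with?(prefix) }
       .map { |_, value| deserialize(value) }
end
```

The resulting Array of Hashes is the shape #filter_records expects, so everything after this point can be left to AbstractAdapter.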

Update

def update(attributes, collection)                                 # 1
  attributes = attributes_as_fields(attributes)                    # 2
  db do |db|
    collection.each do |resource|                                  # 3
      record = resource.attributes(:field).merge(attributes)       # 4
      save(db, key(resource), record.to_json)                      # 5
    end
  end
end
  1. We take an attributes hash and a DataMapper::Collection. The attributes are in the form of {Property => value}, using the actual property object. A Collection is a set of resources.
  2. We need to convert the keys in the attributes hash from Property objects into :field names. Luckily, AbstractAdapter provides #attributes_as_fields, which does exactly that.
  3. Iterate over every resource in the collection.
  4. Merge the attributes we wish to update over the resource’s existing attributes, giving us the complete record.
  5. Write the whole thing back to the database.

You may also want to take a look at how the InMemoryAdapter in dm-core accomplishes the same task. It extracts the query used to build the collection, and looks for those records in its data store, using #filter_records. It then updates each record in-place. Either way works fine, and the ease of which may depend upon the adapter. In TokyoTyrant, finding the records is harder than retrieving them, so I opted to just re-save the ones I already had in the collection. An SQL adapter is able to update the records without loading them, so using the query is faster. ( “UPDATE {attributes} WHERE {query}” ).

Delete

def delete(collection)
  db do |db|
    collection.each do |resource|
      db.delete(key(resource))
    end
  end
end

At this point, it should all be self-explanatory. Just iterate over every resource in the collection, and delete its key from the db. Yay.

Conclusion

And that’s all there is to it. 3 hours, 2 beers, and ~100 LOC later, and we have a fully-capable adapter that can be used with DataMapper. I was running the specs at every stage, but left them out for brevity. Here’s the final run:

dm-tokyotyrant-adapter/master % rake spec
(in /home/rando/dev/dm-tokyotyrant-adapter)
......................................

Finished in 0.175668 seconds

38 examples, 0 failures

As I said before, the specs aren’t exactly comprehensive, but they will be added to over the next few weeks. For now, they’re good enough that you can be pretty confident your adapter will work for most things.

Thanks for tuning in, leave a comment, or come visit me in #datamapper on freenode if you have any adapter questions.

I spoke at Mountain West!

Confreaks posted my talk. Everyone go make fun of that huge nerd up there!

I pushed some of the changes I talked about to my github branch. This covers the Conditions objects.

Next on my personal roadmap for adapters one-point-oh edition is for Repository to handle turning the responses from adapters into Resource objects, if they aren’t already.

DataMapper Echo Adapter

I just wrote a simple adapter that can be used to investigate the DM Adapter API, and debug your own adapter. It’s really simple to use:

DataMapper.setup(:default, 
                 :adapter => :echo, 
                 :echo => {:adapter => :in_memory})

Set the :echo option to an options hash or connection uri that can initialize the adapter you want to wrap. This will print out the method calls, arguments, and return values to STDOUT.
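If you’re curious how a wrapper like this can work, here is a rough plain-Ruby sketch of the idea. It is not the echo adapter’s actual code; EchoProxy is a made-up class that forwards any call to the wrapped object and prints the method name, arguments, and return value along the way.

```ruby
# Illustrative only: wraps any object and echoes the calls made through it.
class EchoProxy
  def initialize(target, out = $stdout)
    @target = target
    @out = out
  end

  # Forward calls the target understands, printing what went in and came out.
  def method_missing(name, *args, &block)
    return super unless @target.respond_to?(name)

    @out.puts "##{name}"
    args.each { |arg| @out.puts "  arg: #{arg.inspect}" }
    result = @target.public_send(name, *args, &block)
    @out.puts " # => #{result.inspect}"
    result
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end
```

Wrapping an adapter this way means the code under test doesn’t change at all, which is what makes it handy for debugging.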

#read
query: #<DataMapper::Query @repository=:default 
                           @model=Article 
                           @fields=[#<DataMapper::Property @model=Article @name=:id>, 
                                    #<DataMapper::Property @model=Article @name=:title>] 
                           @links=[] @conditions=[] @order=[] @limit=nil @offset=0 
                           @reload=false @unique=false>
 # => [#<Article @id=1 @title="Test" @text=<not loaded>>]

It’s on github.

A Response to “Database Versioning”

I was just going to post a comment in reply to Adam Wiggins’s Database Versioning post, but it ended up being pretty long, so I’ll post a response here instead.

I’m the original author and current maintainer of the migrations plugin for datamapper. I spent a lot of time thinking about AR migrations before I started writing it. I think that DM migrations have solved a few of the problems he has with AR migrations.

The part about screwing up a migration, and having to re-run it sounds more like a tooling problem. When I write a migration, I drop/create the db, and re-run all the migrations to ‘test’ it. (Also, the DM migration specs should help with this.) Yeah, it blows away all your development data, but you should have fixtures or scripts or something to make it easy to recreate.

There are also long-term plans for a plugin in datamapper to inspect the current database schema, examine the definitions in the models, then “infer” the migration that needs to take place. It will be impossible, of course, to guess at what kind of data migration might be needed, but I believe that migrations shouldn’t touch data. Given your fullname => firstname, lastname example, I would add the new columns, then run a rake task to handle the data. After a few days/weeks, when I’m sure that every production server has been upgraded, and that task run, I’ll write a migration to drop the fullname column.

I do agree that having the database schema living in two different places is very non-DRY, but even his suggestion of a schema.yml would duplicate the column definitions that are present in datamapper models.

I’ve used these DM migrations in 2 projects now that have been in production for >6 months, and it fits in very well with my workflow. I tend to break up the migration files by table, so I end up with schema/people.rb, schema/articles.rb, schema/comments.rb, with each of those being a table in the db. Then inside one of the files, I list the migrations in version order: 1, :create_people_table, 2, :add_firstname_lastname, 3, :remove_fullname. This lets me see at a glance what version I’m on for a particular table, and I don’t have to worry about dependencies. If I do need to modify several tables at once, I have a simple rake task that tells me what the maximum version number is, so I can make one after it.
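To make the per-table numbering concrete, here is a minimal sketch in plain Ruby. The `SchemaVersions` module and its methods are hypothetical names invented for this illustration; they are not the dm-migrations API, and the real rake task is not shown in this post. The sketch only models the idea: each schema file lists its migrations by version, and a helper reports the highest version so a cross-table migration can take the next number.

```ruby
# Hypothetical registry that mimics the "version, name" listing described
# above. It is not the dm-migrations API, only an illustration.
module SchemaVersions
  Migration = Struct.new(:table, :version, :name)

  def self.registry
    @registry ||= []
  end

  # Record one migration for a table, e.g. from schema/people.rb.
  def self.migration(table, version, name)
    registry << Migration.new(table, version, name)
  end

  # Highest version recorded for a single table (0 if there are none).
  def self.version_for(table)
    registry.select { |m| m.table == table }.map(&:version).max || 0
  end

  # Highest version across every table; a migration that touches several
  # tables would take the next number after this one.
  def self.max_version
    registry.map(&:version).max || 0
  end
end

# schema/people.rb
SchemaVersions.migration(:people, 1, :create_people_table)
SchemaVersions.migration(:people, 2, :add_firstname_lastname)
SchemaVersions.migration(:people, 3, :remove_fullname)

# schema/articles.rb
SchemaVersions.migration(:articles, 1, :create_articles_table)

SchemaVersions.version_for(:articles) # => 1
SchemaVersions.max_version            # => 3
```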

I think that trying to use SHAs as version numbers would be even more annoying than epoch timestamps as versions. I do like the idea about the model/application requiring a specific version, and refusing to start otherwise. From a DataMapper POV, it would be easy to add a #requires_db_version(5) method to the model. I’m already in the habit of not using my models in migrations, by virtue of never writing data migrations. I usually even write the migrations in raw SQL, since it gives me more control over the table structure when I really care.
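A rough sketch of how a `requires_db_version` check might look, in plain Ruby. Nothing here exists in DataMapper: the `RequiresDbVersion` module, the `VersionMismatch` error, and the `current:` parameter are all assumptions made for the example. The current version is passed in as a callable so the sketch does not pretend to know how the version would be read from the database.

```ruby
# Hypothetical mixin; DataMapper does not ship this method.
module RequiresDbVersion
  class VersionMismatch < StandardError; end

  # Raise unless the schema is at exactly the required version.
  # `current` is any callable returning the version currently applied
  # (an assumption; a real version would query the migration table).
  def requires_db_version(required, current:)
    actual = current.call
    return if actual == required

    raise VersionMismatch,
          "#{name} requires schema version #{required}, but the database is at #{actual}"
  end
end

class Person
  extend RequiresDbVersion
end

# Passes silently when the versions match.
Person.requires_db_version(5, current: -> { 5 })

# Refuses to continue when they do not.
begin
  Person.requires_db_version(5, current: -> { 3 })
rescue RequiresDbVersion::VersionMismatch => e
  puts e.message
end
```

An exact match follows the "specific version" wording above; a looser check such as `actual >= required` would be an equally easy variation.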

So, essentially, DataMapper already provides the solution that Adam outlines in his post: replace schema.yml with DataMapper model definitions, and have the discipline to not write data migrations. Write specs for your migrations, like everything else, and use DM migrations’ sane versioning, rather than AR’s irritating one, and you should be fine. There are definitely improvements to be made to DM migrations, but I feel like I got the underlying design mostly right.


I’m speaking at Boulder Ruby Group in 2 weeks

I’m going to be giving a practice run of the talk I’ll be giving at MountainWest at the Boulder Ruby Group meeting next Wednesday (18th, 7pm). Come see it and give me some constructive criticism.

HOWTO - Get a list of a class’s subclasses

I recently came across a situation where I had an AbstractClass, and I wanted to know all of the classes that had inherited from it. There were lots of implementations on the web, but they weren’t exactly what I wanted, or they used ObjectSpace to get ALL the classes and then checked whether the interesting one was in each class’s ancestors.

I only needed it one-level deep, but it would be fairly easy to extend it for more.

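require 'set'

class ParentClass
  # Direct subclasses recorded so far, in definition order.
  def self.subclasses
    @subclasses ||= Set.new
  end

  # Ruby calls this hook whenever a new subclass is defined.
  def self.inherited(subclass)
    super
    subclasses << subclass
  end
end

class ChildA < ParentClass; end
class ChildB < ParentClass; end

ParentClass.subclasses
# => #<Set: {ChildA, ChildB}>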
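For more than one level, one possible extension is sketched below. It relies on the fact that the `inherited` hook and the class-level instance variable are per class, so every class in the tree keeps its own set of direct subclasses and a recursive walk collects the rest. The names `TrackedBase`, `direct_subclasses`, and `descendants` are made up for this example.

```ruby
require 'set'

class TrackedBase
  # Each class in the hierarchy gets its own set, because @direct_subclasses
  # is an instance variable on the class object itself.
  def self.direct_subclasses
    @direct_subclasses ||= Set.new
  end

  # Called on the immediate parent whenever a subclass is defined.
  def self.inherited(subclass)
    super
    direct_subclasses << subclass
  end

  # Walk the tree depth-first: each child, followed by its own descendants.
  def self.descendants
    direct_subclasses.flat_map { |klass| [klass, *klass.descendants] }
  end
end

class BranchA < TrackedBase; end
class BranchB < TrackedBase; end
class Leaf < BranchA; end

TrackedBase.descendants
# => [BranchA, Leaf, BranchB]
```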

I’m Speaking at MountainWest!

I’m going to be giving a talk at Mountain West Ruby Conf!

For those of you too lazy to scroll down and find the details of my talk, I’ll repeat them here:

Some might think of DataMapper as a better, faster competitor to ActiveRecord. However, they would be missing one of its greatest strengths. At its core, DataMapper provides a uniform interface on top of ANY persistence layer. All that’s needed is a simple adapter class that can translate the native persistence into a simple 4-method API for DataMapper to consume. This talk will cover that API, and some best practices for implementing an adapter. We will explore the YAML Adapter, which I will be writing for the purposes of this talk.

Wish me luck!

Ruby Dir.glob bug

To further elaborate on Yehuda’s twit:

[~/tmp][rando@apollo]
 % mkdir first first/second
[~/tmp][rando@apollo]
 % touch first/second/test.txt
[~/tmp][rando@apollo]
 % chmod -x first
[~/tmp][rando@apollo]
 % ls first/second/*.txt
ls: cannot access first/second/*.txt: Permission denied
[~/tmp][rando@apollo]
 % irb
irb(main):001:0> Dir.glob('first/second/*.txt')
=> []

If you try to glob something in a directory that has an ancestor missing the eXecute permission, Ruby doesn’t give any indication of an error; it just returns an empty array.

It took Yehuda and me about 30 minutes to track down why a Merb app wasn’t loading bundled gems under Passenger. Apache was running as nobody, and the parent dir of the app was missing the global execute permission.
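Since the glob itself stays silent, one way to catch this is to check the directories leading up to the pattern before trusting an empty result. The helper below is a sketch, and `first_untraversable_dir` is a name invented for it. It walks from the filesystem root down to the directory part of the pattern and returns the first directory the current user cannot traverse. Note that it will find nothing when run as root, since root passes the permission check.

```ruby
require 'pathname'

# Returns the first directory on the way to `pattern` that exists but lacks
# the execute (traverse) permission for the current user, or nil if every
# directory that can be inspected is traversable.
def first_untraversable_dir(pattern)
  dirs = []
  # The directory part of the pattern, then each of its parents up to the root.
  Pathname.new(pattern).expand_path.dirname.ascend { |dir| dirs << dir }

  # Check from the root downwards so the outermost blocker is reported.
  dirs.reverse.find { |dir| dir.directory? && !dir.executable? }
end

pattern = 'first/second/*.txt'
blocker = first_untraversable_dir(pattern)
if Dir.glob(pattern).empty? && blocker
  warn "glob returned nothing, and #{blocker} is not traversable"
end
```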


Wildfires in Boulder

Here’s the view out my kitchen window, most mornings:

Here’s what it looks like this fine evening:

From fire

Click the album link for more. From my porch, I can see parts of the mountainside flashing from the lights of the fire trucks. They’ve evacuated 11,000 homes, but the wind is blowing the fire the other direction. There have been clouds of smoke all afternoon, but once the sun set, I was able to see the flames.

More details about the fires here: http://www.dailycamera.com/news/2009/jan/07/i-70-closed-over-vail-pass-avalanche-control/