I’ve never read Amazon’s Dynamo paper. I’ve also never had the opportunity to work with Cassandra or SimpleDB, but when Amazon announced DynamoDB I thought it was time to take a little bit of time to learn what it was just in case it was super-useful. I thought I’d share a few of my findings.
Disclaimer: I’m completely new to this style of NoSQL system and may well in fact be misusing it in places. Feel free to give me some free education if I’m doing something horrendous below.
What is DynamoDB?
DynamoDB is a NoSQL database hosted by Amazon and intended to give Amazon the burden of scaling your data until it goes to 11 (or 11,000,000). I’ve seen a number of posts describing how DynamoDB works, but not really much about what it is.
DynamoDB is (mostly) an enhanced key-value store with a few features that bring it beyond a simple KVS. Those features include:
- You store typed attributes rather than just raw string data, so you can have numbers, strings, sets of numbers, and sets of strings.
- You can provide a “range key” which essentially gives you a single indexed field upon which you can perform queries. In the use cases I’ve come up with, this is likely to be a timestamp or some other “ordering” key.
- You can perform atomic increment/decrement and set add/remove operations on rows in your DynamoDB tables.
- Each table has a defined read and write throughput, so you can literally just tell Amazon how much scale you need and it takes care of the rest behind the scenes.
One thing that surprised me (perhaps I was being dense) is that if you create a table that has both a hash and a range key, you can have multiple values with the same hash key. In fact, you can only query values that share a hash key, so this is the intended use case.
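To make the hash/range relationship concrete, here is a toy in-memory model in plain Ruby. It is only a sketch of the behavior described above, not the DynamoDB API: the `ToyTable` class and its `put` and `query` methods are made up for illustration.

```ruby
# Toy in-memory model of a hash + range key table (NOT the DynamoDB API).
class ToyTable
  def initialize
    # hash key => { range key => attributes }
    @partitions = Hash.new { |h, k| h[k] = {} }
  end

  # An item is identified by the (hash key, range key) pair, so many items
  # can share a hash key as long as their range keys differ.
  def put(hash_key, range_key, attributes)
    @partitions[hash_key][range_key] = attributes
  end

  # A query always names exactly one hash key; the range condition only
  # narrows the results within that key. Results are sorted by range key.
  def query(hash_key, range_greater_than: nil)
    return [] unless @partitions.key?(hash_key)
    items = @partitions[hash_key].sort_by { |range_key, _| range_key }
    if range_greater_than
      items = items.select { |range_key, _| range_key > range_greater_than }
    end
    items.map { |range_key, attrs| attrs.merge(range_key: range_key) }
  end
end

table = ToyTable.new
table.put("alice", 100, text: "first")
table.put("alice", 200, text: "second")
table.put("bob",   150, text: "other timeline")

# Only items under the "alice" hash key are considered.
recent = table.query("alice", range_greater_than: 100)
```

In this model there is simply no way to ask for "everything with a range key above 100" across all hash keys, which mirrors the restriction that surprised me.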
Getting Started (with Ruby)
Unfortunately, Ruby is not one of the listed example languages in the DynamoDB documentation. Fortunately, Amazon does support DynamoDB through its aws-sdk gem and it’s relatively straightforward to use.
First, you’ll need to sign up for DynamoDB through the AWS console. Luckily there is a free usage tier that gives you 100MB, 10 reads/second and 5 writes/second.
Once you’ve done that, you’ll need to fetch your AWS Security Credentials so that you can connect to DynamoDB from Ruby. Got ’em? Good.
I set out to build a very basic Twitter-like system as a proof-of-concept of DynamoDB. For that, I wanted to have two tables: tweets and users. I managed my schema like this:
require "aws"

# Credentials come from the environment
AWS.config(
  access_key_id: ENV["AWS_KEY"],
  secret_access_key: ENV["AWS_SECRET"]
)

DB = AWS::DynamoDB.new
TABLES = {}

# Table name => key schema
{
  "tweets" => {
    hash_key: {timeline_id: :string},
    range_key: {created_at: :number}
  },
  "users" => {
    hash_key: {id: :string}
  }
}.each_pair do |table_name, schema|
  begin
    # Raises ResourceNotFoundException if the table doesn't exist yet
    TABLES[table_name] = DB.tables[table_name].load_schema
  rescue AWS::DynamoDB::Errors::ResourceNotFoundException
    # 10 reads/second and 5 writes/second
    table = DB.tables.create(table_name, 10, 5, schema)
    print "Creating table #{table_name}..."
    sleep 1 while table.status == :creating
    print "done!\n"
    TABLES[table_name] = table.load_schema
  end
end
This bit of code contains the schema information for each table as a hash (you can specify a hash key and, optionally, a range key when creating a table). It then checks to see if each table exists and creates it if not (in this example using 10 reads/second and 5 writes/second). Creating tables in DynamoDB is a non-trivial operation and may take as long as a minute (probably a good deal more with a heavy throughput). Once the tables exist, it loads the schema for each one (required for later operations) and stores the resulting object reference in a TABLES constant.
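Since table creation can take a while, the bare `sleep 1 while table.status == :creating` loop could wait forever if something goes wrong. Here is one way to put a ceiling on it. The `wait_until_created` helper is my own invention, not part of the aws-sdk gem, and it only assumes the table object responds to `status` the way it does in the code above. A `FakeTable` struct stands in for a real table so the sketch runs without AWS credentials.

```ruby
# Hypothetical helper (not part of aws-sdk): poll until the table leaves
# the :creating state, or raise once `timeout` seconds have passed.
def wait_until_created(table, timeout: 300, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while table.status == :creating
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "table still creating after #{timeout} seconds"
    end
    sleep interval
  end
  table.status
end

# Stand-in for a real table object: walks through the given statuses and
# then keeps reporting the last one.
FakeTable = Struct.new(:statuses) do
  def status
    statuses.length > 1 ? statuses.shift : statuses.first
  end
end

fake = FakeTable.new([:creating, :creating, :active])
final_status = wait_until_created(fake, timeout: 5, interval: 0.01)
```

With a real table you would pass the object returned by `DB.tables.create` in place of `fake`. The five-minute default is a guess on my part, not a documented limit.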
Next I needed to learn how to actually manipulate data in the tables, so I created some barebones models to accomplish my needs. You can see the full models file in this Gist but here are some of the highlights:
# Create two users, keyed by id
TABLES["users"].items.create(id: "username")
TABLES["users"].items.create(id: "username2")

# Dump a hash of attributes for all users
TABLES["users"].items.each{|i| puts i.attributes.to_h }

# Fetch a specific user
user1 = TABLES["users"].items.at("username")
user2 = TABLES["users"].items.at("username2")

# Follow another user ("username" follows "username2")
user1.attributes.add(following: ["username2"])
user2.attributes.add(followers: ["username"])

# Post a tweet as "username2": write one copy into each follower's timeline.
# The followers attribute is a set of id strings.
now = Time.now
user2.attributes["followers"].each do |follower|
  TABLES["tweets"].items.create(
    timeline_id: follower,
    created_at: now.to_i,
    text: "This is the tweet text."
  )
end

# Retrieve 24 hours of tweets for a user's timeline
TABLES["tweets"].items.query(
  hash_value: "username",
  range_greater_than: Time.now.to_i - 24 * 60 * 60
)
Hopefully reading the code above gives you some idea of the simple operations for creating records, performing an atomic operation, and querying by hash key and range.
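The tweet-posting pattern above copies each tweet into every follower's timeline, so reading a timeline is a single query against one hash key. Here is the same idea as a plain Ruby sketch with in-memory hashes standing in for the two tables. The `post_tweet` and `timeline_since` helpers and the extra "username3" follower are made up for illustration, and none of this talks to DynamoDB.

```ruby
require "set"

# In-memory stand-ins for the two tables (NOT DynamoDB calls).
followers = Hash.new { |h, k| h[k] = Set.new } # user id => set of follower ids
timelines = Hash.new { |h, k| h[k] = [] }      # timeline_id => list of tweets

# "username" and a hypothetical "username3" both follow "username2".
followers["username2"].add("username")
followers["username2"].add("username3")

# Posting writes one copy of the tweet per follower, keyed by the
# follower's id (the hash key) and the creation time (the range key).
def post_tweet(author, text, now, followers, timelines)
  followers[author].each do |follower|
    timelines[follower] << { created_at: now.to_i, author: author, text: text }
  end
end

# Reading only ever touches the items stored under one timeline_id.
def timeline_since(user, since, timelines)
  timelines.fetch(user, [])
           .select { |tweet| tweet[:created_at] > since }
           .sort_by { |tweet| tweet[:created_at] }
end

now = Time.now
post_tweet("username2", "An older tweet.", now - 2 * 24 * 60 * 60, followers, timelines)
post_tweet("username2", "This is the tweet text.", now, followers, timelines)

# Last 24 hours of the "username" timeline
last_day = timeline_since("username", now.to_i - 24 * 60 * 60, timelines)
```

The trade-off is that writes grow with the number of followers, which is something to keep in mind when choosing the write throughput for the tweets table.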
What Next?
DynamoDB is a bit of a puzzle to me. It seems to me that it would mostly be useful for applications that have already pushed the limits of more flexible data solutions like MongoDB (or even SQL) and need intense levels of data throughput with reliable redundancy. I don’t think you would likely start your application out architected to use DynamoDB, but at least now I’ve explored it enough to add it to my toolbelt if I come across a situation where its unique blend of features makes sense. Are you using DynamoDB or looking at it for a project? If so, I’d be curious to know your use case.