AWS – Create a DynamoDB table with a global secondary index


Amazon DynamoDB is a managed key-value store and one of the first NoSQL databases to be made available in the cloud. NoSQL databases are an alternative to the traditional relational databases (RDBMSs) that the industry has used for a wide variety of purposes for decades. DynamoDB does not offer many of the standard features of an RDBMS, such as table joins and full ACID semantics. ACID stands for Atomicity, Consistency, Isolation, and Durability. Consistency is one of the key trade-offs normally made with a NoSQL database, since these systems are designed to run distributed across many commodity machines. That said, DynamoDB can perform strongly consistent reads, and it now supports transactions that span multiple operations, so the lines are becoming somewhat blurred.
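The following minimal Python sketch makes both consistency features concrete. It only builds request parameters for the low-level boto3 DynamoDB client and does not call AWS. The table name `Users` and the helper function names are hypothetical. Setting `ConsistentRead=True` on `get_item` requests a strongly consistent read. `transact_write_items` applies a group of writes atomically: either all of them take effect or none do.

```python
from typing import Any, Dict

# Hypothetical table name; substitute the name of your own table.
TABLE_NAME = "Users"


def consistent_get_params(user_id: str) -> Dict[str, Any]:
    """Parameters for the low-level client's get_item() with a strongly consistent read.

    Reads are eventually consistent by default. ConsistentRead=True asks for the
    most recently committed value (GSIs support only eventually consistent reads).
    """
    return {
        "TableName": TABLE_NAME,
        "Key": {"UserID": {"S": user_id}},
        "ConsistentRead": True,
    }


def create_users_atomically_params(*user_ids: str) -> Dict[str, Any]:
    """Parameters for transact_write_items(): every Put succeeds, or none is applied."""
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": TABLE_NAME,
                    "Item": {"UserID": {"S": user_id}},
                    # Cancels the whole transaction if any of these users already exists.
                    "ConditionExpression": "attribute_not_exists(UserID)",
                }
            }
            for user_id in user_ids
        ]
    }
```

With boto3 installed and credentials configured, you would unpack these into the client. For example: `boto3.client("dynamodb").transact_write_items(**create_users_atomically_params("u-1", "u-2"))`.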

There are still big differences in how you have to approach DynamoDB. It is a key-value store, and you do not use SQL to create, read, update, or delete records. You cannot write ad hoc queries across multiple tables; in fact, tables are completely independent of one another, and each has its own configuration. If you find yourself creating multiple tables for a single service, you may not be following design best practices. Each DynamoDB table should have a single, simple job to perform, which aligns well with microservice design patterns and the overall DevOps philosophy.

How to do it…

In this recipe, you will create a DynamoDB table to hold user account records, and you will add a global secondary index (GSI) to allow fast lookups by email address. A GSI gives you an additional key by which to query the data in your table. Because GSIs can be added after the table has been created, they also allow some design flexibility later on:

  1. Log in to your account and go to the DynamoDB dashboard. Click Create table.
  2. Give the table a unique name and then enter UserID as the Partition key. A primary key in DynamoDB is either a partition key by itself or the combination of a partition key and a sort key. In this case, we will leave the sort key blank:

Create DynamoDB table
  3. Uncheck Use default settings.
  4. For Read-write capacity mode, choose On-demand. You pay per request, so costs stay low while the table is idle, and capacity scales rapidly to meet increased demand. This should be your default choice for any unpredictable workload that lacks a relatively steady access pattern.
  5. Leave the other settings at their defaults and click Create.
  6. It can take a few minutes for your table to become available. Once it does, go to the Items tab and click Create item.
  7. Note that, unlike a relational database, the item has no structure beyond the UserID. Enter a unique value for UserID:

Create item
  8. Click the + and add a few more attributes, such as EmailAddress and LastName. With DynamoDB, each item can have its own set of attribute names, which allows for some very flexible design patterns. Save the item.
  9. Create a new item and, this time, give it different attributes, such as Username and FullName.
  10. In the item summary, note that every attribute name you have used appears as a column. Don't let this fool you: each item only knows about the attributes that were assigned to it. This is not a relational database schema. One of the benefits of DynamoDB is the ability to define new attributes at runtime, without running Data Definition Language (DDL) commands as you would with an RDBMS:

Item summary
  11. Click on the Indexes tab, then click Create index.
  12. Enter EmailAddress as the Partition key and click Create index. Leaving Projected attributes at the default of All means that all attributes will be available when looking up items by the index key:

Creating an index
  13. Wait a few minutes for the index to be created. The status changes to Active when the index is ready.
  14. Go back to the Items tab and refresh the page so that the new index is recognized.
  15. Change Scan to Query and select your new index. Enter an email address that matches one of your earlier entries and click Start search:

Querying an index
  16. Note that the result shows only the attributes that were actually set on that item. Because you are querying the index, the response comes back in milliseconds, even if the table held millions or billions of items.
  17. Take some time to explore the other tabs and see the many configuration options available to you.
  18. Be sure to delete the table when you are finished with this recipe, to prevent future charges.
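If you prefer to script this recipe instead of clicking through the console, each console action corresponds to a call on the low-level boto3 DynamoDB client. The Python sketch below only builds the request parameters, so it runs without AWS credentials. The table name `Users` and the index name `EmailAddress-index` are hypothetical placeholders, and every attribute is assumed to be a string (`"S"`).

```python
from typing import Any, Dict, List

TABLE_NAME = "Users"                # hypothetical; use the name you chose in the console
INDEX_NAME = "EmailAddress-index"   # hypothetical index name


def create_table_params() -> Dict[str, Any]:
    """Create table: partition key UserID (string), no sort key, on-demand capacity."""
    return {
        "TableName": TABLE_NAME,
        "AttributeDefinitions": [{"AttributeName": "UserID", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "UserID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    }


def put_item_params(attributes: Dict[str, str]) -> Dict[str, Any]:
    """Create item: each item carries only the attributes you give it."""
    if "UserID" not in attributes:
        raise ValueError("every item needs the UserID partition key")
    return {
        "TableName": TABLE_NAME,
        "Item": {name: {"S": value} for name, value in attributes.items()},
    }


def add_email_index_params() -> Dict[str, Any]:
    """Create index: a GSI keyed on EmailAddress that projects all attributes."""
    return {
        "TableName": TABLE_NAME,
        # Only attributes used in a key schema need to be declared.
        "AttributeDefinitions": [{"AttributeName": "EmailAddress", "AttributeType": "S"}],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": INDEX_NAME,
                    "KeySchema": [{"AttributeName": "EmailAddress", "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


def query_by_email_params(email: str) -> Dict[str, Any]:
    """Query (not Scan) the index by email address."""
    return {
        "TableName": TABLE_NAME,
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "EmailAddress = :email",
        "ExpressionAttributeValues": {":email": {"S": email}},
    }


def delete_table_params() -> Dict[str, Any]:
    """Delete the table when finished, to stop charges."""
    return {"TableName": TABLE_NAME}


# Two items with different attribute sets, as created in the console steps.
SAMPLE_ITEMS: List[Dict[str, Any]] = [
    put_item_params({"UserID": "u-1", "EmailAddress": "ana@example.com", "LastName": "Silva"}),
    put_item_params({"UserID": "u-2", "Username": "bo", "FullName": "Bo Chen"}),
]
```

With boto3 installed and credentials configured, you would pass each dictionary to the matching client method. Start with `client.create_table(**create_table_params())`, then call `client.get_waiter("table_exists").wait(TableName=TABLE_NAME)` before writing items, and use `put_item`, `update_table`, `query`, and `delete_table` in the same way. As in the console, the new index only becomes usable once its `IndexStatus` in the `describe_table` response reads `ACTIVE`.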

How it works…

The partition key of a DynamoDB table determines how the data is spread across partitions, which are slices of up to 10 GB of data located across a large number of physical machines. It's best to use a UUID (GUID) or another value with similarly high cardinality for the partition key, to prevent too much data from being concentrated on one partition. A combination of partition key and sort key can be used when you want items grouped into buckets under each partition key and sorted within those buckets. This is useful in many scenarios, but the right design is very application-specific.
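The effect of key cardinality can be illustrated with a toy model. This Python sketch hashes each partition key onto a fixed number of partitions. MD5 modulo the partition count is a stand-in assumption, since DynamoDB's internal hash function and partition layout are not exposed. A separate helper shows how a composite key groups items under each partition key in sort-key order.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_for(key: str, partitions: int) -> int:
    """Assign a partition key to one of `partitions` buckets.

    MD5 is only a stand-in here: DynamoDB's real hash function is internal.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def distribution(keys: Iterable[str], partitions: int) -> Counter:
    """Count how many items would land on each partition."""
    return Counter(partition_for(key, partitions) for key in keys)


def sorted_buckets(items: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (partition_key, sort_key) pairs by partition key, ordered by sort key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for partition_key, sort_key in items:
        grouped[partition_key].append(sort_key)
    return {pk: sorted(sks) for pk, sks in grouped.items()}
```

For example, `distribution((f"user-{i}" for i in range(10_000)), 8)` spreads the items roughly evenly across all eight partitions. By contrast, `distribution(["US"] * 10_000, 8)` puts all 10,000 items on a single partition, which is the hot-partition problem that a high-cardinality key avoids.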

Once a table is created, items can be retrieved by key very quickly. If you want to search by any other attribute, however, lookups are very slow because you are forced to scan the entire table. For a table with millions or billions of items, this is obviously not practical. That's where GSIs come in. A GSI functions much like a separate table that holds the same data (or the subset of attributes you project into it), indexed by a different key. The benefit is that you don't have to manage that secondary table yourself: DynamoDB keeps the index up to date as the table changes, in an eventually consistent manner.
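The following toy in-memory model in Python illustrates the work DynamoDB does on your behalf. It is not how DynamoDB is implemented internally. The model is a table keyed by primary key plus one secondary index that is updated on every write. Looking up by a non-key attribute without the index means examining every item, while the index goes straight to the matching keys. One simplification to note: the toy updates the index synchronously, whereas a real GSI is updated asynchronously and is eventually consistent.

```python
from typing import Any, Dict, List, Set, Tuple

Item = Dict[str, Any]


class ToyTable:
    """A table plus one secondary index that is maintained automatically on writes."""

    def __init__(self, key: str, index_key: str) -> None:
        self.key = key                      # primary (partition) key attribute
        self.index_key = index_key          # attribute the secondary index is keyed on
        self.items: Dict[str, Item] = {}
        self.index: Dict[Any, Set[str]] = {}  # index value -> primary keys

    def _unindex(self, item: Item) -> None:
        value = item.get(self.index_key)
        if value is not None:
            keys = self.index[value]
            keys.discard(item[self.key])
            if not keys:
                del self.index[value]

    def put(self, item: Item) -> None:
        old = self.items.get(item[self.key])
        if old is not None:
            self._unindex(old)  # a put replaces the whole item, so drop its old index entry
        self.items[item[self.key]] = item
        value = item.get(self.index_key)
        if value is not None:   # items without the index attribute are not indexed
            self.index.setdefault(value, set()).add(item[self.key])

    def delete(self, key_value: str) -> None:
        old = self.items.pop(key_value, None)
        if old is not None:
            self._unindex(old)

    def scan_for(self, value: Any) -> Tuple[List[Item], int]:
        """Without an index: examine every item. Returns (matches, items_examined)."""
        matches = [i for i in self.items.values() if i.get(self.index_key) == value]
        return matches, len(self.items)

    def query_index(self, value: Any) -> List[Item]:
        """With the index: jump straight to the matching primary keys."""
        return [self.items[k] for k in self.index.get(value, set())]
```

In this model, a scan always examines every item in the table, while an index lookup touches only the items that match. That difference is why a GSI query stays fast as the table grows.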

While primary keys on the base table must be unique, index keys do not have to be, so a query that specifies a GSI key can return multiple items. Items that lack the index key attribute are not written to the GSI at all and take up no space in it. This makes GSIs efficient for tables whose items have different sets of attributes.
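Because GSI keys need not be unique, a Query on the email index can match many items. DynamoDB returns at most 1 MB of data per Query call and includes a `LastEvaluatedKey` in the response when more results remain, so callers loop until it is absent. In this Python sketch, the query function is passed in as a parameter. That is an assumption made so the sketch runs without AWS; with boto3 you would pass `client.query`. boto3's own `client.get_paginator("query")` performs the same loop for you.

```python
from typing import Any, Callable, Dict, List

QueryFn = Callable[..., Dict[str, Any]]


def query_all(query_fn: QueryFn, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run a Query repeatedly, following LastEvaluatedKey, and collect every item."""
    items: List[Dict[str, Any]] = []
    request = dict(params)  # copy so the caller's dict is not modified
    while True:
        page = query_fn(**request)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # absent on the final page
            return items
        request["ExclusiveStartKey"] = last_key  # resume where the last page stopped
```

For example, `query_all(client.query, {"TableName": "Users", "IndexName": "EmailAddress-index", "KeyConditionExpression": "EmailAddress = :e", "ExpressionAttributeValues": {":e": {"S": "ana@example.com"}}})` would return every item sharing that address. The table and index names are hypothetical. Items without an `EmailAddress` attribute never appear in the results, since they are not in the index.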

Another important concept is the difference between auto-scaling and on-demand capacity for a table. You can also set a table's read and write capacity manually, but since we always want to look for ways to automate, that option should not be your first choice. Auto-scaling, which adjusts that provisioned capacity for you, can cost less if you choose its settings carefully and your application's usage changes slowly and smoothly, without abrupt spikes or long periods of no activity at all. On-demand is a relatively new option that is much better suited to bursty workloads and handles rapid changes better than auto-scaling. In some scenarios, however, it can end up being more expensive.
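Auto-scaling applies to tables in provisioned mode and is configured through the separate Application Auto Scaling service. You register the table's capacity as a scalable target with minimum and maximum bounds, then attach a target-tracking policy that keeps utilization near a chosen percentage. This Python sketch builds the parameters for those two calls. It also builds the much simpler `update_table` request that switches a table to on-demand. The 70% target and the policy name format are illustrative assumptions, not recommendations.

```python
from typing import Any, Dict, Tuple


def autoscaling_read_params(
    table_name: str, min_rcu: int, max_rcu: int, target_utilization: float = 70.0
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Parameters for register_scalable_target and put_scaling_policy (read capacity)."""
    if not 0 < min_rcu <= max_rcu:
        raise ValueError("need 0 < min_rcu <= max_rcu")
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
    }
    register = {**common, "MinCapacity": min_rcu, "MaxCapacity": max_rcu}
    policy = {
        **common,
        "PolicyName": f"{table_name}-read-scaling",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Keep consumed/provisioned read capacity near this percentage.
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }
    return register, policy


def on_demand_params(table_name: str) -> Dict[str, Any]:
    """Parameters for the DynamoDB client's update_table() to switch to on-demand."""
    return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}
```

With boto3, you would pass these to `boto3.client("application-autoscaling").register_scalable_target(**register)` and then `put_scaling_policy(**policy)`. Write capacity is handled the same way, using `dynamodb:table:WriteCapacityUnits` and `DynamoDBWriteCapacityUtilization`. On-demand needs no scaling configuration at all, which is part of its appeal for unpredictable workloads.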
