AWS – Calculating Amazon DynamoDB capacity

Amazon DynamoDB (DDB) is the managed NoSQL database service from AWS, described in the previous recipe.

As DDB pricing is based on the number of read and write capacity units provisioned, it is important to be able to calculate the requirements for your use case. This recipe uses a set of simple formulas to estimate the Read Capacity Units (RCUs) and Write Capacity Units (WCUs) that should be allocated to your DDB table.

It is also crucial to remember that, while new partitions are added to a DDB table automatically, they are never removed automatically. This means that excessive partitioning can have a long-term impact on your performance, so you should keep an eye on how many partitions your table needs.

Getting ready

All of these calculations assume that you have chosen a good partition key for your data. A good partition key ensures the following:

  • Data is evenly spread across all the available partitions.
  • Read and write activity is spread evenly over time.

Unfortunately, choosing a good partition key is very data-specific, and beyond the scope of this recipe.

All reads are assumed to be strongly consistent.

How to do it…

Follow these steps to calculate your DynamoDB capacity:

  1. Start with the size of the items, in kilobytes (KB):

ItemSize = Size of the items (rows) in KB

  2. Work out the number of RCUs required per item by dividing the item size by 4 and rounding up:

RCU Per Item = ItemSize / 4 (rounded up)

  3. Define the expected number of read operations per second. The result is one of the numbers you will use to provision your table:

Required RCU = Expected Number of Reads * RCU Per Item

  4. Divide the required RCU by 3,000 to calculate the number of DDB partitions needed to provide that read capacity:

Read Partitions = Required RCU / 3,000

  5. Next, work out the number of WCUs required per item by dividing the item size by 1 and rounding up:

WCU Per Item = ItemSize / 1 (rounded up)

  6. Define the expected number of write operations per second. This is the other number you will use to provision your table:

Required WCU = Expected Number of Writes * WCU Per Item

  7. Divide the required WCU by 1,000 to calculate the number of DDB partitions needed to provide that write capacity:

Write Partitions = Required WCU / 1,000

  8. Add these two values to get the number of partitions required for capacity (rounding up to a whole number):

Capacity Partitions = Read Partitions + Write Partitions (rounded up)

  9. Work out the minimum number of partitions required by the amount of data you plan to store:

Size Partitions = Total Size in GB / 10 (rounded up)

  10. Once you have the partition requirements for your use case, take the maximum of the two previous calculations:

Required Partitions = Maximum value between Capacity Partitions and Size Partitions

  11. Since your allocated capacity is spread evenly across partitions, divide the RCU and WCU values by the number of required partitions to get the per-partition performance of your table:

Partition Read Throughput = Required RCU / Required Partitions

Partition Write Throughput = Required WCU / Required Partitions

For example, let’s say you have an item size of 3 KB. The RCU per item is 1 (3 / 4, rounded up), so at 100 expected reads per second the required RCU is 100. The WCU per item is 3, so at 20 expected writes per second the required WCU is 60. That works out to 100 / 3,000 read partitions plus 60 / 1,000 write partitions, which rounds up to a single capacity partition. If we plan to store 1 billion items, the total size is roughly 2,861 GB (1 billion * 3 KB / 1024 / 1024), which requires 287 size partitions (2,861 / 10, rounded up). Taking the maximum of the capacity and size values gives 287 required partitions.
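
If you would rather not do this arithmetic by hand, the following Python sketch reproduces the steps above. The function name and the per-partition constants are ours, not part of any AWS SDK, and the constants (3,000 RCU, 1,000 WCU, and 10 GB per partition) should be checked against the current DynamoDB documentation:

import math

# Per-partition limits as quoted in this recipe; verify against the
# current DynamoDB documentation before relying on them.
RCU_PER_PARTITION = 3000
WCU_PER_PARTITION = 1000
GB_PER_PARTITION = 10

def required_partitions(item_size_kb, reads_per_sec, writes_per_sec, total_size_gb):
    # Steps 1-3: read capacity (4 KB blocks, strongly consistent reads)
    rcu_per_item = math.ceil(item_size_kb / 4)
    required_rcu = reads_per_sec * rcu_per_item
    # Steps 5-6: write capacity (1 KB blocks)
    wcu_per_item = math.ceil(item_size_kb / 1)
    required_wcu = writes_per_sec * wcu_per_item
    # Steps 4, 7, and 8: partitions needed to deliver the throughput
    capacity_partitions = math.ceil(
        required_rcu / RCU_PER_PARTITION + required_wcu / WCU_PER_PARTITION)
    # Step 9: partitions needed to hold the data
    size_partitions = math.ceil(total_size_gb / GB_PER_PARTITION)
    # Steps 10 and 11: take the maximum, then spread the capacity across it
    partitions = max(capacity_partitions, size_partitions)
    return {
        "required_rcu": required_rcu,
        "required_wcu": required_wcu,
        "required_partitions": partitions,
        "partition_read_throughput": required_rcu / partitions,
        "partition_write_throughput": required_wcu / partitions,
    }

# The worked example: 3 KB items, 100 reads/s, 20 writes/s, 1 billion items
print(required_partitions(3, 100, 20, 1_000_000_000 * 3 / 1024 / 1024))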

How it works…

Behind the scenes, DDB throughput is controlled by the number of partitions that are allocated to your table. It is important to consider how your data will be spread across these partitions to ensure you get the performance you expect and have paid for.

We start this recipe by calculating the size of the items in your database, for throughput purposes. DDB has a minimum size it will consider, and even if an operation uses less than this size, it is rounded up in terms of allocated throughput used. The minimum size depends on the type of operation:

  • Read operations are calculated in 4 KB blocks.
  • Write operations are calculated in 1 KB blocks.

We then work out what the required RCU and WCU are, based on the expected number of operations. These values can then be used to provision the DDB table, as they represent the minimum required throughput (in optimal conditions).

Once you have these values, you can use them to provision your table.
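
As an illustration, the values could be applied when creating a table with boto3. The table name, key schema, and region below are placeholders; the ReadCapacityUnits and WriteCapacityUnits come from the worked example above:

import boto3

# Hypothetical table definition; substitute your own table name, key
# schema, and region. The throughput values are the Required RCU and
# Required WCU calculated in this recipe.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="my-table",
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,   # Required RCU
        "WriteCapacityUnits": 60,   # Required WCU
    },
)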

Next, we calculate the number of partitions needed to deliver that throughput. These calculations rely on knowing the expected performance of a single partition: the numbers 3,000 (for RCUs) and 1,000 (for WCUs) represent the capacity of one DDB partition. By expressing the required capacity in terms of partition performance (reads and writes) and adding the two results together, we get the minimum number of partitions required from a capacity point of view.

We then do the same calculation for the total data size. Each DDB partition can handle up to 10 GB of data. Any more than that will need to be split between multiple partitions.

The specific values for partition capacity (for reads, writes, and size) have been stable for a while but may change in the future. Double-check that the current values are the same as used here for complete accuracy.

Once we have the minimum partitions for both capacity and size, we take the highest value and work with that. This ensures we meet both the capacity and size requirements.

Finally, we take the provisioned capacity and divide it by the number of partitions. This gives us the throughput performance of each partition, which we can then check against our use case.

There’s more…

There are many nuances to using DDB efficiently and effectively. Here are some of the more important/impactful things to note:

  • Burst capacity
  • Metrics
  • Eventually consistent reads

Burst capacity

There is a burst capacity available to tables that go over their allocated capacity. Unused read and write capacity can be retained for up to five minutes (that is, 300 seconds, for calculation purposes). Relying on this capacity is not good practice, and it will undoubtedly cause issues at some stage in the future.
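
As a rough illustration of how large this pool can be, assuming the 300-second retention described above:

# Rough illustration only: a table provisioned with 100 RCU that consumes
# an average of 40 RCU accrues the unused 60 RCU for up to 300 seconds,
# giving a burst pool of around 18,000 read units.
provisioned_rcu = 100
average_consumed_rcu = 40
burst_pool = (provisioned_rcu - average_consumed_rcu) * 300
print(burst_pool)  # 18000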

Metrics

DDB tables automatically send data to CloudWatch metrics. This is the quickest and easiest way to confirm that your calculations and provisioned capacity are meeting your needs. It also helps you to keep an eye on your usage to track your throughput needs over time. All metrics appear in the AWS/DynamoDB namespace. Some of the most interesting metrics for throughput calculations are as follows:

  • ConsumedReadCapacityUnits
  • ConsumedWriteCapacityUnits
  • ReadThrottleEvents
  • WriteThrottleEvents

There are other metrics available; refer to the Amazon DynamoDB Metrics and Dimensions documentation at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html for more details.
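
For example, the consumed read capacity could be retrieved with boto3 and converted into an average consumed RCU per second by dividing each Sum datapoint by the period. The table name and region below are placeholders:

from datetime import datetime, timedelta, timezone

import boto3

# Illustrative sketch: pull the last hour of consumed read capacity for a
# hypothetical table and convert each per-period Sum into an average
# consumed RCU per second.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
period = 300  # seconds

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedReadCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "my-table"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=period,
    Statistics=["Sum"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"] / period, "average consumed RCU/s")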

Eventually consistent reads

Using eventually consistent reads (as opposed to strongly consistent reads) halves the RCU requirements for calculation purposes. In this recipe, we have used strongly consistent reads because they work with all workloads, but you should confirm whether your use case actually requires them. Use eventually consistent reads if it does not.

By reducing the required provisioned capacity for reads, you effectively reduce your cost for using DDB.
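
For instance, revisiting the 3 KB item from the earlier example, a quick sketch of the difference (it simply halves the strongly consistent figure):

import math

# Eventually consistent reads consume half a read capacity unit per 4 KB
# block, so 100 reads/s of 3 KB items needs 50 RCU instead of 100.
item_size_kb = 3
reads_per_sec = 100
strongly_consistent_rcu = reads_per_sec * math.ceil(item_size_kb / 4)  # 100
eventually_consistent_rcu = math.ceil(strongly_consistent_rcu / 2)     # 50
print(strongly_consistent_rcu, eventually_consistent_rcu)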

See also

  • In Chapter 10, Advanced AWS CloudFormation, we use DynamoDB in the Detecting resource drift from templates with drift detection recipe
