AWS DynamoDB
AWS DynamoDB
Overview
- Fully managed, highly available and has 3 AZ
- Integrated with IAM
- Max size of an item is 400KB
Primary key
- Partition key
- Partition key+ Sort key
RCU and WCU throughput
RCU- Read capacity units- throughput for reads
WCU- Write capacity units- throughput for writes
Option to setup auto-scaling to meet the demands
Usig burst credit throughput can be exceeded
If burst credit is empty, ProvisionedThroughputException is thrown
Strongly consistent read:
- When a read operation is performed after a write, updated item is returned
Eventually consistent read:
- When a read operation is perfomed after write, it’s possible to get old data, because of replication
By default DDB uses eventually consistent read. But, GetItem, Query and Scan provide option to set ConsistentRead to true
WCU:
- One WCU represents 1 write per second for an item up to 1KB size
RCU:
- One RCU represents 1 strongly consisted read per second or two eventually consisted read per second for an item up to size 4KB
DynamoDB basic APIs
Write data
- PutItem- Writes to DynamoDB or replaces fully
- UpdateItem- partial update of attributes
- ConditionalWrites- Accept a write only if conditions are met. Helps with concurrent access to items
Delete data
- DeleteItem- Delete an individual row, can also delete based on condition
- DeleteTable- Deltes a whole table. Quicker than DeleteItem for all items
BatchingWriteItem
- Upto 25 PutItem/DeleteItem in one call
- Upto 16MB of data writeen/ 400KB of data per item
- Saves latency by reducing the number of calls
- Operations done in parallel
- Possibility of part of batch items operation to fail, in which case, we’ve to retry
Reading data
- GetItem- Reading based on primary key. By default uses
eventually consistent read
, butstrongly consistent read
can be used too.ProjectExpression
can be specified to include only certain attributes - BatchGetItem- Upto 100 items can be read, upto 16MB of data, read in parallel to reduce latency
- GetItem- Reading based on primary key. By default uses
Query
- Based on partition key
- Sort key can be used too
- FilterExpression can be done too, at the client side
- Upto 1MB of data can be returned or the number of items, as specified
- Pagination can be done
GSI and LSI in DynamoDB
LSI
- Alternate ranging key(sort key) for the table
- We can have upto 5 LSIs per table
- The attribute can be String, Number or Boolean
- Must be defined at the time of creation
GSI
- Helps in speeding up queries on non-key attributes
- GSI= paritionkey+optional sortkey
- Index is a new table, we can project attributes on it
- Partition and sort keys of original table are always projected(KEYS_ONLY)
- Specific attributes projected(INCLUDE)
- All attributes projected(ALL)
- We must define RCU and WCU for the index
- We can modify GSI, but not LSI
Throttling
GSI
- The main table will be affected if the GSI write is throttled
- WCU and partition key for GSI should be choosed wisely
- Having WCU of GSI greater than that of Main table helps solve the problem
LSI
- Since this is created at the time of table creation, we can’t do much about it
- RCU and WCU of the LSI remains same as the table for which it’s created
DyanamoDB concurrency
- DynamoDB has conditional update/delete feature, with which we can check if the item that has to be updated passes the given conditions, i.e the item hasn’t changed
- With which DynamoDB is optimistic locking or concurrency database
DynamoDB DAX
- Cache for DynamoDB
- Write operation goes through DAX and then to the DynamoDB
- For read operation there is only micro second latency if DAX is used
- 5 mins TTL is default setting for cache, but, can be changed
- DAX cluster supports upto 10 nodes and multi AZ as well
- Supports encryption
DynamoDB Streams
- Using this, we can track the changes in our DynamoDB
- Can be read by AWS Lambda, EC2 instances to do different operations
- But, we have the option to choose what goes into the database
- KEYS_ONLY- Only key attributes of the modified item
- NEW_IMAGE- Entire item, after it was modified
- OLD_IMAGE- New item, after modification
- NEW_AND_OLD_IMAGE- Both the items(expensive but helpful)
- Data only stays for 24 hours in the Streams
- Records are not retroactively populated- only after the stream is enabled, it’s populated
- Streams and Lamda
- Define event-source mappings
- Have appropriate permissions for the Lambda function
- The function is invoked synchronously
- For this, enable the streams from the table, enable trigger from the triggers section in the table itself and create a Lambda function for reading the data from the streams. Make sure LambdaDynamoDBExecutionRole policy is added to the role
DynamoDB TTL
- Deleting an item from DynamoDB after a give time
- By default it’s disabled
- A column is used, based on it’s value, expiration condition is evaluated
- Due to TTL, items in the GSI and LSI also gets deleted
- But we can use Streams to recover expired items, if needed
- The expired item stays for upto 48 hours in the DB
DynamoDB CLI
- Projection
aws dynamodb scan --table-name chiras_dynamo_learning \ --projection-expression "id,priority" \ --region ap-south-1
- Filter
aws dynamodb scan --table-name chiras_dynamo_learning \ --projection-expression "id,priority" \ --filter-expression "id = :u" --expression-attribute-values '{ ":u": {"N":"1"}}' \ --region ap-south-1
- Page size
Does one API callaws dynamodb scan --table-name chiras_dynamo_learning \ --region ap-south-1
One API call per itemaws dynamodb scan --table-name chiras_dynamo_learning \ --region ap-south-1 --page-size 1
aws dynamodb scan --table-name chiras_dynamo_learning \ --region ap-south-1 --max-items 1
aws dynamodb scan --table-name chiras_dynamo_learning \ --region ap-south-1 --max-items 1 --starting-token eyJFeGNsdXNpdmVTdGFydEtleSI6IG51bGwsICJib3RvX3RydW5jYXRlX2Ftb3VudCI6IDF9
- Projection
Copying table options
- Through code
- Using AWS DataPipeline
- Backup and restore to new table
DDB Security
- Uses IAM roles to control access and use VPC for availablity
- KMS, SSL/TLS security for rest and in-flight respectively