Stream Processing Operations
This guide covers troubleshooting and operational procedures for DynamoDB Streams processing, which powers the usage aggregation feature.
Decision Tree
Troubleshooting
Symptoms
- IteratorAge metric growing
- Usage snapshots delayed
- Stream iterator age alarm triggered
- Lambda throttling
Diagnostic Steps
Check IteratorAge metric:
```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name IteratorAge \
  --dimensions Name=FunctionName,Value=ZAEL-<name>-aggregator \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Maximum
```
Check stream status:
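You can read the stream status from DynamoDB Streams. This is a minimal sketch. It assumes the table name follows the ZAEL-<name> pattern used elsewhere in this guide. The stream_status helper is hypothetical, while describe-table and describe-stream are standard AWS CLI commands:

```shell
# Hypothetical helper: print the status of the table's latest stream
# (one of ENABLING, ENABLED, DISABLING, DISABLED).
stream_status() {
  local table_name="$1"
  local stream_arn
  # Look up the ARN of the table's current stream
  stream_arn=$(aws dynamodb describe-table \
    --table-name "$table_name" \
    --query 'Table.LatestStreamArn' \
    --output text)
  # Ask DynamoDB Streams for that stream's status
  aws dynamodbstreams describe-stream \
    --stream-arn "$stream_arn" \
    --query 'StreamDescription.StreamStatus' \
    --output text
}
```

Run it as stream_status "ZAEL-<name>". A healthy stream reports ENABLED.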
Check Lambda event source mapping:
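This sketch inspects the mapping's state and its tuning settings in one call. The mapping_summary helper is hypothetical. The selected fields (UUID, State, BatchSize, ParallelizationFactor, LastProcessingResult) come from the list-event-source-mappings response:

```shell
# Hypothetical helper: one row per event source mapping attached to the function,
# showing its UUID, state, batch size, parallelization factor and last result.
mapping_summary() {
  aws lambda list-event-source-mappings \
    --function-name "$1" \
    --query 'EventSourceMappings[].[UUID,State,BatchSize,ParallelizationFactor,LastProcessingResult]' \
    --output table
}
```

For example, mapping_summary "ZAEL-<name>-aggregator". A healthy mapping shows State as Enabled.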
Check Lambda concurrent executions:
```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name ConcurrentExecutions \
  --dimensions Name=FunctionName,Value=ZAEL-<name>-aggregator \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Maximum
```
High Iterator Age
Common causes and solutions:
| Cause | Solution |
|---|---|
| Lambda errors | Fix errors (check DLQ and logs) - see Lambda Operations |
| Lambda throttling | Increase reserved concurrency |
| Low Lambda concurrency | Match concurrency to shard count |
| DynamoDB throttling | Increase table capacity - see DynamoDB Operations |
| Large batch sizes | Reduce batch size in event source mapping |
Understanding IteratorAge:
IteratorAge measures the delay between when a record is written to the stream and when Lambda processes it. CloudWatch reports it in milliseconds.
- Healthy: < 1 second
- Warning: 1-30 seconds
- Critical: > 30 seconds (default alarm threshold)
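The bands above can be applied to a raw datapoint. This is a minimal sketch. It assumes the value comes from the IteratorAge Maximum statistic, which CloudWatch reports in milliseconds. classify_iterator_age is a hypothetical helper name:

```shell
# Hypothetical helper: map an IteratorAge value in milliseconds onto the
# Healthy (< 1 s), Warning (1-30 s) and Critical (> 30 s) bands.
classify_iterator_age() {
  # Drop any fractional part, since CloudWatch returns values such as 1234.0
  local ms="${1%%.*}"
  if [ "$ms" -lt 1000 ]; then
    echo "Healthy"
  elif [ "$ms" -le 30000 ]; then
    echo "Warning"
  else
    echo "Critical"
  fi
}
```

For example, classify_iterator_age 45000 prints Critical.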
Lambda Throttling
Check if Lambda is being throttled:
```shell
aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name Throttles \
  --dimensions Name=FunctionName,Value=ZAEL-<name>-aggregator \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Sum
```
Solution: Increase reserved concurrency (see procedures below).
Procedures
Increase Lambda Concurrency
Set reserved concurrency:
```shell
aws lambda put-function-concurrency \
  --function-name ZAEL-<name>-aggregator \
  --reserved-concurrent-executions 10
```
Check current concurrency:
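This is a minimal sketch using get-function-concurrency. The show_reserved_concurrency wrapper is hypothetical. If no reservation is configured, the response contains no ReservedConcurrentExecutions value:

```shell
# Hypothetical wrapper: print the function's reserved concurrency, if any.
show_reserved_concurrency() {
  aws lambda get-function-concurrency \
    --function-name "$1" \
    --query 'ReservedConcurrentExecutions' \
    --output text
}
```

For example, show_reserved_concurrency "ZAEL-<name>-aggregator".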
Remove concurrency limit (use account default):
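This is a minimal sketch using delete-function-concurrency. That command removes the reservation, so the function draws from the account's unreserved concurrency pool. The wrapper name is hypothetical:

```shell
# Hypothetical wrapper: drop the reserved concurrency setting for a function.
remove_reserved_concurrency() {
  aws lambda delete-function-concurrency --function-name "$1"
}
```

For example, remove_reserved_concurrency "ZAEL-<name>-aggregator".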
Adjust Batch Size
Get current mapping UUID:
```shell
MAPPING_UUID=$(aws lambda list-event-source-mappings \
  --function-name ZAEL-<name>-aggregator \
  --query 'EventSourceMappings[0].UUID' \
  --output text)
echo "Mapping UUID: $MAPPING_UUID"
```
Reduce the batch size to process fewer records per invocation, or increase it for higher throughput at the cost of higher latency:
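Both adjustments use update-event-source-mapping with --batch-size. This is a minimal sketch. It assumes MAPPING_UUID was set by the command above. set_batch_size is a hypothetical helper, and its 1-10000 range check reflects the batch size limit for DynamoDB stream sources:

```shell
# Hypothetical helper: change the batch size of an existing mapping.
set_batch_size() {
  local uuid="$1" size="$2"
  # DynamoDB stream mappings accept 1-10000 records per batch
  if [ "$size" -lt 1 ] || [ "$size" -gt 10000 ]; then
    echo "batch size must be between 1 and 10000" >&2
    return 1
  fi
  aws lambda update-event-source-mapping \
    --uuid "$uuid" \
    --batch-size "$size"
}
```

For example, run set_batch_size "$MAPPING_UUID" 50 to reduce the batch size, or set_batch_size "$MAPPING_UUID" 200 to increase it. These values are illustrative only.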
Adjust batch window (wait for more records):
```shell
aws lambda update-event-source-mapping \
  --uuid "$MAPPING_UUID" \
  --maximum-batching-window-in-seconds 5
```
Shard Scaling
DynamoDB Streams automatically scales shards based on table throughput.
Check shard count:
```shell
aws dynamodbstreams describe-stream \
  --stream-arn "$(aws dynamodb describe-table --table-name ZAEL-<name> \
    --query 'Table.LatestStreamArn' --output text)" \
  --query 'StreamDescription.Shards | length(@)'
```
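Rule of thumb: Lambda concurrency should be at least the shard count for optimal processing. For example, with 10 shards but only 5 concurrent Lambda executions, processing will lag. Note that describe-stream also lists closed shards (those with an EndingSequenceNumber), so the count above can overstate the number of shards still receiving writes.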
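The rule of thumb extends to the parallelization factor. Lambda runs up to ParallelizationFactor concurrent batches per shard, so the ceiling on useful concurrency is shards multiplied by the factor. This is a sketch, and max_stream_concurrency is a hypothetical helper name:

```shell
# Hypothetical helper: upper bound on concurrent invocations a stream mapping
# can drive, given the shard count and the parallelization factor (default 1).
max_stream_concurrency() {
  local shards="$1" factor="${2:-1}"
  echo $(( shards * factor ))
}
```

For example, 10 shards at factor 2 can use up to 20 concurrent executions. Reserved concurrency below that number can lead to throttling and a growing IteratorAge.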
Enable Parallelization Factor
Process multiple batches from the same shard concurrently:
Valid values: 1-10 (default: 1)
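This is a minimal sketch using update-event-source-mapping with --parallelization-factor. It assumes MAPPING_UUID is set as in Adjust Batch Size. set_parallelization_factor is a hypothetical helper that enforces the 1-10 range stated here:

```shell
# Hypothetical helper: set how many batches per shard Lambda may process at once.
set_parallelization_factor() {
  local uuid="$1" factor="$2"
  # Valid range is 1-10
  if [ "$factor" -lt 1 ] || [ "$factor" -gt 10 ]; then
    echo "parallelization factor must be between 1 and 10" >&2
    return 1
  fi
  aws lambda update-event-source-mapping \
    --uuid "$uuid" \
    --parallelization-factor "$factor"
}
```

For example, set_parallelization_factor "$MAPPING_UUID" 2.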
Ordering
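Increasing the parallelization factor can cause out-of-order processing within a shard. Lambda still processes records with the same partition key in order, but records with different keys may be handled out of order. This is acceptable for usage aggregation but may not suit every use case.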
Verify Stream Health
Check stream is enabled:
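This is a minimal sketch that reads the table's StreamSpecification. It assumes the ZAEL-<name> table naming used above. stream_enabled is a hypothetical helper. Forcing --output json makes the boolean print as true or false regardless of the CLI's configured default output:

```shell
# Hypothetical helper: print whether streams are enabled on the table.
stream_enabled() {
  aws dynamodb describe-table \
    --table-name "$1" \
    --query 'Table.StreamSpecification.StreamEnabled' \
    --output json
}
```

For example, stream_enabled "ZAEL-<name>".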
Expected output: true
Check event source mapping is active:
```shell
aws lambda list-event-source-mappings \
  --function-name ZAEL-<name>-aggregator \
  --query 'EventSourceMappings[0].State'
```
Expected output: "Enabled"
Monitor After Changes
After tuning, monitor for 15-30 minutes:
```shell
# Watch IteratorAge
watch -n 30 "aws cloudwatch get-metric-statistics \
  --namespace AWS/Lambda \
  --metric-name IteratorAge \
  --dimensions Name=FunctionName,Value=ZAEL-<name>-aggregator \
  --start-time \$(date -u -d '30 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time \$(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 60 \
  --statistics Maximum \
  --query 'Datapoints | sort_by(@, &Timestamp) | [-1].Maximum'"
```
Tuning Guidelines
| Scenario | Batch Size | Concurrency | Parallelization |
|---|---|---|---|
| Low volume (< 100 req/s) | 100 | 2 | 1 |
| Medium volume (100-1000 req/s) | 100 | 5-10 | 1 |
| High volume (> 1000 req/s) | 50-100 | 10+ | 2 |
| Real-time requirements | 10-50 | 10+ | 2-5 |
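The volume rows of the table can be encoded as a starting-point lookup. This is a sketch built around a hypothetical tuning_for_rate helper. It omits the real-time row because that row is selected by latency requirements rather than by request rate. Treat the output as a starting point, then refine it by monitoring IteratorAge:

```shell
# Hypothetical helper: given a request rate in req/s, print the table's suggested
# "batch_size concurrency parallelization" starting values.
tuning_for_rate() {
  local rps="$1"
  if [ "$rps" -lt 100 ]; then
    echo "100 2 1"
  elif [ "$rps" -le 1000 ]; then
    echo "100 5-10 1"
  else
    echo "50-100 10+ 2"
  fi
}
```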
Related
- Lambda Operations - Lambda errors and duration issues
- DynamoDB Operations - Throttling affecting stream processing
- Monitoring Guide - CloudWatch dashboards for streams