Architecture¶
This guide covers the internal architecture of zae-limiter, including the DynamoDB schema and token bucket implementation.
DynamoDB Schema (Single Table)¶
All data is stored in a single DynamoDB table using a composite key pattern:
| Record Type | PK | SK |
|---|---|---|
| Entity metadata | ENTITY#{id} |
#META |
| Bucket | ENTITY#{id} |
#BUCKET#{resource}#{limit_name} |
| Limit config | ENTITY#{id} |
#LIMIT#{resource}#{limit_name} |
| Usage snapshot | ENTITY#{id} |
#USAGE#{resource}#{window_key} |
| System version | SYSTEM# |
#VERSION |
| Audit events | AUDIT#{entity_id} |
#AUDIT#{timestamp} |
Global Secondary Indexes¶
| Index | Purpose | Key Pattern |
|---|---|---|
| GSI1 | Parent → Children lookup | GSI1PK=PARENT#{id} → GSI1SK=CHILD#{id} |
| GSI2 | Resource aggregation | GSI2PK=RESOURCE#{name} → buckets/usage |
Access Patterns¶
| Pattern | Query |
|---|---|
| Get entity | PK=ENTITY#{id}, SK=#META |
| Get buckets | PK=ENTITY#{id}, SK begins_with #BUCKET# |
| Get children | GSI1: GSI1PK=PARENT#{id} |
| Resource capacity | GSI2: GSI2PK=RESOURCE#{name}, SK begins_with BUCKET# |
| Get version | PK=SYSTEM#, SK=#VERSION |
| Get audit events | PK=AUDIT#{entity_id}, SK begins_with #AUDIT# |
Token Bucket Implementation¶
For a conceptual overview of the token bucket algorithm, see the User Guide. This section covers implementation details for contributors.
Core Functions¶
The algorithm is implemented in bucket.py:
| Function | Purpose | Lines |
|---|---|---|
refill_bucket() |
Calculate refilled tokens with drift compensation | 27-75 |
try_consume() |
Atomic check-and-consume operation | 78-134 |
force_consume() |
Force consume (can go negative) | 224-255 |
calculate_retry_after() |
Calculate wait time for deficit | 137-159 |
Mathematical Formulas¶
Refill calculation (lazy, on-demand):
Drift compensation (prevents accumulated rounding errors):
time_used_ms = (tokens_to_add × refill_period_ms) // refill_amount_milli
new_last_refill = last_refill_ms + time_used_ms
The inverse calculation ensures we only "consume" the time that corresponds to whole tokens, preventing drift over many refill cycles.
Retry-after calculation:
time_ms = (deficit_milli × refill_period_ms) // refill_amount_milli
retry_seconds = (time_ms + 1) / 1000.0 # +1ms rounds up
Integer Arithmetic for Precision¶
All token values are stored as millitokens (×1000) to avoid floating-point precision issues in distributed systems:
Why integers matter in distributed systems:
- Floating-point operations can produce different results on different hardware
- DynamoDB stores numbers as strings, so precision loss can occur during serialization
- Rate limiting across multiple nodes requires identical calculations
Refill Rate Storage¶
Refill rates are stored as a fraction (amount/period) rather than a decimal:
# 100 tokens per minute stored as:
refill_amount_milli = 100_000 # millitokens (numerator)
refill_period_ms = 60_000 # milliseconds (denominator)
This avoids representing 1.6667 tokens/second as a float. Instead:
Lazy Refill with Drift Compensation¶
Tokens are calculated on-demand rather than via a background timer. The refill_bucket() function:
- Calculates elapsed time since last refill
- Computes tokens to add using integer division
- Tracks "time consumed" to prevent drift
# From bucket.py:refill_bucket()
tokens_to_add = (elapsed_ms * refill_amount_milli) // refill_period_ms
# Drift compensation: only advance time for tokens actually added
time_used_ms = (tokens_to_add * refill_period_ms) // refill_amount_milli
new_last_refill = last_refill_ms + time_used_ms
Without drift compensation, repeated calls with small time intervals would accumulate rounding errors.
Negative Buckets (Debt)¶
Buckets can go negative to support post-hoc reconciliation:
# Estimate 500 tokens, actually used 2000
async with limiter.acquire(consume={"tpm": 500}) as lease:
actual = await call_llm() # Returns 2000 tokens
await lease.adjust(tpm=2000 - 500) # Bucket at -1500
The force_consume() function handles this:
# From bucket.py:force_consume()
# Consume can go negative - no bounds checking
new_tokens_milli = refill.new_tokens_milli - (amount * 1000)
The debt is repaid as tokens refill over time. A bucket at -1500 millitokens needs 1.5 minutes to reach 0 (at 1000 tokens/minute).
Burst Capacity¶
Burst allows temporary exceeding of sustained rate:
# Sustained: 10k tokens/minute
# Burst: 15k tokens (one-time)
Limit.per_minute("tpm", 10_000, burst=15_000)
When burst > capacity, users can consume up to burst tokens immediately, then sustain at capacity rate.
Design Decisions¶
| Decision | Rationale |
|---|---|
| Integer over float | Identical results across distributed nodes; no precision drift |
| Lazy over continuous | No background timers; accurate retry_after; efficient |
| Negative allowed | Estimate-then-reconcile pattern; operations with unknown cost |
| Fraction over decimal | Exact representation of rates like 100/minute |
Atomicity¶
TransactWriteItems¶
Multi-entity updates (like cascade mode) use DynamoDB transactions:
# Single atomic operation:
# 1. Consume from child entity
# 2. Consume from parent entity
# Both succeed or both fail
Transaction limits: max 100 items per transaction.
Optimistic Locking¶
Entity metadata uses version numbers for optimistic locking:
# Read entity with version 5
# Update fails if version changed
condition_expression="version = :expected_version"
Project Structure¶
src/zae_limiter/
├── __init__.py # Public API exports
├── models.py # Limit, Entity, LimitStatus, BucketState, StackOptions
├── exceptions.py # RateLimitExceeded, RateLimiterUnavailable, etc.
├── naming.py # Resource name validation and ZAEL- prefix logic
├── bucket.py # Token bucket math (integer arithmetic)
├── schema.py # DynamoDB key builders
├── repository.py # DynamoDB operations
├── lease.py # Lease context manager
├── limiter.py # RateLimiter, SyncRateLimiter
├── cli.py # CLI commands
├── version.py # Version tracking and compatibility
├── migrations/ # Schema migration framework
└── infra/
├── stack_manager.py # CloudFormation stack operations
├── lambda_builder.py # Lambda deployment package builder
└── cfn_template.yaml # CloudFormation template
Key Design Decisions¶
- Lease commits only on success: If any exception occurs in the context, changes are rolled back
- Bucket can go negative:
lease.adjust()never throws, allows debt - Cascade is optional: Parent is only checked if
cascade=True - Stored limits override defaults: When
use_stored_limits=True - Transactions are atomic: Multi-entity updates succeed or fail together
Next Steps¶
- Development Setup - Setting up your environment
- Testing - Test organization and fixtures