Architecture

This guide covers the internal architecture of zae-limiter, including the DynamoDB schema and token bucket implementation.

DynamoDB Schema (Single Table)

All data is stored in a single DynamoDB table using a composite key pattern:

| Record Type | PK | SK |
| --- | --- | --- |
| Entity metadata | ENTITY#{id} | #META |
| Bucket | ENTITY#{id} | #BUCKET#{resource}#{limit_name} |
| Limit config | ENTITY#{id} | #LIMIT#{resource}#{limit_name} |
| Usage snapshot | ENTITY#{id} | #USAGE#{resource}#{window_key} |
| System version | SYSTEM# | #VERSION |
| Audit events | AUDIT#{entity_id} | #AUDIT#{timestamp} |
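
The key builders live in schema.py. A minimal sketch of the pattern (function names here are illustrative, not the module's actual helpers):

# Illustrative key builders mirroring the table above; see schema.py for the
# real helpers and names.

def entity_pk(entity_id: str) -> str:
    return f"ENTITY#{entity_id}"

def bucket_sk(resource: str, limit_name: str) -> str:
    return f"#BUCKET#{resource}#{limit_name}"

def limit_sk(resource: str, limit_name: str) -> str:
    return f"#LIMIT#{resource}#{limit_name}"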

Global Secondary Indexes

| Index | Purpose | Key Pattern |
| --- | --- | --- |
| GSI1 | Parent → children lookup | GSI1PK=PARENT#{id}, GSI1SK=CHILD#{id} |
| GSI2 | Resource aggregation | GSI2PK=RESOURCE#{name} → buckets/usage |

Access Patterns

| Pattern | Query |
| --- | --- |
| Get entity | PK=ENTITY#{id}, SK=#META |
| Get buckets | PK=ENTITY#{id}, SK begins_with #BUCKET# |
| Get children | GSI1: GSI1PK=PARENT#{id} |
| Resource capacity | GSI2: GSI2PK=RESOURCE#{name}, SK begins_with BUCKET# |
| Get version | PK=SYSTEM#, SK=#VERSION |
| Get audit events | PK=AUDIT#{entity_id}, SK begins_with #AUDIT# |
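
These patterns map directly onto DynamoDB queries. A hedged sketch with boto3 (the table name and entity id are illustrative; repository.py owns the real queries):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("zae-limiter")  # hypothetical table name

# Get buckets: PK=ENTITY#{id}, SK begins_with #BUCKET#
buckets = table.query(
    KeyConditionExpression=Key("PK").eq("ENTITY#tenant-42")
    & Key("SK").begins_with("#BUCKET#")
)["Items"]

# Get children: GSI1PK=PARENT#{id} on GSI1
children = table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("PARENT#tenant-42"),
)["Items"]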

Token Bucket Implementation

For a conceptual overview of the token bucket algorithm, see the User Guide. This section covers implementation details for contributors.

Core Functions

The algorithm is implemented in bucket.py:

| Function | Purpose | Lines |
| --- | --- | --- |
| refill_bucket() | Calculate refilled tokens with drift compensation | 27-75 |
| try_consume() | Atomic check-and-consume operation | 78-134 |
| force_consume() | Force consume (can go negative) | 224-255 |
| calculate_retry_after() | Calculate wait time for deficit | 137-159 |

Mathematical Formulas

Refill calculation (lazy, on-demand):

tokens_to_add = (elapsed_ms × refill_amount_milli) // refill_period_ms

Drift compensation (prevents accumulated rounding errors):

time_used_ms = (tokens_to_add × refill_period_ms) // refill_amount_milli
new_last_refill = last_refill_ms + time_used_ms

The inverse calculation ensures we only "consume" the time that corresponds to whole tokens, preventing drift over many refill cycles.

Retry-after calculation:

time_ms = (deficit_milli × refill_period_ms) // refill_amount_milli
retry_seconds = (time_ms + 1) / 1000.0  # +1ms rounds up
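
To make the retry-after math concrete, here is a direct transcription of the formula (a sketch, not the exact calculate_retry_after() signature):

def retry_after_seconds(deficit_milli: int, refill_amount_milli: int,
                        refill_period_ms: int) -> float:
    time_ms = (deficit_milli * refill_period_ms) // refill_amount_milli
    return (time_ms + 1) / 1000.0  # +1 ms so the caller never retries too early

# 1,500 millitokens short at 100 tokens/minute (100,000 milli / 60,000 ms):
retry_after_seconds(1_500, 100_000, 60_000)  # -> 0.901 seconds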

Integer Arithmetic for Precision

All token values are stored as millitokens (×1000) to avoid floating-point precision issues in distributed systems:

# User sees: 100 tokens/minute
# Stored as: 100,000 millitokens/minute
capacity_milli = 100_000

Why integers matter in distributed systems:

  • Floating-point operations can produce different results on different hardware
  • DynamoDB stores numbers as strings, so precision loss can occur during serialization
  • Rate limiting across multiple nodes requires identical calculations
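
A small sketch of the conversion and serialization boundary (helper names are illustrative; the Decimal step reflects boto3's resource API rejecting Python floats for DynamoDB numbers):

from decimal import Decimal

def to_milli(tokens: float) -> int:
    # user-facing tokens -> stored millitokens
    return int(round(tokens * 1000))

def from_milli(milli: int) -> float:
    # stored millitokens -> user-facing tokens
    return milli / 1000

item_value = Decimal(to_milli(100))  # 100 tokens -> Decimal("100000") on the wire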

Refill Rate Storage

Refill rates are stored as a fraction (amount/period) rather than a decimal:

# 100 tokens per minute stored as:
refill_amount_milli = 100_000  # millitokens (numerator)
refill_period_ms = 60_000      # milliseconds (denominator)

This avoids representing 1.6667 tokens/second as a float. Instead:

# 100 tokens/minute = 100,000 millitokens / 60,000 ms
# Integer division handles the math precisely
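
As a quick illustration (numbers only, not library code), the decimal approximation drifts while the stored fraction stays exact:

# The decimal approximation from above (1.6667 tokens/second) drifts:
round(60 * 1.6667 * 1000)        # 100,002 millitokens per minute, 2 too many

# The stored fraction, evaluated with integer math, is exact:
(60_000 * 100_000) // 60_000     # exactly 100,000 millitokens per minute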

Lazy Refill with Drift Compensation

Tokens are calculated on-demand rather than via a background timer. The refill_bucket() function:

  1. Calculates elapsed time since last refill
  2. Computes tokens to add using integer division
  3. Tracks "time consumed" to prevent drift
# From bucket.py:refill_bucket()
tokens_to_add = (elapsed_ms * refill_amount_milli) // refill_period_ms

# Drift compensation: only advance time for tokens actually added
time_used_ms = (tokens_to_add * refill_period_ms) // refill_amount_milli
new_last_refill = last_refill_ms + time_used_ms

Without drift compensation, repeated calls with small time intervals would accumulate rounding errors.
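
Putting the two steps together, a self-contained sketch of one lazy refill step (not the exact refill_bucket() signature; clamping to capacity is simplified here):

def lazy_refill(tokens_milli: int, last_refill_ms: int, now_ms: int,
                refill_amount_milli: int, refill_period_ms: int,
                capacity_milli: int) -> tuple[int, int]:
    elapsed_ms = now_ms - last_refill_ms
    tokens_to_add = (elapsed_ms * refill_amount_milli) // refill_period_ms
    # Only advance the clock by the time that produced whole millitokens;
    # the leftover milliseconds are counted again on the next call.
    time_used_ms = (tokens_to_add * refill_period_ms) // refill_amount_milli
    new_tokens_milli = min(tokens_milli + tokens_to_add, capacity_milli)
    return new_tokens_milli, last_refill_ms + time_used_ms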

Negative Buckets (Debt)

Buckets can go negative to support post-hoc reconciliation:

# Estimate 500 tokens, actually used 2000
async with limiter.acquire(consume={"tpm": 500}) as lease:
    actual = await call_llm()  # Returns 2000 tokens
    await lease.adjust(tpm=2000 - 500)  # Bucket at -1500

The force_consume() function handles this:

# From bucket.py:force_consume()
# Consume can go negative - no bounds checking
new_tokens_milli = refill.new_tokens_milli - (amount * 1000)

The debt is repaid as tokens refill over time. A bucket at -1500 tokens (-1,500,000 millitokens) needs 1.5 minutes to reach zero at a refill rate of 1000 tokens/minute.
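
The repayment time falls out of the same formula used for retry-after (arithmetic check for the example above):

deficit_milli = 1_500_000          # bucket at -1500 tokens
refill_amount_milli = 1_000_000    # 1000 tokens/minute
refill_period_ms = 60_000

(deficit_milli * refill_period_ms) // refill_amount_milli  # 90,000 ms = 1.5 minutes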

Burst Capacity

Burst capacity lets callers temporarily exceed the sustained rate:

# Sustained: 10k tokens/minute
# Burst: 15k tokens (one-time)
Limit.per_minute("tpm", 10_000, burst=15_000)

When burst is larger than capacity, callers can consume up to burst tokens immediately; after that, tokens refill at the sustained rate.
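
One way to picture this is how the limit maps onto the bucket fields. The mapping below is an assumption about internals; only the Limit.per_minute() call above is public API:

# Burst sets the bucket size; the sustained limit sets the refill fraction.
capacity_milli = 15_000 * 1000        # up to 15k tokens available at once
refill_amount_milli = 10_000 * 1000   # refills at 10k tokens...
refill_period_ms = 60_000             # ...per minute, the long-run rate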

Design Decisions

| Decision | Rationale |
| --- | --- |
| Integer over float | Identical results across distributed nodes; no precision drift |
| Lazy over continuous | No background timers; accurate retry_after; efficient |
| Negative allowed | Estimate-then-reconcile pattern; operations with unknown cost |
| Fraction over decimal | Exact representation of rates like 100/minute |

Atomicity

TransactWriteItems

Multi-entity updates (like cascade mode) use DynamoDB transactions:

# Single atomic operation:
# 1. Consume from child entity
# 2. Consume from parent entity
# Both succeed or both fail

Transaction limits: max 100 items per transaction.
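
A hedged sketch of a cascaded consume as a single TransactWriteItems call (boto3 client API; the table name, attribute names, and update expressions are illustrative, not repository.py's exact ones):

import boto3

client = boto3.client("dynamodb")
client.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "zae-limiter",  # hypothetical table name
                "Key": {"PK": {"S": "ENTITY#child-1"}, "SK": {"S": "#BUCKET#llm#tpm"}},
                "UpdateExpression": "SET tokens_milli = tokens_milli - :amt",
                "ConditionExpression": "tokens_milli >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "500000"}},
            }
        },
        {
            "Update": {
                "TableName": "zae-limiter",
                "Key": {"PK": {"S": "ENTITY#parent-1"}, "SK": {"S": "#BUCKET#llm#tpm"}},
                "UpdateExpression": "SET tokens_milli = tokens_milli - :amt",
                "ConditionExpression": "tokens_milli >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "500000"}},
            }
        },
    ]
)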

Optimistic Locking

Entity metadata uses version numbers for optimistic locking:

# Read entity with version 5
# Update fails if version changed
condition_expression="version = :expected_version"
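
In boto3 terms, the conditional update and its failure mode look roughly like this (attribute names and values are illustrative; repository.py owns the real expressions):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("zae-limiter")  # hypothetical table name

try:
    table.update_item(
        Key={"PK": "ENTITY#tenant-42", "SK": "#META"},
        UpdateExpression="SET display_name = :name, version = version + :one",
        ConditionExpression="version = :expected_version",
        ExpressionAttributeValues={":name": "Tenant 42", ":one": 1,
                                   ":expected_version": 5},
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        pass  # another writer bumped the version first; re-read and retry
    else:
        raise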

Project Structure

src/zae_limiter/
├── __init__.py        # Public API exports
├── models.py          # Limit, Entity, LimitStatus, BucketState, StackOptions
├── exceptions.py      # RateLimitExceeded, RateLimiterUnavailable, etc.
├── naming.py          # Resource name validation and ZAEL- prefix logic
├── bucket.py          # Token bucket math (integer arithmetic)
├── schema.py          # DynamoDB key builders
├── repository.py      # DynamoDB operations
├── lease.py           # Lease context manager
├── limiter.py         # RateLimiter, SyncRateLimiter
├── cli.py             # CLI commands
├── version.py         # Version tracking and compatibility
├── migrations/        # Schema migration framework
└── infra/
    ├── stack_manager.py    # CloudFormation stack operations
    ├── lambda_builder.py   # Lambda deployment package builder
    └── cfn_template.yaml   # CloudFormation template

Key Design Decisions

  1. Lease commits only on success: if any exception occurs inside the context, changes are rolled back (see the sketch after this list)
  2. Bucket can go negative: lease.adjust() never throws, allows debt
  3. Cascade is optional: Parent is only checked if cascade=True
  4. Stored limits override defaults: When use_stored_limits=True
  5. Transactions are atomic: Multi-entity updates succeed or fail together
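
Item 1 describes a standard commit-on-success context manager. A minimal sketch of that pattern (reserve(), commit(), and rollback() are hypothetical names, not lease.py's API):

from contextlib import asynccontextmanager

@asynccontextmanager
async def lease(limiter, **consume):
    pending = await limiter.reserve(**consume)   # hypothetical reservation step
    try:
        yield pending
    except BaseException:
        await pending.rollback()                 # any exception: undo the hold
        raise
    else:
        await pending.commit()                   # clean exit: apply the consumption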

Next Steps