zae-limiter
A rate limiting library backed by DynamoDB using the token bucket algorithm.
Overview
zae-limiter excels at rate limiting scenarios where:
- Multiple limits are tracked per call (requests per minute, tokens per minute)
- Consumption is unknown upfront: consume an estimate first, then adjust the consumed amount after the operation completes
- Hierarchical limits exist (project → API key, tenant → user)
- Cost matters — ~$1/1M requests (details)
Features
- Token Bucket Algorithm - Precise rate limiting with configurable burst capacity
- Multiple Limits - Track requests per minute, tokens per minute, etc. in a single call
- Hierarchical Entities - Two-level hierarchy (project → API keys) with cascade mode
- Atomic Transactions - Multi-key updates via DynamoDB TransactWriteItems
- Rollback on Exception - Automatic rollback if your code throws
- Stored Limits - Configure per-entity limits in DynamoDB
- Usage Analytics - Lambda aggregator for hourly/daily usage snapshots
- Audit Logging - Track entity and limit changes for compliance
- Async + Sync APIs - First-class async support with sync wrapper
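
To make the "configurable burst capacity" idea concrete, here is a minimal in-memory sketch of a token bucket. It is an illustration only, not zae-limiter's implementation: the `TokenBucket` class, its field names, and the explicit `now` argument are assumptions chosen to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket; not zae-limiter's internal implementation."""

    rate_per_minute: float  # sustained refill rate
    burst: float            # maximum number of tokens the bucket can hold
    tokens: float           # current balance
    last_refill: float      # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        """Add tokens for the elapsed time, capped at the burst capacity."""
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_minute / 60.0)
        self.last_refill = now

    def try_consume(self, amount: float, now: float) -> bool:
        """Consume `amount` tokens if available and report whether it succeeded."""
        self.refill(now)
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False


# 10,000 tokens per minute sustained, with bursts of up to 50,000.
bucket = TokenBucket(rate_per_minute=10_000, burst=50_000, tokens=50_000, last_refill=0.0)
burst_ok = bucket.try_consume(50_000, now=0.0)      # the full burst is available at once
immediate_ok = bucket.try_consume(500, now=0.0)     # the bucket is empty, so this is rejected
after_refill_ok = bucket.try_consume(500, now=6.0)  # 6 seconds refill 1,000 tokens
```

The burst value bounds how much can be spent in one go, while the per-minute rate bounds sustained throughput once the bucket has been drained.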
Quick Example
from zae_limiter import RateLimiter, Limit, StackOptions

# Async rate limiter with declarative infrastructure
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),  # Declare desired state - CloudFormation ensures it
)

# Define default limits (can be overridden per-entity)
default_limits = [
    Limit.per_minute("rpm", 100),
    Limit.per_minute("tpm", 10_000, burst=50_000),  # Token bucket with burst
]

async with limiter.acquire(
    entity_id="api-key-123",
    resource="gpt-4",
    limits=default_limits,
    consume={"rpm": 1, "tpm": 500},  # Estimate tokens upfront
) as lease:
    response = await call_llm()
    # Reconcile actual usage (can go negative for post-hoc adjustment)
    await lease.adjust(tpm=response.usage.total_tokens - 500)
# On success: committed | On exception: rolled back automatically

# Hierarchical entities: project → API key
await limiter.create_entity(entity_id="proj-1", name="Production")
await limiter.set_limits("proj-1", [Limit.per_minute("tpm", 100_000)])
await limiter.create_entity(entity_id="api-key-456", parent_id="proj-1")

# cascade=True enforces both key AND project limits
async with limiter.acquire(
    entity_id="api-key-456",
    resource="gpt-4",
    limits=default_limits,
    consume={"rpm": 1, "tpm": 500},
    cascade=True,  # Also checks parent's stored limits
    use_stored_limits=True,  # Uses proj-1's 100k tpm limit
) as lease:
    response = await call_llm()
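
The commit-or-rollback behaviour of the lease in the example above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the library's code: the `Lease` class, the standalone `acquire` function, and the plain `balances` dict are hypothetical stand-ins for state that zae-limiter keeps in DynamoDB.

```python
import asyncio
from contextlib import asynccontextmanager


class Lease:
    """Tracks what one acquisition has consumed so far (hypothetical helper)."""

    def __init__(self, balances: dict[str, int], consumed: dict[str, int]) -> None:
        self._balances = balances
        self.consumed = dict(consumed)

    async def adjust(self, **deltas: int) -> None:
        # Positive deltas consume more; negative deltas hand tokens back.
        for name, delta in deltas.items():
            self._balances[name] -= delta
            self.consumed[name] = self.consumed.get(name, 0) + delta


@asynccontextmanager
async def acquire(balances: dict[str, int], consume: dict[str, int]):
    # Check every limit before touching any balance, so the call is all-or-nothing.
    for name, amount in consume.items():
        if balances[name] < amount:
            raise RuntimeError(f"limit {name!r} exceeded")
    for name, amount in consume.items():
        balances[name] -= amount
    lease = Lease(balances, consume)
    try:
        yield lease
    except BaseException:
        # Roll back everything this lease consumed, including later adjustments.
        for name, amount in lease.consumed.items():
            balances[name] += amount
        raise


async def demo() -> tuple[dict[str, int], dict[str, int]]:
    balances = {"rpm": 100, "tpm": 10_000}

    # Success path: estimate 500 tokens, then reconcile to an actual 720.
    async with acquire(balances, {"rpm": 1, "tpm": 500}) as lease:
        await lease.adjust(tpm=720 - 500)
    after_success = dict(balances)

    # Failure path: the exception undoes this lease's consumption.
    try:
        async with acquire(balances, {"rpm": 1, "tpm": 500}):
            raise ValueError("simulated LLM failure")
    except ValueError:
        pass
    return after_success, dict(balances)


after_success, after_failure = asyncio.run(demo())
```

In this model the successful call leaves `tpm` reduced by the reconciled 720 tokens, while the failed call leaves the balances exactly as they were before it started.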
Why DynamoDB?
- Serverless - No infrastructure to manage, 99.99% SLA
- Regional - Deploy independently per region with low latency
- Scalable - Handles millions of requests per second
- Cost-effective - Pay per request, no idle costs
- Atomic - TransactWriteItems for multi-key consistency
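
As a rough illustration of the multi-key consistency point, the sketch below builds a TransactWriteItems request that debits a child bucket and its parent bucket together. The table name, the `PK`/`SK` key layout, and the numeric `tokens` attribute are hypothetical and are not zae-limiter's actual schema; only the request shape follows the DynamoDB API.

```python
def build_transact_items(table: str, consumption: dict[tuple[str, str], int]) -> list[dict]:
    """Build one conditional Update per (entity_id, limit_name) bucket.

    The key layout and the `tokens` attribute are assumptions for illustration.
    """
    items = []
    for (entity_id, limit_name), amount in consumption.items():
        items.append(
            {
                "Update": {
                    "TableName": table,
                    "Key": {
                        "PK": {"S": f"ENTITY#{entity_id}"},
                        "SK": {"S": f"BUCKET#{limit_name}"},
                    },
                    # Subtract the tokens only if the bucket still holds enough.
                    "UpdateExpression": "SET #t = #t - :amount",
                    "ConditionExpression": "#t >= :amount",
                    "ExpressionAttributeNames": {"#t": "tokens"},
                    "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
                }
            }
        )
    return items


items = build_transact_items(
    "rate-limits",
    {("api-key-456", "tpm"): 500, ("proj-1", "tpm"): 500},
)
```

The resulting list could be passed to `boto3.client("dynamodb").transact_write_items(TransactItems=items)`. If any condition fails, DynamoDB cancels the whole transaction, so the key and the project are either both debited or both left unchanged.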
Next Steps
| Guide | Description |
|---|---|
| Getting Started | Installation and first deployment |
| Basic Usage | Rate limiting patterns and error handling |
| Hierarchical Limits | Parent/child entities, cascade mode |
| LLM Integration | Token estimation and reconciliation |
| Production Guide | Security, monitoring, cost |
| CLI Reference | Deploy, status, delete commands |