A rate limiting library backed by DynamoDB using the token bucket algorithm.
Overview
zae-limiter excels at rate limiting scenarios where:
- Multiple limits are tracked per call (requests per minute, tokens per minute)
- Consumption is unknown upfront: consume an estimate, then adjust it after the operation completes
- Hierarchical limits exist (project → API key, tenant → user)
- Cost matters: roughly $0.625 per 1M requests, and $0 for fast rejections
Features
- Token Bucket Algorithm - Precise rate limiting with configurable capacity and refill rates
- Multiple Limits - Track requests per minute, tokens per minute, etc. in a single call
- Hierarchical Entities - Two-level hierarchy (project → API keys) with cascade mode
- Atomic Transactions - Multi-key updates via DynamoDB TransactWriteItems
- Write-on-Enter with Rollback - Tokens consumed immediately on acquire; compensating writes on exception
- Stored Limits - Configure per-entity limits in DynamoDB
- Usage Analytics - Lambda aggregator for hourly/daily usage snapshots
- Audit Logging - Track entity and limit changes for compliance
- Multi-Tenant Isolation - Namespace-scoped data isolation with per-tenant IAM policies
- Async + Sync APIs - First-class async support with sync wrapper
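To make the token bucket idea concrete, here is a minimal in-memory sketch. It is not zae-limiter's implementation: the `TokenBucket` class, its field names, and the explicit `now` argument are assumptions chosen for illustration, and the library's DynamoDB representation may differ. It shows refill capped at capacity, an all-or-nothing consume, and a post-hoc adjustment that may push the balance below zero.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative in-memory bucket (hypothetical, not the library's class)."""

    capacity: float           # maximum tokens the bucket can hold
    refill_per_second: float  # tokens added per second
    tokens: float             # current balance; may be negative after adjust()
    updated_at: float         # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now

    def try_consume(self, amount: float, now: float) -> bool:
        # Consume only if the full amount is available after refilling.
        self.refill(now)
        if self.tokens < amount:
            return False
        self.tokens -= amount
        return True

    def adjust(self, delta: float, now: float) -> None:
        # Post-hoc correction: charge extra usage even if the balance goes negative.
        self.refill(now)
        self.tokens -= delta
```

A negative balance acts as debt: later calls are rejected until refill brings the balance back above the requested amount.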
Quick Example
from zae_limiter import Repository, RateLimiter, Limit
# Async rate limiter (auto-provisions if needed)
repo = await Repository.open()
limiter = RateLimiter(repository=repo)
# Define default limits (can be overridden per-entity)
default_limits = [
    Limit.per_minute("rpm", 100),
    Limit.per_minute("tpm", 10_000),
]
async with limiter.acquire(
    entity_id="api-key-123",
    resource="gpt-4",
    limits=default_limits,
    consume={"rpm": 1, "tpm": 500},  # Estimate tokens upfront
) as lease:
    response = await call_llm()
    # Reconcile actual usage (can go negative for post-hoc adjustment)
    await lease.adjust(tpm=response.usage.total_tokens - 500)
# Tokens written to DynamoDB on enter | Rolled back on exception
# Hierarchical entities: project → API key
await limiter.create_entity(entity_id="proj-1", name="Production")
await limiter.set_limits("proj-1", [Limit.per_minute("tpm", 100_000)])
await limiter.create_entity(entity_id="api-key-456", parent_id="proj-1", cascade=True)
# cascade is an entity property — acquire() auto-cascades to parent
# limits=None auto-resolves from stored config (Entity > Resource > System)
async with limiter.acquire(
    entity_id="api-key-456",
    resource="gpt-4",
    limits=None,
    consume={"rpm": 1, "tpm": 500},
) as lease:
    response = await call_llm()
Why DynamoDB?
- Serverless - No infrastructure to manage, 99.99% SLA
- Regional - Deploy independently per region with low latency
- Scalable - Handles millions of requests per second
- Cost-effective - Pay per request, no idle costs
- Atomic - TransactWriteItems for multi-key consistency
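As an illustration of multi-key atomicity, the sketch below builds a `TransactItems` payload that debits several buckets in one transaction. The table name, the `PK` key attribute, the `tokens` attribute, and the `build_transact_items` helper are all assumptions for this example and do not reflect zae-limiter's actual schema; only the request shape follows the DynamoDB low-level API.

```python
def build_transact_items(table_name, consumption):
    """Build one conditional Update per bucket.

    `consumption` maps a bucket key (hypothetical "PK" value) to the number
    of tokens to debit from that bucket.
    """
    items = []
    for bucket_key, amount in consumption.items():
        items.append(
            {
                "Update": {
                    "TableName": table_name,
                    "Key": {"PK": {"S": bucket_key}},
                    # Debit the bucket only if enough tokens remain.
                    "UpdateExpression": "SET #tokens = #tokens - :amount",
                    "ConditionExpression": "#tokens >= :amount",
                    "ExpressionAttributeNames": {"#tokens": "tokens"},
                    "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
                }
            }
        )
    return items
```

The resulting list could be passed to `boto3.client("dynamodb").transact_write_items(TransactItems=items)`. If any condition fails, DynamoDB cancels the whole transaction, so a child bucket is never debited without its parent.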
Next Steps
| Guide | Description |
|---|---|
| Getting Started | Installation and first deployment |
| Basic Usage | Rate limiting patterns and error handling |
| Hierarchical Limits | Parent/child entities, cascade mode |
| LLM Integration | Token estimation and reconciliation |
| Production Guide | Security, monitoring, cost |
| Multi-Tenant Guide | Namespace isolation, per-tenant IAM |
| CLI Reference | Deploy, status, delete commands |