Getting Started
This guide will help you install zae-limiter and set up rate limiting in your application.
Installation
Quick Start
zae-limiter creates its own infrastructure automatically.
Minimalist
For scripts and quick demos, pass limits inline:
```python
import asyncio

from zae_limiter import Repository, RateLimiter, Limit, RateLimitExceeded


async def do_work() -> None:
    ...  # your rate-limited work goes here


async def main() -> None:
    repo = await Repository.open()
    limiter = RateLimiter(repository=repo)
    try:
        async with limiter.acquire(
            entity_id="user-123",
            resource="api",
            consume={"requests": 1},
            limits=[Limit.per_minute("requests", 100)],
        ) as lease:
            await do_work()
    except RateLimitExceeded as e:
        print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")

    # Clean up when done (deletes the stack and all of its data)
    await repo.delete_stack()


asyncio.run(main())
```
Stored Config (Recommended)
For production, configure limits once and keep application code simple.
Step 1: Deploy and configure
Step 2: Use in your application
```python
import asyncio

from zae_limiter import Repository, RateLimiter, RateLimitExceeded


async def do_work() -> None:
    ...  # your rate-limited work goes here


async def main() -> None:
    repo = await Repository.open()
    limiter = RateLimiter(repository=repo)
    try:
        async with limiter.acquire(
            entity_id="user-123",
            resource="api",
            consume={"rpm": 1, "tpm": 500},  # Limits resolved automatically
        ) as lease:
            await do_work()
    except RateLimitExceeded as e:
        print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")


asyncio.run(main())
```
Infrastructure Persistence
When you use infrastructure builder methods, zae-limiter creates real AWS infrastructure via CloudFormation:
| Resource | Purpose | Persists? |
|---|---|---|
| DynamoDB Table | Rate limit state, entities, usage | Yes - until deleted |
| Lambda Function | Usage aggregation | Yes - until deleted |
| IAM Role | Lambda permissions | Yes - until deleted |
| CloudWatch Logs | Lambda logs | Yes - with retention |
Infrastructure Outlives Your Python Session
This infrastructure persists beyond your Python session. Restarting your application reconnects to existing resources. Rate limit state is preserved across restarts. You only pay when the limiter is used (~$0.625/1M requests, $0 for fast rejections).
Infrastructure Lifecycle
Both the programmatic API and the CLI are fully supported for managing infrastructure.
Creating Infrastructure
Use Repository.open(), which auto-provisions infrastructure if needed.
CloudFormation ensures the infrastructure matches your declaration.
Connecting to Existing Infrastructure
Use Repository.open() to connect to infrastructure, auto-provisioning if the table is missing and registering the namespace if not found:
```python
# Open repository (auto-provisions if needed)
repo = await Repository.open()
limiter = RateLimiter(repository=repo)
```
Stack defaults to the ZAEL_STACK environment variable or "zae-limiter". Namespace defaults to ZAEL_NAMESPACE or "default".
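The documented fallbacks amount to a simple lookup order: environment variable first, then the built-in default. The minimal sketch below expresses only that order; `default_stack_and_namespace` is a hypothetical helper, not part of zae-limiter, and the library may handle edge cases (such as a variable set to an empty string) differently.

```python
import os


def default_stack_and_namespace() -> tuple[str, str]:
    # Hypothetical helper mirroring the documented defaults:
    # use the environment variable if present, otherwise the built-in value.
    stack = os.environ.get("ZAEL_STACK", "zae-limiter")
    namespace = os.environ.get("ZAEL_NAMESPACE", "default")
    return stack, namespace


print(default_stack_and_namespace())
```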
Declarative State Management
Builder methods declare the desired infrastructure state. If multiple applications use the same limiter name with different settings, CloudFormation will update the stack to match the most recent declaration—similar to how Terraform applies the last-written configuration.
To maintain consistent state:
- Use identical builder options across all clients sharing a limiter
- Omit infrastructure builder methods in application code and manage infrastructure externally
- Use different limiter names for different configurations
Checking Status
Deleting Infrastructure
Data Loss
Deleting infrastructure permanently removes all rate limit data, entity configurations, and usage history. This cannot be undone.
Deployment Options
For organizations requiring strict infrastructure/application separation, see CLI deployment or CloudFormation template export.
Understanding Limits
Rate limiting in zae-limiter tracks who is making requests, what they're accessing, and how much they can use.
The Core Concepts
When you call acquire(), you specify:
- entity_id: Who is being rate limited (e.g., "user-123", "api-key-abc", "tenant-xyz")
- resource: What they're accessing (e.g., "gpt-4", "api", "embeddings")
- consume: How much capacity this request uses
- limits: The rate limit rules to apply (optional if using stored config)
Each entity has separate buckets per resource. A user rate limited on "gpt-4" can still access "gpt-3.5-turbo":
```python
# User 123 accessing GPT-4 - tracked separately from GPT-3.5
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",  # Bucket: user-123 + gpt-4
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 10)],
) as lease:
    ...

# Same user, different resource - separate bucket
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-3.5-turbo",  # Bucket: user-123 + gpt-3.5-turbo
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 100)],
) as lease:
    ...
```
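To make the per-resource separation concrete, here is a toy in-memory model in which every (entity, resource, limit name) triple owns its own counter. `BucketStore` and `try_consume` are hypothetical names; the model has no refill and is not how zae-limiter stores state in DynamoDB. It only shows why exhausting one bucket leaves the others untouched.

```python
class BucketStore:
    """Hypothetical toy model: one counter per (entity, resource, limit name)."""

    def __init__(self, capacities: dict[tuple[str, str], int]) -> None:
        # (resource, limit_name) -> capacity, e.g. ("gpt-4", "rpm") -> 10
        self.capacities = capacities
        # (entity_id, resource, limit_name) -> tokens left (no refill in this toy)
        self.remaining: dict[tuple[str, str, str], int] = {}

    def try_consume(self, entity_id: str, resource: str, name: str, amount: int) -> bool:
        key = (entity_id, resource, name)
        left = self.remaining.setdefault(key, self.capacities[(resource, name)])
        if amount > left:
            return False  # only this bucket is exhausted
        self.remaining[key] = left - amount
        return True


store = BucketStore({("gpt-4", "rpm"): 10, ("gpt-3.5-turbo", "rpm"): 100})
# Exhaust user-123's GPT-4 bucket (11 attempts against a capacity of 10).
gpt4_results = [store.try_consume("user-123", "gpt-4", "rpm", 1) for _ in range(11)]
# The same user's GPT-3.5 bucket is still full.
gpt35_ok = store.try_consume("user-123", "gpt-3.5-turbo", "rpm", 1)
print(gpt4_results.count(True), gpt4_results[-1], gpt35_ok)  # 10 False True
```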
Defining Limits
A Limit defines a rate limit using the token bucket algorithm:
```python
# 100 requests per minute
Limit.per_minute("rpm", 100)

# 10,000 tokens per minute with 15,000 burst capacity
Limit.per_minute("tpm", 10_000, burst=15_000)

# 1,000 requests per hour
Limit.per_hour("rph", 1_000)

# Custom: 50 requests per 30 seconds
Limit.custom("requests", capacity=50, refill_amount=50, refill_period_seconds=30)
```
| Parameter | Description |
|---|---|
| name | Unique identifier (e.g., "rpm", "tpm") |
| rate | Sustained tokens per period (positional) |
| burst | Optional burst ceiling (defaults to rate) |
See Token Bucket Algorithm for details on how rate, burst, and refill work together.
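The interplay of rate and burst can be sketched with a continuous-refill token bucket. This sketch assumes that `Limit.per_minute("tpm", 10_000, burst=15_000)` behaves like a bucket holding at most 15,000 tokens that refills at 10,000 tokens per 60 seconds (about 166.7 tokens per second). `TokenBucket` is a hypothetical class written for illustration; zae-limiter's own refill arithmetic may differ, so treat the Token Bucket Algorithm page as authoritative.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Continuous-refill token bucket (illustrative sketch, not zae-limiter's code)."""

    rate: float            # tokens added per period (sustained rate)
    period_seconds: float  # length of one refill period
    burst: float           # ceiling on stored tokens
    last: float = 0.0      # time of the last refill, in seconds
    tokens: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.tokens = self.burst  # a fresh bucket starts full

    def try_consume(self, amount: float, now: float) -> bool:
        # Add tokens in proportion to elapsed time, never exceeding burst.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate / self.period_seconds)
        self.last = now
        if amount > self.tokens:
            return False  # rejected: nothing is deducted
        self.tokens -= amount
        return True


# Roughly models Limit.per_minute("tpm", 10_000, burst=15_000).
tpm = TokenBucket(rate=10_000, period_seconds=60, burst=15_000)
results = [
    tpm.try_consume(15_000, now=0),    # full burst available up front
    tpm.try_consume(1, now=0),         # bucket is now empty
    tpm.try_consume(5_000, now=30),    # 30 s x 10,000 / 60 s refills 5,000
    tpm.try_consume(15_001, now=600),  # a long idle period still caps at burst
]
print(results)  # [True, False, True, False]
```

Passing `now` explicitly keeps the sketch deterministic; a real limiter would read a clock instead.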
Handling Rate Limit Errors
When a rate limit is exceeded, RateLimitExceeded is raised with full details:
```python
from fastapi.responses import JSONResponse
from zae_limiter import Limit, RateLimitExceeded


# `limiter` and `do_work` as in the Quick Start examples
async def handle_request():
    try:
        async with limiter.acquire(
            entity_id="user-123",
            resource="gpt-4",
            consume={"rpm": 2},  # Exceeds capacity to trigger error
            limits=[Limit.per_minute("rpm", 1)],
        ):
            await do_work()
    except RateLimitExceeded as e:
        # Get retry delay
        print(f"Retry after: {e.retry_after_seconds}s")
        # For HTTP responses
        return JSONResponse(
            status_code=429,
            content=e.as_dict(),
            headers={"Retry-After": e.retry_after_header},
        )
```
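If a caller would rather wait than fail, `retry_after_seconds` can drive a bounded retry loop. The sketch below rests on assumptions: `RateLimited` is a hypothetical stand-in for `RateLimitExceeded` (so the example runs without AWS), and `call_with_retry`, `max_attempts`, and `max_wait_seconds` are hypothetical names. In real code, `attempt` would wrap the `limiter.acquire(...)` block and the handler would catch `RateLimitExceeded` instead.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for RateLimitExceeded; only retry_after_seconds is used."""

    def __init__(self, retry_after_seconds: float) -> None:
        super().__init__(f"retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


async def call_with_retry(
    attempt: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    max_wait_seconds: float = 30.0,
) -> T:
    """Run `attempt`, sleeping for the suggested delay after each rejection."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    n = 0
    while True:
        n += 1
        try:
            return await attempt()
        except RateLimited as e:
            # Give up on the last attempt, or if the suggested wait is too long.
            if n >= max_attempts or e.retry_after_seconds > max_wait_seconds:
                raise
            await asyncio.sleep(e.retry_after_seconds)


async def demo() -> str:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise RateLimited(retry_after_seconds=0.01)
        return f"ok after {calls} calls"

    return await call_with_retry(flaky)


print(asyncio.run(demo()))  # ok after 3 calls
```

Retrying inside a web handler keeps the request open while it sleeps; for HTTP APIs, returning the 429 response with a Retry-After header, as in the example above, lets the client decide when to retry.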
Centralized Configuration (v0.5.0+)
zae-limiter supports storing rate limit configurations in DynamoDB, eliminating the need to hardcode limits in application code.
Setting Up Defaults
Configure limits at system and resource levels (typically done by admins during deployment):
```bash
# Set system-wide defaults (applies to ALL resources)
zae-limiter system set-defaults -l rpm:100 -l tpm:10000

# Set resource-specific defaults (override system for this resource)
zae-limiter resource set-defaults gpt-4 -l rpm:50 -l tpm:100000
zae-limiter resource set-defaults gpt-3.5-turbo -l rpm:200 -l tpm:500000

# Set entity-specific limits (premium users)
zae-limiter entity set-limits user-premium --resource gpt-4 -l rpm:500 -l tpm:500000
```
Automatic Resolution
With limits configured, application code becomes simpler—no need to pass limits:
```python
# Limits are resolved automatically from stored config
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",
    consume={"rpm": 1},  # No limits parameter needed
) as lease:
    await call_api()
```
Resolution order (highest to lowest precedence):
- Entity level - Specific limits for entity+resource
- Resource level - Default limits for a resource
- System level - Global defaults for all resources
- Override parameter - Fallback if no stored config
See Configuration Hierarchy for full details.
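The resolution order reads as "first configured level wins". The sketch below models that lookup with plain dictionaries holding the values from the CLI commands above. `resolve_limits` and the `SYSTEM`, `RESOURCE`, and `ENTITY` tables are hypothetical, and the sketch assumes a level replaces lower levels as a whole set of limits; if zae-limiter instead merges individual limit names across levels, results could differ whenever levels define different names (they do not in this data).

```python
from typing import Optional

Limits = dict[str, int]  # limit name -> rate, as in "-l rpm:100"

# Hypothetical in-memory copy of the stored config set by the CLI commands.
SYSTEM: Limits = {"rpm": 100, "tpm": 10_000}
RESOURCE: dict[str, Limits] = {
    "gpt-4": {"rpm": 50, "tpm": 100_000},
    "gpt-3.5-turbo": {"rpm": 200, "tpm": 500_000},
}
ENTITY: dict[tuple[str, str], Limits] = {
    ("user-premium", "gpt-4"): {"rpm": 500, "tpm": 500_000},
}


def resolve_limits(
    entity_id: str, resource: str, override: Optional[Limits] = None
) -> Limits:
    """Return the most specific configured level: entity, resource, system, override."""
    levels = (ENTITY.get((entity_id, resource)), RESOURCE.get(resource), SYSTEM, override)
    for limits in levels:
        if limits:  # skip levels that are missing (None) or empty
            return limits
    raise LookupError(f"no limits configured for {entity_id!r} on {resource!r}")


print(resolve_limits("user-premium", "gpt-4"))   # {'rpm': 500, 'tpm': 500000}
print(resolve_limits("user-123", "gpt-4"))       # {'rpm': 50, 'tpm': 100000}
print(resolve_limits("user-123", "embeddings"))  # {'rpm': 100, 'tpm': 10000}
```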
Multi-Tenant Namespaces
For multi-tenant applications, namespaces provide logical isolation within a single DynamoDB table:
```python
from zae_limiter import Repository, RateLimiter

# Each tenant gets an isolated namespace
repo = await Repository.open("tenant-alpha")
limiter = RateLimiter(repository=repo)

# All operations are scoped to tenant-alpha's namespace
async with limiter.acquire(
    entity_id="user-123",
    resource="api",
    consume={"rpm": 1},
) as lease:
    await do_work()
```
For namespace lifecycle management and per-tenant IAM access control, see the Production Guide.
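A single shared table can keep tenants apart by folding the namespace into every key, so identical entity_id values in different tenants never collide. The model below is purely illustrative: `NamespacedStore` is hypothetical, and the `namespace#entity#resource` key layout is an assumption, not zae-limiter's actual DynamoDB key schema.

```python
class NamespacedStore:
    """Hypothetical toy model of one shared table holding several tenants' counters."""

    def __init__(self) -> None:
        self.table: dict[str, int] = {}  # stands in for the single shared table

    @staticmethod
    def key(namespace: str, entity_id: str, resource: str) -> str:
        # Illustrative layout only; assumes no component contains "#".
        return f"{namespace}#{entity_id}#{resource}"

    def consume(self, namespace: str, entity_id: str, resource: str, amount: int) -> int:
        """Add `amount` to the tenant-scoped counter and return its new total."""
        k = self.key(namespace, entity_id, resource)
        self.table[k] = self.table.get(k, 0) + amount
        return self.table[k]


store = NamespacedStore()
store.consume("tenant-alpha", "user-123", "api", 1)
alpha_total = store.consume("tenant-alpha", "user-123", "api", 1)
beta_total = store.consume("tenant-beta", "user-123", "api", 1)
print(alpha_total, beta_total, len(store.table))  # 2 1 2
```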
Next Steps
- Basic Usage - Multiple limits, adjustments, capacity queries
- Configuration Hierarchy - Three-tier limit resolution
- Hierarchical Limits - Parent/child entities, cascade mode
- LLM Integration - Token estimation and reconciliation
- Deployment Guide - Production deployment options
- CLI Reference - Full CLI command reference
- Namespace Keys Migration - Migrating from v0.9.x to v0.10.0