# Getting Started
This guide will help you install zae-limiter and set up rate limiting in your application.
## Installation
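Assuming the package is published on PyPI under the project's name, it can be installed with pip:

```bash
pip install zae-limiter
```

Because the limiter provisions and talks to AWS resources (see the sections below), you will also need AWS credentials configured in your environment.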
## Quick Start
zae-limiter creates its own infrastructure automatically. Here's a complete example:
### Async API (Recommended)
```python
import asyncio

from zae_limiter import RateLimiter, Limit, StackOptions, RateLimitExceeded

# Create rate limiter with declarative infrastructure
limiter = RateLimiter(
    name="my-app",  # ZAEL-my-app resources in AWS
    region="us-east-1",
    stack_options=StackOptions(),  # Declare desired state - CloudFormation ensures it
)


async def main():
    try:
        async with limiter.acquire(
            entity_id="user-123",
            resource="api",
            limits=[Limit.per_minute("requests", 100)],
            consume={"requests": 1},
        ) as lease:
            await do_work()  # your application code
    except RateLimitExceeded as e:
        print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")


asyncio.run(main())
```
### Sync API
```python
from zae_limiter import SyncRateLimiter, Limit, StackOptions

limiter = SyncRateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),
)

with limiter.acquire(
    entity_id="user-123",
    resource="api",
    limits=[Limit.per_minute("requests", 100)],
    consume={"requests": 1},
) as lease:
    response = call_api()
```
## Infrastructure Persistence
When you pass `stack_options=StackOptions()`, zae-limiter creates real AWS infrastructure via CloudFormation:
| Resource | Purpose | Persists? |
|---|---|---|
| DynamoDB Table | Rate limit state, entities, usage | Yes - until deleted |
| Lambda Function | Usage aggregation | Yes - until deleted |
| IAM Role | Lambda permissions | Yes - until deleted |
| CloudWatch Logs | Lambda logs | Yes - with retention |
**Infrastructure Outlives Your Python Session**
This infrastructure persists beyond your Python session. Restarting your application reconnects to existing resources. Rate limit state is preserved across restarts. You only pay when the limiter is used (~$1/1M requests).
## Infrastructure Lifecycle
Both the programmatic API and the CLI are fully supported for managing infrastructure.
### Creating Infrastructure
Pass `stack_options` to declare the desired infrastructure state.
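For example, reusing the constructor arguments from the Quick Start (nothing here goes beyond what that example already shows):

```python
from zae_limiter import RateLimiter, StackOptions

# Declaring StackOptions() asks zae-limiter to create or update the
# ZAEL-my-app CloudFormation stack so it matches this declaration.
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),
)
```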
CloudFormation ensures the infrastructure matches your declaration.
### Connecting to Existing Infrastructure
If you omit `stack_options`, the limiter connects to existing infrastructure without attempting to create or modify it:
```python
from zae_limiter import RateLimiter

# Connect only - fails if ZAEL-my-app doesn't exist
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    # No stack_options = connect only, no create/update
)
```
This is useful when infrastructure is managed separately (e.g., via CLI or Terraform).
**Declarative State Management**
`StackOptions` declares the desired infrastructure state. If multiple applications
use the same limiter name with different settings, CloudFormation will update
the stack to match the most recent declaration—similar to how Terraform applies
the last-written configuration.
To maintain consistent state:
- Use identical `StackOptions` across all clients sharing a limiter
- Omit `stack_options` in application code and manage infrastructure externally
- Use different limiter names for different configurations
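One way to keep declarations identical is to define the options once and reuse them in every client. The sketch below uses only constructor arguments shown earlier in this guide; the module-level variable names are illustrative:

```python
from zae_limiter import RateLimiter, StackOptions, SyncRateLimiter

# Single shared declaration, imported by every service that uses the
# "my-app" limiter so all clients declare the same desired state.
SHARED_STACK_OPTIONS = StackOptions()

async_limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=SHARED_STACK_OPTIONS,
)

sync_limiter = SyncRateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=SHARED_STACK_OPTIONS,
)
```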
### Checking Status
### Deleting Infrastructure
**Data Loss**
Deleting infrastructure permanently removes all rate limit data, entity configurations, and usage history. This cannot be undone.
**Deployment Options**
For organizations requiring strict infrastructure/application separation, see CLI deployment or CloudFormation template export.
## Understanding Limits
Rate limiting in zae-limiter tracks who is making requests, what they're accessing, and how much they can use.
### The Core Concepts
When you call `acquire()`, you specify:
- `entity_id`: Who is being rate limited (e.g., `"user-123"`, `"api-key-abc"`, `"tenant-xyz"`)
- `resource`: What they're accessing (e.g., `"gpt-4"`, `"api"`, `"embeddings"`)
- `limits`: The rate limit rules to apply
- `consume`: How much capacity this request uses
Each entity has separate buckets per resource. A user rate limited on "gpt-4" can still access "gpt-3.5-turbo":
```python
# User 123 accessing GPT-4 - tracked separately from GPT-3.5
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",  # Bucket: user-123 + gpt-4
    limits=[Limit.per_minute("rpm", 10)],
    consume={"rpm": 1},
) as lease:
    ...

# Same user, different resource - separate bucket
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-3.5-turbo",  # Bucket: user-123 + gpt-3.5-turbo
    limits=[Limit.per_minute("rpm", 100)],
    consume={"rpm": 1},
) as lease:
    ...
```
### Defining Limits
A `Limit` defines a rate limit using the token bucket algorithm:
```python
# 100 requests per minute
Limit.per_minute("rpm", 100)

# 10,000 tokens per minute with 15,000 burst capacity
Limit.per_minute("tpm", 10_000, burst=15_000)

# 1,000 requests per hour
Limit.per_hour("rph", 1_000)

# Custom: 50 requests per 30 seconds
Limit.custom("requests", capacity=50, refill_period_seconds=30)
```
| Parameter | Description |
|---|---|
| `name` | Unique identifier (e.g., `"rpm"`, `"tpm"`) |
| `capacity` | Tokens that refill per period (sustained rate) |
| `burst` | Maximum bucket size (defaults to `capacity`) |
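As a concrete illustration of how `capacity` and `burst` interact under the standard token bucket interpretation (the numbers are the TPM example from above):

```python
from zae_limiter import Limit

# 10,000 tokens refill each minute (sustained rate); the bucket never
# holds more than 15,000 tokens (burst).
tpm = Limit.per_minute("tpm", 10_000, burst=15_000)

# Sustained throughput is capacity / period:
refill_per_second = 10_000 / 60  # about 167 tokens per second

# An idle entity with a full bucket can spend up to 15,000 tokens at once,
# after which it is throttled back to the sustained refill rate.
```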
See Token Bucket Algorithm for details on how capacity, burst, and refill work together.
## Handling Rate Limit Errors
When a rate limit is exceeded, `RateLimitExceeded` is raised with full details:
```python
from zae_limiter import RateLimitExceeded

try:
    async with limiter.acquire(...):
        await do_work()
except RateLimitExceeded as e:
    # Get retry delay
    print(f"Retry after: {e.retry_after_seconds}s")

    # For HTTP responses (e.g., from a FastAPI/Starlette route handler,
    # where JSONResponse comes from the web framework)
    return JSONResponse(
        status_code=429,
        content=e.as_dict(),
        headers={"Retry-After": e.retry_after_header},
    )
```
## Next Steps
- Basic Usage - Multiple limits, adjustments, capacity queries
- Hierarchical Limits - Parent/child entities, cascade mode
- LLM Integration - Token estimation and reconciliation
- Deployment Guide - Production deployment options
- CLI Reference - Full CLI command reference