Getting Started

This guide will help you install zae-limiter and set up rate limiting in your application.

Installation

Install with the package manager of your choice:

pip:

pip install zae-limiter

uv:

uv pip install zae-limiter

poetry:

poetry add zae-limiter

conda:

conda install -c conda-forge zae-limiter

Quick Start

zae-limiter creates its own infrastructure automatically.

Minimalist

For scripts and quick demos, pass limits inline:

from zae_limiter import Repository, RateLimiter, Limit, RateLimitExceeded

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="api",
        consume={"requests": 1},
        limits=[Limit.per_minute("requests", 100)],
    ) as lease:
        await do_work()
except RateLimitExceeded as e:
    print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")

# Clean up when done
await repo.delete_stack()

Stored Config (Recommended)

For production, configure limits once and keep application code simple.

Step 1: Deploy and configure

CLI:

# Deploy infrastructure
zae-limiter deploy --name my-app --region us-east-1

# Configure limits (apply to all entities)
zae-limiter system set-defaults --name my-app -l rpm:1000 -l tpm:100000

Python:

from zae_limiter import Repository, RateLimiter, Limit

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

await limiter.set_system_defaults(limits=[
    Limit.per_minute("rpm", 1000),
    Limit.per_minute("tpm", 100000),
])

Step 2: Use in your application

from zae_limiter import Repository, RateLimiter, RateLimitExceeded

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="api",
        consume={"rpm": 1, "tpm": 500},  # Limits resolved automatically
    ) as lease:
        await do_work()
except RateLimitExceeded as e:
    print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")

Infrastructure Persistence

When you use infrastructure builder methods, zae-limiter creates real AWS infrastructure via CloudFormation:

Resource	Purpose	Persists?
DynamoDB Table	Rate limit state, entities, usage	Yes - until deleted
Lambda Function	Usage aggregation	Yes - until deleted
IAM Role	Lambda permissions	Yes - until deleted
CloudWatch Logs	Lambda logs	Yes - with retention

Infrastructure Outlives Your Python Session

This infrastructure persists beyond your Python session. Restarting your application reconnects to existing resources. Rate limit state is preserved across restarts. You only pay when the limiter is used (~$0.625/1M requests, $0 for fast rejections).
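As a rough illustration of the pricing note above, the arithmetic can be sketched as follows. The per-million figure is the approximate one quoted in this guide. The function name is hypothetical, and treating every rejected request as a free "fast rejection" is a simplifying assumption. Other AWS charges (for example storage or Lambda) are not modeled here.

```python
COST_PER_MILLION_REQUESTS = 0.625  # USD, approximate figure quoted in this guide


def estimate_cost_usd(total_requests: int, fast_rejections: int = 0) -> float:
    """Estimate spend, assuming fast rejections cost nothing (simplification)."""
    billable = max(0, total_requests - fast_rejections)
    return billable / 1_000_000 * COST_PER_MILLION_REQUESTS


# 10M requests in a month, 2M of which were rejected early
monthly = estimate_cost_usd(total_requests=10_000_000, fast_rejections=2_000_000)
print(f"Estimated cost: ${monthly:.2f}")
```

Treat the result as an order-of-magnitude estimate rather than a bill forecast.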

Infrastructure Lifecycle

Both the programmatic API and the CLI are fully supported for managing infrastructure.

Creating Infrastructure

Programmatic:

Use Repository.open(), which auto-provisions infrastructure if needed:

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

CloudFormation ensures the infrastructure matches your declaration.

CLI:

zae-limiter deploy --name my-app --region us-east-1

Useful for CI/CD pipelines, GitOps workflows, and infrastructure-as-code.

Connecting to Existing Infrastructure

Use Repository.open() to connect to existing infrastructure. It auto-provisions the table if it is missing and registers the namespace if it is not found:

# Open repository (auto-provisions if needed)
repo = await Repository.open()
limiter = RateLimiter(repository=repo)

The stack name defaults to the value of the ZAEL_STACK environment variable, falling back to "zae-limiter". The namespace defaults to the value of ZAEL_NAMESPACE, falling back to "default".
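The fallback chain for the stack name and namespace can be pictured with a small sketch. This is not zae-limiter's code: `resolve_target` is a hypothetical helper, and two details are assumptions made for illustration, namely that explicitly passed values win over environment variables and that an empty variable counts as unset.

```python
import os
from typing import Mapping, Optional


def resolve_target(
    stack: Optional[str] = None,
    namespace: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> tuple[str, str]:
    """Pick explicit values first, then environment variables, then built-in defaults."""
    resolved_stack = stack or env.get("ZAEL_STACK") or "zae-limiter"
    resolved_namespace = namespace or env.get("ZAEL_NAMESPACE") or "default"
    return resolved_stack, resolved_namespace


# Nothing set: built-in defaults apply
no_env = resolve_target(env={})
# Both variables set: they replace the built-in defaults
from_env = resolve_target(env={"ZAEL_STACK": "my-app", "ZAEL_NAMESPACE": "tenant-alpha"})
# Explicit namespace (assumed to take precedence) with the stack taken from the environment
explicit = resolve_target(namespace="tenant-beta", env={"ZAEL_STACK": "my-app"})
```

Passing `env` as a parameter keeps the sketch testable without touching the real process environment.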

Declarative State Management

Builder methods declare the desired infrastructure state. If multiple applications use the same limiter name with different settings, CloudFormation updates the stack to match the most recent declaration, similar to how Terraform applies the last-written configuration.

To maintain consistent state:

Use identical builder options across all clients sharing a limiter
Omit infrastructure builder methods in application code and manage infrastructure externally
Use different limiter names for different configurations

Checking Status

Programmatic:

available = await repo.ping()  # Async
# or
available = repo.ping()  # Sync

if available:
    print("Stack is ready")
else:
    print("DynamoDB not reachable")

For comprehensive status, including CloudFormation details, use the CLI command.

CLI:

zae-limiter status --name my-app --region us-east-1

Deleting Infrastructure

Programmatic:

# After you're done with the limiter
await repo.delete_stack()  # Async
# or
repo.delete_stack()  # Sync

CLI:

zae-limiter delete --name my-app --region us-east-1 --yes

Data Loss

Deleting infrastructure permanently removes all rate limit data, entity configurations, and usage history. This cannot be undone.

Deployment Options

For organizations requiring strict infrastructure/application separation, see CLI deployment or CloudFormation template export.

Understanding Limits

Rate limiting in zae-limiter tracks who is making requests, what they're accessing, and how much they can use.

The Core Concepts

When you call acquire(), you specify:

entity_id: Who is being rate limited (e.g., "user-123", "api-key-abc", "tenant-xyz")
resource: What they're accessing (e.g., "gpt-4", "api", "embeddings")
consume: How much capacity this request uses
limits: The rate limit rules to apply (optional if using stored config)

Each entity has separate buckets per resource. A user rate limited on "gpt-4" can still access "gpt-3.5-turbo":

# User 123 accessing GPT-4 - tracked separately from GPT-3.5
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",        # Bucket: user-123 + gpt-4
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 10)],
) as lease:
    ...

# Same user, different resource - separate bucket
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-3.5-turbo",  # Bucket: user-123 + gpt-3.5-turbo
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 100)],
) as lease:
    ...
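The per-entity, per-resource isolation shown above can be pictured as a mapping keyed by the (entity_id, resource) pair. The sketch below is a simplified in-memory model with hypothetical names (`CAPACITY`, `remaining`, `consume`). It ignores refill over time and does not reflect how zae-limiter actually stores state in DynamoDB.

```python
# Hypothetical per-resource capacities, mirroring the limits in the example above.
CAPACITY = {"gpt-4": 10, "gpt-3.5-turbo": 100}

# Remaining tokens per (entity_id, resource) bucket; a bucket starts full on first use.
remaining: dict[tuple[str, str], int] = {}


def consume(entity_id: str, resource: str, amount: int = 1) -> bool:
    """Take `amount` from the bucket for this entity/resource pair, if available."""
    key = (entity_id, resource)
    left = remaining.setdefault(key, CAPACITY[resource])
    if amount > left:
        return False
    remaining[key] = left - amount
    return True


# Exhaust user-123's gpt-4 bucket: ten requests succeed, the eleventh is refused.
gpt4_results = [consume("user-123", "gpt-4") for _ in range(11)]
# The same user's gpt-3.5-turbo bucket is a different key, so it is unaffected.
gpt35_ok = consume("user-123", "gpt-3.5-turbo")
```

Because the key includes both values, exhausting one resource never drains another, and two different entities never share a bucket for the same resource.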

Defining Limits

A Limit defines a rate limit using the token bucket algorithm:

# 100 requests per minute
Limit.per_minute("rpm", 100)

# 10,000 tokens per minute with 15,000 burst capacity
Limit.per_minute("tpm", 10_000, burst=15_000)

# 1,000 requests per hour
Limit.per_hour("rph", 1_000)

# Custom: 50 requests per 30 seconds
Limit.custom("requests", capacity=50, refill_amount=50, refill_period_seconds=30)

Parameter	Description
`name`	Unique identifier (e.g., "rpm", "tpm")
`rate`	Sustained tokens per period (positional)
`burst`	Optional burst ceiling (defaults to `rate`)

See Token Bucket Algorithm for details on how rate, burst, and refill work together.
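To make the relationship between rate, burst, and refill concrete, here is a minimal in-memory token bucket sketch. It is not zae-limiter's implementation. The `TokenBucket` class and its fields are hypothetical and exist only for illustration. The caller passes timestamps explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Illustrative token bucket; not zae-limiter's internal implementation."""

    rate: float            # tokens added per period (sustained rate)
    period_seconds: float  # length of the refill period
    burst: float           # maximum number of tokens the bucket can hold
    tokens: float = field(init=False)
    updated_at: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # the bucket starts full

    def try_consume(self, amount: float, now: float) -> bool:
        """Refill for the elapsed time, then take `amount` tokens if available."""
        elapsed = max(0.0, now - self.updated_at)
        refill = elapsed * self.rate / self.period_seconds
        self.tokens = min(self.burst, self.tokens + refill)
        self.updated_at = now
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True


# 10,000 tokens per minute with a 15,000-token burst ceiling
bucket = TokenBucket(rate=10_000, period_seconds=60, burst=15_000)
burst_ok = bucket.try_consume(15_000, now=0.0)      # the full burst is available up front
empty_rejected = bucket.try_consume(1, now=0.0)     # nothing has refilled yet
after_30s_ok = bucket.try_consume(5_000, now=30.0)  # 30 seconds refill 5,000 tokens
```

In this model the burst value caps how many tokens can accumulate while idle, and the rate determines how quickly an emptied bucket recovers.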

Handling Rate Limit Errors

When a rate limit is exceeded, RateLimitExceeded is raised with full details:

from starlette.responses import JSONResponse  # assumes a Starlette/FastAPI app
from zae_limiter import Limit, RateLimitExceeded

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="gpt-4",
        consume={"rpm": 2},  # Exceeds capacity to trigger error
        limits=[Limit.per_minute("rpm", 1)],
    ):
        await do_work()
except RateLimitExceeded as e:
    # Get retry delay
    print(f"Retry after: {e.retry_after_seconds}s")

    # For HTTP responses
    response = JSONResponse(
        status_code=429,
        content=e.as_dict(),
        headers={"Retry-After": e.retry_after_header},
    )
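For callers that would rather wait and retry than return an error, the retry delay can drive a simple retry loop. The sketch below is one possible pattern, not part of zae-limiter. `RateLimited` is a stand-in exception so the example runs without AWS, and `call_with_retry`, `max_attempts`, and `max_wait_seconds` are hypothetical names. In real code you would catch RateLimitExceeded and read its retry_after_seconds attribute instead.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for RateLimitExceeded so this sketch runs without zae-limiter."""

    def __init__(self, retry_after_seconds: float) -> None:
        super().__init__(f"retry after {retry_after_seconds}s")
        self.retry_after_seconds = retry_after_seconds


async def call_with_retry(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    max_wait_seconds: float = 5.0,
) -> T:
    """Run `operation`, sleeping for the suggested delay after each rate-limit error."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Cap the wait so one long delay cannot stall the caller indefinitely.
            await asyncio.sleep(min(exc.retry_after_seconds, max_wait_seconds))
    raise AssertionError("unreachable")


attempts = 0


async def flaky_operation() -> str:
    """Fails twice with a short retry hint, then succeeds."""
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimited(retry_after_seconds=0.01)
    return "ok"


result = asyncio.run(call_with_retry(flaky_operation))
```

Capping both the number of attempts and the wait time keeps a persistently throttled caller from blocking forever. Tune both values to your latency budget.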

Centralized Configuration (v0.5.0+)

zae-limiter supports storing rate limit configurations in DynamoDB, eliminating the need to hardcode limits in application code.

Setting Up Defaults

Configure limits at system and resource levels (typically done by admins during deployment):

# Set system-wide defaults (applies to ALL resources)
zae-limiter system set-defaults -l rpm:100 -l tpm:10000

# Set resource-specific defaults (override system for this resource)
zae-limiter resource set-defaults gpt-4 -l rpm:50 -l tpm:100000
zae-limiter resource set-defaults gpt-3.5-turbo -l rpm:200 -l tpm:500000

# Set entity-specific limits (premium users)
zae-limiter entity set-limits user-premium --resource gpt-4 -l rpm:500 -l tpm:500000

Automatic Resolution

With limits configured, application code becomes simpler because acquire() no longer needs a limits argument:

# Limits are resolved automatically from stored config
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",
    consume={"rpm": 1},  # No limits parameter needed
) as lease:
    await call_api()

Resolution order (highest to lowest precedence):

1. Entity level - Specific limits for entity+resource
2. Resource level - Default limits for a resource
3. System level - Global defaults for all resources
4. Override parameter - Fallback if no stored config

See Configuration Hierarchy for full details.
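The precedence order above can be sketched as a first-match lookup. This is an illustrative model only: `resolve_limits` and the plain dictionaries are hypothetical, and the sketch assumes the first level that has any configuration wins wholesale, with no merging of individual limits across levels. Check the Configuration Hierarchy page for the library's actual behavior.

```python
from typing import Optional

Limits = dict[str, int]  # limit name -> rate per minute (simplified)


def resolve_limits(
    entity_id: str,
    resource: str,
    entity_limits: dict[tuple[str, str], Limits],
    resource_defaults: dict[str, Limits],
    system_defaults: Optional[Limits],
    override: Optional[Limits] = None,
) -> Limits:
    """Return the first configuration found, from highest to lowest precedence."""
    if (entity_id, resource) in entity_limits:
        return entity_limits[(entity_id, resource)]
    if resource in resource_defaults:
        return resource_defaults[resource]
    if system_defaults is not None:
        return system_defaults
    if override is not None:
        return override
    raise LookupError(f"no limits configured for {entity_id!r} on {resource!r}")


# Stored config mirroring some of the CLI commands in "Setting Up Defaults"
entity_limits = {("user-premium", "gpt-4"): {"rpm": 500, "tpm": 500_000}}
resource_defaults = {"gpt-4": {"rpm": 50, "tpm": 100_000}}
system_defaults = {"rpm": 100, "tpm": 10_000}

premium = resolve_limits("user-premium", "gpt-4", entity_limits, resource_defaults, system_defaults)
regular = resolve_limits("user-123", "gpt-4", entity_limits, resource_defaults, system_defaults)
other = resolve_limits("user-123", "embeddings", entity_limits, resource_defaults, system_defaults)
```

Here the premium user gets the entity-level limits, a regular user on gpt-4 gets the resource defaults, and any other resource falls through to the system defaults.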

Multi-Tenant Namespaces

For multi-tenant applications, namespaces provide logical isolation within a single DynamoDB table:

from zae_limiter import Repository, RateLimiter

# Each tenant gets an isolated namespace
repo = await Repository.open("tenant-alpha")
limiter = RateLimiter(repository=repo)

# All operations are scoped to tenant-alpha's namespace
async with limiter.acquire(
    entity_id="user-123",
    resource="api",
    consume={"rpm": 1},
) as lease:
    await do_work()

For namespace lifecycle management and per-tenant IAM access control, see the Production Guide.

Next Steps

Basic Usage - Multiple limits, adjustments, capacity queries
Configuration Hierarchy - Three-tier limit resolution
Hierarchical Limits - Parent/child entities, cascade mode
LLM Integration - Token estimation and reconciliation
Deployment Guide - Production deployment options
CLI Reference - Full CLI command reference
Namespace Keys Migration - Migrating from v0.9.x to v0.10.0