Getting Started

This guide will help you install zae-limiter and set up rate limiting in your application.

Installation

# pip
pip install zae-limiter

# uv
uv pip install zae-limiter

# Poetry
poetry add zae-limiter

# conda
conda install -c conda-forge zae-limiter

Quick Start

zae-limiter creates its own infrastructure automatically.

Minimalist

For scripts and quick demos, pass limits inline:

from zae_limiter import Repository, RateLimiter, Limit, RateLimitExceeded

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="api",
        consume={"requests": 1},
        limits=[Limit.per_minute("requests", 100)],
    ) as lease:
        await do_work()
except RateLimitExceeded as e:
    print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")

# Clean up when done
await repo.delete_stack()

For production, configure limits once and keep application code simple.

Step 1: Deploy and configure

# Deploy infrastructure
zae-limiter deploy --name my-app --region us-east-1

# Configure limits (apply to all entities)
zae-limiter system set-defaults --name my-app -l rpm:1000 -l tpm:100000

Or set the same defaults programmatically:

from zae_limiter import Repository, RateLimiter, Limit

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

await limiter.set_system_defaults(limits=[
    Limit.per_minute("rpm", 1000),
    Limit.per_minute("tpm", 100000),
])

Step 2: Use in your application

from zae_limiter import Repository, RateLimiter, RateLimitExceeded

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="api",
        consume={"rpm": 1, "tpm": 500},  # Limits resolved automatically
    ) as lease:
        await do_work()
except RateLimitExceeded as e:
    print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")

Infrastructure Persistence

When you use infrastructure builder methods, zae-limiter creates real AWS infrastructure via CloudFormation:

Resource        | Purpose                           | Persists?
DynamoDB Table  | Rate limit state, entities, usage | Yes - until deleted
Lambda Function | Usage aggregation                 | Yes - until deleted
IAM Role        | Lambda permissions                | Yes - until deleted
CloudWatch Logs | Lambda logs                       | Yes - with retention

Infrastructure Outlives Your Python Session

This infrastructure persists beyond your Python session. Restarting your application reconnects to existing resources. Rate limit state is preserved across restarts. You only pay when the limiter is used (~$0.625/1M requests, $0 for fast rejections).
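
As a rough illustration of the pricing note above, the following sketch turns the quoted ~$0.625 per million requests into an estimate. The helper name and the assumption that cost scales linearly with request volume are ours, not part of zae-limiter; actual AWS charges depend on region, pricing changes, and usage pattern.

```python
# Approximate per-request price quoted in this guide; real bills depend on
# your AWS pricing, region, and usage pattern.
COST_PER_MILLION_REQUESTS_USD = 0.625


def estimate_request_cost_usd(billable_requests: int) -> float:
    """Rough request cost in USD, assuming cost scales linearly with volume.

    Fast rejections are described above as free, so subtract them from the
    total before calling this hypothetical helper.
    """
    if billable_requests < 0:
        raise ValueError("billable_requests must be non-negative")
    return billable_requests / 1_000_000 * COST_PER_MILLION_REQUESTS_USD


if __name__ == "__main__":
    total_requests = 12_000_000
    fast_rejections = 2_000_000
    cost = estimate_request_cost_usd(total_requests - fast_rejections)
    print(f"Estimated cost: ${cost:.2f}")
```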

Infrastructure Lifecycle

Both the programmatic API and the CLI are fully supported for managing infrastructure.

Creating Infrastructure

Use Repository.open(), which auto-provisions infrastructure if needed:

repo = await Repository.open()
limiter = RateLimiter(repository=repo)

CloudFormation ensures the infrastructure matches your declaration.

Alternatively, deploy from the CLI:

zae-limiter deploy --name my-app --region us-east-1

The CLI is useful for CI/CD pipelines, GitOps workflows, and infrastructure-as-code.

Connecting to Existing Infrastructure

Use Repository.open() to connect to existing infrastructure. It auto-provisions the table if it is missing and registers the namespace if it is not found:

# Open repository (auto-provisions if needed)
repo = await Repository.open()
limiter = RateLimiter(repository=repo)

Stack defaults to the ZAEL_STACK environment variable or "zae-limiter". Namespace defaults to ZAEL_NAMESPACE or "default".
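
The defaulting rule above can be sketched in a few lines. This is a hypothetical helper that mirrors the documented behavior, not the library's own code; the function name, the `env` parameter, and the assumption that an explicit argument wins over the environment variable are ours.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_stack_and_namespace(
    stack: Optional[str] = None,
    namespace: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented defaults (not library code)."""
    source = os.environ if env is None else env
    # Assumed precedence: explicit argument, then environment variable, then default.
    resolved_stack = (
        stack if stack is not None else source.get("ZAEL_STACK", "zae-limiter")
    )
    resolved_namespace = (
        namespace if namespace is not None else source.get("ZAEL_NAMESPACE", "default")
    )
    return resolved_stack, resolved_namespace


if __name__ == "__main__":
    # Prints whatever the current environment resolves to.
    print(resolve_stack_and_namespace())
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.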

Declarative State Management

Builder methods declare the desired infrastructure state. If multiple applications use the same limiter name with different settings, CloudFormation will update the stack to match the most recent declaration, similar to how Terraform applies the last-written configuration.

To maintain consistent state, do one of the following:

  • Use identical builder options across all clients sharing a limiter
  • Omit infrastructure builder methods in application code and manage infrastructure externally
  • Use different limiter names for different configurations

Checking Status

available = await repo.ping()  # Async
# or
available = repo.ping()  # Sync

if available:
    print("Stack is ready")
else:
    print("DynamoDB not reachable")
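
When a stack has just been deployed, a single check may run before DynamoDB is reachable. The sketch below polls any async ping-style callable a few times before giving up. The helper name, attempt count, and delay are illustrative assumptions; in an application you would pass the async `repo.ping` shown above.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_until_available(
    ping: Callable[[], Awaitable[bool]],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> bool:
    """Call ping() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if await ping():
            return True
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            await asyncio.sleep(delay_seconds)
    return False
```

Usage would look like `ready = await wait_until_available(repo.ping)`.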

For comprehensive status including CloudFormation details, use the CLI status command:

zae-limiter status --name my-app --region us-east-1

Deleting Infrastructure

# After you're done with the limiter
await repo.delete_stack()  # Async
# or
repo.delete_stack()  # Sync

Or from the CLI:

zae-limiter delete --name my-app --region us-east-1 --yes

Data Loss

Deleting infrastructure permanently removes all rate limit data, entity configurations, and usage history. This cannot be undone.

Deployment Options

For organizations requiring strict infrastructure/application separation, see CLI deployment or CloudFormation template export.

Understanding Limits

Rate limiting in zae-limiter tracks who is making requests, what they're accessing, and how much they can use.

The Core Concepts

When you call acquire(), you specify:

  • entity_id: Who is being rate limited (e.g., "user-123", "api-key-abc", "tenant-xyz")
  • resource: What they're accessing (e.g., "gpt-4", "api", "embeddings")
  • consume: How much capacity this request uses
  • limits: The rate limit rules to apply (optional if using stored config)

Each entity has separate buckets per resource. A user rate limited on "gpt-4" can still access "gpt-3.5-turbo":

# User 123 accessing GPT-4 - tracked separately from GPT-3.5
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",        # Bucket: user-123 + gpt-4
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 10)],
) as lease:
    ...

# Same user, different resource - separate bucket
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-3.5-turbo",  # Bucket: user-123 + gpt-3.5-turbo
    consume={"rpm": 1},
    limits=[Limit.per_minute("rpm", 100)],
) as lease:
    ...
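
To make the "separate buckets per resource" idea concrete, here is a minimal in-memory sketch that keys remaining capacity by the (entity_id, resource) pair. It ignores refill over time and is not how zae-limiter stores state in DynamoDB; the class and method names are hypothetical.

```python
from typing import Dict, Tuple


class BucketRegistry:
    """Toy registry: one independent counter per (entity_id, resource) pair."""

    def __init__(self) -> None:
        self._remaining: Dict[Tuple[str, str], int] = {}

    def try_consume(self, entity_id: str, resource: str, amount: int, capacity: int) -> bool:
        key = (entity_id, resource)
        # A bucket starts full the first time this entity/resource pair is seen.
        remaining = self._remaining.setdefault(key, capacity)
        if amount > remaining:
            return False
        self._remaining[key] = remaining - amount
        return True


if __name__ == "__main__":
    registry = BucketRegistry()
    # Exhaust user-123's gpt-4 bucket (capacity 2 for brevity).
    registry.try_consume("user-123", "gpt-4", 2, capacity=2)
    print(registry.try_consume("user-123", "gpt-4", 1, capacity=2))
    # The gpt-3.5-turbo bucket for the same user is unaffected.
    print(registry.try_consume("user-123", "gpt-3.5-turbo", 1, capacity=100))
```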

Defining Limits

A Limit defines a rate limit using the token bucket algorithm:

# 100 requests per minute
Limit.per_minute("rpm", 100)

# 10,000 tokens per minute with 15,000 burst capacity
Limit.per_minute("tpm", 10_000, burst=15_000)

# 1,000 requests per hour
Limit.per_hour("rph", 1_000)

# Custom: 50 requests per 30 seconds
Limit.custom("requests", capacity=50, refill_amount=50, refill_period_seconds=30)

Parameter | Description
name      | Unique identifier (e.g., "rpm", "tpm")
rate      | Sustained tokens per period (positional)
burst     | Optional burst ceiling (defaults to rate)

See Token Bucket Algorithm for details on how rate, burst, and refill work together.
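
For readers who want intuition before following that link, here is a minimal, generic token bucket sketch. It reuses the parameter names from the Limit.custom example (capacity, refill_amount, refill_period_seconds) but is only an illustration of the algorithm under the assumption of continuous refill; zae-limiter's actual implementation may differ in arithmetic and storage.

```python
class TokenBucket:
    """Generic token bucket: capacity is the burst ceiling, refill is the sustained rate."""

    def __init__(self, capacity: float, refill_amount: float, refill_period_seconds: float) -> None:
        self.capacity = capacity
        self.refill_amount = refill_amount
        self.refill_period_seconds = refill_period_seconds
        self.tokens = capacity      # the bucket starts full
        self.last_refill = 0.0      # timestamp (seconds) of the last refill

    def _refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last_refill)
        added = elapsed * self.refill_amount / self.refill_period_seconds
        # Never exceed the burst ceiling.
        self.tokens = min(self.capacity, self.tokens + added)
        self.last_refill = now

    def try_consume(self, amount: float, now: float) -> bool:
        self._refill(now)
        if amount > self.tokens:
            return False
        self.tokens -= amount
        return True

    def retry_after_seconds(self, amount: float, now: float) -> float:
        """Seconds until `amount` tokens would be available (0 if available now)."""
        self._refill(now)
        deficit = amount - self.tokens
        if deficit <= 0:
            return 0.0
        return deficit * self.refill_period_seconds / self.refill_amount


if __name__ == "__main__":
    # Roughly Limit.per_minute("tpm", 10_000, burst=15_000) from the examples above.
    bucket = TokenBucket(capacity=15_000, refill_amount=10_000, refill_period_seconds=60)
    print(bucket.try_consume(15_000, now=0.0))  # the full burst is allowed once
    print(bucket.try_consume(1, now=0.0))       # then the bucket is empty
    print(bucket.retry_after_seconds(500, now=0.0))
```

Timestamps are passed in explicitly so the behavior is deterministic; a real caller would use a monotonic clock.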

Handling Rate Limit Errors

When a rate limit is exceeded, RateLimitExceeded is raised with full details:

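# Assumption: JSONResponse comes from Starlette (FastAPI re-exports the same class)
from starlette.responses import JSONResponse
from zae_limiter import Limit, RateLimitExceeded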

try:
    async with limiter.acquire(
        entity_id="user-123",
        resource="gpt-4",
        consume={"rpm": 2},  # Exceeds capacity to trigger error
        limits=[Limit.per_minute("rpm", 1)],
    ):
        await do_work()
except RateLimitExceeded as e:
    # Get retry delay
    print(f"Retry after: {e.retry_after_seconds}s")

    # For HTTP responses
    response = JSONResponse(
        status_code=429,
        content=e.as_dict(),
        headers={"Retry-After": e.retry_after_header},
    )
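
Background jobs often prefer to wait and retry rather than return a 429. The sketch below is a generic retry helper, not part of zae-limiter: it retries an async operation when a given exception type is raised and sleeps for the exception's retry_after_seconds attribute (the attribute used above). The helper name, attempt limit, and delay cap are assumptions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Type


async def call_with_retry(
    operation: Callable[[], Awaitable[Any]],
    retry_on: Type[Exception],
    max_attempts: int = 3,
    max_delay_seconds: float = 30.0,
) -> Any:
    """Run operation(); on `retry_on`, wait as the error suggests and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Fall back to 1 second if the exception carries no retry hint.
            suggested = float(getattr(exc, "retry_after_seconds", 1.0))
            await asyncio.sleep(max(0.0, min(suggested, max_delay_seconds)))
```

In an application, `operation` would wrap the `limiter.acquire(...)` block and `retry_on` would be RateLimitExceeded.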

Centralized Configuration (v0.5.0+)

zae-limiter supports storing rate limit configurations in DynamoDB, eliminating the need to hardcode limits in application code.

Setting Up Defaults

Configure limits at system and resource levels (typically done by admins during deployment):

# Set system-wide defaults (applies to ALL resources)
zae-limiter system set-defaults -l rpm:100 -l tpm:10000

# Set resource-specific defaults (override system for this resource)
zae-limiter resource set-defaults gpt-4 -l rpm:50 -l tpm:100000
zae-limiter resource set-defaults gpt-3.5-turbo -l rpm:200 -l tpm:500000

# Set entity-specific limits (premium users)
zae-limiter entity set-limits user-premium --resource gpt-4 -l rpm:500 -l tpm:500000

Automatic Resolution

With limits configured, application code becomes simpler because it no longer needs to pass limits:

# Limits are resolved automatically from stored config
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",
    consume={"rpm": 1},  # No limits parameter needed
) as lease:
    await call_api()

Resolution order (highest to lowest precedence):

  1. Entity level - Specific limits for entity+resource
  2. Resource level - Default limits for a resource
  3. System level - Global defaults for all resources
  4. Override parameter - Fallback if no stored config

See Configuration Hierarchy for full details.
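
The resolution order can be sketched as a first-match lookup. This is an illustrative model only: it represents limits as plain dictionaries and assumes the most specific level that has any configuration is used as a whole, without merging individual limits across levels. Check the Configuration Hierarchy page for the library's actual behavior.

```python
from typing import Dict, Optional, Tuple

Limits = Dict[str, int]  # limit name -> sustained rate per minute (simplified)


def resolve_limits(
    entity_limits: Optional[Limits],
    resource_defaults: Optional[Limits],
    system_defaults: Optional[Limits],
    override: Optional[Limits] = None,
) -> Tuple[str, Limits]:
    """Return (level, limits) for the highest-precedence level that is configured."""
    candidates = (
        ("entity", entity_limits),        # 1. entity + resource
        ("resource", resource_defaults),  # 2. resource defaults
        ("system", system_defaults),      # 3. system defaults
        ("override", override),           # 4. limits passed by the caller
    )
    for level, limits in candidates:
        if limits:
            return level, limits
    raise LookupError("no limits configured at any level")


if __name__ == "__main__":
    system = {"rpm": 100, "tpm": 10_000}
    gpt4 = {"rpm": 50, "tpm": 100_000}
    premium = {"rpm": 500, "tpm": 500_000}
    print(resolve_limits(premium, gpt4, system))  # premium user on gpt-4
    print(resolve_limits(None, gpt4, system))     # regular user on gpt-4
    print(resolve_limits(None, None, system))     # resource without its own defaults
```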

Multi-Tenant Namespaces

For multi-tenant applications, namespaces provide logical isolation within a single DynamoDB table:

from zae_limiter import Repository, RateLimiter

# Each tenant gets an isolated namespace
repo = await Repository.open("tenant-alpha")
limiter = RateLimiter(repository=repo)

# All operations are scoped to tenant-alpha's namespace
async with limiter.acquire(
    entity_id="user-123",
    resource="api",
    consume={"rpm": 1},
) as lease:
    await do_work()

For namespace lifecycle management and per-tenant IAM access control, see the Production Guide.

Next Steps