# Getting Started
This guide will help you install zae-limiter and set up rate limiting in your application.
## Installation
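Assuming the package is published on PyPI under the project's name, it can be installed with pip:

```bash
pip install zae-limiter
```

Because the limiter provisions and talks to AWS resources (see the sections below), you will also need AWS credentials configured in your environment.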
## Quick Start
zae-limiter creates its own infrastructure automatically. Here's a complete example:
### Async API (Recommended)
```python
import asyncio

from zae_limiter import RateLimiter, Limit, StackOptions, RateLimitExceeded

# Create rate limiter with declarative infrastructure
limiter = RateLimiter(
    name="my-app",  # ZAEL-my-app resources in AWS
    region="us-east-1",
    stack_options=StackOptions(),  # Declare desired state - CloudFormation ensures it
)


async def main():
    try:
        async with limiter.acquire(
            entity_id="user-123",
            resource="api",
            limits=[Limit.per_minute("requests", 100)],
            consume={"requests": 1},
        ) as lease:
            await do_work()  # your application code
    except RateLimitExceeded as e:
        print(f"Rate limited! Retry after {e.retry_after_seconds:.1f}s")


asyncio.run(main())
```
### Sync API
```python
from zae_limiter import SyncRateLimiter, Limit, StackOptions

limiter = SyncRateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),
)

with limiter.acquire(
    entity_id="user-123",
    resource="api",
    limits=[Limit.per_minute("requests", 100)],
    consume={"requests": 1},
) as lease:
    response = call_api()
```
## Infrastructure Persistence
When you pass `stack_options=StackOptions()`, zae-limiter creates real AWS infrastructure via CloudFormation:
| Resource | Purpose | Persists? |
|---|---|---|
| DynamoDB Table | Rate limit state, entities, usage | Yes - until deleted |
| Lambda Function | Usage aggregation | Yes - until deleted |
| IAM Role | Lambda permissions | Yes - until deleted |
| CloudWatch Logs | Lambda logs | Yes - with retention |
**Infrastructure Outlives Your Python Session**
This infrastructure persists beyond your Python session. Restarting your application reconnects to existing resources. Rate limit state is preserved across restarts. You only pay when the limiter is used (~$1/1M requests).
## Infrastructure Lifecycle
Both the programmatic API and the CLI are fully supported for managing infrastructure.
### Creating Infrastructure
Pass `stack_options` to declare the desired infrastructure state.
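For example, reusing the constructor arguments from the Quick Start (nothing here goes beyond what that example already shows):

```python
from zae_limiter import RateLimiter, StackOptions

# Declaring StackOptions() asks zae-limiter to create or update the
# ZAEL-my-app CloudFormation stack so it matches this declaration.
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),
)
```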
CloudFormation ensures the infrastructure matches your declaration.
### Connecting to Existing Infrastructure
If you omit `stack_options`, the limiter connects to existing infrastructure without attempting to create or modify it:
```python
from zae_limiter import RateLimiter

# Connect only - fails if ZAEL-my-app doesn't exist
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    # No stack_options = connect only, no create/update
)
```
This is useful when infrastructure is managed separately (e.g., via CLI or Terraform).
**Declarative State Management**
`StackOptions` declares the desired infrastructure state. If multiple applications
use the same limiter name with different settings, CloudFormation will update
the stack to match the most recent declaration—similar to how Terraform applies
the last-written configuration.
To maintain consistent state:
- Use identical `StackOptions` across all clients sharing a limiter
- Omit `stack_options` in application code and manage infrastructure externally
- Use different limiter names for different configurations
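One way to keep declarations identical is to define the options once and reuse them in every client. The sketch below uses only constructor arguments shown earlier in this guide; the module-level variable names are illustrative:

```python
from zae_limiter import RateLimiter, StackOptions, SyncRateLimiter

# Single shared declaration, imported by every service that uses the
# "my-app" limiter so all clients declare the same desired state.
SHARED_STACK_OPTIONS = StackOptions()

async_limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=SHARED_STACK_OPTIONS,
)

sync_limiter = SyncRateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=SHARED_STACK_OPTIONS,
)
```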
### Checking Status
### Deleting Infrastructure
**Data Loss**
Deleting infrastructure permanently removes all rate limit data, entity configurations, and usage history. This cannot be undone.
**Deployment Options**
For organizations requiring strict infrastructure/application separation, see CLI deployment or CloudFormation template export.
## Understanding Limits
Rate limiting in zae-limiter tracks who is making requests, what they're accessing, and how much they can use.
### The Core Concepts
When you call `acquire()`, you specify:
- `entity_id`: Who is being rate limited (e.g., `"user-123"`, `"api-key-abc"`, `"tenant-xyz"`)
- `resource`: What they're accessing (e.g., `"gpt-4"`, `"api"`, `"embeddings"`)
- `limits`: The rate limit rules to apply
- `consume`: How much capacity this request uses
Each entity has separate buckets per resource. A user rate limited on "gpt-4" can still access "gpt-3.5-turbo":
```python
# User 123 accessing GPT-4 - tracked separately from GPT-3.5
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",  # Bucket: user-123 + gpt-4
    limits=[Limit.per_minute("rpm", 10)],
    consume={"rpm": 1},
) as lease:
    ...

# Same user, different resource - separate bucket
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-3.5-turbo",  # Bucket: user-123 + gpt-3.5-turbo
    limits=[Limit.per_minute("rpm", 100)],
    consume={"rpm": 1},
) as lease:
    ...
```
### Defining Limits
A `Limit` defines a rate limit using the token bucket algorithm:
```python
# 100 requests per minute
Limit.per_minute("rpm", 100)

# 10,000 tokens per minute with 15,000 burst capacity
Limit.per_minute("tpm", 10_000, burst=15_000)

# 1,000 requests per hour
Limit.per_hour("rph", 1_000)

# Custom: 50 requests per 30 seconds
Limit.custom("requests", capacity=50, refill_period_seconds=30)
```
| Parameter | Description |
|---|---|
| `name` | Unique identifier (e.g., `"rpm"`, `"tpm"`) |
| `capacity` | Tokens that refill per period (sustained rate) |
| `burst` | Maximum bucket size (defaults to `capacity`) |
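As a concrete illustration of how `capacity` and `burst` interact under the standard token bucket interpretation (the numbers are the TPM example from above):

```python
from zae_limiter import Limit

# 10,000 tokens refill each minute (sustained rate); the bucket never
# holds more than 15,000 tokens (burst).
tpm = Limit.per_minute("tpm", 10_000, burst=15_000)

# Sustained throughput is capacity / period:
refill_per_second = 10_000 / 60  # about 167 tokens per second

# An idle entity with a full bucket can spend up to 15,000 tokens at once,
# after which it is throttled back to the sustained refill rate.
```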
See Token Bucket Algorithm for details on how capacity, burst, and refill work together.
## Handling Rate Limit Errors
When a rate limit is exceeded, `RateLimitExceeded` is raised with full details:
```python
from zae_limiter import RateLimitExceeded

try:
    async with limiter.acquire(...):
        await do_work()
except RateLimitExceeded as e:
    # Get retry delay
    print(f"Retry after: {e.retry_after_seconds}s")

    # For HTTP responses (e.g., from a FastAPI/Starlette route handler,
    # where JSONResponse comes from the web framework)
    return JSONResponse(
        status_code=429,
        content=e.as_dict(),
        headers={"Retry-After": e.retry_after_header},
    )
```
## Next Steps
- Basic Usage - Multiple limits, adjustments, capacity queries
- Hierarchical Limits - Parent/child entities, cascade mode
- LLM Integration - Token estimation and reconciliation
- Deployment Guide - Production deployment options
- CLI Reference - Full CLI command reference