
zae-limiter


A rate limiting library backed by DynamoDB using the token bucket algorithm.

Overview

zae-limiter excels at rate limiting scenarios where:

  • Multiple limits are tracked per call (requests per minute, tokens per minute)
  • Consumption is unknown upfront: reserve an estimate, then reconcile actual usage after the operation completes
  • Hierarchical limits exist (API key → project, tenant → user)
  • Cost matters: ~$1 per 1M requests
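The estimate-then-reconcile pattern, checked against several limits at once, can be sketched in plain Python. This is a minimal in-memory illustration, not zae-limiter's implementation: `Bucket`, `try_consume`, and `reconcile` are hypothetical names, time-based refill is left out, and carrying over-consumption forward as a negative balance is an assumption about how post-hoc adjustments are handled.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Hypothetical per-limit balance; time-based refill is omitted for brevity."""
    tokens: int


def try_consume(buckets: dict[str, Bucket], consume: dict[str, int]) -> bool:
    """Check every limit first and debit only if all of them have room."""
    if any(buckets[name].tokens < amount for name, amount in consume.items()):
        return False
    for name, amount in consume.items():
        buckets[name].tokens -= amount
    return True


def reconcile(buckets: dict[str, Bucket], name: str, delta: int) -> None:
    """Apply the post-hoc correction; the balance may drop below zero."""
    buckets[name].tokens -= delta


buckets = {"rpm": Bucket(100), "tpm": Bucket(1_000)}
assert try_consume(buckets, {"rpm": 1, "tpm": 500})    # reserve an estimate of 500 tokens
reconcile(buckets, "tpm", 1_200 - 500)                  # the call actually used 1,200
print(buckets["tpm"].tokens)                            # -200: carried forward as debt
assert not try_consume(buckets, {"rpm": 1, "tpm": 1})   # blocked until the debt is repaid
```

Because refill is omitted here, the debt never clears; in a real token bucket, refill would restore the balance over time.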

Features

  • Token Bucket Algorithm - Precise rate limiting with configurable burst capacity
  • Multiple Limits - Track requests per minute, tokens per minute, etc. in a single call
  • Hierarchical Entities - Two-level hierarchy (project → API keys) with cascade mode
  • Atomic Transactions - Multi-key updates via DynamoDB TransactWriteItems
  • Rollback on Exception - Automatic rollback if your code throws
  • Stored Limits - Configure per-entity limits in DynamoDB
  • Usage Analytics - Lambda aggregator for hourly/daily usage snapshots
  • Audit Logging - Track entity and limit changes for compliance
  • Async + Sync APIs - First-class async support with a sync wrapper
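The token bucket behaviour behind configurable burst capacity can be illustrated with a small in-memory model. This is a sketch, not the library's code: `TokenBucket` is a hypothetical class, time is passed in explicitly so the example is deterministic, and reading `burst` as the bucket capacity with the per-minute limit as the refill rate is an assumption about how `Limit.per_minute("tpm", 10_000, burst=50_000)` from the Quick Example maps onto the algorithm.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical token bucket: `capacity` caps bursts, `rate` is tokens per second."""
    capacity: float
    rate: float
    tokens: float
    last: float  # timestamp (seconds) of the last refill

    def refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding the burst capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_take(self, amount: float, now: float) -> bool:
        # Refill first, then debit only if enough tokens are available.
        self.refill(now)
        if self.tokens < amount:
            return False
        self.tokens -= amount
        return True


# Assumed mapping: burst=50_000 is the capacity, 10_000 per minute is the refill rate.
tpm = TokenBucket(capacity=50_000, rate=10_000 / 60, tokens=50_000, last=0.0)

assert tpm.try_take(50_000, now=0.0)   # the full burst is available at once
assert not tpm.try_take(1, now=0.0)    # ...after which the bucket is empty
tpm.refill(now=60.0)
print(round(tpm.tokens))               # 10000: one minute of refill
tpm.refill(now=600.0)
print(tpm.tokens)                      # 50000: capped at the burst capacity
```

Under this reading, the burst lets a client spend five minutes' worth of tokens at once, after which it is held to the sustained 10,000-per-minute rate.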

Quick Example

import asyncio

from zae_limiter import RateLimiter, Limit, StackOptions

# Async rate limiter with declarative infrastructure
limiter = RateLimiter(
    name="my-app",
    region="us-east-1",
    stack_options=StackOptions(),  # Declare the desired state; CloudFormation provisions it
)

# Define default limits (can be overridden per entity)
default_limits = [
    Limit.per_minute("rpm", 100),
    Limit.per_minute("tpm", 10_000, burst=50_000),  # Token bucket with burst capacity
]


async def main() -> None:
    # call_llm() stands in for your own LLM client call
    async with limiter.acquire(
        entity_id="api-key-123",
        resource="gpt-4",
        limits=default_limits,
        consume={"rpm": 1, "tpm": 500},  # Estimate tokens upfront
    ) as lease:
        response = await call_llm()
        # Reconcile actual usage (can go negative for post-hoc adjustment)
        await lease.adjust(tpm=response.usage.total_tokens - 500)
        # On success: committed | On exception: rolled back automatically

    # Hierarchical entities: project → API key
    await limiter.create_entity(entity_id="proj-1", name="Production")
    await limiter.set_limits("proj-1", [Limit.per_minute("tpm", 100_000)])
    await limiter.create_entity(entity_id="api-key-456", parent_id="proj-1")

    # cascade=True enforces both key AND project limits
    async with limiter.acquire(
        entity_id="api-key-456",
        resource="gpt-4",
        limits=default_limits,
        consume={"rpm": 1, "tpm": 500},
        cascade=True,  # Also checks parent's stored limits
        use_stored_limits=True,  # Uses proj-1's 100k tpm limit
    ) as lease:
        response = await call_llm()


asyncio.run(main())
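The commit-or-rollback behaviour described in the example's comments can be modelled with `contextlib.asynccontextmanager`. This is a hypothetical in-memory sketch, not zae-limiter's implementation: `InMemoryLimiter` and `Lease` are illustrative names, and the choice to reserve the estimate on entry, refund it on any exception, and apply `adjust` deltas only on success is an assumption about the semantics rather than a description of the library's DynamoDB writes.

```python
import asyncio
from contextlib import asynccontextmanager


class Lease:
    """Hypothetical lease: collects post-hoc adjustments until the block exits."""

    def __init__(self) -> None:
        self.pending: dict[str, int] = {}

    async def adjust(self, **deltas: int) -> None:
        for name, delta in deltas.items():
            self.pending[name] = self.pending.get(name, 0) + delta


class InMemoryLimiter:
    """Hypothetical stand-in that keeps remaining tokens per limit in a dict."""

    def __init__(self, remaining: dict[str, int]) -> None:
        self.remaining = dict(remaining)

    @asynccontextmanager
    async def acquire(self, consume: dict[str, int]):
        if any(self.remaining[name] < amount for name, amount in consume.items()):
            raise RuntimeError("rate limit exceeded")
        for name, amount in consume.items():      # reserve the estimate up front
            self.remaining[name] -= amount
        lease = Lease()
        try:
            yield lease
        except BaseException:
            for name, amount in consume.items():  # roll the reservation back
                self.remaining[name] += amount
            raise
        for name, delta in lease.pending.items():  # commit the adjustments
            self.remaining[name] -= delta


async def demo() -> dict[str, int]:
    limiter = InMemoryLimiter({"rpm": 100, "tpm": 10_000})
    async with limiter.acquire({"rpm": 1, "tpm": 500}) as lease:
        await lease.adjust(tpm=1_200 - 500)  # actual usage was 1,200 tokens
    try:
        async with limiter.acquire({"rpm": 1, "tpm": 500}):
            raise ValueError("LLM call failed")
    except ValueError:
        pass  # this reservation was rolled back
    return limiter.remaining


print(asyncio.run(demo()))  # {'rpm': 99, 'tpm': 8800}
```

Catching `BaseException` rather than `Exception` means the reservation is also refunded when the task is cancelled, which is usually what a rate limiter wants.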

Why DynamoDB?

  • Serverless - No infrastructure to manage, 99.99% SLA
  • Regional - Deploy independently per region with low latency
  • Scalable - Handles millions of requests per second
  • Cost-effective - Pay per request, no idle costs
  • Atomic - TransactWriteItems for multi-key consistency
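Multi-key atomicity matters most in cascade mode, where one call debits both an API key and its parent project. The sketch below emulates that all-or-nothing outcome in memory; it does not call DynamoDB, and `balances`, `parents`, and `transact_consume` are hypothetical names. In DynamoDB itself, the equivalent guarantee comes from a single `TransactWriteItems` request whose items carry condition expressions: if any condition fails, the whole transaction is cancelled and no item is written.

```python
from typing import Optional

# Hypothetical state: remaining tokens keyed by (entity_id, limit_name).
balances = {
    ("api-key-456", "tpm"): 10_000,
    ("proj-1", "tpm"): 600,
}
parents: dict[str, Optional[str]] = {"api-key-456": "proj-1", "proj-1": None}


def transact_consume(entity_id: str, limit: str, amount: int, cascade: bool) -> bool:
    """All-or-nothing debit of the entity and, with cascade, its parent."""
    keys = [(entity_id, limit)]
    parent = parents.get(entity_id)
    if cascade and parent is not None:
        keys.append((parent, limit))
    # Like a conditional transaction: if any check fails, nothing is written.
    if any(balances[key] < amount for key in keys):
        return False
    for key in keys:
        balances[key] -= amount
    return True


assert transact_consume("api-key-456", "tpm", 500, cascade=True)      # both have room
assert not transact_consume("api-key-456", "tpm", 500, cascade=True)  # parent has only 100 left
print(balances[("api-key-456", "tpm")])  # 9500: the key was not debited by the failed call
```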

Next Steps

  • Getting Started - Installation and first deployment
  • Basic Usage - Rate limiting patterns and error handling
  • Hierarchical Limits - Parent/child entities, cascade mode
  • LLM Integration - Token estimation and reconciliation
  • Production Guide - Security, monitoring, cost
  • CLI Reference - Deploy, status, delete commands