zae-limiter

A rate limiting library backed by DynamoDB using the token bucket algorithm.

Overview

zae-limiter excels at rate limiting scenarios where:

  • Multiple limits are tracked per call (requests per minute, tokens per minute)
  • Consumption is unknown upfront: estimate first, then adjust the consumed amount after the operation completes
  • Hierarchical limits exist (project → API key, tenant → user)
  • Cost matters: roughly $0.625 per 1M requests, and $0 for fast rejections
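
As a rough illustration of the cost figure above, the following sketch estimates monthly spend from request volume and the share of requests rejected on the fast path. The helper name and the rejection-ratio parameter are hypothetical and not part of zae-limiter; the per-million price is the approximate figure quoted above, and real AWS charges depend on region and table configuration.

```python
COST_PER_MILLION_REQUESTS = 0.625  # USD, approximate figure quoted above


def estimate_monthly_cost(requests_per_month: int, fast_rejection_ratio: float = 0.0) -> float:
    """Estimate monthly cost in USD (hypothetical helper, not a zae-limiter API).

    Assumes fast rejections cost $0 and every other request is billed at
    COST_PER_MILLION_REQUESTS per million.
    """
    if not 0.0 <= fast_rejection_ratio <= 1.0:
        raise ValueError("fast_rejection_ratio must be between 0 and 1")
    billable = requests_per_month * (1.0 - fast_rejection_ratio)
    return billable / 1_000_000 * COST_PER_MILLION_REQUESTS


# 10M requests per month, none rejected on the fast path
print(f"${estimate_monthly_cost(10_000_000):.2f}")
# Same volume, with 20% of requests rejected on the fast path
print(f"${estimate_monthly_cost(10_000_000, 0.2):.2f}")
```

Under these assumptions, a higher share of fast rejections lowers the bill proportionally, because only non-rejected requests are counted as billable.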

Features

  • Token Bucket Algorithm - Precise rate limiting with configurable capacity and refill rates
  • Multiple Limits - Track requests per minute, tokens per minute, etc. in a single call
  • Hierarchical Entities - Two-level hierarchy (project → API keys) with cascade mode
  • Atomic Transactions - Multi-key updates via DynamoDB TransactWriteItems
  • Write-on-Enter with Rollback - Tokens consumed immediately on acquire; compensating writes on exception
  • Stored Limits - Configure per-entity limits in DynamoDB
  • Usage Analytics - Lambda aggregator for hourly/daily usage snapshots
  • Audit Logging - Track entity and limit changes for compliance
  • Multi-Tenant Isolation - Namespace-scoped data isolation with per-tenant IAM policies
  • Async + Sync APIs - First-class async support with sync wrapper
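
To make the token bucket idea concrete, here is a minimal in-memory sketch. It is not zae-limiter's implementation (the library keeps bucket state in DynamoDB); the class and method names are hypothetical, and the `now` parameter exists only so the behavior can be demonstrated deterministically.

```python
import time
from typing import Optional


class TokenBucket:
    """In-memory token bucket (illustration only, not the library's code)."""

    def __init__(self, capacity: float, refill_per_second: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity                  # maximum tokens the bucket holds
        self.refill_per_second = refill_per_second
        self.tokens = capacity                    # the bucket starts full
        self.updated_at = time.monotonic() if now is None else now

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now

    def try_consume(self, amount: float, now: Optional[float] = None) -> bool:
        """Deduct `amount` if available; return False without deducting otherwise."""
        self._refill(time.monotonic() if now is None else now)
        if self.tokens < amount:
            return False
        self.tokens -= amount
        return True

    def adjust(self, delta: float, now: Optional[float] = None) -> None:
        """Apply a post-hoc correction; the balance is allowed to go negative."""
        self._refill(time.monotonic() if now is None else now)
        self.tokens -= delta


# 100 tokens per minute: capacity 100, refilled at 100/60 tokens per second
bucket = TokenBucket(capacity=100, refill_per_second=100 / 60, now=0.0)
print(bucket.try_consume(60, now=0.0))   # True: the bucket starts full
print(bucket.try_consume(60, now=0.0))   # False: only 40 tokens remain
print(bucket.try_consume(60, now=30.0))  # True: about 50 tokens refilled after 30 s
```

A negative balance after `adjust` simply means later requests wait longer for the refill to bring the bucket back above zero.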

Quick Example

from zae_limiter import Repository, RateLimiter, Limit

# Async rate limiter (auto-provisions if needed).
# Top-level `await` is shown for brevity; run this inside an async function.
repo = await Repository.open()
limiter = RateLimiter(repository=repo)

# Define default limits (can be overridden per-entity)
default_limits = [
    Limit.per_minute("rpm", 100),
    Limit.per_minute("tpm", 10_000),
]

async with limiter.acquire(
    entity_id="api-key-123",
    resource="gpt-4",
    limits=default_limits,
    consume={"rpm": 1, "tpm": 500},  # Estimate tokens upfront
) as lease:
    response = await call_llm()  # placeholder for your own LLM call
    # Reconcile actual usage (can go negative for post-hoc adjustment)
    await lease.adjust(tpm=response.usage.total_tokens - 500)
    # Tokens are written to DynamoDB on enter and rolled back on exception

# Hierarchical entities: project → API key
await limiter.create_entity(entity_id="proj-1", name="Production")
await limiter.set_limits("proj-1", [Limit.per_minute("tpm", 100_000)])
await limiter.create_entity(entity_id="api-key-456", parent_id="proj-1", cascade=True)

# cascade is an entity property — acquire() auto-cascades to parent
# limits=None auto-resolves from stored config (Entity > Resource > System)
async with limiter.acquire(
    entity_id="api-key-456",
    resource="gpt-4",
    limits=None,
    consume={"rpm": 1, "tpm": 500},
) as lease:
    response = await call_llm()
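
The write-on-enter behavior noted in the first example's comments can be pictured with a small in-memory sketch. This is an assumption-laden illustration rather than the library's code: `acquire_sketch` and the plain `balances` dict are hypothetical stand-ins for `limiter.acquire()` and the DynamoDB-backed buckets.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Dict


@asynccontextmanager
async def acquire_sketch(balances: Dict[str, int],
                         consume: Dict[str, int]) -> AsyncIterator[Dict[str, int]]:
    """Deduct tokens on enter; restore them if the body raises (illustration only)."""
    # Check every limit before writing, so the deduction is all-or-nothing.
    for name, amount in consume.items():
        if balances.get(name, 0) < amount:
            raise RuntimeError(f"rate limit exceeded: {name}")
    for name, amount in consume.items():
        balances[name] -= amount      # write on enter
    try:
        yield balances
    except BaseException:
        for name, amount in consume.items():
            balances[name] += amount  # compensating write on failure
        raise


async def main() -> Dict[str, int]:
    balances = {"rpm": 100, "tpm": 10_000}
    # Successful body: the tokens stay consumed.
    async with acquire_sketch(balances, {"rpm": 1, "tpm": 500}):
        pass
    # Failing body: the second deduction is rolled back.
    try:
        async with acquire_sketch(balances, {"rpm": 1, "tpm": 500}):
            raise ValueError("LLM call failed")
    except ValueError:
        pass
    return balances


print(asyncio.run(main()))
```

Only the first, successful acquisition leaves a lasting deduction; the failed one is undone before the exception propagates to the caller.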

Why DynamoDB?

  • Serverless - No infrastructure to manage, 99.99% SLA
  • Regional - Deploy independently per region with low latency
  • Scalable - Handles millions of requests per second
  • Cost-effective - Pay per request, no idle costs
  • Atomic - TransactWriteItems for multi-key consistency
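
To show what multi-key consistency with TransactWriteItems can look like, the sketch below builds a request payload that deducts tokens from a child and its parent together. The table name, key schema, and `tokens` attribute are hypothetical and do not describe zae-limiter's actual storage layout; only the payload shape follows the DynamoDB low-level API.

```python
from typing import Any, Dict, List

TABLE_NAME = "rate-limits"  # hypothetical table name


def build_consume_transaction(entity_ids: List[str], limit_name: str,
                              amount: int) -> List[Dict[str, Any]]:
    """Build TransactItems that deduct `amount` from every entity, or from none."""
    items: List[Dict[str, Any]] = []
    for entity_id in entity_ids:
        items.append({
            "Update": {
                "TableName": TABLE_NAME,
                "Key": {  # hypothetical key schema
                    "PK": {"S": f"ENTITY#{entity_id}"},
                    "SK": {"S": f"LIMIT#{limit_name}"},
                },
                "UpdateExpression": "SET #tokens = #tokens - :amount",
                # The update applies only if enough tokens remain.
                "ConditionExpression": "#tokens >= :amount",
                "ExpressionAttributeNames": {"#tokens": "tokens"},
                "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
            }
        })
    return items


transaction = build_consume_transaction(["api-key-456", "proj-1"], "tpm", 500)
print(len(transaction))

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("dynamodb").transact_write_items(TransactItems=transaction)
```

If any single condition check fails, DynamoDB cancels the whole transaction, so a child bucket is never debited without its parent.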

Next Steps

Guide                Description
Getting Started      Installation and first deployment
Basic Usage          Rate limiting patterns and error handling
Hierarchical Limits  Parent/child entities, cascade mode
LLM Integration      Token estimation and reconciliation
Production Guide     Security, monitoring, cost
Multi-Tenant Guide   Namespace isolation, per-tenant IAM
CLI Reference        Deploy, status, delete commands