Hierarchical Limits
API Changes Planned
The cascade API is being redesigned. The cascade parameter will move from acquire() to create_entity(), making cascade behavior a property of the entity rather than each call site. This prevents accidental bypass when multiple libraries share a limiter. See issue #116 for details.
zae-limiter supports two-level hierarchies for rate limiting, enabling patterns like:
- Project → API Keys: Limit total project usage while also limiting individual keys
- Organization → Users: Organization-wide limits with per-user quotas
- Tenant → Services: Multi-tenant limits with service-level controls
Creating a Hierarchy
# Create parent entity (project)
await limiter.create_entity(
    entity_id="project-1",
    name="Production Project",
)
# Create child entities (API keys)
await limiter.create_entity(
    entity_id="key-abc",
    parent_id="project-1",
    name="Web Application Key",
)
await limiter.create_entity(
    entity_id="key-xyz",
    parent_id="project-1",
    name="Mobile App Key",
)
Cascade Mode
Use cascade=True to apply rate limits to both the child and parent:
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[
        Limit.per_minute("tpm", 10_000),  # Per-key limit
    ],
    consume={"tpm": 500},
    cascade=True,  # Also applies to parent (project-1)
) as lease:
    await call_api()
Performance Impact
Cascade mode adds overhead: one extra GetEntity call plus the bucket operations for the parent. Enable it only when hierarchical enforcement is needed. See Batch Operation Patterns for optimization strategies.
What happens:
- Check if key-abc has capacity (10k tpm)
- Check if project-1 has capacity (uses same limits)
- If both pass, consume from both atomically
- If either fails, reject with details about which limit was exceeded
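The steps above can be sketched with a small in-memory model. This is an illustration of the check-then-consume sequence only, not zae-limiter's implementation: the `Bucket` class and `acquire_cascade` function are hypothetical names, refill over time is left out, and the parent is given the same 10k capacity because the call passes one set of limits for both levels.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Toy fixed-capacity bucket (hypothetical; no time-based refill)."""
    capacity: int
    used: int = 0

    @property
    def available(self) -> int:
        return self.capacity - self.used


def acquire_cascade(buckets: dict, parents: dict, entity_id: str, amount: int) -> list:
    """Consume `amount` from the entity and its parent, all or nothing.

    Returns the IDs of the entities that lack capacity (empty on success).
    """
    chain = [entity_id]
    if entity_id in parents:
        chain.append(parents[entity_id])

    # Phase 1: check every level before touching any bucket.
    exceeded = [e for e in chain if buckets[e].available < amount]
    if exceeded:
        return exceeded

    # Phase 2: every level passed, so consume from all of them.
    for e in chain:
        buckets[e].used += amount
    return []


buckets = {"key-abc": Bucket(10_000), "project-1": Bucket(10_000)}
parents = {"key-abc": "project-1"}

ok = acquire_cascade(buckets, parents, "key-abc", 500)        # succeeds
too_big = acquire_cascade(buckets, parents, "key-abc", 9_800)  # rejected
```

After the rejected call neither bucket changes, which is the property the "consume from both atomically" step is meant to guarantee.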
Different Limits Per Level
Set different limits for parents and children:
# Set project-level limits (higher)
await limiter.set_limits(
    entity_id="project-1",
    limits=[
        Limit.per_minute("tpm", 100_000),  # 100k for entire project
    ],
)
# Set key-level limits (lower)
await limiter.set_limits(
    entity_id="key-abc",
    limits=[
        Limit.per_minute("tpm", 10_000),  # 10k per key
    ],
)
# Use stored limits with cascade
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[Limit.per_minute("tpm", 5_000)],  # Default
    consume={"tpm": 500},
    cascade=True,
    use_stored_limits=True,  # Uses stored limits for both levels
) as lease:
    await call_api()
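To make the numbers concrete, here is a minimal sketch of how per-level limits might resolve. It assumes that, with use_stored_limits=True, each level uses its own stored limit and falls back to the limit passed in the call when none is stored. That fallback rule is an assumption drawn from the "# Default" comment above, and `resolve_limit` is a hypothetical helper, not a library function.

```python
def resolve_limit(stored: dict, entity_id: str, default: int) -> int:
    """Return the stored limit for an entity, else the call-site default.

    Assumed fallback rule; check the API reference for the real behavior.
    """
    return stored.get(entity_id, default)


stored_tpm = {"project-1": 100_000, "key-abc": 10_000}
default_tpm = 5_000  # the limit passed to acquire()

key_limit = resolve_limit(stored_tpm, "key-abc", default_tpm)
project_limit = resolve_limit(stored_tpm, "project-1", default_tpm)
unset_key_limit = resolve_limit(stored_tpm, "key-xyz", default_tpm)

# Number of keys that can run at the full key-abc rate before the
# project-level limit becomes the binding constraint.
keys_at_full_rate = project_limit // key_limit
```

With these values, ten keys running at 10k tpm each would saturate the 100k project limit, so an eleventh key would be rejected at the project level even though its own bucket is full.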
Understanding Cascade Behavior
Without Cascade
# Only checks/consumes from key-abc
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[Limit.per_minute("tpm", 10_000)],
    consume={"tpm": 500},
    cascade=False,  # Default
) as lease:
    ...
With Cascade
# Checks/consumes from BOTH key-abc AND project-1
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[Limit.per_minute("tpm", 10_000)],
    consume={"tpm": 500},
    cascade=True,
) as lease:
    ...
Error Handling with Hierarchies
When using cascade mode, RateLimitExceeded includes statuses for all entities:
try:
    async with limiter.acquire(
        entity_id="key-abc",
        cascade=True,
        # ... (remaining arguments elided)
    ):
        ...
except RateLimitExceeded as e:
    for status in e.statuses:
        print(f"Entity: {status.entity_id}")
        print(f"  Limit: {status.limit_name}")
        print(f"  Available: {status.available}")
        print(f"  Exceeded: {status.exceeded}")
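In practice it is often more useful to know which level blocked the request than to print every status. The sketch below groups exceeded limits by entity. `Status` is a stand-in dataclass carrying only the four fields printed above (the real status object may have more fields or different types), and `exceeded_by_entity` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Stand-in for the library's status object (assumed field types)."""
    entity_id: str
    limit_name: str
    available: int
    exceeded: bool


def exceeded_by_entity(statuses) -> dict:
    """Map each entity ID to the names of the limits it exceeded."""
    result = {}
    for status in statuses:
        if status.exceeded:
            result.setdefault(status.entity_id, []).append(status.limit_name)
    return result


# Example: the key still has capacity, but the project is exhausted.
statuses = [
    Status("key-abc", "tpm", 9_500, False),
    Status("project-1", "tpm", 0, True),
]
blocked = exceeded_by_entity(statuses)
```

Here `blocked` contains only project-1, which tells the caller that the parent quota, not the key's own quota, caused the rejection.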
Use Cases
Multi-Tenant SaaS
# Tenant has 1M tokens/day
await limiter.set_limits(
    entity_id="tenant-acme",
    limits=[Limit.per_day("tpd", 1_000_000)],
)
# Each user gets 100k tokens/day
await limiter.set_limits(
    entity_id="user-123",
    limits=[Limit.per_day("tpd", 100_000)],
)
# Rate limit user, cascade to tenant
async with limiter.acquire(
    entity_id="user-123",
    cascade=True,
    use_stored_limits=True,
    # ... (remaining arguments elided)
):
    ...
API Key Management
# Project limit: 10k RPM
await limiter.set_limits(
    entity_id="project-prod",
    limits=[Limit.per_minute("rpm", 10_000)],
)
# Production key: 5k RPM (half of project)
await limiter.set_limits(
    entity_id="key-prod",
    limits=[Limit.per_minute("rpm", 5_000)],
)
# Staging key: 1k RPM
await limiter.set_limits(
    entity_id="key-staging",
    limits=[Limit.per_minute("rpm", 1_000)],
)
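When planning key quotas, it can help to compare the sum of the child limits with the parent limit. The sketch below does that arithmetic for the values above, assuming both keys were created with project-prod as their parent. `oversubscription` is a hypothetical helper, not part of zae-limiter.

```python
def oversubscription(parent_limit: int, child_limits: dict) -> float:
    """Ratio of summed child limits to the parent limit.

    A ratio above 1.0 means the children can collectively exceed the
    parent, so the parent limit can become the binding constraint.
    """
    return sum(child_limits.values()) / parent_limit


ratio = oversubscription(
    parent_limit=10_000,
    child_limits={"key-prod": 5_000, "key-staging": 1_000},
)
headroom = 10_000 - (5_000 + 1_000)  # RPM left for additional keys
```

With a ratio of 0.6, these two keys alone cannot exhaust the project limit; the project-level check only matters once further keys claim the remaining 4k RPM.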
Limitations
- Two levels only: Parent → Child (no grandparents)
- Single parent: Each entity can have at most one parent
- Cascade is opt-in: it must be enabled explicitly on each acquire() call (cascade=False is the default)
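The first two constraints can be illustrated with a small validation sketch. It models the hierarchy as a mapping from entity ID to a single optional parent, which makes "single parent" structural, and rejects any parent that is itself a child. The names `HierarchyError` and `add_entity` are hypothetical; whether and how zae-limiter reports such errors is not stated here.

```python
from typing import Dict, Optional


class HierarchyError(ValueError):
    """Raised by this sketch when an entity would violate the hierarchy rules."""


def add_entity(parents: Dict[str, Optional[str]], entity_id: str,
               parent_id: Optional[str] = None) -> None:
    """Register an entity, enforcing a two-level, single-parent hierarchy."""
    if parent_id is not None:
        if parent_id not in parents:
            raise HierarchyError(f"unknown parent: {parent_id}")
        if parents[parent_id] is not None:
            # The parent already has a parent, so this would add a third level.
            raise HierarchyError(f"{parent_id} is a child and cannot be a parent")
    parents[entity_id] = parent_id


hierarchy: Dict[str, Optional[str]] = {}
add_entity(hierarchy, "project-1")
add_entity(hierarchy, "key-abc", parent_id="project-1")

try:
    add_entity(hierarchy, "sub-key", parent_id="key-abc")
    third_level_rejected = False
except HierarchyError:
    third_level_rejected = True
```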
Next Steps
- LLM Integration - Token estimation patterns
- Unavailability Handling - Handling service outages
- API Reference - Complete API documentation