Hierarchical Limits
zae-limiter supports two-level hierarchies for rate limiting, enabling patterns like:
- Project → API Keys: Limit total project usage while also limiting individual keys
- Organization → Users: Organization-wide limits with per-user quotas
- Tenant → Services: Multi-tenant limits with service-level controls
Creating a Hierarchy
# Create parent entity (project)
await limiter.create_entity(
    entity_id="project-1",
    name="Production Project",
)
# Create child entities (API keys) with cascade enabled
await limiter.create_entity(
    entity_id="key-abc",
    parent_id="project-1",
    name="Web Application Key",
    cascade=True,  # Enforce parent limits on every acquire
)
await limiter.create_entity(
    entity_id="key-xyz",
    parent_id="project-1",
    name="Mobile App Key",
    cascade=True,
)
Same-Namespace Constraint
Parent and child entities must belong to the same namespace. Cross-namespace hierarchies are not supported. If you use multiple namespaces, create separate entity hierarchies within each namespace.
Cascade Mode
Create entities with cascade=True to apply rate limits to both the child and parent on every acquire() call:
# Cascade is set once at entity creation
await limiter.create_entity(
    entity_id="key-abc",
    parent_id="project-1",
    cascade=True,  # All acquire() calls will also check parent
)
# acquire() automatically cascades to parent; no flag needed
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[
        Limit.per_minute("tpm", 10_000),  # Per-key limit
    ],
    consume={"tpm": 500},
) as lease:
    await call_api()
Performance Impact
Cascade mode adds overhead: one additional GetEntity call plus the parent bucket operations. Enable it only when hierarchical enforcement is needed. See Batch Operation Patterns for optimization strategies.
Speculative Writes with Cascade
With speculative_writes=True, the first cascade acquire completes in 2 sequential round trips (0 RCU, 2 WCU). Once that first acquire has populated the entity metadata cache, subsequent cascade acquires issue the child and parent writes in parallel and complete in 1 round trip (0 RCU, 2 WCU): the same cost at lower latency. If the parent needs a refill, a deferred compensation optimization attempts a parent-only slow path first, which avoids unnecessary child compensation. See Speculative Writes for details.
What happens during the acquire() call above:
- Check if key-abc has capacity (10k tpm)
- Check if project-1 has capacity (uses same limits)
- If both pass, consume from both atomically
- If either fails, reject with details about which limit was exceeded
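The check-then-consume sequence can be sketched with a toy in-memory model. This is only an illustration of the logic, not zae-limiter's implementation: `Bucket`, `Rejected`, and `cascade_acquire` are hypothetical names, and the model ignores refill, persistence, and concurrency.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    """Toy in-memory bucket: a remaining-capacity counter with no refill."""
    entity_id: str
    available: int


class Rejected(Exception):
    """Raised when the child or the parent lacks capacity (hypothetical)."""

    def __init__(self, exceeded: list[str]):
        super().__init__(f"limit exceeded for: {', '.join(exceeded)}")
        self.exceeded = exceeded


def cascade_acquire(child: Bucket, parent: Bucket, amount: int) -> None:
    # Check both levels before touching either bucket.
    exceeded = [b.entity_id for b in (child, parent) if b.available < amount]
    # If either check fails, reject and report which level failed.
    if exceeded:
        raise Rejected(exceeded)
    # Both passed, so consume from both together.
    child.available -= amount
    parent.available -= amount


key = Bucket("key-abc", available=10_000)
project = Bucket("project-1", available=600)

cascade_acquire(key, project, 500)      # succeeds: both have capacity
try:
    cascade_acquire(key, project, 500)  # parent has only 100 left
except Rejected as e:
    print(e.exceeded)                   # ['project-1']
```

Because the rejection happens before any subtraction, a failed acquire leaves both buckets untouched, which mirrors the all-or-nothing behavior described in the list.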
Different Limits Per Level
Set different limits for parents and children:
# Set project-level limits (higher)
await limiter.set_limits(
    entity_id="project-1",
    limits=[
        Limit.per_minute("tpm", 100_000),  # 100k for entire project
    ],
)
# Set key-level limits (lower)
await limiter.set_limits(
    entity_id="key-abc",
    limits=[
        Limit.per_minute("tpm", 10_000),  # 10k per key
    ],
)
# acquire() auto-cascades because key-abc was created with cascade=True
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=None,  # Auto-resolves from stored config
    consume={"tpm": 500},
) as lease:
    await call_api()
Understanding Cascade Behavior
Without Cascade
# Entity created without cascade (default)
await limiter.create_entity(entity_id="key-abc", parent_id="project-1")
# Only checks/consumes from key-abc
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[Limit.per_minute("tpm", 10_000)],
    consume={"tpm": 500},
) as lease:
    ...
With Cascade
# Entity created with cascade enabled
await limiter.create_entity(entity_id="key-abc", parent_id="project-1", cascade=True)
# Checks/consumes from BOTH key-abc AND project-1
async with limiter.acquire(
    entity_id="key-abc",
    resource="gpt-4",
    limits=[Limit.per_minute("tpm", 10_000)],
    consume={"tpm": 500},
) as lease:
    ...
Error Handling with Hierarchies
When an entity has cascade enabled, RateLimitExceeded includes statuses for all entities:
try:
    async with limiter.acquire(
        entity_id="key-abc",  # Has cascade=True from create_entity()
        resource="gpt-4",
        limits=[Limit.per_minute("rpm", 100)],
        consume={"rpm": 1},
    ):
        pass
except RateLimitExceeded as e:
    for status in e.statuses:
        print(f"Entity: {status.entity_id}")
        print(f"  Limit: {status.limit_name}")
        print(f"  Available: {status.available}")
        print(f"  Exceeded: {status.exceeded}")
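When statuses for both levels come back, it is often useful to know which level actually blocked the request. The sketch below groups exceeded limits by entity. `Status` here is a hypothetical stand-in that carries only the four fields printed above; the real status objects may expose more, and `exceeded_by_entity` is not part of the library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Status:
    """Hypothetical stand-in with the fields used in the handler above."""
    entity_id: str
    limit_name: str
    available: int
    exceeded: bool


def exceeded_by_entity(statuses):
    """Map each entity to the names of the limits it exceeded."""
    result = defaultdict(list)
    for s in statuses:
        if s.exceeded:
            result[s.entity_id].append(s.limit_name)
    return dict(result)


statuses = [
    Status("key-abc", "rpm", available=40, exceeded=False),
    Status("project-1", "rpm", available=0, exceeded=True),
]
blocked = exceeded_by_entity(statuses)
print(blocked)  # {'project-1': ['rpm']}
```

In an `except RateLimitExceeded as e:` block, passing `e.statuses` to a helper like this would show whether the key or the project ran out of capacity.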
Use Cases
Multi-Tenant SaaS
# Tenant has 1M tokens/day
await limiter.set_limits(
    entity_id="tenant-acme",
    limits=[Limit.per_day("tpd", 1_000_000)],
)
# Create user under tenant with cascade enabled
await limiter.create_entity(entity_id="user-123", parent_id="tenant-acme", cascade=True)
# Each user gets 100k tokens/day
await limiter.set_limits(
    entity_id="user-123",
    limits=[Limit.per_day("tpd", 100_000)],
)
# Rate limit user; auto-cascades to tenant
# limits=None auto-resolves from stored config
async with limiter.acquire(
    entity_id="user-123",
    resource="gpt-4",
    limits=None,
    consume={"tpd": 500},  # Must match the stored limit name ("tpd")
) as lease:
    ...
API Key Management
# Project limit: 10k RPM
await limiter.set_limits(
    entity_id="project-prod",
    limits=[Limit.per_minute("rpm", 10_000)],
)
# Production key: 5k RPM (half of project)
await limiter.set_limits(
    entity_id="key-prod",
    limits=[Limit.per_minute("rpm", 5_000)],
)
# Staging key: 1k RPM
await limiter.set_limits(
    entity_id="key-staging",
    limits=[Limit.per_minute("rpm", 1_000)],
)
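A quick way to sanity-check a layout like this is to compare the sum of the key limits with the project limit. The helper below is a hypothetical planning aid, not a library function. It assumes both keys were created with parent_id="project-prod" and cascade=True, which the snippet above does not show.

```python
def unallocated_budget(parent_limit: int, child_limits: dict[str, int]) -> int:
    """Parent capacity left after summing child limits.

    A negative result means the children are oversubscribed: they cannot all
    run at their full rate at once without hitting the parent limit.
    """
    return parent_limit - sum(child_limits.values())


children = {"key-prod": 5_000, "key-staging": 1_000}
print(unallocated_budget(10_000, children))  # 4000
```

Here 4,000 RPM of project capacity remains unassigned, so neither key can be throttled by the project limit while only these two keys exist.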
Limitations
- Two levels only: Parent → Child (no grandparents)
- Single parent: Each entity can have at most one parent
- Cascade is per-entity: Set cascade=True on create_entity() to enable; it applies to all acquire() calls for that entity
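The two-level rule can be pictured as a check made before a child is attached. The sketch below is illustrative only: `check_new_entity` and the `parents` mapping are hypothetical, and how zae-limiter itself reports a violation (including whether it rejects an unknown parent) is not covered here.

```python
from typing import Optional


def check_new_entity(parents: dict[str, Optional[str]], parent_id: Optional[str]) -> None:
    """Raise ValueError if attaching a child to parent_id would add a third level.

    `parents` maps each existing entity_id to its parent_id (None for top-level).
    """
    if parent_id is None:
        return  # Top-level entity: nothing to validate.
    if parent_id not in parents:
        # Assumption of this sketch: the parent must already exist.
        raise ValueError(f"unknown parent: {parent_id}")
    if parents[parent_id] is not None:
        raise ValueError(f"{parent_id} already has a parent; only two levels are supported")


parents = {"project-1": None, "key-abc": "project-1"}
check_new_entity(parents, "project-1")    # fine: project-1 is top-level
try:
    check_new_entity(parents, "key-abc")  # key-abc is itself a child
except ValueError as e:
    print(e)
```

The single-parent rule falls out of the data shape: each entity maps to exactly one parent_id or to None.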
Next Steps
- LLM Integration - Token estimation patterns
- Unavailability Handling - Handling service outages
- API Reference - Complete API documentation