Rate Limiting Algorithms
Seven battle-tested algorithms, all behind a unified Evaluator interface.
Fixed Window
Count requests within a fixed interval. Simple, cheap, easy to understand. Great for org-wide basic limits.
Trade-off: burst at window boundaries.
Sliding Window Log
Store timestamps of all events and count those within a rolling window. Very accurate, exact enforcement.
Trade-off: memory-heavy, expensive at scale.
Sliding Window Counter
Approximate rolling window using sub-buckets and interpolation. Better fairness than fixed window.
Trade-off: approximation complexity.
Token Bucket
Tokens refill over time; each request consumes tokens. Industry standard for APIs, supports bursts well.
Trade-off: refill math & atomic updates.
Leaky Bucket
Queue requests into a bucket drained at a steady rate. Smooth traffic shaping for egress.
Trade-off: less intuitive than token bucket.
Concurrency Limiter
Limit the number of simultaneously in-flight operations. Protect heavy dependencies and expensive tasks.
Trade-off: requires acquire/release lifecycle.
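The acquire/release lifecycle maps naturally onto a buffered-channel semaphore; a sketch under that assumption (names are mine, not the RLAAS client API):

```go
package main

import "fmt"

// concurrencyLimiter caps in-flight operations: each slot in the
// buffered channel represents one permitted concurrent operation.
type concurrencyLimiter struct{ slots chan struct{} }

func newConcurrencyLimiter(n int) *concurrencyLimiter {
	return &concurrencyLimiter{slots: make(chan struct{}, n)}
}

// TryAcquire claims a slot without blocking; every successful acquire
// must be paired with a Release (typically via defer).
func (l *concurrencyLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

func (l *concurrencyLimiter) Release() { <-l.slots }

func main() {
	l := newConcurrencyLimiter(2)
	fmt.Println(l.TryAcquire(), l.TryAcquire(), l.TryAcquire()) // true true false
	l.Release() // finishing one operation frees a slot
	fmt.Println(l.TryAcquire()) // true
}
```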
Quota / Budget Limiter
Long-window budget (per-day/week/month). Perfect for SaaS plans and telemetry budgets.
Trade-off: not enough for short-burst protection alone.
Action Types
Go beyond allow/deny. RLAAS supports eight actions per policy:
| Action | Description | Example |
| --- | --- | --- |
| Allow | Request passes without modification | Within limits |
| Deny | Reject entirely — HTTP 429 or gRPC RESOURCE_EXHAUSTED | Rate exceeded |
| Delay | Allow after a configurable wait period | Egress calls, background jobs |
| Sample | Allow only a fraction of requests/events | Keep 10% of debug logs |
| Drop | Discard event without processing | Low-value debug telemetry |
| Downgrade | Reduce priority or transform handling | Standard pipeline instead of premium |
| Drop Low Priority | Preserve high-value events, drop low-priority ones | Mixed-priority event streams |
| Shadow Only | Record decision without enforcing | Pre-rollout dry run |
Policy Matching Dimensions
Match on 20+ dimensions for fine-grained control. Every field is optional — use as many or as few as your use case requires.
`org_id`, `tenant_id`, `application`, `service`, `environment`, `signal_type`, `operation`, `endpoint`, `method`, `user_id`, `api_key`, `client_id`, `source_ip`, `region`, `resource`, `severity`, `span_name`, `topic`, `consumer_group`, `job_type`, `tags` (key=value)
Precedence Order (most → least specific)
- User-level override
- API key / client override
- Endpoint + method
- Operation
- Service
- Application
- Tenant
- Org
- Signal type
- Global default
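One way to realize this ordering is a first-match walk down the specificity ladder. A sketch, assuming the matcher has already grouped candidate policies by the scope level they matched at (the level names and map shape here are my illustration):

```go
package main

import "fmt"

// precedence lists scope levels from most to least specific; the first
// level with a matching policy wins.
var precedence = []string{
	"user", "api_key", "endpoint_method", "operation", "service",
	"application", "tenant", "org", "signal_type", "global",
}

// mostSpecific returns the policy ID at the most specific matched level.
func mostSpecific(matched map[string]string) (string, bool) {
	for _, level := range precedence {
		if id, ok := matched[level]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	// A service-level and an org-level policy both match;
	// the service-level one is more specific and wins.
	matched := map[string]string{"service": "svc-limit", "org": "org-default"}
	id, _ := mostSpecific(matched)
	fmt.Println(id) // svc-limit
}
```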
Advanced match_expr Expressions
For complex conditions, use match_expr in policy metadata:
```
match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"
```
Supports the `==` and `!=` operators joined with `&&`. Fields include all scope dimensions, plus a `tag.` prefix for tag lookups.
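The grammar is small enough to evaluate with string splitting. A fail-closed sketch (the `evalMatchExpr` helper and flat field map are my assumptions, not the RLAAS parser):

```go
package main

import (
	"fmt"
	"strings"
)

// evalMatchExpr evaluates a conjunction of ==/!= comparisons against a
// flat field map; tag lookups are exposed under a "tag." key prefix.
func evalMatchExpr(expr string, fields map[string]string) bool {
	for _, clause := range strings.Split(expr, "&&") {
		clause = strings.TrimSpace(clause)
		if k, v, ok := strings.Cut(clause, "!="); ok {
			if fields[strings.TrimSpace(k)] == strings.TrimSpace(v) {
				return false
			}
		} else if k, v, ok := strings.Cut(clause, "=="); ok {
			if fields[strings.TrimSpace(k)] != strings.TrimSpace(v) {
				return false
			}
		} else {
			return false // malformed clause: fail closed
		}
	}
	return true
}

func main() {
	fields := map[string]string{
		"region": "us-east-1", "tag.env": "production", "method": "GET",
	}
	fmt.Println(evalMatchExpr(
		"region==us-east-1 && tag.env==production && method!=DELETE", fields)) // true
}
```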
Performance & Optimizations
Benchmark Results
All benchmarks run with `go test ./benchmarks -bench . -benchmem`. Counter hot paths make zero heap allocations.
| Operation | Latency | Allocations |
| --- | --- | --- |
| Memory store increment (single key) | ~105–162 ns/op | 0 allocs |
| Memory store increment (many keys) | ~6–7 ns/op | 0 allocs |
| Fixed window evaluation | ~1,600 ns/op | 16 allocs |
| HTTP /v1/check handler | ~10,000 ns/op | — |
| HTTP acquire+release | ~16,500 ns/op | — |
Optimizations Implemented
- Lock-Sharded Counters: 64 shards default, FNV32a hash, per-shard mutex — eliminates global lock contention
- Async Invalidation Dispatcher: 256-item buffered queue, up to 4 workers decouple API latency from webhook delivery
- Bounded Worker Pool: Up to 8 goroutines for publishInvalidation fanout — prevents goroutine explosion
- Burst Coalescing: Sidecar drains invalidation channel to avoid redundant sync cycles
- Zero-Allocation Hot Path: Counter operations allocate nothing on the heap
Advanced Capabilities
OTEL Processor Primitives
Batch-process logs and spans through rate-limiting policies. Worker pool with configurable concurrency. Supports fail-open and fail-closed modes. Collect processor stats (checked, allowed, denied, errors).
Multi-Region Allocation
Deploy rate limiting across multiple geographic regions with intelligent limit distribution and overflow protection.
Weighted Proportional Split
Distribute a global limit across regions based on configurable weights. A region with weight 5 gets 5× the allocation of a region with weight 1. Remainder correction ensures the allocations sum to exactly the global limit.
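The remainder correction can be sketched as floor division plus a single adjustment. This `allocate` helper is my illustration of the idea, not the `region.AllocateGlobalLimit` source:

```go
package main

import "fmt"

// allocate splits globalLimit proportionally by weight using floor
// division, then adds the rounding remainder to the heaviest region so
// the per-region limits sum exactly to the global limit.
func allocate(globalLimit int, weights map[string]int) map[string]int {
	total := 0
	for _, w := range weights {
		total += w
	}
	out := make(map[string]int, len(weights))
	assigned, heaviest, maxW := 0, "", -1
	for r, w := range weights {
		out[r] = globalLimit * w / total
		assigned += out[r]
		// Track the heaviest region (ties broken by name for determinism).
		if w > maxW || (w == maxW && r < heaviest) {
			heaviest, maxW = r, w
		}
	}
	out[heaviest] += globalLimit - assigned // remainder correction
	return out
}

func main() {
	got := allocate(10000, map[string]int{"us-east-1": 5, "eu-west-1": 3, "ap-south-1": 2})
	fmt.Println(got["us-east-1"], got["eu-west-1"], got["ap-south-1"]) // 5000 3000 2000
}
```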
Overflow Detection
Compare real-time per-region usage against allocated limits. Instantly identify which regions have exceeded their allocation and by how much — enabling rebalancing or alerting.
Region-Scoped Policies
Use the region scope dimension or match_expr to create policies that apply only in specific regions (e.g., region==us-east-1).
How It Works
Define region weights and call AllocateGlobalLimit:
```go
// Global limit: 10,000 req/min split across 3 regions
weights := []region.RegionWeight{
	{Region: "us-east-1", Weight: 5},
	{Region: "eu-west-1", Weight: 3},
	{Region: "ap-south-1", Weight: 2},
}
allocations := region.AllocateGlobalLimit(10000, weights)
// → us-east-1: 5000, eu-west-1: 3000, ap-south-1: 2000

// Check for overflow
overflow := region.RegionalOverflow(currentUsage, allocations)
// → map["us-east-1": 250] (exceeded by 250)
```
Approximate Global, Exact Regional
Each region enforces its allocated limit locally with exact precision. Global enforcement is approximate (sum of regional allocations) — the practical trade-off for low-latency distributed rate limiting.
Non-Go Client SDKs
Full-featured HTTP client SDKs for Python, TypeScript, Java, and .NET. All support check, acquire/release, policy CRUD, validate/rollout/rollback, audit/versions, and analytics.
Sidecar Invalidation
In-process broker with async push fanout to sidecars. Bounded workers prevent HTTP request storms. Sidecars coalesce burst invalidations into single sync operations.