Rate Limiting Algorithms
Seven battle-tested algorithms, all behind a unified Evaluator interface.
Fixed Window
Count requests within a fixed interval. Simple, cheap, easy to understand. Great for org-wide basic limits.
Trade-off: burst at window boundaries.
Sliding Window Log
Store timestamps of all events and count those within a rolling window. Very accurate, exact enforcement.
Trade-off: memory-heavy, expensive at scale.
Sliding Window Counter
Approximate rolling window using sub-buckets and interpolation. Better fairness than fixed window.
Trade-off: approximation complexity.
Token Bucket
Tokens refill over time; each request consumes tokens. Industry standard for APIs, supports bursts well.
Trade-off: refill math & atomic updates.
Leaky Bucket
Queue requests into a bucket drained at a steady rate. Smooth traffic shaping for egress.
Trade-off: less intuitive than token bucket.
Concurrency Limiter
Limit the number of simultaneously in-flight operations. Protect heavy dependencies and expensive tasks.
Trade-off: requires acquire/release lifecycle.
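The acquire/release lifecycle maps naturally onto a buffered-channel semaphore; a sketch under that assumption (names are mine, not the RLAAS client API):

```go
package main

import "fmt"

// concurrencyLimiter caps in-flight operations: each slot in the
// buffered channel represents one permitted concurrent operation.
type concurrencyLimiter struct{ slots chan struct{} }

func newConcurrencyLimiter(n int) *concurrencyLimiter {
	return &concurrencyLimiter{slots: make(chan struct{}, n)}
}

// TryAcquire claims a slot without blocking; every successful acquire
// must be paired with a Release (typically via defer).
func (l *concurrencyLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

func (l *concurrencyLimiter) Release() { <-l.slots }

func main() {
	l := newConcurrencyLimiter(2)
	fmt.Println(l.TryAcquire(), l.TryAcquire(), l.TryAcquire()) // true true false
	l.Release() // finishing one operation frees a slot
	fmt.Println(l.TryAcquire()) // true
}
```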
Quota / Budget Limiter
Long-window budget (per-day/week/month). Perfect for SaaS plans and telemetry budgets.
Trade-off: not enough for short-burst protection alone.
Action Types
Go beyond allow/deny. RLAAS supports eight actions per policy:
| Action | Description | Example |
| --- | --- | --- |
| Allow | Request passes without modification | Within limits |
| Deny | Reject entirely — HTTP 429 or gRPC RESOURCE_EXHAUSTED | Rate exceeded |
| Delay | Allow after a configurable wait period | Egress calls, background jobs |
| Sample | Allow only a fraction of requests/events | Keep 10% of debug logs |
| Drop | Discard event without processing | Low-value debug telemetry |
| Downgrade | Reduce priority or transform handling | Standard pipeline instead of premium |
| Drop Low Priority | Preserve high-value events, drop low-priority ones | Mixed-priority event streams |
| Shadow Only | Record decision without enforcing | Pre-rollout dry run |
Policy Matching Dimensions
Match on 20+ dimensions for fine-grained control. Every field is optional — use as many or as few as your use case requires.
`org_id`, `tenant_id`, `application`, `service`, `environment`, `signal_type`, `operation`, `endpoint`, `method`, `user_id`, `api_key`, `client_id`, `source_ip`, `region`, `resource`, `severity`, `span_name`, `topic`, `consumer_group`, `job_type`, `tags` (key=value)
Precedence Order (most → least specific)
- User-level override
- API key / client override
- Endpoint + method
- Operation
- Service
- Application
- Tenant
- Org
- Signal type
- Global default
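One way to realize this ordering is a first-match walk down the specificity ladder. A sketch, assuming the matcher has already grouped candidate policies by the scope level they matched at (the level names and map shape here are my illustration):

```go
package main

import "fmt"

// precedence lists scope levels from most to least specific; the first
// level with a matching policy wins.
var precedence = []string{
	"user", "api_key", "endpoint_method", "operation", "service",
	"application", "tenant", "org", "signal_type", "global",
}

// mostSpecific returns the policy ID at the most specific matched level.
func mostSpecific(matched map[string]string) (string, bool) {
	for _, level := range precedence {
		if id, ok := matched[level]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	// A service-level and an org-level policy both match;
	// the service-level one is more specific and wins.
	matched := map[string]string{"service": "svc-limit", "org": "org-default"}
	id, _ := mostSpecific(matched)
	fmt.Println(id) // svc-limit
}
```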
Advanced match_expr Expressions
For complex conditions, use match_expr in policy metadata:
```
match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"
```
Supports the `==` and `!=` operators joined with `&&`. Fields include all scope dimensions, plus a `tag.` prefix for tag lookups.
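The grammar is small enough to evaluate with string splitting. A fail-closed sketch (the `evalMatchExpr` helper and flat field map are my assumptions, not the RLAAS parser):

```go
package main

import (
	"fmt"
	"strings"
)

// evalMatchExpr evaluates a conjunction of ==/!= comparisons against a
// flat field map; tag lookups are exposed under a "tag." key prefix.
func evalMatchExpr(expr string, fields map[string]string) bool {
	for _, clause := range strings.Split(expr, "&&") {
		clause = strings.TrimSpace(clause)
		if k, v, ok := strings.Cut(clause, "!="); ok {
			if fields[strings.TrimSpace(k)] == strings.TrimSpace(v) {
				return false
			}
		} else if k, v, ok := strings.Cut(clause, "=="); ok {
			if fields[strings.TrimSpace(k)] != strings.TrimSpace(v) {
				return false
			}
		} else {
			return false // malformed clause: fail closed
		}
	}
	return true
}

func main() {
	fields := map[string]string{
		"region": "us-east-1", "tag.env": "production", "method": "GET",
	}
	fmt.Println(evalMatchExpr(
		"region==us-east-1 && tag.env==production && method!=DELETE", fields)) // true
}
```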
Performance & Optimizations
Benchmark Results
All benchmarks run with `go test ./benchmarks -bench . -benchmem`. Counter hot paths make zero heap allocations.
| Operation | Latency | Allocations |
| --- | --- | --- |
| Memory store increment (single key) | ~105–162 ns/op | 0 allocs |
| Memory store increment (many keys) | ~6–7 ns/op | 0 allocs |
| Fixed window evaluation | ~1,600 ns/op | 16 allocs |
| HTTP /v1/check handler | ~10,000 ns/op | — |
| HTTP acquire+release | ~16,500 ns/op | — |
Optimizations Implemented
- Lock-Sharded Counters: 64 shards default, FNV32a hash, per-shard mutex — eliminates global lock contention
- Async Invalidation Dispatcher: 256-item buffered queue, up to 4 workers decouple API latency from webhook delivery
- Bounded Worker Pool: Up to 8 goroutines for publishInvalidation fanout — prevents goroutine explosion
- Burst Coalescing: Sidecar drains invalidation channel to avoid redundant sync cycles
- Zero-Allocation Hot Path: Counter operations allocate nothing on the heap
Advanced Capabilities
OTEL Processor Primitives
Batch-process logs and spans through rate-limiting policies. Worker pool with configurable concurrency. Supports fail-open and fail-closed modes. Collect processor stats (checked, allowed, denied, errors).
Multi-Region Allocation
Deploy rate limiting across multiple geographic regions with intelligent limit distribution and overflow protection.
Weighted Proportional Split
Distribute a global limit across regions based on configurable weights. A region with weight 5 gets 5× the allocation of a region with weight 1. Remainder correction ensures the allocations sum to exactly the global limit.
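The remainder correction can be sketched as floor division plus a single adjustment. This `allocate` helper is my illustration of the idea, not the `region.AllocateGlobalLimit` source:

```go
package main

import "fmt"

// allocate splits globalLimit proportionally by weight using floor
// division, then adds the rounding remainder to the heaviest region so
// the per-region limits sum exactly to the global limit.
func allocate(globalLimit int, weights map[string]int) map[string]int {
	total := 0
	for _, w := range weights {
		total += w
	}
	out := make(map[string]int, len(weights))
	assigned, heaviest, maxW := 0, "", -1
	for r, w := range weights {
		out[r] = globalLimit * w / total
		assigned += out[r]
		// Track the heaviest region (ties broken by name for determinism).
		if w > maxW || (w == maxW && r < heaviest) {
			heaviest, maxW = r, w
		}
	}
	out[heaviest] += globalLimit - assigned // remainder correction
	return out
}

func main() {
	got := allocate(10000, map[string]int{"us-east-1": 5, "eu-west-1": 3, "ap-south-1": 2})
	fmt.Println(got["us-east-1"], got["eu-west-1"], got["ap-south-1"]) // 5000 3000 2000
}
```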
Overflow Detection
Compare real-time per-region usage against allocated limits. Instantly identify which regions have exceeded their allocation and by how much — enabling rebalancing or alerting.
Region-Scoped Policies
Use the region scope dimension or match_expr to create policies that apply only in specific regions (e.g., region==us-east-1).
How It Works
Define region weights and call AllocateGlobalLimit:
```go
// Global limit: 10,000 req/min split across 3 regions
weights := []region.RegionWeight{
	{Region: "us-east-1", Weight: 5},
	{Region: "eu-west-1", Weight: 3},
	{Region: "ap-south-1", Weight: 2},
}
allocations := region.AllocateGlobalLimit(10000, weights)
// → us-east-1: 5000, eu-west-1: 3000, ap-south-1: 2000

// Check for overflow
overflow := region.RegionalOverflow(currentUsage, allocations)
// → map["us-east-1": 250] (exceeded by 250)
```
Approximate Global, Exact Regional
Each region enforces its allocated limit locally with exact precision. Global enforcement is approximate (sum of regional allocations) — the practical trade-off for low-latency distributed rate limiting.
Non-Go Client SDKs
Full-featured HTTP client SDKs for Python, TypeScript, Java, and .NET. All support check, acquire/release, policy CRUD, validate/rollout/rollback, audit/versions, and analytics.
Sidecar Invalidation
In-process broker with async push fanout to sidecars. Bounded workers prevent HTTP request storms. Sidecars coalesce burst invalidations into single sync operations.