Feature Overview

Everything RLAAS supports — algorithms, actions, matching, backends, deployment modes, and performance optimizations.

Rate Limiting Algorithms

Seven battle-tested algorithms, all behind a unified Evaluator interface.

Fixed Window

Count requests within a fixed interval. Simple, cheap, easy to understand. Great for org-wide basic limits.
Trade-off: bursts at window boundaries. A client can send up to twice the limit in a short span that straddles two adjacent windows.

Sliding Window Log

Store the timestamp of every event and count those within a rolling window. Very accurate, exact enforcement.
Trade-off: memory-heavy, expensive at scale.

Sliding Window Counter

Approximate rolling window using sub-buckets and interpolation. Better fairness than fixed window.
Trade-off: the count is an estimate, and the bucket interpolation adds implementation complexity.

Token Bucket

Tokens refill over time; each request consumes tokens. Industry standard for APIs, supports bursts well.
Trade-off: requires refill arithmetic and atomic updates to the token count.

Leaky Bucket

Queue requests into a bucket drained at a steady rate. Smooth traffic shaping for egress.
Trade-off: less intuitive than token bucket.

Concurrency Limiter

Limit simultaneously in-flight operations. Protect heavy dependencies and expensive tasks.
Trade-off: requires acquire/release lifecycle.
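The acquire/release lifecycle can be illustrated with a buffered channel used as a semaphore. This is a sketch under the assumption of an in-process limiter; ConcurrencyLimiter and TryAcquire are hypothetical names and do not describe the RLAAS API.

```go
package main

import (
	"fmt"
	"sync"
)

// ConcurrencyLimiter is a hypothetical in-process limiter on in-flight work.
type ConcurrencyLimiter struct {
	slots chan struct{}
}

// NewConcurrencyLimiter allows at most max operations in flight.
func NewConcurrencyLimiter(max int) *ConcurrencyLimiter {
	return &ConcurrencyLimiter{slots: make(chan struct{}, max)}
}

// TryAcquire claims a slot without blocking. The returned release func
// frees the slot; calling it more than once has no further effect.
func (c *ConcurrencyLimiter) TryAcquire() (release func(), ok bool) {
	select {
	case c.slots <- struct{}{}:
		var once sync.Once
		return func() { once.Do(func() { <-c.slots }) }, true
	default:
		return nil, false
	}
}

func main() {
	lim := NewConcurrencyLimiter(2)
	release, ok1 := lim.TryAcquire()
	_, ok2 := lim.TryAcquire()
	_, ok3 := lim.TryAcquire()
	fmt.Println(ok1, ok2, ok3) // true true false

	release() // the first operation finished
	_, ok4 := lim.TryAcquire()
	fmt.Println(ok4) // true
}
```

Callers usually pair the acquire with a deferred release so that a slot is returned even when the protected operation fails.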

Quota / Budget Limiter

Long-window budget (per-day/week/month). Perfect for SaaS plans and telemetry budgets.
Trade-off: not enough for short-burst protection alone.


Action Types

Go beyond allow/deny. RLAAS supports eight actions per policy:

Action            | Description                                           | Example
Allow             | Request passes without modification                   | Within limits
Deny              | Reject entirely: HTTP 429 or gRPC RESOURCE_EXHAUSTED  | Rate exceeded
Delay             | Allow after a configurable wait period                | Egress calls, background jobs
Sample            | Allow only a fraction of requests/events              | Keep 10% of debug logs
Drop              | Discard event without processing                      | Low-value debug telemetry
Downgrade         | Reduce priority or transform handling                 | Standard pipeline instead of premium
Drop Low Priority | Preserve high-value events, drop low-priority ones    | Mixed-priority event streams
Shadow Only       | Record decision without enforcing                     | Pre-rollout dry run
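To make the Sample action concrete, here is one possible way to keep a fixed fraction of events. It uses a deterministic integer accumulator so that exactly the configured percentage is kept; whether RLAAS samples this way or randomly is not stated above, so treat the Sampler type as a hypothetical illustration.

```go
package main

import "fmt"

// Sampler is a hypothetical deterministic sampler that keeps percent% of events.
type Sampler struct {
	percent int // share of events to keep, 0..100
	acc     int // running accumulator
}

// Keep reports whether the current event should be kept.
func (s *Sampler) Keep() bool {
	s.acc += s.percent
	if s.acc >= 100 {
		s.acc -= 100
		return true
	}
	return false
}

func main() {
	s := &Sampler{percent: 10} // keep 10% of debug logs
	kept := 0
	for i := 0; i < 1000; i++ {
		if s.Keep() {
			kept++
		}
	}
	fmt.Println(kept) // 100
}
```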

Policy Matching Dimensions

Match on 20+ dimensions for fine-grained control. Every field is optional — use as many or as few as your use case requires.

org_id, tenant_id, application, service, environment, signal_type, operation, endpoint, method, user_id, api_key, client_id, source_ip, region, resource, severity, span_name, topic, consumer_group, job_type, tags (key=value)

Precedence Order (most → least specific)

  1. User-level override
  2. API key / client override
  3. Endpoint + method
  4. Operation
  5. Service
  6. Application
  7. Tenant
  8. Org
  9. Signal type
  10. Global default
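The combination of optional match fields and the precedence order above can be sketched as a two-step selection: keep the policies whose constrained dimensions all match the request, then pick the one highest in the precedence list. The Policy struct, the numeric Level field, and the Select function are assumptions made for this illustration and do not reflect the RLAAS domain model.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified policy record.
type Policy struct {
	Name  string
	Level int               // position in the precedence list: 1 = user-level override, 10 = global default
	Match map[string]string // only the dimensions this policy constrains
}

// matches reports whether every dimension the policy sets equals the request's value.
func matches(p Policy, req map[string]string) bool {
	for dim, want := range p.Match {
		if req[dim] != want {
			return false
		}
	}
	return true // unset dimensions match anything
}

// Select returns the matching policy with the lowest Level (most specific).
func Select(policies []Policy, req map[string]string) (Policy, bool) {
	var best Policy
	found := false
	for _, p := range policies {
		if !matches(p, req) {
			continue
		}
		if !found || p.Level < best.Level {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	policies := []Policy{
		{Name: "global-default", Level: 10},
		{Name: "payments-service", Level: 5, Match: map[string]string{"service": "payments"}},
		{Name: "vip-user", Level: 1, Match: map[string]string{"user_id": "u-42"}},
	}
	p, _ := Select(policies, map[string]string{"service": "payments", "user_id": "u-42"})
	fmt.Println(p.Name) // vip-user
	p, _ = Select(policies, map[string]string{"service": "payments", "user_id": "u-7"})
	fmt.Println(p.Name) // payments-service
}
```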

Advanced match_expr Expressions

For complex conditions, use match_expr in policy metadata:

match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"

Supports the == and != operators, combined with && (logical AND). Fields include all scope dimensions, plus a tag. prefix for tag lookups.
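The grammar described here is small enough to evaluate with a few string operations. The following sketch assumes that the caller has flattened request fields and tags into a single map, with tags stored under tag.-prefixed keys; EvalMatchExpr is a hypothetical helper, not the RLAAS parser.

```go
package main

import (
	"fmt"
	"strings"
)

// EvalMatchExpr evaluates clauses of the form field==value or field!=value
// joined by &&. It is a hypothetical sketch of the grammar, not RLAAS code.
func EvalMatchExpr(expr string, fields map[string]string) (bool, error) {
	for _, clause := range strings.Split(expr, "&&") {
		var key, want string
		negate := false
		if k, v, ok := strings.Cut(clause, "!="); ok {
			key, want, negate = k, v, true
		} else if k, v, ok := strings.Cut(clause, "=="); ok {
			key, want = k, v
		} else {
			return false, fmt.Errorf("invalid clause %q", strings.TrimSpace(clause))
		}
		key, want = strings.TrimSpace(key), strings.TrimSpace(want)
		// A == clause fails when the values differ; a != clause fails when they are equal.
		if (fields[key] == want) == negate {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	expr := "region==us-east-1 && tag.env==production && method!=DELETE"
	req := map[string]string{"region": "us-east-1", "tag.env": "production", "method": "GET"}
	ok, err := EvalMatchExpr(expr, req)
	fmt.Println(ok, err) // true <nil>

	req["method"] = "DELETE"
	ok, err = EvalMatchExpr(expr, req)
	fmt.Println(ok, err) // false <nil>
}
```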


Deployment Models

Mode A: Embedded Go SDK

Import as a Go library. Sub-millisecond decisions, no network hop. Best for Go services, OTEL collectors, latency-sensitive workloads.

Mode B: Centralized HTTP/gRPC

Language-agnostic decision service. Centralized governance and telemetry. Best for polyglot environments and easy version management.

Mode C: Sidecar / Agent

Local proxy in Kubernetes. Low latency with central governance. Local caching + invalidation sync. Best for K8s and team-wide standardization.

Mode D: Hybrid (Recommended)

All three simultaneously. SDK for Go, centralized for non-Go, sidecar for K8s. Maximum flexibility and future extensibility.


Backend Support

Counter Stores (Hot Path)

Backend             | Status    | Best For
In-Memory (Sharded) | Available | Local/single-instance, fallback
Redis               | Available | Distributed, multi-node
PostgreSQL          | Roadmap   | Low-volume, compliance
Oracle              | Roadmap   | Enterprise, compliance

Policy Stores (Config)

Backend        | Status          | Best For
File (JSON)    | Available       | Local dev, testing
PostgreSQL     | Roadmap         | Production persistence
Oracle         | Roadmap         | Enterprise persistence
Custom Adapter | Interface Ready | Legacy table migration

Performance & Optimizations

Benchmark Results

All benchmarks were run with go test ./benchmarks -bench . -benchmem. The in-memory store increment paths make zero heap allocations.

Operation                            | Latency         | Allocations
Memory store increment (single key)  | ~105–162 ns/op  | 0 allocs
Memory store increment (many keys)   | ~6–7 ns/op      | 0 allocs
Fixed window evaluation              | ~1,600 ns/op    | 16 allocs
HTTP /v1/check handler               | ~10,000 ns/op   | not reported
HTTP acquire+release                 | ~16,500 ns/op   | not reported

Optimizations Implemented


Control Plane Features

Policy CRUD

Create, read, update, delete policies via REST API. Full validation before persistence.

Audit Trail

Every policy change recorded with action type, old/new values, timestamp. Query via API.

Version History

Complete version history for each policy. Rollback to any previous version with one API call.

Gradual Rollout

Set rollout_percent from 0–100. Shadow mode for dry-run. Progressive enforcement.
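A common way to implement percentage rollouts is to hash a stable key into one of 100 buckets and enforce the policy only for buckets below rollout_percent, so the same caller gets a consistent answer and raising the percentage only ever adds callers. The sketch below assumes that approach; the InRollout helper and the choice of FNV-1a are illustrative and not confirmed RLAAS behavior.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// InRollout reports whether key falls inside the enforced share of traffic.
// Hypothetical helper: the hashing scheme is an assumption for illustration.
func InRollout(key string, rolloutPercent int) bool {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	bucket := int(h.Sum32() % 100)
	return bucket < rolloutPercent
}

func main() {
	enforced := 0
	for i := 0; i < 1000; i++ {
		if InRollout(fmt.Sprintf("user-%d", i), 25) {
			enforced++
		}
	}
	// Roughly a quarter of the keys; the exact count depends on the hash.
	fmt.Println("enforced for", enforced, "of 1000 keys")
}
```

Keys outside the enforced share could still be evaluated in shadow mode, so the decision is recorded without being applied.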

Policy Validation

Validate policy definitions before deployment to catch misconfigurations early.

Analytics Summary

Event and tag aggregation with optional top-N. Understand traffic patterns and decision distribution.


Advanced Capabilities

OTEL Processor Primitives

Batch-process logs and spans through rate-limiting policies. Worker pool with configurable concurrency. Supports fail-open and fail-closed modes. Collect processor stats (checked, allowed, denied, errors).
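The interplay of the worker pool, the fail-open/fail-closed switch, and the stats counters might look roughly like the sketch below. Everything here is hypothetical (ProcessBatch, Stats, CheckFunc, and the choice to count an errored item as allowed or denied depending on the mode); it is not the RLAAS processor API.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Stats mirrors the counters named above (checked, allowed, denied, errors).
type Stats struct{ Checked, Allowed, Denied, Errors int }

// CheckFunc decides whether one item passes; it stands in for a policy evaluation.
type CheckFunc func(item string) (bool, error)

var errBackend = errors.New("backend unavailable")

// demoCheck denies "b" and fails on "err"; everything else is allowed.
func demoCheck(item string) (bool, error) {
	switch item {
	case "err":
		return false, errBackend
	case "b":
		return false, nil
	}
	return true, nil
}

// ProcessBatch runs check over items with a bounded worker pool and returns
// the kept items in their original order.
func ProcessBatch(items []string, workers int, failOpen bool, check CheckFunc) ([]string, Stats) {
	if workers < 1 {
		workers = 1
	}
	kept := make([]bool, len(items))
	var st Stats
	var mu sync.Mutex
	var wg sync.WaitGroup
	jobs := make(chan int)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				ok, err := check(items[i])
				mu.Lock()
				st.Checked++
				if err != nil {
					st.Errors++
					ok = failOpen // fail-open keeps the item, fail-closed drops it
				}
				if ok {
					st.Allowed++
					kept[i] = true
				} else {
					st.Denied++
				}
				mu.Unlock()
			}
		}()
	}
	for i := range items {
		jobs <- i
	}
	close(jobs)
	wg.Wait()

	var out []string
	for i, k := range kept {
		if k {
			out = append(out, items[i])
		}
	}
	return out, st
}

func main() {
	batch := []string{"a", "b", "err", "c"}
	out, st := ProcessBatch(batch, 2, true, demoCheck)
	fmt.Println(out, st) // [a err c] {4 3 1 1}
	out, st = ProcessBatch(batch, 2, false, demoCheck)
	fmt.Println(out, st) // [a c] {4 2 2 1}
}
```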

Multi-Region Allocation

Deploy rate limiting across multiple geographic regions with intelligent limit distribution and overflow protection.

Weighted Proportional Split

Distribute a global limit across regions based on configurable weights. A region with weight 5 gets 5× the allocation of a region with weight 1. Remainder correction ensures the allocations sum to exactly the global limit.
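The arithmetic behind a weighted split with remainder correction can be sketched as follows: give each region the floor of its proportional share, then hand out the few leftover units so the total matches the global limit. This is an illustration only; the allocate function and local RegionWeight type are hypothetical, and the rule for who receives the leftover units (input order here) is an assumption.

```go
package main

import "fmt"

// RegionWeight is a local stand-in type for this sketch.
type RegionWeight struct {
	Region string
	Weight int64
}

// allocate splits global across regions in proportion to their weights.
func allocate(global int64, weights []RegionWeight) map[string]int64 {
	out := make(map[string]int64, len(weights))
	var total int64
	for _, w := range weights {
		total += w.Weight
	}
	if total <= 0 {
		return out
	}
	var assigned int64
	for _, w := range weights {
		share := global * w.Weight / total // integer division rounds down
		out[w.Region] = share
		assigned += share
	}
	// Remainder correction: rounding down can leave a few units unassigned.
	// Give them out one at a time, in input order, to weighted regions.
	for i := 0; assigned < global; i = (i + 1) % len(weights) {
		if weights[i].Weight > 0 {
			out[weights[i].Region]++
			assigned++
		}
	}
	return out
}

func main() {
	a := allocate(10000, []RegionWeight{{"us-east-1", 5}, {"eu-west-1", 3}, {"ap-south-1", 2}})
	fmt.Println(a["us-east-1"], a["eu-west-1"], a["ap-south-1"]) // 5000 3000 2000

	// Equal weights that do not divide evenly: the leftover unit goes to the first region.
	b := allocate(100, []RegionWeight{{"r1", 1}, {"r2", 1}, {"r3", 1}})
	fmt.Println(b["r1"], b["r2"], b["r3"]) // 34 33 33
}
```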

Overflow Detection

Compare real-time per-region usage against allocated limits. Instantly identify which regions have exceeded their allocation and by how much — enabling rebalancing or alerting.

Region-Scoped Policies

Use the region scope dimension or match_expr to create policies that apply only in specific regions (e.g., region==us-east-1).

How It Works

Define region weights and call AllocateGlobalLimit:

// Global limit: 10,000 req/min split across 3 regions
weights := []region.RegionWeight{
    {Region: "us-east-1", Weight: 5},
    {Region: "eu-west-1", Weight: 3},
    {Region: "ap-south-1", Weight: 2},
}

allocations := region.AllocateGlobalLimit(10000, weights)
// → us-east-1: 5000, eu-west-1: 3000, ap-south-1: 2000

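// Check for overflow. currentUsage holds the observed per-region usage,
// here with us-east-1 at 5,250 against its allocation of 5,000.
overflow := region.RegionalOverflow(currentUsage, allocations)
// → map["us-east-1": 250]  (exceeded by 250)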

Approximate Global, Exact Regional

Each region enforces its allocated limit locally with exact precision. Global enforcement is approximate (the sum of the regional allocations), which is the practical trade-off for low-latency distributed rate limiting.

Non-Go Client SDKs

Full-featured HTTP client SDKs for Python, TypeScript, Java, and .NET. All support check, acquire/release, policy CRUD, validate/rollout/rollback, audit/versions, and analytics.

Sidecar Invalidation

In-process broker with async push fanout to sidecars. Bounded workers prevent HTTP request storms. Sidecars coalesce burst invalidations into single sync operations.
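Coalescing a burst of invalidations into a single sync is often done with a one-slot signal: the first notification marks the sidecar as dirty, and further notifications that arrive before the sync runs are absorbed. The sketch below shows that pattern with a buffered channel; the Coalescer type and its methods are hypothetical and are not taken from the RLAAS sidecar.

```go
package main

import "fmt"

// Coalescer is a hypothetical one-slot "sync needed" signal.
type Coalescer struct {
	pending chan struct{}
}

func NewCoalescer() *Coalescer {
	return &Coalescer{pending: make(chan struct{}, 1)}
}

// Notify records that a sync is needed. It never blocks: if a sync is
// already pending, the extra notification is absorbed.
func (c *Coalescer) Notify() {
	select {
	case c.pending <- struct{}{}:
	default:
	}
}

// Drain reports whether a sync is pending and clears the signal.
func (c *Coalescer) Drain() bool {
	select {
	case <-c.pending:
		return true
	default:
		return false
	}
}

func main() {
	c := NewCoalescer()
	for i := 0; i < 100; i++ {
		c.Notify() // a burst of 100 invalidations
	}
	syncs := 0
	for c.Drain() {
		syncs++ // a real sidecar would refetch policies here
	}
	fmt.Println(syncs) // 1
}
```

In a long-running sidecar, a background goroutine would typically receive from the channel in a loop instead of polling with Drain.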
