RLAAS Design Document
Version 1.0 — Golang-first, hybrid deployment, multi-tenant, extensible
1. Executive Summary
This document defines the architecture for RLAAS (Rate Limiting as a Service) — a platform designed for broad applicability across HTTP APIs, gRPC services, OTEL signals, event streams, background jobs, external integrations, authentication flows, and custom business operations.
The recommended model is a hybrid architecture:
- Embedded SDK/library mode for low-latency local decisions.
- Centralized control plane for policy management, auditing, rollout, governance, and analytics.
- Optional centralized decision service for non-Go clients or service-to-service integration.
- Optional sidecar/agent mode for Kubernetes and polyglot environments.
Treat rate limiting as a generic policy decision engine, not as a DB-specific algorithm runner.
2. Goals
Functional Goals
- Support rate limiting for any signal or workload type
- Support per-tenant, per-org, per-application, per-service, per-user, per-endpoint, and custom dimension-based policies
- Support 7 algorithms: fixed window, sliding window log, sliding window counter, token bucket, leaky bucket, concurrency limiter, quota/budget limiter
- Support 8 action types: allow, deny, delay, sample, drop, downgrade, drop-low-priority, shadow-only
- Support multiple integration modes: Go library, gRPC, HTTP, sidecar, middleware, OTEL processors
- Support policy persistence in PostgreSQL, Oracle, file-based stores
- Support hot-path counters in: in-memory (sharded), Redis, optional DB-backed low-volume mode
- Support multi-tenant policy separation and override hierarchy
Non-Functional Goals
- Low latency for hot-path decisions (sub-millisecond local, low-ms distributed)
- High concurrency support with lock-sharded counters
- Horizontally scalable control plane and decision plane
- Pluggable, testable, interface-driven design
- Safe rollout with shadow mode and progressive enforcement
- Observable, auditable, and production-friendly
Non-Goals (v1)
- Full UI implementation
- Billing engine
- Cross-region strong consistency for all counters
- ML-based adaptive throttling
- Distributed consensus-based exact limit enforcement across all regions
3. High-Level Architecture
The platform has four major planes/components:
Embedded Data Plane (SDK)
Load & cache policies, build evaluation context, match policies, execute algorithm, query/update counter backend, return decision, expose middleware.
Central Control Plane
Policy CRUD, tenant management, audit history, versioning, rollouts/canaries/shadow mode, analytics metadata, publishing config changes.
Central Decision Service
Expose gRPC/HTTP APIs for rate-limit checks. Execute the same policy engine logic as SDK mode. Share reusable engine packages.
Sidecar / Agent Mode
Provide local endpoint for app containers. Cache policies locally. Reduce remote decision latency. Bridge between app and centralized services.
4. Deployment Modes
Mode A — Embedded Go Library
Applications import the Go package and evaluate decisions locally. Best for internal Go ecosystem, OTEL collectors, latency-sensitive services. No network hop, fast hot path.
Mode B — Centralized Decision Service
Applications call a shared service over gRPC or HTTP. Best for polyglot environments, centralized governance, unified enforcement. Language agnostic, single version to manage.
Mode C — Sidecar / Local Agent
Applications call a local sidecar. Best for Kubernetes workloads, team-wide standardization, local caching with central governance.
Mode D — Hybrid (Recommended)
All three modes simultaneously. SDK for Go services, centralized service for non-Go apps, sidecar for K8s workloads. This is the recommended long-term approach.
5. Core Architecture Principles
- Policy storage and counter storage must be separate concerns
- The DB is suitable for policy persistence but is not ideal as the primary hot-path counter store
- Counters should primarily live in memory or Redis for scale and latency
- All rate limiting decisions must be driven by a canonical internal policy model
- Custom org tables should be handled through adapters, not baked into core logic
- The engine must return a rich decision object, not only a boolean
- Policy evaluation must support multiple dimensions and precedence rules
- Every policy should optionally operate in shadow mode before enforcement
- Integration adapters should be first-class citizens
- Failure strategy must be explicit and configurable
6. Supported Use Cases
HTTP / REST Ingress
Per IP, API key, org, endpoint, method, authenticated user, route group
gRPC
Per method, service, tenant, client identity
HTTP / gRPC Egress
Per partner API, destination, integration type, operation class — protect downstream
OpenTelemetry — Logs
Per org, service, log level, environment, attribute set
OpenTelemetry — Traces
Per service, span name, tenant, operation — drop noisy spans, sample selectively
Event / Messaging
Per topic, consumer group, event type, tenant, producer/consumer
Background Jobs
Per job type, org, workflow step, time window
Auth / Abuse Prevention
Login attempts, OTP generation, password reset, device registration, email verification
Business Use Cases
Invoice generation, report export, file upload/download, premium plan quotas, feature throttling
7. Canonical Domain Model
All policy sources are normalized into a single internal model. The canonical types are:
RequestContext
The caller provides a generic context for evaluation, with 20+ fields including org, tenant, service, operation, endpoint, method, user, API key, region, tags, and more.
type RequestContext struct {
    RequestID     string
    OrgID         string
    TenantID      string
    Application   string
    Service       string
    Environment   string
    SignalType    string // http, grpc, log, trace, span, event, auth, job, custom
    Operation     string
    Endpoint      string
    Method        string
    UserID        string
    APIKey        string
    ClientID      string
    SourceIP      string
    Region        string
    Resource      string
    Severity      string
    SpanName      string
    Topic         string
    ConsumerGroup string
    JobType       string
    Quantity      int64
    Priority      string
    Timestamp     time.Time
    Tags          map[string]string
    Attributes    map[string]string
}
Decision
The engine returns a rich decision object — not just a boolean.
type Decision struct {
    Allowed         bool
    Action          ActionType // allow, deny, delay, sample, drop, ...
    Reason          string
    MatchedPolicyID string
    Remaining       int64
    RetryAfter      time.Duration
    DelayFor        time.Duration
    SampleRate      float64
    ShadowMode      bool
    Metadata        map[string]string
}
Policy
Policies contain scope matching, algorithm configuration, action, failure mode, enforcement mode, rollout percentage, validity windows, and metadata.
type Policy struct {
    PolicyID        string
    Name            string
    Enabled         bool
    Priority        int
    Scope           PolicyScope
    Algorithm       AlgorithmConfig
    Action          ActionType
    FailureMode     FailureMode     // fail_open, fail_closed
    EnforcementMode EnforcementMode // enforce, shadow
    RolloutPercent  int
    ValidFromUnix   int64
    ValidToUnix     int64
    Metadata        map[string]string
}
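To show how the canonical types fit together, here is a minimal, self-contained sketch: a toy evaluate function matches a trimmed-down Policy against a RequestContext and builds a Decision. The struct fields are heavily simplified and the matching logic (org-only scope, a caller-supplied counter value) is hypothetical; the real engine matches on many dimensions and reads the counter store itself.

```go
package main

import (
	"fmt"
	"time"
)

// Trimmed-down versions of the canonical types, for illustration only.
type RequestContext struct {
	OrgID      string
	Service    string
	SignalType string
	Quantity   int64
	Timestamp  time.Time
}

type Decision struct {
	Allowed         bool
	Reason          string
	MatchedPolicyID string
	ShadowMode      bool
}

type Policy struct {
	PolicyID        string
	Enabled         bool
	OrgID           string // simplified scope: match on org only
	LimitPerWindow  int64
	EnforcementMode string // "enforce" or "shadow"
}

// evaluate is a toy engine: find the first enabled policy whose scope
// matches, compare a counter value against the limit, and build a
// rich Decision rather than a bare boolean.
func evaluate(policies []Policy, ctx RequestContext, used int64) Decision {
	for _, pol := range policies {
		if !pol.Enabled || pol.OrgID != ctx.OrgID {
			continue
		}
		over := used+ctx.Quantity > pol.LimitPerWindow
		shadow := pol.EnforcementMode == "shadow"
		reason := "within limit"
		if over {
			reason = "limit exceeded"
		}
		return Decision{
			// In shadow mode the request is always allowed; the
			// hypothetical outcome is recorded, not enforced.
			Allowed:         !over || shadow,
			Reason:          reason,
			MatchedPolicyID: pol.PolicyID,
			ShadowMode:      shadow,
		}
	}
	return Decision{Allowed: true, Reason: "no matching policy"}
}

func main() {
	policies := []Policy{{PolicyID: "p1", Enabled: true, OrgID: "acme", LimitPerWindow: 100, EnforcementMode: "enforce"}}
	ctx := RequestContext{OrgID: "acme", Service: "payments", SignalType: "http", Quantity: 1, Timestamp: time.Now()}
	fmt.Println(evaluate(policies, ctx, 99).Allowed)  // true: within limit
	fmt.Println(evaluate(policies, ctx, 100).Allowed) // false: over limit
}
```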
8. Algorithms in Depth
Fixed Window
Count requests within a fixed interval (e.g., 100 req/min). Simple, cheap, easy to understand. Cons: burst issue at window boundaries. Best for org-wide basic limits.
Sliding Window Log
Store timestamps of all events and count within the rolling window. Very accurate but memory-heavy. Best for security-sensitive exact checks, lower-volume workflows.
Sliding Window Counter
Approximate rolling window using sub-buckets and interpolation. Better scalability than log, better fairness than fixed. Best for APIs, OTEL signals, distributed workloads.
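The interpolation behind the sliding window counter can be stated in a few lines: weight the previous fixed window's count by the fraction of it that still overlaps the rolling window, then add the current window's count. A minimal sketch (function name and two-bucket simplification are illustrative; a production version tracks sub-buckets per key):

```go
package main

import "fmt"

// slidingCount approximates the number of events in the rolling window.
// elapsedFrac is how far we are into the current fixed window (0..1),
// so (1 - elapsedFrac) of the previous window still overlaps.
func slidingCount(prev, curr int64, elapsedFrac float64) float64 {
	return float64(prev)*(1-elapsedFrac) + float64(curr)
}

func main() {
	// 40% into the current window: 60% of the previous window still counts.
	fmt.Println(slidingCount(100, 30, 0.4)) // ≈ 90 (0.6*100 + 30)
}
```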
Token Bucket
Tokens refill over time; each request consumes tokens. Supports bursts well, industry standard for APIs. Best for REST/gRPC throttling, downstream API protection.
Leaky Bucket
Queue requests into a bucket drained at steady rate. Smooth shaping, good for egress control. Best for smoothing event bursts, outbound traffic.
Concurrency Limiter
Limit simultaneously in-flight operations. Requires acquire/release lifecycle. Best for DB-heavy operations, file processing, outbound dependency control.
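The acquire/release lifecycle can be sketched with a buffered channel used as a semaphore. This is a local, single-process illustration (names are assumptions, not the engine's API); the distributed version would use the lease methods on the counter store.

```go
package main

import "fmt"

// ConcurrencyLimiter caps in-flight operations using a buffered channel
// as a semaphore. TryAcquire is non-blocking; Release returns a slot.
type ConcurrencyLimiter struct{ slots chan struct{} }

func NewConcurrencyLimiter(limit int) *ConcurrencyLimiter {
	return &ConcurrencyLimiter{slots: make(chan struct{}, limit)}
}

func (l *ConcurrencyLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false // all slots in use
	}
}

func (l *ConcurrencyLimiter) Release() { <-l.slots }

func main() {
	lim := NewConcurrencyLimiter(2)
	fmt.Println(lim.TryAcquire()) // true
	fmt.Println(lim.TryAcquire()) // true
	fmt.Println(lim.TryAcquire()) // false: both slots in use
	lim.Release()
	fmt.Println(lim.TryAcquire()) // true again
}
```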
Quota / Budget Limiter
Long-window budget (per-day, per-month). Useful for SaaS plan enforcement, telemetry budgets (logs/day, traces/day, API calls/month).
All algorithms implement a common Evaluator interface, making it easy to swap or test them independently.
9. Supported Actions
| Action | Behavior | Example Use |
|---|---|---|
| Allow | Request passes without modification | Normal flow |
| Deny | Reject entirely (HTTP 429, gRPC RESOURCE_EXHAUSTED) | Rate exceeded |
| Delay | Allow after waiting | Egress calls, background jobs |
| Sample | Allow only a fraction | Keep 10% of debug logs |
| Drop | Discard without processing | Low-value debug telemetry |
| Downgrade | Reduce priority/transform handling | Standard pipeline instead of premium |
| Drop Low Priority | Preserve high-value, drop low-value | Mixed-priority event streams |
| Shadow Only | Record decision, do not enforce | Pre-rollout validation |
10. Counter Storage Strategy
In-Memory (Sharded)
Fastest option. 64 lock shards by default, FNV32a hash for shard selection. ~6 ns/op for multi-key contention. Best for local library mode, single-instance services, fallback.
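The shard-selection scheme described above (64 shards, FNV32a) can be sketched as follows; the type and method names here are illustrative, not the store's real API.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64

// ShardedCounters spreads keys across 64 independently locked maps so
// unrelated keys rarely contend on the same mutex.
type ShardedCounters struct {
	shards [numShards]struct {
		mu sync.Mutex
		m  map[string]int64
	}
}

func NewShardedCounters() *ShardedCounters {
	c := &ShardedCounters{}
	for i := range c.shards {
		c.shards[i].m = make(map[string]int64)
	}
	return c
}

// shardIndex picks a shard with an FNV32a hash of the key.
func shardIndex(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numShards
}

// Increment locks only the owning shard, not the whole store.
func (c *ShardedCounters) Increment(key string, delta int64) int64 {
	s := &c.shards[shardIndex(key)]
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] += delta
	return s.m[key]
}

func main() {
	c := NewShardedCounters()
	fmt.Println(c.Increment("rlaas:acme:http", 1)) // 1
	fmt.Println(c.Increment("rlaas:acme:http", 1)) // 2
}
```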
Redis
Distributed limits, shared counters, high throughput. Atomic ops, TTL support, Lua scripts. Best for multi-node deployments.
Database (Low Volume)
Optional PostgreSQL/Oracle counter mode for low-QPS, compliance-driven persistence. Higher latency, poor fit for hot path.
Counter Store Interface
type CounterStore interface {
    Increment(ctx context.Context, key string, value int64, ttl time.Duration) (int64, error)
    Get(ctx context.Context, key string) (int64, error)
    Set(ctx context.Context, key string, value int64, ttl time.Duration) error
    CompareAndSwap(ctx context.Context, key string, old, new int64, ttl time.Duration) (bool, error)
    Delete(ctx context.Context, key string) error
    AddTimestamp(ctx context.Context, key string, ts time.Time, ttl time.Duration) error
    CountAfter(ctx context.Context, key string, after time.Time) (int64, error)
    TrimBefore(ctx context.Context, key string, before time.Time) error
    AcquireLease(ctx context.Context, key string, limit int64, ttl time.Duration) (bool, int64, error)
    ReleaseLease(ctx context.Context, key string) error
}
11. Policy Storage Strategy
Supported policy backends: PostgreSQL, Oracle, file/JSON (local/dev), and custom adapters for legacy enterprise sources.
type PolicyStore interface {
    LoadPolicies(ctx context.Context, tenantOrOrg string) ([]model.Policy, error)
    GetPolicyByID(ctx context.Context, policyID string) (*model.Policy, error)
    UpsertPolicy(ctx context.Context, p model.Policy) error
    DeletePolicy(ctx context.Context, policyID string) error
    ListPolicies(ctx context.Context, filter map[string]string) ([]model.Policy, error)
}
Legacy or custom org-specific tables are supported through adapters that normalize data into the canonical Policy model — the core engine never couples directly to external schemas.
12. Failure Behavior
Fail Open
Allow traffic when backend is unavailable. Best for logs, traces, non-critical metrics, non-security paths.
Fail Closed
Deny when backend is unavailable. Best for abuse prevention, OTP generation, login throttling, external partner quotas.
Fail Degrade
Apply fallback local limits or downgrade action. Best for large distributed systems, best-effort high availability.
13. Shadow Mode & Safe Rollout
Every policy supports shadow mode — evaluate exactly as enforcement would, record the decision, but don't block/delay/drop.
Rollout Percentage
- 0% — disabled
- 10% — apply to 10% of matching requests
- 100% — fully active
Recommended Rollout Flow
- Create policy in shadow mode
- Validate metrics and hypothetical deny/drop rate
- Enable partial rollout (e.g. 10%)
- Gradually increase to full enforcement
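One reasonable way to implement the rollout percentage, sketched here as an assumption (the engine may use a different scheme): hash a stable identity together with the policy ID so a given caller lands in or out of the cohort consistently as the percentage ramps up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout deterministically places a request inside or outside the
// rollout cohort by hashing a stable identity (e.g. user or API key)
// with the policy ID, so the same caller always gets the same bucket.
func inRollout(policyID, identity string, percent int) bool {
	if percent <= 0 {
		return false
	}
	if percent >= 100 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(policyID))
	h.Write([]byte(identity))
	return h.Sum32()%100 < uint32(percent)
}

func main() {
	fmt.Println(inRollout("p1", "user-42", 100)) // true: fully active
	fmt.Println(inRollout("p1", "user-42", 0))   // false: disabled
	// Deterministic: the same identity always lands in the same bucket.
	fmt.Println(inRollout("p1", "user-42", 10) == inRollout("p1", "user-42", 10)) // true
}
```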
14. Multi-Tenancy Model
Policies are scoped to org, tenant, application, and service. The override hierarchy from least to most specific:
- Global defaults
- Org defaults
- Tenant-specific policies
- Application policies
- Service policies
- Endpoint / operation policies
- User / client overrides
All counter keys are namespaced to prevent collisions:
rlaas:{org}:{tenant}:{signal}:{service}:{operation}:{dimension_hash}
15. Caching Strategy
- Policy Cache: Every SDK/agent/service caches policies locally with TTL-based refresh and on-demand invalidation via pub/sub.
- Decision Cache (optional): Cache deterministic allow decisions for ultra-short duration where safe.
- Counter Local Cache: Suitable for local rate limiting, best-effort approximations, and resilience fallback.
16. Policy Matching Strategy
A request can match multiple policies. The engine determines the winner using precedence:
- User-level override
- API key / client override
- Endpoint + method
- Operation
- Service
- Application
- Tenant
- Org
- Signal type
- Global default
Tie Breakers
- Higher priority wins
- Narrower scope wins
- Newest policy version wins
- Deterministic final tie-break using policy ID
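The tie-break chain above can be expressed as a sort comparator. A sketch, with an assumed candidate type where a lower ScopeRank means a narrower scope:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate captures the fields the tie-break rules look at.
type candidate struct {
	PolicyID  string
	Priority  int
	ScopeRank int   // lower = narrower scope (0 = user override ... 9 = global)
	Version   int64 // higher = newer
}

// pickWinner orders matching policies by the documented tie-breakers:
// higher priority, then narrower scope, then newest version, then
// policy ID for a deterministic final tie-break.
func pickWinner(cs []candidate) candidate {
	sort.Slice(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Priority != b.Priority {
			return a.Priority > b.Priority
		}
		if a.ScopeRank != b.ScopeRank {
			return a.ScopeRank < b.ScopeRank
		}
		if a.Version != b.Version {
			return a.Version > b.Version
		}
		return a.PolicyID < b.PolicyID
	})
	return cs[0]
}

func main() {
	w := pickWinner([]candidate{
		{PolicyID: "org-default", Priority: 1, ScopeRank: 7, Version: 3},
		{PolicyID: "user-override", Priority: 1, ScopeRank: 0, Version: 1},
	})
	fmt.Println(w.PolicyID) // user-override: same priority, narrower scope wins
}
```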
Advanced Match Expressions
Policies support match_expr in metadata for compound conditions:
match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"
17. Key Construction
A standardized key builder creates deterministic, namespaced counter keys:
rlaas:org=acme:tenant=retail:signal=http:service=payments:endpoint=/v1/charge:method=POST:user=123
Rules: deterministic, stable ordering, namespaced, include only matched scope dimensions, hash large tag maps, safe for Redis and logs.
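The key-builder rules above can be sketched in a few lines: sort the matched dimension names so map iteration order never leaks into the key, and join them under the rlaas namespace. Function name and input shape are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildKey emits a deterministic, namespaced counter key from only the
// scope dimensions that actually matched, in stable sorted order.
func buildKey(dims map[string]string) string {
	names := make([]string, 0, len(dims))
	for k := range dims {
		names = append(names, k)
	}
	sort.Strings(names) // stable ordering regardless of map iteration
	parts := make([]string, 0, len(names)+1)
	parts = append(parts, "rlaas")
	for _, k := range names {
		parts = append(parts, k+"="+dims[k])
	}
	return strings.Join(parts, ":")
}

func main() {
	key := buildKey(map[string]string{
		"org": "acme", "tenant": "retail", "signal": "http",
	})
	fmt.Println(key) // rlaas:org=acme:signal=http:tenant=retail
}
```

Large tag maps would be hashed into a single dimension before this step, keeping keys short and safe for Redis and logs.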
18. Integration Adapters
HTTP Middleware
For net/http, Echo, Gin, Fiber, Chi. Build RequestContext, evaluate, return 429 on deny.
gRPC Interceptor
Unary and streaming interceptors for transparent rate-limit enforcement.
OTEL Processors
Log and span batch processing with worker pools, e.g. dropping or sampling noisy spans.
Message Consumer Wrapper
For Kafka, Pub/Sub, SQS, NATS — per-topic, per-consumer-group limiting.
Generic Business API
Direct evaluator interface for custom workflows: invoice gen, report export, file upload.
19. Observability
Metrics
Decision counts per policy and action, allow/deny/shadow rates, evaluation latency, counter backend errors and latency, policy cache hit rate.
Logging
Policy load failures, backend failures, shadow mode high-risk decisions, policy conflicts, invalid configurations.
Tracing
Policy fetch, match time, algorithm execution, Redis/DB round trips — all traceable.
20. Security & Governance
- Multi-Tenant Safety: Every policy API enforces org/tenant scoping. No cross-tenant reads/writes.
- Admin RBAC: Platform admin, org admin, read-only auditor, developer/operator roles.
- Auditability: Every policy change is recorded in the audit trail.
- Secret Handling: DB creds via secret manager/env. Redis auth secured. TLS for production gRPC/HTTP.
21. Performance Guidelines
Hot Path Expectations
- SDK decisions target sub-millisecond to low-millisecond local
- Sharded memory store: ~6 ns/op for multi-key workloads (0 allocs)
- Fixed window evaluation: ~1.6 μs/op
- HTTP check handler: ~10 μs/op
Performance Optimizations Implemented
- Lock-sharded counters: 64 shards with FNV32a hash, per-shard mutex
- Async invalidation: Bounded worker pool (up to 8 goroutines) with 256-item buffered queue
- Burst coalescing: Sidecar drains invalidation channel to avoid redundant syncs
- Zero-allocation hot path: No heap allocations in counter read/write operations
Consistency Trade-off
Exact local/regional limits where needed. Approximate global limits are acceptable in many cases. Per-policy consistency expectations should be documented.
22. Package Structure
rlaas/
├── cmd/
│ ├── rlaas-server/ # HTTP + gRPC server
│ └── rlaas-agent/ # Sidecar proxy agent
├── api/
│ └── proto/ # Protobuf definitions
├── internal/
│ ├── model/ # Canonical domain types
│ ├── engine/
│ │ ├── matcher/ # Policy matching + match_expr
│ │ ├── evaluator/ # Main evaluation engine
│ │ ├── rollout/ # Rollout percentage logic
│ │ └── decision/ # Decision builder
│ ├── algorithm/ # 7 algorithm implementations
│ ├── store/
│ │ ├── policy/ # File, PostgreSQL, Oracle stores
│ │ └── counter/ # Memory (sharded), Redis, DB stores
│ ├── adapter/
│ │ ├── http/ # HTTP middleware
│ │ ├── grpc/ # gRPC interceptor
│ │ └── otel/ # OTEL processor primitives
│ ├── region/ # Multi-region allocation
│ ├── key/ # Counter key builder
│ └── config/ # Configuration model
├── pkg/rlaas/ # Public Go SDK
├── sdk/
│ ├── python/ # Python SDK
│ ├── typescript/ # TypeScript SDK
│ ├── java/ # Java SDK
│ └── dotnet/ # .NET SDK
├── benchmarks/ # Performance benchmark suite
├── examples/ # Sample policies and configs
└── docs/ # This documentation site
23. Implementation Phases
Phase 1 — MVP ✅
- Go SDK/library with canonical policy model
- Counter stores: in-memory (sharded) + Redis
- Policy store: file-based JSON
- Algorithms: fixed window, token bucket, sliding window counter, concurrency, quota
- Actions: allow, deny, delay, sample, drop, shadow
- HTTP middleware + gRPC interceptor
- Decision + analytics endpoints
- Policy CRUD + audit + versions + rollout + rollback
Phase 2 — Extended ✅
- Centralized gRPC/HTTP decision service
- Sidecar/agent mode with invalidation sync
- OTEL processor primitives
- Multi-region allocation primitives
- Advanced match_expr support
- Non-Go SDKs: Python, TypeScript, Java, .NET
- Benchmark suite
- Performance optimizations (sharded counters, async dispatch, burst coalescing)
Phase 3 — Enterprise (In Progress)
- PostgreSQL policy & counter stores
- Oracle policy & counter stores
- Admin/operator UX and policy governance workflows
- Advanced analytics and dashboards
End of Design Document