RLAAS Design Document
Version 1.0 — Golang-first, hybrid deployment, multi-tenant, extensible
1. Executive Summary
This document defines the architecture for RLAAS (Rate Limiting as a Service) — a platform designed for broad applicability across HTTP APIs, gRPC services, OTEL signals, event streams, background jobs, external integrations, authentication flows, and custom business operations.
The recommended model is a hybrid architecture:
- Embedded SDK/library mode for low-latency local decisions.
- Centralized control plane for policy management, auditing, rollout, governance, and analytics.
- Optional centralized decision service for non-Go clients or service-to-service integration.
- Optional sidecar/agent mode for Kubernetes and polyglot environments.
Treat rate limiting as a generic policy decision engine, not as a DB-specific algorithm runner.
2. Goals
Functional Goals
- Support rate limiting for any signal or workload type
- Support per-tenant, per-org, per-application, per-service, per-user, per-endpoint, and custom dimension-based policies
- Support 7 algorithms: fixed window, sliding window log, sliding window counter, token bucket, leaky bucket, concurrency limiter, quota/budget limiter
- Support 8 action types: allow, deny, delay, sample, drop, downgrade, drop-low-priority, shadow-only
- Support multiple integration modes: Go library, gRPC, HTTP, sidecar, middleware, OTEL processors
- Support policy persistence in PostgreSQL, Oracle, file-based stores
- Support hot-path counters in: in-memory (sharded), Redis, optional DB-backed low-volume mode
- Support multi-tenant policy separation and override hierarchy
Non-Functional Goals
- Low latency for hot-path decisions (sub-millisecond local, low-ms distributed)
- High concurrency support with lock-sharded counters
- Horizontally scalable control plane and decision plane
- Pluggable, testable, interface-driven design
- Safe rollout with shadow mode and progressive enforcement
- Observable, auditable, and production-friendly
Non-Goals (v1)
- Full UI implementation
- Billing engine
- Cross-region strong consistency for all counters
- ML-based adaptive throttling
- Distributed consensus-based exact limit enforcement across all regions
3. High-Level Architecture
The platform has four major planes/components:
Embedded Data Plane (SDK)
Load & cache policies, build evaluation context, match policies, execute algorithm, query/update counter backend, return decision, expose middleware.
Central Control Plane
Policy CRUD, tenant management, audit history, versioning, rollouts/canaries/shadow mode, analytics metadata, publishing config changes.
Central Decision Service
Expose gRPC/HTTP APIs for rate-limit checks. Execute the same policy engine logic as SDK mode. Share reusable engine packages.
Sidecar / Agent Mode
Provide local endpoint for app containers. Cache policies locally. Reduce remote decision latency. Bridge between app and centralized services.
4. Deployment Modes
Mode A — Embedded Go Library
Applications import the Go package and evaluate decisions locally. Best for internal Go ecosystem, OTEL collectors, latency-sensitive services. No network hop, fast hot path.
Mode B — Centralized Decision Service
Applications call a shared service over gRPC or HTTP. Best for polyglot environments, centralized governance, unified enforcement. Language agnostic, single version to manage.
Mode C — Sidecar / Local Agent
Applications call a local sidecar. Best for Kubernetes workloads, team-wide standardization, local caching with central governance.
Mode D — Hybrid (Recommended)
All three modes simultaneously. SDK for Go services, centralized service for non-Go apps, sidecar for K8s workloads. This is the recommended long-term approach.
5. Core Architecture Principles
- Policy storage and counter storage must be separate concerns
- The DB is suitable for policy persistence but is not ideal as the primary hot-path counter store
- Counters should primarily live in memory or Redis for scale and latency
- All rate limiting decisions must be driven by a canonical internal policy model
- Custom org tables should be handled through adapters, not baked into core logic
- The engine must return a rich decision object, not only a boolean
- Policy evaluation must support multiple dimensions and precedence rules
- Every policy should optionally operate in shadow mode before enforcement
- Integration adapters should be first-class citizens
- Failure strategy must be explicit and configurable
6. Supported Use Cases
HTTP / REST Ingress
Per IP, API key, org, endpoint, method, authenticated user, route group
gRPC
Per method, service, tenant, client identity
HTTP / gRPC Egress
Per partner API, destination, integration type, operation class — protect downstream
OpenTelemetry — Logs
Per org, service, log level, environment, attribute set
OpenTelemetry — Traces
Per service, span name, tenant, operation — drop noisy spans, sample selectively
Event / Messaging
Per topic, consumer group, event type, tenant, producer/consumer
Background Jobs
Per job type, org, workflow step, time window
Auth / Abuse Prevention
Login attempts, OTP generation, password reset, device registration, email verification
Business Use Cases
Invoice generation, report export, file upload/download, premium plan quotas, feature throttling
7. Canonical Domain Model
All policy sources are normalized into a single internal model. The canonical types are:
RequestContext
The caller provides a generic context for evaluation, with 20+ fields including org, tenant, service, operation, endpoint, method, user, API key, region, tags, and more.
type RequestContext struct {
    RequestID     string
    OrgID         string
    TenantID      string
    Application   string
    Service       string
    Environment   string
    SignalType    string // http, grpc, log, trace, span, event, auth, job, custom
    Operation     string
    Endpoint      string
    Method        string
    UserID        string
    APIKey        string
    ClientID      string
    SourceIP      string
    Region        string
    Resource      string
    Severity      string
    SpanName      string
    Topic         string
    ConsumerGroup string
    JobType       string
    Quantity      int64
    Priority      string
    Timestamp     time.Time
    Tags          map[string]string
    Attributes    map[string]string
}
Decision
The engine returns a rich decision object — not just a boolean.
type Decision struct {
    Allowed         bool
    Action          ActionType // allow, deny, delay, sample, drop, ...
    Reason          string
    MatchedPolicyID string
    Remaining       int64
    RetryAfter      time.Duration
    DelayFor        time.Duration
    SampleRate      float64
    ShadowMode      bool
    Metadata        map[string]string
}
Policy
Policies contain scope matching, algorithm configuration, action, failure mode, enforcement mode, rollout percentage, validity windows, and metadata.
type Policy struct {
    PolicyID        string
    Name            string
    Enabled         bool
    Priority        int
    Scope           PolicyScope
    Algorithm       AlgorithmConfig
    Action          ActionType
    FailureMode     FailureMode     // fail_open, fail_closed
    EnforcementMode EnforcementMode // enforce, shadow
    RolloutPercent  int
    ValidFromUnix   int64
    ValidToUnix     int64
    Metadata        map[string]string
}
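To show how the canonical types fit together, here is a minimal, self-contained sketch: a toy evaluate function matches a trimmed-down Policy against a RequestContext and builds a Decision. The struct fields are heavily simplified and the matching logic (org-only scope, a caller-supplied counter value) is hypothetical; the real engine matches on many dimensions and reads the counter store itself.

```go
package main

import (
	"fmt"
	"time"
)

// Trimmed-down versions of the canonical types, for illustration only.
type RequestContext struct {
	OrgID      string
	Service    string
	SignalType string
	Quantity   int64
	Timestamp  time.Time
}

type Decision struct {
	Allowed         bool
	Reason          string
	MatchedPolicyID string
	ShadowMode      bool
}

type Policy struct {
	PolicyID        string
	Enabled         bool
	OrgID           string // simplified scope: match on org only
	LimitPerWindow  int64
	EnforcementMode string // "enforce" or "shadow"
}

// evaluate is a toy engine: find the first enabled policy whose scope
// matches, compare a counter value against the limit, and build a
// rich Decision rather than a bare boolean.
func evaluate(policies []Policy, ctx RequestContext, used int64) Decision {
	for _, pol := range policies {
		if !pol.Enabled || pol.OrgID != ctx.OrgID {
			continue
		}
		over := used+ctx.Quantity > pol.LimitPerWindow
		shadow := pol.EnforcementMode == "shadow"
		reason := "within limit"
		if over {
			reason = "limit exceeded"
		}
		return Decision{
			// In shadow mode the request is always allowed; the
			// hypothetical outcome is recorded, not enforced.
			Allowed:         !over || shadow,
			Reason:          reason,
			MatchedPolicyID: pol.PolicyID,
			ShadowMode:      shadow,
		}
	}
	return Decision{Allowed: true, Reason: "no matching policy"}
}

func main() {
	policies := []Policy{{PolicyID: "p1", Enabled: true, OrgID: "acme", LimitPerWindow: 100, EnforcementMode: "enforce"}}
	ctx := RequestContext{OrgID: "acme", Service: "payments", SignalType: "http", Quantity: 1, Timestamp: time.Now()}
	fmt.Println(evaluate(policies, ctx, 99).Allowed)  // true: within limit
	fmt.Println(evaluate(policies, ctx, 100).Allowed) // false: over limit
}
```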
8. Algorithms in Depth
Fixed Window
Count requests within a fixed interval (e.g., 100 req/min). Simple, cheap, easy to understand. Cons: burst issue at window boundaries. Best for org-wide basic limits.
Sliding Window Log
Store timestamps of all events and count within the rolling window. Very accurate but memory-heavy. Best for security-sensitive exact checks, lower-volume workflows.
Sliding Window Counter
Approximate rolling window using sub-buckets and interpolation. Better scalability than log, better fairness than fixed. Best for APIs, OTEL signals, distributed workloads.
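The interpolation behind the sliding window counter can be stated in a few lines: weight the previous fixed window's count by the fraction of it that still overlaps the rolling window, then add the current window's count. A minimal sketch (function name and two-bucket simplification are illustrative; a production version tracks sub-buckets per key):

```go
package main

import "fmt"

// slidingCount approximates the number of events in the rolling window.
// elapsedFrac is how far we are into the current fixed window (0..1),
// so (1 - elapsedFrac) of the previous window still overlaps.
func slidingCount(prev, curr int64, elapsedFrac float64) float64 {
	return float64(prev)*(1-elapsedFrac) + float64(curr)
}

func main() {
	// 40% into the current window: 60% of the previous window still counts.
	fmt.Println(slidingCount(100, 30, 0.4)) // ≈ 90 (0.6*100 + 30)
}
```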
Token Bucket
Tokens refill over time; each request consumes tokens. Supports bursts well, industry standard for APIs. Best for REST/gRPC throttling, downstream API protection.
Leaky Bucket
Queue requests into a bucket drained at steady rate. Smooth shaping, good for egress control. Best for smoothing event bursts, outbound traffic.
Concurrency Limiter
Limit simultaneously in-flight operations. Requires acquire/release lifecycle. Best for DB-heavy operations, file processing, outbound dependency control.
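The acquire/release lifecycle can be sketched with a buffered channel used as a semaphore. This is a local, single-process illustration (names are assumptions, not the engine's API); the distributed version would use the lease methods on the counter store.

```go
package main

import "fmt"

// ConcurrencyLimiter caps in-flight operations using a buffered channel
// as a semaphore. TryAcquire is non-blocking; Release returns a slot.
type ConcurrencyLimiter struct{ slots chan struct{} }

func NewConcurrencyLimiter(limit int) *ConcurrencyLimiter {
	return &ConcurrencyLimiter{slots: make(chan struct{}, limit)}
}

func (l *ConcurrencyLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false // all slots in use
	}
}

func (l *ConcurrencyLimiter) Release() { <-l.slots }

func main() {
	lim := NewConcurrencyLimiter(2)
	fmt.Println(lim.TryAcquire()) // true
	fmt.Println(lim.TryAcquire()) // true
	fmt.Println(lim.TryAcquire()) // false: both slots in use
	lim.Release()
	fmt.Println(lim.TryAcquire()) // true again
}
```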
Quota / Budget Limiter
Long-window budget (per-day, per-month). Useful for SaaS plan enforcement, telemetry budgets (logs/day, traces/day, API calls/month).
All algorithms implement a common Evaluator interface, making it easy to swap or test them independently.
9. Supported Actions
| Action | Behavior | Example Use |
|---|---|---|
| Allow | Request passes without modification | Normal flow |
| Deny | Reject entirely (HTTP 429, gRPC RESOURCE_EXHAUSTED) | Rate exceeded |
| Delay | Allow after waiting | Egress calls, background jobs |
| Sample | Allow only a fraction | Keep 10% of debug logs |
| Drop | Discard without processing | Low-value debug telemetry |
| Downgrade | Reduce priority/transform handling | Standard pipeline instead of premium |
| Drop Low Priority | Preserve high-value, drop low-value | Mixed-priority event streams |
| Shadow Only | Record decision, do not enforce | Pre-rollout validation |
10. Counter Storage Strategy
In-Memory (Sharded)
Fastest option. 64 lock shards by default, FNV32a hash for shard selection. ~6 ns/op for multi-key contention. Best for local library mode, single-instance services, fallback.
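The shard-selection scheme described above (64 shards, FNV32a) can be sketched as follows; the type and method names here are illustrative, not the store's real API.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64

// ShardedCounters spreads keys across 64 independently locked maps so
// unrelated keys rarely contend on the same mutex.
type ShardedCounters struct {
	shards [numShards]struct {
		mu sync.Mutex
		m  map[string]int64
	}
}

func NewShardedCounters() *ShardedCounters {
	c := &ShardedCounters{}
	for i := range c.shards {
		c.shards[i].m = make(map[string]int64)
	}
	return c
}

// shardIndex picks a shard with an FNV32a hash of the key.
func shardIndex(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numShards
}

// Increment locks only the owning shard, not the whole store.
func (c *ShardedCounters) Increment(key string, delta int64) int64 {
	s := &c.shards[shardIndex(key)]
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] += delta
	return s.m[key]
}

func main() {
	c := NewShardedCounters()
	fmt.Println(c.Increment("rlaas:acme:http", 1)) // 1
	fmt.Println(c.Increment("rlaas:acme:http", 1)) // 2
}
```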
Redis
Distributed limits, shared counters, high throughput. Atomic ops, TTL support, Lua scripts. Best for multi-node deployments.
Database (Low Volume)
Optional PostgreSQL/Oracle counter mode for low-QPS, compliance-driven persistence. Higher latency, poor fit for hot path.
Counter Store Interface
type CounterStore interface {
    Increment(ctx context.Context, key string, value int64, ttl time.Duration) (int64, error)
    Get(ctx context.Context, key string) (int64, error)
    Set(ctx context.Context, key string, value int64, ttl time.Duration) error
    CompareAndSwap(ctx context.Context, key string, old, new int64, ttl time.Duration) (bool, error)
    Delete(ctx context.Context, key string) error
    AddTimestamp(ctx context.Context, key string, ts time.Time, ttl time.Duration) error
    CountAfter(ctx context.Context, key string, after time.Time) (int64, error)
    TrimBefore(ctx context.Context, key string, before time.Time) error
    AcquireLease(ctx context.Context, key string, limit int64, ttl time.Duration) (bool, int64, error)
    ReleaseLease(ctx context.Context, key string) error
}
11. Policy Storage Strategy
Supported policy backends: PostgreSQL, Oracle, file/JSON (local/dev), and custom adapters for legacy enterprise sources.
type PolicyStore interface {
    LoadPolicies(ctx context.Context, tenantOrOrg string) ([]model.Policy, error)
    GetPolicyByID(ctx context.Context, policyID string) (*model.Policy, error)
    UpsertPolicy(ctx context.Context, p model.Policy) error
    DeletePolicy(ctx context.Context, policyID string) error
    ListPolicies(ctx context.Context, filter map[string]string) ([]model.Policy, error)
}
Legacy or custom org-specific tables are supported through adapters that normalize data into the canonical Policy model — the core engine never couples directly to external schemas.
12. Failure Behavior
Fail Open
Allow traffic when backend is unavailable. Best for logs, traces, non-critical metrics, non-security paths.
Fail Closed
Deny when backend is unavailable. Best for abuse prevention, OTP generation, login throttling, external partner quotas.
Fail Degrade
Apply fallback local limits or downgrade action. Best for large distributed systems, best-effort high availability.
13. Shadow Mode & Safe Rollout
Every policy supports shadow mode — evaluate exactly as enforcement would, record the decision, but don't block/delay/drop.
Rollout Percentage
- 0% — disabled
- 10% — apply to 10% of matching requests
- 100% — fully active
Recommended Rollout Flow
- Create policy in shadow mode
- Validate metrics and hypothetical deny/drop rate
- Enable partial rollout (e.g. 10%)
- Gradually increase to full enforcement
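One reasonable way to implement the rollout percentage, sketched here as an assumption (the engine may use a different scheme): hash a stable identity together with the policy ID so a given caller lands in or out of the cohort consistently as the percentage ramps up.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout deterministically places a request inside or outside the
// rollout cohort by hashing a stable identity (e.g. user or API key)
// with the policy ID, so the same caller always gets the same bucket.
func inRollout(policyID, identity string, percent int) bool {
	if percent <= 0 {
		return false
	}
	if percent >= 100 {
		return true
	}
	h := fnv.New32a()
	h.Write([]byte(policyID))
	h.Write([]byte(identity))
	return h.Sum32()%100 < uint32(percent)
}

func main() {
	fmt.Println(inRollout("p1", "user-42", 100)) // true: fully active
	fmt.Println(inRollout("p1", "user-42", 0))   // false: disabled
	// Deterministic: the same identity always lands in the same bucket.
	fmt.Println(inRollout("p1", "user-42", 10) == inRollout("p1", "user-42", 10)) // true
}
```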
14. Multi-Tenancy Model
Policies are scoped to org, tenant, application, and service. The override hierarchy from least to most specific:
- Global defaults
- Org defaults
- Tenant-specific policies
- Application policies
- Service policies
- Endpoint / operation policies
- User / client overrides
All counter keys are namespaced to prevent collisions:
rlaas:{org}:{tenant}:{signal}:{service}:{operation}:{dimension_hash}
15. Caching Strategy
- Policy Cache: Every SDK/agent/service caches policies locally with TTL-based refresh and on-demand invalidation via pub/sub.
- Decision Cache (optional): Cache deterministic allow decisions for ultra-short duration where safe.
- Counter Local Cache: Suitable for local rate limiting, best-effort approximations, and resilience fallback.
16. Policy Matching Strategy
A request can match multiple policies. The engine determines the winner using precedence:
- User-level override
- API key / client override
- Endpoint + method
- Operation
- Service
- Application
- Tenant
- Org
- Signal type
- Global default
Tie Breakers
- Higher priority wins
- Narrower scope wins
- Newest policy version wins
- Deterministic final tie-break using policy ID
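The tie-break chain above can be expressed as a sort comparator. A sketch, with an assumed candidate type where a lower ScopeRank means a narrower scope:

```go
package main

import (
	"fmt"
	"sort"
)

// candidate captures the fields the tie-break rules look at.
type candidate struct {
	PolicyID  string
	Priority  int
	ScopeRank int   // lower = narrower scope (0 = user override ... 9 = global)
	Version   int64 // higher = newer
}

// pickWinner orders matching policies by the documented tie-breakers:
// higher priority, then narrower scope, then newest version, then
// policy ID for a deterministic final tie-break.
func pickWinner(cs []candidate) candidate {
	sort.Slice(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Priority != b.Priority {
			return a.Priority > b.Priority
		}
		if a.ScopeRank != b.ScopeRank {
			return a.ScopeRank < b.ScopeRank
		}
		if a.Version != b.Version {
			return a.Version > b.Version
		}
		return a.PolicyID < b.PolicyID
	})
	return cs[0]
}

func main() {
	w := pickWinner([]candidate{
		{PolicyID: "org-default", Priority: 1, ScopeRank: 7, Version: 3},
		{PolicyID: "user-override", Priority: 1, ScopeRank: 0, Version: 1},
	})
	fmt.Println(w.PolicyID) // user-override: same priority, narrower scope wins
}
```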
Advanced Match Expressions
Policies support match_expr in metadata for compound conditions:
match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"
17. Key Construction
A standardized key builder creates deterministic, namespaced counter keys:
rlaas:org=acme:tenant=retail:signal=http:service=payments:endpoint=/v1/charge:method=POST:user=123
Rules: deterministic, stable ordering, namespaced, include only matched scope dimensions, hash large tag maps, safe for Redis and logs.
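The key-builder rules above can be sketched in a few lines: sort the matched dimension names so map iteration order never leaks into the key, and join them under the rlaas namespace. Function name and input shape are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildKey emits a deterministic, namespaced counter key from only the
// scope dimensions that actually matched, in stable sorted order.
func buildKey(dims map[string]string) string {
	names := make([]string, 0, len(dims))
	for k := range dims {
		names = append(names, k)
	}
	sort.Strings(names) // stable ordering regardless of map iteration
	parts := make([]string, 0, len(names)+1)
	parts = append(parts, "rlaas")
	for _, k := range names {
		parts = append(parts, k+"="+dims[k])
	}
	return strings.Join(parts, ":")
}

func main() {
	key := buildKey(map[string]string{
		"org": "acme", "tenant": "retail", "signal": "http",
	})
	fmt.Println(key) // rlaas:org=acme:signal=http:tenant=retail
}
```

Large tag maps would be hashed into a single dimension before this step, keeping keys short and safe for Redis and logs.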
18. Integration Adapters
HTTP Middleware
For net/http, Echo, Gin, Fiber, Chi. Build RequestContext, evaluate, return 429 on deny.
gRPC Interceptor
Unary and streaming interceptors for transparent rate-limit enforcement.
OTEL Processors
Log and span batch processing with worker pools, e.g. dropping or sampling noisy spans.
Message Consumer Wrapper
For Kafka, Pub/Sub, SQS, NATS — per-topic, per-consumer-group limiting.
Generic Business API
Direct evaluator interface for custom workflows: invoice gen, report export, file upload.
19. Observability
Metrics
Decision counts per policy and action, allow/deny/shadow rates, evaluation latency, counter backend errors and latency, policy cache hit rate.
Logging
Policy load failures, backend failures, shadow mode high-risk decisions, policy conflicts, invalid configurations.
Tracing
Policy fetch, match time, algorithm execution, Redis/DB round trips — all traceable.
20. Security & Governance
- Multi-Tenant Safety: Every policy API enforces org/tenant scoping. No cross-tenant reads/writes.
- Admin RBAC: Platform admin, org admin, read-only auditor, developer/operator roles.
- Auditability: Every policy change is recorded in the audit trail.
- Secret Handling: DB creds via secret manager/env. Redis auth secured. TLS for production gRPC/HTTP.
21. Performance Guidelines
Hot Path Expectations
- SDK decisions target sub-millisecond to low-millisecond local
- Sharded memory store: ~6 ns/op for multi-key workloads (0 allocs)
- Fixed window evaluation: ~1.6 μs/op
- HTTP check handler: ~10 μs/op
Performance Optimizations Implemented
- Lock-sharded counters: 64 shards with FNV32a hash, per-shard mutex
- Async invalidation: Bounded worker pool (up to 8 goroutines) with 256-item buffered queue
- Burst coalescing: Sidecar drains invalidation channel to avoid redundant syncs
- Zero-allocation hot path: No heap allocations in counter read/write operations
Consistency Trade-off
Exact local/regional limits where needed. Approximate global limits are acceptable in many cases. Per-policy consistency expectations should be documented.
22. Package Structure
rlaas/
├── cmd/
│ ├── rlaas-server/ # HTTP + gRPC server
│ └── rlaas-agent/ # Sidecar proxy agent
├── api/
│ └── proto/ # Protobuf definitions
├── internal/
│ ├── model/ # Canonical domain types
│ ├── engine/
│ │ ├── matcher/ # Policy matching + match_expr
│ │ ├── evaluator/ # Main evaluation engine
│ │ ├── rollout/ # Rollout percentage logic
│ │ └── decision/ # Decision builder
│ ├── algorithm/ # 7 algorithm implementations
│ ├── store/
│ │ ├── policy/ # File, PostgreSQL, Oracle stores
│ │ └── counter/ # Memory (sharded), Redis, DB stores
│ ├── adapter/
│ │ ├── http/ # HTTP middleware
│ │ ├── grpc/ # gRPC interceptor
│ │ └── otel/ # OTEL processor primitives
│ ├── region/ # Multi-region allocation
│ ├── key/ # Counter key builder
│ └── config/ # Configuration model
├── pkg/rlaas/ # Public Go SDK
├── sdk/
│ ├── python/ # Python SDK
│ ├── typescript/ # TypeScript SDK
│ ├── java/ # Java SDK
│ └── dotnet/ # .NET SDK
├── benchmarks/ # Performance benchmark suite
├── examples/ # Sample policies and configs
└── docs/ # This documentation site
23. Implementation Phases
Phase 1 — MVP ✅
- Go SDK/library with canonical policy model
- Counter stores: in-memory (sharded) + Redis
- Policy store: file-based JSON
- Algorithms: fixed window, token bucket, sliding window counter, concurrency, quota
- Actions: allow, deny, delay, sample, drop, shadow
- HTTP middleware + gRPC interceptor
- Decision + analytics endpoints
- Policy CRUD + audit + versions + rollout + rollback
Phase 2 — Extended ✅
- Centralized gRPC/HTTP decision service
- Sidecar/agent mode with invalidation sync
- OTEL processor primitives
- Multi-region allocation primitives
- Advanced match_expr support
- Non-Go SDKs: Python, TypeScript, Java, .NET
- Benchmark suite
- Performance optimizations (sharded counters, async dispatch, burst coalescing)
Phase 3 — Enterprise (In Progress)
- PostgreSQL policy & counter stores
- Oracle policy & counter stores
- Admin/operator UX and policy governance workflows
- Advanced analytics and dashboards
End of Design Document