RLAAS Design Document

Version 1.0 — Golang-first, hybrid deployment, multi-tenant, extensible

1. Executive Summary

This document defines the architecture for RLAAS (Rate Limiting as a Service) — a platform designed for broad applicability across HTTP APIs, gRPC services, OTEL signals, event streams, background jobs, external integrations, authentication flows, and custom business operations.

The recommended model is a hybrid architecture:

  1. Embedded SDK/library mode for low-latency local decisions.
  2. Centralized control plane for policy management, auditing, rollout, governance, and analytics.
  3. Optional centralized decision service for non-Go clients or service-to-service integration.
  4. Optional sidecar/agent mode for Kubernetes and polyglot environments.

Treat rate limiting as a generic policy decision engine, not as a DB-specific algorithm runner.

2. Goals

Functional Goals

  • Support rate limiting for any signal or workload type
  • Support per-tenant, per-org, per-application, per-service, per-user, per-endpoint, and custom dimension-based policies
  • Support 7 algorithms: fixed window, sliding window log, sliding window counter, token bucket, leaky bucket, concurrency limiter, quota/budget limiter
  • Support 8 action types: allow, deny, delay, sample, drop, downgrade, drop-low-priority, shadow-only
  • Support multiple integration modes: Go library, gRPC, HTTP, sidecar, middleware, OTEL processors
  • Support policy persistence in PostgreSQL, Oracle, file-based stores
  • Support hot-path counters in: in-memory (sharded), Redis, optional DB-backed low-volume mode
  • Support multi-tenant policy separation and override hierarchy

Non-Functional Goals

  • Low latency for hot-path decisions (sub-millisecond local, low-ms distributed)
  • High concurrency support with lock-sharded counters
  • Horizontally scalable control plane and decision plane
  • Pluggable, testable, interface-driven design
  • Safe rollout with shadow mode and progressive enforcement
  • Observable, auditable, and production-friendly

Non-Goals (v1)

  • Full UI implementation
  • Billing engine
  • Cross-region strong consistency for all counters
  • ML-based adaptive throttling
  • Distributed consensus-based exact limit enforcement across all regions

3. High-Level Architecture

The platform has four major planes/components:

Embedded Data Plane (SDK)

Load & cache policies, build evaluation context, match policies, execute algorithm, query/update counter backend, return decision, expose middleware.

Central Control Plane

Policy CRUD, tenant management, audit history, versioning, rollouts/canaries/shadow mode, analytics metadata, publishing config changes.

Central Decision Service

Expose gRPC/HTTP APIs for rate-limit checks. Execute the same policy engine logic as SDK mode. Share reusable engine packages.

Sidecar / Agent Mode

Provide local endpoint for app containers. Cache policies locally. Reduce remote decision latency. Bridge between app and centralized services.

┌─────────────────────────────────────────────────────────────────────┐
│                              Clients                                │
│   Go SDK  │  HTTP Service  │  gRPC Service  │  Sidecar Agent        │
└─────┬─────┴───────┬────────┴───────┬─────────┴──────┬──────────────┘
      │             │                │                │
      ▼             ▼                ▼                ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Shared Policy Engine                         │
│                                                                     │
│   RequestContext → Matcher → Algorithm → Decision → Analytics       │
│                                                                     │
│   ┌────────────────────┐        ┌──────────────────┐                │
│   │   Counter Stores   │        │   Policy Stores  │                │
│   │ ├─ Memory (sharded)│        │ ├─ File (JSON)   │                │
│   │ ├─ Redis           │        │ ├─ PostgreSQL    │                │
│   │ └─ DB (low vol)    │        │ └─ Oracle        │                │
│   └────────────────────┘        └──────────────────┘                │
└─────────────────────────────────────────────────────────────────────┘

4. Deployment Modes

Mode A — Embedded Go Library

Applications import the Go package and evaluate decisions locally. Best for internal Go ecosystem, OTEL collectors, latency-sensitive services. No network hop, fast hot path.

Mode B — Centralized Decision Service

Applications call a shared service over gRPC or HTTP. Best for polyglot environments, centralized governance, unified enforcement. Language agnostic, single version to manage.

Mode C — Sidecar / Local Agent

Applications call a local sidecar. Best for Kubernetes workloads, team-wide standardization, local caching with central governance.

Mode D — Hybrid (Recommended)

All three modes simultaneously. SDK for Go services, centralized service for non-Go apps, sidecar for K8s workloads. This is the recommended long-term approach.

5. Core Architecture Principles

  1. Policy storage and counter storage must be separate concerns
  2. DB is suitable for policy persistence, not ideal as the primary hot-path counter store
  3. Counters should primarily live in memory or Redis for scale and latency
  4. All rate limiting decisions must be driven by a canonical internal policy model
  5. Custom org tables should be handled through adapters, not baked into core logic
  6. The engine must return a rich decision object, not only a boolean
  7. Policy evaluation must support multiple dimensions and precedence rules
  8. Every policy should optionally operate in shadow mode before enforcement
  9. Integration adapters should be first-class citizens
  10. Failure strategy must be explicit and configurable

6. Supported Use Cases

HTTP / REST Ingress

Per IP, API key, org, endpoint, method, authenticated user, route group

gRPC

Per method, service, tenant, client identity

HTTP / gRPC Egress

Per partner API, destination, integration type, operation class — protect downstream

OpenTelemetry — Logs

Per org, service, log level, environment, attribute set

OpenTelemetry — Traces

Per service, span name, tenant, operation — drop noisy spans, sample selectively

Event / Messaging

Per topic, consumer group, event type, tenant, producer/consumer

Background Jobs

Per job type, org, workflow step, time window

Auth / Abuse Prevention

Login attempts, OTP generation, password reset, device registration, email verification

Business Use Cases

Invoice generation, report export, file upload/download, premium plan quotas, feature throttling

7. Canonical Domain Model

All policy sources are normalized into a single internal model. The canonical types are:

RequestContext

The caller provides a generic context for evaluation, with 20+ fields including org, tenant, service, operation, endpoint, method, user, API key, region, tags, and more.

type RequestContext struct {
    RequestID      string
    OrgID          string
    TenantID       string
    Application    string
    Service        string
    Environment    string
    SignalType     string  // http, grpc, log, trace, span, event, auth, job, custom
    Operation      string
    Endpoint       string
    Method         string
    UserID         string
    APIKey         string
    ClientID       string
    SourceIP       string
    Region         string
    Resource       string
    Severity       string
    SpanName       string
    Topic          string
    ConsumerGroup  string
    JobType        string
    Quantity       int64
    Priority       string
    Timestamp      time.Time
    Tags           map[string]string
    Attributes     map[string]string
}

Decision

The engine returns a rich decision object — not just a boolean.

type Decision struct {
    Allowed          bool
    Action           ActionType    // allow, deny, delay, sample, drop, ...
    Reason           string
    MatchedPolicyID  string
    Remaining        int64
    RetryAfter       time.Duration
    DelayFor         time.Duration
    SampleRate       float64
    ShadowMode       bool
    Metadata         map[string]string
}

Policy

Policies contain scope matching, algorithm configuration, action, failure mode, enforcement mode, rollout percentage, validity windows, and metadata.

type Policy struct {
    PolicyID          string
    Name              string
    Enabled           bool
    Priority          int
    Scope             PolicyScope
    Algorithm         AlgorithmConfig
    Action            ActionType
    FailureMode       FailureMode      // fail_open, fail_closed
    EnforcementMode   EnforcementMode  // enforce, shadow
    RolloutPercent    int
    ValidFromUnix     int64
    ValidToUnix       int64
    Metadata          map[string]string
}
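For illustration, a single policy might serialize as below. The JSON shape, and the fields shown inside scope and algorithm, are assumptions for this sketch; the document does not fix a wire format for PolicyScope or AlgorithmConfig:

```json
{
  "policy_id": "pol-login-throttle",
  "name": "Login attempt throttle",
  "enabled": true,
  "priority": 100,
  "scope": { "org_id": "acme", "signal_type": "auth", "operation": "login" },
  "algorithm": { "type": "sliding_window_counter", "limit": 5, "window": "1m" },
  "action": "deny",
  "failure_mode": "fail_closed",
  "enforcement_mode": "shadow",
  "rollout_percent": 10,
  "metadata": { "owner": "identity-team" }
}
```

Note the policy starts in shadow mode at 10% rollout, matching the safe rollout flow described in section 13.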

8. Algorithms in Depth

Fixed Window

Count requests within a fixed interval (e.g., 100 req/min). Simple, cheap, easy to understand. Cons: burst issue at window boundaries. Best for org-wide basic limits.

Sliding Window Log

Store timestamps of all events and count within the rolling window. Very accurate but memory-heavy. Best for security-sensitive exact checks, lower-volume workflows.

Sliding Window Counter

Approximate rolling window using sub-buckets and interpolation. Better scalability than log, better fairness than fixed. Best for APIs, OTEL signals, distributed workloads.

Token Bucket

Tokens refill over time; each request consumes tokens. Supports bursts well, industry standard for APIs. Best for REST/gRPC throttling, downstream API protection.

Leaky Bucket

Queue requests into a bucket drained at steady rate. Smooth shaping, good for egress control. Best for smoothing event bursts, outbound traffic.

Concurrency Limiter

Limit simultaneously in-flight operations. Requires acquire/release lifecycle. Best for DB-heavy operations, file processing, outbound dependency control.

Quota / Budget Limiter

Long-window budget (per-day, per-month). Useful for SaaS plan enforcement, telemetry budgets (logs/day, traces/day, API calls/month).

All Behind One Interface

Every algorithm implements the same Evaluator interface, making it easy to swap or test algorithms independently.

9. Supported Actions

Action              Behavior                                             Example Use
------              --------                                             -----------
Allow               Request passes without modification                  Normal flow
Deny                Reject entirely (HTTP 429, gRPC RESOURCE_EXHAUSTED)  Rate exceeded
Delay               Allow after waiting                                  Egress calls, background jobs
Sample              Allow only a fraction                                Keep 10% of debug logs
Drop                Discard without processing                           Low-value debug telemetry
Downgrade           Reduce priority/transform handling                   Standard pipeline instead of premium
Drop Low Priority   Preserve high-value, drop low-value                  Mixed-priority event streams
Shadow Only         Record decision, do not enforce                      Pre-rollout validation

10. Counter Storage Strategy

Key Design Decision

Use databases for policy/config persistence, not as the primary high-volume counter storage. Counters belong in fast stores.

In-Memory (Sharded)

Fastest option. 64 lock shards by default, FNV32a hash for shard selection. ~6 ns/op for multi-key contention. Best for local library mode, single-instance services, fallback.
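Shard selection can be sketched as hashing the counter key with FNV-1a (32-bit) and masking into the shard array; since 64 is a power of two, a bitmask replaces the modulo. The type and method names here are illustrative:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 64 // power of two, as in the design

type shard struct {
	mu       sync.Mutex
	counters map[string]int64
}

type shardedStore struct{ shards [numShards]*shard }

func newShardedStore() *shardedStore {
	s := &shardedStore{}
	for i := range s.shards {
		s.shards[i] = &shard{counters: make(map[string]int64)}
	}
	return s
}

// shardFor hashes the key with FNV-1a and masks into the shard range.
func shardFor(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() & (numShards - 1)
}

// Increment locks only the owning shard, so unrelated keys rarely contend.
func (s *shardedStore) Increment(key string, delta int64) int64 {
	sh := s.shards[shardFor(key)]
	sh.mu.Lock()
	defer sh.mu.Unlock()
	sh.counters[key] += delta
	return sh.counters[key]
}

func main() {
	st := newShardedStore()
	fmt.Println(st.Increment("rlaas:acme:http", 1)) // 1
	fmt.Println(st.Increment("rlaas:acme:http", 2)) // 3
}
```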

Redis

Distributed limits, shared counters, high throughput. Atomic ops, TTL support, Lua scripts. Best for multi-node deployments.

Database (Low Volume)

Optional PostgreSQL/Oracle counter mode for low-QPS, compliance-driven persistence. Higher latency, poor fit for hot path.

Counter Store Interface

type CounterStore interface {
    Increment(ctx context.Context, key string, value int64, ttl time.Duration) (int64, error)
    Get(ctx context.Context, key string) (int64, error)
    Set(ctx context.Context, key string, value int64, ttl time.Duration) error
    CompareAndSwap(ctx context.Context, key string, old, new int64, ttl time.Duration) (bool, error)
    Delete(ctx context.Context, key string) error
    AddTimestamp(ctx context.Context, key string, ts time.Time, ttl time.Duration) error
    CountAfter(ctx context.Context, key string, after time.Time) (int64, error)
    TrimBefore(ctx context.Context, key string, before time.Time) error
    AcquireLease(ctx context.Context, key string, limit int64, ttl time.Duration) (bool, int64, error)
    ReleaseLease(ctx context.Context, key string) error
}

11. Policy Storage Strategy

Supported policy backends: PostgreSQL, Oracle, file/JSON (local/dev), and custom adapters for legacy enterprise sources.

type PolicyStore interface {
    LoadPolicies(ctx context.Context, tenantOrOrg string) ([]model.Policy, error)
    GetPolicyByID(ctx context.Context, policyID string) (*model.Policy, error)
    UpsertPolicy(ctx context.Context, p model.Policy) error
    DeletePolicy(ctx context.Context, policyID string) error
    ListPolicies(ctx context.Context, filter map[string]string) ([]model.Policy, error)
}

Legacy or custom org-specific tables are supported through adapters that normalize data into the canonical Policy model — the core engine never couples directly to external schemas.

12. Failure Behavior

Fail Open

Allow traffic when backend is unavailable. Best for logs, traces, non-critical metrics, non-security paths.

Fail Closed

Deny when backend is unavailable. Best for abuse prevention, OTP generation, login throttling, external partner quotas.

Fail Degrade

Apply fallback local limits or downgrade action. Best for large distributed systems, best-effort high availability.
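The three strategies reduce to a small, explicit switch on a per-policy setting; the function and constant names below are illustrative:

```go
package main

import "fmt"

type FailureMode string

const (
	FailOpen    FailureMode = "fail_open"
	FailClosed  FailureMode = "fail_closed"
	FailDegrade FailureMode = "fail_degrade"
)

// onBackendError decides what to do when the counter backend is
// unreachable. The point of the design: this is configuration per
// policy, never a hard-coded default.
func onBackendError(mode FailureMode, localFallbackAllowed bool) bool {
	switch mode {
	case FailClosed:
		return false // deny: abuse prevention, OTP, partner quotas
	case FailDegrade:
		return localFallbackAllowed // consult a local best-effort limiter
	default:
		return true // fail open: logs, traces, non-critical paths
	}
}

func main() {
	fmt.Println(onBackendError(FailOpen, false))   // true
	fmt.Println(onBackendError(FailClosed, true))  // false
	fmt.Println(onBackendError(FailDegrade, true)) // true
}
```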

13. Shadow Mode & Safe Rollout

Every policy supports shadow mode — evaluate exactly as enforcement would, record the decision, but don't block/delay/drop.

Rollout Percentage

  • 0% — disabled
  • 10% — apply to 10% of matching requests
  • 100% — fully active

Recommended Rollout Flow

  1. Create policy in shadow mode
  2. Validate metrics and hypothetical deny/drop rate
  3. Enable partial rollout (e.g. 10%)
  4. Gradually increase to full enforcement
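Rollout bucketing should be deterministic per identity, so a given caller is consistently inside or outside the rollout rather than flickering per request. A sketch using an FNV-1a hash (the engine's actual bucketing function may differ):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inRollout buckets a stable identity (e.g. user or org ID) into 0..99
// and includes it when the bucket falls under the rollout percentage.
// Hashing per identity, rather than sampling randomly per request,
// keeps each caller consistently in or out of the rollout.
func inRollout(identity string, rolloutPercent int) bool {
	if rolloutPercent <= 0 {
		return false // 0% — disabled
	}
	if rolloutPercent >= 100 {
		return true // 100% — fully active
	}
	h := fnv.New32a()
	h.Write([]byte(identity))
	return int(h.Sum32()%100) < rolloutPercent
}

func main() {
	fmt.Println(inRollout("user-123", 0), inRollout("user-123", 100))
	fmt.Println(inRollout("user-123", 10) == inRollout("user-123", 10)) // deterministic
}
```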

14. Multi-Tenancy Model

Policies are scoped to org, tenant, application, and service. The override hierarchy from least to most specific:

  1. Global defaults
  2. Org defaults
  3. Tenant-specific policies
  4. Application policies
  5. Service policies
  6. Endpoint / operation policies
  7. User / client overrides

All counter keys are namespaced to prevent collisions:

rlaas:{org}:{tenant}:{signal}:{service}:{operation}:{dimension_hash}

15. Caching Strategy

  • Policy Cache: Every SDK/agent/service caches policies locally with TTL-based refresh and on-demand invalidation via pub/sub.
  • Decision Cache (optional): Cache deterministic allow decisions for ultra-short duration where safe.
  • Counter Local Cache: Suitable for local rate limiting, best-effort approximations, and resilience fallback.

16. Policy Matching Strategy

A request can match multiple policies. The engine determines the winner using precedence:

  1. User-level override
  2. API key / client override
  3. Endpoint + method
  4. Operation
  5. Service
  6. Application
  7. Tenant
  8. Org
  9. Signal type
  10. Global default

Tie Breakers

  1. Higher priority wins
  2. Narrower scope wins
  3. Newest policy version wins
  4. Deterministic final tie-break using policy ID

Advanced Match Expressions

Policies support match_expr in metadata for compound conditions:

match_expr: "region==us-east-1 && tag.env==production && method!=DELETE"
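A toy evaluator for the "&&"-joined equality clauses shown above. The engine's actual grammar is richer, and flattening tags under a "tag." prefix is an assumption of this sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// evalMatchExpr evaluates a tiny subset of the match_expr grammar:
// "&&"-joined clauses of the form key==value or key!=value.
func evalMatchExpr(expr string, fields map[string]string) bool {
	for _, clause := range strings.Split(expr, "&&") {
		clause = strings.TrimSpace(clause)
		var key, want string
		var negate bool
		if i := strings.Index(clause, "!="); i >= 0 {
			key, want, negate = clause[:i], clause[i+2:], true
		} else if i := strings.Index(clause, "=="); i >= 0 {
			key, want = clause[:i], clause[i+2:]
		} else {
			return false // unsupported clause shape
		}
		got := fields[strings.TrimSpace(key)]
		if (got == strings.TrimSpace(want)) == negate {
			return false // clause failed
		}
	}
	return true // all clauses passed
}

func main() {
	fields := map[string]string{
		"region":  "us-east-1",
		"tag.env": "production",
		"method":  "POST",
	}
	expr := "region==us-east-1 && tag.env==production && method!=DELETE"
	fmt.Println(evalMatchExpr(expr, fields)) // true
}
```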

17. Key Construction

A standardized key builder creates deterministic, namespaced counter keys:

rlaas:org=acme:tenant=retail:signal=http:service=payments:endpoint=/v1/charge:method=POST:user=123

Rules: deterministic, stable ordering, namespaced, include only matched scope dimensions, hash large tag maps, safe for Redis and logs.
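The rules above can be sketched as a builder that fixes the order of the core dimensions and sorts any extra matched dimensions, so identical scopes always yield byte-identical keys. The function shape is illustrative:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildKey emits the namespaced counter key. Core dimension order is
// fixed by the caller; extra matched dimensions are sorted so the same
// scope always produces an identical key.
func buildKey(core []string, extra map[string]string) string {
	parts := append([]string{"rlaas"}, core...)
	keys := make([]string, 0, len(extra))
	for k := range extra {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic, stable ordering
	for _, k := range keys {
		parts = append(parts, k+"="+extra[k])
	}
	return strings.Join(parts, ":")
}

func main() {
	key := buildKey(
		[]string{"org=acme", "tenant=retail", "signal=http", "service=payments"},
		map[string]string{"method": "POST", "endpoint": "/v1/charge"},
	)
	// Extras are emitted in sorted order: endpoint before method.
	fmt.Println(key)
}
```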

18. Integration Adapters

HTTP Middleware

For net/http, Echo, Gin, Fiber, Chi. Build RequestContext, evaluate, return 429 on deny.

gRPC Interceptor

Unary and streaming interceptors for transparent rate-limit enforcement.

OTEL Processors

Log and span batch processing with worker pools, e.g. dropping or sampling noisy spans.

Message Consumer Wrapper

For Kafka, Pub/Sub, SQS, NATS — per-topic, per-consumer-group limiting.

Generic Business API

Direct evaluator interface for custom workflows: invoice gen, report export, file upload.

19. Observability

Metrics

  • decisions_total
  • decisions_allowed_total
  • decisions_denied_total
  • decisions_shadow_total
  • algorithm_latency_ms
  • counter_store_errors_total
  • policy_cache_hit_total
  • backend_fail_open_total

Logging

Policy load failures, backend failures, shadow mode high-risk decisions, policy conflicts, invalid configurations.

Tracing

Policy fetch, match time, algorithm execution, Redis/DB round trips — all traceable.

20. Security & Governance

  • Multi-Tenant Safety: Every policy API enforces org/tenant scoping. No cross-tenant reads/writes.
  • Admin RBAC: Platform admin, org admin, read-only auditor, developer/operator roles.
  • Auditability: Every policy change is recorded in the audit trail.
  • Secret Handling: DB creds via secret manager/env. Redis auth secured. TLS for production gRPC/HTTP.

21. Performance Guidelines

Hot Path Expectations

  • SDK decisions target sub-millisecond to low-millisecond local
  • Sharded memory store: ~6 ns/op for multi-key workloads (0 allocs)
  • Fixed window evaluation: ~1.6 μs/op
  • HTTP check handler: ~10 μs/op

Performance Optimizations Implemented

  • Lock-sharded counters: 64 shards with FNV32a hash, per-shard mutex
  • Async invalidation: Bounded worker pool (up to 8 goroutines) with 256-item buffered queue
  • Burst coalescing: Sidecar drains invalidation channel to avoid redundant syncs
  • Zero-allocation hot path: No heap allocations in counter read/write operations

Consistency Trade-off

Exact local/regional limits where needed. Approximate global limits are acceptable in many cases. Per-policy consistency expectations should be documented.

22. Package Structure

rlaas/
├── cmd/
│   ├── rlaas-server/       # HTTP + gRPC server
│   └── rlaas-agent/        # Sidecar proxy agent
├── api/
│   └── proto/              # Protobuf definitions
├── internal/
│   ├── model/              # Canonical domain types
│   ├── engine/
│   │   ├── matcher/        # Policy matching + match_expr
│   │   ├── evaluator/      # Main evaluation engine
│   │   ├── rollout/        # Rollout percentage logic
│   │   └── decision/       # Decision builder
│   ├── algorithm/          # 7 algorithm implementations
│   ├── store/
│   │   ├── policy/         # File, PostgreSQL, Oracle stores
│   │   └── counter/        # Memory (sharded), Redis, DB stores
│   ├── adapter/
│   │   ├── http/           # HTTP middleware
│   │   ├── grpc/           # gRPC interceptor
│   │   └── otel/           # OTEL processor primitives
│   ├── region/             # Multi-region allocation
│   ├── key/                # Counter key builder
│   └── config/             # Configuration model
├── pkg/rlaas/              # Public Go SDK
├── sdk/
│   ├── python/             # Python SDK
│   ├── typescript/         # TypeScript SDK
│   ├── java/               # Java SDK
│   └── dotnet/             # .NET SDK
├── benchmarks/             # Performance benchmark suite
├── examples/               # Sample policies and configs
└── docs/                   # This documentation site

23. Implementation Phases

Phase 1 — MVP ✅

  • Go SDK/library with canonical policy model
  • Counter stores: in-memory (sharded) + Redis
  • Policy store: file-based JSON
  • Algorithms: fixed window, token bucket, sliding window counter, concurrency, quota
  • Actions: allow, deny, delay, sample, drop, shadow
  • HTTP middleware + gRPC interceptor
  • Decision + analytics endpoints
  • Policy CRUD + audit + versions + rollout + rollback

Phase 2 — Extended ✅

  • Centralized gRPC/HTTP decision service
  • Sidecar/agent mode with invalidation sync
  • OTEL processor primitives
  • Multi-region allocation primitives
  • Advanced match_expr support
  • Non-Go SDKs: Python, TypeScript, Java, .NET
  • Benchmark suite
  • Performance optimizations (sharded counters, async dispatch, burst coalescing)

Phase 3 — Enterprise (In Progress)

  • PostgreSQL policy & counter stores
  • Oracle policy & counter stores
  • Admin/operator UX and policy governance workflows
  • Advanced analytics and dashboards

End of Design Document