gridpilot.gg/docs/architecture/shared/FEATURE_AVAILABILITY.md
2026-01-11 13:04:33 +01:00

Feature Availability (Modes + Feature Flags)

This document defines a clean, consistent system for enabling/disabling functionality across:

  • API endpoints
  • Website links/navigation
  • Website components

It is designed to support:

  • test mode
  • maintenance mode
  • disabling features due to risk/issues
  • coming soon features
  • future super admin flag management

It is aligned with the hard separation of responsibilities in Blockers & Guards:

  • Frontend uses Blockers (UX best-effort)
  • Backend uses Guards (authoritative enforcement)

See: docs/architecture/BLOCKER_GUARDS.md


1) Core Principle

Availability is decided once, then applied in multiple places.

  • Backend Guards enforce availability for correctness and security.
  • Frontend Blockers reflect availability for UX, but must never be relied on for enforcement.

If it must be enforced, it is a Guard. If it only improves UX, it is a Blocker.


2) Definitions (Canonical Vocabulary)

2.1 Operational Mode (system-level)

A small, global state representing operational posture.

Recommended enum:

  • normal
  • maintenance
  • test

Operational Mode is:

  • authoritative in backend
  • typically environment-scoped
  • changeable at runtime for rapid response (maintenance in particular must not require a rebuild or redeploy)

2.2 Feature State (capability-level)

A per-feature state machine (not a boolean).

Recommended enum:

  • enabled
  • disabled
  • coming_soon
  • hidden

Semantics:

  • enabled: feature is available and advertised
  • disabled: feature exists but must not be used (safety kill switch)
  • coming_soon: may be visible in UI as teaser, but actions are blocked
  • hidden: not visible/advertised; actions are blocked (safest default)

2.3 Capability

A named unit of functionality (stable key) used consistently across API + website.

Examples:

  • races.create
  • payments.checkout
  • sponsor.portal
  • stewarding.protests

A capability key is a contract.

2.4 Action Type

Availability decisions vary by the type of action:

  • view: read-only operations (pages, GET endpoints)
  • mutate: state-changing operations (POST/PUT/PATCH/DELETE)
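
The vocabulary above could be captured as shared types. This is a minimal sketch: the identifier names (`OPERATIONAL_MODES`, `isFeatureState`, `actionTypeForMethod`) are illustrative assumptions, and treating HEAD/OPTIONS as `view` is an assumption the document does not state.

```typescript
// Hypothetical shared vocabulary module; only the enum values come from the document.
export const OPERATIONAL_MODES = ["normal", "maintenance", "test"] as const;
export type OperationalMode = (typeof OPERATIONAL_MODES)[number];

export const FEATURE_STATES = ["enabled", "disabled", "coming_soon", "hidden"] as const;
export type FeatureState = (typeof FEATURE_STATES)[number];

export type ActionType = "view" | "mutate";

// Capability keys are contracts, so they are listed in exactly one place.
export const CAPABILITY_KEYS = [
  "races.create",
  "payments.checkout",
  "sponsor.portal",
  "stewarding.protests",
] as const;
export type CapabilityKey = (typeof CAPABILITY_KEYS)[number];

// Runtime check for values that arrive from config or a remote store.
export function isFeatureState(value: unknown): value is FeatureState {
  return (
    typeof value === "string" &&
    (FEATURE_STATES as readonly string[]).indexOf(value) !== -1
  );
}

// GET counts as view; POST/PUT/PATCH/DELETE count as mutate.
// Assumption: HEAD and OPTIONS are also read-only; anything unknown is mutate.
export function actionTypeForMethod(method: string): ActionType {
  const m = method.toUpperCase();
  return m === "GET" || m === "HEAD" || m === "OPTIONS" ? "view" : "mutate";
}
```

Defaulting unknown HTTP methods to `mutate` keeps the stricter rule in force when in doubt.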

3) Policy Model (What Exists)

3.1 FeatureAvailabilityPolicy (single evaluation model)

One evaluation function produces a decision.

Inputs:

  • environment (dev/test/prod)
  • operationalMode (normal/maintenance/test)
  • capabilityKey (string)
  • actionType (view/mutate)
  • actorContext (anonymous/authenticated; roles later)

Outputs:

  • allow: boolean
  • publicReason: one of maintenance | disabled | coming_soon | hidden | not_configured
  • uxHint: optional { messageKey, redirectPath, showTeaser }

The same decision model is reused by:

  • API Guard enforcement
  • Website navigation visibility
  • Website component rendering/disablement

3.2 Precedence (where values come from)

To avoid “mystery behavior”, use strict precedence:

  1. runtime overrides (highest priority)
  2. build-time environment configuration
  3. code defaults (lowest priority, should be safe: hidden/disabled)

Rationale:

  • runtime overrides enable emergency response without rebuild
  • env config enables environment-specific defaults
  • code defaults keep behavior deterministic if config is missing
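
The precedence order can be sketched as a single lookup that walks the three sources from highest to lowest priority. The names `PolicySources` and `resolveFeatureState`, and the returned `source` field, are assumptions made for illustration.

```typescript
export type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
export type StateMap = { readonly [capabilityKey: string]: FeatureState };

// Hypothetical container for the three layers named in the precedence list.
export interface PolicySources {
  runtimeOverrides: StateMap; // highest priority
  envConfig: StateMap;        // build-time environment configuration
  codeDefaults: StateMap;     // lowest priority
}

export interface ResolvedState {
  state: FeatureState;
  source: "runtime" | "env" | "default" | "fallback";
}

// Only own properties count, so keys like "constructor" never match by accident.
function own(map: StateMap, key: string): FeatureState | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

export function resolveFeatureState(key: string, sources: PolicySources): ResolvedState {
  const runtime = own(sources.runtimeOverrides, key);
  if (runtime !== undefined) return { state: runtime, source: "runtime" };

  const env = own(sources.envConfig, key);
  if (env !== undefined) return { state: env, source: "env" };

  const fallbackDefault = own(sources.codeDefaults, key);
  if (fallbackDefault !== undefined) return { state: fallbackDefault, source: "default" };

  // Nothing configured anywhere: fail closed.
  return { state: "hidden", source: "fallback" };
}
```

Returning the `source` alongside the state is a design choice that makes "mystery behavior" easier to debug, since logs can show which layer produced a value.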

4) Evaluation Rules (Deterministic, Explicit)

4.1 Maintenance mode rules

Maintenance must be able to block the platform fast and consistently.

Default behavior:

  • mutate actions: denied unless explicitly allowlisted
  • view actions: allowed only for a small allowlist (status page, login, health, static public routes)

This creates a safe “fail closed” posture.

Optional refinement:

  • define a maintenance allowlist for critical reads (e.g., dashboards for operators)
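
As a rough illustration of the maintenance rules, the check below denies everything that is not on the allowlist for the requested action type. The capability keys in `exampleAllowlists` (`system.status`, `auth.login`, `system.health`) are hypothetical; the document only names the kinds of routes that should be allowlisted.

```typescript
export type ActionType = "view" | "mutate";

// Separate lists per action type: being viewable never implies being mutable.
export interface MaintenanceAllowlists {
  view: readonly string[];
  mutate: readonly string[];
}

// Hypothetical keys standing in for "status page, login, health".
export const exampleAllowlists: MaintenanceAllowlists = {
  view: ["system.status", "auth.login", "system.health"],
  mutate: [],
};

// Fail closed: anything not explicitly allowlisted is denied during maintenance.
export function isAllowedDuringMaintenance(
  capabilityKey: string,
  actionType: ActionType,
  allowlists: MaintenanceAllowlists,
): boolean {
  const list = actionType === "view" ? allowlists.view : allowlists.mutate;
  return list.indexOf(capabilityKey) !== -1;
}
```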

4.2 Test mode rules

Test mode belongs primarily in non-prod environments; in prod it must be enabled explicitly, never by default.

Recommended behavior:

  • In prod, test mode should not be enabled accidentally.
  • In test environments, test mode may:
    • enable test-only endpoints
    • bypass external integrations (through adapters)
    • relax rate limits
    • expose test banners in UI (Blocker-level display)
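
One way to keep test mode from being enabled accidentally in prod is to require a second, explicit opt-in. The sketch below assumes a hypothetical `allowTestModeInProd` switch and falls back to `normal` when it is absent; a stricter variant could refuse to start instead.

```typescript
export type Environment = "dev" | "test" | "prod";
export type OperationalMode = "normal" | "maintenance" | "test";

// Returns the mode that should actually take effect.
export function resolveOperationalMode(
  environment: Environment,
  requested: OperationalMode,
  allowTestModeInProd: boolean, // hypothetical explicit opt-in
): OperationalMode {
  if (requested === "test" && environment === "prod" && !allowTestModeInProd) {
    // Test mode was requested in prod without the opt-in: ignore it.
    return "normal";
  }
  return requested;
}

// The behaviors test mode may switch on, per the list above.
export interface TestModeEffects {
  enableTestEndpoints: boolean;
  bypassExternalIntegrations: boolean;
  relaxRateLimits: boolean;
  showTestBanner: boolean; // Blocker-level display only
}

export function testModeEffects(mode: OperationalMode): TestModeEffects {
  const on = mode === "test";
  return {
    enableTestEndpoints: on,
    bypassExternalIntegrations: on,
    relaxRateLimits: on,
    showTestBanner: on,
  };
}
```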

4.3 Feature state rules (per capability)

Given a capability state:

  • enabled:
    • allow view + mutate (subject to auth/roles)
    • visible in UI
  • coming_soon:
    • allow view of teaser pages/components
    • deny mutate and deny sensitive reads
    • visible in UI with Coming Soon affordances
  • disabled:
    • deny view + mutate
    • hidden in nav by default
  • hidden:
    • deny view + mutate
    • never visible in UI

Note:

  • “disabled” and “hidden” are both blocked; the difference is UI and information disclosure.

4.4 Missing configuration

If a capability is not configured:

  • treat as hidden (fail closed)
  • optionally log a warning (server-side)
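
The feature state and missing-configuration rules could be combined into one function along these lines. This is a sketch under assumptions: the name `evaluateFeatureState`, the `isSensitiveRead` flag, and the injected `warn` callback are illustrative; auth/role checks and maintenance mode are deliberately left out.

```typescript
export type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
export type ActionType = "view" | "mutate";
export type PublicReason =
  | "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

export interface Decision {
  allow: boolean;
  publicReason?: PublicReason; // in this sketch, set only when denied
  uxHint?: { showTeaser: boolean };
}

export function evaluateFeatureState(
  capabilities: { readonly [capabilityKey: string]: FeatureState },
  capabilityKey: string,
  actionType: ActionType,
  isSensitiveRead: boolean = false,                  // hypothetical flag
  warn: (message: string) => void = () => {},        // server-side logger hook
): Decision {
  if (!Object.prototype.hasOwnProperty.call(capabilities, capabilityKey)) {
    // Missing configuration: treat as hidden and optionally log.
    warn(`capability "${capabilityKey}" is not configured; treating as hidden`);
    return { allow: false, publicReason: "not_configured" };
  }

  switch (capabilities[capabilityKey]) {
    case "enabled":
      return { allow: true };
    case "coming_soon":
      // Teaser views are allowed; mutations and sensitive reads are not.
      return actionType === "view" && !isSensitiveRead
        ? { allow: true, uxHint: { showTeaser: true } }
        : { allow: false, publicReason: "coming_soon", uxHint: { showTeaser: true } };
    case "disabled":
      return { allow: false, publicReason: "disabled" };
    default:
      // "hidden" and any unexpected value: fail closed.
      return { allow: false, publicReason: "hidden" };
  }
}
```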

5) Enforcement Mapping (Where Each Requirement Lives)

This section is the “wiring contract” across layers.

5.1 API endpoints (authoritative)

  • Enforce via Backend Guards (NestJS CanActivate).
  • Endpoints must declare the capability they require.

Mapping to HTTP:

  • maintenance: 503 Service Unavailable (preferred for global maintenance)
  • disabled/hidden: 404 Not Found (avoid advertising unavailable capabilities)
  • coming_soon: 404 Not Found publicly, or 409 Conflict internally if you want explicit semantics for trusted clients later
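
The status mapping above can be expressed as a small pure function. The name `httpStatusForDenial` and the `trustedClient` parameter are assumptions; mapping `not_configured` to 404 follows from treating missing configuration as hidden.

```typescript
export type PublicReason =
  | "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

// Maps a denial reason to the HTTP status a Guard would respond with.
export function httpStatusForDenial(
  reason: PublicReason,
  trustedClient: boolean = false, // hypothetical: true only for internal callers
): 404 | 409 | 503 {
  switch (reason) {
    case "maintenance":
      return 503; // Service Unavailable
    case "coming_soon":
      // Public clients learn nothing; trusted clients may get explicit semantics.
      return trustedClient ? 409 : 404;
    default:
      // disabled, hidden, not_configured: do not advertise the capability.
      return 404;
  }
}
```

In a NestJS guard, these statuses would typically be produced by throwing the matching built-in exception (`ServiceUnavailableException`, `NotFoundException`, `ConflictException`) rather than by returning a number.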

Guideline:

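  • External clients should not get detailed feature availability information unless explicitly intended.

5.2 Website links/navigation (UX)

  • Reflect availability via Frontend Blockers (UX only; the API Guards remain the enforcement point).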
  • Hide links when state is disabled/hidden.
  • For coming_soon, show link but route to teaser page or disable with explanation.

Rules:

  • Never assume that hiding something in the UI means it is enforced on the server.
  • UI should degrade gracefully (API may still block).
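
The link rules (hide for disabled/hidden, route to a teaser for coming_soon) might be applied to a centralized link list as follows. `NavLink`, `filterNavLinks`, and the `/coming-soon/...` teaser path are hypothetical names chosen for this sketch.

```typescript
export type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

export interface NavLink {
  label: string;
  href: string;
  capabilityKey: string;
}

export interface RenderedNavLink {
  label: string;
  href: string;
  comingSoon: boolean;
}

export function filterNavLinks(
  links: readonly NavLink[],
  capabilities: { readonly [capabilityKey: string]: FeatureState },
  // Hypothetical teaser route; adjust to the site's real routing.
  teaserPath: (capabilityKey: string) => string = (key) => `/coming-soon/${key}`,
): RenderedNavLink[] {
  const visible: RenderedNavLink[] = [];
  links.forEach((link) => {
    // Unconfigured capabilities are treated as hidden.
    const state: FeatureState = Object.prototype.hasOwnProperty.call(capabilities, link.capabilityKey)
      ? capabilities[link.capabilityKey]
      : "hidden";
    if (state === "enabled") {
      visible.push({ label: link.label, href: link.href, comingSoon: false });
    } else if (state === "coming_soon") {
      visible.push({ label: link.label, href: teaserPath(link.capabilityKey), comingSoon: true });
    }
    // disabled and hidden links are omitted entirely.
  });
  return visible;
}
```

This is UX only: a user who types the hidden URL directly still reaches the API, which must block the request on its own.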

5.3 Website components (UX)

  • Use Blockers to:
    • hide components for hidden/disabled
    • show teaser content for coming_soon
    • disable buttons or flows for coming_soon/disabled, with consistent messaging

Recommendation:

  • Provide a single reusable component (FeatureBlocker) that consumes policy decisions and renders:
    • children when allowed
    • teaser when coming_soon
    • null or fallback when disabled/hidden
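
The rendering choice inside such a FeatureBlocker can be kept framework-free and unit-testable. The sketch below assumes a decision shape with `allow`, `publicReason`, and `uxHint.showTeaser` as described in section 3.1; `resolveBlockerRender` is a hypothetical helper name, and the component itself would simply switch on its result.

```typescript
export type PublicReason =
  | "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

export interface BlockerDecision {
  allow: boolean;
  publicReason?: PublicReason;
  uxHint?: { showTeaser?: boolean };
}

export type BlockerRender = "children" | "teaser" | "fallback";

// Decides what a FeatureBlocker-style component should render.
export function resolveBlockerRender(decision: BlockerDecision): BlockerRender {
  const wantsTeaser =
    decision.publicReason === "coming_soon" ||
    (decision.uxHint !== undefined && decision.uxHint.showTeaser === true);
  if (wantsTeaser) return "teaser";
  // Allowed: render the wrapped content. Otherwise render null or a fallback.
  return decision.allow ? "children" : "fallback";
}
```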

6) Build-Time vs Runtime (Clean, Predictable)

6.1 Build-time flags (require rebuild/redeploy)

What they are good for:

  • preventing unfinished UI code from shipping in a bundle
  • cutting entire routes/components from builds for deterministic releases

Limitations:

  • NEXT_PUBLIC_* values are compiled into the client bundle; changing them does not update clients without rebuild.

Use build-time flags for:

  • experimental UI
  • “not yet shipped” components/routes
  • simplifying deployments (pre-launch vs alpha style gating)

6.2 Runtime flags (no rebuild)

What they are for:

  • maintenance mode
  • emergency disable for broken endpoints
  • quickly hiding risky features

Runtime flags must always be available to API Guards. They may additionally be exposed to:

  • Website SSR/middleware (optional)
  • Website client (optional, for UX only)

Key tradeoff:

  • runtime access introduces caching and latency concerns
  • runtime policy reads should therefore be cached, fast, and resilient to store outages

Recommended approach:

  • API is authoritative source of runtime policy
  • website can optionally consume a cached policy snapshot endpoint
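
A cached, resilient policy read might look like the sketch below. It is synchronous for brevity (a real KV/DB read would be asynchronous), and `createCachedReader`, the TTL parameter, and the injected clock are all assumptions made for illustration.

```typescript
export interface CachedReader<T> {
  read(): T;
}

// Wraps a loader with a time-to-live cache and a fail-closed fallback.
export function createCachedReader<T>(
  load: () => T,                 // may throw, e.g. when the store is unreachable
  ttlMs: number,
  failClosed: T,                 // value served when nothing was ever loaded
  now: () => number = Date.now,  // injectable clock for tests
): CachedReader<T> {
  let cached: { value: T; fetchedAt: number } | undefined;

  return {
    read(): T {
      const t = now();
      if (cached !== undefined && t - cached.fetchedAt < ttlMs) {
        return cached.value; // fresh enough: no store access
      }
      try {
        cached = { value: load(), fetchedAt: t };
        return cached.value;
      } catch (err) {
        // Store failure: serve the last good value if there is one,
        // otherwise fall back to the fail-closed value.
        return cached !== undefined ? cached.value : failClosed;
      }
    },
  };
}
```

Note that after a failure this sketch retries the loader on every read until it succeeds; a production version would likely add a retry backoff.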

7) Storage and Distribution (Now + Future Super Admin)

7.1 Now (no super admin UI)

Use a single “policy snapshot” stored in one place and read by the API, with caching.

Options (in priority order):

  1. Remote KV/DB-backed policy snapshot (preferred for true runtime changes)
  2. Environment variable JSON (simpler, but changes require restart/redeploy)
  3. Static config file in repo (requires rebuild/redeploy)

7.2 Future (super admin UI)

Super admin becomes a writer to the same store.

Non-negotiable:

  • The storage schema must be stable and versioned.

Recommended schema (conceptual):

  • policyVersion
  • operationalMode
  • capabilities: map of capabilityKey -> featureState
  • allowlists: maintenance view/mutate allowlists
  • optional targeting rules later (by role/user)
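
The conceptual schema could be typed and validated roughly as follows. The supported version number, the `parsePolicySnapshot` name, and the choice to coerce unknown feature states to `hidden` (instead of rejecting the whole snapshot) are assumptions of this sketch; targeting rules are omitted.

```typescript
export type OperationalMode = "normal" | "maintenance" | "test";
export type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

export interface PolicySnapshot {
  policyVersion: number;
  operationalMode: OperationalMode;
  capabilities: { [capabilityKey: string]: FeatureState };
  allowlists: { view: string[]; mutate: string[] };
}

const SUPPORTED_POLICY_VERSION = 1; // hypothetical version number
const MODES = ["normal", "maintenance", "test"];
const STATES = ["enabled", "disabled", "coming_soon", "hidden"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function stringList(v: unknown): string[] {
  return Array.isArray(v) ? v.filter((x): x is string => typeof x === "string") : [];
}

// Validates untrusted input (e.g. parsed JSON) into a PolicySnapshot.
export function parsePolicySnapshot(raw: unknown): PolicySnapshot {
  if (!isRecord(raw)) throw new Error("policy snapshot must be an object");
  if (raw["policyVersion"] !== SUPPORTED_POLICY_VERSION) {
    throw new Error("unsupported policyVersion");
  }
  const mode = raw["operationalMode"];
  if (typeof mode !== "string" || MODES.indexOf(mode) === -1) {
    throw new Error("invalid operationalMode");
  }

  const rawCaps: Record<string, unknown> = isRecord(raw["capabilities"]) ? raw["capabilities"] : {};
  const capabilities: { [capabilityKey: string]: FeatureState } = {};
  Object.keys(rawCaps).forEach((key) => {
    const state = rawCaps[key];
    // Unknown states fail closed to "hidden".
    capabilities[key] =
      typeof state === "string" && STATES.indexOf(state) !== -1 ? (state as FeatureState) : "hidden";
  });

  const lists: Record<string, unknown> = isRecord(raw["allowlists"]) ? raw["allowlists"] : {};
  return {
    policyVersion: SUPPORTED_POLICY_VERSION,
    operationalMode: mode as OperationalMode,
    capabilities,
    allowlists: { view: stringList(lists["view"]), mutate: stringList(lists["mutate"]) },
  };
}
```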

8) Data Flow (Conceptual)

flowchart LR
  UI[Website UI] --> FB[Frontend Blockers]
  FB --> PC[Policy Client]
  UI --> API[API Request]
  API --> FG[Feature Guard]
  FG --> AS[API Application Service]
  AS --> UC[Core Use Case]
  PC --> PS[Policy Snapshot]
  FG --> PS

Interpretation:

  • Website reads policy for UX (best-effort).
  • API enforces policy (authoritative) before any application logic.

9) Implementation Checklist (For Code Mode)

Backend (apps/api):

  • Define capability keys and feature states as shared types in a local module.
  • Create FeaturePolicyService that resolves the current policy snapshot (cached).
  • Add FeatureFlagGuard (or FeatureAvailabilityGuard) that:
    • reads required capability metadata for an endpoint
    • evaluates allow/deny with actionType
    • maps denial to the chosen HTTP status codes

Frontend (apps/website):

  • Add a small PolicyClient that fetches policy snapshot from API (optional for phase 1).
  • Add FeatureBlocker component for consistent UI behavior.
  • Centralize navigation link definitions and filter them via policy.

Ops/Config:

  • Define how maintenance mode is toggled (KV/DB entry or config endpoint restricted to operators later).
  • Ensure defaults are safe (fail closed).

10) Non-Goals (Explicit)

  • This system is not an authorization system.
  • Roles/permissions are separate (but can be added as actorContext inputs later).
  • Blockers never replace Guards.