# Feature Availability (Modes + Feature Flags)

This document defines a clean, consistent system for enabling/disabling functionality across:

- API endpoints
- Website links/navigation
- Website components

It is designed to support:

- test mode
- maintenance mode
- disabling features due to risk/issues
- coming soon features
- future super admin flag management

It is aligned with the hard separation of responsibilities in `Blockers & Guards`:

- Frontend uses Blockers (UX best-effort)
- Backend uses Guards (authoritative enforcement)

See: `docs/architecture/BLOCKER_GUARDS.md`

---
## 1) Core Principle

Availability is decided once, then applied in multiple places.

- Backend Guards enforce availability for correctness and security.
- Frontend Blockers reflect availability for UX, but must never be relied on for enforcement.

If it must be enforced, it is a Guard.
If it only improves UX, it is a Blocker.

---
## 2) Definitions (Canonical Vocabulary)

### 2.1 Operational Mode (system-level)

A small, global state representing operational posture.

Recommended enum:

- normal
- maintenance
- test

Operational Mode is:

- authoritative in backend
- typically environment-scoped
- changeable at runtime, so that maintenance can be switched on quickly

### 2.2 Feature State (capability-level)

A per-feature state machine (not a boolean).

Recommended enum:

- enabled
- disabled
- coming_soon
- hidden

Semantics:

- enabled: feature is available and advertised
- disabled: feature exists but must not be used (safety kill switch)
- coming_soon: may be visible in UI as teaser, but actions are blocked
- hidden: not visible/advertised; actions are blocked (safest default)

### 2.3 Capability

A named unit of functionality (stable key) used consistently across API + website.

Examples:

- races.create
- payments.checkout
- sponsor.portal
- stewarding.protests

A capability key is a contract.

### 2.4 Action Type

Availability decisions vary by the type of action:

- view: read-only operations (pages, GET endpoints)
- mutate: state-changing operations (POST/PUT/PATCH/DELETE)
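
The vocabulary above could be captured as shared TypeScript types. The following is a minimal sketch, not a prescribed implementation: it assumes string-literal unions, uses only the example capability keys from section 2.3, and introduces two hypothetical helpers (`isFeatureState`, `actionTypeForMethod`). Treating `HEAD` as a view is an assumption that this document does not state.

```typescript
// Operational Mode (2.1): global operational posture.
const OPERATIONAL_MODES = ["normal", "maintenance", "test"] as const;
type OperationalMode = (typeof OPERATIONAL_MODES)[number];

// Feature State (2.2): a per-capability state, not a boolean.
const FEATURE_STATES = ["enabled", "disabled", "coming_soon", "hidden"] as const;
type FeatureState = (typeof FEATURE_STATES)[number];

// Capability keys (2.3): illustrative list holding only this document's examples.
const CAPABILITY_KEYS = [
  "races.create",
  "payments.checkout",
  "sponsor.portal",
  "stewarding.protests",
] as const;
type CapabilityKey = (typeof CAPABILITY_KEYS)[number];

// Action Type (2.4).
type ActionType = "view" | "mutate";

// Hypothetical helper: narrows untrusted input (e.g. config values) to a FeatureState.
function isFeatureState(value: unknown): value is FeatureState {
  return FEATURE_STATES.some((state) => state === value);
}

// Hypothetical helper: maps an HTTP method to an action type.
// GET is a view per section 2.4; HEAD is treated as a view by assumption.
function actionTypeForMethod(method: string): ActionType {
  const upper = method.toUpperCase();
  return upper === "GET" || upper === "HEAD" ? "view" : "mutate";
}
```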
---
## 3) Policy Model (What Exists)

### 3.1 FeatureAvailabilityPolicy (single evaluation model)

One evaluation function produces a decision.

Inputs:

- environment (dev/test/prod)
- operationalMode (normal/maintenance/test)
- capabilityKey (string)
- actionType (view/mutate)
- actorContext (anonymous/authenticated; roles later)

Outputs:

- allow: boolean
- publicReason: one of maintenance | disabled | coming_soon | hidden | not_configured
- uxHint: optional { messageKey, redirectPath, showTeaser }

The same decision model is reused by:

- API Guard enforcement
- Website navigation visibility
- Website component rendering/disablement
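
The inputs and outputs listed above might be typed as follows. This is a sketch under assumptions: the names `PolicyInput`, `PolicyDecision`, `UxHint`, and `deny` are illustrative, and modeling the decision as a discriminated union (so that a denial always carries a reason) is one possible design choice rather than something this document mandates.

```typescript
type Environment = "dev" | "test" | "prod";
type OperationalMode = "normal" | "maintenance" | "test";
type ActionType = "view" | "mutate";
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

// Inputs to the single evaluation function.
interface PolicyInput {
  environment: Environment;
  operationalMode: OperationalMode;
  capabilityKey: string;
  actionType: ActionType;
  // Roles may be added later; only the authentication status is modeled here.
  actorContext: { kind: "anonymous" | "authenticated" };
}

// Optional hint consumed by Blockers only; Guards ignore it.
interface UxHint {
  messageKey?: string;
  redirectPath?: string;
  showTeaser?: boolean;
}

// Output: an allowed decision carries nothing else, a denial always has a reason.
type PolicyDecision =
  | { allow: true }
  | { allow: false; publicReason: PublicReason; uxHint?: UxHint };

// Hypothetical helper for building denials consistently.
function deny(publicReason: PublicReason, uxHint?: UxHint): PolicyDecision {
  return uxHint ? { allow: false, publicReason, uxHint } : { allow: false, publicReason };
}
```

Because the Guard, the navigation, and the components would all consume the same `PolicyDecision`, a change to the evaluation rules would only need to happen in one place.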
### 3.2 Precedence (where values come from)

To avoid “mystery behavior”, use strict precedence:

1. runtime overrides (highest priority)
2. build-time environment configuration
3. code defaults (lowest priority, should be safe: hidden/disabled)

Rationale:

- runtime overrides enable emergency response without rebuild
- env config enables environment-specific defaults
- code defaults keep behavior deterministic if config is missing
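
The precedence order above can be sketched as a lookup that returns the first configured value. The names `PolicySources`, `resolveState`, and `lookupOwn` are hypothetical, and the final `hidden` fallback is an assumption that follows the "safe code defaults" guidance. Returning the source alongside the state is optional, but it makes it easy to see which layer decided.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type StateMap = Readonly<Record<string, FeatureState>>;

interface PolicySources {
  runtimeOverrides: StateMap; // 1. highest priority
  envConfig: StateMap;        // 2. build-time environment configuration
  codeDefaults: StateMap;     // 3. lowest priority
}

// Reads only own properties, so inherited keys such as "constructor" never match.
function lookupOwn(map: StateMap, key: string): FeatureState | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

// Returns the first configured state in precedence order, plus where it came from.
function resolveState(
  key: string,
  sources: PolicySources,
): { state: FeatureState; source: "runtime" | "env" | "code" | "fallback" } {
  const runtime = lookupOwn(sources.runtimeOverrides, key);
  if (runtime !== undefined) return { state: runtime, source: "runtime" };
  const env = lookupOwn(sources.envConfig, key);
  if (env !== undefined) return { state: env, source: "env" };
  const code = lookupOwn(sources.codeDefaults, key);
  if (code !== undefined) return { state: code, source: "code" };
  return { state: "hidden", source: "fallback" }; // nothing configured: fail closed
}

// Example: an emergency runtime override wins over the environment default.
const resolvedCheckout = resolveState("payments.checkout", {
  runtimeOverrides: { "payments.checkout": "disabled" },
  envConfig: { "payments.checkout": "enabled" },
  codeDefaults: {},
});
// resolvedCheckout.state is "disabled" and resolvedCheckout.source is "runtime"
```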
---
## 4) Evaluation Rules (Deterministic, Explicit)

### 4.1 Maintenance mode rules

Maintenance must be able to block the platform fast and consistently.

Default behavior:

- mutate actions: denied unless explicitly allowlisted
- view actions: allowed only for a small allowlist (status page, login, health, static public routes)

This creates a safe “fail closed” posture.

Optional refinement:

- define a maintenance allowlist for critical reads (e.g., dashboards for operators)
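
A minimal sketch of the maintenance check, assuming allowlists are expressed as capability keys. The keys `system.status`, `system.health`, and `auth.login` are hypothetical stand-ins for the status page, health, and login routes mentioned above; `evaluateMaintenance` is an illustrative name.

```typescript
type ActionType = "view" | "mutate";

interface MaintenanceAllowlists {
  view: readonly string[];
  mutate: readonly string[];
}

// Hypothetical capability keys for what stays reachable during maintenance.
const DEFAULT_MAINTENANCE_ALLOWLISTS: MaintenanceAllowlists = {
  view: ["system.status", "system.health", "auth.login"],
  mutate: [], // fail closed: no mutation is allowed unless explicitly listed
};

type MaintenanceDecision = { allow: true } | { allow: false; publicReason: "maintenance" };

// Decides whether an action may proceed while operationalMode is "maintenance".
function evaluateMaintenance(
  capabilityKey: string,
  actionType: ActionType,
  allowlists: MaintenanceAllowlists = DEFAULT_MAINTENANCE_ALLOWLISTS,
): MaintenanceDecision {
  const allowlisted = allowlists[actionType].some((key) => key === capabilityKey);
  return allowlisted ? { allow: true } : { allow: false, publicReason: "maintenance" };
}
```

The optional refinement for operators would then amount to adding further keys to the `view` list, without touching the evaluation logic.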
### 4.2 Test mode rules

Test mode should exist primarily in non-prod environments; enabling it in prod must be an explicit, deliberate action.

Recommended behavior:

- In prod, test mode should not be enabled accidentally.
- In test environments, test mode may:
  - enable test-only endpoints
  - bypass external integrations (through adapters)
  - relax rate limits
  - expose test banners in UI (Blocker-level display)
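
One way to keep test mode from being enabled in prod by accident is to require a second, explicit opt-in. The sketch below assumes a hypothetical `allowTestModeInProd` flag and falls back to `normal` when it is absent; rejecting the configuration with an error would be an equally valid choice.

```typescript
type Environment = "dev" | "test" | "prod";
type OperationalMode = "normal" | "maintenance" | "test";

// Returns the mode that actually takes effect.
// `allowTestModeInProd` is a hypothetical explicit opt-in, not an existing setting.
function effectiveMode(
  requested: OperationalMode,
  environment: Environment,
  allowTestModeInProd = false,
): OperationalMode {
  if (requested === "test" && environment === "prod" && !allowTestModeInProd) {
    return "normal"; // ignore an accidental test-mode request in prod
  }
  return requested;
}
```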
### 4.3 Feature state rules (per capability)

Given a capability state:

- enabled:
  - allow view + mutate (subject to auth/roles)
  - visible in UI
- coming_soon:
  - allow view of teaser pages/components
  - deny mutate and deny sensitive reads
  - visible in UI with Coming Soon affordances
- disabled:
  - deny view + mutate
  - hidden in nav by default
- hidden:
  - deny view + mutate
  - never visible in UI

Note:

- “disabled” and “hidden” are both blocked on the server; they differ only in UI treatment and information disclosure.

### 4.4 Missing configuration

If a capability is not configured:

- treat as hidden (fail closed)
- optionally log a warning (server-side)
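
The per-state rules and the missing-configuration rule can be sketched as one small function. Assumptions: `evaluateFeatureState` and `StateDecision` are illustrative names, a missing state is passed as `undefined`, and the `sensitiveRead` parameter is a hypothetical way for an endpoint to mark a read that must stay blocked while the feature is `coming_soon`. Auth and role checks are deliberately out of scope here.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";
type DenialReason = "disabled" | "coming_soon" | "hidden" | "not_configured";

interface StateDecision {
  allow: boolean;
  visibleInUi: boolean;
  publicReason?: DenialReason;
}

function evaluateFeatureState(
  state: FeatureState | undefined,
  actionType: ActionType,
  sensitiveRead = false,
): StateDecision {
  switch (state) {
    case "enabled":
      return { allow: true, visibleInUi: true };
    case "coming_soon":
      // Teaser views pass; mutations and sensitive reads are denied.
      return actionType === "view" && !sensitiveRead
        ? { allow: true, visibleInUi: true }
        : { allow: false, visibleInUi: true, publicReason: "coming_soon" };
    case "disabled":
      return { allow: false, visibleInUi: false, publicReason: "disabled" };
    case "hidden":
      return { allow: false, visibleInUi: false, publicReason: "hidden" };
    default:
      // Not configured: behaves like hidden (fail closed), with its own reason for logging.
      return { allow: false, visibleInUi: false, publicReason: "not_configured" };
  }
}
```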
---
## 5) Enforcement Mapping (Where Each Requirement Lives)

This section is the “wiring contract” across layers.

### 5.1 API endpoints (authoritative)

- Enforce via Backend Guards (NestJS CanActivate).
- Endpoints must declare the capability they require.

Mapping to HTTP:

- maintenance: 503 Service Unavailable (preferred for global maintenance)
- disabled/hidden: 404 Not Found (avoid advertising unavailable capabilities)
- coming_soon: 404 Not Found for public clients; optionally 409 Conflict for trusted internal clients, if explicit semantics are wanted later

Guideline:

- External clients should not get detailed feature availability information unless explicitly intended.
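
The status mapping above is small enough to express as a pure function that a Guard could call before raising the matching HTTP error. This is a sketch: `httpStatusForDenial` and the `trustedClient` parameter are hypothetical, and mapping `not_configured` to 404 is an assumption derived from section 4.4 (missing configuration is treated as hidden).

```typescript
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

// Maps a denial reason to the HTTP status returned to the caller.
function httpStatusForDenial(reason: PublicReason, trustedClient = false): number {
  switch (reason) {
    case "maintenance":
      return 503; // Service Unavailable
    case "coming_soon":
      return trustedClient ? 409 : 404; // explicit only for trusted clients
    default:
      return 404; // disabled, hidden, not_configured: do not advertise the capability
  }
}
```

Keeping the mapping separate from the Guard makes it straightforward to unit test and to reuse if another transport is added later.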
### 5.2 Website links / navigation (UX)

- Enforce via Frontend Blockers.
- Hide links when state is disabled/hidden.
- For coming_soon, show link but route to teaser page or disable with explanation.

Rules:

- Never assume hidden in UI equals enforced on server.
- UI should degrade gracefully (API may still block).
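
Filtering a centralized list of navigation links through the policy might look like the sketch below. The `NavLink` shape, the `visibleNavLinks` function, and the `/coming-soon` teaser path are all hypothetical; capabilities without a configured state are treated as hidden.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface NavLink {
  label: string;
  href: string;
  capabilityKey: string;
}

interface VisibleNavLink {
  label: string;
  href: string;
  comingSoon: boolean;
}

function visibleNavLinks(
  links: readonly NavLink[],
  states: Readonly<Record<string, FeatureState>>,
  teaserHref = "/coming-soon", // hypothetical teaser route
): VisibleNavLink[] {
  const visible: VisibleNavLink[] = [];
  for (const link of links) {
    const configured = Object.prototype.hasOwnProperty.call(states, link.capabilityKey)
      ? states[link.capabilityKey]
      : undefined;
    const state: FeatureState = configured ?? "hidden";
    if (state === "enabled") {
      visible.push({ label: link.label, href: link.href, comingSoon: false });
    } else if (state === "coming_soon") {
      visible.push({ label: link.label, href: teaserHref, comingSoon: true });
    }
    // disabled and hidden links are omitted entirely
  }
  return visible;
}
```

This only shapes what the user sees. A request to a filtered-out route still has to be rejected by the API Guard.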
### 5.3 Website components (UX)

- Use Blockers to:
  - hide components for hidden/disabled
  - show teaser content for coming_soon
  - disable buttons or flows for coming_soon/disabled, with consistent messaging

Recommendation:

- Provide a single reusable component (FeatureBlocker) that consumes policy decisions and renders:
  - children when allowed
  - teaser when coming_soon
  - null or fallback when disabled/hidden
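
The rendering choice inside a FeatureBlocker can be isolated in a framework-independent function, which a React component could then wrap. This is a sketch under assumptions: `selectBlockerContent` and `BlockerSlots` are hypothetical names, and the generic `T` stands for whatever the UI framework renders (for example a React node).

```typescript
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";
type BlockerDecision = { allow: true } | { allow: false; publicReason: PublicReason };

interface BlockerSlots<T> {
  children: T;
  teaser?: T;   // shown when the capability is coming_soon
  fallback?: T; // shown for any other denial; when omitted, nothing is rendered
}

// Picks what to render for a given policy decision.
function selectBlockerContent<T>(decision: BlockerDecision, slots: BlockerSlots<T>): T | null {
  if (decision.allow) return slots.children;
  if (decision.publicReason === "coming_soon" && slots.teaser !== undefined) {
    return slots.teaser;
  }
  return slots.fallback ?? null;
}
```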
---
## 6) Build-Time vs Runtime (Clean, Predictable)

### 6.1 Build-time flags (require rebuild/redeploy)

What they are good for:

- preventing unfinished UI code from shipping in a bundle
- cutting entire routes/components from builds for deterministic releases

Limitations:

- NEXT_PUBLIC_* values are compiled into the client bundle; changing them does not update clients without rebuild.

Use build-time flags for:

- experimental UI
- “not yet shipped” components/routes
- simplifying deployments (for example, pre-launch vs. alpha-style gating)

### 6.2 Runtime flags (no rebuild)

What they are for:

- maintenance mode
- emergency disable for broken endpoints
- quickly hiding risky features

Runtime flags must be available to:

- API Guards (always)
- Website SSR/middleware (optional)
- Website client (optional, for UX only)

Key tradeoff:

- reading policy at runtime introduces caching and latency concerns
- policy reads must therefore be cached, fast, and resilient to store outages

Recommended approach:

- API is authoritative source of runtime policy
- website can optionally consume a cached policy snapshot endpoint
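
A cached, resilient policy read could be sketched as a small time-to-live cache that keeps serving the last known snapshot when the store is unreachable. Everything here is an assumption for illustration: the `CachedPolicyReader` class, the injected `load` function (standing in for a KV/DB read), and the `EMPTY_SNAPSHOT` fallback, under which no capability is configured and everything therefore evaluates as hidden. The sketch does not deduplicate concurrent refreshes.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface PolicySnapshot {
  policyVersion: number;
  operationalMode: "normal" | "maintenance" | "test";
  capabilities: Record<string, FeatureState>;
}

// Used only if the store has never been read successfully (fail closed).
const EMPTY_SNAPSHOT: PolicySnapshot = {
  policyVersion: 0,
  operationalMode: "normal",
  capabilities: {},
};

class CachedPolicyReader {
  private cached: PolicySnapshot | undefined;
  private fetchedAt = 0;
  private readonly load: () => Promise<PolicySnapshot>;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(
    load: () => Promise<PolicySnapshot>,
    ttlMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(): Promise<PolicySnapshot> {
    // Fast path: serve from memory while the snapshot is fresh.
    if (this.cached !== undefined && this.now() - this.fetchedAt < this.ttlMs) {
      return this.cached;
    }
    try {
      this.cached = await this.load();
      this.fetchedAt = this.now();
    } catch {
      // Store unavailable: keep serving the last known snapshot, if any.
    }
    return this.cached ?? EMPTY_SNAPSHOT;
  }
}
```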
---
## 7) Storage and Distribution (Now + Future Super Admin)

### 7.1 Now (no super admin UI)

Use a single “policy snapshot” stored in one place and read by the API, with caching.

Options (in priority order):

1. Remote KV/DB-backed policy snapshot (preferred for true runtime changes)
2. Environment variable JSON (simpler, but changes require restart/redeploy)
3. Static config file in repo (requires rebuild/redeploy)

### 7.2 Future (super admin UI)

Super admin becomes a writer to the same store.

Non-negotiable:

- The storage schema must be stable and versioned.

Recommended schema (conceptual):

- policyVersion
- operationalMode
- capabilities: map of capabilityKey -> featureState
- allowlists: maintenance view/mutate allowlists
- optional targeting rules later (by role/user)
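
The conceptual schema could be typed and validated as below. This is a sketch, not a fixed format: the field types (a numeric `policyVersion`, allowlists as arrays of capability keys) are assumptions, and `parsePolicySnapshot` is a hypothetical helper. It throws on structural errors and downgrades unknown feature states to `hidden`, so that a malformed entry cannot enable anything.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface PolicySnapshot {
  policyVersion: number;
  operationalMode: OperationalMode;
  capabilities: Record<string, FeatureState>;
  allowlists: { view: string[]; mutate: string[] };
}

const MODES: readonly string[] = ["normal", "maintenance", "test"];
const STATES: readonly string[] = ["enabled", "disabled", "coming_soon", "hidden"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Keeps only string entries; anything else is dropped.
function stringList(value: unknown): string[] {
  return Array.isArray(value) ? value.filter((v): v is string => typeof v === "string") : [];
}

// Parses a snapshot from JSON, e.g. a KV entry or an environment variable.
function parsePolicySnapshot(json: string): PolicySnapshot {
  const raw: unknown = JSON.parse(json);
  if (!isRecord(raw)) throw new Error("policy snapshot must be a JSON object");
  const { policyVersion, operationalMode, capabilities, allowlists } = raw;
  if (typeof policyVersion !== "number") throw new Error("policyVersion must be a number");
  if (typeof operationalMode !== "string" || MODES.indexOf(operationalMode) === -1) {
    throw new Error("operationalMode is invalid");
  }
  const parsedCapabilities: Record<string, FeatureState> = {};
  if (isRecord(capabilities)) {
    for (const key of Object.keys(capabilities)) {
      const state = capabilities[key];
      // Unknown states degrade to hidden (fail closed).
      parsedCapabilities[key] =
        typeof state === "string" && STATES.indexOf(state) !== -1 ? (state as FeatureState) : "hidden";
    }
  }
  const lists: Record<string, unknown> = isRecord(allowlists) ? allowlists : {};
  return {
    policyVersion,
    operationalMode: operationalMode as OperationalMode,
    capabilities: parsedCapabilities,
    allowlists: { view: stringList(lists["view"]), mutate: stringList(lists["mutate"]) },
  };
}
```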
---
## 8) Data Flow (Conceptual)

```mermaid
flowchart LR
  UI[Website UI] --> FB[Frontend Blockers]
  FB --> PC[Policy Client]
  UI --> API[API Request]
  API --> FG[Feature Guard]
  FG --> AS[API Application Service]
  AS --> UC[Core Use Case]
  PC --> PS[Policy Snapshot]
  FG --> PS
```

Interpretation:

- Website reads policy for UX (best-effort).
- API enforces policy (authoritative) before any application logic.
---
## 9) Implementation Checklist (For Code Mode)

Backend (apps/api):

- Define capability keys and feature states as shared types in a local module.
- Create FeaturePolicyService that resolves the current policy snapshot (cached).
- Add FeatureFlagGuard (or FeatureAvailabilityGuard) that:
  - reads required capability metadata for an endpoint
  - evaluates allow/deny with actionType
  - maps denial to the chosen HTTP status codes

Frontend (apps/website):

- Add a small PolicyClient that fetches policy snapshot from API (optional for phase 1).
- Add FeatureBlocker component for consistent UI behavior.
- Centralize navigation link definitions and filter them via policy.

Ops/Config:

- Define how maintenance mode is toggled (KV/DB entry or config endpoint restricted to operators later).
- Ensure defaults are safe (fail closed).
---
## 10) Non-Goals (Explicit)
- This system is not an authorization system.
- Roles/permissions are separate (but can be added as actorContext inputs later).
- Blockers never replace Guards.