
# Feature Availability (Modes + Feature Flags)

This document defines a clean, consistent system for enabling/disabling functionality across:
- API endpoints
- Website links/navigation
- Website components

It is designed to support:
- test mode
- maintenance mode
- disabling features due to risk/issues
- coming soon features
- future super admin flag management

It is aligned with the hard separation of responsibilities in `Blockers & Guards`:
- Frontend uses Blockers (UX best-effort)
- Backend uses Guards (authoritative enforcement)

See: docs/architecture/BLOCKER_GUARDS.md

---

## 1) Core Principle

Availability is decided once, then applied in multiple places.

- Backend Guards enforce availability for correctness and security.
- Frontend Blockers reflect availability for UX, but must never be relied on for enforcement.

If it must be enforced, it is a Guard.
If it only improves UX, it is a Blocker.

---

## 2) Definitions (Canonical Vocabulary)

### 2.1 Operational Mode (system-level)
A small, global state representing operational posture.

Recommended enum:
- normal
- maintenance
- test

Operational Mode is:
- authoritative in backend
- typically environment-scoped
- required for rapid response (maintenance must be runtime-changeable)

### 2.2 Feature State (capability-level)
A per-feature state machine (not a boolean).

Recommended enum:
- enabled
- disabled
- coming_soon
- hidden

Semantics:
- enabled: feature is available and advertised
- disabled: feature exists but must not be used (safety kill switch)
- coming_soon: may be visible in UI as teaser, but actions are blocked
- hidden: not visible/advertised; actions are blocked (safest default)

### 2.3 Capability
A named unit of functionality (stable key) used consistently across API + website.

Examples:
- races.create
- payments.checkout
- sponsor.portal
- stewarding.protests

A capability key is a contract.

### 2.4 Action Type
Availability decisions vary by the type of action:
- view: read-only operations (pages, GET endpoints)
- mutate: state-changing operations (POST/PUT/PATCH/DELETE)
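
The vocabulary above maps naturally onto string literal types. The following is a minimal sketch, not an existing module: the type names, `CAPABILITY_KEYS`, `isCapabilityKey`, and `actionTypeForMethod` are hypothetical, and treating HEAD/OPTIONS as `view` is an assumption.

```typescript
// Type and function names are illustrative; the literal values come from section 2.
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";

// A capability key is a contract, so the known keys live in one closed list.
const CAPABILITY_KEYS = [
  "races.create",
  "payments.checkout",
  "sponsor.portal",
  "stewarding.protests",
] as const;
type CapabilityKey = (typeof CAPABILITY_KEYS)[number];

// Narrow an arbitrary string (for example from config) to a known key.
function isCapabilityKey(value: string): value is CapabilityKey {
  return CAPABILITY_KEYS.some((key) => key === value);
}

// Assumption: HEAD and OPTIONS count as "view"; every other method is "mutate".
function actionTypeForMethod(method: string): ActionType {
  const upper = method.toUpperCase();
  return upper === "GET" || upper === "HEAD" || upper === "OPTIONS" ? "view" : "mutate";
}
```

Deriving `CapabilityKey` from the array keeps the runtime list and the compile-time type in sync, so a typo in a key fails at build time rather than silently resolving to an unconfigured capability.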

---

## 3) Policy Model (What Exists)

### 3.1 FeatureAvailabilityPolicy (single evaluation model)
One evaluation function produces a decision.

Inputs:
- environment (dev/test/prod)
- operationalMode (normal/maintenance/test)
- capabilityKey (string)
- actionType (view/mutate)
- actorContext (anonymous/authenticated; roles later)

Outputs:
- allow: boolean
- publicReason: one of maintenance | disabled | coming_soon | hidden | not_configured
- uxHint: optional { messageKey, redirectPath, showTeaser }

The same decision model is reused by:
- API Guard enforcement
- Website navigation visibility
- Website component rendering/disablement
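
As a sketch of the inputs and outputs listed above, the decision can be modelled as plain types that both the API and the website import. All names here are hypothetical, and omitting `publicReason` from allowed decisions is an assumption rather than something this document specifies.

```typescript
type Environment = "dev" | "test" | "prod";
type OperationalMode = "normal" | "maintenance" | "test";
type ActionType = "view" | "mutate";
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

interface ActorContext {
  authenticated: boolean; // roles can be added here later
}

interface PolicyInput {
  environment: Environment;
  operationalMode: OperationalMode;
  capabilityKey: string;
  actionType: ActionType;
  actorContext: ActorContext;
}

interface UxHint {
  messageKey?: string;
  redirectPath?: string;
  showTeaser?: boolean;
}

// Assumption: only denials carry a publicReason.
type PolicyDecision =
  | { allow: true; uxHint?: UxHint }
  | { allow: false; publicReason: PublicReason; uxHint?: UxHint };

// Small constructors so every caller builds decisions the same way.
function allowDecision(uxHint?: UxHint): PolicyDecision {
  return uxHint ? { allow: true, uxHint } : { allow: true };
}

function denyDecision(publicReason: PublicReason, uxHint?: UxHint): PolicyDecision {
  return uxHint ? { allow: false, publicReason, uxHint } : { allow: false, publicReason };
}
```

A discriminated union on `allow` means a consumer cannot read `publicReason` without first checking that the decision is a denial.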

### 3.2 Precedence (where values come from)
To avoid “mystery behavior”, use strict precedence:

1. runtime overrides (highest priority)
2. build-time environment configuration
3. code defaults (lowest priority, should be safe: hidden/disabled)

Rationale:
- runtime overrides enable emergency response without rebuild
- env config enables environment-specific defaults
- code defaults keep behavior deterministic if config is missing
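
The precedence order can be sketched as a three-step lookup that also reports which layer supplied the value, which helps when debugging "mystery behavior". The names `PolicySources` and `resolveFeatureState` are hypothetical, and falling back to `hidden` when no layer has a value is taken from the fail-closed rule in 4.4.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type StateMap = Readonly<Record<string, FeatureState | undefined>>;
type StateSource = "runtime" | "env" | "default" | "none";

interface PolicySources {
  runtimeOverrides: StateMap; // highest priority
  envConfig: StateMap; // build-time environment configuration
  codeDefaults: StateMap; // lowest priority, should be safe
}

// Own-property lookup so keys like "toString" never hit Object.prototype.
function lookupState(map: StateMap, key: string): FeatureState | undefined {
  return Object.prototype.hasOwnProperty.call(map, key) ? map[key] : undefined;
}

function resolveFeatureState(
  key: string,
  sources: PolicySources,
): { state: FeatureState; source: StateSource } {
  const runtime = lookupState(sources.runtimeOverrides, key);
  if (runtime !== undefined) return { state: runtime, source: "runtime" };

  const env = lookupState(sources.envConfig, key);
  if (env !== undefined) return { state: env, source: "env" };

  const fallback = lookupState(sources.codeDefaults, key);
  if (fallback !== undefined) return { state: fallback, source: "default" };

  // No layer knows the key: fail closed.
  return { state: "hidden", source: "none" };
}
```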

---

## 4) Evaluation Rules (Deterministic, Explicit)

### 4.1 Maintenance mode rules
Maintenance must be able to block the platform fast and consistently.

Default behavior:
- mutate actions: denied unless explicitly allowlisted
- view actions: allowed only for a small allowlist (status page, login, health, static public routes)

This creates a safe “fail closed” posture.

Optional refinement:
- extend the view allowlist with critical operator reads (e.g., dashboards for operators)
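
A minimal sketch of the maintenance gate, assuming the allowlists are expressed as capability keys. The function name and the example keys in the usage note are hypothetical.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type ActionType = "view" | "mutate";

interface MaintenanceAllowlists {
  view: readonly string[]; // capabilities still readable during maintenance
  mutate: readonly string[]; // capabilities still writable during maintenance
}

// Returns true when maintenance mode blocks the action. A false result only
// means "not blocked by maintenance"; the per-feature rules still apply.
function isBlockedByMaintenance(
  mode: OperationalMode,
  capabilityKey: string,
  actionType: ActionType,
  allowlists: MaintenanceAllowlists,
): boolean {
  if (mode !== "maintenance") return false;
  const allowed = actionType === "view" ? allowlists.view : allowlists.mutate;
  // Anything not explicitly allowlisted is denied (fail closed).
  return allowed.indexOf(capabilityKey) === -1;
}
```

With `view: ["system.status", "auth.login"]` and an empty `mutate` list, a GET on the status page passes the gate while every write is rejected.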

### 4.2 Test mode rules
Test mode should primarily exist in non-prod environments; enabling it in prod must require an explicit opt-in.

Recommended behavior:
- In prod, test mode should not be enabled accidentally.
- In test environments, test mode may:
  - enable test-only endpoints
  - bypass external integrations (through adapters)
  - relax rate limits
  - expose test banners in UI (Blocker-level display)
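
One way to keep test mode from being enabled accidentally in prod is to require a second, explicit switch. This is a sketch under that assumption: `allowTestModeInProd` and `resolveOperationalMode` are hypothetical names, and falling back to `normal` (rather than refusing to start) is a design choice, not a requirement of this document.

```typescript
type Environment = "dev" | "test" | "prod";
type OperationalMode = "normal" | "maintenance" | "test";

interface ModeResolution {
  mode: OperationalMode;
  rejectedTestMode: boolean; // true when a test-mode request was refused
}

function resolveOperationalMode(
  environment: Environment,
  requested: OperationalMode,
  allowTestModeInProd: boolean,
): ModeResolution {
  // In prod, "test" only takes effect together with the explicit opt-in.
  if (requested === "test" && environment === "prod" && !allowTestModeInProd) {
    return { mode: "normal", rejectedTestMode: true };
  }
  return { mode: requested, rejectedTestMode: false };
}
```

The `rejectedTestMode` flag lets the caller log or alert on the refused request instead of silently ignoring it.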

### 4.3 Feature state rules (per capability)
Given a capability state:

- enabled:
  - allow view + mutate (subject to auth/roles)
  - visible in UI
- coming_soon:
  - allow view of teaser pages/components
  - deny mutate and deny sensitive reads
  - visible in UI with Coming Soon affordances
- disabled:
  - deny view + mutate
  - hidden in nav by default
- hidden:
  - deny view + mutate
  - never visible in UI

Note:
- “disabled” and “hidden” are both blocked; the difference is UI and information disclosure.
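
The per-state rules for actions can be sketched as a single exhaustive switch. `isAllowedByState` and its `isTeaser` parameter are hypothetical; the parameter is an assumed way to distinguish teaser views from sensitive reads under `coming_soon`.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";

function isAllowedByState(
  state: FeatureState,
  actionType: ActionType,
  isTeaser: boolean = false,
): boolean {
  switch (state) {
    case "enabled":
      return true; // still subject to auth/roles elsewhere
    case "coming_soon":
      // Only teaser views pass; mutations and sensitive reads are denied.
      return actionType === "view" && isTeaser;
    case "disabled":
    case "hidden":
      return false;
  }
}
```

Because the switch covers every member of `FeatureState`, adding a new state to the union produces a compile error here until its rule is written.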

### 4.4 Missing configuration
If a capability is not configured:
- treat as hidden (fail closed)
- optionally log a warning (server-side)
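
A sketch of the fail-closed lookup with a server-side warning, logged once per key to avoid flooding logs on hot paths. `createStateLookup` is a hypothetical helper, and the `warn` callback stands in for whatever logger the API uses.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface StateLookupResult {
  state: FeatureState;
  configured: boolean; // false maps to the "not_configured" public reason
}

function createStateLookup(
  capabilities: Readonly<Record<string, FeatureState>>,
  warn: (message: string) => void,
): (capabilityKey: string) => StateLookupResult {
  const alreadyWarned: string[] = [];
  return (capabilityKey) => {
    // Own-property check so inherited names are never treated as configured.
    const state = Object.prototype.hasOwnProperty.call(capabilities, capabilityKey)
      ? capabilities[capabilityKey]
      : undefined;
    if (state !== undefined) return { state, configured: true };

    if (alreadyWarned.indexOf(capabilityKey) === -1) {
      alreadyWarned.push(capabilityKey);
      warn(`Capability "${capabilityKey}" is not configured; treating it as hidden.`);
    }
    return { state: "hidden", configured: false };
  };
}
```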

---

## 5) Enforcement Mapping (Where Each Requirement Lives)

This section is the “wiring contract” across layers.

### 5.1 API endpoints (authoritative)
- Enforce via Backend Guards (NestJS CanActivate).
- Endpoints must declare the capability they require.

Mapping to HTTP:
- maintenance: 503 Service Unavailable (preferred for global maintenance)
- disabled/hidden: 404 Not Found (avoid advertising unavailable capabilities)
- coming_soon: 404 Not Found for public clients; optionally 409 Conflict for trusted internal clients, if explicit semantics are wanted later

Guideline:
- External clients should not get detailed feature availability information unless explicitly intended.
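
The status mapping above can be sketched as a pure function that a Guard calls after a denial. `httpStatusForDenial` and the `trustedClient` flag are hypothetical, and mapping `not_configured` to 404 is an assumption that follows from treating unconfigured capabilities as hidden.

```typescript
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

function httpStatusForDenial(reason: PublicReason, trustedClient: boolean = false): number {
  switch (reason) {
    case "maintenance":
      return 503; // Service Unavailable
    case "coming_soon":
      // 409 only for trusted clients that are meant to see explicit semantics.
      return trustedClient ? 409 : 404;
    case "disabled":
    case "hidden":
    case "not_configured":
      return 404; // do not advertise unavailable capabilities
  }
}
```

Keeping the mapping separate from the Guard makes the "no detailed availability information for external clients" guideline testable without an HTTP stack.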

### 5.2 Website links / navigation (UX)
- Enforce via Frontend Blockers.
- Hide links when state is disabled/hidden.
- For coming_soon, show link but route to teaser page or disable with explanation.

Rules:
- Never assume hidden in UI equals enforced on server.
- UI should degrade gracefully (API may still block).
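
A sketch of navigation filtering, assuming links are defined centrally and tagged with a capability key. `NavLink`, `RenderedNavLink`, `teaserHref`, and `filterNavLinks` are hypothetical names; a `null` href stands for "render disabled with an explanation".

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface NavLink {
  label: string;
  href: string;
  capabilityKey: string;
  teaserHref?: string; // optional teaser page for coming_soon
}

interface RenderedNavLink {
  label: string;
  href: string | null; // null: show the link disabled, with an explanation
  comingSoon: boolean;
}

function filterNavLinks(
  links: readonly NavLink[],
  stateOf: (capabilityKey: string) => FeatureState,
): RenderedNavLink[] {
  const visible: RenderedNavLink[] = [];
  for (const link of links) {
    const state = stateOf(link.capabilityKey);
    if (state === "enabled") {
      visible.push({ label: link.label, href: link.href, comingSoon: false });
    } else if (state === "coming_soon") {
      visible.push({ label: link.label, href: link.teaserHref ?? null, comingSoon: true });
    }
    // disabled and hidden links are omitted entirely.
  }
  return visible;
}
```

This only shapes what the user sees; a request to a hidden route must still be rejected by the API Guard.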

### 5.3 Website components (UX)
- Use Blockers to:
  - hide components for hidden, and for disabled by default
  - show teaser content for coming_soon
  - disable buttons or flows for coming_soon, or for disabled where the component stays visible, with consistent messaging

Recommendation:
- Provide a single reusable component (FeatureBlocker) that consumes policy decisions and renders:
  - children when allowed
  - teaser when coming_soon
  - null or fallback when disabled/hidden
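
The rendering rule for such a component can be sketched without any UI framework. `renderFeatureBlocker` is a hypothetical function, generic over the node type `T` (in a React codebase `T` would presumably be `ReactNode`); the actual FeatureBlocker component would wrap this logic.

```typescript
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

interface BlockerDecision {
  allow: boolean;
  publicReason?: PublicReason;
}

interface BlockerSlots<T> {
  children: T;
  teaser?: T;
  fallback?: T;
}

function renderFeatureBlocker<T>(decision: BlockerDecision, slots: BlockerSlots<T>): T | null {
  // Allowed: render the wrapped content as-is.
  if (decision.allow) return slots.children;
  // coming_soon: prefer the teaser when one was provided.
  if (decision.publicReason === "coming_soon" && slots.teaser !== undefined) {
    return slots.teaser;
  }
  // Everything else: fallback if given, otherwise render nothing.
  return slots.fallback ?? null;
}
```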

---

## 6) Build-Time vs Runtime (Clean, Predictable)

### 6.1 Build-time flags (require rebuild/redeploy)
What they are good for:
- preventing unfinished UI code from shipping in a bundle
- cutting entire routes/components from builds for deterministic releases

Limitations:
- NEXT_PUBLIC_* values are compiled into the client bundle; changing them does not update clients without rebuild.

Use build-time flags for:
- experimental UI
- “not yet shipped” components/routes
- simplifying deployments (pre-launch vs alpha style gating)

### 6.2 Runtime flags (no rebuild)
What they are for:
- maintenance mode
- emergency disable for broken endpoints
- quickly hiding risky features

Runtime flags must be available to:
- API Guards (always)
- Website SSR/middleware (optional)
- Website client (optional, for UX only)

Key tradeoff:
- runtime access introduces caching and latency concerns
- serve runtime policy reads from a cache so they stay fast and remain available when the policy store is slow or down

Recommended approach:
- API is authoritative source of runtime policy
- website can optionally consume a cached policy snapshot endpoint
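
A sketch of a cached, resilient policy read: a time-to-live cache that keeps serving the last good snapshot when the store fails. `createCachedPolicyReader`, the `load` callback, and the caller-supplied `fallback` snapshot are hypothetical; the snapshot shape is a reduced version of the conceptual schema in section 7.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface PolicySnapshot {
  policyVersion: number;
  operationalMode: "normal" | "maintenance" | "test";
  capabilities: Record<string, FeatureState>;
}

function createCachedPolicyReader(
  load: () => Promise<PolicySnapshot>, // e.g. a KV/DB read
  ttlMs: number,
  fallback: PolicySnapshot, // used only if nothing was ever loaded
  now: () => number = () => Date.now(),
): () => Promise<PolicySnapshot> {
  let cached: PolicySnapshot | null = null;
  let fetchedAt = 0;

  return async () => {
    // Fresh cache hit: no store access at all.
    if (cached !== null && now() - fetchedAt < ttlMs) return cached;
    try {
      cached = await load();
      fetchedAt = now();
      return cached;
    } catch {
      // Store unavailable: serve the stale snapshot, or the fallback on a
      // cold start. The next call retries the load.
      return cached ?? fallback;
    }
  };
}
```

Passing an empty `capabilities` map as the fallback makes every capability resolve to hidden on a cold start, which matches the fail-closed default.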

---

## 7) Storage and Distribution (Now + Future Super Admin)

### 7.1 Now (no super admin UI)
Use a single “policy snapshot” stored in one place and read by the API, with caching.

Options (in priority order):
1. Remote KV/DB-backed policy snapshot (preferred for true runtime changes)
2. Environment variable JSON (simpler, but changes require restart/redeploy)
3. Static config file in repo (requires rebuild/redeploy)

### 7.2 Future (super admin UI)
Super admin becomes a writer to the same store.

Non-negotiable:
- The storage schema must be stable and versioned.

Recommended schema (conceptual):
- policyVersion
- operationalMode
- capabilities: map of capabilityKey -> featureState
- allowlists: maintenance view/mutate allowlists
- optional targeting rules later (by role/user)
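
The conceptual schema can be sketched as a versioned type with a validating parser, so a malformed write from any future admin tool is rejected rather than enforced. The field names `maintenanceView` and `maintenanceMutate`, the version number `1`, and `parsePolicySnapshot` are assumptions for illustration.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface PolicySnapshotV1 {
  policyVersion: 1;
  operationalMode: OperationalMode;
  capabilities: Record<string, FeatureState>;
  allowlists: { maintenanceView: string[]; maintenanceMutate: string[] };
}

const MODES: readonly string[] = ["normal", "maintenance", "test"];
const STATES: readonly string[] = ["enabled", "disabled", "coming_soon", "hidden"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Returns null for anything that is not a valid version-1 snapshot, so the
// caller can keep its previous snapshot or fall back to safe defaults.
function parsePolicySnapshot(json: string): PolicySnapshotV1 | null {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return null;
  }
  if (!isRecord(raw) || raw.policyVersion !== 1) return null;
  const mode = raw.operationalMode;
  if (typeof mode !== "string" || MODES.indexOf(mode) === -1) return null;
  const capabilities = raw.capabilities;
  const allowlists = raw.allowlists;
  if (!isRecord(capabilities) || !isRecord(allowlists)) return null;
  for (const key of Object.keys(capabilities)) {
    const state = capabilities[key];
    if (typeof state !== "string" || STATES.indexOf(state) === -1) return null;
  }
  if (!isStringArray(allowlists.maintenanceView)) return null;
  if (!isStringArray(allowlists.maintenanceMutate)) return null;
  return raw as unknown as PolicySnapshotV1;
}
```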

---

## 8) Data Flow (Conceptual)

```mermaid
flowchart LR
  UI[Website UI] --> FB[Frontend Blockers]
  FB --> PC[Policy Client]
  UI --> API[API Request]
  API --> FG[Feature Guard]
  FG --> AS[API Application Service]
  AS --> UC[Core Use Case]
  PC --> PS[Policy Snapshot]
  FG --> PS
```

Interpretation:
- Website reads policy for UX (best-effort).
- API enforces policy (authoritative) before any application logic.

---

## 9) Implementation Checklist (For Code Mode)

Backend (apps/api):
- Define capability keys and feature states as shared types in a local module.
- Create FeaturePolicyService that resolves the current policy snapshot (cached).
- Add FeatureFlagGuard (or FeatureAvailabilityGuard) that:
  - reads required capability metadata for an endpoint
  - evaluates allow/deny with actionType
  - maps denial to the chosen HTTP status codes

Frontend (apps/website):
- Add a small PolicyClient that fetches policy snapshot from API (optional for phase 1).
- Add FeatureBlocker component for consistent UI behavior.
- Centralize navigation link definitions and filter them via policy.

Ops/Config:
- Define how maintenance mode is toggled (KV/DB entry or config endpoint restricted to operators later).
- Ensure defaults are safe (fail closed).

---

## 10) Non-Goals (Explicit)
- This system is not an authorization system.
- Roles/permissions are separate (but can be added as actorContext inputs later).
- Blockers never replace Guards.