authentication authorization

2025-12-26 15:32:22 +01:00
parent 68ae9da22a
commit 64377de548
54 changed files with 2833 additions and 95 deletions
--- a/docs/architecture/AUTHORIZATION.md
+++ b/docs/architecture/AUTHORIZATION.md
@@ -0,0 +1,256 @@
+# Authorization (Roles + Permissions)
+
+This document defines the **authorization concept** for GridPilot, based on a clear role taxonomy and a permission-first model that scales to:
+- system/global admins
+- league-scoped admins/stewards
+- sponsor-scoped admins
+- team-scoped admins
+- future “super admin” tooling
+
+It complements (but does not replace) feature availability:
+- Feature availability answers: “Is this capability enabled at all?”
+- Authorization answers: “Is this actor allowed to do it?”
+
+Related:
+- Feature gating concept: docs/architecture/FEATURE_AVAILABILITY.md
+
+
+---
+
+## 1) Terms
+
+### 1.1 Actor
+The authenticated user performing a request.
+
+### 1.2 Resource Scope
+A resource boundary that defines where a role applies:
+- **system**: global platform scope
+- **league**: role applies only inside a league
+- **sponsor**: role applies only inside a sponsor account
+- **team**: role applies only inside a team
+
+### 1.3 Permission
+A normalized action on a capability, expressed as:
+- `capabilityKey`
+- `actionType` (`view` or `mutate`)
+
+Examples:
+- `league.admin.members` + `mutate`
+- `league.stewarding.protests` + `view`
+- `sponsors.portal` + `view`
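
As a minimal sketch, the permission pair from 1.3 could be modeled as a small value type. The names `PermissionRef` and `permissionId`, and the `:` separator, are assumptions made for illustration and are not defined anywhere in this document.

```typescript
// Hypothetical shape for a permission; field names follow section 1.3.
type ActionType = "view" | "mutate";

interface PermissionRef {
  capabilityKey: string;
  actionType: ActionType;
}

// Stable string form, handy as a lookup key (the ":" separator is an assumption).
function permissionId(p: PermissionRef): string {
  return `${p.capabilityKey}:${p.actionType}`;
}

// The "remove a member" permission from the examples above.
const removeMember: PermissionRef = {
  capabilityKey: "league.admin.members",
  actionType: "mutate",
};
```

Keeping the two parts separate in the type, and joining them only for lookups, would let the same capability key be reused for feature availability checks.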
+
+
+---
+
+## 2) Role Taxonomy (Canonical)
+
+These are the canonical roles, organized by scope.
+
+### 2.1 System Roles (global)
+- `owner`  
+  Highest authority. Intended for a tiny set of internal operators.
+- `admin`  
+  Platform admin. Can manage most platform features.
+
+### 2.2 League Roles (scoped to a leagueId)
+- `league_owner`  
+  Full control over that league.
+- `league_admin`  
+  Admin control over that league.
+- `league_steward`  
+  Stewarding workflow privileges (protests, penalties, reviews), plus any explicitly granted admin powers.
+
+### 2.3 Sponsor Roles (scoped to a sponsorId)
+- `sponsor_owner`  
+  Full control over that sponsor account.
+- `sponsor_admin`  
+  Admin control for sponsor account operations.
+
+### 2.4 Team Roles (scoped to a teamId)
+- `team_owner`  
+  Full control over that team.
+- `team_admin`  
+  Admin control for team operations.
+
+### 2.5 Default Role
+- `user`  
+  Every authenticated account has this implicitly.
+
+Notes:
+- “Role” is an access label; it is not a separate identity type. Admins, drivers, team captains are still “users”.
+
+
+---
+
+## 3) Role Composition Rules
+
+Authorization is evaluated with **role composition**:
+
+1) **System roles** apply everywhere.
+2) **Scoped roles** apply only when the request targets that scope.
+
+Examples:
+- A user can be `league_admin` in League A and just `user` in League B.
+- A system `admin` is allowed even without scoped roles (unless an endpoint explicitly requires scoped membership).
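
The composition rule can be sketched as a function that collects the roles in effect for one request target. This is an illustration under assumptions: `ActorRoles`, `ScopedRoleGrant`, and `effectiveRoles` are hypothetical names, and roles are plain strings here.

```typescript
type ScopeKind = "league" | "sponsor" | "team";

interface ScopedRoleGrant {
  scope: ScopeKind;
  resourceId: string;
  role: string; // e.g. "league_admin"
}

interface ActorRoles {
  systemRoles: string[]; // e.g. ["admin"]
  scopedGrants: ScopedRoleGrant[];
}

// Roles in effect for one request target: the implicit "user" role, all system
// roles, and only those scoped roles granted for the targeted resource.
function effectiveRoles(
  actor: ActorRoles,
  target?: { scope: ScopeKind; resourceId: string },
): string[] {
  const scoped = target
    ? actor.scopedGrants
        .filter((g) => g.scope === target.scope && g.resourceId === target.resourceId)
        .map((g) => g.role)
    : [];
  return ["user", ...actor.systemRoles, ...scoped];
}
```

With this shape, an actor who is `league_admin` in League A resolves to just `user` when the request targets League B.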
+
+
+---
+
+## 4) Permission-First Model (Recommended)
+
+Instead of scattering checks like “is admin?” across controllers/services, define:
+- a small, stable set of permissions (capabilityKey + actionType)
+- a role → permission mapping table
+- membership resolvers that answer: “what scoped roles does this actor have for this resourceId?”
+
+### 4.1 Why permission-first
+- Centralizes security logic
+- Makes audit/review simpler
+- Avoids “new endpoint forgot a check”
+- Enables future super-admin tooling by manipulating roles/permissions cleanly
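
A permission-first check might look like the sketch below: one mapping table plus one function that every boundary calls. The grants in `ROLE_PERMISSIONS` are illustrative only and are not a confirmed GridPilot mapping; `isPermitted` is a hypothetical name.

```typescript
type ActionKind = "view" | "mutate";

// Role -> permission table. Entries are "capabilityKey:actionType" strings.
// The concrete grants are assumptions made for this example.
const ROLE_PERMISSIONS: Record<string, string[]> = {
  league_admin: ["league.admin.members:view", "league.admin.members:mutate"],
  league_steward: ["league.stewarding.protests:view", "league.stewarding.protests:mutate"],
  user: [],
};

// One central check that replaces ad-hoc "is admin?" tests in controllers/services.
function isPermitted(roles: string[], capabilityKey: string, actionType: ActionKind): boolean {
  const needed = `${capabilityKey}:${actionType}`;
  return roles.some((role) => (ROLE_PERMISSIONS[role] ?? []).indexOf(needed) !== -1);
}
```

Because unknown roles resolve to an empty grant list, a typo in a role name fails closed rather than open.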
+
+
+---
+
+## 5) Default Access Policy (Protect All Endpoints)
+
+To protect all endpoints, the platform must adopt the following rules:
+
+### 5.1 Deny-by-default
+- Every API route requires an authenticated actor **unless explicitly marked public**.
+
+### 5.2 Explicit public routes
+A route is public only when explicitly marked as such (conceptually “Public metadata”).
+
+This prevents “we forgot to add guards” from becoming a security issue.
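
Deny-by-default with explicit public routes can be sketched as a single gate. The `isPublic` marker, `RoutePolicy`, and `gateRoute` are hypothetical names; in a real framework the marker would typically live in route metadata.

```typescript
interface RoutePolicy {
  path: string;
  isPublic?: boolean; // hypothetical marker; absent means "protected"
}

interface SessionActor {
  userId: string;
}

// Returns null when the request may proceed, or the HTTP status to reject with.
function gateRoute(route: RoutePolicy, actor: SessionActor | null): number | null {
  if (route.isPublic === true) return null; // explicitly public
  if (actor === null) return 401; // deny-by-default: no actor, no access
  return null; // authenticated; permission checks run afterwards
}
```

A route that nobody annotated is treated as protected, so a forgotten annotation results in a 401 instead of an exposed endpoint.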
+
+### 5.3 Actor identity must not be caller-controlled
+Any endpoint that currently accepts caller-supplied actor identifiers such as:
+- `performerDriverId`
+- `adminId`
+- `stewardId`
+
+must stop trusting those fields and derive the actor identity from the authenticated session.
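
One way to make this hard to get wrong is to leave the actor field out of the request type entirely and add it when building the use-case command. The sketch below assumes hypothetical `AuthSession`, `ReviewProtestBody`, and `toReviewCommand` names.

```typescript
interface AuthSession {
  userId: string;
}

// Request body for a protest review. There is deliberately no stewardId field.
interface ReviewProtestBody {
  protestId: string;
  decision: "uphold" | "dismiss";
}

interface ReviewProtestCommand extends ReviewProtestBody {
  stewardId: string;
}

// Builds the use-case command; the acting steward always comes from the session.
function toReviewCommand(session: AuthSession, body: ReviewProtestBody): ReviewProtestCommand {
  return {
    protestId: body.protestId,
    decision: body.decision,
    stewardId: session.userId, // never read from the request body
  };
}
```

Copying fields one by one, rather than spreading the body, means a smuggled `stewardId` in the payload is ignored.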
+
+
+---
+
+## 6) 403 vs 404 (Non-Disclosure Rules)
+
+Use different status codes for different security goals:
+
+### 6.1 Forbidden (403)
+Return **403** when:
+- the resource exists
+- the actor is authenticated
+- the actor lacks permission
+
+This is the normal authorization failure.
+
+### 6.2 Not Found (404) for non-disclosure
+Return **404** when:
+- revealing the existence of the resource would leak sensitive information
+- the route is explicitly designated “non-disclosing”
+
+Use this sparingly and intentionally.
+
+### 6.3 Feature availability interaction
+Feature availability failures (disabled/hidden/coming soon) should behave as “not found” for public callers, while maintenance mode should return 503. See docs/architecture/FEATURE_AVAILABILITY.md.
+
+
+---
+
+## 7) Suggested Role → Permission Mapping (First Pass)
+
+This mapping is a starting point; refine it as product scope grows.
+
+### 7.1 System
+- `owner`: all permissions
+- `admin`: platform-admin permissions (payments admin, sponsor portal admin, moderation)
+
+### 7.2 League
+- `league_owner`: all league permissions for that league
+- `league_admin`: league management permissions (members, config, seasons, schedule, wallet)
+- `league_steward`: stewarding permissions (review protests, apply penalties), and optionally limited admin view permissions
+
+### 7.3 Sponsor
+- `sponsor_owner`: all sponsor permissions for that sponsor
+- `sponsor_admin`: sponsor operational permissions (view dashboard, manage sponsorship requests, manage sponsor settings)
+
+### 7.4 Team
+- `team_owner`: all team permissions for that team
+- `team_admin`: team management permissions (update team, manage roster, handle join requests)
+
+
+---
+
+## 8) Membership Resolvers (Clean Architecture Boundary)
+
+Authorization needs a clean boundary for “does actor have a scoped role for this resource?”
+
+Conceptually:
+- League membership repository answers: actor’s role in leagueId
+- Team membership repository answers: actor’s role in teamId
+- Sponsor membership repository answers: actor’s role in sponsorId
+
+This keeps persistence details out of controllers and allows in-memory adapters for tests.
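
A resolver boundary with an in-memory adapter might look like this sketch. The interface and class names are hypothetical, and the synchronous signature is a simplification; a persistence-backed adapter would more likely return a `Promise`.

```typescript
// Port: answers "what role does this actor hold in this league?" (null if none).
interface LeagueMembershipResolver {
  roleFor(actorId: string, leagueId: string): string | null;
}

// In-memory adapter for tests; a real adapter would query persistence.
class InMemoryLeagueMemberships implements LeagueMembershipResolver {
  private readonly roles: Record<string, string> = {};

  grant(actorId: string, leagueId: string, role: string): void {
    this.roles[`${leagueId}/${actorId}`] = role;
  }

  roleFor(actorId: string, leagueId: string): string | null {
    return this.roles[`${leagueId}/${actorId}`] ?? null;
  }
}
```

Team and sponsor resolvers would follow the same shape, keyed by `teamId` and `sponsorId` respectively.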
+
+
+---
+
+## 9) Example Endpoint Policies (Conceptual)
+
+### 9.1 Public read
+- Public league standings page:
+  - Feature availability: `league.public` view (if you want to gate)
+  - Authorization: public route (no login)
+
+### 9.2 League admin mutation
+- Remove a member from league:
+  - Requires login
+  - Requires league scope
+  - Requires `league.admin.members` mutate
+  - Returns 403 if not allowed; 404 only if non-disclosure is intended
+
+### 9.3 Stewarding review
+- Review protest:
+  - Requires login
+  - Requires league scope derived from the protest’s race/league
+  - Requires `league.stewarding.protests` mutate
+  - Actor must be derived from session, not from request body
+
+### 9.4 Payments
+- Payments endpoints:
+  - Requires login
+  - Likely requires system `admin` or `owner`
+
+
+---
+
+## 10) Data Flow (Conceptual)
+
+```mermaid
+flowchart LR
+  Req[HTTP Request] --> AuthN[Authenticate actor]
+  AuthN --> Scope[Resolve resource scope]
+  Scope --> Roles[Load actor roles for scope]
+  Roles --> Perms[Evaluate required permissions]
+  Perms --> Allow{Allow}
+  Allow -->|Yes| Handler[Route handler]
+  Allow -->|No| Deny[Deny 401 or 403 or 404]
+```
+
+Rules:
+- AuthN attaches actor identity to the request.
+- Scope resolution loads resource context (leagueId, teamId, sponsorId) from route params or from looked-up entities.
+- Required permissions must be declared at the boundary (controller/route metadata).
+- Deny-by-default means anything not marked public requires an actor.
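
The flow in the diagram can be sketched end to end as one function. Everything here is an assumption for illustration: `AuthzRequest`, `authorize`, and the `grantedFor` callback, which stands in for "load the actor's roles for the resolved scope and expand them to permissions".

```typescript
interface AuthzRequest {
  actorId: string | null; // set by AuthN, null when anonymous
  isPublicRoute: boolean;
  required: string | null; // "capabilityKey:actionType" declared at the boundary
  nonDisclosing: boolean; // route explicitly designated non-disclosing
}

type AuthzOutcome = { allow: true } | { allow: false; status: 401 | 403 | 404 };

function authorize(
  req: AuthzRequest,
  grantedFor: (actorId: string) => string[],
): AuthzOutcome {
  if (req.isPublicRoute) return { allow: true };
  if (req.actorId === null) return { allow: false, status: 401 };
  if (req.required === null) return { allow: true }; // login-only route
  if (grantedFor(req.actorId).indexOf(req.required) !== -1) return { allow: true };
  // Normal failure is 403; non-disclosing routes answer 404 instead.
  return { allow: false, status: req.nonDisclosing ? 404 : 403 };
}
```

Returning 401 before the non-disclosure check is a choice made for this sketch; the document does not specify the behavior for anonymous callers on non-disclosing routes.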
+
+
+---
+
+## 11) What This Enables Later
+
+- A super-admin UI can manage:
+  - global roles (owner/admin)
+  - scoped roles (league_owner/admin/steward, sponsor_owner/admin, team_owner/admin)
+- Feature availability remains a separate control plane (maintenance mode, coming soon, kill switches), documented in docs/architecture/FEATURE_AVAILABILITY.md.
--- a/docs/architecture/FEATURE_AVAILABILITY.md
+++ b/docs/architecture/FEATURE_AVAILABILITY.md
@@ -0,0 +1,315 @@
+# Feature Availability (Modes + Feature Flags)
+
+This document defines a clean, consistent system for enabling/disabling functionality across:
+- API endpoints
+- Website links/navigation
+- Website components
+
+It is designed to support:
+- test mode
+- maintenance mode
+- disabling features due to risk/issues
+- coming soon features
+- future super admin flag management
+
+It is aligned with the hard separation of responsibilities in `Blockers & Guards`:
+- Frontend uses Blockers (UX best-effort)
+- Backend uses Guards (authoritative enforcement)
+
+See: docs/architecture/BLOCKER_GUARDS.md
+
+---
+
+## 1) Core Principle
+
+Availability is decided once, then applied in multiple places.
+
+- Backend Guards enforce availability for correctness and security.
+- Frontend Blockers reflect availability for UX, but must never be relied on for enforcement.
+
+If it must be enforced, it is a Guard.
+If it only improves UX, it is a Blocker.
+
+---
+
+## 2) Definitions (Canonical Vocabulary)
+
+### 2.1 Operational Mode (system-level)
+A small, global state representing operational posture.
+
+Recommended enum:
+- normal
+- maintenance
+- test
+
+Operational Mode is:
+- authoritative in backend
+- typically environment-scoped
+- required for rapid response (maintenance must be runtime-changeable)
+
+### 2.2 Feature State (capability-level)
+A per-feature state machine (not a boolean).
+
+Recommended enum:
+- enabled
+- disabled
+- coming_soon
+- hidden
+
+Semantics:
+- enabled: feature is available and advertised
+- disabled: feature exists but must not be used (safety kill switch)
+- coming_soon: may be visible in UI as teaser, but actions are blocked
+- hidden: not visible/advertised; actions are blocked (safest default)
+
+### 2.3 Capability
+A named unit of functionality (stable key) used consistently across API + website.
+
+Examples:
+- races.create
+- payments.checkout
+- sponsor.portal
+- stewarding.protests
+
+A capability key is a contract.
+
+### 2.4 Action Type
+Availability decisions vary by the type of action:
+- view: read-only operations (pages, GET endpoints)
+- mutate: state-changing operations (POST/PUT/PATCH/DELETE)
+
+---
+
+## 3) Policy Model (What Exists)
+
+### 3.1 FeatureAvailabilityPolicy (single evaluation model)
+One evaluation function produces a decision.
+
+Inputs:
+- environment (dev/test/prod)
+- operationalMode (normal/maintenance/test)
+- capabilityKey (string)
+- actionType (view/mutate)
+- actorContext (anonymous/authenticated; roles later)
+
+Outputs:
+- allow: boolean
+- publicReason: one of maintenance | disabled | coming_soon | hidden | not_configured
+- uxHint: optional { messageKey, redirectPath, showTeaser }
+
+The same decision model is reused by:
+- API Guard enforcement
+- Website navigation visibility
+- Website component rendering/disablement
+
+### 3.2 Precedence (where values come from)
+To avoid “mystery behavior”, use strict precedence:
+
+1. runtime overrides (highest priority)
+2. build-time environment configuration
+3. code defaults (lowest priority, should be safe: hidden/disabled)
+
+Rationale:
+- runtime overrides enable emergency response without rebuild
+- env config enables environment-specific defaults
+- code defaults keep behavior deterministic if config is missing
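
The precedence chain can be sketched as a lookup across three layers. `StateMap` and `resolveState` are hypothetical names; the final fallback to `hidden` follows the fail-closed default described for missing configuration.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type StateMap = { [capabilityKey: string]: FeatureState | undefined };

// Resolves one capability: runtime override first, then build-time environment
// configuration, then code defaults, and finally "hidden" if nothing is set.
function resolveState(
  key: string,
  runtime: StateMap,
  envConfig: StateMap,
  codeDefaults: StateMap,
): FeatureState {
  return runtime[key] ?? envConfig[key] ?? codeDefaults[key] ?? "hidden";
}
```

For example, a runtime override of `disabled` for `payments.checkout` wins over an environment value of `enabled`, which is what makes it usable as a kill switch.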
+
+---
+
+## 4) Evaluation Rules (Deterministic, Explicit)
+
+### 4.1 Maintenance mode rules
+Maintenance must be able to block the platform fast and consistently.
+
+Default behavior:
+- mutate actions: denied unless explicitly allowlisted
+- view actions: allowed only for a small allowlist (status page, login, health, static public routes)
+
+This creates a safe “fail closed” posture.
+
+Optional refinement:
+- define a maintenance allowlist for critical reads (e.g., dashboards for operators)
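
A minimal sketch of the maintenance rule: only allowlisted capabilities pass, per action type. `MaintenanceAllowlists` and `allowedInMaintenance` are hypothetical names, and the capability keys in the usage note are invented for the example.

```typescript
type ActionType = "view" | "mutate";

interface MaintenanceAllowlists {
  view: string[]; // capability keys readable during maintenance
  mutate: string[]; // capability keys writable during maintenance (usually empty)
}

// Fail closed: during maintenance only allowlisted capabilities pass.
function allowedInMaintenance(
  capabilityKey: string,
  actionType: ActionType,
  lists: MaintenanceAllowlists,
): boolean {
  return lists[actionType].indexOf(capabilityKey) !== -1;
}
```

With `{ view: ["system.status", "auth.login"], mutate: [] }`, the status page stays readable while every mutation is rejected.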
+
+### 4.2 Test mode rules
+Test mode belongs primarily in non-prod environments; enabling it in prod must be an explicit, deliberate action.
+
+Recommended behavior:
+- In prod, test mode should not be enabled accidentally.
+- In test environments, test mode may:
+  - enable test-only endpoints
+  - bypass external integrations (through adapters)
+  - relax rate limits
+  - expose test banners in UI (Blocker-level display)
+
+### 4.3 Feature state rules (per capability)
+Given a capability state:
+
+- enabled:
+  - allow view + mutate (subject to auth/roles)
+  - visible in UI
+- coming_soon:
+  - allow view of teaser pages/components
+  - deny mutate and deny sensitive reads
+  - visible in UI with Coming Soon affordances
+- disabled:
+  - deny view + mutate
+  - hidden in nav by default
+- hidden:
+  - deny view + mutate
+  - never visible in UI
+
+Note:
+- “disabled” and “hidden” are both blocked; the difference is UI and information disclosure.
+
+### 4.4 Missing configuration
+If a capability is not configured:
+- treat as hidden (fail closed)
+- optionally log a warning (server-side)
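
The rules in 4.3 and 4.4 can be sketched as one decision function. The `teaserView` parameter is an assumption used to distinguish teaser pages from sensitive reads, and `visibleInUi` is a hypothetical field, not part of the outputs listed in 3.1.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";
type PublicReason = "disabled" | "coming_soon" | "hidden" | "not_configured";

interface StateDecision {
  allow: boolean;
  publicReason?: PublicReason; // set only when allow is false
  visibleInUi: boolean;
}

// `teaserView` marks views of teaser pages/components, the only thing coming_soon allows.
function decide(
  state: FeatureState | undefined,
  actionType: ActionType,
  teaserView = false,
): StateDecision {
  switch (state) {
    case "enabled":
      return { allow: true, visibleInUi: true }; // auth/roles are checked separately
    case "coming_soon":
      return actionType === "view" && teaserView
        ? { allow: true, visibleInUi: true }
        : { allow: false, publicReason: "coming_soon", visibleInUi: true };
    case "disabled":
      return { allow: false, publicReason: "disabled", visibleInUi: false };
    case "hidden":
      return { allow: false, publicReason: "hidden", visibleInUi: false };
    default:
      // Missing configuration: fail closed, same behavior as hidden.
      return { allow: false, publicReason: "not_configured", visibleInUi: false };
  }
}
```

Maintenance mode is intentionally left out here; it would be evaluated before the per-capability state.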
+
+---
+
+## 5) Enforcement Mapping (Where Each Requirement Lives)
+
+This section is the “wiring contract” across layers.
+
+### 5.1 API endpoints (authoritative)
+- Enforce via Backend Guards (NestJS CanActivate).
+- Endpoints must declare the capability they require.
+
+Mapping to HTTP:
+- maintenance: 503 Service Unavailable (preferred for global maintenance)
+- disabled/hidden: 404 Not Found (avoid advertising unavailable capabilities)
+- coming_soon: 404 Not Found publicly, or 409 Conflict internally if you want explicit semantics for trusted clients later
+
+Guideline:
+- External clients should not get detailed feature availability information unless explicitly intended.
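
The status mapping is small enough to sketch directly. `httpStatusFor` and the `trustedClient` flag are hypothetical; treating `not_configured` like `hidden` follows the fail-closed rule for missing configuration.

```typescript
type DenyReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

// `trustedClient` is a hypothetical flag for internal callers that may see explicit semantics.
function httpStatusFor(reason: DenyReason, trustedClient = false): number {
  if (reason === "maintenance") return 503; // Service Unavailable
  if (reason === "coming_soon" && trustedClient) return 409; // explicit for trusted clients
  return 404; // disabled, hidden, not_configured, and public coming_soon
}
```

Collapsing most reasons into 404 keeps external clients from learning which capabilities exist but are switched off.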
+
+### 5.2 Website links / navigation (UX)
+- Enforce via Frontend Blockers.
+- Hide links when state is disabled/hidden.
+- For coming_soon, show link but route to teaser page or disable with explanation.
+
+Rules:
+- Never assume hidden in UI equals enforced on server.
+- UI should degrade gracefully (API may still block).
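
Navigation filtering on the website side might be sketched as below. `NavLink`, `RenderedLink`, `visibleLinks`, and the `teaserHref` parameter are hypothetical names; this is a Blocker-level helper and does not replace server enforcement.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface NavLink {
  label: string;
  href: string;
  capabilityKey: string;
}

interface RenderedLink {
  label: string;
  href: string;
  teaser: boolean;
}

// Keeps enabled links, reroutes coming_soon links to a teaser page, drops the rest.
function visibleLinks(
  links: NavLink[],
  states: { [capabilityKey: string]: FeatureState | undefined },
  teaserHref: string,
): RenderedLink[] {
  const rendered: RenderedLink[] = [];
  for (const link of links) {
    const state = states[link.capabilityKey] ?? "hidden"; // unconfigured: fail closed
    if (state === "enabled") {
      rendered.push({ label: link.label, href: link.href, teaser: false });
    } else if (state === "coming_soon") {
      rendered.push({ label: link.label, href: teaserHref, teaser: true });
    }
    // disabled and hidden links are omitted
  }
  return rendered;
}
```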
+
+### 5.3 Website components (UX)
+- Use Blockers to:
+  - hide components for hidden/disabled
+  - show teaser content for coming_soon
+  - disable buttons or flows for coming_soon/disabled, with consistent messaging
+
+Recommendation:
+- Provide a single reusable component (FeatureBlocker) that consumes policy decisions and renders:
+  - children when allowed
+  - teaser when coming_soon
+  - null or fallback when disabled/hidden
+
+---
+
+## 6) Build-Time vs Runtime (Clean, Predictable)
+
+### 6.1 Build-time flags (require rebuild/redeploy)
+What they are good for:
+- preventing unfinished UI code from shipping in a bundle
+- cutting entire routes/components from builds for deterministic releases
+
+Limitations:
+- NEXT_PUBLIC_* values are compiled into the client bundle; changing them does not update clients without rebuild.
+
+Use build-time flags for:
+- experimental UI
+- “not yet shipped” components/routes
+- simplifying deployments (pre-launch vs alpha style gating)
+
+### 6.2 Runtime flags (no rebuild)
+What they are for:
+- maintenance mode
+- emergency disable for broken endpoints
+- quickly hiding risky features
+
+Runtime flags must be available to:
+- API Guards (always)
+- Website SSR/middleware optionally
+- Website client optionally (for UX only)
+
+Key tradeoff:
+- runtime access introduces caching and latency concerns
+- design runtime policy reads to be cached, fast, and resilient to store outages
+
+Recommended approach:
+- API is authoritative source of runtime policy
+- website can optionally consume a cached policy snapshot endpoint
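
A cached, resilient policy read could be sketched as follows. `CachedPolicyReader`, the synchronous loader, and the injected clock are assumptions made to keep the example self-contained; a real store read would likely be asynchronous.

```typescript
interface CachedSnapshot {
  policyVersion: number;
  operationalMode: "normal" | "maintenance" | "test";
}

class CachedPolicyReader {
  private cached: CachedSnapshot | null = null;
  private fetchedAt = 0;
  private readonly load: () => CachedSnapshot;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => CachedSnapshot, ttlMs: number, now: () => number = () => Date.now()) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): CachedSnapshot {
    const t = this.now();
    const current = this.cached;
    if (current !== null && t - this.fetchedAt < this.ttlMs) return current; // still fresh
    try {
      const loaded = this.load();
      this.cached = loaded;
      this.fetchedAt = t;
      return loaded;
    } catch (err) {
      if (current !== null) return current; // store unavailable: serve the stale snapshot
      throw err; // nothing cached yet; the caller must fail closed
    }
  }
}
```

The TTL bounds how long an emergency toggle can take to propagate, so it would need to be short for maintenance mode to be useful.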
+
+---
+
+## 7) Storage and Distribution (Now + Future Super Admin)
+
+### 7.1 Now (no super admin UI)
+Use a single “policy snapshot” stored in one place and read by the API, with caching.
+
+Options (in priority order):
+1. Remote KV/DB-backed policy snapshot (preferred for true runtime changes)
+2. Environment variable JSON (simpler, but changes require restart/redeploy)
+3. Static config file in repo (requires rebuild/redeploy)
+
+### 7.2 Future (super admin UI)
+Super admin becomes a writer to the same store.
+
+Non-negotiable:
+- The storage schema must be stable and versioned.
+
+Recommended schema (conceptual):
+- policyVersion
+- operationalMode
+- capabilities: map of capabilityKey -> featureState
+- allowlists: maintenance view/mutate allowlists
+- optional targeting rules later (by role/user)
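
The conceptual schema might translate to a type like the one below. The nested allowlist field names, `SUPPORTED_POLICY_VERSION`, and `isSupportedSnapshot` are assumptions for illustration; only the top-level field names come from the list above.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

interface PolicySnapshot {
  policyVersion: number;
  operationalMode: OperationalMode;
  capabilities: { [capabilityKey: string]: FeatureState };
  allowlists: {
    maintenanceView: string[];
    maintenanceMutate: string[];
  };
}

// Schema version this reader understands (hypothetical value).
const SUPPORTED_POLICY_VERSION = 1;

// Readers should reject snapshots written with a schema they do not understand.
function isSupportedSnapshot(snapshot: PolicySnapshot): boolean {
  return snapshot.policyVersion === SUPPORTED_POLICY_VERSION;
}

const exampleSnapshot: PolicySnapshot = {
  policyVersion: 1,
  operationalMode: "normal",
  capabilities: { "races.create": "enabled", "payments.checkout": "disabled" },
  allowlists: { maintenanceView: [], maintenanceMutate: [] },
};
```

Checking the version on read gives a future super admin writer room to evolve the schema without silently breaking older API instances.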
+
+---
+
+## 8) Data Flow (Conceptual)
+
+```mermaid
+flowchart LR
+  UI[Website UI] --> FB[Frontend Blockers]
+  FB --> PC[Policy Client]
+  UI --> API[API Request]
+  API --> FG[Feature Guard]
+  FG --> AS[API Application Service]
+  AS --> UC[Core Use Case]
+  PC --> PS[Policy Snapshot]
+  FG --> PS
+```
+
+Interpretation:
+- Website reads policy for UX (best-effort).
+- API enforces policy (authoritative) before any application logic.
+
+---
+
+## 9) Implementation Checklist (For Code Mode)
+
+Backend (apps/api):
+- Define capability keys and feature states as shared types in a local module.
+- Create FeaturePolicyService that resolves the current policy snapshot (cached).
+- Add FeatureFlagGuard (or FeatureAvailabilityGuard) that:
+  - reads required capability metadata for an endpoint
+  - evaluates allow/deny with actionType
+  - maps denial to the chosen HTTP status codes
+
+Frontend (apps/website):
+- Add a small PolicyClient that fetches policy snapshot from API (optional for phase 1).
+- Add FeatureBlocker component for consistent UI behavior.
+- Centralize navigation link definitions and filter them via policy.
+
+Ops/Config:
+- Define how maintenance mode is toggled (KV/DB entry or config endpoint restricted to operators later).
+- Ensure defaults are safe (fail closed).
+
+---
+
+## 10) Non-Goals (Explicit)
+- This system is not an authorization system.
+- Roles/permissions are separate (but can be added as actorContext inputs later).
+- Blockers never replace Guards.