authentication authorization

2025-12-26 15:32:22 +01:00
parent 68ae9da22a
commit 64377de548
54 changed files with 2833 additions and 95 deletions

# Authorization (Roles + Permissions)
This document defines the **authorization concept** for GridPilot, based on a clear role taxonomy and a permission-first model that scales to:
- system/global admins
- league-scoped admins/stewards
- sponsor-scoped admins
- team-scoped admins
- future “super admin” tooling
It complements (but does not replace) feature availability:
- Feature availability answers: “Is this capability enabled at all?”
- Authorization answers: “Is this actor allowed to do it?”
Related:
- Feature gating concept: docs/architecture/FEATURE_AVAILABILITY.md
---
## 1) Terms
### 1.1 Actor
The authenticated user performing a request.
### 1.2 Resource Scope
A resource boundary that defines where a role applies:
- **system**: global platform scope
- **league**: role applies only inside a league
- **sponsor**: role applies only inside a sponsor account
- **team**: role applies only inside a team
### 1.3 Permission
A normalized action on a capability, expressed as:
- `capabilityKey`
- `actionType` (`view` or `mutate`)
Examples:
- `league.admin.members` + `mutate`
- `league.stewarding.protests` + `view`
- `sponsors.portal` + `view`
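A minimal TypeScript sketch of this permission shape, assuming a shared type module exists. `Permission`, `ActionType`, and `permissionKey` (including its `key:action` string format) are illustrative names, not existing GridPilot code.

```typescript
// Hypothetical types: a permission is a capability key plus an action type.
type ActionType = "view" | "mutate";

interface Permission {
  readonly capabilityKey: string;
  readonly actionType: ActionType;
}

// Stable string form, handy as a map key or in audit logs (format is an assumption).
function permissionKey(p: Permission): string {
  return `${p.capabilityKey}:${p.actionType}`;
}

const removeMember: Permission = { capabilityKey: "league.admin.members", actionType: "mutate" };
console.log(permissionKey(removeMember)); // league.admin.members:mutate
```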
---
## 2) Role Taxonomy (Canonical)
The canonical roles, organized by scope:
### 2.1 System Roles (global)
- `owner`: Highest authority. Intended for a tiny set of internal operators.
- `admin`: Platform admin. Can manage most platform features.
### 2.2 League Roles (scoped to a leagueId)
- `league_owner`: Full control over that league.
- `league_admin`: Admin control over that league.
- `league_steward`: Stewarding workflow privileges (protests, penalties, reviews), plus any explicitly granted admin powers.
### 2.3 Sponsor Roles (scoped to a sponsorId)
- `sponsor_owner`: Full control over that sponsor account.
- `sponsor_admin`: Admin control for sponsor account operations.
### 2.4 Team Roles (scoped to a teamId)
- `team_owner`: Full control over that team.
- `team_admin`: Admin control for team operations.
### 2.5 Default Role
- `user`: Every authenticated account has this role implicitly.
Notes:
- “Role” is an access label, not a separate identity type. Admins, drivers, and team captains are all still “users”.
---
## 3) Role Composition Rules
Authorization is evaluated with **role composition**:
1) **System roles** apply everywhere.
2) **Scoped roles** apply only when the request targets that scope.
Examples:
- A user can be `league_admin` in League A and just `user` in League B.
- A system `admin`'s permissions apply inside any league, team, or sponsor without a scoped role there (unless an endpoint explicitly requires scoped membership).
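A sketch of the two composition rules, assuming an actor object that carries its system roles and scoped role assignments. All type and function names here (`Actor`, `Target`, `effectiveRoles`) are hypothetical; the role strings come from the taxonomy in section 2.

```typescript
// Hypothetical sketch: which roles apply to a request that targets a given scope.
type SystemRole = "owner" | "admin";
type ScopedRole =
  | "league_owner" | "league_admin" | "league_steward"
  | "sponsor_owner" | "sponsor_admin"
  | "team_owner" | "team_admin";
type ScopeKind = "league" | "sponsor" | "team";
type EffectiveRole = "user" | SystemRole | ScopedRole;

interface ScopedAssignment { scope: ScopeKind; resourceId: string; role: ScopedRole }
interface Actor { userId: string; systemRoles: SystemRole[]; scopedRoles: ScopedAssignment[] }
interface Target { scope: ScopeKind; resourceId: string }

function effectiveRoles(actor: Actor, target?: Target): EffectiveRole[] {
  // Every authenticated account implicitly holds "user"; system roles apply everywhere.
  const roles: EffectiveRole[] = ["user", ...actor.systemRoles];
  if (target) {
    // Scoped roles apply only when the request targets that exact scope and resource.
    for (const a of actor.scopedRoles) {
      if (a.scope === target.scope && a.resourceId === target.resourceId) roles.push(a.role);
    }
  }
  return roles;
}

const alice: Actor = {
  userId: "alice",
  systemRoles: [],
  scopedRoles: [{ scope: "league", resourceId: "league-a", role: "league_admin" }],
};
effectiveRoles(alice, { scope: "league", resourceId: "league-a" }); // ["user", "league_admin"]
effectiveRoles(alice, { scope: "league", resourceId: "league-b" }); // ["user"]
```

Making the target explicit means a request without a resource scope only ever sees `user` plus system roles.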
---
## 4) Permission-First Model (Recommended)
Instead of scattering checks like “is admin?” across controllers/services, define:
- a small, stable set of permissions (capabilityKey + actionType)
- a role → permission mapping table
- membership resolvers that answer: “what scoped roles does this actor have for this resourceId?”
### 4.1 Why permission-first
- Centralizes security logic
- Makes audit/review simpler
- Avoids “new endpoint forgot a check”
- Enables future super-admin tooling by manipulating roles/permissions cleanly
---
## 5) Default Access Policy (Protect All Endpoints)
To properly “protect all endpoints”, the platform must move to:
### 5.1 Deny-by-default
- Every API route requires an authenticated actor **unless explicitly marked public**.
### 5.2 Explicit public routes
A route is public only when explicitly marked as such (conceptually “Public metadata”).
This prevents “we forgot to add guards” from becoming a security issue.
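A framework-neutral sketch of deny-by-default with explicit public routes. `RouteMeta`, `isPublic`, and `checkAuthentication` are hypothetical names. In NestJS the same idea is commonly expressed as a global guard that reads the metadata of a custom `@Public()` decorator (built with `SetMetadata`) through `Reflector`.

```typescript
// Hypothetical route metadata: a route is public only if it says so explicitly.
interface RouteMeta {
  path: string;
  isPublic?: true; // absent means "requires an authenticated actor"
}

interface Session {
  userId: string;
}

type AuthnResult = { ok: true } | { ok: false; status: 401 };

// Deny by default: a route that forgot its annotation ends up protected, not open.
function checkAuthentication(route: RouteMeta, session: Session | null): AuthnResult {
  if (route.isPublic === true) return { ok: true };
  return session !== null ? { ok: true } : { ok: false, status: 401 };
}
```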
### 5.3 Actor identity must not be caller-controlled
Any endpoint that currently accepts identifiers like:
- `performerDriverId`
- `adminId`
- `stewardId`
must stop trusting those fields and derive the actor identity from the authenticated session.
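A sketch of deriving the actor from the session instead of the request body. The request and command shapes are hypothetical. If driver IDs differ from user IDs, the session user would have to be mapped to a driver through a lookup, which is not shown here.

```typescript
// Hypothetical shapes for a "remove member" endpoint.
interface AuthenticatedSession {
  userId: string; // set by authentication, never by the client
}

interface RemoveMemberBody {
  targetDriverId: string;
  performerDriverId?: string; // legacy, caller-controlled: deliberately ignored
}

interface RemoveMemberCommand {
  leagueId: string;
  targetDriverId: string;
  performedBy: string;
}

// The actor always comes from the session; nothing actor-related is copied from the body.
function toRemoveMemberCommand(
  session: AuthenticatedSession,
  leagueId: string,
  body: RemoveMemberBody,
): RemoveMemberCommand {
  return { leagueId, targetDriverId: body.targetDriverId, performedBy: session.userId };
}

const cmd = toRemoveMemberCommand(
  { userId: "user-42" },
  "league-a",
  { targetDriverId: "driver-7", performerDriverId: "someone-else" },
);
// cmd.performedBy is "user-42": the spoofed performerDriverId has no effect
```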
---
## 6) 403 vs 404 (Non-Disclosure Rules)
Use different status codes for different security goals:
### 6.1 Forbidden (403)
Return **403** when:
- the resource exists
- the actor is authenticated
- the actor lacks permission
This is the normal authorization failure.
### 6.2 Not Found (404) for non-disclosure
Return **404** when:
- revealing the existence of the resource would leak sensitive information
- the route is explicitly designated “non-disclosing”
Use this sparingly and intentionally.
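One way to encode these rules as a single decision function, sketched under the assumption that authentication, the existence lookup, and the permission check have already run. `AuthzOutcome` and `authorizationStatus` are hypothetical names; returning 401 for a missing actor follows the deny-by-default rule in section 5.

```typescript
// Hypothetical inputs gathered after authentication, lookup, and permission evaluation.
interface AuthzOutcome {
  authenticated: boolean;
  resourceExists: boolean;
  allowed: boolean;
  nonDisclosing: boolean; // declared per route, never inferred
}

// Returns the HTTP status to send, or null when the request may proceed.
function authorizationStatus(o: AuthzOutcome): 401 | 403 | 404 | null {
  if (!o.authenticated) return 401;
  if (!o.resourceExists) return 404;
  if (o.allowed) return null;
  // Non-disclosing routes hide the resource's existence from actors who may not access it.
  return o.nonDisclosing ? 404 : 403;
}
```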
### 6.3 Feature availability interaction
Feature availability failures (disabled/hidden/coming soon) should behave as “not found” for public callers, while maintenance mode should return 503. See docs/architecture/FEATURE_AVAILABILITY.md.
---
## 7) Suggested Role → Permission Mapping (First Pass)
This table is a starting point (refine as product scope increases).
### 7.1 System
- `owner`: all permissions
- `admin`: platform-admin permissions (payments admin, sponsor portal admin, moderation)
### 7.2 League
- `league_owner`: all league permissions for that league
- `league_admin`: league management permissions (members, config, seasons, schedule, wallet)
- `league_steward`: stewarding permissions (review protests, apply penalties), and optionally limited admin view permissions
### 7.3 Sponsor
- `sponsor_owner`: all sponsor permissions for that sponsor
- `sponsor_admin`: sponsor operational permissions (view dashboard, manage sponsorship requests, manage sponsor settings)
### 7.4 Team
- `team_owner`: all team permissions for that team
- `team_admin`: team management permissions (update team, manage roster, handle join requests)
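A sketch of this first-pass table as data plus one lookup function. The wildcard syntax (`*`, `prefix.*`) and every capability key except `league.admin.members` and `league.stewarding.protests` are assumptions for illustration. Scoping (for example, `league_owner` applying only to its own league) is expected to be handled earlier, when the actor's effective roles are resolved for the target resource.

```typescript
type ActionType = "view" | "mutate";
type Role =
  | "user"
  | "owner" | "admin"
  | "league_owner" | "league_admin" | "league_steward"
  | "sponsor_owner" | "sponsor_admin"
  | "team_owner" | "team_admin";

// A grant covers an exact key, a "prefix.*" family, or "*" (everything).
interface Grant {
  capability: string;
  actions: readonly ActionType[];
}

const VIEW_MUTATE: readonly ActionType[] = ["view", "mutate"];

// Hypothetical keys except league.admin.members and league.stewarding.protests.
const ROLE_GRANTS: Record<Role, readonly Grant[]> = {
  user: [],
  owner: [{ capability: "*", actions: VIEW_MUTATE }],
  admin: [
    { capability: "payments.*", actions: VIEW_MUTATE },
    { capability: "sponsors.portal.*", actions: VIEW_MUTATE },
    { capability: "moderation.*", actions: VIEW_MUTATE },
  ],
  league_owner: [{ capability: "league.*", actions: VIEW_MUTATE }],
  league_admin: [{ capability: "league.admin.*", actions: VIEW_MUTATE }],
  league_steward: [
    { capability: "league.stewarding.*", actions: VIEW_MUTATE },
    { capability: "league.admin.members", actions: ["view"] }, // optional limited admin view
  ],
  sponsor_owner: [{ capability: "sponsor.*", actions: VIEW_MUTATE }],
  sponsor_admin: [{ capability: "sponsor.admin.*", actions: VIEW_MUTATE }],
  team_owner: [{ capability: "team.*", actions: VIEW_MUTATE }],
  team_admin: [{ capability: "team.admin.*", actions: VIEW_MUTATE }],
};

function capabilityMatches(pattern: string, key: string): boolean {
  if (pattern === "*") return true;
  if (pattern.endsWith(".*")) {
    const prefix = pattern.slice(0, -2);
    // Match the family root itself or anything below it, respecting the dot boundary.
    return key === prefix || key.startsWith(prefix + ".");
  }
  return key === pattern;
}

// True if any of the actor's effective roles grants the action on the capability.
function rolesGrant(roles: readonly Role[], capabilityKey: string, action: ActionType): boolean {
  return roles.some((role) =>
    ROLE_GRANTS[role].some(
      (g) => g.actions.includes(action) && capabilityMatches(g.capability, capabilityKey),
    ),
  );
}
```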
---
## 8) Membership Resolvers (Clean Architecture Boundary)
Authorization needs a clean boundary for “does actor have a scoped role for this resource?”
Conceptually:
- League membership repository answers: the actor's role in leagueId
- Team membership repository answers: the actor's role in teamId
- Sponsor membership repository answers: the actor's role in sponsorId
This keeps persistence details out of controllers and allows in-memory adapters for tests.
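A sketch of the league boundary as a port plus an in-memory adapter for tests. `LeagueMembershipRepository`, `findRole`, and the adapter class are hypothetical names; team and sponsor resolvers would follow the same shape.

```typescript
type LeagueRole = "league_owner" | "league_admin" | "league_steward";

// Port used by authorization: "what role, if any, does this actor have in this league?"
interface LeagueMembershipRepository {
  findRole(actorId: string, leagueId: string): Promise<LeagueRole | null>;
}

// In-memory adapter for tests; a persistence-backed adapter would implement the same port.
class InMemoryLeagueMembershipRepository implements LeagueMembershipRepository {
  private readonly byLeague = new Map<string, Map<string, LeagueRole>>();

  assign(actorId: string, leagueId: string, role: LeagueRole): void {
    const members = this.byLeague.get(leagueId) ?? new Map<string, LeagueRole>();
    members.set(actorId, role);
    this.byLeague.set(leagueId, members);
  }

  async findRole(actorId: string, leagueId: string): Promise<LeagueRole | null> {
    return this.byLeague.get(leagueId)?.get(actorId) ?? null;
  }
}
```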
---
## 9) Example Endpoint Policies (Conceptual)
### 9.1 Public read
- Public league standings page:
  - Feature availability: `league.public` view (if you want to gate)
  - Authorization: public route (no login)
### 9.2 League admin mutation
- Remove a member from league:
  - Requires login
  - Requires league scope
  - Requires `league.admin.members` mutate
  - Returns 403 if not allowed; 404 only if non-disclosure is intended
### 9.3 Stewarding review
- Review protest:
  - Requires login
  - Requires league scope derived from the protest's race/league
  - Requires `league.stewarding.protests` mutate
  - Actor must be derived from session, not from request body
### 9.4 Payments
- Payments endpoints:
  - Requires login
  - Likely requires system `admin` or `owner`
---
## 10) Data Flow (Conceptual)
```mermaid
flowchart LR
Req[HTTP Request] --> AuthN[Authenticate actor]
AuthN --> Scope[Resolve resource scope]
Scope --> Roles[Load actor roles for scope]
Roles --> Perms[Evaluate required permissions]
Perms --> Allow{Allow}
Allow -->|Yes| Handler[Route handler]
Allow -->|No| Deny[Deny 401 or 403 or 404]
```
Rules:
- AuthN attaches actor identity to the request.
- Scope resolution loads resource context (leagueId, teamId, sponsorId) from route params or from looked-up entities.
- Required permissions must be declared at the boundary (controller/route metadata).
- Deny-by-default means anything not marked public requires an actor.
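The flow above can be sketched as one function that runs the steps in order. All names are hypothetical. Role loading and permission evaluation are injected so they can be backed by the membership resolvers and the role → permission mapping, and `nonDisclosing` is the per-route opt-in from section 6.2.

```typescript
type ActionType = "view" | "mutate";

interface Permission {
  capabilityKey: string;
  actionType: ActionType;
}

interface Scope {
  kind: "league" | "sponsor" | "team";
  resourceId: string;
}

// Declared at the boundary (controller/route metadata).
interface RoutePolicy {
  isPublic?: true;
  permission?: Permission;
  nonDisclosing?: true;
  // Returns null when the targeted resource does not exist.
  resolveScope?: (params: Record<string, string>) => Scope | null;
}

interface IncomingRequest {
  actorId: string | null; // attached by authentication, never read from the body
  params: Record<string, string>;
}

// Hypothetical injected ports.
interface AuthzDeps {
  loadRoles(actorId: string, scope: Scope | null): string[];
  grants(roles: string[], permission: Permission): boolean;
}

type Decision = { allow: true } | { allow: false; status: 401 | 403 | 404 };

function authorize(policy: RoutePolicy, req: IncomingRequest, deps: AuthzDeps): Decision {
  if (policy.isPublic) return { allow: true };
  // Deny by default: every non-public route needs an actor.
  if (req.actorId === null) return { allow: false, status: 401 };
  let scope: Scope | null = null;
  if (policy.resolveScope) {
    scope = policy.resolveScope(req.params);
    if (scope === null) return { allow: false, status: 404 };
  }
  // Authenticated-only route with no declared permission.
  if (!policy.permission) return { allow: true };
  const roles = deps.loadRoles(req.actorId, scope);
  if (deps.grants(roles, policy.permission)) return { allow: true };
  return { allow: false, status: policy.nonDisclosing ? 404 : 403 };
}
```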
---
## 11) What This Enables Later
- A super-admin UI can manage:
  - global roles (owner/admin)
  - scoped roles (league_owner/admin/steward, sponsor_owner/admin, team_owner/admin)
- Feature availability remains a separate control plane (maintenance mode, coming soon, kill switches), documented in docs/architecture/FEATURE_AVAILABILITY.md.

# Feature Availability (Modes + Feature Flags)
This document defines a clean, consistent system for enabling/disabling functionality across:
- API endpoints
- Website links/navigation
- Website components
It is designed to support:
- test mode
- maintenance mode
- disabling features due to risk/issues
- coming soon features
- future super admin flag management
It is aligned with the hard separation of responsibilities in `Blockers & Guards`:
- Frontend uses Blockers (UX best-effort)
- Backend uses Guards (authoritative enforcement)
See: docs/architecture/BLOCKER_GUARDS.md
---
## 1) Core Principle
Availability is decided once, then applied in multiple places.
- Backend Guards enforce availability for correctness and security.
- Frontend Blockers reflect availability for UX, but must never be relied on for enforcement.
If it must be enforced, it is a Guard.
If it only improves UX, it is a Blocker.
---
## 2) Definitions (Canonical Vocabulary)
### 2.1 Operational Mode (system-level)
A small, global state representing operational posture.
Recommended enum:
- normal
- maintenance
- test
Operational Mode is:
- authoritative in backend
- typically environment-scoped
- required for rapid response (maintenance must be runtime-changeable)
### 2.2 Feature State (capability-level)
A per-feature state machine (not a boolean).
Recommended enum:
- enabled
- disabled
- coming_soon
- hidden
Semantics:
- enabled: feature is available and advertised
- disabled: feature exists but must not be used (safety kill switch)
- coming_soon: may be visible in UI as teaser, but actions are blocked
- hidden: not visible/advertised; actions are blocked (safest default)
### 2.3 Capability
A named unit of functionality (stable key) used consistently across API + website.
Examples:
- races.create
- payments.checkout
- sponsor.portal
- stewarding.protests
A capability key is a contract.
### 2.4 Action Type
Availability decisions vary by the type of action:
- view: read-only operations (pages, GET endpoints)
- mutate: state-changing operations (POST/PUT/PATCH/DELETE)
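A sketch of this vocabulary as TypeScript types, plus a helper that derives the action type from the HTTP method. The names are hypothetical, and treating `HEAD` and `OPTIONS` as views is an assumption. Unknown methods fall through to `mutate` so they are never under-restricted.

```typescript
// Hypothetical shared vocabulary for the availability model.
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";
type CapabilityKey = string; // e.g. "races.create"; treated as a stable contract

// Derives the action type from the HTTP method, following the view/mutate split above.
function actionTypeForMethod(method: string): ActionType {
  switch (method.toUpperCase()) {
    case "GET":
    case "HEAD":
    case "OPTIONS":
      return "view";
    default:
      return "mutate"; // POST/PUT/PATCH/DELETE and anything unknown: fail closed
  }
}
```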
---
## 3) Policy Model (What Exists)
### 3.1 FeatureAvailabilityPolicy (single evaluation model)
One evaluation function produces a decision.
Inputs:
- environment (dev/test/prod)
- operationalMode (normal/maintenance/test)
- capabilityKey (string)
- actionType (view/mutate)
- actorContext (anonymous/authenticated; roles later)
Outputs:
- allow: boolean
- publicReason: one of maintenance | disabled | coming_soon | hidden | not_configured
- uxHint: optional { messageKey, redirectPath, showTeaser }
The same decision model is reused by:
- API Guard enforcement
- Website navigation visibility
- Website component rendering/disablement
### 3.2 Precedence (where values come from)
To avoid “mystery behavior”, use strict precedence:
1. runtime overrides (highest priority)
2. build-time environment configuration
3. code defaults (lowest priority, should be safe: hidden/disabled)
Rationale:
- runtime overrides enable emergency response without rebuild
- env config enables environment-specific defaults
- code defaults keep behavior deterministic if config is missing
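A minimal sketch of the precedence chain, assuming each source has already been loaded into a `Map`. `resolveFeatureState` is a hypothetical name; the final `hidden` fallback mirrors the missing-configuration rule in section 4.4.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type StateSource = ReadonlyMap<string, FeatureState>;

// Strict precedence: runtime override > build-time env config > code default > "hidden".
function resolveFeatureState(
  key: string,
  runtimeOverrides: StateSource,
  envConfig: StateSource,
  codeDefaults: StateSource,
): FeatureState {
  return runtimeOverrides.get(key) ?? envConfig.get(key) ?? codeDefaults.get(key) ?? "hidden";
}
```

Using `Map` rather than plain objects keeps inherited keys such as `constructor` from being mistaken for configuration.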
---
## 4) Evaluation Rules (Deterministic, Explicit)
### 4.1 Maintenance mode rules
Maintenance must be able to block the platform fast and consistently.
Default behavior:
- mutate actions: denied unless explicitly allowlisted
- view actions: allowed only for a small allowlist (status page, login, health, static public routes)
This creates a safe “fail closed” posture.
Optional refinement:
- define a maintenance allowlist for critical reads (e.g., dashboards for operators)
### 4.2 Test mode rules
Test mode belongs primarily in non-prod environments and must only ever be enabled explicitly in prod.
Recommended behavior:
- In prod, test mode should not be enabled accidentally.
- In test environments, test mode may:
  - enable test-only endpoints
  - bypass external integrations (through adapters)
  - relax rate limits
  - expose test banners in UI (Blocker-level display)
### 4.3 Feature state rules (per capability)
Given a capability state:
- enabled:
  - allow view + mutate (subject to auth/roles)
  - visible in UI
- coming_soon:
  - allow view of teaser pages/components
  - deny mutate and deny sensitive reads
  - visible in UI with Coming Soon affordances
- disabled:
  - deny view + mutate
  - hidden in nav by default
- hidden:
  - deny view + mutate
  - never visible in UI
Note:
- “disabled” and “hidden” are both blocked; the difference is UI and information disclosure.
### 4.4 Missing configuration
If a capability is not configured:
- treat as hidden (fail closed)
- optionally log a warning (server-side)
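A sketch of these rules as one evaluation function, using the inputs and outputs of section 3.1 reduced to the fields the rules need. Names are hypothetical; `isTeaser` is an assumed way for a route to mark itself as coming-soon teaser content. Test-mode behavior and role checks are deliberately left out.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type ActionType = "view" | "mutate";
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

interface EvaluationInput {
  operationalMode: OperationalMode;
  capabilityKey: string;
  actionType: ActionType;
  state: FeatureState | undefined; // undefined: capability not configured
  maintenanceAllowlist: { view: ReadonlySet<string>; mutate: ReadonlySet<string> };
  isTeaser?: boolean; // hypothetical: request targets a coming-soon teaser view
}

type AvailabilityDecision = { allow: true } | { allow: false; publicReason: PublicReason };

function evaluateAvailability(i: EvaluationInput): AvailabilityDecision {
  // 4.1 Maintenance fails closed unless the capability is allowlisted for this action type.
  if (i.operationalMode === "maintenance" && !i.maintenanceAllowlist[i.actionType].has(i.capabilityKey)) {
    return { allow: false, publicReason: "maintenance" };
  }
  // 4.3 and 4.4: per-capability state. Auth/role checks happen separately.
  switch (i.state) {
    case undefined:
      // Blocked exactly like hidden; the reason records why.
      return { allow: false, publicReason: "not_configured" };
    case "enabled":
      return { allow: true };
    case "coming_soon":
      // Teaser views only; mutations and other (possibly sensitive) reads are denied.
      if (i.actionType === "view" && i.isTeaser === true) return { allow: true };
      return { allow: false, publicReason: "coming_soon" };
    case "disabled":
    case "hidden":
      return { allow: false, publicReason: i.state };
  }
}
```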
---
## 5) Enforcement Mapping (Where Each Requirement Lives)
This section is the “wiring contract” across layers.
### 5.1 API endpoints (authoritative)
- Enforce via Backend Guards (NestJS CanActivate).
- Endpoints must declare the capability they require.
Mapping to HTTP:
- maintenance: 503 Service Unavailable (preferred for global maintenance)
- disabled/hidden: 404 Not Found (avoid advertising unavailable capabilities)
- coming_soon: 404 Not Found publicly, or 409 Conflict internally if you want explicit semantics for trusted clients later
Guideline:
- External clients should not get detailed feature availability information unless explicitly intended.
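The HTTP mapping as a small function, sketched with a hypothetical `trustedClient` switch for the optional 409 variant. `statusForDenial` is an illustrative name; `not_configured` is mapped like `hidden`, since missing configuration is treated as hidden.

```typescript
type PublicReason = "maintenance" | "disabled" | "coming_soon" | "hidden" | "not_configured";

// Maps a denial reason to the status codes chosen above.
function statusForDenial(reason: PublicReason, trustedClient = false): 503 | 404 | 409 {
  switch (reason) {
    case "maintenance":
      return 503;
    case "coming_soon":
      return trustedClient ? 409 : 404;
    case "disabled":
    case "hidden":
    case "not_configured":
      return 404; // do not advertise unavailable capabilities
  }
}
```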
### 5.2 Website links / navigation (UX)
- Enforce via Frontend Blockers.
- Hide links when state is disabled/hidden.
- For coming_soon, show link but route to teaser page or disable with explanation.
Rules:
- Never assume hidden in UI equals enforced on server.
- UI should degrade gracefully (API may still block).
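A sketch of filtering a centralized link list by feature state. `NavLink`, `teaserHref`, and `visibleNavLinks` are hypothetical; the `comingSoon` flag lets the UI either route to a teaser or render the link disabled with an explanation.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

// Hypothetical centralized navigation definition.
interface NavLink {
  label: string;
  href: string;
  capability: string;
  teaserHref?: string;
}

interface RenderedLink {
  label: string;
  href: string;
  comingSoon: boolean;
}

// UX only: the API still enforces availability even if a stale link slips through.
function visibleNavLinks(links: NavLink[], stateOf: (capability: string) => FeatureState): RenderedLink[] {
  const out: RenderedLink[] = [];
  for (const link of links) {
    const state = stateOf(link.capability);
    if (state === "enabled") {
      out.push({ label: link.label, href: link.href, comingSoon: false });
    } else if (state === "coming_soon") {
      // Show the link, routed to a teaser page when one exists.
      out.push({ label: link.label, href: link.teaserHref ?? link.href, comingSoon: true });
    }
    // disabled and hidden: the link is omitted
  }
  return out;
}
```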
### 5.3 Website components (UX)
- Use Blockers to:
  - hide components for hidden/disabled
  - show teaser content for coming_soon
  - disable buttons or flows for coming_soon/disabled, with consistent messaging
Recommendation:
- Provide a single reusable component (FeatureBlocker) that consumes policy decisions and renders:
  - children when allowed
  - teaser when coming_soon
  - null or fallback when disabled/hidden
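The decision at the core of such a component can be kept framework-free, which makes it easy to unit-test. `blockerVariant` is a hypothetical helper; a React FeatureBlocker would render its children, a teaser, or its fallback depending on the result.

```typescript
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";
type BlockerVariant = "children" | "teaser" | "fallback";

// Hypothetical core of a FeatureBlocker component.
function blockerVariant(state: FeatureState | undefined): BlockerVariant {
  switch (state) {
    case "enabled":
      return "children";
    case "coming_soon":
      return "teaser";
    default:
      return "fallback"; // disabled, hidden, or not configured (fail closed)
  }
}
```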
---
## 6) Build-Time vs Runtime (Clean, Predictable)
### 6.1 Build-time flags (require rebuild/redeploy)
What they are good for:
- preventing unfinished UI code from shipping in a bundle
- cutting entire routes/components from builds for deterministic releases
Limitations:
- `NEXT_PUBLIC_*` values are compiled into the client bundle; changing them does not update clients without a rebuild.
Use build-time flags for:
- experimental UI
- “not yet shipped” components/routes
- simplifying deployments (pre-launch vs. alpha-style gating)
### 6.2 Runtime flags (no rebuild)
What they are for:
- maintenance mode
- emergency disable for broken endpoints
- quickly hiding risky features
Runtime flags must be available to:
- API Guards (always)
- Website SSR/middleware optionally
- Website client optionally (for UX only)
Key tradeoff:
- runtime access introduces caching and latency concerns
- treat runtime policy reads as cached, fast, and resilient
Recommended approach:
- API is authoritative source of runtime policy
- website can optionally consume a cached policy snapshot endpoint
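A sketch of a cached, resilient snapshot reader for the API side, assuming a synchronous loader for brevity (a real KV or database read would likely be async). `CachedPolicySnapshot` is a hypothetical name; the injectable clock exists only to make the TTL testable.

```typescript
// Hypothetical cached reader for the runtime policy snapshot.
class CachedPolicySnapshot<T> {
  private cached: { value: T; fetchedAt: number } | undefined;
  private readonly load: () => T;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: () => T, ttlMs: number, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(): T {
    const t = this.now();
    const cached = this.cached;
    // Fast path: fresh enough, no I/O.
    if (cached !== undefined && t - cached.fetchedAt < this.ttlMs) return cached.value;
    try {
      const value = this.load();
      this.cached = { value, fetchedAt: t };
      return value;
    } catch (err) {
      // Nothing cached yet: surface the error so the caller can use safe code defaults.
      if (cached === undefined) throw err;
      // Refresh failed: keep serving the last good snapshot.
      return cached.value;
    }
  }
}
```

Serving the last good snapshot on a failed refresh keeps a flaky store from flipping features on or off; with nothing cached yet, the error propagates so the caller can fall back to the safe code defaults from section 3.2.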
---
## 7) Storage and Distribution (Now + Future Super Admin)
### 7.1 Now (no super admin UI)
Use a single “policy snapshot” stored in one place and read by the API, with caching.
Options (in priority order):
1. Remote KV/DB-backed policy snapshot (preferred for true runtime changes)
2. Environment variable JSON (simpler, but changes require restart/redeploy)
3. Static config file in repo (requires rebuild/redeploy)
### 7.2 Future (super admin UI)
Super admin becomes a writer to the same store.
Non-negotiable:
- The storage schema must be stable and versioned.
Recommended schema (conceptual):
- policyVersion
- operationalMode
- capabilities: map of capabilityKey -> featureState
- allowlists: maintenance view/mutate allowlists
- optional targeting rules later (by role/user)
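A sketch of the versioned schema and a strict parser for it. Field names follow the conceptual schema above, but the exact shape (`PolicySnapshotV1`, the two allowlist arrays) is an assumption, and targeting rules are omitted. Unknown versions or values yield `null`, so the caller can fail closed.

```typescript
type OperationalMode = "normal" | "maintenance" | "test";
type FeatureState = "enabled" | "disabled" | "coming_soon" | "hidden";

// Hypothetical version 1 of the stored policy snapshot.
interface PolicySnapshotV1 {
  policyVersion: 1;
  operationalMode: OperationalMode;
  capabilities: Record<string, FeatureState>; // capabilityKey -> featureState
  allowlists: { maintenanceView: string[]; maintenanceMutate: string[] };
}

const MODES: readonly string[] = ["normal", "maintenance", "test"];
const STATES: readonly string[] = ["enabled", "disabled", "coming_soon", "hidden"];

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Returns null for anything it does not fully understand, including future versions.
function parsePolicySnapshot(raw: unknown): PolicySnapshotV1 | null {
  if (typeof raw !== "object" || raw === null) return null;
  const r = raw as Record<string, unknown>;
  if (r.policyVersion !== 1) return null;

  const mode = r.operationalMode;
  if (typeof mode !== "string" || !MODES.includes(mode)) return null;

  const caps = r.capabilities;
  if (typeof caps !== "object" || caps === null) return null;
  const capabilities: Record<string, FeatureState> = {};
  for (const [key, state] of Object.entries(caps)) {
    if (typeof state !== "string" || !STATES.includes(state)) return null;
    capabilities[key] = state as FeatureState;
  }

  const lists = r.allowlists as Record<string, unknown> | null | undefined;
  const view = lists?.maintenanceView;
  const mutate = lists?.maintenanceMutate;
  if (!isStringArray(view) || !isStringArray(mutate)) return null;

  return {
    policyVersion: 1,
    operationalMode: mode as OperationalMode,
    capabilities,
    allowlists: { maintenanceView: view, maintenanceMutate: mutate },
  };
}
```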
---
## 8) Data Flow (Conceptual)
```mermaid
flowchart LR
UI[Website UI] --> FB[Frontend Blockers]
FB --> PC[Policy Client]
UI --> API[API Request]
API --> FG[Feature Guard]
FG --> AS[API Application Service]
AS --> UC[Core Use Case]
PC --> PS[Policy Snapshot]
FG --> PS
```
Interpretation:
- Website reads policy for UX (best-effort).
- API enforces policy (authoritative) before any application logic.
---
## 9) Implementation Checklist (For Code Mode)
Backend (apps/api):
- Define capability keys and feature states as shared types in a local module.
- Create FeaturePolicyService that resolves the current policy snapshot (cached).
- Add FeatureFlagGuard (or FeatureAvailabilityGuard) that:
  - reads required capability metadata for an endpoint
  - evaluates allow/deny with actionType
  - maps denial to the chosen HTTP status codes
Frontend (apps/website):
- Add a small PolicyClient that fetches policy snapshot from API (optional for phase 1).
- Add FeatureBlocker component for consistent UI behavior.
- Centralize navigation link definitions and filter them via policy.
Ops/Config:
- Define how maintenance mode is toggled (KV/DB entry or config endpoint restricted to operators later).
- Ensure defaults are safe (fail closed).
---
## 10) Non-Goals (Explicit)
- This system is not an authorization system.
- Roles/permissions are separate (but can be added as actorContext inputs later).
- Blockers never replace Guards.