2025-12-31 19:55:43 +01:00
parent 8260bf7baf
commit 167e82a52b
66 changed files with 5124 additions and 228 deletions

docs/MESSAGING.md
# GridPilot — Messaging & Communication System
**Design Document (Code-First, Admin-Safe)**

---
## 1. Goals
The messaging system must:
- be **code-first**
- be **fully versioned**
- be **safe by default**
- prevent admins from breaking tone, structure, or legality
- support **transactional emails**, **announcements**, and **votes**
- give admins **visibility**, not creative control
This is **not** a marketing CMS.
It is infrastructure.

---
## 2. Core Principles
### 2.1 Code is the Source of Truth
- All email templates live in the repository
- No WYSIWYG editors
- No runtime editing by admins
- Templates are reviewed like any other code
### 2.2 Admins Trigger, They Don't Author
Admins can:
- preview
- test
- trigger
- audit
Admins cannot:
- edit wording
- change layout
- inject content
This guarantees:
- consistent voice
- legal safety
- no accidental damage
---
## 3. Template System
### 3.1 Template Structure
Each template defines:
- unique ID
- version
- subject
- body (HTML + plain text)
- allowed variables
- default values
- fallback behavior
Example (conceptual):
- `league_invite_v1`
- `season_start_v2`
- `penalty_applied_v1`
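
A minimal TypeScript sketch of what a code-defined template could look like. All of the following are assumptions for illustration, not GridPilot's actual code:

- the `EmailTemplate` interface and the `VarSpec` shape;
- the `templateKey` and `esc` helpers;
- splitting `league_invite_v1` into an `id` plus a numeric `version`.

Subject, HTML body, and plain-text body are plain functions of an allow-listed variable map. Interpolated values are HTML-escaped so no variable can inject markup.

```typescript
// Illustrative only: field and helper names are hypothetical.
interface VarSpec {
  required: boolean;
  previewDefault: string; // used by admin previews
}

interface EmailTemplate<V extends string> {
  id: string; // stable template ID, e.g. "league_invite"
  version: number; // bumped on any change to wording or layout
  variables: Record<V, VarSpec>; // explicit allow-list of variables
  subject: (vars: Record<V, string>) => string;
  html: (vars: Record<V, string>) => string;
  text: (vars: Record<V, string>) => string;
}

// Versioned key as it would appear in logs and the admin UI, e.g. "league_invite_v1".
function templateKey(t: { id: string; version: number }): string {
  return `${t.id}_v${t.version}`;
}

// Escape interpolated values so a variable can never inject HTML.
function esc(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const leagueInviteV1: EmailTemplate<"leagueName" | "inviterName"> = {
  id: "league_invite",
  version: 1,
  variables: {
    leagueName: { required: true, previewDefault: "Sunday Sprint Series" },
    inviterName: { required: false, previewDefault: "A league admin" },
  },
  subject: (v) => `You're invited to ${v.leagueName}`,
  html: (v) =>
    `<p>${esc(v.inviterName)} invited you to join <strong>${esc(v.leagueName)}</strong>.</p>`,
  text: (v) => `${v.inviterName} invited you to join ${v.leagueName}.`,
};
```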
Templates are immutable once deprecated.

---
### 3.2 Variables
- Strictly typed
- Explicit allow-list
- Required vs optional
- Default values for previews
Missing variables:
- never crash delivery
- always fall back safely
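
The variable rules can be sketched as a resolver that never throws. The `resolveVariables` function, the `fallback` field, and the `"send" | "preview"` mode are hypothetical, and values are modeled as strings for brevity. The resolver behaves as follows:

- Keys outside the allow-list are dropped and reported.
- Previews fill in defaults.
- During a real send, a missing required variable is recorded for the audit trail and replaced with a safe fallback instead of aborting delivery.

```typescript
// Hypothetical resolver: names and fields are illustrative.
interface VarSpec {
  required: boolean;
  previewDefault: string; // shown in admin previews
  fallback: string; // neutral text used when a real send lacks the value
}

interface Resolution {
  values: Record<string, string>;
  missingRequired: string[]; // surfaced to logs/audit, never thrown
  rejected: string[]; // provided keys that are not on the allow-list
}

function resolveVariables(
  spec: Record<string, VarSpec>,
  provided: Record<string, unknown>,
  mode: "send" | "preview",
): Resolution {
  const isAllowed = (k: string) => Object.prototype.hasOwnProperty.call(spec, k);
  // Unknown keys are dropped and reported, never rendered.
  const rejected = Object.keys(provided).filter((k) => !isAllowed(k));
  const values: Record<string, string> = {};
  const missingRequired: string[] = [];

  for (const [name, s] of Object.entries(spec)) {
    const raw = provided[name];
    // Strict typing: only a non-empty string counts as a provided value.
    if (typeof raw === "string" && raw.trim() !== "") {
      values[name] = raw;
    } else if (mode === "preview") {
      values[name] = s.previewDefault;
    } else {
      if (s.required) missingRequired.push(name);
      values[name] = s.fallback;
    }
  }
  return { values, missingRequired, rejected };
}
```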
---
## 4. Admin Preview & Testing
### 4.1 Preview Mode
Admins can:
- open any template
- see rendered output
- switch between HTML / text
- inspect subject line
Preview uses:
- **test data only**
- never real user data by default
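
If previews take a named fixture rather than a user, real data cannot reach them by accident. The sketch below assumes a hypothetical `renderPreview` function, a `PreviewableTemplate` shape, and a `PREVIEW_FIXTURES` map. The source specifies only that previews use test data and can switch between HTML and text.

```typescript
// Hypothetical preview API; fixture names and data are illustrative.
type Format = "html" | "text";

interface PreviewableTemplate {
  key: string; // e.g. "season_start_v2"
  subject: (vars: Record<string, string>) => string;
  render: (vars: Record<string, string>, format: Format) => string;
}

// Test datasets live in the repository next to the templates.
const PREVIEW_FIXTURES = new Map<string, Record<string, string>>([
  ["default", { leagueName: "Sunday Sprint Series", seasonName: "Season 3" }],
]);

// There is no user or recipient parameter: previews can only see fixture data.
function renderPreview(t: PreviewableTemplate, format: Format, fixture = "default") {
  const data = PREVIEW_FIXTURES.get(fixture);
  if (data === undefined) throw new Error(`Unknown preview fixture: ${fixture}`);
  return { key: t.key, subject: t.subject(data), body: t.render(data, format) };
}
```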
---
### 4.2 Test Send
Admins may:
- send a test email to themselves
- choose a predefined test dataset
- never inject arbitrary values
Purpose:
- sanity check
- formatting validation
- confidence before triggering
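
A test send can be modeled so that its only inputs are the acting admin, a template key, and the name of a predefined dataset. The `buildTestSend` function, the `Admin` and `OutgoingEmail` types, and the dataset names are hypothetical.

```typescript
// Hypothetical test-send builder; dataset names are placeholders.
interface Admin {
  id: string;
  email: string;
}

interface OutgoingEmail {
  to: string;
  templateKey: string;
  dataset: string;
  isTest: true;
}

const TEST_DATASETS = new Set(["default", "long_names", "missing_optional_fields"]);

function buildTestSend(admin: Admin, templateKey: string, dataset: string): OutgoingEmail {
  if (!TEST_DATASETS.has(dataset)) {
    throw new Error(`Unknown test dataset "${dataset}"`);
  }
  // No recipient or variable parameters exist, so arbitrary values cannot be injected.
  return { to: admin.email, templateKey, dataset, isTest: true };
}
```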
---
## 5. Delivery & Audit Trail
Every sent message is logged.
For each send event, store:
- template ID + version
- timestamp
- triggered by (admin/system)
- recipient(s)
- delivery status
- error details (if any)
Admins can view:
- delivery history
- failures
- resend eligibility
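
The send-event record maps directly onto the fields listed above. The `SendEvent` and `isResendEligible` names are hypothetical. The resend rule (only `failed` sends, within 24 hours) is an invented example policy, since the source does not define resend eligibility.

```typescript
// Hypothetical audit record; the resend policy below is an example, not a spec.
type DeliveryStatus = "queued" | "delivered" | "bounced" | "failed";

interface SendEvent {
  templateId: string;
  templateVersion: number;
  sentAt: Date;
  triggeredBy: { kind: "system" } | { kind: "admin"; adminId: string };
  recipients: string[];
  status: DeliveryStatus;
  error?: string; // populated only when delivery failed
}

// Example policy: only failed sends are resendable, and only within 24 hours.
const RESEND_WINDOW_MS = 24 * 60 * 60 * 1000;

function isResendEligible(e: SendEvent, now: Date): boolean {
  return e.status === "failed" && now.getTime() - e.sentAt.getTime() <= RESEND_WINDOW_MS;
}
```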
---
## 6. Trigger Types
### 6.1 Automatic Triggers
- season start
- race reminder
- protest resolved
- penalty applied
- standings updated
### 6.2 Manual Triggers
- league announcement
- sponsor message
- admin update
- vote launch
Manual triggers are:
- explicit
- logged
- rate-limited
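
The three properties of manual triggers can be enforced in one place. This sketch assumes a hypothetical `ManualTriggerGuard` class that keeps two things in memory:

- a sliding-window limit per admin and trigger type;
- an audit list of every attempt.

A real deployment would persist both. The trigger names come from the list above; the limits are placeholders.

```typescript
// Hypothetical guard; limits and in-memory storage are placeholders.
type ManualTrigger = "league_announcement" | "sponsor_message" | "admin_update" | "vote_launch";

interface TriggerAttempt {
  adminId: string;
  trigger: ManualTrigger;
  at: number; // epoch milliseconds
  allowed: boolean;
}

class ManualTriggerGuard {
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly history = new Map<string, number[]>();
  readonly auditLog: TriggerAttempt[] = [];

  constructor(maxPerWindow: number, windowMs: number) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // `confirmed` models the explicit confirmation step in the admin UI.
  tryTrigger(adminId: string, trigger: ManualTrigger, confirmed: boolean, now = Date.now()): boolean {
    const key = `${adminId}:${trigger}`;
    // Sliding window: keep only accepted triggers younger than windowMs.
    const recent = (this.history.get(key) ?? []).filter((t) => now - t < this.windowMs);
    const allowed = confirmed && recent.length < this.maxPerWindow;
    if (allowed) recent.push(now);
    this.history.set(key, recent);
    // Every attempt is logged, including rejected ones.
    this.auditLog.push({ adminId, trigger, at: now, allowed });
    return allowed;
  }
}
```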
---
## 7. Newsletter Handling
Newsletters follow the same system.
Characteristics:
- predefined formats
- fixed structure
- optional sections
- no free-text editing
Admins can:
- choose newsletter type
- select audience
- trigger send
Admins cannot:
- rewrite copy
- add arbitrary sections
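
The "fixed structure, optional sections" rule can be expressed as data. A format lists its sections in order, and the only admin input is which optional sections to leave out. `NewsletterFormat`, `planNewsletter`, the `monthly_recap` format, and its section names are hypothetical. Audience selection is omitted to keep the sketch short.

```typescript
// Hypothetical format definition; section IDs are illustrative.
type SectionId = "intro" | "standings" | "upcoming_races" | "sponsor_spotlight" | "footer";

interface NewsletterFormat {
  type: string;
  sections: readonly { id: SectionId; optional: boolean }[];
}

const MONTHLY_RECAP: NewsletterFormat = {
  type: "monthly_recap",
  sections: [
    { id: "intro", optional: false },
    { id: "standings", optional: false },
    { id: "upcoming_races", optional: true },
    { id: "sponsor_spotlight", optional: true },
    { id: "footer", optional: false },
  ],
};

// The only admin input is which optional sections to leave out; there are no text fields.
function planNewsletter(format: NewsletterFormat, omit: SectionId[]): SectionId[] {
  for (const id of omit) {
    const section = format.sections.find((s) => s.id === id);
    if (section === undefined || !section.optional) {
      throw new Error(`Section "${id}" cannot be omitted from ${format.type}`);
    }
  }
  // Order always comes from the format, never from admin input.
  return format.sections.filter((s) => omit.indexOf(s.id) === -1).map((s) => s.id);
}
```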
---
## 8. Voting & Poll Messaging
Polls are also template-driven.
Flow:
1. Poll defined in code
2. Admin starts poll
3. System sends notification
4. Users vote
5. Results summarized automatically
Messaging remains:
- neutral
- consistent
- auditable
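
Step 5 (automatic summary) can use the same neutral wording for every poll. The sketch assumes hypothetical `PollDefinition`, `Vote`, and `summarize` names. It also invents a rule that a user's last vote counts, because the source does not say how repeat votes are handled.

```typescript
// Hypothetical poll types; the last-vote-wins rule is an assumption.
interface PollDefinition {
  id: string; // defined in code, e.g. "race_length_v1"
  question: string;
  options: readonly string[];
}

interface Vote {
  userId: string;
  option: string;
}

function summarize(poll: PollDefinition, votes: Vote[]): string {
  // Keep each user's latest valid vote; unknown options are ignored.
  const latest = new Map<string, string>();
  for (const v of votes) {
    if (poll.options.includes(v.option)) latest.set(v.userId, v.option);
  }

  const counts = new Map<string, number>();
  latest.forEach((option) => counts.set(option, (counts.get(option) ?? 0) + 1));

  const total = latest.size;
  const lines = poll.options.map((o) => {
    const n = counts.get(o) ?? 0;
    const pct = total === 0 ? 0 : Math.round((n / total) * 100);
    return `${o}: ${n} (${pct}%)`;
  });
  // Same neutral shape for every poll: question, per-option tallies, turnout.
  return [poll.question, ...lines, `Total voters: ${total}`].join("\n");
}
```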
---
## 9. Admin UI Scope
Admin interface provides:
- template list
- preview button
- test send
- send history
- delivery status
- trigger actions
Admin UI explicitly excludes:
- template editing
- layout controls
- copywriting fields
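
One way to keep the excluded capabilities excluded is an explicit allow-list checked on the server for every admin request. In this sketch, template editing is not merely hidden in the UI; no such action exists. The action names and `isAdminMessagingAction` are hypothetical.

```typescript
// Hypothetical action names mirroring the Admin UI scope above.
const ADMIN_MESSAGING_ACTIONS = [
  "list_templates",
  "preview",
  "test_send",
  "view_send_history",
  "view_delivery_status",
  "trigger",
] as const;

type AdminMessagingAction = (typeof ADMIN_MESSAGING_ACTIONS)[number];

// Server-side check for every admin request: anything outside the list is rejected.
function isAdminMessagingAction(action: string): action is AdminMessagingAction {
  return (ADMIN_MESSAGING_ACTIONS as readonly string[]).includes(action);
}
```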
---
## 10. Why This Matters
This approach ensures:
- trust
- predictability
- legal safety
- consistent brand voice
- low operational risk
- no CMS hell
GridPilot communicates like a tool, not a marketing department.

---
## 11. Non-Goals
This system will NOT:
- support custom admin HTML
- allow per-league copy editing
- replace marketing platforms
- become a newsletter builder
That is intentional.

---
## 12. Summary
**Code defines communication.
Admins execute communication.
Users receive communication they can trust.**

Simple. Stable. Scalable.

docs/OBSERVABILITY.md
GridPilot — Observability & Data Separation Design
Purpose
This document defines how GridPilot separates business-critical domain data from infrastructure / observability data, while keeping operations simple, self-hosted, and cognitively manageable.
Goals:
• protect domain data at all costs
• avoid tool sprawl
• keep one clear mental model for operations
• enable debugging without polluting business logic
• ensure long-term maintainability
Core Principle
Domain data and infrastructure data must never share the same storage, lifecycle, or access path.
They serve different purposes, have different risk profiles, and must be handled independently.
Data Categories
1. Domain (Business) Data
Includes
• users
• leagues
• seasons
• races
• results
• penalties
• escrow balances
• sponsorship contracts
• payments & payouts
Characteristics
• legally relevant
• trust-critical
• user-facing
• must never be lost
• requires strict migrations and backups
Storage
• Relational database (PostgreSQL)
• Strong consistency (ACID)
• Backups and disaster recovery mandatory
Access
• Application backend
• Custom Admin UI (primary control surface)
2. Infrastructure / Observability Data
Includes
• application logs
• error traces
• metrics (latency, throughput, failures)
• background job status
• system health signals
Characteristics
• high volume
• ephemeral by design
• not user-facing
• safe to rotate or delete
• supports debugging, not business logic
Storage
• Dedicated observability stack
• Completely separate from domain database
Access
• Grafana UI only
• Never exposed to users
• Never queried by application logic
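
The "never queried by application logic" rule can be enforced at the type level. Application code gets a telemetry port with no read methods, next to a separate port for domain persistence. The TypeScript sketch below uses hypothetical names (TelemetrySink, PenaltyRepository, StdoutTelemetry, applyPenalty). It assumes logs are written to stdout and forwarded to Loki by a separate collector, which this document does not specify.

```typescript
// Hypothetical ports; names are illustrative.
type Level = "info" | "warn" | "error";

// Write-only by design: there is no read or query method.
interface TelemetrySink {
  log(level: Level, event: string, fields?: Record<string, string | number>): void;
}

// Domain persistence (PostgreSQL in production) sits behind its own port.
interface PenaltyRepository {
  save(p: { raceId: string; driverId: string; seconds: number }): Promise<void>;
}

// Emits JSON lines on stdout; shipping them to Loki happens outside the application.
class StdoutTelemetry implements TelemetrySink {
  log(level: Level, event: string, fields: Record<string, string | number> = {}): void {
    console.log(JSON.stringify({ ...fields, ts: new Date().toISOString(), level, event }));
  }
}

async function applyPenalty(
  repo: PenaltyRepository,
  telemetry: TelemetrySink,
  p: { raceId: string; driverId: string; seconds: number },
): Promise<void> {
  try {
    await repo.save(p); // the business outcome depends only on the domain store
  } catch (err) {
    telemetry.log("error", "penalty_save_failed", { raceId: p.raceId });
    throw err; // telemetry records the failure but never decides the outcome
  }
}
```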
Observability Architecture (Self-Hosted)
GridPilot uses a single consolidated self-hosted observability stack.
Components
• Grafana
   • Central UI
   • Dashboards
   • Alerting
   • Single login
• Loki
   • Log aggregation
   • Append-only
   • Schema-less
   • Optimized for high-volume logs
• Prometheus
   • Metrics collection
   • Time-series data
   • Alert rules
• Tempo (optional)
   • Distributed traces
   • Request flow analysis
All components are accessed exclusively through Grafana.
Responsibility Split
Custom Admin (GridPilot)
Handles:
• business workflows
• escrow state visibility
• payment events
• league integrity checks
• moderation actions
• audit views
Never handles:
• raw logs
• metrics
• system traces
Observability Stack (Grafana)
Handles:
• system health
• performance bottlenecks
• error rates
• background job failures
• infrastructure alerts
Never handles:
• business decisions
• user-visible data
• domain state
Logging & Metrics Policy
What is logged
• errors and exceptions
• payment and escrow failures
• background job failures
• unexpected external API responses
• startup and shutdown events
What is not logged
• user personal data
• credentials
• domain state snapshots
• high-frequency debug spam
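
The "what is not logged" list is easiest to keep when every record passes through one sanitizer before it leaves the process. In this sketch, sanitize, logError, SECRET_KEYS, and PERSONAL_KEYS are hypothetical. The key lists are examples rather than a complete policy, and matching is by exact, case-insensitive key name.

```typescript
// Example key lists only; a real policy would be reviewed and extended.
const SECRET_KEYS = new Set(["password", "token", "secret", "apikey", "authorization", "cookie"]);
const PERSONAL_KEYS = new Set(["email", "fullname", "firstname", "lastname", "phone", "address", "ipaddress"]);

type LogValue = string | number | boolean | null | LogValue[] | { [key: string]: LogValue };

// Recursively replaces values whose key names a credential or a personal field.
function sanitize(value: LogValue, key = ""): LogValue {
  const k = key.toLowerCase();
  if (SECRET_KEYS.has(k)) return "[REDACTED]";
  if (PERSONAL_KEYS.has(k)) return "[PII]";
  if (Array.isArray(value)) return value.map((item) => sanitize(item, key));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: LogValue } = {};
    for (const [childKey, child] of Object.entries(value)) out[childKey] = sanitize(child, childKey);
    return out;
  }
  return value;
}

// Single exit point for error records: fields are always sanitized first.
function logError(event: string, fields: { [key: string]: LogValue }): void {
  console.error(JSON.stringify({ level: "error", event, fields: sanitize(fields) }));
}
```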
Alerting Philosophy
Alerts are:
• minimal
• actionable
• rare
Examples:
• payment failure spike
• escrow release delay
• background jobs failing repeatedly
• sustained error rate increase
No vanity alerts.
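
Alerting on a sustained increase rather than a single bad sample is what keeps alerts rare. In this stack the check would live in a Prometheus alerting rule. Its for clause keeps an alert pending until the condition has held for the given duration. The TypeScript below only spells out the same logic, with a hypothetical sustainedErrorRate function and placeholder thresholds.

```typescript
// Hypothetical evaluation of one alert condition over fixed-size time windows.
interface WindowSample {
  requests: number;
  errors: number;
}

// True only if the error ratio exceeded `threshold` in each of the last `windows` samples.
function sustainedErrorRate(samples: WindowSample[], threshold: number, windows: number): boolean {
  if (windows <= 0 || samples.length < windows) return false;
  return samples
    .slice(-windows)
    .every((s) => s.requests > 0 && s.errors / s.requests > threshold);
}
```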
Rationale
This separation ensures:
• domain data remains clean and safe
• observability data can scale freely
• infra failures never corrupt business data
• operational complexity stays manageable
The system favors clarity over completeness and stability over tooling hype.
Summary
• Domain data lives in PostgreSQL
• Observability data lives in a dedicated stack
• Grafana is the single infra control surface
• Custom Admin is the single business control surface
• No shared storage, no shared lifecycle
This design minimizes risk, cognitive load, and operational overhead while remaining fully extensible.