# AI Dev Roadmap

## Purpose

This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.

## Scope

- Frontend: React + TypeScript + Vite
- Desktop host: Tauri
- Backend: FastAPI + Python services
- Media pipeline: FFmpeg, transcription, audio processing

## Autonomy Target

- Near-term target: 80-90% autonomous execution for well-scoped work.
- Mid-term target: 90-95% for low/medium-risk features with CI gates.
- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
## Core Principles

1. Specs are executable and machine-readable.
2. Tests are the primary source of truth for completion.
3. Every failure is diagnosable from logs/artifacts.
4. AI has bounded permissions and policy guardrails.
5. AI updates docs and memory as part of the done criteria.

## Execution Status (2026-04-15)

### Completed

1. Added roadmap companion docs:
   - `docs/spec-template.md`
   - `docs/ai-policy.md`
   - `docs/runbooks/error-codes.md`
2. Added operational scripts:
   - `scripts/validate-all.sh`
   - `scripts/collect-diagnostics.sh`
3. Ran the Step 1 validation script (`./scripts/validate-all.sh`).
4. Ran the Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
5. Captured the diagnostics archive:
   - `.diagnostics/diag_20260415_163239.tar.gz`
6. Renamed the roadmap file to `AI_dev_plan.md`.

### Current Blockers

1. The frontend lint baseline is not green yet.
2. Remaining lint issues are mostly pre-existing unused variables and hook dependency warnings across app components.

### Next Actions

1. Triage existing lint findings into:
   - safe autofix
   - manual low-risk cleanup
   - intentional warnings to suppress, with justification
2. Reach a green `./scripts/validate-all.sh` in local dev.
3. Add a CI workflow to enforce `validate-all` on pull requests.
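
The triage in step 1 can be mechanized by bucketing findings on rule id. A minimal sketch, assuming ESLint-style findings; the specific rule names (`prefer-const`, `react-hooks/exhaustive-deps`, etc.) are illustrative assumptions, not the project's actual lint baseline:

```python
# Hypothetical rule-id buckets; adjust to the real lint output.
SAFE_AUTOFIX = {"prefer-const", "no-extra-semi"}
SUPPRESS_WITH_REASON = {"react-hooks/exhaustive-deps"}


def triage(findings: list[dict]) -> dict[str, list[dict]]:
    """Bucket lint findings into the three remediation paths."""
    buckets = {"autofix": [], "manual": [], "suppress": []}
    for f in findings:
        if f["rule"] in SAFE_AUTOFIX:
            buckets["autofix"].append(f)
        elif f["rule"] in SUPPRESS_WITH_REASON:
            buckets["suppress"].append(f)
        else:
            buckets["manual"].append(f)
    return buckets
```

Anything landing in `manual` stays human-reviewed until the baseline is green.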
## Roadmap Phases

## Phase 0: Foundation (1-2 weeks)

### Deliverables

1. Deterministic dev and test environment.
2. Baseline lint/type/test commands working in CI and locally.
3. Standardized log format across frontend, backend, and Tauri host.

### Tasks

1. Stabilize toolchain commands:
   - frontend lint/typecheck/test
   - backend lint/typecheck/test
   - workspace e2e smoke command
2. Add a single script for local validation, for example `npm run validate:all`.
3. Introduce structured logging fields:
   - timestamp
   - request/job id
   - subsystem (frontend/backend/host)
   - error code
4. Add reproducible media fixtures for tests under a dedicated test-fixtures path.
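
On the backend side, the structured fields from task 3 can ride on the standard `logging` module. A stdlib-only sketch; the exact field names (`ts`, `job_id`, `subsystem`, `error_code`) are assumptions to be settled when the log format is standardized:

```python
import json
import logging
import time


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created)),
            "job_id": getattr(record, "job_id", None),
            "subsystem": getattr(record, "subsystem", "backend"),
            "error_code": getattr(record, "error_code", None),
            "level": record.levelname,
            "msg": record.getMessage(),
        }
        return json.dumps(payload)


# Usage: attach the formatter and pass structured fields via `extra`.
logger = logging.getLogger("talkedit")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.warning("decode failed", extra={"job_id": "j-123", "error_code": "TE-MEDIA-001"})
```

The frontend and Tauri host would emit the same field set in their own logging layers so diagnostics tooling can merge all three streams.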
### Exit Criteria

- A fresh clone can run validation with one command.
- CI produces deterministic pass/fail on clean branches.
- Failures include enough context to reproduce without manual guessing.

## Phase 1: Spec + Test Contracts (2-4 weeks)

### Deliverables

1. Feature spec template used for all new work.
2. API and schema contracts versioned and validated.
3. Regression harness for previous bugs.

### Tasks

1. Create `docs/spec-template.md` with required sections:
   - user story
   - acceptance criteria
   - non-goals
   - edge cases
   - rollback behavior
2. Add contract tests for backend routers:
   - transcribe
   - export
   - captions
   - audio
3. Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
4. For each resolved bug, add a regression test before closing the issue.
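
The schema validation tests in task 3 can start from a small stdlib-only checker before adopting a full JSON Schema validator. This is a sketch only: the required keys (`version`, `media`, `zones`) are assumptions, and the real contract lives in `shared/project-schema.json`:

```python
import json

# Hypothetical required keys; the authoritative set is shared/project-schema.json.
REQUIRED_KEYS = {"version", "media", "zones"}


def validate_project(raw: str) -> list[str]:
    """Return human-readable problems with a project file; empty means valid."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top level must be an object"]
    problems = [f"missing required key: {k}" for k in sorted(REQUIRED_KEYS - doc.keys())]
    if "version" in doc and not isinstance(doc["version"], int):
        problems.append("version must be an integer")
    return problems
```

Returning a problem list rather than raising keeps the function usable both in tests and in AI-readable failure artifacts.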
### Exit Criteria

- New feature PRs must include a spec and tests.
- Breaking contract changes are detected automatically in CI.

## Phase 2: Observability and Self-Debugging (2-3 weeks)

### Deliverables

1. Unified diagnostics bundle command.
2. AI-readable failure artifacts from CI and local runs.
3. Error taxonomy and runbook mapping.

### Tasks

1. Implement a diagnostics command to collect:
   - frontend logs
   - backend logs
   - Tauri logs
   - failing test outputs
   - environment metadata
2. Define error codes for common failure classes:
   - media decode
   - FFmpeg pipeline
   - transcription model
   - project load/save
   - network/IPC bridge
3. Add a runbook table mapping error codes to probable causes and first fixes.
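
Tasks 2 and 3 pair naturally: one enum per failure class, one runbook entry per enum member. A sketch under assumed code prefixes (`TE-MEDIA`, `TE-FFMPEG`, ...); the real values belong in `docs/runbooks/error-codes.md`:

```python
from enum import Enum


class ErrorClass(Enum):
    # Hypothetical code prefixes; authoritative list: docs/runbooks/error-codes.md.
    MEDIA_DECODE = "TE-MEDIA"
    FFMPEG_PIPELINE = "TE-FFMPEG"
    TRANSCRIPTION = "TE-ASR"
    PROJECT_IO = "TE-PROJ"
    IPC_BRIDGE = "TE-IPC"


RUNBOOK = {
    ErrorClass.MEDIA_DECODE: "Check container/codec support; try re-muxing the fixture.",
    ErrorClass.FFMPEG_PIPELINE: "Re-run the failing FFmpeg command from the log verbatim.",
    ErrorClass.TRANSCRIPTION: "Verify model files exist and match the pinned version.",
    ErrorClass.PROJECT_IO: "Validate the project file against shared/project-schema.json.",
    ErrorClass.IPC_BRIDGE: "Confirm backend port and health before blaming the frontend.",
}


def first_fix(code: str) -> str:
    """Map a raw error code like 'TE-FFMPEG-003' to its runbook first fix."""
    for cls in ErrorClass:
        if code.startswith(cls.value):
            return RUNBOOK[cls]
    return "Unknown class: escalate and add a new runbook entry."
```

The unknown-class fallback matters: every unmapped failure is a signal that the taxonomy needs a new entry.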
### Exit Criteria

- The agent can identify a likely root cause from artifacts without asking for manual logs.
- 80%+ of recurring failures map to known error classes.

## Phase 3: Controlled Autonomous Implementation (3-5 weeks)

### Deliverables

1. Policy file defining what AI can edit/run without approval.
2. Autonomous task loop for implement -> validate -> fix -> revalidate.
3. Automatic PR summary with risk and assumptions.

### Tasks

1. Add a policy file (for example `docs/ai-policy.md`) covering:
   - allowed directories for autonomous edits
   - blocked files requiring approval
   - blocked commands
2. Add a task template for AI execution:
   - parse the feature spec
   - locate impacted modules
   - implement the smallest viable changes
   - run the validation suite
   - retry up to N fix cycles
   - produce a summary + residual risks
3. Require AI to update:
   - copilot instructions
   - changelog/roadmap note
   - regression tests when bugfixing
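
The validate/fix/revalidate loop from the task template can be reduced to a bounded retry skeleton. A sketch, with `validate` and `attempt_fix` as injected callables; in practice `validate` would shell out to `./scripts/validate-all.sh`:

```python
from typing import Callable


def run_with_retries(
    validate: Callable[[], bool],
    attempt_fix: Callable[[int], None],
    max_fix_cycles: int = 3,
) -> tuple[bool, int]:
    """Run validation; on failure, fix and revalidate up to a hard bound.

    Returns (passed, fix_cycles_used). A False result is the signal
    to stop and escalate to a human rather than keep thrashing.
    """
    if validate():
        return True, 0
    for cycle in range(1, max_fix_cycles + 1):
        attempt_fix(cycle)
        if validate():
            return True, cycle
    return False, max_fix_cycles
```

Keeping the bound explicit (and small) is what makes the loop safe to run unattended.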
### Exit Criteria

- Low-risk feature tasks complete end-to-end without human intervention.
- CI gate pass rate for autonomous PRs remains above an agreed threshold (for example 95%).

## Phase 4: High-Autonomy with Human Escalation (ongoing)

### Deliverables

1. Explicit escalation triggers for ambiguity and risk.
2. Broader autonomous scope with mandatory gates.
3. Drift monitoring for quality, velocity, and regressions.

### Tasks

1. Define escalation triggers:
   - user-visible behavior changes without a clear spec
   - API/schema breakage
   - security-sensitive modifications
   - destructive migrations
2. Add quality dashboards:
   - flaky tests
   - escaped defects
   - mean time to recovery
   - autonomous task success rate
3. Monthly calibration:
   - adjust autonomy scope
   - update policies
   - prune stale runbooks and memories
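
The escalation triggers in task 1 become enforceable once a change set is described in a structured form. A sketch only: the `ChangeSummary` shape and the security-sensitive path prefixes are assumptions, not an existing project type:

```python
from dataclasses import dataclass, field


@dataclass
class ChangeSummary:
    # Hypothetical shape for describing a proposed change set.
    touched_paths: list[str] = field(default_factory=list)
    changes_user_visible_behavior: bool = False
    has_spec: bool = True
    breaks_contract: bool = False
    destructive_migration: bool = False


# Assumed security-sensitive prefixes; align with the real repo layout.
SECURITY_SENSITIVE = ("src-tauri/", "backend/app/auth")


def escalation_reasons(change: ChangeSummary) -> list[str]:
    """Return every trigger that fires; empty means no escalation needed."""
    reasons = []
    if change.changes_user_visible_behavior and not change.has_spec:
        reasons.append("user-visible change without a spec")
    if change.breaks_contract:
        reasons.append("API/schema breakage")
    if any(p.startswith(SECURITY_SENSITIVE) for p in change.touched_paths):
        reasons.append("security-sensitive paths touched")
    if change.destructive_migration:
        reasons.append("destructive migration")
    return reasons
```

Returning all firing reasons (not just the first) gives the human reviewer the full risk picture in one pass.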
### Exit Criteria

- Autonomous throughput increases while the defect rate stays stable or improves.
- Human review focuses on strategy and product decisions, not routine implementation/debugging.

## Required Engineering Systems

## 1. Spec System

Minimum implementation:

1. `docs/spec-template.md`
2. `docs/specs/` folder with one file per feature
3. CI check that new feature PRs include a spec reference
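
The CI check in item 3 can be as small as a pattern match over the PR description. A sketch assuming specs are referenced by path (`docs/specs/<name>.md`); the naming convention is an assumption:

```python
import re

# Assumes spec files are referenced by path, e.g. docs/specs/zone-editing.md.
SPEC_REF = re.compile(r"docs/specs/[\w\-]+\.md")


def has_spec_reference(pr_body: str) -> bool:
    """True if the PR description references at least one spec file."""
    return bool(SPEC_REF.search(pr_body))
```

A CI job would fetch the PR body, call this, and fail the check with a pointer to `docs/spec-template.md` when it returns False.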
## 2. Test System

Minimum implementation:

1. Frontend unit tests for stores/components/hook logic.
2. Backend unit + integration tests for routers/services.
3. E2E smoke tests for the core workflow:
   - open media
   - transcribe
   - edit zones
   - export
4. Regression tests required for every bugfix.

## 3. Environment System

Minimum implementation:

1. Locked dependencies and pinned runtimes.
2. Single bootstrap script.
3. Fixture media files for deterministic test runs.

## 4. Observability System

Minimum implementation:

1. Structured logs.
2. Standard error codes.
3. Diagnostics bundle command.
4. CI artifact retention for failed runs.
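
The diagnostics bundle command (item 3) maps to a small collector: gather logs plus environment metadata into one archive, mirroring what `scripts/collect-diagnostics.sh` does. A stdlib sketch; the directory layout and archive naming are assumptions:

```python
import json
import platform
import tarfile
import time
from pathlib import Path


def collect_diagnostics(log_dirs: list[Path], out_dir: Path) -> Path:
    """Bundle *.log files plus environment metadata into one tar.gz."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    bundle = out_dir / f"diag_{stamp}.tar.gz"
    meta = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "captured_at": stamp,
    }
    meta_file = out_dir / "env.json"
    meta_file.write_text(json.dumps(meta, indent=2))
    with tarfile.open(bundle, "w:gz") as tar:
        tar.add(meta_file, arcname="env.json")
        for d in log_dirs:
            for f in sorted(d.glob("*.log")):
                tar.add(f, arcname=f"logs/{d.name}/{f.name}")
    return bundle
```

Stable archive-internal paths (`env.json`, `logs/<subsystem>/...`) are what make the bundle AI-readable without per-run guessing.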
## 5. Governance System

Minimum implementation:

1. Protected branch + required checks.
2. Secret and dependency scanning.
3. Policy-based approval requirements for high-risk changes.
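
Item 3's policy can be evaluated mechanically per touched path. A sketch only: the allow-list and blocked entries below are placeholders, and the authoritative lists belong in `docs/ai-policy.md`:

```python
# Assumed policy; the real lists belong in docs/ai-policy.md.
AUTONOMOUS_DIRS = ("frontend/src", "backend/app", "docs")
APPROVAL_REQUIRED = ("shared/project-schema.json", ".github/workflows")


def needs_approval(path: str) -> bool:
    """True if a change to `path` must be approved by a human."""
    # Blocked entries win even when nested inside an allowed directory.
    if any(path == b or path.startswith(b + "/") for b in APPROVAL_REQUIRED):
        return True
    # Anything outside the allow-list defaults to requiring approval.
    return not any(path == d or path.startswith(d + "/") for d in AUTONOMOUS_DIRS)
```

Defaulting to "requires approval" keeps unlisted paths safe by construction.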
## Suggested Repository Additions

1. `AI_dev_plan.md` (this file)
2. `docs/spec-template.md`
3. `docs/ai-policy.md`
4. `docs/runbooks/error-codes.md`
5. `docs/runbooks/debug-playbooks.md`
6. `scripts/validate-all.sh`
7. `scripts/collect-diagnostics.sh`

## Definition of Done for Autonomous Tasks

A task is complete only if all of the following pass:

1. Feature spec acceptance criteria satisfied.
2. Relevant tests added/updated and passing.
3. No lint/type errors in the changed scope.
4. Docs and instructions updated if behavior changed.
5. Risk summary and assumptions recorded.

## Escalation Rules (Must Ask Human)

AI must stop and ask when:

1. Requirement ambiguity changes user-visible behavior.
2. Multiple valid product decisions exist without a clear preference.
3. Security/privacy/compliance implications are uncertain.
4. Data loss or a destructive migration is possible.
5. CI remains failing after bounded auto-fix attempts.

## Metrics to Track

1. Autonomous task success rate.
2. Reopen rate of AI-completed tasks.
3. Regression rate per release.
4. Flaky test percentage.
5. Mean time to diagnose and resolve failures.
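
Several of these metrics can be derived from one stream of per-task records. A sketch assuming each record carries `succeeded`, `reopened`, and optionally `hours_to_resolve`; the record shape is an assumption, not an existing data model:

```python
from statistics import mean


def autonomy_metrics(tasks: list[dict]) -> dict:
    """Summarize success rate, reopen rate, and mean time to resolve.

    Assumed record shape: succeeded (bool), reopened (bool),
    hours_to_resolve (float, present only where a failure was diagnosed).
    """
    total = len(tasks)
    succeeded = [t for t in tasks if t["succeeded"]]
    resolve_times = [t["hours_to_resolve"] for t in tasks if "hours_to_resolve" in t]
    return {
        "success_rate": len(succeeded) / total if total else 0.0,
        "reopen_rate": sum(t["reopened"] for t in succeeded) / len(succeeded) if succeeded else 0.0,
        "mean_hours_to_resolve": mean(resolve_times) if resolve_times else None,
    }
```

Computing reopen rate only over completed tasks keeps it answering "how often does 'done' turn out to be wrong?"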
## 30-Day Execution Plan

Week 1:

1. Baseline scripts and deterministic environment.
2. Restore lint/test commands to green status.
3. Add structured logging and IDs.

Week 2:

1. Spec template and mandatory spec policy.
2. Contract tests for core backend routes.
3. First diagnostics bundle version.

Week 3:

1. AI policy and bounded autonomous edit/run loop.
2. Regression-test-first bugfix workflow.
3. CI artifact enrichment and runbook mapping.

Week 4:

1. Pilot autonomous feature tasks in low-risk areas.
2. Measure success/failure patterns.
3. Expand scope only if quality gates hold.

## Notes for TalkEdit

1. Keep router files thin and service logic isolated to improve AI edit precision.
2. Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
3. Treat export/transcription pipeline changes as high-risk and always require regression tests.
4. Keep Linux WebKit startup and media URL consistency as explicit regression targets.