# AI Dev Roadmap

## Purpose
This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.
## Scope
- Frontend: React + TypeScript + Vite
- Desktop host: Tauri
- Backend: FastAPI + Python services
- Media pipeline: FFmpeg, transcription, audio processing
## Autonomy Target
- Near-term target: 80-90% autonomous execution for well-scoped work.
- Mid-term target: 90-95% for low/medium-risk features with CI gates.
- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
## Core Principles
- Specs are executable and machine-readable.
- Tests are the primary source of truth for completion.
- Every failure is diagnosable from logs/artifacts.
- AI has bounded permissions and policy guardrails.
- AI updates docs and memory as part of done criteria.
## Execution Status (2026-04-15)

### Completed
- Added roadmap companion docs: `docs/spec-template.md`, `docs/ai-policy.md`, `docs/runbooks/error-codes.md`.
- Added operational scripts: `scripts/validate-all.sh`, `scripts/collect-diagnostics.sh`.
- Ran Step 1 validation script (`./scripts/validate-all.sh`).
- Ran Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
- Captured diagnostics archive: `.diagnostics/diag_20260415_163239.tar.gz`.
- Renamed roadmap file to `AI_dev_plan.md`.
### Current Blockers
- Frontend lint baseline is not green yet.
- Remaining lint issues are mostly pre-existing unused vars and hook dependency warnings across app components.
### Next Actions
- Triage existing lint findings into:
  - safe autofix
  - manual low-risk cleanup
  - intentional warnings to suppress with justification
- Reach a green `./scripts/validate-all.sh` run in local dev.
- Add a CI workflow to enforce `validate-all` on pull requests.
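The triage step above can be sketched as a small categorizer. The rule names here are illustrative ESLint rules, not a confirmed project lint config:

```python
# Sketch of lint triage: bucket findings into autofix / manual / suppress.
# These rule sets are hypothetical examples, not the project's actual config.
AUTOFIXABLE = {"no-extra-semi", "prefer-const"}
SUPPRESS_WITH_REASON = {"react-hooks/exhaustive-deps"}

def triage(findings):
    """Group lint findings (dicts with a 'rule' key) by remediation strategy."""
    buckets = {"safe_autofix": [], "manual_cleanup": [], "suppress": []}
    for finding in findings:
        rule = finding["rule"]
        if rule in AUTOFIXABLE:
            buckets["safe_autofix"].append(finding)
        elif rule in SUPPRESS_WITH_REASON:
            buckets["suppress"].append(finding)
        else:
            buckets["manual_cleanup"].append(finding)
    return buckets
```

A triage report generated this way can seed the suppression-justification list before the CI gate is enabled.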
## Roadmap Phases

### Phase 0: Foundation (1-2 weeks)

#### Deliverables
- Deterministic dev and test environment.
- Baseline lint/type/test commands working in CI and local.
- Standardized log format across frontend, backend, and Tauri host.
#### Tasks
- Stabilize toolchain commands:
  - frontend lint/typecheck/test
  - backend lint/typecheck/test
  - workspace e2e smoke command
- Add a single script for local validation, for example `npm run validate:all`.
- Introduce structured logging fields:
  - timestamp
  - request/job id
  - subsystem (frontend/backend/host)
  - error code
- Add reproducible media fixtures for tests under a dedicated test-fixtures path.
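The structured logging fields listed above could be emitted as one JSON object per line. A minimal sketch for the Python backend, assuming the field names are as listed (the real formatter may differ):

```python
import json
import logging
import time

class StructuredFormatter(logging.Formatter):
    """Emit one JSON object per log line carrying the roadmap's required fields.

    Fields not set via logging's `extra=` mechanism default to None so every
    line has the same shape for downstream AI/tooling consumption.
    """

    def format(self, record):
        payload = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "job_id": getattr(record, "job_id", None),
            "subsystem": getattr(record, "subsystem", "backend"),
            "error_code": getattr(record, "error_code", None),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

Usage: attach the formatter to a handler and pass the contextual fields via `extra={"job_id": ..., "error_code": ...}` at each call site.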
#### Exit Criteria
- Fresh clone can run validation with one command.
- CI produces deterministic pass/fail on clean branches.
- Failures include enough context to reproduce without manual guessing.
### Phase 1: Spec + Test Contracts (2-4 weeks)

#### Deliverables
- Feature spec template used for all new work.
- API and schema contracts versioned and validated.
- Regression harness for previous bugs.
#### Tasks
- Create `docs/spec-template.md` with required sections:
  - user story
  - acceptance criteria
  - non-goals
  - edge cases
  - rollback behavior
- Add contract tests for backend routers:
  - transcribe
  - export
  - captions
  - audio
- Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
- For each resolved bug, add a regression test before closing the issue.
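A schema validation test could start as a stdlib-only contract check. The key names below are illustrative; the real keys live in `shared/project-schema.json`:

```python
import json

# Illustrative required keys and types; the actual contract is defined in
# shared/project-schema.json and may differ.
REQUIRED_KEYS = {"version": int, "media": list, "zones": list}

def validate_project(raw: str):
    """Parse a saved project file and return a list of contract violations.

    An empty list means the file satisfies this (sketch of a) contract.
    """
    try:
        project = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in project:
            errors.append(f"missing key: {key}")
        elif not isinstance(project[key], expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors
```

In CI this would run against both freshly saved projects and fixture files from older versions, so load/save drift is caught automatically.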
#### Exit Criteria
- New feature PRs must include spec and tests.
- Breaking contract changes are detected automatically in CI.
### Phase 2: Observability and Self-Debugging (2-3 weeks)

#### Deliverables
- Unified diagnostics bundle command.
- AI-readable failure artifacts from CI and local runs.
- Error taxonomy and runbook mapping.
#### Tasks
- Implement diagnostics command to collect:
  - frontend logs
  - backend logs
  - Tauri logs
  - failing test outputs
  - environment metadata
- Define error codes for common classes:
  - media decode
  - FFmpeg pipeline
  - transcription model
  - project load/save
  - network/IPC bridge
- Add runbook table mapping error codes to probable causes and first fixes.
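The error-code-to-runbook mapping could live as a simple lookup table. The codes, causes, and fixes below are illustrative placeholders; the authoritative entries belong in `docs/runbooks/error-codes.md`:

```python
# Hypothetical error taxonomy mirroring the five classes listed above.
# Each entry maps an error code to (probable cause, first fix to try).
RUNBOOK = {
    "E_MEDIA_DECODE": (
        "unsupported codec or corrupt file",
        "re-probe the file with ffprobe and check the container format",
    ),
    "E_FFMPEG_PIPELINE": (
        "bad filter graph or missing ffmpeg binary",
        "log the full ffmpeg command line and rerun it manually",
    ),
    "E_TRANSCRIBE_MODEL": (
        "model missing or out of memory",
        "verify the model download and reduce batch size",
    ),
    "E_PROJECT_IO": (
        "schema mismatch on load/save",
        "run schema validation against the project file",
    ),
    "E_IPC_BRIDGE": (
        "frontend/host contract drift",
        "compare bridge payloads against the contract tests",
    ),
}

def first_fix(error_code):
    """Return (probable cause, first fix), with a fallback for unknown codes."""
    return RUNBOOK.get(
        error_code,
        ("unknown error class", "escalate and add a new runbook entry"),
    )
```

Keeping this table machine-readable is what lets an agent go from a structured log's `error_code` field straight to a first remediation attempt.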
#### Exit Criteria
- Agent can identify likely root cause from artifacts without asking for manual logs.
- 80%+ of recurring failures map to known error classes.
### Phase 3: Controlled Autonomous Implementation (3-5 weeks)

#### Deliverables
- Policy file defining what AI can edit/run without approval.
- Autonomous task loop for implement -> validate -> fix -> revalidate.
- Automatic PR summary with risk and assumptions.
#### Tasks
- Add policy file (for example `docs/ai-policy.md`):
  - allowed directories for autonomous edits
  - blocked files requiring approval
  - blocked commands
- Add task template for AI execution:
  - parse feature spec
  - locate impacted modules
  - implement smallest changes
  - run validation suite
  - retry up to N fix cycles
  - produce summary + residual risks
- Require AI to update:
  - copilot instructions
  - changelog/roadmap note
  - regression tests when bugfixing
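The implement -> validate -> fix -> revalidate loop with a bounded retry budget can be sketched as follows. The callable interfaces are assumptions for illustration, not an agreed agent API:

```python
def autonomous_loop(implement, validate, fix, max_fix_cycles=3):
    """Run implement -> validate -> fix -> revalidate with a bounded fix budget.

    `implement` and `fix` are callables that act on the workspace; `validate`
    returns a list of failures (empty list means green). Returns a tuple of
    (succeeded, fix_cycles_used) so the caller can report residual risk or
    escalate when the budget is exhausted.
    """
    implement()
    failures = validate()
    cycles = 0
    while failures and cycles < max_fix_cycles:
        fix(failures)
        failures = validate()
        cycles += 1
    return (not failures, cycles)
```

The hard cap on fix cycles is what turns "keep trying until green" into a terminating process that hands off to a human, matching the escalation rule about CI remaining red after bounded auto-fix attempts.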
#### Exit Criteria
- Low-risk feature tasks complete end-to-end without human intervention.
- CI gate pass rate for autonomous PRs remains above agreed threshold (for example 95%).
### Phase 4: High-Autonomy with Human Escalation (ongoing)

#### Deliverables
- Explicit escalation triggers for ambiguity and risk.
- Broader autonomous scope with mandatory gates.
- Drift monitoring for quality, velocity, and regressions.
#### Tasks
- Define escalation triggers:
  - user-visible behavior changes without clear spec
  - API/schema breakage
  - security-sensitive modifications
  - destructive migrations
- Add quality dashboards:
  - flaky tests
  - escaped defects
  - mean time to recovery
  - autonomous task success rate
- Monthly calibration:
  - adjust autonomy scope
  - update policies
  - prune stale runbooks and memories
#### Exit Criteria
- Autonomous throughput increases while defect rate stays stable or improves.
- Human review focuses on strategy and product decisions, not routine implementation/debugging.
## Required Engineering Systems

### 1. Spec System
Minimum implementation:
- `docs/spec-template.md`
- `docs/specs/` folder with one file per feature
- CI check that new feature PRs include a spec reference
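The CI spec-reference check could be as small as a pattern match over the PR description. This is a hypothetical gate, assuming specs are referenced by their `docs/specs/` path:

```python
import re

def has_spec_reference(pr_body: str) -> bool:
    """Return True if a PR description references a spec under docs/specs/.

    Hypothetical CI gate: the convention that PRs cite a spec by path is an
    assumption of this sketch, not an established project rule.
    """
    return re.search(r"docs/specs/[\w./-]+\.md", pr_body) is not None
```

A real workflow would run this against the PR body (or changed-files list) and fail the required check when no spec is referenced.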
### 2. Test System
Minimum implementation:
- Frontend unit tests for stores/components/hook logic.
- Backend unit+integration tests for routers/services.
- E2E smoke tests for core workflow:
  - open media
  - transcribe
  - edit zones
  - export
- Regression tests required for every bugfix.
### 3. Environment System
Minimum implementation:
- Locked dependencies and pinned runtimes.
- Single bootstrap script.
- Fixture media files for deterministic test runs.
### 4. Observability System
Minimum implementation:
- Structured logs.
- Standard error codes.
- Diagnostics bundle command.
- CI artifact retention for failed runs.
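The diagnostics bundle command (`scripts/collect-diagnostics.sh` in the repo) could look roughly like this Python sketch, which tolerates subsystems that produced no logs:

```python
import tarfile
import time
from pathlib import Path

def collect_diagnostics(log_paths, out_dir=".diagnostics"):
    """Bundle available log files into a timestamped tar.gz archive.

    Python sketch of what scripts/collect-diagnostics.sh does; the real script's
    exact inputs and naming may differ. Missing paths are skipped silently so a
    subsystem that never started does not abort the whole collection.
    """
    Path(out_dir).mkdir(exist_ok=True)
    archive = Path(out_dir) / time.strftime("diag_%Y%m%d_%H%M%S.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for path in log_paths:
            p = Path(path)
            if p.exists():
                tar.add(p, arcname=p.name)
    return archive
```

The timestamped naming matches the archive already captured in the execution status (`.diagnostics/diag_20260415_163239.tar.gz`).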
### 5. Governance System
Minimum implementation:
- Protected branch + required checks.
- Secret and dependency scanning.
- Policy-based approval requirements for high-risk changes.
## Suggested Repository Additions
- `AI_dev_plan.md` (this file)
- `docs/spec-template.md`
- `docs/ai-policy.md`
- `docs/runbooks/error-codes.md`
- `docs/runbooks/debug-playbooks.md`
- `scripts/validate-all.sh`
- `scripts/collect-diagnostics.sh`
## Definition of Done for Autonomous Tasks
A task is complete only if all items pass:
- Feature spec acceptance criteria satisfied.
- Relevant tests added/updated and passing.
- No lint/type errors in changed scope.
- Docs and instructions updated if behavior changed.
- Risk summary and assumptions recorded.
## Escalation Rules (Must Ask Human)
AI must stop and ask when:
- Requirement ambiguity changes user-visible behavior.
- Multiple valid product decisions exist without clear preference.
- Security/privacy/compliance implications are uncertain.
- Data loss or destructive migration is possible.
- CI remains failing after bounded auto-fix attempts.
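A machine-checkable form of these rules could gate the agent before it proceeds. The flag names below are hypothetical labels for the five rules above:

```python
# Hypothetical trigger flags mirroring the escalation rules listed above.
ESCALATION_TRIGGERS = {
    "ambiguous_user_visible_change",
    "multiple_valid_product_decisions",
    "security_privacy_uncertain",
    "destructive_migration_possible",
    "ci_red_after_fix_budget",
}

def must_escalate(task_flags):
    """Return the escalation triggers present in a task's flags.

    An empty set means the agent may proceed autonomously; a non-empty set
    means it must stop and ask a human, listing the matched triggers.
    """
    return set(task_flags) & ESCALATION_TRIGGERS
```

Encoding the rules as data rather than prose means the policy file (`docs/ai-policy.md`) and the agent's runtime check can never silently diverge.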
## Metrics to Track
- Autonomous task success rate.
- Reopen rate of AI-completed tasks.
- Regression rate per release.
- Flaky test percentage.
- Mean time to diagnose and resolve failures.
## 30-Day Execution Plan
**Week 1:**
- Baseline scripts and deterministic environment.
- Restore lint/test commands to green status.
- Add structured logging and IDs.
**Week 2:**
- Spec template and mandatory spec policy.
- Contract tests for core backend routes.
- First diagnostics bundle version.
**Week 3:**
- AI policy and bounded autonomous edit/run loop.
- Regression-test-first bugfix workflow.
- CI artifact enrichment and runbook mapping.
**Week 4:**
- Pilot autonomous feature tasks in low-risk areas.
- Measure success/failure patterns.
- Expand scope only if quality gates hold.
## Notes for TalkEdit
- Keep router files thin and service logic isolated to improve AI edit precision.
- Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
- Treat export/transcription pipeline changes as high-risk and always require regression tests.
- Keep Linux WebKit startup and media URL consistency as explicit regression targets.