ai tools finished
This commit is contained in:
356
AI_dev_plan.md
356
AI_dev_plan.md
@ -1,327 +1,129 @@
|
||||
# AI Dev Roadmap
|
||||
# AI Dev Plan (Must-Haves Only)
|
||||
|
||||
## Purpose
|
||||
|
||||
This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
|
||||
This is the minimum implementation needed for AI to reliably build, test, and debug TalkEdit with high confidence.
|
||||
|
||||
Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.
|
||||
Target: reliable 80-90% autonomous implementation/debugging on scoped tasks.
|
||||
|
||||
## Scope
|
||||
## Must-Have Pillars
|
||||
|
||||
- Frontend: React + TypeScript + Vite
|
||||
- Desktop host: Tauri
|
||||
- Backend: FastAPI + Python services
|
||||
- Media pipeline: FFmpeg, transcription, audio processing
|
||||
## 1. Single Validation Command
|
||||
|
||||
## Autonomy Target
|
||||
Required:
|
||||
|
||||
- Near-term target: 80-90% autonomous execution for well-scoped work.
|
||||
- Mid-term target: 90-95% for low/medium-risk features with CI gates.
|
||||
- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
|
||||
1. One command that runs lint, build, backend tests, and smoke checks.
|
||||
2. Works locally and in CI.
|
||||
|
||||
## Core Principles
|
||||
Current status:
|
||||
|
||||
1. Specs are executable and machine-readable.
|
||||
2. Tests are the primary source of truth for completion.
|
||||
3. Every failure is diagnosable from logs/artifacts.
|
||||
4. AI has bounded permissions and policy guardrails.
|
||||
5. AI updates docs and memory as part of done criteria.
|
||||
1. Implemented via scripts/validate-all.sh.
|
||||
2. Enforced in CI via .github/workflows/validate-all.yml.
|
||||
|
||||
## Execution Status (2026-04-15)
|
||||
## 2. CI Quality Gate
|
||||
|
||||
### Completed
|
||||
Required:
|
||||
|
||||
1. Added roadmap companion docs:
|
||||
- `docs/spec-template.md`
|
||||
- `docs/ai-policy.md`
|
||||
- `docs/runbooks/error-codes.md`
|
||||
2. Added operational scripts:
|
||||
- `scripts/validate-all.sh`
|
||||
- `scripts/collect-diagnostics.sh`
|
||||
3. Ran Step 1 validation script (`./scripts/validate-all.sh`).
|
||||
4. Ran Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
|
||||
5. Captured diagnostics archive:
|
||||
- `.diagnostics/diag_20260415_163239.tar.gz`
|
||||
6. Renamed roadmap file to `AI_dev_plan.md`.
|
||||
1. Pull requests fail if validation fails.
|
||||
2. Failures produce diagnostics artifacts.
|
||||
|
||||
### Current Blockers
|
||||
Current status:
|
||||
|
||||
1. Frontend lint baseline is not green yet.
|
||||
2. Remaining lint issues are mostly pre-existing unused vars and hook dependency warnings across app components.
|
||||
1. Implemented in .github/workflows/validate-all.yml.
|
||||
2. Diagnostics collected by scripts/collect-diagnostics.sh on failure.
|
||||
|
||||
### Next Actions
|
||||
## 3. Spec Requirement for Feature Changes
|
||||
|
||||
1. Triage existing lint findings into:
|
||||
- safe autofix
|
||||
- manual low-risk cleanup
|
||||
- intentional warnings to suppress with justification
|
||||
2. Reach green `./scripts/validate-all.sh` in local dev.
|
||||
3. Add CI workflow to enforce `validate-all` on pull requests.
|
||||
Required:
|
||||
|
||||
## Roadmap Phases
|
||||
1. Feature code changes must include a spec file update.
|
||||
2. Spec format must be standardized.
|
||||
|
||||
## Phase 0: Foundation (1-2 weeks)
|
||||
Current status:
|
||||
|
||||
### Deliverables
|
||||
1. Implemented via scripts/check-feature-spec.sh.
|
||||
2. Spec template exists at docs/spec-template.md.
|
||||
3. Specs folder guidance exists at docs/specs/README.md.
|
||||
|
||||
1. Deterministic dev and test environment.
|
||||
2. Baseline lint/type/test commands working in CI and local.
|
||||
3. Standardized log format across frontend, backend, and Tauri host.
|
||||
## 4. Backend Contract Test Coverage
|
||||
|
||||
### Tasks
|
||||
|
||||
1. Stabilize toolchain commands:
|
||||
- frontend lint/typecheck/test
|
||||
- backend lint/typecheck/test
|
||||
- workspace e2e smoke command
|
||||
2. Add a single script for local validation, for example `npm run validate:all`.
|
||||
3. Introduce structured logging fields:
|
||||
- timestamp
|
||||
- request/job id
|
||||
- subsystem (frontend/backend/host)
|
||||
- error code
|
||||
4. Add reproducible media fixtures for tests under a dedicated test-fixtures path.
|
||||
Required:
|
||||
|
||||
### Exit Criteria
|
||||
1. Router-level contract tests for success and error paths.
|
||||
2. Tests are deterministic and mock heavy services.
|
||||
|
||||
- Fresh clone can run validation with one command.
|
||||
- CI produces deterministic pass/fail on clean branches.
|
||||
- Failures include enough context to reproduce without manual guessing.
|
||||
Current status:
|
||||
|
||||
## Phase 1: Spec + Test Contracts (2-4 weeks)
|
||||
1. Implemented in backend/tests/test_router_contracts.py.
|
||||
2. Cache utility baseline tests implemented in backend/tests/test_cache_utils.py.
|
||||
|
||||
### Deliverables
|
||||
## 5. Error-Tolerant Router Contracts
|
||||
|
||||
1. Feature spec template used for all new work.
|
||||
2. API and schema contracts versioned and validated.
|
||||
3. Regression harness for previous bugs.
|
||||
Required:
|
||||
|
||||
### Tasks
|
||||
|
||||
1. Create `docs/spec-template.md` with required sections:
|
||||
- user story
|
||||
- acceptance criteria
|
||||
- non-goals
|
||||
- edge cases
|
||||
- rollback behavior
|
||||
2. Add contract tests for backend routers:
|
||||
- transcribe
|
||||
- export
|
||||
- captions
|
||||
- audio
|
||||
3. Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
|
||||
4. For each resolved bug, add a regression test before closing issue.
|
||||
1. Expected client errors must remain 4xx.
|
||||
2. Server failures must return 5xx with useful detail.
|
||||
|
||||
### Exit Criteria
|
||||
Current status:
|
||||
|
||||
- New feature PRs must include spec and tests.
|
||||
- Breaking contract changes are detected automatically in CI.
|
||||
1. Implemented for captions/export HTTPException passthrough.
|
||||
2. Covered by contract tests.
|
||||
|
||||
## Phase 2: Observability and Self-Debugging (2-3 weeks)
|
||||
## 6. Basic Autonomy Policy
|
||||
|
||||
### Deliverables
|
||||
Required:
|
||||
|
||||
1. Unified diagnostics bundle command.
|
||||
2. AI-readable failure artifacts from CI and local runs.
|
||||
3. Error taxonomy and runbook mapping.
|
||||
1. Clear autonomous scope and escalation rules.
|
||||
2. Clear restrictions for high-risk changes.
|
||||
|
||||
### Tasks
|
||||
Current status:
|
||||
|
||||
1. Implement diagnostics command to collect:
|
||||
- frontend logs
|
||||
- backend logs
|
||||
- Tauri logs
|
||||
- failing test outputs
|
||||
- environment metadata
|
||||
2. Define error codes for common classes:
|
||||
- media decode
|
||||
- FFmpeg pipeline
|
||||
- transcription model
|
||||
- project load/save
|
||||
- network/IPC bridge
|
||||
3. Add runbook table mapping error codes to probable causes and first fixes.
|
||||
1. Implemented in docs/ai-policy.md.
|
||||
|
||||
### Exit Criteria
|
||||
## Must-Have Remaining Work
|
||||
|
||||
- Agent can identify likely root cause from artifacts without asking for manual logs.
|
||||
- 80%+ of recurring failures map to known error classes.
|
||||
No remaining must-have items.
|
||||
|
||||
## Phase 3: Controlled Autonomous Implementation (3-5 weeks)
|
||||
Completed in this pass:
|
||||
|
||||
### Deliverables
|
||||
1. Added lightweight frontend tests and integrated them into scripts/validate-all.sh.
|
||||
2. Added pull request template with required spec link and acceptance criteria checklist.
|
||||
3. Added endpoint-level contract assertions for /file range requests and /audio/waveform cache-hit/cache-miss behavior.
|
||||
4. Confirmed scripts/validate-all.sh passes end-to-end with frontend tests + expanded backend contracts.
|
||||
|
||||
1. Policy file defining what AI can edit/run without approval.
|
||||
2. Autonomous task loop for implement -> validate -> fix -> revalidate.
|
||||
3. Automatic PR summary with risk and assumptions.
|
||||
## Out of Scope for Must-Have Baseline
|
||||
|
||||
### Tasks
|
||||
Useful later, but not required for strong day-to-day autonomous implementation:
|
||||
|
||||
1. Add policy file (for example `docs/ai-policy.md`):
|
||||
- allowed directories for autonomous edits
|
||||
- blocked files requiring approval
|
||||
- blocked commands
|
||||
2. Add task template for AI execution:
|
||||
- parse feature spec
|
||||
- locate impacted modules
|
||||
- implement smallest changes
|
||||
- run validation suite
|
||||
- retry up to N fix cycles
|
||||
- produce summary + residual risks
|
||||
3. Require AI to update:
|
||||
- copilot instructions
|
||||
- changelog/roadmap note
|
||||
- regression tests when bugfixing
|
||||
1. Full quality dashboards.
|
||||
2. Advanced autonomy telemetry.
|
||||
3. Complete long-term governance expansion.
|
||||
4. High-autonomy optimization beyond 90% reliability target.
|
||||
|
||||
### Exit Criteria
|
||||
## Definition of Done (Must-Have Plan)
|
||||
|
||||
- Low-risk feature tasks complete end-to-end without human intervention.
|
||||
- CI gate pass rate for autonomous PRs remains above agreed threshold (for example 95%).
|
||||
Must-have plan is complete when all are true:
|
||||
|
||||
## Phase 4: High-Autonomy with Human Escalation (ongoing)
|
||||
1. scripts/validate-all.sh passes locally and in CI.
|
||||
2. Feature PRs without spec updates are blocked.
|
||||
3. Backend router contracts cover core success and error paths.
|
||||
4. Frontend has at least one stable test command integrated into validation.
|
||||
5. AI policy + diagnostics workflow are active.
|
||||
|
||||
### Deliverables
|
||||
## Current State Summary
|
||||
|
||||
1. Explicit escalation triggers for ambiguity and risk.
|
||||
2. Broader autonomous scope with mandatory gates.
|
||||
3. Drift monitoring for quality, velocity, and regressions.
|
||||
Completed:
|
||||
|
||||
### Tasks
|
||||
1. Validation and CI enforcement.
|
||||
2. Diagnostics capture.
|
||||
3. Spec policy and templates.
|
||||
4. Backend contract test foundation (including AI endpoints).
|
||||
5. Core router error-path correctness.
|
||||
6. Autonomy policy baseline.
|
||||
7. Frontend test command integrated into validation.
|
||||
8. PR template requirement added.
|
||||
9. /file and /audio/waveform contract assertions implemented.
|
||||
|
||||
1. Define escalation triggers:
|
||||
- user-visible behavior changes without clear spec
|
||||
- API/schema breakage
|
||||
- security-sensitive modifications
|
||||
- destructive migrations
|
||||
2. Add quality dashboards:
|
||||
- flaky tests
|
||||
- escaped defects
|
||||
- mean time to recovery
|
||||
- autonomous task success rate
|
||||
3. Monthly calibration:
|
||||
- adjust autonomy scope
|
||||
- update policies
|
||||
- prune stale runbooks and memories
|
||||
Remaining:
|
||||
|
||||
### Exit Criteria
|
||||
|
||||
- Autonomous throughput increases while defect rate stays stable or improves.
|
||||
- Human review focuses on strategy and product decisions, not routine implementation/debugging.
|
||||
|
||||
## Required Engineering Systems
|
||||
|
||||
## 1. Spec System
|
||||
|
||||
Minimum implementation:
|
||||
|
||||
1. `docs/spec-template.md`
|
||||
2. `docs/specs/` folder with one file per feature
|
||||
3. CI check that new feature PRs include a spec reference
|
||||
|
||||
## 2. Test System
|
||||
|
||||
Minimum implementation:
|
||||
|
||||
1. Frontend unit tests for stores/components/hook logic.
|
||||
2. Backend unit+integration tests for routers/services.
|
||||
3. E2E smoke tests for core workflow:
|
||||
- open media
|
||||
- transcribe
|
||||
- edit zones
|
||||
- export
|
||||
4. Regression tests required for every bugfix.
|
||||
|
||||
## 3. Environment System
|
||||
|
||||
Minimum implementation:
|
||||
|
||||
1. Locked dependencies and pinned runtimes.
|
||||
2. Single bootstrap script.
|
||||
3. Fixture media files for deterministic test runs.
|
||||
|
||||
## 4. Observability System
|
||||
|
||||
Minimum implementation:
|
||||
|
||||
1. Structured logs.
|
||||
2. Standard error codes.
|
||||
3. Diagnostics bundle command.
|
||||
4. CI artifact retention for failed runs.
|
||||
|
||||
## 5. Governance System
|
||||
|
||||
Minimum implementation:
|
||||
|
||||
1. Protected branch + required checks.
|
||||
2. Secret and dependency scanning.
|
||||
3. Policy-based approval requirements for high-risk changes.
|
||||
|
||||
## Suggested Repository Additions
|
||||
|
||||
1. `AI_dev_plan.md` (this file)
|
||||
2. `docs/spec-template.md`
|
||||
3. `docs/ai-policy.md`
|
||||
4. `docs/runbooks/error-codes.md`
|
||||
5. `docs/runbooks/debug-playbooks.md`
|
||||
6. `scripts/validate-all.sh`
|
||||
7. `scripts/collect-diagnostics.sh`
|
||||
|
||||
## Definition of Done for Autonomous Tasks
|
||||
|
||||
A task is complete only if all items pass:
|
||||
|
||||
1. Feature spec acceptance criteria satisfied.
|
||||
2. Relevant tests added/updated and passing.
|
||||
3. No lint/type errors in changed scope.
|
||||
4. Docs and instructions updated if behavior changed.
|
||||
5. Risk summary and assumptions recorded.
|
||||
|
||||
## Escalation Rules (Must Ask Human)
|
||||
|
||||
AI must stop and ask when:
|
||||
|
||||
1. Requirement ambiguity changes user-visible behavior.
|
||||
2. Multiple valid product decisions exist without clear preference.
|
||||
3. Security/privacy/compliance implications are uncertain.
|
||||
4. Data loss or destructive migration is possible.
|
||||
5. CI remains failing after bounded auto-fix attempts.
|
||||
|
||||
## Metrics to Track
|
||||
|
||||
1. Autonomous task success rate.
|
||||
2. Reopen rate of AI-completed tasks.
|
||||
3. Regression rate per release.
|
||||
4. Flaky test percentage.
|
||||
5. Mean time to diagnose and resolve failures.
|
||||
|
||||
## 30-Day Execution Plan
|
||||
|
||||
Week 1:
|
||||
|
||||
1. Baseline scripts and deterministic environment.
|
||||
2. Restore lint/test commands to green status.
|
||||
3. Add structured logging and IDs.
|
||||
|
||||
Week 2:
|
||||
|
||||
1. Spec template and mandatory spec policy.
|
||||
2. Contract tests for core backend routes.
|
||||
3. First diagnostics bundle version.
|
||||
|
||||
Week 3:
|
||||
|
||||
1. AI policy and bounded autonomous edit/run loop.
|
||||
2. Regression-test-first bugfix workflow.
|
||||
3. CI artifact enrichment and runbook mapping.
|
||||
|
||||
Week 4:
|
||||
|
||||
1. Pilot autonomous feature tasks in low-risk areas.
|
||||
2. Measure success/failure patterns.
|
||||
3. Expand scope only if quality gates hold.
|
||||
|
||||
## Notes for TalkEdit
|
||||
|
||||
1. Keep router files thin and service logic isolated to improve AI edit precision.
|
||||
2. Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
|
||||
3. Treat export/transcription pipeline changes as high-risk and always require regression tests.
|
||||
4. Keep Linux WebKit startup and media URL consistency as explicit regression targets.
|
||||
1. No must-have items remaining.
|
||||
|
||||
Reference in New Issue
Block a user