ai tools finished

This commit is contained in:
2026-04-15 17:13:56 -06:00
parent d11e26cf2d
commit 024b9bd806
17 changed files with 566 additions and 328 deletions


@@ -1,327 +1,129 @@
-# AI Dev Roadmap
+# AI Dev Plan (Must-Haves Only)
 ## Purpose
-This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
-Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.
+This is the minimum implementation needed for AI to reliably build, test, and debug TalkEdit with high confidence.
+Target: reliable 80-90% autonomous implementation/debugging on scoped tasks.
-## Scope
-- Frontend: React + TypeScript + Vite
-- Desktop host: Tauri
-- Backend: FastAPI + Python services
-- Media pipeline: FFmpeg, transcription, audio processing
+## Must-Have Pillars
-## Autonomy Target
-- Near-term target: 80-90% autonomous execution for well-scoped work.
-- Mid-term target: 90-95% for low/medium-risk features with CI gates.
-- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
-## Core Principles
-1. Specs are executable and machine-readable.
-2. Tests are the primary source of truth for completion.
-3. Every failure is diagnosable from logs/artifacts.
-4. AI has bounded permissions and policy guardrails.
-5. AI updates docs and memory as part of done criteria.
+## 1. Single Validation Command
+Required:
+1. One command that runs lint, build, backend tests, and smoke checks.
+2. Works locally and in CI.
+Current status:
+1. Implemented via scripts/validate-all.sh.
+2. Enforced in CI via .github/workflows/validate-all.yml.
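A minimal sketch of the single validation gate, in Python for illustration only: the real entry point is `scripts/validate-all.sh`, a shell script this diff does not show, and the command names below are assumptions.

```python
import subprocess

# Illustrative check list; the actual commands live in scripts/validate-all.sh.
CHECKS = [
    ("frontend lint", ["npm", "run", "lint"]),
    ("frontend build", ["npm", "run", "build"]),
    ("backend tests", ["pytest", "backend/tests", "-q"]),
]

def run_checks(checks, runner=subprocess.call):
    """Run every check even after a failure, so one pass reports all problems."""
    return {name: runner(cmd) for name, cmd in checks}

def overall_exit_code(results):
    """Non-zero exit blocks the merge in CI."""
    return 1 if any(code != 0 for code in results.values()) else 0
```

Running checks to completion instead of stopping at the first failure keeps a single local run diagnostically useful.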
-## Execution Status (2026-04-15)
-### Completed
-1. Added roadmap companion docs:
-   - `docs/spec-template.md`
-   - `docs/ai-policy.md`
-   - `docs/runbooks/error-codes.md`
-2. Added operational scripts:
-   - `scripts/validate-all.sh`
-   - `scripts/collect-diagnostics.sh`
-3. Ran Step 1 validation script (`./scripts/validate-all.sh`).
-4. Ran Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
-5. Captured diagnostics archive:
-   - `.diagnostics/diag_20260415_163239.tar.gz`
-6. Renamed roadmap file to `AI_dev_plan.md`.
-### Current Blockers
-1. Frontend lint baseline is not green yet.
-2. Remaining lint issues are mostly pre-existing unused vars and hook dependency warnings across app components.
+## 2. CI Quality Gate
+Required:
+1. Pull requests fail if validation fails.
+2. Failures produce diagnostics artifacts.
+Current status:
+1. Implemented in .github/workflows/validate-all.yml.
+2. Diagnostics collected by scripts/collect-diagnostics.sh on failure.
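The diagnostics capture that runs on failure is a shell script (`scripts/collect-diagnostics.sh`, not shown in this diff). A rough Python equivalent of what such a bundler does, with hypothetical input paths:

```python
import pathlib
import tarfile
import time

def collect_diagnostics(paths, out_dir=".diagnostics"):
    """Bundle whichever log/artifact paths exist into a timestamped
    tar.gz, e.g. .diagnostics/diag_20260415_163239.tar.gz."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive = out / f"diag_{time.strftime('%Y%m%d_%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(pathlib.Path, paths):
            if p.exists():          # missing artifacts are skipped, not fatal
                tar.add(p, arcname=p.name)
    return archive
```

Tolerating missing paths matters: a failed run should still produce a bundle from whatever artifacts do exist.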
-### Next Actions
-1. Triage existing lint findings into:
-   - safe autofix
-   - manual low-risk cleanup
-   - intentional warnings to suppress with justification
-2. Reach green `./scripts/validate-all.sh` in local dev.
-3. Add CI workflow to enforce `validate-all` on pull requests.
-## Roadmap Phases
-## Phase 0: Foundation (1-2 weeks)
-### Deliverables
-1. Deterministic dev and test environment.
-2. Baseline lint/type/test commands working in CI and local.
-3. Standardized log format across frontend, backend, and Tauri host.
+## 3. Spec Requirement for Feature Changes
+Required:
+1. Feature code changes must include a spec file update.
+2. Spec format must be standardized.
+Current status:
+1. Implemented via scripts/check-feature-spec.sh.
+2. Spec template exists at docs/spec-template.md.
+3. Specs folder guidance exists at docs/specs/README.md.
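The spec-presence gate (`scripts/check-feature-spec.sh`) is a shell check not shown here; its core rule could look like the following sketch, where the directory prefixes are hypothetical stand-ins for the repo's real layout:

```python
def spec_check(changed_files,
               feature_prefixes=("src/", "backend/app/"),
               spec_prefix="docs/specs/"):
    """Pass when a change touches no feature code, or when it also
    touches at least one spec file under docs/specs/."""
    touches_feature = any(f.startswith(feature_prefixes) for f in changed_files)
    updates_spec = any(f.startswith(spec_prefix) for f in changed_files)
    return (not touches_feature) or updates_spec
```

A docs-only or CI-only change passes unconditionally, which keeps the gate from blocking unrelated housekeeping PRs.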
-### Tasks
-1. Stabilize toolchain commands:
-   - frontend lint/typecheck/test
-   - backend lint/typecheck/test
-   - workspace e2e smoke command
-2. Add a single script for local validation, for example `npm run validate:all`.
-3. Introduce structured logging fields:
-   - timestamp
-   - request/job id
-   - subsystem (frontend/backend/host)
-   - error code
-4. Add reproducible media fixtures for tests under a dedicated test-fixtures path.
-### Exit Criteria
-- Fresh clone can run validation with one command.
-- CI produces deterministic pass/fail on clean branches.
-- Failures include enough context to reproduce without manual guessing.
-## Phase 1: Spec + Test Contracts (2-4 weeks)
+## 4. Backend Contract Test Coverage
+Required:
+1. Router-level contract tests for success and error paths.
+2. Tests are deterministic and mock heavy services.
+Current status:
+1. Implemented in backend/tests/test_router_contracts.py.
+2. Cache utility baseline tests implemented in backend/tests/test_cache_utils.py.
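The structured logging fields listed above (timestamp, request/job id, subsystem, error code) can be emitted as one JSON line per event. A sketch, not the project's actual logger; field names are assumptions:

```python
import json
import time
import uuid

def log_event(subsystem, message, error_code=None, request_id=None, **extra):
    """Build and print one JSON log line carrying the shared fields:
    timestamp, request/job id, subsystem, and error code."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "request_id": request_id or str(uuid.uuid4()),
        "subsystem": subsystem,       # frontend / backend / host
        "error_code": error_code,     # e.g. a media-decode or FFmpeg class
        "message": message,
        **extra,
    }
    print(json.dumps(record, sort_keys=True))
    return record
```

Keeping the same field names across frontend, backend, and the Tauri host is what lets a diagnostics bundle be correlated by request id.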
-### Deliverables
-1. Feature spec template used for all new work.
-2. API and schema contracts versioned and validated.
-3. Regression harness for previous bugs.
-### Tasks
-1. Create `docs/spec-template.md` with required sections:
-   - user story
-   - acceptance criteria
-   - non-goals
-   - edge cases
-   - rollback behavior
-2. Add contract tests for backend routers:
-   - transcribe
-   - export
-   - captions
-   - audio
-3. Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
-4. For each resolved bug, add a regression test before closing the issue.
-### Exit Criteria
-- New feature PRs must include spec and tests.
-- Breaking contract changes are detected automatically in CI.
+## 5. Error-Tolerant Router Contracts
+Required:
+1. Expected client errors must remain 4xx.
+2. Server failures must return 5xx with useful detail.
+Current status:
+1. Implemented for captions/export HTTPException passthrough.
+2. Covered by contract tests.
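The 4xx-passthrough / 5xx-wrapping contract above can be shown with a small stand-in for FastAPI's `HTTPException` (a sketch only; the real behavior lives in the captions/export routers, which this diff does not show):

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException."""
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def call_endpoint(endpoint, *args):
    """Expected client errors pass through unchanged; any other
    exception becomes a 500 carrying enough detail to debug."""
    try:
        return 200, endpoint(*args)
    except HTTPException as exc:      # deliberate 4xx: keep it as-is
        return exc.status_code, {"detail": exc.detail}
    except Exception as exc:          # unexpected failure: surface as 5xx
        return 500, {"detail": f"{type(exc).__name__}: {exc}"}
```

The contract tests then assert exactly two things: a deliberate `HTTPException(404)` stays a 404, and an unexpected exception never masquerades as a client error.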
-## Phase 2: Observability and Self-Debugging (2-3 weeks)
-### Deliverables
-1. Unified diagnostics bundle command.
-2. AI-readable failure artifacts from CI and local runs.
-3. Error taxonomy and runbook mapping.
-### Tasks
-1. Implement diagnostics command to collect:
-   - frontend logs
-   - backend logs
-   - Tauri logs
-   - failing test outputs
-   - environment metadata
-2. Define error codes for common classes:
-   - media decode
-   - FFmpeg pipeline
-   - transcription model
-   - project load/save
-   - network/IPC bridge
-3. Add runbook table mapping error codes to probable causes and first fixes.
+## 6. Basic Autonomy Policy
+Required:
+1. Clear autonomous scope and escalation rules.
+2. Clear restrictions for high-risk changes.
+Current status:
+1. Implemented in docs/ai-policy.md.
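One way the autonomy policy could be enforced mechanically. The actual rules live in `docs/ai-policy.md`; every path and pattern below is a hypothetical example, not the real policy:

```python
import fnmatch

POLICY = {
    # Hypothetical values for illustration only.
    "allowed_dirs": ("src/", "backend/app/", "docs/"),
    "blocked": ("shared/project-schema.json", ".github/workflows/*"),
}

def may_edit_autonomously(path, policy=POLICY):
    """Blocked patterns always require human approval; otherwise the
    path must fall inside an explicitly allowed directory."""
    if any(fnmatch.fnmatch(path, pat) for pat in policy["blocked"]):
        return False
    return path.startswith(policy["allowed_dirs"])
```

A deny-then-allow ordering is the safer default: a file that is both inside an allowed directory and matched by a blocked pattern stays blocked.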
-### Exit Criteria
-- Agent can identify likely root cause from artifacts without asking for manual logs.
-- 80%+ of recurring failures map to known error classes.
-## Phase 3: Controlled Autonomous Implementation (3-5 weeks)
-### Deliverables
-1. Policy file defining what AI can edit/run without approval.
-2. Autonomous task loop for implement -> validate -> fix -> revalidate.
-3. Automatic PR summary with risk and assumptions.
+## Must-Have Remaining Work
+No remaining must-have items.
+Completed in this pass:
+1. Added lightweight frontend tests and integrated them into scripts/validate-all.sh.
+2. Added pull request template with required spec link and acceptance criteria checklist.
+3. Added endpoint-level contract assertions for /file range requests and /audio/waveform cache-hit/cache-miss behavior.
+4. Confirmed scripts/validate-all.sh passes end-to-end with frontend tests + expanded backend contracts.
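The contract assertions for /file range requests mentioned above hinge on parsing the HTTP `Range` header. A self-contained sketch of that parsing step (the real endpoint's implementation is not shown in this diff):

```python
def parse_range(header, size):
    """Parse a single 'bytes=start-end' Range header against a known
    file size; return an inclusive (start, end) pair, or None when the
    header is absent/invalid (serve the whole file with 200, not 206)."""
    if not header or not header.startswith("bytes="):
        return None
    start_s, _, end_s = header[len("bytes="):].partition("-")
    try:
        if start_s == "":                      # suffix form: last N bytes
            n = int(end_s)
            return (max(size - n, 0), size - 1) if n > 0 else None
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)
```

The suffix form (`bytes=-100`) and the open-ended form (`bytes=900-`) are the two cases media players actually send while seeking, so contract tests should cover both.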
-### Tasks
-1. Add policy file (for example `docs/ai-policy.md`):
-   - allowed directories for autonomous edits
-   - blocked files requiring approval
-   - blocked commands
-2. Add task template for AI execution:
-   - parse feature spec
-   - locate impacted modules
-   - implement smallest changes
-   - run validation suite
-   - retry up to N fix cycles
-   - produce summary + residual risks
-3. Require AI to update:
-   - copilot instructions
-   - changelog/roadmap note
-   - regression tests when bugfixing
+## Out of Scope for Must-Have Baseline
+Useful later, but not required for strong day-to-day autonomous implementation:
+1. Full quality dashboards.
+2. Advanced autonomy telemetry.
+3. Complete long-term governance expansion.
+4. High-autonomy optimization beyond the 90% reliability target.
-### Exit Criteria
-- Low-risk feature tasks complete end-to-end without human intervention.
-- CI gate pass rate for autonomous PRs remains above an agreed threshold (for example 95%).
-## Phase 4: High-Autonomy with Human Escalation (ongoing)
+## Definition of Done (Must-Have Plan)
+The must-have plan is complete when all of the following are true:
+1. scripts/validate-all.sh passes locally and in CI.
+2. Feature PRs without spec updates are blocked.
+3. Backend router contracts cover core success and error paths.
+4. Frontend has at least one stable test command integrated into validation.
+5. AI policy + diagnostics workflow are active.
-### Deliverables
-1. Explicit escalation triggers for ambiguity and risk.
-2. Broader autonomous scope with mandatory gates.
-3. Drift monitoring for quality, velocity, and regressions.
-### Tasks
-1. Define escalation triggers:
-   - user-visible behavior changes without clear spec
-   - API/schema breakage
-   - security-sensitive modifications
-   - destructive migrations
-2. Add quality dashboards:
-   - flaky tests
-   - escaped defects
-   - mean time to recovery
-   - autonomous task success rate
-3. Monthly calibration:
-   - adjust autonomy scope
-   - update policies
-   - prune stale runbooks and memories
-### Exit Criteria
-- Autonomous throughput increases while defect rate stays stable or improves.
-- Human review focuses on strategy and product decisions, not routine implementation/debugging.
+## Current State Summary
+Completed:
+1. Validation and CI enforcement.
+2. Diagnostics capture.
+3. Spec policy and templates.
+4. Backend contract test foundation (including AI endpoints).
+5. Core router error-path correctness.
+6. Autonomy policy baseline.
+7. Frontend test command integrated into validation.
+8. PR template requirement added.
+9. /file and /audio/waveform contract assertions implemented.
+Remaining:
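The /audio/waveform cache-hit/cache-miss behavior noted above can be illustrated with a minimal cache keyed on path and modification time. This is a sketch, not the project's real cache utilities (those are exercised by backend/tests/test_cache_utils.py):

```python
class WaveformCache:
    """Minimal cache keyed by (path, mtime): re-exporting or re-editing a
    file changes its mtime, so stale waveforms are recomputed, not served."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, path, mtime, compute):
        key = (path, mtime)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]
```

Exposing hit/miss counters is what makes the contract testable: assert one miss on first request, one hit on the second, and a fresh miss after the file changes.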
-## Required Engineering Systems
-## 1. Spec System
-Minimum implementation:
-1. `docs/spec-template.md`
-2. `docs/specs/` folder with one file per feature
-3. CI check that new feature PRs include a spec reference
-## 2. Test System
-Minimum implementation:
-1. Frontend unit tests for stores/components/hook logic.
-2. Backend unit+integration tests for routers/services.
-3. E2E smoke tests for core workflow:
-   - open media
-   - transcribe
-   - edit zones
-   - export
-4. Regression tests required for every bugfix.
-## 3. Environment System
-Minimum implementation:
-1. Locked dependencies and pinned runtimes.
-2. Single bootstrap script.
-3. Fixture media files for deterministic test runs.
-## 4. Observability System
-Minimum implementation:
-1. Structured logs.
-2. Standard error codes.
-3. Diagnostics bundle command.
-4. CI artifact retention for failed runs.
-## 5. Governance System
-Minimum implementation:
-1. Protected branch + required checks.
-2. Secret and dependency scanning.
-3. Policy-based approval requirements for high-risk changes.
-## Suggested Repository Additions
-1. `AI_dev_plan.md` (this file)
-2. `docs/spec-template.md`
-3. `docs/ai-policy.md`
-4. `docs/runbooks/error-codes.md`
-5. `docs/runbooks/debug-playbooks.md`
-6. `scripts/validate-all.sh`
-7. `scripts/collect-diagnostics.sh`
-## Definition of Done for Autonomous Tasks
-A task is complete only if all items pass:
-1. Feature spec acceptance criteria satisfied.
-2. Relevant tests added/updated and passing.
-3. No lint/type errors in changed scope.
-4. Docs and instructions updated if behavior changed.
-5. Risk summary and assumptions recorded.
-## Escalation Rules (Must Ask Human)
-AI must stop and ask when:
-1. Requirement ambiguity changes user-visible behavior.
-2. Multiple valid product decisions exist without clear preference.
-3. Security/privacy/compliance implications are uncertain.
-4. Data loss or destructive migration is possible.
-5. CI remains failing after bounded auto-fix attempts.
-## Metrics to Track
-1. Autonomous task success rate.
-2. Reopen rate of AI-completed tasks.
-3. Regression rate per release.
-4. Flaky test percentage.
-5. Mean time to diagnose and resolve failures.
-## 30-Day Execution Plan
-Week 1:
-1. Baseline scripts and deterministic environment.
-2. Restore lint/test commands to green status.
-3. Add structured logging and IDs.
-Week 2:
-1. Spec template and mandatory spec policy.
-2. Contract tests for core backend routes.
-3. First diagnostics bundle version.
-Week 3:
-1. AI policy and bounded autonomous edit/run loop.
-2. Regression-test-first bugfix workflow.
-3. CI artifact enrichment and runbook mapping.
-Week 4:
-1. Pilot autonomous feature tasks in low-risk areas.
-2. Measure success/failure patterns.
-3. Expand scope only if quality gates hold.
-## Notes for TalkEdit
-1. Keep router files thin and service logic isolated to improve AI edit precision.
-2. Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
-3. Treat export/transcription pipeline changes as high-risk and always require regression tests.
-4. Keep Linux WebKit startup and media URL consistency as explicit regression targets.
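The "Metrics to Track" section above becomes mechanical once task outcomes are recorded somewhere. A sketch for the first two metrics, with assumed field names ('autonomous', 'reopened'):

```python
def autonomy_metrics(tasks):
    """tasks: iterable of dicts with boolean 'autonomous' (completed
    without human intervention) and 'reopened' fields."""
    tasks = list(tasks)
    total = len(tasks)
    auto = [t for t in tasks if t["autonomous"]]
    reopened = sum(1 for t in auto if t["reopened"])
    return {
        "autonomous_success_rate": len(auto) / total if total else 0.0,
        "reopen_rate": reopened / len(auto) if auto else 0.0,
    }
```

Note the reopen rate is measured only over autonomously completed tasks; mixing in human-completed work would hide AI-specific regressions.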
+1. No must-have items remaining.