ai tools finished

This commit is contained in:
2026-04-15 17:13:56 -06:00
parent d11e26cf2d
commit 024b9bd806
17 changed files with 566 additions and 328 deletions


@@ -1,327 +1,129 @@
-# AI Dev Roadmap
+# AI Dev Plan (Must-Haves Only)
 ## Purpose
-This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
-Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.
+This is the minimum implementation needed for AI to reliably build, test, and debug TalkEdit with high confidence.
+Target: reliable 80-90% autonomous implementation/debugging on scoped tasks.
-## Scope
-- Frontend: React + TypeScript + Vite
-- Desktop host: Tauri
-- Backend: FastAPI + Python services
-- Media pipeline: FFmpeg, transcription, audio processing
+## Must-Have Pillars
-## Autonomy Target
-- Near-term target: 80-90% autonomous execution for well-scoped work.
-- Mid-term target: 90-95% for low/medium-risk features with CI gates.
-- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
-## Core Principles
-1. Specs are executable and machine-readable.
-2. Tests are the primary source of truth for completion.
-3. Every failure is diagnosable from logs/artifacts.
-4. AI has bounded permissions and policy guardrails.
-5. AI updates docs and memory as part of done criteria.
+## 1. Single Validation Command
+Required:
+1. One command that runs lint, build, backend tests, and smoke checks.
+2. Works locally and in CI.
+Current status:
+1. Implemented via scripts/validate-all.sh.
+2. Enforced in CI via .github/workflows/validate-all.yml.
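A minimal sketch of the single validation gate, in Python for illustration only: the real entry point is `scripts/validate-all.sh`, a shell script this diff does not show, and the command names below are assumptions.

```python
import subprocess

# Illustrative check list; the actual commands live in scripts/validate-all.sh.
CHECKS = [
    ("frontend lint", ["npm", "run", "lint"]),
    ("frontend build", ["npm", "run", "build"]),
    ("backend tests", ["pytest", "backend/tests", "-q"]),
]

def run_checks(checks, runner=subprocess.call):
    """Run every check even after a failure, so one pass reports all problems."""
    return {name: runner(cmd) for name, cmd in checks}

def overall_exit_code(results):
    """Non-zero exit blocks the merge in CI."""
    return 1 if any(code != 0 for code in results.values()) else 0
```

Running checks to completion instead of stopping at the first failure keeps a single local run diagnostically useful.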
-## Execution Status (2026-04-15)
-### Completed
-1. Added roadmap companion docs:
-   - `docs/spec-template.md`
-   - `docs/ai-policy.md`
-   - `docs/runbooks/error-codes.md`
-2. Added operational scripts:
-   - `scripts/validate-all.sh`
-   - `scripts/collect-diagnostics.sh`
-3. Ran Step 1 validation script (`./scripts/validate-all.sh`).
-4. Ran Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
-5. Captured diagnostics archive:
-   - `.diagnostics/diag_20260415_163239.tar.gz`
-6. Renamed roadmap file to `AI_dev_plan.md`.
-### Current Blockers
-1. Frontend lint baseline is not green yet.
-2. Remaining lint issues are mostly pre-existing unused vars and hook dependency warnings across app components.
+## 2. CI Quality Gate
+Required:
+1. Pull requests fail if validation fails.
+2. Failures produce diagnostics artifacts.
+Current status:
+1. Implemented in .github/workflows/validate-all.yml.
+2. Diagnostics collected by scripts/collect-diagnostics.sh on failure.
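The diagnostics capture that runs on failure is a shell script (`scripts/collect-diagnostics.sh`, not shown in this diff). A rough Python equivalent of what such a bundler does, with hypothetical input paths:

```python
import pathlib
import tarfile
import time

def collect_diagnostics(paths, out_dir=".diagnostics"):
    """Bundle whichever log/artifact paths exist into a timestamped
    tar.gz, e.g. .diagnostics/diag_20260415_163239.tar.gz."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archive = out / f"diag_{time.strftime('%Y%m%d_%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for p in map(pathlib.Path, paths):
            if p.exists():          # missing artifacts are skipped, not fatal
                tar.add(p, arcname=p.name)
    return archive
```

Tolerating missing paths matters: a failed run should still produce a bundle from whatever artifacts do exist.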
-### Next Actions
-1. Triage existing lint findings into:
-   - safe autofix
-   - manual low-risk cleanup
-   - intentional warnings to suppress with justification
-2. Reach green `./scripts/validate-all.sh` in local dev.
-3. Add CI workflow to enforce `validate-all` on pull requests.
-## Roadmap Phases
-## Phase 0: Foundation (1-2 weeks)
-### Deliverables
-1. Deterministic dev and test environment.
-2. Baseline lint/type/test commands working in CI and local.
-3. Standardized log format across frontend, backend, and Tauri host.
+## 3. Spec Requirement for Feature Changes
+Required:
+1. Feature code changes must include a spec file update.
+2. Spec format must be standardized.
+Current status:
+1. Implemented via scripts/check-feature-spec.sh.
+2. Spec template exists at docs/spec-template.md.
+3. Specs folder guidance exists at docs/specs/README.md.
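The spec-presence gate (`scripts/check-feature-spec.sh`) is a shell check not shown here; its core rule could look like the following sketch, where the directory prefixes are hypothetical stand-ins for the repo's real layout:

```python
def spec_check(changed_files,
               feature_prefixes=("src/", "backend/app/"),
               spec_prefix="docs/specs/"):
    """Pass when a change touches no feature code, or when it also
    touches at least one spec file under docs/specs/."""
    touches_feature = any(f.startswith(feature_prefixes) for f in changed_files)
    updates_spec = any(f.startswith(spec_prefix) for f in changed_files)
    return (not touches_feature) or updates_spec
```

A docs-only or CI-only change passes unconditionally, which keeps the gate from blocking unrelated housekeeping PRs.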
-### Tasks
-1. Stabilize toolchain commands:
-   - frontend lint/typecheck/test
-   - backend lint/typecheck/test
-   - workspace e2e smoke command
-2. Add a single script for local validation, for example `npm run validate:all`.
-3. Introduce structured logging fields:
-   - timestamp
-   - request/job id
-   - subsystem (frontend/backend/host)
-   - error code
-4. Add reproducible media fixtures for tests under a dedicated test-fixtures path.
-### Exit Criteria
-- Fresh clone can run validation with one command.
-- CI produces deterministic pass/fail on clean branches.
-- Failures include enough context to reproduce without manual guessing.
-## Phase 1: Spec + Test Contracts (2-4 weeks)
+## 4. Backend Contract Test Coverage
+Required:
+1. Router-level contract tests for success and error paths.
+2. Tests are deterministic and mock heavy services.
+Current status:
+1. Implemented in backend/tests/test_router_contracts.py.
+2. Cache utility baseline tests implemented in backend/tests/test_cache_utils.py.
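The structured logging fields listed above (timestamp, request/job id, subsystem, error code) can be emitted as one JSON line per event. A sketch, not the project's actual logger; field names are assumptions:

```python
import json
import time
import uuid

def log_event(subsystem, message, error_code=None, request_id=None, **extra):
    """Build and print one JSON log line carrying the shared fields:
    timestamp, request/job id, subsystem, and error code."""
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "request_id": request_id or str(uuid.uuid4()),
        "subsystem": subsystem,       # frontend / backend / host
        "error_code": error_code,     # e.g. a media-decode or FFmpeg class
        "message": message,
        **extra,
    }
    print(json.dumps(record, sort_keys=True))
    return record
```

Keeping the same field names across frontend, backend, and the Tauri host is what lets a diagnostics bundle be correlated by request id.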
-### Deliverables
-1. Feature spec template used for all new work.
-2. API and schema contracts versioned and validated.
-3. Regression harness for previous bugs.
-### Tasks
-1. Create `docs/spec-template.md` with required sections:
-   - user story
-   - acceptance criteria
-   - non-goals
-   - edge cases
-   - rollback behavior
-2. Add contract tests for backend routers:
-   - transcribe
-   - export
-   - captions
-   - audio
-3. Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
-4. For each resolved bug, add a regression test before closing the issue.
-### Exit Criteria
-- New feature PRs must include spec and tests.
-- Breaking contract changes are detected automatically in CI.
+## 5. Error-Tolerant Router Contracts
+Required:
+1. Expected client errors must remain 4xx.
+2. Server failures must return 5xx with useful detail.
+Current status:
+1. Implemented for captions/export HTTPException passthrough.
+2. Covered by contract tests.
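The 4xx-passthrough / 5xx-wrapping contract above can be shown with a small stand-in for FastAPI's `HTTPException` (a sketch only; the real behavior lives in the captions/export routers, which this diff does not show):

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException."""
    def __init__(self, status_code, detail):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

def call_endpoint(endpoint, *args):
    """Expected client errors pass through unchanged; any other
    exception becomes a 500 carrying enough detail to debug."""
    try:
        return 200, endpoint(*args)
    except HTTPException as exc:      # deliberate 4xx: keep it as-is
        return exc.status_code, {"detail": exc.detail}
    except Exception as exc:          # unexpected failure: surface as 5xx
        return 500, {"detail": f"{type(exc).__name__}: {exc}"}
```

The contract tests then assert exactly two things: a deliberate `HTTPException(404)` stays a 404, and an unexpected exception never masquerades as a client error.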
-## Phase 2: Observability and Self-Debugging (2-3 weeks)
-### Deliverables
-1. Unified diagnostics bundle command.
-2. AI-readable failure artifacts from CI and local runs.
-3. Error taxonomy and runbook mapping.
-### Tasks
-1. Implement diagnostics command to collect:
-   - frontend logs
-   - backend logs
-   - Tauri logs
-   - failing test outputs
-   - environment metadata
-2. Define error codes for common classes:
-   - media decode
-   - FFmpeg pipeline
-   - transcription model
-   - project load/save
-   - network/IPC bridge
-3. Add runbook table mapping error codes to probable causes and first fixes.
+## 6. Basic Autonomy Policy
+Required:
+1. Clear autonomous scope and escalation rules.
+2. Clear restrictions for high-risk changes.
+Current status:
+1. Implemented in docs/ai-policy.md.
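One way the autonomy policy could be enforced mechanically. The actual rules live in `docs/ai-policy.md`; every path and pattern below is a hypothetical example, not the real policy:

```python
import fnmatch

POLICY = {
    # Hypothetical values for illustration only.
    "allowed_dirs": ("src/", "backend/app/", "docs/"),
    "blocked": ("shared/project-schema.json", ".github/workflows/*"),
}

def may_edit_autonomously(path, policy=POLICY):
    """Blocked patterns always require human approval; otherwise the
    path must fall inside an explicitly allowed directory."""
    if any(fnmatch.fnmatch(path, pat) for pat in policy["blocked"]):
        return False
    return path.startswith(policy["allowed_dirs"])
```

A deny-then-allow ordering is the safer default: a file that is both inside an allowed directory and matched by a blocked pattern stays blocked.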
-### Exit Criteria
-- Agent can identify likely root cause from artifacts without asking for manual logs.
-- 80%+ of recurring failures map to known error classes.
-## Phase 3: Controlled Autonomous Implementation (3-5 weeks)
-### Deliverables
-1. Policy file defining what AI can edit/run without approval.
-2. Autonomous task loop for implement -> validate -> fix -> revalidate.
-3. Automatic PR summary with risk and assumptions.
+## Must-Have Remaining Work
+No remaining must-have items.
+Completed in this pass:
+1. Added lightweight frontend tests and integrated them into scripts/validate-all.sh.
+2. Added pull request template with required spec link and acceptance criteria checklist.
+3. Added endpoint-level contract assertions for /file range requests and /audio/waveform cache-hit/cache-miss behavior.
+4. Confirmed scripts/validate-all.sh passes end-to-end with frontend tests + expanded backend contracts.
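The contract assertions for /file range requests mentioned above hinge on parsing the HTTP `Range` header. A self-contained sketch of that parsing step (the real endpoint's implementation is not shown in this diff):

```python
def parse_range(header, size):
    """Parse a single 'bytes=start-end' Range header against a known
    file size; return an inclusive (start, end) pair, or None when the
    header is absent/invalid (serve the whole file with 200, not 206)."""
    if not header or not header.startswith("bytes="):
        return None
    start_s, _, end_s = header[len("bytes="):].partition("-")
    try:
        if start_s == "":                      # suffix form: last N bytes
            n = int(end_s)
            return (max(size - n, 0), size - 1) if n > 0 else None
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)
```

The suffix form (`bytes=-100`) and the open-ended form (`bytes=900-`) are the two cases media players actually send while seeking, so contract tests should cover both.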
-### Tasks
-1. Add policy file (for example `docs/ai-policy.md`):
-   - allowed directories for autonomous edits
-   - blocked files requiring approval
-   - blocked commands
-2. Add task template for AI execution:
-   - parse feature spec
-   - locate impacted modules
-   - implement smallest changes
-   - run validation suite
-   - retry up to N fix cycles
-   - produce summary + residual risks
-3. Require AI to update:
-   - copilot instructions
-   - changelog/roadmap note
-   - regression tests when bugfixing
+## Out of Scope for Must-Have Baseline
+Useful later, but not required for strong day-to-day autonomous implementation:
+1. Full quality dashboards.
+2. Advanced autonomy telemetry.
+3. Complete long-term governance expansion.
+4. High-autonomy optimization beyond the 90% reliability target.
-### Exit Criteria
-- Low-risk feature tasks complete end-to-end without human intervention.
-- CI gate pass rate for autonomous PRs remains above an agreed threshold (for example 95%).
-## Phase 4: High-Autonomy with Human Escalation (ongoing)
+## Definition of Done (Must-Have Plan)
+The must-have plan is complete when all of the following are true:
+1. scripts/validate-all.sh passes locally and in CI.
+2. Feature PRs without spec updates are blocked.
+3. Backend router contracts cover core success and error paths.
+4. Frontend has at least one stable test command integrated into validation.
+5. AI policy + diagnostics workflow are active.
-### Deliverables
-1. Explicit escalation triggers for ambiguity and risk.
-2. Broader autonomous scope with mandatory gates.
-3. Drift monitoring for quality, velocity, and regressions.
-### Tasks
-1. Define escalation triggers:
-   - user-visible behavior changes without clear spec
-   - API/schema breakage
-   - security-sensitive modifications
-   - destructive migrations
-2. Add quality dashboards:
-   - flaky tests
-   - escaped defects
-   - mean time to recovery
-   - autonomous task success rate
-3. Monthly calibration:
-   - adjust autonomy scope
-   - update policies
-   - prune stale runbooks and memories
-### Exit Criteria
-- Autonomous throughput increases while defect rate stays stable or improves.
-- Human review focuses on strategy and product decisions, not routine implementation/debugging.
+## Current State Summary
+Completed:
+1. Validation and CI enforcement.
+2. Diagnostics capture.
+3. Spec policy and templates.
+4. Backend contract test foundation (including AI endpoints).
+5. Core router error-path correctness.
+6. Autonomy policy baseline.
+7. Frontend test command integrated into validation.
+8. PR template requirement added.
+9. /file and /audio/waveform contract assertions implemented.
+Remaining:
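The /audio/waveform cache-hit/cache-miss behavior noted above can be illustrated with a minimal cache keyed on path and modification time. This is a sketch, not the project's real cache utilities (those are exercised by backend/tests/test_cache_utils.py):

```python
class WaveformCache:
    """Minimal cache keyed by (path, mtime): re-exporting or re-editing a
    file changes its mtime, so stale waveforms are recomputed, not served."""
    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, path, mtime, compute):
        key = (path, mtime)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = compute()
        return self._store[key]
```

Exposing hit/miss counters is what makes the contract testable: assert one miss on first request, one hit on the second, and a fresh miss after the file changes.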
-## Required Engineering Systems
-## 1. Spec System
-Minimum implementation:
-1. `docs/spec-template.md`
-2. `docs/specs/` folder with one file per feature
-3. CI check that new feature PRs include a spec reference
-## 2. Test System
-Minimum implementation:
-1. Frontend unit tests for stores/components/hook logic.
-2. Backend unit+integration tests for routers/services.
-3. E2E smoke tests for core workflow:
-   - open media
-   - transcribe
-   - edit zones
-   - export
-4. Regression tests required for every bugfix.
-## 3. Environment System
-Minimum implementation:
-1. Locked dependencies and pinned runtimes.
-2. Single bootstrap script.
-3. Fixture media files for deterministic test runs.
-## 4. Observability System
-Minimum implementation:
-1. Structured logs.
-2. Standard error codes.
-3. Diagnostics bundle command.
-4. CI artifact retention for failed runs.
-## 5. Governance System
-Minimum implementation:
-1. Protected branch + required checks.
-2. Secret and dependency scanning.
-3. Policy-based approval requirements for high-risk changes.
-## Suggested Repository Additions
-1. `AI_dev_plan.md` (this file)
-2. `docs/spec-template.md`
-3. `docs/ai-policy.md`
-4. `docs/runbooks/error-codes.md`
-5. `docs/runbooks/debug-playbooks.md`
-6. `scripts/validate-all.sh`
-7. `scripts/collect-diagnostics.sh`
-## Definition of Done for Autonomous Tasks
-A task is complete only if all items pass:
-1. Feature spec acceptance criteria satisfied.
-2. Relevant tests added/updated and passing.
-3. No lint/type errors in changed scope.
-4. Docs and instructions updated if behavior changed.
-5. Risk summary and assumptions recorded.
-## Escalation Rules (Must Ask Human)
-AI must stop and ask when:
-1. Requirement ambiguity changes user-visible behavior.
-2. Multiple valid product decisions exist without clear preference.
-3. Security/privacy/compliance implications are uncertain.
-4. Data loss or destructive migration is possible.
-5. CI remains failing after bounded auto-fix attempts.
-## Metrics to Track
-1. Autonomous task success rate.
-2. Reopen rate of AI-completed tasks.
-3. Regression rate per release.
-4. Flaky test percentage.
-5. Mean time to diagnose and resolve failures.
-## 30-Day Execution Plan
-Week 1:
-1. Baseline scripts and deterministic environment.
-2. Restore lint/test commands to green status.
-3. Add structured logging and IDs.
-Week 2:
-1. Spec template and mandatory spec policy.
-2. Contract tests for core backend routes.
-3. First diagnostics bundle version.
-Week 3:
-1. AI policy and bounded autonomous edit/run loop.
-2. Regression-test-first bugfix workflow.
-3. CI artifact enrichment and runbook mapping.
-Week 4:
-1. Pilot autonomous feature tasks in low-risk areas.
-2. Measure success/failure patterns.
-3. Expand scope only if quality gates hold.
-## Notes for TalkEdit
-1. Keep router files thin and service logic isolated to improve AI edit precision.
-2. Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
-3. Treat export/transcription pipeline changes as high-risk and always require regression tests.
-4. Keep Linux WebKit startup and media URL consistency as explicit regression targets.
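The "Metrics to Track" section above becomes mechanical once task outcomes are recorded somewhere. A sketch for the first two metrics, with assumed field names ('autonomous', 'reopened'):

```python
def autonomy_metrics(tasks):
    """tasks: iterable of dicts with boolean 'autonomous' (completed
    without human intervention) and 'reopened' fields."""
    tasks = list(tasks)
    total = len(tasks)
    auto = [t for t in tasks if t["autonomous"]]
    reopened = sum(1 for t in auto if t["reopened"])
    return {
        "autonomous_success_rate": len(auto) / total if total else 0.0,
        "reopen_rate": reopened / len(auto) if auto else 0.0,
    }
```

Note the reopen rate is measured only over autonomously completed tasks; mixing in human-completed work would hide AI-specific regressions.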
+1. No must-have items remaining.