# AI Dev Roadmap

## Purpose
This document defines how TalkEdit can evolve toward highly autonomous AI-driven implementation and debugging.
Goal: AI can execute most engineering work end-to-end with minimal human feedback while preserving safety, quality, and product intent.
## Scope
- Frontend: React + TypeScript + Vite
- Desktop host: Tauri
- Backend: FastAPI + Python services
- Media pipeline: FFmpeg, transcription, audio processing
## Autonomy Target
- Near-term target: 80-90% autonomous execution for well-scoped work.
- Mid-term target: 90-95% for low/medium-risk features with CI gates.
- 100% no-feedback autonomy is not realistic for ambiguous product decisions, legal/security tradeoffs, or high-risk migrations.
## Core Principles
- Specs are executable and machine-readable.
- Tests are the primary source of truth for completion.
- Every failure is diagnosable from logs/artifacts.
- AI has bounded permissions and policy guardrails.
- AI updates docs and memory as part of done criteria.
## Execution Status (2026-04-15)

### Completed
- Added roadmap companion docs: `docs/spec-template.md`, `docs/ai-policy.md`, `docs/runbooks/error-codes.md`.
- Added operational scripts: `scripts/validate-all.sh`, `scripts/collect-diagnostics.sh`.
- Ran Step 1 validation script (`./scripts/validate-all.sh`).
- Ran Step 2 diagnostics script (`./scripts/collect-diagnostics.sh`).
- Captured diagnostics archive: `.diagnostics/diag_20260415_163239.tar.gz`.
- Renamed roadmap file to `AI_dev_plan.md`.
### Current Blockers
- Frontend lint baseline is not green yet.
- Remaining lint issues are mostly pre-existing unused vars and hook dependency warnings across app components.
### Next Actions
- Triage existing lint findings into:
  - safe autofix
  - manual low-risk cleanup
  - intentional warnings to suppress with justification
- Reach a green `./scripts/validate-all.sh` run in local dev.
- Add a CI workflow to enforce `validate-all` on pull requests.
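The triage step above can be sketched as a small categorizer. The rule names here are illustrative ESLint rules, not a confirmed project lint config:

```python
# Sketch of lint triage: bucket findings into autofix / manual / suppress.
# These rule sets are hypothetical examples, not the project's actual config.
AUTOFIXABLE = {"no-extra-semi", "prefer-const"}
SUPPRESS_WITH_REASON = {"react-hooks/exhaustive-deps"}

def triage(findings):
    """Group lint findings (dicts with a 'rule' key) by remediation strategy."""
    buckets = {"safe_autofix": [], "manual_cleanup": [], "suppress": []}
    for finding in findings:
        rule = finding["rule"]
        if rule in AUTOFIXABLE:
            buckets["safe_autofix"].append(finding)
        elif rule in SUPPRESS_WITH_REASON:
            buckets["suppress"].append(finding)
        else:
            buckets["manual_cleanup"].append(finding)
    return buckets
```

A triage report generated this way can seed the suppression-justification list before the CI gate is enabled.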
## Roadmap Phases

### Phase 0: Foundation (1-2 weeks)

#### Deliverables
- Deterministic dev and test environment.
- Baseline lint/type/test commands working in CI and local.
- Standardized log format across frontend, backend, and Tauri host.
#### Tasks
- Stabilize toolchain commands:
  - frontend lint/typecheck/test
  - backend lint/typecheck/test
  - workspace e2e smoke command
- Add a single script for local validation, for example `npm run validate:all`.
- Introduce structured logging fields:
  - timestamp
  - request/job id
  - subsystem (frontend/backend/host)
  - error code
- Add reproducible media fixtures for tests under a dedicated test-fixtures path.
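The structured logging fields listed above could be emitted as one JSON object per line. A minimal sketch for the Python backend, assuming the field names are as listed (the real formatter may differ):

```python
import json
import logging
import time

class StructuredFormatter(logging.Formatter):
    """Emit one JSON object per log line carrying the roadmap's required fields.

    Fields not set via logging's `extra=` mechanism default to None so every
    line has the same shape for downstream AI/tooling consumption.
    """

    def format(self, record):
        payload = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "job_id": getattr(record, "job_id", None),
            "subsystem": getattr(record, "subsystem", "backend"),
            "error_code": getattr(record, "error_code", None),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        return json.dumps(payload)
```

Usage: attach the formatter to a handler and pass the contextual fields via `extra={"job_id": ..., "error_code": ...}` at each call site.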
#### Exit Criteria
- Fresh clone can run validation with one command.
- CI produces deterministic pass/fail on clean branches.
- Failures include enough context to reproduce without manual guessing.
### Phase 1: Spec + Test Contracts (2-4 weeks)

#### Deliverables
- Feature spec template used for all new work.
- API and schema contracts versioned and validated.
- Regression harness for previous bugs.
#### Tasks
- Create `docs/spec-template.md` with required sections:
  - user story
  - acceptance criteria
  - non-goals
  - edge cases
  - rollback behavior
- Add contract tests for backend routers:
  - transcribe
  - export
  - captions
  - audio
- Add project schema validation tests for `shared/project-schema.json` and project load/save behavior.
- For each resolved bug, add a regression test before closing the issue.
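A schema validation test could start as a stdlib-only contract check. The key names below are illustrative; the real keys live in `shared/project-schema.json`:

```python
import json

# Illustrative required keys and types; the actual contract is defined in
# shared/project-schema.json and may differ.
REQUIRED_KEYS = {"version": int, "media": list, "zones": list}

def validate_project(raw: str):
    """Parse a saved project file and return a list of contract violations.

    An empty list means the file satisfies this (sketch of a) contract.
    """
    try:
        project = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    errors = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in project:
            errors.append(f"missing key: {key}")
        elif not isinstance(project[key], expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors
```

In CI this would run against both freshly saved projects and fixture files from older versions, so load/save drift is caught automatically.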
#### Exit Criteria
- New feature PRs must include spec and tests.
- Breaking contract changes are detected automatically in CI.
### Phase 2: Observability and Self-Debugging (2-3 weeks)

#### Deliverables
- Unified diagnostics bundle command.
- AI-readable failure artifacts from CI and local runs.
- Error taxonomy and runbook mapping.
#### Tasks
- Implement diagnostics command to collect:
  - frontend logs
  - backend logs
  - Tauri logs
  - failing test outputs
  - environment metadata
- Define error codes for common classes:
  - media decode
  - FFmpeg pipeline
  - transcription model
  - project load/save
  - network/IPC bridge
- Add runbook table mapping error codes to probable causes and first fixes.
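The error-code-to-runbook mapping could live as a simple lookup table. The codes, causes, and fixes below are illustrative placeholders; the authoritative entries belong in `docs/runbooks/error-codes.md`:

```python
# Hypothetical error taxonomy mirroring the five classes listed above.
# Each entry maps an error code to (probable cause, first fix to try).
RUNBOOK = {
    "E_MEDIA_DECODE": (
        "unsupported codec or corrupt file",
        "re-probe the file with ffprobe and check the container format",
    ),
    "E_FFMPEG_PIPELINE": (
        "bad filter graph or missing ffmpeg binary",
        "log the full ffmpeg command line and rerun it manually",
    ),
    "E_TRANSCRIBE_MODEL": (
        "model missing or out of memory",
        "verify the model download and reduce batch size",
    ),
    "E_PROJECT_IO": (
        "schema mismatch on load/save",
        "run schema validation against the project file",
    ),
    "E_IPC_BRIDGE": (
        "frontend/host contract drift",
        "compare bridge payloads against the contract tests",
    ),
}

def first_fix(error_code):
    """Return (probable cause, first fix), with a fallback for unknown codes."""
    return RUNBOOK.get(
        error_code,
        ("unknown error class", "escalate and add a new runbook entry"),
    )
```

Keeping this table machine-readable is what lets an agent go from a structured log's `error_code` field straight to a first remediation attempt.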
#### Exit Criteria
- Agent can identify likely root cause from artifacts without asking for manual logs.
- 80%+ of recurring failures map to known error classes.
### Phase 3: Controlled Autonomous Implementation (3-5 weeks)

#### Deliverables
- Policy file defining what AI can edit/run without approval.
- Autonomous task loop for implement -> validate -> fix -> revalidate.
- Automatic PR summary with risk and assumptions.
#### Tasks
- Add policy file (for example `docs/ai-policy.md`):
  - allowed directories for autonomous edits
  - blocked files requiring approval
  - blocked commands
- Add task template for AI execution:
  - parse feature spec
  - locate impacted modules
  - implement smallest changes
  - run validation suite
  - retry up to N fix cycles
  - produce summary + residual risks
- Require AI to update:
  - copilot instructions
  - changelog/roadmap note
  - regression tests when bugfixing
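The implement -> validate -> fix -> revalidate loop with a bounded retry budget can be sketched as follows. The callable interfaces are assumptions for illustration, not an agreed agent API:

```python
def autonomous_loop(implement, validate, fix, max_fix_cycles=3):
    """Run implement -> validate -> fix -> revalidate with a bounded fix budget.

    `implement` and `fix` are callables that act on the workspace; `validate`
    returns a list of failures (empty list means green). Returns a tuple of
    (succeeded, fix_cycles_used) so the caller can report residual risk or
    escalate when the budget is exhausted.
    """
    implement()
    failures = validate()
    cycles = 0
    while failures and cycles < max_fix_cycles:
        fix(failures)
        failures = validate()
        cycles += 1
    return (not failures, cycles)
```

The hard cap on fix cycles is what turns "keep trying until green" into a terminating process that hands off to a human, matching the escalation rule about CI remaining red after bounded auto-fix attempts.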
#### Exit Criteria
- Low-risk feature tasks complete end-to-end without human intervention.
- CI gate pass rate for autonomous PRs remains above agreed threshold (for example 95%).
### Phase 4: High-Autonomy with Human Escalation (ongoing)

#### Deliverables
- Explicit escalation triggers for ambiguity and risk.
- Broader autonomous scope with mandatory gates.
- Drift monitoring for quality, velocity, and regressions.
#### Tasks
- Define escalation triggers:
  - user-visible behavior changes without clear spec
  - API/schema breakage
  - security-sensitive modifications
  - destructive migrations
- Add quality dashboards:
  - flaky tests
  - escaped defects
  - mean time to recovery
  - autonomous task success rate
- Monthly calibration:
  - adjust autonomy scope
  - update policies
  - prune stale runbooks and memories
#### Exit Criteria
- Autonomous throughput increases while defect rate stays stable or improves.
- Human review focuses on strategy and product decisions, not routine implementation/debugging.
## Required Engineering Systems

### 1. Spec System
Minimum implementation:
- `docs/spec-template.md`
- `docs/specs/` folder with one file per feature
- CI check that new feature PRs include a spec reference
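The CI spec-reference check could be as small as a pattern match over the PR description. This is a hypothetical gate, assuming specs are referenced by their `docs/specs/` path:

```python
import re

def has_spec_reference(pr_body: str) -> bool:
    """Return True if a PR description references a spec under docs/specs/.

    Hypothetical CI gate: the convention that PRs cite a spec by path is an
    assumption of this sketch, not an established project rule.
    """
    return re.search(r"docs/specs/[\w./-]+\.md", pr_body) is not None
```

A real workflow would run this against the PR body (or changed-files list) and fail the required check when no spec is referenced.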
### 2. Test System
Minimum implementation:
- Frontend unit tests for stores/components/hook logic.
- Backend unit+integration tests for routers/services.
- E2E smoke tests for core workflow:
  - open media
  - transcribe
  - edit zones
  - export
- Regression tests required for every bugfix.
### 3. Environment System
Minimum implementation:
- Locked dependencies and pinned runtimes.
- Single bootstrap script.
- Fixture media files for deterministic test runs.
### 4. Observability System
Minimum implementation:
- Structured logs.
- Standard error codes.
- Diagnostics bundle command.
- CI artifact retention for failed runs.
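The diagnostics bundle command (`scripts/collect-diagnostics.sh` in the repo) could look roughly like this Python sketch, which tolerates subsystems that produced no logs:

```python
import tarfile
import time
from pathlib import Path

def collect_diagnostics(log_paths, out_dir=".diagnostics"):
    """Bundle available log files into a timestamped tar.gz archive.

    Python sketch of what scripts/collect-diagnostics.sh does; the real script's
    exact inputs and naming may differ. Missing paths are skipped silently so a
    subsystem that never started does not abort the whole collection.
    """
    Path(out_dir).mkdir(exist_ok=True)
    archive = Path(out_dir) / time.strftime("diag_%Y%m%d_%H%M%S.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for path in log_paths:
            p = Path(path)
            if p.exists():
                tar.add(p, arcname=p.name)
    return archive
```

The timestamped naming matches the archive already captured in the execution status (`.diagnostics/diag_20260415_163239.tar.gz`).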
### 5. Governance System
Minimum implementation:
- Protected branch + required checks.
- Secret and dependency scanning.
- Policy-based approval requirements for high-risk changes.
## Suggested Repository Additions
- `AI_dev_plan.md` (this file)
- `docs/spec-template.md`
- `docs/ai-policy.md`
- `docs/runbooks/error-codes.md`
- `docs/runbooks/debug-playbooks.md`
- `scripts/validate-all.sh`
- `scripts/collect-diagnostics.sh`
## Definition of Done for Autonomous Tasks
A task is complete only if all items pass:
- Feature spec acceptance criteria satisfied.
- Relevant tests added/updated and passing.
- No lint/type errors in changed scope.
- Docs and instructions updated if behavior changed.
- Risk summary and assumptions recorded.
## Escalation Rules (Must Ask Human)
AI must stop and ask when:
- Requirement ambiguity changes user-visible behavior.
- Multiple valid product decisions exist without clear preference.
- Security/privacy/compliance implications are uncertain.
- Data loss or destructive migration is possible.
- CI remains failing after bounded auto-fix attempts.
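A machine-checkable form of these rules could gate the agent before it proceeds. The flag names below are hypothetical labels for the five rules above:

```python
# Hypothetical trigger flags mirroring the escalation rules listed above.
ESCALATION_TRIGGERS = {
    "ambiguous_user_visible_change",
    "multiple_valid_product_decisions",
    "security_privacy_uncertain",
    "destructive_migration_possible",
    "ci_red_after_fix_budget",
}

def must_escalate(task_flags):
    """Return the escalation triggers present in a task's flags.

    An empty set means the agent may proceed autonomously; a non-empty set
    means it must stop and ask a human, listing the matched triggers.
    """
    return set(task_flags) & ESCALATION_TRIGGERS
```

Encoding the rules as data rather than prose means the policy file (`docs/ai-policy.md`) and the agent's runtime check can never silently diverge.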
## Metrics to Track
- Autonomous task success rate.
- Reopen rate of AI-completed tasks.
- Regression rate per release.
- Flaky test percentage.
- Mean time to diagnose and resolve failures.
## 30-Day Execution Plan
**Week 1:**
- Baseline scripts and deterministic environment.
- Restore lint/test commands to green status.
- Add structured logging and IDs.
**Week 2:**
- Spec template and mandatory spec policy.
- Contract tests for core backend routes.
- First diagnostics bundle version.
**Week 3:**
- AI policy and bounded autonomous edit/run loop.
- Regression-test-first bugfix workflow.
- CI artifact enrichment and runbook mapping.
**Week 4:**
- Pilot autonomous feature tasks in low-risk areas.
- Measure success/failure patterns.
- Expand scope only if quality gates hold.
## Notes for TalkEdit
- Keep router files thin and service logic isolated to improve AI edit precision.
- Preserve compatibility in desktop bridge contracts to avoid frontend breakage.
- Treat export/transcription pipeline changes as high-risk and always require regression tests.
- Keep Linux WebKit startup and media URL consistency as explicit regression targets.