106 lines
6.8 KiB
Markdown
106 lines
6.8 KiB
Markdown
# TalkEdit Copilot Instructions (Living Project Context)
|
||
|
||
Purpose: give AI assistants immediate, accurate context for this repository and define what must be kept in sync when the project evolves.
|
||
|
||
## How To Use This File
|
||
|
||
- This is a workspace instruction file for VS Code Chat/Copilot.
|
||
- Treat this as the first source of truth for architecture and workflow expectations.
|
||
- If your code changes make any section outdated, update this file in the same change.
|
||
|
||
## Project Snapshot
|
||
|
||
- Name: TalkEdit
|
||
- Product: local-first, AI-powered, text-based audio/video editor.
|
||
- Primary runtime: Tauri + React frontend + Python FastAPI backend.
|
||
- Desktop only (Electron has been removed; Tauri is the exclusive desktop runtime).
|
||
|
||
## Tech Stack
|
||
|
||
- Frontend: React 19, TypeScript, Vite, Tailwind, Zustand.
|
||
- Desktop bridge: Tauri API (IPC commands via `window.electronAPI` polyfill in `frontend/src/lib/tauri-bridge.ts` for unified call-site interface).
|
||
- Backend: FastAPI + Uvicorn (`backend/main.py`) with routers in `backend/routers` and core services in `backend/services`.
|
||
- Media tooling: FFmpeg for edit/export and codec operations.
|
||
- AI tooling: WhisperX/faster-whisper for transcription; provider layer supports OpenAI/Anthropic/Ollama.
|
||
|
||
## Code Map
|
||
|
||
- `frontend/src/components`: editor UI (player, transcript, waveform, settings, export, AI panel).
|
||
- `frontend/src/store`: Zustand state (`editorStore`, `aiStore`).
|
||
- `frontend/src/hooks`: keyboard/video sync behavior.
|
||
- `backend/routers`: API surface (`/transcribe`, `/export`, `/ai/*`, `/captions`, `/audio/*`).
|
||
- `backend/services`: heavy operations (transcription, captioning, diarization, video editing, cleanup).
|
||
- `shared/project-schema.json`: saved project schema contract.
|
||
- `src-tauri`: Rust/Tauri host code and app configuration.
|
||
|
||
## Run And Build (Preferred)
|
||
|
||
- Frontend dev: `npm run dev`
|
||
- Backend dev: `npm run dev:backend`
|
||
- Tauri dev: `npm run dev:tauri`
|
||
- Tauri build: `npm run build:tauri`
|
||
|
||
Use project virtualenvs where available (`.venv312`, `.venv`, or `venv`) for backend execution.
|
||
|
||
## Working Conventions
|
||
|
||
- Keep router files thin; put heavy logic in `backend/services`.
|
||
- Preserve response compatibility for existing frontend callers unless task explicitly allows API breakage.
|
||
- Frontend uses unified `window.electronAPI` interface (Tauri-backed via tauri-bridge.ts); desktop APIs are implemented exclusively in Tauri.
|
||
- Prefer small, focused edits over broad refactors.
|
||
|
||
## Known Risk Areas
|
||
|
||
- Startup/rendering on Linux WebKit can regress when reintroducing remote fonts/CSP allowances; prefer local font assets.
|
||
- Media URL handling between project load paths should remain consistent to avoid format-specific regressions (especially WAV/MP3 behavior).
|
||
- Export pipeline changes must preserve caption modes (`none`, `sidecar`, `burn-in`) and audio enhancement behavior.
|
||
- WAV export uses `pcm_s16le` codec — only available for audio-only inputs (no video stream). Format selector conditionally shows WAV based on input file extension.
|
||
- `<select>` dropdowns need `[color-scheme:dark]` Tailwind class on Linux WebKit or the native popup renders white-on-light-gray.
|
||
- Frontend gain ranges use camelCase (`gainDb`) but the backend expects snake_case (`gain_db`). The ExportDialog maps them before sending. Any new call sites must do the same.
|
||
|
||
## Recent Changes
|
||
|
||
### 2026-05-04 — Word text correction, low-confidence highlighting, audio normalization
|
||
|
||
- **Word text correction (#015)**: Double-click any word in the transcript editor to edit its text inline. Press Enter to commit, Escape to cancel. State is updated in both `words[]` and `segments[]` arrays (segment text recomposed from updated words). Pure frontend; no backend changes needed.
|
||
- **Low-confidence word highlighting (#012)**: Words with `confidence < threshold` (default 0.6, configurable in Settings panel) render with an orange dotted underline. Tooltip shows exact confidence percentage. Threshold is persisted in `localStorage` key `talkedit:confidenceThreshold`.
|
||
- **Audio normalization (#018)**: New backend endpoint `POST /audio/normalize` in `backend/routers/audio.py`. Two-pass FFmpeg `loudnorm` (measure then apply) implemented in `backend/services/audio_cleaner.py:normalize_audio()`. Falls back to single-pass if measurement fails. Frontend UI in Export panel: target selector (YouTube -14, Spotify -16, Broadcast -23, etc.) with "Normalize" button.
|
||
- **Store**: New `updateWordText(index, text)` action in `editorStore.ts` updates both `words[]` and recomputes `segments[].text`.
|
||
- **Settings panel**: New confidence threshold slider (0–1 range).
|
||
- **WAV export format**: Format selector shows "WAV (Uncompressed)" for audio-only inputs. Backend uses `pcm_s16le` codec via `_get_codec_args()` helper. Codec selection centralized in `backend/services/video_editor.py:_get_codec_args(format_hint, has_video)`.
|
||
- **Normalization moved to export**: No longer a standalone button. Integrated as `normalizeAudio` checkbox + LUFS target selector in ExportPanel. Sent as `normalize_loudness`/`normalize_target_lufs` to backend. Applied via `loudnorm` in FFmpeg audio filter chain during export.
|
||
- **Export camelCase fix**: `ExportDialog.tsx` now manually maps `gainRanges`→`gain_db` and `muteRanges`→`{start,end}` before sending to backend. Prevents Pydantic v2 field rejection.
|
||
- **color-scheme:dark**: All `<select>` elements in ExportDialog use `[color-scheme:dark]` to ensure readable native dropdown popups on Linux WebKit.
|
||
- **Re-transcribe selection (#013)**: Backend `POST /transcribe/segment` extracts audio via FFmpeg, runs Whisper, adjusts timestamps. Frontend: "Re-transcribe" button on selected words in TranscriptEditor; `replaceWordRange()` store action swaps words + rebuilds segments by speaker.
|
||
- **Transcript-only export (#024)**: "Export Transcript Only" in ExportDialog with .txt/.srt options. **Pure frontend** — generates content in-browser, writes via Tauri `writeFile`. No backend dependency. Respects word cuts.
|
||
|
||
## Update Rules (Important)
|
||
|
||
When a task changes architecture, app wiring, commands, API shape, project schema, or major conventions, update this file before finishing.
|
||
|
||
Always update these sections if affected:
|
||
|
||
- `Project Snapshot`
|
||
- `Tech Stack`
|
||
- `Code Map`
|
||
- `Run And Build (Preferred)`
|
||
- `Working Conventions`
|
||
- `Known Risk Areas`
|
||
- Recent changes section (if applicable)
|
||
- `Code Map`
|
||
- `Run And Build (Preferred)`
|
||
- `Known Risk Areas`
|
||
|
||
If behavior changed significantly, add a short note under a new `Recent Changes` section with:
|
||
|
||
- Date (`YYYY-MM-DD`)
|
||
- What changed
|
||
- What future edits should preserve
|
||
|
||
## Assistant Behavior For This Repo
|
||
|
||
- Validate assumptions against current files before editing.
|
||
- Prefer existing patterns in neighboring files over introducing new patterns.
|
||
- Call out uncertainty explicitly when code and docs disagree.
|
||
- If you discover stale docs, fix them as part of the same task when reasonable.
|