TalkEdit/.github/copilot-instructions.md
2026-05-05 10:22:35 -06:00


TalkEdit Copilot Instructions (Living Project Context)

Purpose: give AI assistants immediate, accurate context for this repository and define what must be kept in sync when the project evolves.

How To Use This File

  • This is a workspace instruction file for VS Code Chat/Copilot.
  • Treat this as the first source of truth for architecture and workflow expectations.
  • If your code changes make any section outdated, update this file in the same change.

Project Snapshot

  • Name: TalkEdit
  • Product: local-first, AI-powered, text-based audio/video editor.
  • Primary runtime: Tauri + React frontend + Python FastAPI backend.
  • Desktop only (Electron has been removed; Tauri is the exclusive desktop runtime).

Tech Stack

  • Frontend: React 19, TypeScript, Vite, Tailwind, Zustand.
  • Desktop bridge: Tauri API (IPC commands via window.electronAPI polyfill in frontend/src/lib/tauri-bridge.ts for unified call-site interface).
  • Backend: FastAPI + Uvicorn (backend/main.py) with routers in backend/routers and core services in backend/services.
  • Media tooling: FFmpeg for edit/export and codec operations.
  • AI tooling: WhisperX/faster-whisper for transcription; provider layer supports OpenAI/Anthropic/Ollama.

Code Map

  • frontend/src/components: editor UI (player, transcript, waveform, settings, export, AI panel).
  • frontend/src/store: Zustand state (editorStore, aiStore).
  • frontend/src/hooks: keyboard/video sync behavior.
  • backend/routers: API surface (/transcribe, /export, /ai/*, /captions, /audio/*).
  • backend/services: heavy operations (transcription, captioning, diarization, video editing, cleanup).
  • shared/project-schema.json: saved project schema contract.
  • src-tauri: Rust/Tauri host code and app configuration.

Run And Build (Preferred)

  • Frontend dev: npm run dev
  • Backend dev: npm run dev:backend
  • Tauri dev: npm run dev:tauri
  • Tauri build: npm run build:tauri

Use project virtualenvs where available (.venv312, .venv, or venv) for backend execution.

Working Conventions

  • Keep router files thin; put heavy logic in backend/services.
  • Preserve response compatibility for existing frontend callers unless the task explicitly allows API breakage.
  • Frontend uses unified window.electronAPI interface (Tauri-backed via tauri-bridge.ts); desktop APIs are implemented exclusively in Tauri.
  • Prefer small, focused edits over broad refactors.

Known Risk Areas

  • Startup/rendering on Linux WebKit can regress when reintroducing remote fonts/CSP allowances; prefer local font assets.
  • Media URL handling between project load paths should remain consistent to avoid format-specific regressions (especially WAV/MP3 behavior).
  • Export pipeline changes must preserve caption modes (none, sidecar, burn-in) and audio enhancement behavior.
  • WAV export uses pcm_s16le codec — only available for audio-only inputs (no video stream). Format selector conditionally shows WAV based on input file extension.
  • <select> dropdowns need [color-scheme:dark] Tailwind class on Linux WebKit or the native popup renders white-on-light-gray.
  • Frontend gain ranges use camelCase (gainDb) but the backend expects snake_case (gain_db). The ExportDialog maps them before sending. Any new call sites must do the same.
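The WAV constraint above can be sketched as a small codec-selection function. This is a minimal illustration, not the actual helper in backend/services/video_editor.py: only the pcm_s16le choice for audio-only WAV comes from this file, while the function name get_codec_args, the MP4/MP3 codec choices, and the error behavior are assumptions made for the example.

```python
from typing import List


def get_codec_args(format_hint: str, has_video: bool) -> List[str]:
    """Return FFmpeg codec arguments for an export format (illustrative sketch)."""
    fmt = format_hint.lower().lstrip(".")

    if fmt == "wav":
        if has_video:
            # WAV is only offered for audio-only inputs, so reject video inputs early.
            raise ValueError("WAV export requires an audio-only input")
        return ["-c:a", "pcm_s16le"]

    if fmt == "mp4":
        # Assumed codecs; the real helper may pick different ones.
        args = ["-c:a", "aac"]
        if has_video:
            args = ["-c:v", "libx264"] + args
        return args

    if fmt == "mp3":
        # Assumed codec for MP3 output.
        return ["-c:a", "libmp3lame"]

    raise ValueError(f"Unsupported export format: {format_hint!r}")
```

Keeping the audio-only check next to the codec choice means a caller that bypasses the frontend's format selector still gets a clear error instead of a failed FFmpeg run.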

Recent Changes

2026-05-04 — Word text correction, low-confidence highlighting, audio normalization

  • Word text correction (#015): Double-click any word in the transcript editor to edit its text inline. Press Enter to commit, Escape to cancel. State is updated in both words[] and segments[] arrays (segment text recomposed from updated words). Pure frontend; no backend changes needed.
  • Low-confidence word highlighting (#012): Words with confidence < threshold (default 0.6, configurable in Settings panel) render with an orange dotted underline. Tooltip shows exact confidence percentage. Threshold is persisted in localStorage key talkedit:confidenceThreshold.
  • Audio normalization (#018): New backend endpoint POST /audio/normalize in backend/routers/audio.py. Two-pass FFmpeg loudnorm (measure then apply) implemented in backend/services/audio_cleaner.py:normalize_audio(). Falls back to single-pass if measurement fails. Frontend UI in Export panel: target selector (YouTube -14, Spotify -16, Broadcast -23, etc.) with "Normalize" button.
  • Store: New updateWordText(index, text) action in editorStore.ts updates both words[] and recomputes segments[].text.
  • Settings panel: New confidence threshold slider (0 to 1 range).
  • WAV export format: Format selector shows "WAV (Uncompressed)" for audio-only inputs. Backend uses pcm_s16le codec via _get_codec_args() helper. Codec selection centralized in backend/services/video_editor.py:_get_codec_args(format_hint, has_video).
  • Normalization moved to export: No longer a standalone button. Integrated as normalizeAudio checkbox + LUFS target selector in ExportPanel. Sent as normalize_loudness/normalize_target_lufs to backend. Applied via loudnorm in FFmpeg audio filter chain during export.
  • Export camelCase fix: ExportDialog.tsx now manually maps gainRanges → gain_db and muteRanges → {start,end} before sending to backend. Prevents Pydantic v2 field rejection.
  • color-scheme:dark: All <select> elements in ExportDialog use [color-scheme:dark] to ensure readable native dropdown popups on Linux WebKit.
  • Re-transcribe selection (#013): Backend POST /transcribe/segment extracts audio via FFmpeg, runs Whisper, adjusts timestamps. Frontend: "Re-transcribe" button on selected words in TranscriptEditor; replaceWordRange() store action swaps words + rebuilds segments by speaker.
  • Transcript-only export (#024): "Export Transcript Only" in ExportDialog with .txt/.srt options. Pure frontend — generates content in-browser, writes via Tauri writeFile. No backend dependency. Respects word cuts.
  • Named timeline markers (#016): TimelineMarker type in project.ts. Store actions: addTimelineMarker, updateTimelineMarker, removeTimelineMarker. Colored pins on waveform canvas. MarkersPanel UI for add/edit/delete. Persisted in project.
  • Chapters (#017): getChapters() store action derives from sorted markers. "Copy as YouTube timestamps" in MarkersPanel. Zero backend.
  • Clip thumbnail strip (#022): lib/thumbnails.ts — frontend canvas capture from <video>. Toggle button in WaveformTimeline. Clickable frames at 10s intervals.
  • Customizable hotkeys (#041): lib/keybindings.ts with two presets (standard + left-hand). useKeyboardShortcuts.ts reads bindings dynamically. Settings panel includes key remapper with conflict detection and per-key reset. ? key shows dynamic cheatsheet.
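The two-pass loudnorm flow mentioned for audio normalization (measure, then apply, with a single-pass fallback) might look roughly like the sketch below. It is not the code in backend/services/audio_cleaner.py: the function names, the TP=-1.5 and LRA=11 values, and the way the JSON report is located in stderr are assumptions. The loudnorm option names and report keys are those of FFmpeg's loudnorm filter.

```python
import json
from typing import Dict, List

# Assumed true-peak and loudness-range targets; the project may use other values.
BASE_OPTS = "TP=-1.5:LRA=11"


def build_measure_cmd(src: str, target_lufs: float) -> List[str]:
    """Pass 1: analyze only. FFmpeg prints a JSON loudness report to stderr."""
    return [
        "ffmpeg", "-hide_banner", "-i", src,
        "-af", f"loudnorm=I={target_lufs}:{BASE_OPTS}:print_format=json",
        "-f", "null", "-",
    ]


def parse_loudnorm_report(stderr_text: str) -> Dict[str, str]:
    """Extract the last JSON object from FFmpeg's stderr output."""
    start = stderr_text.rindex("{")
    end = stderr_text.rindex("}") + 1
    return json.loads(stderr_text[start:end])


def choose_filter(stderr_text: str, target_lufs: float) -> str:
    """Pass 2 filter string, falling back to single-pass if measurement failed."""
    single_pass = f"loudnorm=I={target_lufs}:{BASE_OPTS}"
    try:
        report = parse_loudnorm_report(stderr_text)
        return (
            f"{single_pass}"
            f":measured_I={report['input_i']}"
            f":measured_TP={report['input_tp']}"
            f":measured_LRA={report['input_lra']}"
            f":measured_thresh={report['input_thresh']}"
            f":offset={report['target_offset']}"
            ":linear=true"
        )
    except (ValueError, KeyError):
        # No parseable report or missing fields: use dynamic single-pass mode.
        return single_pass
```

In use, the stderr captured from running build_measure_cmd(...) would be passed to choose_filter(...), and the returned string placed in the -af chain of the export command.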

Update Rules (Important)

When a task changes architecture, app wiring, commands, API shape, project schema, or major conventions, update this file before finishing.

Always update these sections if affected:

  • Project Snapshot
  • Tech Stack
  • Code Map
  • Run And Build (Preferred)
  • Working Conventions
  • Known Risk Areas
  • Recent Changes (if applicable)

If behavior changed significantly, add a short dated entry under Recent Changes with:

  • Date (YYYY-MM-DD)
  • What changed
  • What future edits should preserve

Assistant Behavior For This Repo

  • Validate assumptions against current files before editing.
  • Prefer existing patterns in neighboring files over introducing new patterns.
  • Call out uncertainty explicitly when code and docs disagree.
  • If you discover stale docs, fix them as part of the same task when reasonable.