# TalkEdit — Feature Roadmap

Features are grouped by priority. Check off items as they are implemented.

---

## 🔴 Highest Impact Next — Conversion and retention

- [x] [#015] **Word text correction** — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)
- [x] [#013] **Re-transcribe selection** — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)
- [x] [#012] **Low-confidence word highlighting** — words with `confidence < 0.6` (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)
- [x] [#018] **Audio normalization / loudness targeting** — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg `loudnorm` in the audio filter chain. No intermediate files. (2026-05-04)
- [x] [#024] **Export to transcript text / SRT only** — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses `POST /export/transcript` backend endpoint. Respects word cuts. (2026-05-04)
- [x] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Implemented by `SilenceTrimmerPanel` + `POST /audio/detect-silence` (FFmpeg silencedetect).

---

## 🟡 Medium Impact — Workflow completeness

- [ ] [#016] **Named timeline markers** — drop named marker pins on the waveform (like Resolve markers). Store as `{ id, time, label, color }` in the project. Rendered as colored triangles on the timeline canvas.
- [ ] [#017] **Chapters** — group markers into named chapter ranges. Useful for podcasts and lectures. Exportable as YouTube chapter timestamps in the description.
- [ ] [#041] **Customizable hotkeys / keymap editor (left-hand focused)** — allow users to view, remap, and reset keyboard shortcuts (transport, edit, save/export, zone tools), with a default preset optimized for left-hand reach (Q/W/E/R/A/S/D/F/Z/X/C/V + modifiers). Include conflict detection, an alternate standard preset, and one-click "restore defaults".
- [ ] [#022] **Clip thumbnail strip** — video frame thumbnails along the timeline so users can navigate visually, not only by waveform. Backend: `ffmpeg` thumbnail extraction at regular intervals.

---

## 🟢 Lower Impact — Expansion and advanced scope

- [ ] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: `ffmpeg -vf crop/scale/zoompan`.
- [ ] [#021] **Multi-clip / append** — load a second video and append it to the timeline. Even without a full multi-track timeline, "append clip" is a heavily used workflow.
- [ ] [#019] **Background music track** — a second audio track for background music with volume ducking. Major gap in Descript that TalkEdit could own. Backend: `ffmpeg` amix + `asendcmd` for auto-ducking.
- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
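
For the marker and chapter items (#016, #017), the export step could look roughly like the sketch below. Only the `{ id, time, label, color }` shape comes from this roadmap. The assumption that `time` is in seconds on the output timeline, and the names `Marker`, `formatTimestamp`, and `markersToYouTubeChapters`, are hypothetical. The sketch sorts markers by time and emits one `timestamp label` line per marker, so `markersToYouTubeChapters(project.markers)` (with `project.markers` also hypothetical) would return text ready to paste into a video description. It does not enforce YouTube's own rules for chapter lists, such as starting at 0:00.

```typescript
// Marker shape taken from item #016; the rest of this file is a hypothetical sketch.
interface Marker {
  id: string;
  time: number; // assumed: seconds from the start of the output timeline
  label: string;
  color: string;
}

// Format seconds as M:SS, switching to H:MM:SS once the time reaches one hour.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const ss = String(seconds).padStart(2, "0");
  if (hours > 0) {
    return `${hours}:${String(minutes).padStart(2, "0")}:${ss}`;
  }
  return `${minutes}:${ss}`;
}

// Build a description block with one "timestamp label" line per marker,
// ordered by time. The input array is not mutated.
function markersToYouTubeChapters(markers: Marker[]): string {
  const sorted = [...markers].sort((a, b) => a.time - b.time);
  return sorted.map((m) => `${formatTimestamp(m.time)} ${m.label}`).join("\n");
}
```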
---

## ✅ Completed high-impact foundation

- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**

---

## 💡 TalkEdit competitive advantages to lean into

These aren't features to build — they're things to make more visible in the UI and README:

- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. With TalkEdit, media never leaves the machine.
- **Local AI models** — Ollama support means no API costs and no data leaving the device.
- **Word-level precision** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.

---

## ✅ Already Implemented

- [#025] Word-level transcript editing (select, drag, shift-click, delete)
- [#026] Ctrl+click word → seek timeline to that position
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
- [#028] Auto-scroll waveform when playhead goes off-screen
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
- [#030] AI clip suggestions for social media
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
- [#032] Export: fast stream-copy or full re-encode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
- [#034] Speaker diarization
- [#035] Project save / load (.aive JSON format)
- [#036] Undo / redo (100-level history via Zundo)
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
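
As a rough illustration of the SRT output mentioned in #024 and #033, the sketch below turns a list of timed words into SRT cues. It is not TalkEdit's actual exporter: the `Word` shape, the fixed words-per-cue grouping, and the names `srtTime` and `wordsToSrt` are assumptions made for the example. It also assumes cut words have already been removed and timings shifted by the caller.

```typescript
// Hypothetical word shape; the roadmap only says words carry text and timing.
interface Word {
  text: string;
  start: number; // seconds
  end: number; // seconds
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm (comma before milliseconds).
function srtTime(seconds: number): string {
  const ms = Math.max(0, Math.round(seconds * 1000));
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Group words into cues of at most maxWords words and render them as SRT:
// a 1-based index line, a "start --> end" line, the cue text, then a blank line.
function wordsToSrt(words: Word[], maxWords: number = 8): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const first = chunk[0];
    const last = chunk[chunk.length - 1];
    const text = chunk.map((w) => w.text).join(" ");
    cues.push(`${cues.length + 1}\n${srtTime(first.start)} --> ${srtTime(last.end)}\n${text}`);
  }
  return cues.length > 0 ? cues.join("\n\n") + "\n" : "";
}
```

A real exporter would more likely break cues on pauses, punctuation, or speaker changes than on a fixed word count. The fixed grouping here only keeps the example short.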