# TalkEdit — Feature Roadmap

Features are grouped by priority. Check off items as they are implemented.

---

## 🔴 Highest Impact Next — Conversion and retention

- [ ] [#015] **Word text correction** — allow editing the transcript text of a word without affecting its timing. Whisper gets homophones/proper nouns wrong constantly. Pure frontend state change; no backend needed.

- [ ] [#013] **Re-transcribe selection** — if Whisper gets a section wrong, let the user select a word range and re-run transcription on just that segment (optionally with a different model or language).

- [ ] [#012] **Low-confidence word highlighting** — WhisperX already returns `confidence` per word. Words below a threshold (e.g. < 0.6) should be visually underlined or tinted so the user knows where to double-check.

- [ ] [#018] **Audio normalization / loudness targeting** — single "Normalize" button that targets an integrated loudness level in LUFS (e.g. -14 for YouTube and Spotify, -16 for Apple Podcasts). Backend: `ffmpeg -af loudnorm`. Very high value for podcasters, ~2–3 hours of work.

- [ ] [#024] **Export to transcript text / SRT only** — some users just want a clean `.txt` or `.srt` of the edited transcript without rendering video.

- [ ] [#023] **Batch silence removal** — full-file scan that removes all pauses above a threshold in one click. Distinct from the manual silence / pause trimmer (#002, listed under Completed): this is a "fix the whole file" operation.
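
As a rough illustration of #024, here is a minimal Python sketch of turning an edited word list into `.srt` text. The `Word` shape, the `words_to_srt` name, and the `max_words` / `max_gap` cue-splitting defaults are assumptions made for this example, not the project's actual data model. It also assumes word times have already been remapped to the output timeline (after cuts and speed zones).

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record; times are seconds on the output timeline."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], max_words: int = 8, max_gap: float = 0.8) -> str:
    """Group words into cues, starting a new cue when the current one is
    full or when the pause before the next word exceeds max_gap seconds."""
    cues: list[list[Word]] = []
    for word in words:
        current = cues[-1] if cues else None
        if (
            current is None
            or len(current) >= max_words
            or word.start - current[-1].end > max_gap
        ):
            cues.append([word])
        else:
            current.append(word)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        span = f"{format_timestamp(cue[0].start)} --> {format_timestamp(cue[-1].end)}"
        text = " ".join(w.text for w in cue)
        blocks.append(f"{index}\n{span}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)
```

A plain `.txt` export could reuse the same cue grouping and emit only the joined text of each cue.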

---

## 🟡 Medium Impact — Workflow completeness

- [ ] [#016] **Named timeline markers** — drop named marker pins on the waveform (like Resolve markers). Store as `{ id, time, label, color }` in the project. Rendered as colored triangles on the timeline canvas.

- [ ] [#017] **Chapters** — group markers into named chapter ranges. Useful for podcasts and lectures. Exportable as YouTube chapter timestamps in the description.

- [ ] [#041] **Customizable hotkeys / keymap editor (left-hand focused)** — allow users to view, remap, and reset keyboard shortcuts (transport, edit, save/export, zone tools), with a default preset optimized for left-hand reach (Q/W/E/R/A/S/D/F/Z/X/C/V + modifiers). Include conflict detection, an alternate standard preset, and one-click "restore defaults".

- [ ] [#022] **Clip thumbnail strip** — video frame thumbnails along the timeline so users can navigate visually, not only by waveform. Backend: `ffmpeg` thumbnail extraction at regular intervals.
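
For #016 and #017, the chapter export could look roughly like the Python sketch below. It takes markers in the `{ id, time, label, color }` shape described above and produces lines suitable for a YouTube description. The function names and the auto-inserted "Intro" entry are assumptions for illustration. YouTube expects the first chapter to start at 0:00, so the sketch adds one when no marker sits at the start.

```python
def format_chapter_time(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once past one hour."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def markers_to_youtube_chapters(markers: list[dict]) -> str:
    """Turn timeline markers into one 'timestamp label' line per chapter."""
    ordered = sorted(markers, key=lambda m: m["time"])
    # Assumption: synthesize an "Intro" chapter if nothing starts at 0:00.
    if not ordered or ordered[0]["time"] > 0:
        ordered.insert(0, {"id": "intro", "time": 0.0, "label": "Intro", "color": None})
    return "\n".join(
        f"{format_chapter_time(m['time'])} {m['label']}" for m in ordered
    )
```

Marker times are assumed to be in seconds on the output timeline, so a real implementation would remap them after cuts before exporting.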

---

## 🟢 Lower Impact — Expansion and advanced scope

- [ ] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: `ffmpeg -vf crop/scale/zoompan`.

- [ ] [#021] **Multi-clip / append** — load a second video and append it to the timeline. Even without a full multi-track timeline, "append clip" is a heavily used workflow.

- [ ] [#019] **Background music track** — a second audio track for background music with volume ducking. Major gap in Descript that TalkEdit could own. Backend: `ffmpeg` amix + `asendcmd` for auto-ducking.

- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
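
For the static case of #020 (a fixed punch-in, no animated pan), the backend could build a `crop` + `scale` filter string along the lines of the Python sketch below. The `punch_in_filter` name and its parameters are hypothetical. `zoom` is a magnification factor of at least 1.0, and `center_x` / `center_y` are fractions of the frame. An animated zoom would need `zoompan` or per-frame expressions, which this sketch does not cover.

```python
def punch_in_filter(
    width: int,
    height: int,
    zoom: float,
    center_x: float = 0.5,
    center_y: float = 0.5,
) -> str:
    """Build an ffmpeg -vf string that crops around a point and scales the
    result back up to the original frame size."""
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1.0")

    # Round crop dimensions down to even numbers; odd sizes can trip up
    # common pixel formats such as yuv420p.
    crop_w = int(width / zoom) // 2 * 2
    crop_h = int(height / zoom) // 2 * 2

    # Center the crop on the requested point, then clamp it inside the frame.
    x = min(max(int(center_x * width - crop_w / 2), 0), width - crop_w)
    y = min(max(int(center_y * height - crop_h / 2), 0), height - crop_h)

    return f"crop={crop_w}:{crop_h}:{x}:{y},scale={width}:{height}"
```

The returned string would be passed as the `-vf` argument, for example `ffmpeg -i in.mp4 -vf "crop=960:540:480:270,scale=1920:1080" out.mp4` for a centered 2x punch-in on 1080p footage.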

---

## ✅ Completed high-impact foundation

- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**

---

## 💡 TalkEdit competitive advantages to lean into

These aren't features to build — they're things to make more visible in the UI and README:

- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. With TalkEdit, your media never leaves the machine.
- **Local AI models** — Ollama support means no API costs and no data leaving the device.
- **Word-level precision** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.

---

## ✅ Already Implemented

- [#025] Word-level transcript editing (select, drag, shift-click, delete)
- [#026] Ctrl+click word → seek timeline to that position
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
- [#028] Auto-scroll waveform when playhead goes off-screen
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
- [#030] AI clip suggestions for social media
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
- [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM, 720p/1080p/4K)
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
- [#034] Speaker diarization
- [#035] Project save / load (.aive JSON format)
- [#036] Undo / redo (100-level history via Zundo)
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z / Ctrl+Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove