TalkEdit/FEATURES.md

# TalkEdit — Feature Roadmap

Features are grouped by priority. Check off items as they are implemented.

---

## 🔴 Highest Impact Next — Conversion and retention

- [x] [#015] **Word text correction** — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)

- [x] [#013] **Re-transcribe selection** — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)

- [x] [#012] **Low-confidence word highlighting** — words with `confidence < 0.6` (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)

- [x] [#018] **Audio normalization / loudness targeting** — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg `loudnorm` in the audio filter chain. No intermediate files. (2026-05-04)

- [x] [#024] **Export to transcript text / SRT only** — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses `POST /export/transcript` backend endpoint. Respects word cuts. (2026-05-04)

- [x] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Implemented by `SilenceTrimmerPanel` + `POST /audio/detect-silence` (FFmpeg silencedetect).

---

## 🟡 Medium Impact — Workflow completeness

- [x] [#016] **Named timeline markers** — colored marker pins on the waveform canvas. Add at current playback position with label/color picker in Markers panel. Editable labels, deletable. Persisted in project file. (2026-05-04)

- [x] [#017] **Chapters** — sorted markers auto-form chapters. "Copy as YouTube timestamps" button exports `MM:SS Label` format to clipboard. (2026-05-04)

- [x] [#041] **Customizable hotkeys / keymap editor** — two presets (Standard: J/K/L/I/O/arrows; Left-hand: Q/W/E/A/S/D/F). Settings panel shows all bindings with click-to-remap, conflict detection, per-key reset to default. Cheatsheet (press `?`) shows current bindings. (2026-05-04)

- [x] [#022] **Clip thumbnail strip** — frontend-side canvas capture from the `<video>` element. Toggle "Thumbnails" button above waveform. Extracts frames at 10s intervals, clickable to seek. Zero backend dependency. (2026-05-04)

---

## 🟢 Lower Impact — Expansion and advanced scope

- [x] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: FFmpeg crop/scale post-process. Frontend: sliders in Export dialog. (2026-05-05)

- [x] [#021] **Multi-clip / append** — load additional video clips via Append Clip panel and concatenate during export. Uses FFmpeg concat demuxer. (2026-05-05)

- [x] [#019] **Background music track** — a second audio track for background music with volume ducking. Uses FFmpeg amix + sidechaincompress for auto-ducking. Configurable in Background Music panel. (2026-05-05)

- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.

---

## ✅ Completed high-impact foundation

- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**

---

- [x] [#042] **Background removal** — MediaPipe Selfie Segmentation + FFmpeg frame processing for person/background separation. Configurable replacement: blur, solid color, or custom image. Applied during export. Falls back to FFmpeg colorkey when MediaPipe unavailable. (2026-05-05)

## 🔮 Future — AI-powered editing & resource library

All AI features use the existing Ollama/OpenAI/Claude provider config — no new auth or setup needed.

- [ ] [#043] **AI Smart Clean** — one-click chain: filler removal + silence trim + noise reduction + loudness normalization in a single pass. `POST /ai/smart-clean` calls existing services sequentially.

- [ ] [#044] **AI Transcript Summarization** — generate bullet-point summary from transcript. `POST /ai/summarize`. AIPanel new tab.

- [ ] [#045] **AI Sentence Rephrase** — right-click word/sentence in transcript → "Rephrase with AI" → see 3 alternatives → click to replace. `POST /ai/rephrase`. TranscriptEditor context menu.

- [ ] [#046] **AI Smart Speed** — detect slow/low-energy sections → mark as suggested SpeedRange segments. `POST /ai/smart-speed`. Preview in AIPanel.

- [ ] [#047] **AI Auto-Chapters** — detect topic shifts in transcript → create TimelineMarkers automatically. `POST /ai/chapters`.

- [ ] [#048] **AI Show Notes** — generate title, description, soundbites, keywords from transcript + markers. `POST /ai/show-notes`. Copy to clipboard or save to file.

- [ ] [#049] **AI Find Fluff** — AI marks rambles, intros, off-topic chatter for deletion. Extends existing filler detection. `POST /ai/find-fluff`. AIPanel tab showing suggested cut ranges.

- [ ] [#050] **AI Smooth Cuts** — remove jump cuts between deleted segments using crossfade/blend during re-encode. Export option toggle.

- [ ] [#051] **AI B-roll** — generate footage from a text prompt to fill visual gaps in the timeline. Uses local SD or API. New "B-roll" section in AIPanel.

- [ ] [#052] **Smart Layouts** — auto-switch video layout between speakers based on who's talking. Detects active speaker from diarization + volume, applies crop/pad to focus on current speaker during export.

- [ ] [#053] **Per-track audio levels** — individual gain per speaker track. Extend `GainRange` model with `track_id`, apply per-stream via FFmpeg.

- [ ] [#054] **Intro/Outro templates** — save segment ranges as reusable templates, apply with one click on export.

- [ ] [#055] **Built-in free music library** — 5–10 CC0/royalty-free short loops shipped in `frontend/public/resources/music/`. BackgroundMusicPanel gets a "Built-in" tab with play/preview.

- [ ] [#056] **Stock media browser** — new `MediaLibraryPanel` that browses local `resources/media/` for images, video, audio with thumbnails. Frontend-only via Tauri `readDir`. Drag-to-add for bg removal images, append clips, or music.

- [ ] [#057] **Sample content downloader** — "Get Sample Video" button on empty state downloads a short public-domain test video + pre-made transcription JSON for trying the app without your own media.

---

## 💡 TalkEdit competitive advantages to lean into

These aren't features to build — they're things to make more visible in the UI and README:

- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
- **Local AI models** — Ollama support means no API costs and no data leaving the device.
- **Word-level precision** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.

---

## ✅ Already Implemented

- [#025] Word-level transcript editing (select, drag, shift-click, delete)
- [#026] Ctrl+click word → seek timeline to that position
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
- [#028] Auto-scroll waveform when playhead goes off-screen
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
- [#030] AI clip suggestions for social media
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
- [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
- [#034] Speaker diarization
- [#035] Project save / load (.aive JSON format)
- [#036] Undo / redo (100-level history via Zundo)
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
+								# TalkEdit — Feature Roadmap
 								Features are grouped by priority. Check off items as they are implemented.
 								---
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								## 🔴 Highest Impact Next — Conversion and retention
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												implemented 15,12,18 didn't check 18

											
										
										
											2026-05-04 16:37:25 -06:00
+								- [x] [#015] **Word text correction** — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)
-												silence trimmer

											
										
										
											2026-04-03 12:05:44 -06:00
-												able to re-transcribe

											
										
										
											2026-05-04 23:54:14 -06:00
+								- [x] [#013] **Re-transcribe selection** — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												implemented 15,12,18 didn't check 18

											
										
										
											2026-05-04 16:37:25 -06:00
+								- [x] [#012] **Low-confidence word highlighting** — words with `confidence < 0.6` (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												features update

											
										
										
											2026-05-04 19:01:11 -06:00
+								- [x] [#018] **Audio normalization / loudness targeting** — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg `loudnorm` in the audio filter chain. No intermediate files. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												able to re-transcribe

											
										
										
											2026-05-04 23:54:14 -06:00
+								- [x] [#024] **Export to transcript text / SRT only** — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses `POST /export/transcript` backend endpoint. Respects word cuts. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												features update

											
										
										
											2026-05-04 19:01:11 -06:00
+								- [x] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Implemented by `SilenceTrimmerPanel` + `POST /audio/detect-silence` (FFmpeg silencedetect).
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
 								---
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								## 🟡 Medium Impact — Workflow completeness
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												verified hotkeys

											
										
										
											2026-05-05 10:22:35 -06:00
+								- [x] [#016] **Named timeline markers** — colored marker pins on the waveform canvas. Add at current playback position with label/color picker in Markers panel. Editable labels, deletable. Persisted in project file. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												verified hotkeys

											
										
										
											2026-05-05 10:22:35 -06:00
+								- [x] [#017] **Chapters** — sorted markers auto-form chapters. "Copy as YouTube timestamps" button exports `MM:SS Label` format to clipboard. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												verified hotkeys

											
										
										
											2026-05-05 10:22:35 -06:00
+								- [x] [#041] **Customizable hotkeys / keymap editor** — two presets (Standard: J/K/L/I/O/arrows; Left-hand: Q/W/E/A/S/D/F). Settings panel shows all bindings with click-to-remap, conflict detection, per-key reset to default. Cheatsheet (press `?`) shows current bindings. (2026-05-04)
-												improved zone handling

											
										
										
											2026-04-15 18:00:34 -06:00
-												verified hotkeys

											
										
										
											2026-05-05 10:22:35 -06:00
+								- [x] [#022] **Clip thumbnail strip** — frontend-side canvas capture from the `<video>` element. Toggle "Thumbnails" button above waveform. Extracts frames at 10s intervals, clickable to seek. Zero backend dependency. (2026-05-04)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								---
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								## 🟢 Lower Impact — Expansion and advanced scope
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												implemented the lower priority features; haven't tested them

											
										
										
											2026-05-05 20:46:55 -06:00
+								- [x] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: FFmpeg crop/scale post-process. Frontend: sliders in Export dialog. (2026-05-05)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												implemented the lower priority features; haven't tested them

											
										
										
											2026-05-05 20:46:55 -06:00
+								- [x] [#021] **Multi-clip / append** — load additional video clips via Append Clip panel and concatenate during export. Uses FFmpeg concat demuxer. (2026-05-05)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												implemented the lower priority features; haven't tested them

											
										
										
											2026-05-05 20:46:55 -06:00
+								- [x] [#019] **Background music track** — a second audio track for background music with volume ducking. Uses FFmpeg amix + sidechaincompress for auto-ducking. Configurable in Background Music panel. (2026-05-05)
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
-												trying to fix export issue and waveform load

											
										
										
											2026-04-15 21:51:05 -06:00
+								---
 								## ✅ Completed high-impact foundation
 								- [x] [#001] **Cut / Mute sections**
 								- [x] [#002] **Silence / pause trimmer**
 								- [x] [#003] **Operation-level undo for batch actions**
 								- [x] [#004] **Grouped silence-trim zones (editable batch)**
 								- [x] [#005] **Edit silence-trim group settings after apply**
 								- [x] [#006] **Volume / gain control**
 								- [x] [#007] **Speed adjustment (4th zone type)**
 								- [x] [#008] **Cut preview**
 								- [x] [#009] **Timeline shows output length**
 								- [x] [#010] **Transcript search (Ctrl+F)**
 								- [x] [#011] **Mark In / Out + delete (I / O keys)**
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
 								---
-												implemented the lower priority features; haven't tested them

											
										
										
											2026-05-05 20:46:55 -06:00
+								- [x] [#042] **Background removal** — MediaPipe Selfie Segmentation + FFmpeg frame processing for person/background separation. Configurable replacement: blur, solid color, or custom image. Applied during export. Falls back to FFmpeg colorkey when MediaPipe unavailable. (2026-05-05)
-												clean up of features

											
										
										
											2026-05-05 23:31:18 -06:00
+								## 🔮 Future — AI-powered editing & resource library
 								All AI features use the existing Ollama/OpenAI/Claude provider config — no new auth or setup needed.
 								- [ ] [#043] **AI Smart Clean** — one-click chain: filler removal + silence trim + noise reduction + loudness normalization in a single pass. `POST /ai/smart-clean` calls existing services sequentially.
 								- [ ] [#044] **AI Transcript Summarization** — generate bullet-point summary from transcript. `POST /ai/summarize`. AIPanel new tab.
 								- [ ] [#045] **AI Sentence Rephrase** — right-click word/sentence in transcript → "Rephrase with AI" → see 3 alternatives → click to replace. `POST /ai/rephrase`. TranscriptEditor context menu.
 								- [ ] [#046] **AI Smart Speed** — detect slow/low-energy sections → mark as suggested SpeedRange segments. `POST /ai/smart-speed`. Preview in AIPanel.
 								- [ ] [#047] **AI Auto-Chapters** — detect topic shifts in transcript → create TimelineMarkers automatically. `POST /ai/chapters`.
 								- [ ] [#048] **AI Show Notes** — generate title, description, soundbites, keywords from transcript + markers. `POST /ai/show-notes`. Copy to clipboard or save to file.
 								- [ ] [#049] **AI Find Fluff** — AI marks rambles, intros, off-topic chatter for deletion. Extends existing filler detection. `POST /ai/find-fluff`. AIPanel tab showing suggested cut ranges.
 								- [ ] [#050] **AI Smooth Cuts** — remove jump cuts between deleted segments using crossfade/blend during re-encode. Export option toggle.
 								- [ ] [#051] **AI B-roll** — generate footage from a text prompt to fill visual gaps in the timeline. Uses local SD or API. New "B-roll" section in AIPanel.
 								- [ ] [#052] **Smart Layouts** — auto-switch video layout between speakers based on who's talking. Detects active speaker from diarization + volume, applies crop/pad to focus on current speaker during export.
 								- [ ] [#053] **Per-track audio levels** — individual gain per speaker track. Extend `GainRange` model with `track_id`, apply per-stream via FFmpeg.
 								- [ ] [#054] **Intro/Outro templates** — save segment ranges as reusable templates, apply with one click on export.
 								- [ ] [#055] **Built-in free music library** — 5–10 CC0/royalty-free short loops shipped in `frontend/public/resources/music/`. BackgroundMusicPanel gets a "Built-in" tab with play/preview.
 								- [ ] [#056] **Stock media browser** — new `MediaLibraryPanel` that browses local `resources/media/` for images, video, audio with thumbnails. Frontend-only via Tauri `readDir`. Drag-to-add for bg removal images, append clips, or music.
 								- [ ] [#057] **Sample content downloader** — "Get Sample Video" button on empty state downloads a short public-domain test video + pre-made transcription JSON for trying the app without your own media.
 								---
-												added distil models

											
										
										
											2026-04-03 10:25:48 -06:00
+								## 💡 TalkEdit competitive advantages to lean into
 								These aren't features to build — they're things to make more visible in the UI and README:
 								- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
 								- **Local AI models** — Ollama support means no API costs and no data leaving the device.
 								- **Word-level precision** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
 								- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.
 								---
 								## ✅ Already Implemented
-												volume panel; copilot instructions

											
										
										
											2026-04-15 16:10:35 -06:00
+								- [#025] Word-level transcript editing (select, drag, shift-click, delete)
 								- [#026] Ctrl+click word → seek timeline to that position
 								- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
 								- [#028] Auto-scroll waveform when playhead goes off-screen
 								- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
 								- [#030] AI clip suggestions for social media
 								- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
-												features update

											
										
										
											2026-05-04 19:01:11 -06:00
+								- [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
-												volume panel; copilot instructions

											
										
										
											2026-04-15 16:10:35 -06:00
+								- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
 								- [#034] Speaker diarization
 								- [#035] Project save / load (.aive JSON format)
 								- [#036] Undo / redo (100-level history via Zundo)
 								- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
 								- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
 								- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
 								- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove