# TalkEdit — Feature Roadmap
Features are grouped by priority. Check off items as they are implemented.

---
## 🔴 Highest Impact Next — Conversion and retention
- [ ] [#015] **Word text correction** — allow editing the transcript text of a word without affecting its timing. Whisper gets homophones/proper nouns wrong constantly. Pure frontend state change; no backend needed.
- [ ] [#013] **Re-transcribe selection** — if Whisper gets a section wrong, let the user select a word range and re-run transcription on just that segment (optionally with a different model or language).
- [ ] [#012] **Low-confidence word highlighting** — WhisperX already returns `confidence` per word. Words below a threshold (e.g. < 0.6) should be visually underlined or tinted so the user knows where to double-check.
- [ ] [#018] **Audio normalization / loudness targeting**: a single "Normalize" button that targets a LUFS level (-14 for YouTube, -16 for Spotify). Backend: `ffmpeg -af loudnorm`. Very high value for podcasters, ~2-3 hours of work.
- [ ] [#024] **Export to transcript text / SRT only**: some users just want a clean `.txt` or `.srt` of the edited transcript without rendering video.
- [ ] [#023] **Batch silence removal**: full-file scan that removes all pauses above a threshold in one click. Distinct from the existing silence / pause trimmer (#002); this is a "fix the whole file" operation.
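
For #024, the SRT export could be pure frontend logic. The sketch below is one possible approach, not TalkEdit's actual code: the `Word` shape, the `wordsToSrt` name, and the fixed words-per-cue grouping are all assumptions, and it presumes word times have already been mapped onto the output timeline (after cuts and speed zones).

```typescript
// Hypothetical word shape; the real transcript model may differ.
interface Word {
  text: string;
  start: number; // seconds on the output timeline (assumed)
  end: number;   // seconds on the output timeline (assumed)
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Group words into cues of at most maxWords words and emit SRT text.
// A real exporter would probably also break cues on pauses and punctuation.
function wordsToSrt(words: Word[], maxWords: number = 8): string {
  const cues: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    const chunk = words.slice(i, i + maxWords);
    const start = formatSrtTime(chunk[0].start);
    const end = formatSrtTime(chunk[chunk.length - 1].end);
    const text = chunk.map((w) => w.text).join(" ");
    cues.push(`${cues.length + 1}\n${start} --> ${end}\n${text}`);
  }
  return cues.length > 0 ? cues.join("\n\n") + "\n" : "";
}
```
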
---
## 🟡 Medium Impact — Workflow completeness
- [ ] [#016] **Named timeline markers**: drop named marker pins on the waveform (like Resolve markers). Store as `{ id, time, label, color }` in the project. Rendered as colored triangles on the timeline canvas.
- [ ] [#017] **Chapters**: group markers into named chapter ranges. Useful for podcasts and lectures. Exportable as YouTube chapter timestamps for the video description.
- [ ] [#041] **Customizable hotkeys / keymap editor (left-hand focused)**: allow users to view, remap, and reset keyboard shortcuts (transport, edit, save/export, zone tools), with a default preset optimized for left-hand reach (Q/W/E/R/A/S/D/F/Z/X/C/V + modifiers). Include conflict detection, an alternate standard preset, and one-click "restore defaults".
- [ ] [#022] **Clip thumbnail strip**: video frame thumbnails along the timeline so users can navigate visually, not only by waveform. Backend: `ffmpeg` thumbnail extraction at regular intervals.
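
The marker shape from #016 and the chapter export from #017 could fit together roughly as follows. This is a minimal sketch under assumptions: the field types and the `markersToYouTubeChapters` name are hypothetical, `time` is taken to be seconds on the output timeline, and each marker is treated as the start of a chapter. YouTube expects the first chapter to start at `0:00`, which the caller would need to ensure.

```typescript
// Marker fields follow the `{ id, time, label, color }` shape above;
// the concrete types are assumptions.
interface Marker {
  id: string;
  time: number;  // seconds on the output timeline (assumed)
  label: string;
  color: string; // e.g. "#ff0000"
}

// Format seconds as M:SS, or H:MM:SS once past one hour.
function formatChapterTime(seconds: number): string {
  const total = Math.max(0, Math.floor(seconds));
  const h = Math.floor(total / 3600);
  const m = Math.floor(total / 60) % 60;
  const s = total % 60;
  const ss = s < 10 ? "0" + s : String(s);
  if (h > 0) {
    const mm = m < 10 ? "0" + m : String(m);
    return `${h}:${mm}:${ss}`;
  }
  return `${m}:${ss}`;
}

// Sort markers by time and emit one "timestamp label" line per chapter,
// ready to paste into a video description.
function markersToYouTubeChapters(markers: Marker[]): string {
  const sorted = markers.slice().sort((a, b) => a.time - b.time);
  return sorted
    .map((m) => `${formatChapterTime(m.time)} ${m.label}`)
    .join("\n");
}
```
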
---
## 🟢 Lower Impact — Expansion and advanced scope
- [ ] [#020] **Video zoom / punch-in**: scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: `ffmpeg -vf crop/scale/zoompan`.
- [ ] [#021] **Multi-clip / append**: load a second video and append it to the timeline. Even without a full multi-track timeline, "append clip" is a heavily used workflow.
- [ ] [#019] **Background music track**: a second audio track for background music with volume ducking. Major gap in Descript that TalkEdit could own. Backend: `ffmpeg` `amix` + `asendcmd` for auto-ducking.
- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)**: evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
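
For #020, a static punch-in can be expressed as a `crop` followed by a `scale` back to the source resolution. The sketch below builds such a `-vf` filter string; the `punchInFilter` name, the normalized center coordinates, and rounding crop dimensions down to even numbers are assumptions for illustration, and animated zooms (`zoompan`) are out of scope here.

```typescript
// Build an ffmpeg -vf string that crops a zoomed-in region and scales it
// back to the source size. zoom = 2 shows half the width and height.
// centerX / centerY are normalized (0..1) positions of the punch-in center.
function punchInFilter(
  srcW: number,
  srcH: number,
  zoom: number,
  centerX: number = 0.5,
  centerY: number = 0.5
): string {
  if (zoom < 1) throw new Error("zoom must be >= 1");

  // Round down to even dimensions, which many codecs and pixel formats expect.
  const even = (n: number) => Math.max(2, Math.floor(n / 2) * 2);
  const cropW = even(srcW / zoom);
  const cropH = even(srcH / zoom);

  // Keep the crop rectangle fully inside the source frame.
  const clamp = (v: number, lo: number, hi: number) =>
    Math.min(hi, Math.max(lo, v));
  const x = Math.round(clamp(centerX * srcW - cropW / 2, 0, srcW - cropW));
  const y = Math.round(clamp(centerY * srcH - cropH / 2, 0, srcH - cropH));

  return `crop=${cropW}:${cropH}:${x}:${y},scale=${srcW}:${srcH}`;
}

// Example: a centered 2x punch-in on 1080p footage, passed as
//   ffmpeg -i input.mp4 -vf "<filter>" output.mp4
const centered = punchInFilter(1920, 1080, 2);
```
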
---
## ✅ Completed high-impact foundation
- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**
---
## 💡 TalkEdit competitive advantages to lean into
These aren't features to build; they're things to make more visible in the UI and README:
- **100% offline / no account required**: CapCut requires login and sends data to servers. Descript is cloud-first. With TalkEdit, your media never leaves the machine.
- **Local AI models**: Ollama support means no API costs and no data leaving the device.
- **Word-level precision**: editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
- **Works on long files**: virtualized transcript + chunked waveform handles 1 hr+ content that bogs down CapCut.
---
## ✅ Already Implemented
- [#025] Word-level transcript editing (select, drag, shift-click, delete)
- [#026] Ctrl+click a word to seek the timeline to that position
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
- [#028] Auto-scroll waveform when playhead goes off-screen
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
- [#030] AI clip suggestions for social media
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
- [#032] Export: fast stream-copy or full re-encode (MP4/MOV/WebM, 720p/1080p/4K)
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
- [#034] Speaker diarization
- [#035] Project save / load (.aive JSON format)
- [#036] Undo / redo (100-level history via Zundo)
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove