Files
TalkEdit/FEATURES.md
2026-05-06 16:15:38 -06:00

189 lines
13 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# TalkEdit — Feature Roadmap
Features are grouped by priority. Check off items as they are implemented.
---
## 🔴 Highest Impact Next — Conversion and retention
- [x] [#015] **Word text correction** — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)
- [x] [#013] **Re-transcribe selection** — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)
- [x] [#012] **Low-confidence word highlighting** — words with `confidence < 0.6` (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)
- [x] [#018] **Audio normalization / loudness targeting** — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg `loudnorm` in the audio filter chain. No intermediate files. (2026-05-04)
- [x] [#024] **Export to transcript text / SRT only** — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses `POST /export/transcript` backend endpoint. Respects word cuts. (2026-05-04)
- [x] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Implemented by `SilenceTrimmerPanel` + `POST /audio/detect-silence` (FFmpeg silencedetect).
---
## 🟡 Medium Impact — Workflow completeness
- [x] [#016] **Named timeline markers** — colored marker pins on the waveform canvas. Add at current playback position with label/color picker in Markers panel. Editable labels, deletable. Persisted in project file. (2026-05-04)
- [x] [#017] **Chapters** — sorted markers auto-form chapters. "Copy as YouTube timestamps" button exports `MM:SS Label` format to clipboard. (2026-05-04)
- [x] [#041] **Customizable hotkeys / keymap editor** — two presets (Standard: J/K/L/I/O/arrows; Left-hand: Q/W/E/A/S/D/F). Settings panel shows all bindings with click-to-remap, conflict detection, per-key reset to default. Cheatsheet (press `?`) shows current bindings. (2026-05-04)
- [x] [#022] **Clip thumbnail strip** — frontend-side canvas capture from the `<video>` element. Toggle "Thumbnails" button above waveform. Extracts frames at 10s intervals, clickable to seek. Zero backend dependency. (2026-05-04)
---
## 🟢 Lower Impact — Expansion and advanced scope
- [x] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: FFmpeg crop/scale post-process. Frontend: sliders in Export dialog. (2026-05-05)
- [x] [#021] **Multi-clip / append** — load additional video clips via Append Clip panel and concatenate during export. Uses FFmpeg concat demuxer. (2026-05-05)
- [x] [#019] **Background music track** — a second audio track for background music with volume ducking. Uses FFmpeg amix + sidechaincompress for auto-ducking. Configurable in Background Music panel. (2026-05-05)
- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
---
## ✅ Completed high-impact foundation
- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**
---
- [x] [#042] **Background removal** — MediaPipe Selfie Segmentation + FFmpeg frame processing for person/background separation. Configurable replacement: blur, solid color, or custom image. Applied during export. Falls back to FFmpeg colorkey when MediaPipe unavailable. (2026-05-05)
## 🔮 Future — AI-powered editing & resource library
All AI features use the existing Ollama/OpenAI/Claude provider config — no new auth or setup needed.
- [ ] [#043] **AI Smart Clean** — one-click chain: filler removal + silence trim + noise reduction + loudness normalization in a single pass. `POST /ai/smart-clean` calls existing services sequentially.
- [ ] [#044] **AI Transcript Summarization** — generate bullet-point summary from transcript. `POST /ai/summarize`. AIPanel new tab.
- [ ] [#045] **AI Sentence Rephrase** — right-click word/sentence in transcript → "Rephrase with AI" → see 3 alternatives → click to replace. `POST /ai/rephrase`. TranscriptEditor context menu.
- [ ] [#046] **AI Smart Speed** — detect slow/low-energy sections → mark as suggested SpeedRange segments. `POST /ai/smart-speed`. Preview in AIPanel.
- [ ] [#047] **AI Auto-Chapters** — detect topic shifts in transcript → create TimelineMarkers automatically. `POST /ai/chapters`.
- [ ] [#048] **AI Show Notes** — generate title, description, soundbites, keywords from transcript + markers. `POST /ai/show-notes`. Copy to clipboard or save to file.
- [ ] [#049] **AI Find Fluff** — AI marks rambles, intros, off-topic chatter for deletion. Extends existing filler detection. `POST /ai/find-fluff`. AIPanel tab showing suggested cut ranges.
- [ ] [#050] **AI Smooth Cuts** — remove jump cuts between deleted segments using crossfade/blend during re-encode. Export option toggle.
- [ ] [#051] **AI B-roll** — generate footage from a text prompt to fill visual gaps in the timeline. Uses local SD or API. New "B-roll" section in AIPanel.
- [ ] [#052] **Smart Layouts** — auto-switch video layout between speakers based on who's talking. Detects active speaker from diarization + volume, applies crop/pad to focus on current speaker during export.
- [ ] [#053] **Per-track audio levels** — individual gain per speaker track. Extend `GainRange` model with `track_id`, apply per-stream via FFmpeg.
- [ ] [#054] **Intro/Outro templates** — save segment ranges as reusable templates, apply with one click on export.
- [ ] [#055] **Built-in free music library** — 510 CC0/royalty-free short loops shipped in `frontend/public/resources/music/`. BackgroundMusicPanel gets a "Built-in" tab with play/preview.
- [ ] [#056] **Stock media browser** — new `MediaLibraryPanel` that browses local `resources/media/` for images, video, audio with thumbnails. Frontend-only via Tauri `readDir`. Drag-to-add for bg removal images, append clips, or music.
- [ ] [#057] **Sample content downloader** — "Get Sample Video" button on empty state downloads a short public-domain test video + pre-made transcription JSON for trying the app without your own media.
---
## 🎬 OpenShot-inspired features
Features adapted from OpenShot Video Editor that would fill capability gaps.
### High-value
- [ ] **Keyframe animations** — animate clip position, scale, opacity, and effects over time. Quadratic bezier, linear, or constant interpolation.
- [ ] **Unlimited tracks / layers** — overlay images, watermarks, background videos. Any transparency shows through to layer below.
- [ ] **Video transitions** — crossfade, wipe, dissolve between clips. Over 400 transition types. Auto-create transition when overlapping two clips.
- [ ] **Title / text overlays** — built-in title editor with 40+ vector templates. SVG-based. Adjust font, color, size, position.
- [ ] **Digital video effects** — brightness, contrast, hue, greyscale, saturation, chroma key (bluescreen/greenscreen). Drag onto clip, animate properties.
- [ ] **Chroma key / greenscreen** — wire up existing `background_removal.rs` as a per-clip effect with color picker + tolerance.
### Medium-value
- [ ] **Time-mapping and speed ramps** — animate speed up/down within a clip using keyframes (vs current constant-speed zones).
- [ ] **Audio split & per-channel mixing** — split audio from video clip, adjust each channel independently.
- [ ] **Frame-accurate stepping** — arrow keys to step frame by frame through the video.
- [ ] **Clip trimming on timeline** — drag clip edges to trim in/out points directly on the waveform timeline.
- [ ] **Snapping** — magnetic snap to markers, clip edges, playhead, zone boundaries.
### Lower priority (partial overlap with existing features)
- [ ] **Cross-platform project files** — save and open `.aive` projects across OSes (already works via JSON).
- [ ] **Desktop drag-and-drop** — drag video/audio/image files from file manager directly onto the timeline or welcome screen.
- [ ] **Multi-format export presets** — common format presets in export dialog (YouTube, Twitter/Instagram, Podcast).
---
## 💡 TalkEdit competitive advantages to lean into
These aren't features to build — they're things to make more visible in the UI and README:
- **7-day free trial (no CC required)** — full feature access for 7 days with no credit card. Try everything risk-free.
- **One-time purchase, no subscription** — pay once, own forever. No recurring fees, no cloud dependency.
- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
- **Local AI models** — Ollama support means no API costs and no data leaving the device.
- **Word-level precision editing** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor. Double-click any word to correct its text in-place.
- **Per-segment re-transcription** — select any word range and re-run Whisper on just that segment instead of re-transcribing the entire file.
- **Auto-ducking background music** — add a second audio track that automatically lowers when speech is detected, no manual keyframing needed.
- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.
---
## 🚫 Explicitly deferred — multi-track timeline
TalkEdit deliberately avoids becoming a general-purpose NLE. Multi-track compositing (z-order, per-track opacity, keyframe animation, nested sequences) would take months and you'd still lose to DaVinci Resolve (free), CapCut (free), and OpenShot (free).
**TalkEdit's advantage is that it isn't a timeline editor** — the text-is-the-timeline model makes spoken-word editing drastically faster than dragging razor cuts around.
What exists and what's planned:
- One audio track (main) + optional background music track (done)
- Clip concatenation (append clips end-to-end) (done)
- Potential future: **B-roll overlay** — a single up/down overlay track for inserting images or cutaway clips at specific timestamps. Useful for talking-head content (screenshots, charts, reaction clips) without a full multi-track system. The zone model already supports this — you add an overlay zone type with a file reference.
Everything beyond that (picture-in-picture, multi-layer compositing, per-layer keyframing) should stay deferred. If the market demands it later, it's easier to add than to remove complexity.
---
## ✅ Already Implemented
- [#025] Word-level transcript editing (select, drag, shift-click, delete)
- [#026] Ctrl+click word → seek timeline to that position
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
- [#028] Auto-scroll waveform when playhead goes off-screen
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
- [#030] AI clip suggestions for social media
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
- [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
- [#034] Speaker diarization
- [#035] Project save / load (.aive JSON format)
- [#036] Undo / redo (100-level history via Zundo)
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
- [#058] Help panel with full feature documentation and searchable index
- [#059] First-run welcome overlay with quick-start guides
- [#060] Auto-save crash recovery — periodic project snapshots, restore on next launch
- [#061] Error boundary + global error logging to `~/.TalkEdit/logs/`
- [#062] Store input validation — Zustand middleware guards against invalid state
- [#063] Trial system — 7-day free trial, sentinel file with integrity check, no CC required
- [#064] License activation — email confirmation flow, offline validation
- [#065] Model management — view/delete downloaded Whisper and DeepFilterNet models
- [#066] Keyboard cheatsheet — press `?` overlay with close button and active preset indicator
- [#067] Visual toolbar — grouped icon buttons with section dividers
- [#068] Backend health check — reconnecting banner when backend is unreachable