features

2026-05-06 16:47:54 -06:00
parent 813877a7b4
commit 2212d7b265
1 changed files with 175 additions and 182 deletions
--- a/FEATURES.md
+++ b/FEATURES.md
@ -1,188 +1,181 @@
-# TalkEdit — Feature Roadmap
+# TalkEdit — Features & Roadmap
-Features are grouped by priority. Check off items as they are implemented.
+**Niche:** "Descript for long-form content" — works on hour+ files without degrading, fully offline, one-time payment.
 ---
 ## 🔴 Highest Impact Next — Conversion and retention
 - [x] [#015] **Word text correction** — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)
 - [x] [#013] **Re-transcribe selection** — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)
 - [x] [#012] **Low-confidence word highlighting** — words with `confidence < 0.6` (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)
 - [x] [#018] **Audio normalization / loudness targeting** — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg `loudnorm` in the audio filter chain. No intermediate files. (2026-05-04)
 - [x] [#024] **Export to transcript text / SRT only** — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses `POST /export/transcript` backend endpoint. Respects word cuts. (2026-05-04)
 - [x] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Implemented by `SilenceTrimmerPanel` + `POST /audio/detect-silence` (FFmpeg silencedetect).
 ---
 ## 🟡 Medium Impact — Workflow completeness
 - [x] [#016] **Named timeline markers** — colored marker pins on the waveform canvas. Add at current playback position with label/color picker in Markers panel. Editable labels, deletable. Persisted in project file. (2026-05-04)
 - [x] [#017] **Chapters** — sorted markers auto-form chapters. "Copy as YouTube timestamps" button exports `MM:SS Label` format to clipboard. (2026-05-04)
 - [x] [#041] **Customizable hotkeys / keymap editor** — two presets (Standard: J/K/L/I/O/arrows; Left-hand: Q/W/E/A/S/D/F). Settings panel shows all bindings with click-to-remap, conflict detection, per-key reset to default. Cheatsheet (press `?`) shows current bindings. (2026-05-04)
 - [x] [#022] **Clip thumbnail strip** — frontend-side canvas capture from the `<video>` element. Toggle "Thumbnails" button above waveform. Extracts frames at 10s intervals, clickable to seek. Zero backend dependency. (2026-05-04)
 ---
 ## 🟢 Lower Impact — Expansion and advanced scope
 - [x] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: FFmpeg crop/scale post-process. Frontend: sliders in Export dialog. (2026-05-05)
 - [x] [#021] **Multi-clip / append** — load additional video clips via Append Clip panel and concatenate during export. Uses FFmpeg concat demuxer. (2026-05-05)
 - [x] [#019] **Background music track** — a second audio track for background music with volume ducking. Uses FFmpeg amix + sidechaincompress for auto-ducking. Configurable in Background Music panel. (2026-05-05)
 - [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
 ---
 ## ✅ Completed high-impact foundation
 - [x] [#001] **Cut / Mute sections**
 - [x] [#002] **Silence / pause trimmer**
 - [x] [#003] **Operation-level undo for batch actions**
 - [x] [#004] **Grouped silence-trim zones (editable batch)**
 - [x] [#005] **Edit silence-trim group settings after apply**
 - [x] [#006] **Volume / gain control**
 - [x] [#007] **Speed adjustment (4th zone type)**
 - [x] [#008] **Cut preview**
 - [x] [#009] **Timeline shows output length**
 - [x] [#010] **Transcript search (Ctrl+F)**
 - [x] [#011] **Mark In / Out + delete (I / O keys)**
 ---
 - [x] [#042] **Background removal** — MediaPipe Selfie Segmentation + FFmpeg frame processing for person/background separation. Configurable replacement: blur, solid color, or custom image. Applied during export. Falls back to FFmpeg colorkey when MediaPipe unavailable. (2026-05-05)
 ## 🔮 Future — AI-powered editing & resource library
 All AI features use the existing Ollama/OpenAI/Claude provider config — no new auth or setup needed.
 - [ ] [#043] **AI Smart Clean** — one-click chain: filler removal + silence trim + noise reduction + loudness normalization in a single pass. `POST /ai/smart-clean` calls existing services sequentially.
 - [ ] [#044] **AI Transcript Summarization** — generate bullet-point summary from transcript. `POST /ai/summarize`. AIPanel new tab.
 - [ ] [#045] **AI Sentence Rephrase** — right-click word/sentence in transcript → "Rephrase with AI" → see 3 alternatives → click to replace. `POST /ai/rephrase`. TranscriptEditor context menu.
 - [ ] [#046] **AI Smart Speed** — detect slow/low-energy sections → mark as suggested SpeedRange segments. `POST /ai/smart-speed`. Preview in AIPanel.
 - [ ] [#047] **AI Auto-Chapters** — detect topic shifts in transcript → create TimelineMarkers automatically. `POST /ai/chapters`.
 - [ ] [#048] **AI Show Notes** — generate title, description, soundbites, keywords from transcript + markers. `POST /ai/show-notes`. Copy to clipboard or save to file.
 - [ ] [#049] **AI Find Fluff** — AI marks rambles, intros, off-topic chatter for deletion. Extends existing filler detection. `POST /ai/find-fluff`. AIPanel tab showing suggested cut ranges.
 - [ ] [#050] **AI Smooth Cuts** — remove jump cuts between deleted segments using crossfade/blend during re-encode. Export option toggle.
 - [ ] [#051] **AI B-roll** — generate footage from a text prompt to fill visual gaps in the timeline. Uses local SD or API. New "B-roll" section in AIPanel.
 - [ ] [#052] **Smart Layouts** — auto-switch video layout between speakers based on who's talking. Detects active speaker from diarization + volume, applies crop/pad to focus on current speaker during export.
 - [ ] [#053] **Per-track audio levels** — individual gain per speaker track. Extend `GainRange` model with `track_id`, apply per-stream via FFmpeg.
 - [ ] [#054] **Intro/Outro templates** — save segment ranges as reusable templates, apply with one click on export.
 - [ ] [#055] **Built-in free music library** — 5–10 CC0/royalty-free short loops shipped in `frontend/public/resources/music/`. BackgroundMusicPanel gets a "Built-in" tab with play/preview.
 - [ ] [#056] **Stock media browser** — new `MediaLibraryPanel` that browses local `resources/media/` for images, video, audio with thumbnails. Frontend-only via Tauri `readDir`. Drag-to-add for bg removal images, append clips, or music.
 - [ ] [#057] **Sample content downloader** — "Get Sample Video" button on empty state downloads a short public-domain test video + pre-made transcription JSON for trying the app without your own media.
 ---
 ## 🎬 OpenShot-inspired features
 Features adapted from OpenShot Video Editor that would fill capability gaps.
 ### High-value
 - [ ] **Keyframe animations** — animate clip position, scale, opacity, and effects over time. Quadratic bezier, linear, or constant interpolation.
 - [ ] **Unlimited tracks / layers** — overlay images, watermarks, background videos. Any transparency shows through to layer below.
 - [ ] **Video transitions** — crossfade, wipe, dissolve between clips. Over 400 transition types. Auto-create transition when overlapping two clips.
 - [ ] **Title / text overlays** — built-in title editor with 40+ vector templates. SVG-based. Adjust font, color, size, position.
 - [ ] **Digital video effects** — brightness, contrast, hue, greyscale, saturation, chroma key (bluescreen/greenscreen). Drag onto clip, animate properties.
 - [ ] **Chroma key / greenscreen** — wire up existing `background_removal.rs` as a per-clip effect with color picker + tolerance.
 ### Medium-value
 - [ ] **Time-mapping and speed ramps** — animate speed up/down within a clip using keyframes (vs current constant-speed zones).
 - [ ] **Audio split & per-channel mixing** — split audio from video clip, adjust each channel independently.
 - [ ] **Frame-accurate stepping** — arrow keys to step frame by frame through the video.
 - [ ] **Clip trimming on timeline** — drag clip edges to trim in/out points directly on the waveform timeline.
 - [ ] **Snapping** — magnetic snap to markers, clip edges, playhead, zone boundaries.
 ### Lower priority (partial overlap with existing features)
 - [ ] **Cross-platform project files** — save and open `.aive` projects across OSes (already works via JSON).
 - [ ] **Desktop drag-and-drop** — drag video/audio/image files from file manager directly onto the timeline or welcome screen.
 - [ ] **Multi-format export presets** — common format presets in export dialog (YouTube, Twitter/Instagram, Podcast).
 ---
 ## 💡 TalkEdit competitive advantages to lean into
 These aren't features to build — they're things to make more visible in the UI and README:
 - **7-day free trial (no CC required)** — full feature access for 7 days with no credit card. Try everything risk-free.
 - **One-time purchase, no subscription** — pay once, own forever. No recurring fees, no cloud dependency.
 - **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
 - **Local AI models** — Ollama support means no API costs and no data leaving the device.
 - **Word-level precision editing** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor. Double-click any word to correct its text in-place.
 - **Per-segment re-transcription** — select any word range and re-run Whisper on just that segment instead of re-transcribing the entire file.
 - **Auto-ducking background music** — add a second audio track that automatically lowers when speech is detected, no manual keyframing needed.
 - **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.
 ---
 ## 🚫 Explicitly deferred — multi-track timeline
 TalkEdit deliberately avoids becoming a general-purpose NLE. Multi-track compositing (z-order, per-track opacity, keyframe animation, nested sequences) would take months and you'd still lose to DaVinci Resolve (free), CapCut (free), and OpenShot (free).
 **TalkEdit's advantage is that it isn't a timeline editor** — the text-is-the-timeline model makes spoken-word editing drastically faster than dragging razor cuts around.
 What exists and what's planned:
 - One audio track (main) + optional background music track (done)
 - Clip concatenation (append clips end-to-end) (done)
 - Potential future: **B-roll overlay** — a single up/down overlay track for inserting images or cutaway clips at specific timestamps. Useful for talking-head content (screenshots, charts, reaction clips) without a full multi-track system. The zone model already supports this — you add an overlay zone type with a file reference.
 Everything beyond that (picture-in-picture, multi-layer compositing, per-layer keyframing) should stay deferred. If the market demands it later, it's easier to add than to remove complexity.
 ---
 ## ✅ Already Implemented
- [#025] Word-level transcript editing (select, drag, shift-click, delete)
+### Core editing
- [#026] Ctrl+click word → seek timeline to that position
+- [x] [#001] **Cut / Mute sections** — remove or silence segments from output
- [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
+- [x] [#002] **Silence / pause trimmer** — batch detect and remove silent pauses
- [#028] Auto-scroll waveform when playhead goes off-screen
+- [x] [#006] **Volume / gain control** — per-zone and global gain adjustment
- [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
+- [x] [#007] **Speed adjustment** — per-zone playback speed changes (0.25x–4x)
- [#030] AI clip suggestions for social media
+- [x] [#008] **Cut preview** — preview zones before export with configurable padding
- [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
+- [x] [#009] **Timeline shows output length** — adjusted timeline with cut compression
- [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
+- [x] [#011] **Mark In / Out** — I/O keys to set selection range on timeline
- [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
+
- [#034] Speaker diarization
+### Transcript
- [#035] Project save / load (.aive JSON format)
+- [x] [#010] **Transcript search (Ctrl+F)** — find words, navigate matches
- [#036] Undo / redo (100-level history via Zundo)
+- [x] [#012] **Low-confidence word highlighting** — orange dotted underline with confidence %
- [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
+- [x] [#013] **Re-transcribe selection** — re-run Whisper on a selected word range
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
+- [x] [#015] **Word text correction** — double-click any word to edit text in-place
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
+- [x] [#016] **Named timeline markers** — colored pins with labels, editable
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
+- [x] [#017] **Chapters** — auto-form from markers, copy as YouTube timestamps
- [#058] Help panel with full feature documentation and searchable index
+- [x] [#025] Word-level transcript editing (click, shift+click, drag select)
- [#059] First-run welcome overlay with quick-start guides
+- [x] [#026] Ctrl+click word → seek video to that timestamp
- [#060] Auto-save crash recovery — periodic project snapshots, restore on next launch
+- [x] [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub
- [#061] Error boundary + global error logging to `~/.TalkEdit/logs/`
+- [x] [#028] Auto-scroll waveform when playhead goes off-screen
- [#062] Store input validation — Zustand middleware guards against invalid state
+
- [#063] Trial system — 7-day free trial, sentinel file with integrity check, no CC required
+### AI features
- [#064] License activation — email confirmation flow, offline validation
+- [x] [#029] **AI filler word detection** — find and remove "um", "uh", "like" etc.
- [#065] Model management — view/delete downloaded Whisper and DeepFilterNet models
+- [x] [#030] **AI clip suggestions** — find best 20-60s segments for social media
- [#066] Keyboard cheatsheet — press `?` overlay with close button and active preset indicator
+- [x] [#031] **Noise reduction** — DeepFilterNet or FFmpeg ANLMDN
- [#067] Visual toolbar — grouped icon buttons with section dividers
+- [x] [#034] **Speaker diarization** — label speakers in transcript
- [#068] Backend health check — reconnecting banner when backend is unreachable
+- [x] [#042] **Background removal** — MediaPipe segmentation, blur/color/image replacement
 ### Export
 - [x] [#018] **Audio loudness normalization** — LUFS targets (-14 YouTube, -16 Spotify, -23 Broadcast)
 - [x] [#019] **Background music** — auto-ducking via FFmpeg sidechain compress
 - [x] [#020] **Video zoom / punch-in** — crop, zoom, pan during export
 - [x] [#021] **Multi-clip / append** — concatenate multiple video files
 - [x] [#024] **Export transcript** — plain text or SRT without video
 - [x] [#032] **Export** — fast stream-copy or full re-encode (MP4/MOV/WebM/WAV, 720p–4K)
 - [x] [#033] **Captions** — SRT, VTT, ASS burn-in with font/color/position options
 ### Project & state
 - [x] [#003] **Undo / redo** — 100-level history via Zundo
 - [x] [#004] **Grouped silence-trim zones** — editable batch groups
 - [x] [#005] **Edit silence-trim group** settings after applying
 - [x] [#022] **Clip thumbnail strip** — canvas capture from video, clickable
 - [x] [#035] **Project save / load** — .aive JSON format
 - [x] [#037] **Multi-format input** — MP4, MKV, MOV, AVI, WebM, M4A
 - [x] [#038] **Keyboard shortcuts** — Space, J/K/L, arrows, Ctrl+Z/S/E, ?
 - [x] [#039] **Settings panel** — AI provider config (Ollama, OpenAI, Claude)
 - [x] [#040] **Zone creation on timeline** — draggable edits, Delete to remove
 - [x] [#041] **Customizable hotkeys** — two presets, click-to-remap, conflict detection
 - [x] **[M] Manage Models** — view/delete downloaded Whisper and LLM files
 - [x] **[M] Keyboard cheatsheet** — `?` overlay with close button, preset indicator
 - [x] **[M] Visual toolbar** — grouped buttons with section dividers
 - [x] **[M] Help panel** — full feature documentation in sidebar
 - [x] **[M] First-run welcome overlay** — 3-step quick-start guide
 - [x] **[M] Responsive welcome screen** — animated audio bars, model picker
 - [x] **[M] Error boundary** — catches React crashes, shows fallback + reload
 - [x] **[M] Global error logging** — uncaught errors logged to Rust backend
 - [x] **[M] Store input validation** — NaN rejection, bounds clamping, min zone duration
 - [x] **[M] Runtime assertions** — dev-mode guards in critical paths
 - [x] **[M] Backend health check** — polls every 30s, shows reconnecting banner
 ### Licensing
 - [x] **[L] 7-day free trial** — no credit card required
 - [x] **[L] License activation** — email confirmation step to deter key sharing
 - [x] **[L] Ed25519-signed license keys** — offline verification
 - [x] **[L] Trial integrity** — sentinel file prevents delete-and-reset, XOR checksum deters timestamp editing
 - [x] **[L] canEdit gate** — defaults to locked, only unlocks after verified status
 - [x] **[L] Expired state** — export and loading still work, editing and AI locked
 ### Robustness
 - [x] **[R] Auto-save crash recovery** — every 60s, restore prompt on next launch
 - [x] **[R] Bad project state recovery** — auto-prunes invalid zones on load
 - [x] **[R] Zone/marker deletion confirmations** — prevents accidental removals
 - [x] **[R] Progress bars** — export (determinate), transcription (indeterminate)
 - [x] **[R] Loading spinners** — waveform, AI processing
 - [x] **[R] Error states with retry** — AIPanel, WaveformTimeline
 - [x] **[R] Empty states** — MarkersPanel, AIPanel, ZoneEditor
 - [x] **[R] Canvas zone handles enlarged** — radius 6px, hit area increased
 - [x] **[R] Search match contrast** — thicker rings, higher opacity
 - [x] **[R] Split panes keyboard-accessible** — arrow keys, tabIndex, ARIA
 ### Testing
 - [x] **95 frontend tests** — editorStore (68), licenseStore (22), aiStore (15), assert (4)
 - [x] **12 Rust tests** — licensing (7), models (5)
 - [x] **CI pipeline** — GitHub Actions (Rust: test+clippy, Frontend: tsc+vitest, Python: pytest)
 ---
 ## 🔴 What's Next — highest impact
 - [ ] **[LLM] Bundled Qwen3 LLM** — auto-download on first AI use, no API keys needed. Replace Python `ai_provider.py` with llama.cpp Rust bindings. Two sizes: 4B (2.5GB, 8GB+ RAM) and 1.7B (1GB, 4GB+ RAM)
 - [ ] **[SHORTS] Smart Shorts finder** — scan transcript for self-contained 10–90s segments, ranked by engagement. One-click export as separate clips
 - [ ] **[PAYMENT] Wire checkout** — payment page at talked.it, Stripe → license key generation → delivery email
 - [ ] **[BETA] Beta testers** — give 5–10 podcasters free licenses in exchange for feedback
 - [ ] **[BUILD] Production builds** — `cargo tauri build` for Windows, macOS, Linux
 ---
 ## 🟡 Medium impact — AI features
 - [ ] [#044] **AI Transcript Summarization** — bullet-point summary from transcript
 - [ ] [#045] **AI Sentence Rephrase** — right-click word → see alternatives → replace
 - [ ] [#046] **AI Smart Speed** — detect slow sections → suggest speed adjustments
 - [ ] [#047] **AI Auto-Chapters** — topic detection from transcript → markers
 - [ ] [#048] **AI Show Notes** — title, description, keywords, timestamps
 - [ ] [#049] **AI Find Fluff** — detect rambles, off-topic chatter
 - [ ] [#050] **AI Smooth Cuts** — crossfade between deleted segments
 ---
 ## 🟢 Lower impact — expansion
 - [ ] **Project stitching** — load multiple .aive projects into one export
 - [ ] **Batch export** — multiple projects/cuts in sequence
 - [ ] **Smart chunking** — overlapping chunks for files >2hr
 - [ ] [#014] Alternate transcription backend (VibeVoice-ASR-HF)
 - [ ] [#051] **AI B-roll** — generate footage from text prompt
 - [ ] [#052] **Smart Layouts** — auto-switch speakers in video frame
 - [ ] [#053] **Per-track audio levels** — gain per speaker track
 - [ ] [#054] **Intro/Outro templates** — reusable segment presets
 - [ ] [#055] **Built-in free music library** — CC0 loops shipped with app
 - [ ] [#056] **Stock media browser** — browse local resources/media/
 - [ ] [#057] **Sample content downloader** — test video with pre-made transcript
 ---
 ## 🎬 OpenShot-inspired (long-term)
 - [ ] Keyframe animations — clip position, scale, opacity over time
 - [ ] Video transitions — crossfade, wipe between clips
 - [ ] Title / text overlays — SVG templates, adjustable font/color
 - [ ] Chroma key / greenscreen — per-clip effect
 - [ ] Speed ramps — animate speed within a clip
 - [ ] Frame-accurate stepping — arrow keys frame by frame
 - [ ] Clip trimming on timeline — drag edges to trim
 - [ ] Snapping — magnetic snap to markers and edges
 ---
 ## 💡 Competitive advantages
 - **7-day free trial (no CC)** — full features, no risk
 - **One-time purchase** — $39 Pro, $79 Business, no subscription
 - **100% offline** — no account, no cloud, no data leaves your machine
 - **Local AI** — filler detection, clip suggestions, Smart Clean work offline
 - **Word-level precision** — edit video by deleting words, not razor cuts
 - **Per-segment re-transcription** — fix transcription errors on just the bad part
 - **Auto-ducking background music** — music lowers when speech detected, no keyframing
 - **Works on long files** — virtualized transcript + chunked waveform handles 1hr+
 ---
 ## 🚫 Explicitly deferred
 - Cloud sync / collaboration
 - Voice cloning / TTS
 - Full multi-track NLE (compositing, keyframes, nested sequences)
 - Mobile app
 - Subscription model
 - Image/video generation models
 TalkEdit's advantage is that it isn't a timeline editor — the text-is-the-timeline model makes spoken-word editing drastically faster than dragging razor cuts.
 ---
 ## 📦 Launch checklist
 - [ ] Landing page at talked.it (features, screenshots, pricing, downloads)
 - [ ] Demo video (3–5 min walkthrough)
 - [ ] Product Hunt listing + 50 free licenses
 - [ ] r/podcasting, r/VideoEditing, r/selfhosted posts
 - [ ] Hacker News "Show HN"
 - [ ] GitHub v1.0.0 release with Windows/macOS/Linux binaries
 - [ ] Compare page: TalkEdit vs Descript