TalkEdit/FEATURES.md
2026-05-05 23:31:18 -06:00

TalkEdit — Feature Roadmap

Features are grouped by priority. Check off items as they are implemented.


🔴 Highest Impact Next — Conversion and retention

  • [#015] Word text correction — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)

  • [#013] Re-transcribe selection — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)

  • [#012] Low-confidence word highlighting — words with confidence < 0.6 (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)

  • [#018] Audio normalization / loudness targeting — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg loudnorm in the audio filter chain. No intermediate files. (2026-05-04)

  • [#024] Export to transcript text / SRT only — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses POST /export/transcript backend endpoint. Respects word cuts. (2026-05-04)

  • [#023] Batch silence removal — full-file scan + remove all pauses above threshold in one click. Implemented by SilenceTrimmerPanel + POST /audio/detect-silence (FFmpeg silencedetect).
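The "offset-adjusted timestamps" step in [#013] can be sketched as follows. This is a minimal illustration, not TalkEdit's actual backend code: the `Word` fields and the helper names `offset_words` and `splice_words` are hypothetical, and the sketch assumes the recognizer returns timestamps relative to the start of the extracted clip.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Word:
    text: str
    start: float       # seconds on the full-file timeline
    end: float
    confidence: float


def offset_words(segment_words: list[Word], segment_start: float) -> list[Word]:
    """Shift clip-relative timestamps back onto the full-file timeline."""
    return [
        replace(
            w,
            start=round(w.start + segment_start, 3),
            end=round(w.end + segment_start, 3),
        )
        for w in segment_words
    ]


def splice_words(words: list[Word], first: int, last: int, fresh: list[Word]) -> list[Word]:
    """Replace words[first..last] (inclusive) with the re-transcribed words."""
    return words[:first] + fresh + words[last + 1:]


# Example: the second word was misrecognized; its clip starts at 12.0 s.
transcript = [
    Word("hello", 0.0, 0.4, 0.9),
    Word("wurld", 12.0, 12.5, 0.3),
    Word("again", 13.0, 13.4, 0.8),
]
# Hypothetical recognizer output, with times relative to the extracted clip.
segment = [Word("world", 0.02, 0.48, 0.95)]
fixed = splice_words(transcript, 1, 1, offset_words(segment, 12.0))
```

Returning new `Word` objects instead of mutating the old ones keeps the previous transcript state intact, which tends to make undo/redo simpler.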


🟡 Medium Impact — Workflow completeness

  • [#016] Named timeline markers — colored marker pins on the waveform canvas. Add at current playback position with label/color picker in Markers panel. Editable labels, deletable. Persisted in project file. (2026-05-04)

  • [#017] Chapters — sorted markers auto-form chapters. "Copy as YouTube timestamps" button exports MM:SS Label format to clipboard. (2026-05-04)

  • [#041] Customizable hotkeys / keymap editor — two presets (Standard: J/K/L/I/O/arrows; Left-hand: Q/W/E/A/S/D/F). Settings panel shows all bindings with click-to-remap, conflict detection, per-key reset to default. Cheatsheet (press ?) shows current bindings. (2026-05-04)

  • [#022] Clip thumbnail strip — frontend-side canvas capture from the <video> element. Toggle "Thumbnails" button above waveform. Extracts frames at 10s intervals, clickable to seek. Zero backend dependency. (2026-05-04)
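The chapter export in [#017] (sorted markers rendered as `MM:SS Label` lines) could look roughly like this. The `Marker` type and both function names are hypothetical, and switching to `H:MM:SS` for positions past one hour is an assumption of this sketch, not something the roadmap states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    time: float   # position in seconds
    label: str


def format_timestamp(seconds: float) -> str:
    """Render MM:SS, or H:MM:SS once the position passes one hour (assumption)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def youtube_chapters(markers: list[Marker]) -> str:
    """Sort markers by time and emit one 'MM:SS Label' line per chapter."""
    ordered = sorted(markers, key=lambda m: m.time)
    return "\n".join(f"{format_timestamp(m.time)} {m.label}" for m in ordered)


chapters = youtube_chapters([Marker(95.4, "Setup"), Marker(0.0, "Intro")])
```

Sorting at export time means markers can be stored in the order they were added and still come out in chapter order.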


🟢 Lower Impact — Expansion and advanced scope

  • [#020] Video zoom / punch-in — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: FFmpeg crop/scale post-process. Frontend: sliders in Export dialog. (2026-05-05)

  • [#021] Multi-clip / append — load additional video clips via Append Clip panel and concatenate during export. Uses FFmpeg concat demuxer. (2026-05-05)

  • [#019] Background music track — a second audio track for background music with volume ducking. Uses FFmpeg amix + sidechaincompress for auto-ducking. Configurable in Background Music panel. (2026-05-05)

  • [#014] Optional VibeVoice-ASR-HF transcription backend (future) — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
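For [#021], FFmpeg's concat demuxer reads a small text file that lists the clips in order. A minimal sketch of building that file and the matching command line is below. The function names are hypothetical, and `-c copy` is an assumption that only works when all clips share the same codecs and stream parameters; TalkEdit's export may re-encode instead.

```python
def concat_list(clips: list[str]) -> str:
    """Build the text of a concat-demuxer list file, one `file '...'` line per clip."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal quote is written as '\'' .
        escaped = clip.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output_path: str) -> list[str]:
    """Argument list for joining the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",
        "-safe", "0",        # accept absolute paths in the list file
        "-i", list_path,
        "-c", "copy",        # assumes matching codecs across clips
        output_path,
    ]


listing = concat_list(["intro.mp4", "it's a take.mp4"])
command = concat_command("clips.txt", "joined.mp4")
```

Passing the command as an argument list (for example to `subprocess.run`) avoids shell quoting problems with paths that contain spaces.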


Completed high-impact foundation

  • [#001] Cut / Mute sections
  • [#002] Silence / pause trimmer
  • [#003] Operation-level undo for batch actions
  • [#004] Grouped silence-trim zones (editable batch)
  • [#005] Edit silence-trim group settings after apply
  • [#006] Volume / gain control
  • [#007] Speed adjustment (4th zone type)
  • [#008] Cut preview
  • [#009] Timeline shows output length
  • [#010] Transcript search (Ctrl+F)
  • [#011] Mark In / Out + delete (I / O keys)

  • [#042] Background removal — MediaPipe Selfie Segmentation + FFmpeg frame processing for person/background separation. Configurable replacement: blur, solid color, or custom image. Applied during export. Falls back to FFmpeg colorkey when MediaPipe unavailable. (2026-05-05)
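The arithmetic behind "Timeline shows output length" ([#009]) can be illustrated with a small sketch: merge overlapping cut ranges, then subtract the removed time from the source duration. The function names are hypothetical. The sketch deliberately ignores speed zones ([#007]), which would also change the output length, and mute zones, which would not.

```python
def merge_ranges(ranges: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping or touching (start, end) ranges."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def output_length(duration: float, cuts: list[tuple[float, float]]) -> float:
    """Source duration minus the total time covered by cut ranges."""
    removed = 0.0
    for start, end in merge_ranges(cuts):
        # Clamp each range to the media bounds before counting it.
        start, end = max(0.0, start), min(duration, end)
        if end > start:
            removed += end - start
    return duration - removed


# Two overlapping cuts plus one that runs past the end of a 60 s file.
remaining = output_length(60.0, [(10.0, 20.0), (15.0, 25.0), (50.0, 70.0)])
```

Merging first prevents overlapping cuts from being counted twice.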

🔮 Future — AI-powered editing & resource library

All AI features use the existing Ollama/OpenAI/Claude provider config — no new auth or setup needed.

  • [#043] AI Smart Clean — one-click chain: filler removal + silence trim + noise reduction + loudness normalization in a single pass. POST /ai/smart-clean calls existing services sequentially.

  • [#044] AI Transcript Summarization — generate bullet-point summary from transcript. POST /ai/summarize. AIPanel new tab.

  • [#045] AI Sentence Rephrase — right-click word/sentence in transcript → "Rephrase with AI" → see 3 alternatives → click to replace. POST /ai/rephrase. TranscriptEditor context menu.

  • [#046] AI Smart Speed — detect slow/low-energy sections → mark as suggested SpeedRange segments. POST /ai/smart-speed. Preview in AIPanel.

  • [#047] AI Auto-Chapters — detect topic shifts in transcript → create TimelineMarkers automatically. POST /ai/chapters.

  • [#048] AI Show Notes — generate title, description, soundbites, keywords from transcript + markers. POST /ai/show-notes. Copy to clipboard or save to file.

  • [#049] AI Find Fluff — AI marks rambles, intros, off-topic chatter for deletion. Extends existing filler detection. POST /ai/find-fluff. AIPanel tab showing suggested cut ranges.

  • [#050] AI Smooth Cuts — remove jump cuts between deleted segments using crossfade/blend during re-encode. Export option toggle.

  • [#051] AI B-roll — generate footage from a text prompt to fill visual gaps in the timeline. Uses local SD or API. New "B-roll" section in AIPanel.

  • [#052] Smart Layouts — auto-switch video layout between speakers based on who's talking. Detects active speaker from diarization + volume, applies crop/pad to focus on current speaker during export.

  • [#053] Per-track audio levels — individual gain per speaker track. Extend GainRange model with track_id, apply per-stream via FFmpeg.

  • [#054] Intro/Outro templates — save segment ranges as reusable templates, apply with one click on export.

  • [#055] Built-in free music library — 5 to 10 CC0/royalty-free short loops shipped in frontend/public/resources/music/. BackgroundMusicPanel gets a "Built-in" tab with play/preview.

  • [#056] Stock media browser — new MediaLibraryPanel that browses local resources/media/ for images, video, audio with thumbnails. Frontend-only via Tauri readDir. Drag-to-add for bg removal images, append clips, or music.

  • [#057] Sample content downloader — "Get Sample Video" button on empty state downloads a short public-domain test video + pre-made transcription JSON for trying the app without your own media.
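As one way to picture [#053], the sketch below extends a gain range with an optional `track_id` and looks up the effective gain for a speaker track at a given time. The existing `GainRange` fields are not shown in this roadmap, so every field name here is hypothetical. Summing the gains of overlapping ranges in dB is also a design assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GainRange:
    start: float                      # seconds
    end: float
    gain_db: float
    track_id: Optional[str] = None    # None means the range applies to every track


def gain_at(ranges: list[GainRange], track_id: str, t: float) -> float:
    """Total gain in dB that applies to `track_id` at time `t`."""
    total = 0.0
    for r in ranges:
        if r.start <= t < r.end and r.track_id in (None, track_id):
            total += r.gain_db
    return total


ranges = [
    GainRange(0.0, 10.0, -3.0),                 # global trim
    GainRange(5.0, 10.0, 6.0, "speaker_1"),     # boost one speaker only
]
speaker_1_gain = gain_at(ranges, "speaker_1", 6.0)
speaker_2_gain = gain_at(ranges, "speaker_2", 6.0)
```

Defaulting `track_id` to `None` would let project files saved before the change keep loading as global gain ranges.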


💡 TalkEdit competitive advantages to lean into

These aren't features to build — they're things to make more visible in the UI and README:

  • 100% offline / no account required — CapCut requires login and sends data to servers. Descript is cloud-first. With TalkEdit, your media never leaves the machine.
  • Local AI models — Ollama support means no API costs and no data leaving the device.
  • Word-level precision — editing by deleting words (not dragging razor cuts) is typically faster for talking-head content than cutting in a timeline-based editor.
  • Works on long files — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.

Already Implemented

  • [#025] Word-level transcript editing (select, drag, shift-click, delete)
  • [#026] Ctrl+click word → seek timeline to that position
  • [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
  • [#028] Auto-scroll waveform when playhead goes off-screen
  • [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
  • [#030] AI clip suggestions for social media
  • [#031] Noise reduction (DeepFilterNet or FFmpeg anlmdn)
  • [#032] Export: fast stream-copy or full re-encode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
  • [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
  • [#034] Speaker diarization
  • [#035] Project save / load (.aive JSON format)
  • [#036] Undo / redo (100-level history via Zundo)
  • [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
  • [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z / Ctrl+Shift+Z, Ctrl+S, Ctrl+E)
  • [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
  • [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
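The SRT caption output mentioned in [#033] follows a simple block layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the caption text, and a blank line. A minimal sketch is below. The function names and the `(start, end, text)` cue tuples are hypothetical, and grouping words into cues is left out.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before the milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) cues as numbered SRT blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


srt_text = to_srt([(0.0, 1.25, "Hello"), (1.5, 2.0, "world")])
```

Working in integer milliseconds after a single rounding step avoids drift from repeated floating-point division.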