Files
TalkEdit/FEATURES.md
2026-05-04 23:54:14 -06:00

5.6 KiB

TalkEdit — Feature Roadmap

Features are grouped by priority. Check off items as they are implemented.


🔴 Highest Impact Next — Conversion and retention

  • [#015] Word text correction — double-click any word to edit its text in-place. Preserves timing and confidence. Pure frontend state change. (2026-05-04)

  • [#013] Re-transcribe selection — select any word range in the transcript and click "Re-transcribe" to re-run Whisper on just that segment. Backend extracts audio via FFmpeg, transcribes with offset-adjusted timestamps. (2026-05-04)

  • [#012] Low-confidence word highlighting — words with confidence < 0.6 (configurable in Settings) get an orange dotted underline. Hover shows exact confidence %. (2026-05-04)

  • [#018] Audio normalization / loudness targeting — Integrated checkbox in Export panel with LUFS target selector (-14 YouTube, -16 Spotify, -23 Broadcast). Applied during export via FFmpeg loudnorm in the audio filter chain. No intermediate files. (2026-05-04)

  • [#024] Export to transcript text / SRT only — "Export Transcript Only" section in Export panel with format selector (plain text or SRT). Uses POST /export/transcript backend endpoint. Respects word cuts. (2026-05-04)

  • [#023] Batch silence removal — full-file scan + remove all pauses above threshold in one click. Implemented by SilenceTrimmerPanel + POST /audio/detect-silence (FFmpeg silencedetect).


🟡 Medium Impact — Workflow completeness

  • [#016] Named timeline markers — drop named marker pins on the waveform (like Resolve markers). Store as { id, time, label, color } in the project. Rendered as colored triangles on the timeline canvas.

  • [#017] Chapters — group markers into named chapter ranges. Useful for podcasts and lectures. Exportable as YouTube chapter timestamps in the description.

  • [#041] Customizable hotkeys / keymap editor (left-hand focused) — allow users to view, remap, and reset keyboard shortcuts (transport, edit, save/export, zone tools), with a default preset optimized for left-hand reach (Q/W/E/R/A/S/D/F/Z/X/C/V + modifiers). Include conflict detection, an alternate standard preset, and one-click "restore defaults".

  • [#022] Clip thumbnail strip — video frame thumbnails along the timeline so users can navigate visually, not only by waveform. Backend: ffmpeg thumbnail extraction at regular intervals.


🟢 Lower Impact — Expansion and advanced scope

  • [#020] Video zoom / punch-in — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: ffmpeg -vf crop/scale/zoompan.

  • [#021] Multi-clip / append — load a second video and append it to the timeline. Even without a full multi-track timeline, "append clip" is a heavily used workflow.

  • [#019] Background music track — a second audio track for background music with volume ducking. Major gap in Descript that TalkEdit could own. Backend: ffmpeg amix + asendcmd for auto-ducking.

  • [#014] Optional VibeVoice-ASR-HF transcription backend (future) — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.


Completed high-impact foundation

  • [#001] Cut / Mute sections
  • [#002] Silence / pause trimmer
  • [#003] Operation-level undo for batch actions
  • [#004] Grouped silence-trim zones (editable batch)
  • [#005] Edit silence-trim group settings after apply
  • [#006] Volume / gain control
  • [#007] Speed adjustment (4th zone type)
  • [#008] Cut preview
  • [#009] Timeline shows output length
  • [#010] Transcript search (Ctrl+F)
  • [#011] Mark In / Out + delete (I / O keys)

💡 TalkEdit competitive advantages to lean into

These aren't features to build — they're things to make more visible in the UI and README:

  • 100% offline / no account required — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
  • Local AI models — Ollama support means no API costs and no data leaving the device.
  • Word-level precision — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor.
  • Works on long files — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.

Already Implemented

  • [#025] Word-level transcript editing (select, drag, shift-click, delete)
  • [#026] Ctrl+click word → seek timeline to that position
  • [#027] Waveform timeline with zoom (Ctrl+scroll), scroll, drag-to-scrub playhead
  • [#028] Auto-scroll waveform when playhead goes off-screen
  • [#029] AI filler word detection and removal (Ollama / OpenAI / Claude)
  • [#030] AI clip suggestions for social media
  • [#031] Noise reduction (DeepFilterNet or FFmpeg ANLMDN)
  • [#032] Export: fast stream-copy or full reencode (MP4/MOV/WebM/WAV, 720p/1080p/4K). WAV available for audio-only inputs.
  • [#033] Captions: SRT, VTT, ASS burn-in with font/color/position options
  • [#034] Speaker diarization
  • [#035] Project save / load (.aive JSON format)
  • [#036] Undo / redo (100-level history via Zundo)
  • [#037] Multi-format input (MP4, MKV, MOV, AVI, WebM, M4A)
  • [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
  • [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
  • [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove