TalkEdit/TECH_FEATURES.md
2026-05-06 16:15:38 -06:00

TalkEdit — Tech Stack, Tools, and Features

This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.

Overview

  • Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

Tech Stack

  • Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
  • Backend: Tauri 2.0 (Rust) for file I/O, licensing (Ed25519 license-key verification), model management, and error logging
  • Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
  • Audio/Video Processing: FFmpeg, driven by Python scripts (video_editor.py, audio_cleaner.py, caption_generator.py) that the Rust backend invokes
  • AI: Ollama, OpenAI, Claude through Python ai_provider.py. Bundled Qwen3 LLM planned.
  • State: Zustand (in-frontend store) + zundo middleware for undo/redo history
  • Packaging: Tauri CLI (tauri build) for cross-platform installers
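As an illustration of the FFmpeg side of this pipeline, here is a minimal sketch of how a script such as audio_cleaner.py might assemble the Audio Polish chain (noise reduction, compression, normalization; see Implemented Features) into a single -af filter string. The PolishSettings class, the polish_chain function, and every default value are hypothetical and not taken from TalkEdit; afftdn, acompressor, and loudnorm are standard FFmpeg audio filters.

```python
from dataclasses import dataclass


@dataclass
class PolishSettings:
    # All defaults are illustrative assumptions, not TalkEdit's settings.
    denoise_db: float = 12.0      # afftdn noise reduction amount, in dB
    comp_threshold: float = 0.1   # acompressor threshold (linear, 0..1)
    comp_ratio: float = 3.0       # acompressor ratio
    target_lufs: float = -16.0    # loudnorm integrated loudness target
    true_peak: float = -1.5       # loudnorm true-peak ceiling, dBTP
    lra: float = 11.0             # loudnorm loudness range target


def polish_chain(
    s: PolishSettings,
    denoise: bool = True,
    compress: bool = True,
    normalize: bool = True,
) -> str:
    """Build an FFmpeg -af chain: denoise, then compress, then normalize loudness."""
    stages: list[str] = []
    if denoise:
        stages.append(f"afftdn=nr={s.denoise_db:g}")
    if compress:
        stages.append(f"acompressor=threshold={s.comp_threshold:g}:ratio={s.comp_ratio:g}")
    if normalize:
        stages.append(f"loudnorm=I={s.target_lufs:g}:TP={s.true_peak:g}:LRA={s.lra:g}")
    return ",".join(stages)
```

The result would be passed as ffmpeg -i in.wav -af "<chain>" out.wav. Order matters: denoising before compression keeps the compressor from lifting background noise, and loudness normalization runs last so it measures the final signal. A single loudnorm pass is approximate; a two-pass (measure, then apply) run is more accurate for final exports.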

Developer Tools

  • Rust toolchain (cargo, rustc)
  • Node.js + npm for frontend
  • Python 3.11+ (faster-whisper, WhisperX, AI providers)
  • FFmpeg binaries (platform-specific; bundled or downloaded at install)
  • Build/dev: Tauri CLI, Vite dev server
  • Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
  • CI: GitHub Actions (Rust clippy/test, Frontend tsc/vitest, Python pytest)

Implemented Features

  • 1. Media import via file dialog (audio/video auto audio-extract)
  • 2. One-click local transcription with a model-size selector (tiny/base up to larger models)
  • 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
    • Click word → seek video/audio
    • Select words → cut corresponding media segment (smart 150–250 ms fades)
  • 4. Smart Cleanup
    • Filler word removal (configurable list per-project)
    • Silence trimming
  • 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
  • 6. Preview with synced playback, undo/redo (zundo), project save/load
  • 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
  • 8. Speaker diarization
  • 9. Custom filler lists per-project
  • 10. Background music with auto-ducking
  • 11. Append clips (concatenation)
  • 12. Settings: AI provider config (Ollama, OpenAI, Claude)
  • 13. Keyboard shortcuts with custom remapping
  • 14. Help panel + cheatsheet
  • 15. 7-day licensing with Ed25519-signed license keys

Recommended Additions

  • Local GPU/CPU detection & recommended model/settings UI
  • Per-project incremental transcription: re-run only edited segments
  • "Preview cleaning" dry-run that highlights candidate removals before applying
  • Export size/time estimator and suggested export presets
  • Accessibility export presets (podcast vs YouTube)
  • Bundled Qwen3 LLM for offline AI features
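The word-level cutting in feature 3 can be sketched in two steps: merge the selected (deleted) words into cut ranges and compute the complementary ranges to keep, then turn those ranges into an FFmpeg filter graph that trims each kept range and applies short fades at the cut points. This is a minimal, audio-only sketch under stated assumptions: Word, kept_ranges, and cut_filtergraph are hypothetical names, word timings are assumed to be in seconds (as word-level aligners such as WhisperX produce), and the 0.2 s default fade is simply picked from the 150–250 ms range above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # seconds
    end: float  # seconds


def kept_ranges(words: list[Word], deleted: set[int], duration: float) -> list[tuple[float, float]]:
    """Return the (start, end) ranges that survive after deleting the given word indices."""
    # Merge runs of adjacent deleted words into single cuts.
    cuts: list[tuple[float, float]] = []
    for i, word in enumerate(words):
        if i not in deleted:
            continue
        if cuts and (i - 1) in deleted:
            cuts[-1] = (cuts[-1][0], word.end)
        else:
            cuts.append((word.start, word.end))

    # Keep everything between the cuts.
    keep: list[tuple[float, float]] = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def cut_filtergraph(keep: list[tuple[float, float]], fade: float = 0.2) -> str:
    """Build an FFmpeg -filter_complex graph that trims, fades, and concatenates kept ranges."""
    parts: list[str] = []
    labels: list[str] = []
    last = len(keep) - 1
    for i, (start, end) in enumerate(keep):
        length = end - start
        d = min(fade, length / 2)  # stop fade-in and fade-out overlapping on short pieces
        filters = [f"atrim=start={start:.3f}:end={end:.3f}", "asetpts=PTS-STARTPTS"]
        if i > 0:  # fade in right after a cut
            filters.append(f"afade=t=in:st=0:d={d:.3f}")
        if i < last:  # fade out right before a cut
            filters.append(f"afade=t=out:st={length - d:.3f}:d={d:.3f}")
        parts.append(f"[0:a]{','.join(filters)}[a{i}]")
        labels.append(f"[a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(keep)}:v=0:a=1[out]")
    return ";".join(parts)
```

The graph would be applied as ffmpeg -i input.wav -filter_complex "<graph>" -map "[out]" output.wav. A video export would need matching trim/setpts chains for the video stream and a concat with v=1.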
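Smart Cleanup's filler removal and the "Preview cleaning" dry-run fit together naturally: detect filler spans first, show them highlighted, and delete only after the user confirms. A minimal sketch, assuming the transcript is a list of word strings and the per-project filler list is a set of possibly multi-word phrases; find_filler_spans, preview, and the sample filler list are hypothetical.

```python
import re


def normalize(token: str) -> str:
    """Lowercase and strip punctuation so "Um," matches the filler "um"."""
    return re.sub(r"[^\w']", "", token.lower())


def find_filler_spans(words: list[str], fillers: set[str]) -> list[tuple[int, int]]:
    """Return inclusive (first, last) word-index spans that match a filler phrase.

    Nothing is removed; callers decide what to do with the spans.
    """
    # Try longer phrases first so a multi-word filler like "you know" matches whole.
    phrases = sorted({tuple(normalize(t) for t in f.split()) for f in fillers}, key=len, reverse=True)
    norm = [normalize(w) for w in words]
    spans: list[tuple[int, int]] = []
    i = 0
    while i < len(norm):
        for phrase in phrases:
            n = len(phrase)
            if n and tuple(norm[i:i + n]) == phrase:
                spans.append((i, i + n - 1))
                i += n
                break
        else:  # no phrase matched at position i
            i += 1
    return spans


def preview(words: list[str], spans: list[tuple[int, int]]) -> str:
    """Dry-run view: bracket candidate removals instead of deleting them."""
    marked = list(words)
    for first, last in spans:
        marked[first] = "[" + marked[first]
        marked[last] = marked[last] + "]"
    return " ".join(marked)
```

Plain word-list matching flags every occurrence, including words like "like" or "so" that are only sometimes fillers, which is the main argument for showing a dry-run before applying. The returned indices are what a later cut step would delete.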
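For feature 7, speaker-labeled SRT output comes down to formatting cue times as HH:MM:SS,mmm and prefixing each cue's text with its speaker. A minimal sketch, assuming cues already carry diarization speaker labels; the Cue type, the "Speaker: text" label style, and the function names are assumptions, not the contents of caption_generator.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    start: float  # seconds
    end: float  # seconds
    speaker: str
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Render numbered SRT blocks separated by a blank line."""
    blocks = []
    for n, cue in enumerate(cues, start=1):
        blocks.append(
            f"{n}\n"
            f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n"
            f"{cue.speaker}: {cue.text}\n"
        )
    return "\n".join(blocks)
```

WebVTT differs mainly in its WEBVTT header and a "." before the milliseconds, and it can carry speakers as <v Name> voice spans instead of a text prefix.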
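Auto-ducking (feature 10) maps onto FFmpeg's sidechaincompress filter: the voice track drives a compressor applied to the music bed, so the music dips while someone is talking. A minimal sketch that builds such a filter graph, assuming input 0 is the voice and input 1 is the music; ducking_filtergraph and all default values are hypothetical and would need tuning by ear.

```python
def ducking_filtergraph(
    music_gain: float = 0.5,
    threshold: float = 0.05,
    ratio: float = 8,
    attack_ms: float = 20,
    release_ms: float = 400,
) -> str:
    """Return a -filter_complex graph that ducks music (input 1) under voice (input 0)."""
    return ";".join([
        # Split the voice: one copy goes to the mix, one drives the sidechain.
        "[0:a]asplit=2[voice][sc]",
        # Lower the music bed overall before ducking.
        f"[1:a]volume={music_gain:g}[bed]",
        # Compress the music whenever the voice sidechain rises above the threshold.
        f"[bed][sc]sidechaincompress=threshold={threshold:g}:ratio={ratio:g}"
        f":attack={attack_ms:g}:release={release_ms:g}[ducked]",
        # Mix voice and ducked music; stop when the voice track ends.
        "[voice][ducked]amix=inputs=2:duration=first[out]",
    ])
```

Usage would look like ffmpeg -i voice.wav -i music.mp3 -filter_complex "<graph>" -map "[out]" mixed.wav. A longer release keeps the music from pumping back up between words.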

Remove / Defer (Back Burner)

These broaden scope or add legal/privacy surface — defer for now.

  • Voice cloning / TTS: DEFER
  • Multi-track, full timeline NLE features: DEFER
  • Real-time collaboration / cloud sync: DEFER
  • Built-in cloud processing by default: DEFER (make optional add-on later)

Risks & Mitigations

  • Large model sizes: don't bundle large models; download on-demand and document storage location.
  • Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
  • FFmpeg packaging/licensing: ship platform-specific binaries (e.g., as Tauri sidecars) and document license compliance (FFmpeg is LGPL by default, GPL when built with GPL-only components).

Prioritized Quick Wins

  1. Per-project incremental transcription
  2. "Preview cleaning" dry-run UI
  3. Export presets (podcast vs YouTube)

Next Steps for Implementation

  • Bundle Qwen3 LLM for offline AI processing.
  • Implement incremental transcription to speed up re-editing workflows.
  • Add export presets and size estimation.
  • Improve GPU/CPU detection and model recommendations.
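For export presets and size estimation, a first-order estimate is size ≈ (video bitrate + audio bitrate) × duration ÷ 8. A minimal sketch, assuming constant-bitrate targets; ExportPreset, PRESETS, estimate_size_mb, and the bitrates are hypothetical (the YouTube values are roughly in line with YouTube's published 1080p upload recommendations, and 128 kbps is a common podcast MP3 choice). Container overhead and VBR variation are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPreset:
    name: str
    video_kbps: int  # 0 for audio-only exports
    audio_kbps: int


# Hypothetical presets; bitrates are illustrative assumptions.
PRESETS = {
    "podcast": ExportPreset("podcast", video_kbps=0, audio_kbps=128),
    "youtube_1080p": ExportPreset("youtube_1080p", video_kbps=8000, audio_kbps=384),
}


def estimate_size_mb(preset: ExportPreset, duration_s: float) -> float:
    """Approximate output size in megabytes (10^6 bytes), ignoring container overhead."""
    bits = (preset.video_kbps + preset.audio_kbps) * 1000 * duration_s
    return bits / 8 / 1_000_000
```

For a one-hour episode this gives 57.6 MB for the podcast preset and about 3,773 MB (roughly 3.8 GB) for the 1080p preset. Export time is not modeled here, since it depends on the encoder and the user's hardware.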

Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.