Files
TalkEdit/plan.md

4.9 KiB
Raw Blame History

Plan for Building TalkEdit (Whisper.cpp + Tauri)

Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll modify CutScript as the base, migrate to Tauri 2.0 (Rust backend + React frontend) for tiny, dependency-free installers, and use Whisper.cpp for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.

1. Overview

  • Goal: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~1020MB), install, and run—no Python, FFmpeg, or external deps.
  • Why This Stack: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch.
  • Target Users: Creators editing podcasts/videos; free core + Pro upgrades.
  • Key Differentiators: Fully local, text-based editing like Google Docs, smart cuts with fades.

2. Tech Stack

  • Frontend: React + Vite + Tailwind CSS + shadcn/ui (from CutScript; minimal changes).
  • Backend: Tauri 2.0 (Rust) handles file I/O, FFmpeg calls, Whisper.cpp integration.
  • Transcription: Whisper.cpp (via Rust bindings like whisper-cpp-sys or whisper-rs).
  • Audio/Video Processing: FFmpeg (bundled or called via Rust wrappers like ffmpeg-next).
  • State Management: Zustand (from CutScript).
  • Packaging: Tauri's tauri build for cross-platform installers.
  • AI Features: Local models only (no APIs); optional Ollama for fillers.

3. Step-by-Step Development Plan

  1. Set Up Tauri in CutScript (12 weeks):

    • Install tauri-cli globally.
    • In CutScript root: npx tauri init (choose Rust backend, link to existing React frontend).
    • Migrate Electron main.js to Tauri's src/main.rs (handle window, file dialogs).
    • Update tauri.conf.json for app metadata, bundle settings.
  2. Integrate Whisper.cpp in Rust (23 weeks):

    • Add whisper-cpp as a dependency in Cargo.toml.
    • Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
    • Replace Python backend calls with Tauri commands (e.g., invoke from frontend to Rust for transcription).
    • Handle model downloads on first run (store in app data dir).
  3. Migrate Audio/Video Logic (2 weeks):

    • Port FFmpeg calls to Rust (use ffmpeg-next for cutting/export).
    • Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
    • Add audio cleaning (noise reduction via bundled tools or Rust libs).
  4. Frontend Polish (12 weeks):

    • Update UI for Tauri (file dialogs via tauri-plugin-dialog).
    • Refine transcript editor: Better timestamp syncing, manual adjustments.
    • Add export options (MP4 with subs, audio-only).
  5. Testing & Packaging (1 week):

    • Test on Windows/macOS/Linux; ensure Whisper runs offline.
    • Bundle with tauri build; verify no external deps.
    • Add auto-updater for Pro features.
  6. Launch & Iterate (Ongoing):

    • Open-source core on GitHub.
    • Market on Product Hunt, Reddit; gather feedback.

4. MVP Features (Minimal but Useful)

Focus on what creators need for spoken content:

  • Drag-and-drop import: Audio/video files; auto-extract audio.
  • One-click transcription: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large).
  • Text-based editing: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
  • Smart cleanup: Remove fillers ("um", pauses >0.8s) via local AI.
  • Preview & Export: Synced preview; export MP4/audio with optional SRT subs.
  • Undo/Redo: Full edit history.

No multi-track, voice cloning, or collaboration—keep it simple.

4. Notes

  • Consider adding Parakeet TDT as a transcription option in the future for users who want alternatives to Whisper.

5. Monetization Model

  • Free Forever: Core editing/transcription (unlimited local use).
  • Pro License ($2949 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
  • Optional Add-Ons: Cloud credits for long videos (rarely needed).

6. Timeline & Milestones

  • Weeks 14: Tauri setup + Whisper integration.
  • Weeks 56: Audio logic migration + frontend tweaks.
  • Weeks 78: Testing, packaging, launch prep.
  • Total: 610 weeks to MVP (solo dev + AI).

7. Risks & Tips

  • Risks: Whisper.cpp compilation issues; Rust learning curve if new to it.
  • Tips: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
  • Resources: Tauri docs, Whisper.cpp GitHub, Rust audio crates. /home/dillon/_code/audio_editor/plan.md