# Plan for Building TalkEdit (Whisper.cpp + Tauri) Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll modify CutScript as the base, migrate to **Tauri 2.0** (Rust backend + React frontend) for tiny, dependency-free installers, and use **Whisper.cpp** for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers. ## 1. Overview - **Goal**: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~10–20MB), install, and run—no Python, FFmpeg, or external deps. - **Why This Stack**: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch. - **Target Users**: Creators editing podcasts/videos; free core + Pro upgrades. - **Key Differentiators**: Fully local, text-based editing like Google Docs, smart cuts with fades. ## 2. Tech Stack - **Frontend**: React + Vite + Tailwind CSS + shadcn/ui (from CutScript; minimal changes). - **Backend**: Tauri 2.0 (Rust) – handles file I/O, FFmpeg calls, Whisper.cpp integration. - **Transcription**: Whisper.cpp (via Rust bindings like `whisper-cpp-sys` or `whisper-rs`). - **Audio/Video Processing**: FFmpeg (bundled or called via Rust wrappers like `ffmpeg-next`). - **State Management**: Zustand (from CutScript). - **Packaging**: Tauri's `tauri build` for cross-platform installers. - **AI Features**: Local models only (no APIs); optional Ollama for fillers. ## 3. Step-by-Step Development Plan 1. **Set Up Tauri in CutScript** (1–2 weeks): - Install `tauri-cli` globally. - In CutScript root: `npx tauri init` (choose Rust backend, link to existing React frontend). - Migrate Electron main.js to Tauri's `src/main.rs` (handle window, file dialogs). - Update `tauri.conf.json` for app metadata, bundle settings. 2. **Integrate Whisper.cpp in Rust** (2–3 weeks): - Add `whisper-cpp` as a dependency in `Cargo.toml`. - Create a Rust module for transcription: Load models, process audio, return word-level timestamps. - Replace Python backend calls with Tauri commands (e.g., `invoke` from frontend to Rust for transcription). - Handle model downloads on first run (store in app data dir). 3. **Migrate Audio/Video Logic** (2 weeks): - Port FFmpeg calls to Rust (use `ffmpeg-next` for cutting/export). - Implement segment calculation: From edited transcript, build keep_segments with padding/fades. - Add audio cleaning (noise reduction via bundled tools or Rust libs). 4. **Frontend Polish** (1–2 weeks): - Update UI for Tauri (file dialogs via `tauri-plugin-dialog`). - Refine transcript editor: Better timestamp syncing, manual adjustments. - Add export options (MP4 with subs, audio-only). 5. **Testing & Packaging** (1 week): - Test on Windows/macOS/Linux; ensure Whisper runs offline. - Bundle with `tauri build`; verify no external deps. - Add auto-updater for Pro features. 6. **Launch & Iterate** (Ongoing): - Open-source core on GitHub. - Market on Product Hunt, Reddit; gather feedback. ## 4. MVP Features (Minimal but Useful) Focus on what creators need for spoken content: - **Drag-and-drop import**: Audio/video files; auto-extract audio. - **One-click transcription**: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large). - **Text-based editing**: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades. - **Smart cleanup**: Remove fillers ("um", pauses >0.8s) via local AI. - **Preview & Export**: Synced preview; export MP4/audio with optional SRT subs. - **Undo/Redo**: Full edit history. No multi-track, voice cloning, or collaboration—keep it simple. ## 5. Monetization Model - **Free Forever**: Core editing/transcription (unlimited local use). - **Pro License** ($29–49 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support. - **Optional Add-Ons**: Cloud credits for long videos (rarely needed). ## 6. Timeline & Milestones - **Weeks 1–4**: Tauri setup + Whisper integration. - **Weeks 5–6**: Audio logic migration + frontend tweaks. - **Weeks 7–8**: Testing, packaging, launch prep. - **Total**: 6–10 weeks to MVP (solo dev + AI). ## 7. Risks & Tips - **Risks**: Whisper.cpp compilation issues; Rust learning curve if new to it. - **Tips**: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone). - **Resources**: Tauri docs, Whisper.cpp GitHub, Rust audio crates. /home/dillon/_code/audio_editor/plan.md