4.8 KiB
4.8 KiB
Plan for Building TalkEdit (Whisper.cpp + Tauri)
Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll modify CutScript as the base, migrate to Tauri 2.0 (Rust backend + React frontend) for tiny, dependency-free installers, and use Whisper.cpp for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.
1. Overview
- Goal: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~10–20MB), install, and run—no Python, FFmpeg, or external deps.
- Why This Stack: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch.
- Target Users: Creators editing podcasts/videos; free core + Pro upgrades.
- Key Differentiators: Fully local, text-based editing like Google Docs, smart cuts with fades.
2. Tech Stack
- Frontend: React + Vite + Tailwind CSS + shadcn/ui (from CutScript; minimal changes).
- Backend: Tauri 2.0 (Rust) – handles file I/O, FFmpeg calls, Whisper.cpp integration.
- Transcription: Whisper.cpp (via Rust bindings like
whisper-cpp-sysorwhisper-rs). - Audio/Video Processing: FFmpeg (bundled or called via Rust wrappers like
ffmpeg-next). - State Management: Zustand (from CutScript).
- Packaging: Tauri's
tauri buildfor cross-platform installers. - AI Features: Local models only (no APIs); optional Ollama for fillers.
3. Step-by-Step Development Plan
-
Set Up Tauri in CutScript (1–2 weeks):
- Install
tauri-cliglobally. - In CutScript root:
npx tauri init(choose Rust backend, link to existing React frontend). - Migrate Electron main.js to Tauri's
src/main.rs(handle window, file dialogs). - Update
tauri.conf.jsonfor app metadata, bundle settings.
- Install
-
Integrate Whisper.cpp in Rust (2–3 weeks):
- Add
whisper-cppas a dependency inCargo.toml. - Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
- Replace Python backend calls with Tauri commands (e.g.,
invokefrom frontend to Rust for transcription). - Handle model downloads on first run (store in app data dir).
- Add
-
Migrate Audio/Video Logic (2 weeks):
- Port FFmpeg calls to Rust (use
ffmpeg-nextfor cutting/export). - Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
- Add audio cleaning (noise reduction via bundled tools or Rust libs).
- Port FFmpeg calls to Rust (use
-
Frontend Polish (1–2 weeks):
- Update UI for Tauri (file dialogs via
tauri-plugin-dialog). - Refine transcript editor: Better timestamp syncing, manual adjustments.
- Add export options (MP4 with subs, audio-only).
- Update UI for Tauri (file dialogs via
-
Testing & Packaging (1 week):
- Test on Windows/macOS/Linux; ensure Whisper runs offline.
- Bundle with
tauri build; verify no external deps. - Add auto-updater for Pro features.
-
Launch & Iterate (Ongoing):
- Open-source core on GitHub.
- Market on Product Hunt, Reddit; gather feedback.
4. MVP Features (Minimal but Useful)
Focus on what creators need for spoken content:
- Drag-and-drop import: Audio/video files; auto-extract audio.
- One-click transcription: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large).
- Text-based editing: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
- Smart cleanup: Remove fillers ("um", pauses >0.8s) via local AI.
- Preview & Export: Synced preview; export MP4/audio with optional SRT subs.
- Undo/Redo: Full edit history.
No multi-track, voice cloning, or collaboration—keep it simple.
5. Monetization Model
- Free Forever: Core editing/transcription (unlimited local use).
- Pro License ($29–49 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
- Optional Add-Ons: Cloud credits for long videos (rarely needed).
6. Timeline & Milestones
- Weeks 1–4: Tauri setup + Whisper integration.
- Weeks 5–6: Audio logic migration + frontend tweaks.
- Weeks 7–8: Testing, packaging, launch prep.
- Total: 6–10 weeks to MVP (solo dev + AI).
7. Risks & Tips
- Risks: Whisper.cpp compilation issues; Rust learning curve if new to it.
- Tips: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
- Resources: Tauri docs, Whisper.cpp GitHub, Rust audio crates. /home/dillon/_code/audio_editor/plan.md