Files
TalkEdit/plan.md

81 lines
4.9 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Plan for Building TalkEdit (Whisper.cpp + Tauri)
Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll modify CutScript as the base, migrate to **Tauri 2.0** (Rust backend + React frontend) for tiny, dependency-free installers, and use **Whisper.cpp** for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.
## 1. Overview
- **Goal**: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~1020MB), install, and run—no Python, FFmpeg, or external deps.
- **Why This Stack**: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch.
- **Target Users**: Creators editing podcasts/videos; free core + Pro upgrades.
- **Key Differentiators**: Fully local, text-based editing like Google Docs, smart cuts with fades.
## 2. Tech Stack
- **Frontend**: React + Vite + Tailwind CSS + shadcn/ui (from CutScript; minimal changes).
- **Backend**: Tauri 2.0 (Rust) handles file I/O, FFmpeg calls, Whisper.cpp integration.
- **Transcription**: Whisper.cpp (via Rust bindings like `whisper-cpp-sys` or `whisper-rs`).
- **Audio/Video Processing**: FFmpeg (bundled or called via Rust wrappers like `ffmpeg-next`).
- **State Management**: Zustand (from CutScript).
- **Packaging**: Tauri's `tauri build` for cross-platform installers.
- **AI Features**: Local models only (no APIs); optional Ollama for fillers.
## 3. Step-by-Step Development Plan
1. **Set Up Tauri in CutScript** (12 weeks):
- Install `tauri-cli` globally.
- In CutScript root: `npx tauri init` (choose Rust backend, link to existing React frontend).
- Migrate Electron main.js to Tauri's `src/main.rs` (handle window, file dialogs).
- Update `tauri.conf.json` for app metadata, bundle settings.
2. **Integrate Whisper.cpp in Rust** (23 weeks):
- Add `whisper-cpp` as a dependency in `Cargo.toml`.
- Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
- Replace Python backend calls with Tauri commands (e.g., `invoke` from frontend to Rust for transcription).
- Handle model downloads on first run (store in app data dir).
3. **Migrate Audio/Video Logic** (2 weeks):
- Port FFmpeg calls to Rust (use `ffmpeg-next` for cutting/export).
- Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
- Add audio cleaning (noise reduction via bundled tools or Rust libs).
4. **Frontend Polish** (12 weeks):
- Update UI for Tauri (file dialogs via `tauri-plugin-dialog`).
- Refine transcript editor: Better timestamp syncing, manual adjustments.
- Add export options (MP4 with subs, audio-only).
5. **Testing & Packaging** (1 week):
- Test on Windows/macOS/Linux; ensure Whisper runs offline.
- Bundle with `tauri build`; verify no external deps.
- Add auto-updater for Pro features.
6. **Launch & Iterate** (Ongoing):
- Open-source core on GitHub.
- Market on Product Hunt, Reddit; gather feedback.
## 4. MVP Features (Minimal but Useful)
Focus on what creators need for spoken content:
- **Drag-and-drop import**: Audio/video files; auto-extract audio.
- **One-click transcription**: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large).
- **Text-based editing**: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
- **Smart cleanup**: Remove fillers ("um", pauses >0.8s) via local AI.
- **Preview & Export**: Synced preview; export MP4/audio with optional SRT subs.
- **Undo/Redo**: Full edit history.
No multi-track, voice cloning, or collaboration—keep it simple.
## 4. Notes
- Consider adding Parakeet TDT as a transcription option in the future for users who want alternatives to Whisper.
## 5. Monetization Model
- **Free Forever**: Core editing/transcription (unlimited local use).
- **Pro License** ($2949 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
- **Optional Add-Ons**: Cloud credits for long videos (rarely needed).
## 6. Timeline & Milestones
- **Weeks 14**: Tauri setup + Whisper integration.
- **Weeks 56**: Audio logic migration + frontend tweaks.
- **Weeks 78**: Testing, packaging, launch prep.
- **Total**: 610 weeks to MVP (solo dev + AI).
## 7. Risks & Tips
- **Risks**: Whisper.cpp compilation issues; Rust learning curve if new to it.
- **Tips**: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
- **Resources**: Tauri docs, Whisper.cpp GitHub, Rust audio crates.</content>
<parameter name="filePath">/home/dillon/_code/audio_editor/plan.md