81 lines
4.9 KiB
Markdown
81 lines
4.9 KiB
Markdown
# Plan for Building TalkEdit (Whisper.cpp + Tauri)
|
||
|
||
Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll continue evolving the existing TalkEdit codebase on **Tauri 2.0** (Rust backend + React frontend) for tiny, dependency-free installers, and use **Whisper.cpp** for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.
|
||
|
||
## 1. Overview
|
||
- **Goal**: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~10–20MB), install, and run—no Python, FFmpeg, or external deps.
|
||
- **Why This Stack**: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch.
|
||
- **Target Users**: Creators editing podcasts/videos; free core + Pro upgrades.
|
||
- **Key Differentiators**: Fully local, text-based editing like Google Docs, smart cuts with fades.
|
||
|
||
## 2. Tech Stack
|
||
- **Frontend**: React + Vite + Tailwind CSS + shadcn/ui.
|
||
- **Backend**: Tauri 2.0 (Rust) – handles file I/O, FFmpeg calls, Whisper.cpp integration.
|
||
- **Transcription**: Whisper.cpp (via Rust bindings like `whisper-cpp-sys` or `whisper-rs`).
|
||
- **Audio/Video Processing**: FFmpeg (bundled or called via Rust wrappers like `ffmpeg-next`).
|
||
- **State Management**: Zustand.
|
||
- **Packaging**: Tauri's `tauri build` for cross-platform installers.
|
||
- **AI Features**: Local models only (no APIs); optional Ollama for fillers.
|
||
|
||
## 3. Step-by-Step Development Plan
|
||
1. **Set Up Tauri in TalkEdit** (1–2 weeks):
|
||
- Install `tauri-cli` globally.
|
||
- In TalkEdit root: `npx tauri init` (choose Rust backend, link to existing React frontend).
|
||
- Implement Tauri `src/main.rs` host flow (window lifecycle, file dialogs, backend coordination).
|
||
- Update `tauri.conf.json` for app metadata, bundle settings.
|
||
|
||
2. **Integrate Whisper.cpp in Rust** (2–3 weeks):
|
||
- Add `whisper-cpp` as a dependency in `Cargo.toml`.
|
||
- Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
|
||
- Replace Python backend calls with Tauri commands (e.g., `invoke` from frontend to Rust for transcription).
|
||
- Handle model downloads on first run (store in app data dir).
|
||
|
||
3. **Migrate Audio/Video Logic** (2 weeks):
|
||
- Port FFmpeg calls to Rust (use `ffmpeg-next` for cutting/export).
|
||
- Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
|
||
- Add audio cleaning (noise reduction via bundled tools or Rust libs).
|
||
|
||
4. **Frontend Polish** (1–2 weeks):
|
||
- Update UI for Tauri (file dialogs via `tauri-plugin-dialog`).
|
||
- Refine transcript editor: Better timestamp syncing, manual adjustments.
|
||
- Add export options (MP4 with subs, audio-only).
|
||
|
||
5. **Testing & Packaging** (1 week):
|
||
- Test on Windows/macOS/Linux; ensure Whisper runs offline.
|
||
- Bundle with `tauri build`; verify no external deps.
|
||
- Add auto-updater for Pro features.
|
||
|
||
6. **Launch & Iterate** (Ongoing):
|
||
- Open-source core on GitHub.
|
||
- Market on Product Hunt, Reddit; gather feedback.
|
||
|
||
## 4. MVP Features (Minimal but Useful)
|
||
Focus on what creators need for spoken content:
|
||
- **Drag-and-drop import**: Audio/video files; auto-extract audio.
|
||
- **One-click transcription**: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large).
|
||
- **Text-based editing**: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
|
||
- **Smart cleanup**: Remove fillers ("um", pauses >0.8s) via local AI.
|
||
- **Preview & Export**: Synced preview; export MP4/audio with optional SRT subs.
|
||
- **Undo/Redo**: Full edit history.
|
||
|
||
No multi-track, voice cloning, or collaboration—keep it simple.
|
||
|
||
## 4. Notes
|
||
- Consider adding Parakeet TDT as a transcription option in the future for users who want alternatives to Whisper.
|
||
|
||
## 5. Monetization Model
|
||
- **Free Forever**: Core editing/transcription (unlimited local use).
|
||
- **Pro License** ($29–49 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
|
||
- **Optional Add-Ons**: Cloud credits for long videos (rarely needed).
|
||
|
||
## 6. Timeline & Milestones
|
||
- **Weeks 1–4**: Tauri setup + Whisper integration.
|
||
- **Weeks 5–6**: Audio logic migration + frontend tweaks.
|
||
- **Weeks 7–8**: Testing, packaging, launch prep.
|
||
- **Total**: 6–10 weeks to MVP (solo dev + AI).
|
||
|
||
## 7. Risks & Tips
|
||
- **Risks**: Whisper.cpp compilation issues; Rust learning curve if new to it.
|
||
- **Tips**: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
|
||
- **Resources**: Tauri docs, Whisper.cpp GitHub, Rust audio crates.</content>
|
||
<parameter name="filePath">/home/dillon/_code/audio_editor/plan.md |