
# Plan for Building TalkEdit (Whisper.cpp + Tauri)

Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll continue evolving the existing TalkEdit codebase on **Tauri 2.0** (Rust backend + React frontend) for tiny, dependency-free installers, and use **Whisper.cpp** for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.

## 1. Overview
- **Goal**: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one installer (target ~10–20MB, excluding Whisper models, which are downloaded on first run), install, and run, with no need to install Python, FFmpeg, or other external dependencies themselves.
- **Why This Stack**: Tauri bundles everything into a native app; Whisper.cpp (a C/C++ library) can be called from Rust through bindings for CPU-efficient transcription. Building on the existing codebase is faster than rebuilding from scratch.
- **Target Users**: Creators editing podcasts/videos; free core + Pro upgrades.
- **Key Differentiators**: Fully local, text-based editing like Google Docs, smart cuts with fades.

## 2. Tech Stack
- **Frontend**: React + Vite + Tailwind CSS + shadcn/ui.
- **Backend**: Tauri 2.0 (Rust) – handles file I/O, FFmpeg calls, Whisper.cpp integration.
- **Transcription**: Whisper.cpp (via Rust bindings like `whisper-cpp-sys` or `whisper-rs`).
- **Audio/Video Processing**: FFmpeg (bundled or called via Rust wrappers like `ffmpeg-next`).
- **State Management**: Zustand.
- **Packaging**: Tauri's `tauri build` for cross-platform installers.
- **AI Features**: Local models only (no APIs); optional Ollama for filler-word detection.

## 3. Step-by-Step Development Plan
1. **Set Up Tauri in TalkEdit** (1–2 weeks):
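   - Install the Tauri CLI (e.g., `cargo install tauri-cli` or `npm install -D @tauri-apps/cli`).
   - In TalkEdit root: `npx tauri init` (or `cargo tauri init`), pointing it at the existing React frontend; the backend is always Rust.
   - Implement the host flow in `src-tauri/src/main.rs` (window lifecycle, file dialogs, backend coordination); Tauri 2 templates put most setup in `src-tauri/src/lib.rs`.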
   - Update `tauri.conf.json` for app metadata, bundle settings.

2. **Integrate Whisper.cpp in Rust** (2–3 weeks):
   - Add a Whisper.cpp binding (e.g., `whisper-rs`, as listed in the tech stack) as a dependency in `Cargo.toml`.
   - Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
   - Replace Python backend calls with Tauri commands (e.g., `invoke` from frontend to Rust for transcription).
   - Handle model downloads on first run (store in app data dir).
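
Whisper.cpp emits sub-word tokens, so returning word-level timestamps usually means grouping tokens into words first. The sketch below shows one way to do that in plain Rust. The `Token` struct is a hypothetical stand-in for whatever the chosen binding returns, and the `[_` prefix check assumes special markers look like `[_BEG_]`; timestamps are taken to be in Whisper.cpp's native unit of 10 ms.

```rust
/// Hypothetical stand-in for one decoded token; field names are assumptions,
/// not the API of any particular binding crate.
#[derive(Debug, Clone)]
pub struct Token {
    pub text: String,
    /// Start and end time in units of 10 ms.
    pub t0: i64,
    pub t1: i64,
}

/// One transcript word with timestamps in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub text: String,
    pub start: f64,
    pub end: f64,
}

/// Groups sub-word tokens into words. A token that begins with a space opens
/// a new word; any other token is appended to the current word.
pub fn tokens_to_words(tokens: &[Token]) -> Vec<Word> {
    let mut words: Vec<Word> = Vec::new();
    for tok in tokens {
        // Skip special markers (assumed to look like "[_BEG_]") and blanks.
        if tok.text.starts_with("[_") || tok.text.trim().is_empty() {
            continue;
        }
        let start = tok.t0 as f64 / 100.0;
        let end = tok.t1 as f64 / 100.0;
        if tok.text.starts_with(' ') || words.is_empty() {
            words.push(Word {
                text: tok.text.trim_start().to_string(),
                start,
                end,
            });
        } else if let Some(last) = words.last_mut() {
            // Continuation of the current word: extend its text and end time.
            last.text.push_str(&tok.text);
            last.end = end;
        }
    }
    words
}
```

A Tauri command could return the resulting `Vec<Word>` to the frontend once it derives `serde::Serialize`; that wiring is left out here to keep the sketch dependency-free.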

3. **Migrate Audio/Video Logic** (2 weeks):
   - Port FFmpeg calls to Rust (use `ffmpeg-next` for cutting/export).
   - Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
   - Add audio cleaning (noise reduction via bundled tools or Rust libs).
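
The segment calculation in step 3 can be sketched as taking the complement of the deleted ranges. This is a minimal version under stated assumptions: "padding" is interpreted as shrinking each cut by a small margin on both sides so neighboring words are not clipped, and the names `Segment` and `keep_segments` are illustrative, not taken from the existing codebase. Fades would be applied later, at the boundaries of the returned segments.

```rust
/// A time range in seconds on the source media.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
}

/// Returns the ranges to keep, given the ranges the user deleted.
/// `padding` (seconds) is trimmed off both ends of every cut.
pub fn keep_segments(deleted: &[Segment], duration: f64, padding: f64) -> Vec<Segment> {
    // Shrink each cut by the padding, clamp to the media, drop empty cuts.
    let mut cuts: Vec<Segment> = deleted
        .iter()
        .map(|d| Segment {
            start: (d.start + padding).max(0.0),
            end: (d.end - padding).min(duration),
        })
        .filter(|c| c.end > c.start)
        .collect();
    cuts.sort_by(|a, b| a.start.total_cmp(&b.start));

    // Walk the sorted cuts and emit the gaps between them.
    let mut keep = Vec::new();
    let mut cursor = 0.0_f64;
    for cut in cuts {
        if cut.start > cursor {
            keep.push(Segment { start: cursor, end: cut.start });
        }
        // Overlapping cuts simply advance the cursor to the furthest end.
        cursor = cursor.max(cut.end);
    }
    if cursor < duration {
        keep.push(Segment { start: cursor, end: duration });
    }
    keep
}
```

Each kept segment can then be turned into one trim-and-fade step in the FFmpeg export, whichever wrapper ends up being used.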

4. **Frontend Polish** (1–2 weeks):
   - Update UI for Tauri (file dialogs via `tauri-plugin-dialog`).
   - Refine transcript editor: Better timestamp syncing, manual adjustments.
   - Add export options (MP4 with subs, audio-only).
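
Timestamp syncing between the transcript and the preview likely needs a mapping from source time to the edited timeline. The sketch below assumes the kept ranges are sorted and non-overlapping; `Segment` and `source_to_edited` are hypothetical names used only for illustration.

```rust
/// A kept time range in seconds on the source media.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Segment {
    pub start: f64,
    pub end: f64,
}

/// Maps a source-media time to the edited timeline.
/// Returns `None` when the time falls inside a removed range.
/// Assumes `keep` is sorted by start time and non-overlapping.
pub fn source_to_edited(keep: &[Segment], t: f64) -> Option<f64> {
    let mut elapsed = 0.0;
    for seg in keep {
        if t < seg.start {
            // The time lies in a gap before this segment, so it was cut.
            return None;
        }
        if t < seg.end {
            return Some(elapsed + (t - seg.start));
        }
        // Accumulate the length of every kept segment that ends before `t`.
        elapsed += seg.end - seg.start;
    }
    None
}
```

Clicking a word would use the inverse direction (edited time back to source time), which can be written the same way by walking the kept segments and subtracting their lengths.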

5. **Testing & Packaging** (1 week):
   - Test on Windows/macOS/Linux; ensure Whisper runs offline.
   - Bundle with `tauri build`; verify no external deps.
   - Add auto-updater for Pro features.

6. **Launch & Iterate** (Ongoing):
   - Open-source core on GitHub.
   - Market on Product Hunt, Reddit; gather feedback.

## 4. MVP Features (Minimal but Useful)
Focus on what creators need for spoken content:
- **Drag-and-drop import**: Audio/video files; auto-extract audio.
- **One-click transcription**: Whisper.cpp with model choice (faster, less accurate: tiny/base; slower, more accurate: small/medium/large).
- **Text-based editing**: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
- **Smart cleanup**: Remove fillers ("um", pauses >0.8s) via local AI.
- **Preview & Export**: Synced preview; export MP4/audio with optional SRT subs.
- **Undo/Redo**: Full edit history.
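
As a starting point for the smart cleanup feature, filler words and long pauses can be flagged with simple rules before any model is involved. Note that this is a rule-based baseline, not the local-AI approach listed above; the `Suggestion` type and `suggest_cleanup` function are hypothetical, and the 0.8 s threshold comes from the feature list.

```rust
/// One transcript word with timestamps in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct Word {
    pub text: String,
    pub start: f64,
    pub end: f64,
}

/// A cut the editor could offer to the user.
#[derive(Debug, Clone, PartialEq)]
pub enum Suggestion {
    /// The word at `index` matches the filler list.
    Filler { index: usize },
    /// Silence between two words that exceeds the pause threshold.
    Pause { start: f64, end: f64 },
}

/// Flags filler words and pauses longer than `max_pause` seconds.
pub fn suggest_cleanup(words: &[Word], fillers: &[&str], max_pause: f64) -> Vec<Suggestion> {
    let mut out = Vec::new();
    for (i, w) in words.iter().enumerate() {
        // Check the silence between the previous word and this one.
        if i > 0 {
            let gap_start = words[i - 1].end;
            if w.start - gap_start > max_pause {
                out.push(Suggestion::Pause { start: gap_start, end: w.start });
            }
        }
        // Compare case-insensitively, ignoring surrounding punctuation.
        let norm = w
            .text
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if fillers.contains(&norm.as_str()) {
            out.push(Suggestion::Filler { index: i });
        }
    }
    out
}
```

Returning suggestions instead of deleting directly keeps the user in control and lets each accepted suggestion go through the same undo/redo history as a manual edit.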

No multi-track, voice cloning, or collaboration—keep it simple.
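
For the optional SRT subtitles in the export feature, the file format itself is simple enough to write by hand. The sketch below formats cues as numbered blocks with `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing lines; the function names and the `(start, end, text)` tuple shape are assumptions made for illustration.

```rust
/// Formats a time in seconds as an SRT timestamp (HH:MM:SS,mmm).
pub fn srt_timestamp(seconds: f64) -> String {
    // Round to whole milliseconds; negative inputs are clamped to zero.
    let total_ms = (seconds.max(0.0) * 1000.0).round() as u64;
    let (h, rem) = (total_ms / 3_600_000, total_ms % 3_600_000);
    let (m, rem) = (rem / 60_000, rem % 60_000);
    let (s, ms) = (rem / 1000, rem % 1000);
    format!("{h:02}:{m:02}:{s:02},{ms:03}")
}

/// Renders cues given as (start, end, text) into an SRT document.
pub fn to_srt(cues: &[(f64, f64, &str)]) -> String {
    let mut out = String::new();
    for (i, (start, end, text)) in cues.iter().enumerate() {
        // SRT cue numbers start at 1; a blank line separates cues.
        out.push_str(&format!(
            "{}\n{} --> {}\n{}\n\n",
            i + 1,
            srt_timestamp(*start),
            srt_timestamp(*end),
            text
        ));
    }
    out
}
```

Cue times should be expressed on the edited timeline, so subtitles stay aligned after words have been cut.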

**Note**: Consider adding Parakeet TDT as a future transcription option for users who want an alternative to Whisper.

## 5. Monetization Model
- **Free Forever**: Core editing/transcription (unlimited local use).
- **Pro License** ($29–49 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
- **Optional Add-Ons**: Cloud credits for long videos (rarely needed).

## 6. Timeline & Milestones
- **Weeks 1–4**: Tauri setup + Whisper integration.
- **Weeks 5–6**: Audio logic migration + frontend tweaks.
- **Weeks 7–8**: Testing, packaging, launch prep.
- **Total**: About 8 weeks to MVP on this schedule; the per-step estimates in Section 3 sum to 7–10 weeks (solo dev + AI).

## 7. Risks & Tips
- **Risks**: Whisper.cpp compilation issues; Rust learning curve if new to it.
- **Tips**: Start with small models (tiny is ~75MB, base ~142MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
- **Resources**: Tauri docs, Whisper.cpp GitHub, Rust audio crates.