TalkEdit's defensible position: **works on hour+ files without degrading**, fully offline, one-time payment. No competitor owns this — Descript chokes on long content, CapCut limits mobile uploads, and both require accounts.
---
## Phase 1: Polish (pre-launch, do this first)
### Reliability & error handling
- [ ] Handle backend crashes gracefully — if Python backend dies, show a reconnect banner, don't leave UI in broken state
- [ ] Transcription failure recovery — show which model/download step failed, suggest alternatives
- [ ] Export failure reporting — surface FFmpeg stderr to user in a readable way
- [ ] File locking / concurrent access — prevent exporting while transcription is running and vice versa
### UX roughness
- [ ] Drag-and-drop file import onto the welcome screen
- [ ] Loading spinners for every async action with descriptive messages ("Downloading model...", "Analyzing silence...")
- [ ] Save indicator — dot or "unsaved" badge next to project name
- [ ] Disabled state for all buttons during export/transcription to prevent double-clicks
- [ ] Empty states for every panel ("Add your first marker", "No silence detected", etc.)
### Trial & licensing UX
- [ ] Wire up the "Activate" link to a real payment page (not placeholder `talked.it`)
- [ ] Show days remaining in the welcome screen bar (done)
- [ ] After trial expires, clearly explain what still works (export, loading) vs what's locked (editing, AI)
- [ ] License key field should handle paste and validate the key format client-side before sending to Rust (sketch below)
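A minimal sketch of the client-side check, assuming a hypothetical `TALK-XXXX-XXXX-XXXX-XXXX` key format; the real format and authoritative validation stay on the Rust side:

```ts
// Hypothetical key format: TALK-XXXX-XXXX-XXXX-XXXX, base32-style groups.
// This only rejects obvious garbage; Rust remains the source of truth.
const KEY_PATTERN = /^TALK(-[A-Z2-7]{4}){4}$/;

export function normalizeLicenseKey(raw: string): string | null {
  // Tolerate pasted whitespace, lowercase input, and missing dashes.
  const cleaned = raw.trim().toUpperCase().replace(/\s+/g, "");
  const grouped = cleaned.includes("-")
    ? cleaned
    : cleaned.replace(
        /^(TALK)([A-Z2-7]{4})([A-Z2-7]{4})([A-Z2-7]{4})([A-Z2-7]{4})$/,
        "$1-$2-$3-$4-$5",
      );
  return KEY_PATTERN.test(grouped) ? grouped : null;
}
```

Only a non-null result gets sent over the Tauri bridge, so typos fail instantly with a local error message instead of a round-trip.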
### Performance
- [ ] Lazy-load the waveform for very long files (>2hr) — don't fetch the entire WAV at once
- [ ] Virtualize the waveform canvas rendering (only draw the visible portion), not just the transcript
- [ ] Debounce project auto-save
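For the auto-save item, a small debounce sketch; the 1.5s delay and the `saveProject` signature are assumptions:

```ts
type Project = { name: string; markers: unknown[] };
declare function saveProject(project: Project): Promise<void>; // existing persistence call (assumed)

// Coalesce rapid edits into a single disk write.
function debounce<A extends unknown[]>(fn: (...args: A) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

const autoSave = debounce((project: Project) => {
  void saveProject(project); // errors should surface via the save indicator, not silently
}, 1500);
```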
---
## Phase 2: Standout features (own the niche)
### Long-form content (win here)
- [ ] **Chapter-based navigation** — markers auto-sorted, click to jump, usable on 3hr files (partially done)
- [ ] **Per-segment re-transcription** without losing surrounding context (done)
- [ ] **Append multiple clips** into one timeline (done)
- [ ] **Project stitching** — load multiple `.aive` projects, combine into one export
- [ ] **Session memory** — re-open last project on launch, auto-restore cursor position and scroll
- [ ] **Smart chunking** for transcription — for files >2hr, transcribe in overlapping chunks and stitch seamlessly (sketch below)
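How the smart-chunking item could work, sketched with illustrative numbers (20-minute chunks, 30s overlap; the segment shapes are assumptions):

```ts
// Illustrative chunking: 20-minute windows with 30s overlap.
interface Chunk { start: number; end: number } // seconds

export function planChunks(durationSec: number, chunkSec = 1200, overlapSec = 30): Chunk[] {
  const chunks: Chunk[] = [];
  for (let start = 0; start < durationSec; start += chunkSec - overlapSec) {
    chunks.push({ start, end: Math.min(start + chunkSec, durationSec) });
    if (start + chunkSec >= durationSec) break; // last window reached the end
  }
  return chunks;
}

interface Segment { start: number; end: number; text: string }

// Stitch two adjacent chunks: keep the earlier chunk's segments up to the
// midpoint of the overlap, and the later chunk's segments after it.
export function stitch(prev: Segment[], next: Segment[], overlapStart: number, overlapEnd: number): Segment[] {
  const cut = (overlapStart + overlapEnd) / 2;
  return [...prev.filter(s => s.start < cut), ...next.filter(s => s.start >= cut)];
}
```

Cutting at the overlap midpoint avoids duplicated words without needing exact token alignment; a smarter alignment pass can replace the midpoint cut later.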
### Export differentiation
- [ ] **YouTube chapters** — auto-generate from markers, copy as timestamps (done; formatting sketch after this list)
- [ ] **Export chapter markers** — embed in MP4/MKV metadata for chapter skip in players
- [ ] **Batch export** — export multiple projects or multiple cuts from one project in sequence
- [ ] **Export transcript format presets** — SRT, VTT, TXT, TXT with timestamps, markdown (done)
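The YouTube chapters item boils down to timestamp formatting plus YouTube's parsing rule that the list must start at `0:00`. A sketch, with the "Intro" fallback title as an assumption:

```ts
interface Marker { timeSec: number; title: string }

// Format seconds as H:MM:SS or M:SS, the form YouTube parses in descriptions.
function formatTimestamp(sec: number): string {
  const h = Math.floor(sec / 3600);
  const m = Math.floor((sec % 3600) / 60);
  const s = Math.floor(sec % 60);
  const mm = h > 0 ? String(m).padStart(2, "0") : String(m);
  return `${h > 0 ? h + ":" : ""}${mm}:${String(s).padStart(2, "0")}`;
}

export function youtubeChapters(markers: Marker[]): string {
  const sorted = [...markers].sort((a, b) => a.timeSec - b.timeSec);
  // YouTube only recognizes chapters if the first entry is 0:00.
  if (sorted.length === 0 || sorted[0].timeSec !== 0) {
    sorted.unshift({ timeSec: 0, title: "Intro" });
  }
  return sorted.map(m => `${formatTimestamp(m.timeSec)} ${m.title}`).join("\n");
}
```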
### AI features (local-first moat)
All powered by the bundled Qwen3 LLM. No API keys, no cloud calls. Features are grouped by how much they contribute to the core workflow.
#### Content creation (biggest value)
- [ ] **Smart Shorts finder** — scan the full transcript for self-contained, engaging segments 10–90s long. Ranks by: narrative completeness (has a beginning/end), energy cues from transcript sentiment, and topic boundaries. Results shown as a list of suggested cut ranges with preview. One-click export as a separate short video or copy timecodes. (Prompt sketch after this list.)
- [ ] **Sound bite / quotable moment finder** — find punchy, standalone sentences that work as clips for social media. Ranks by quotability. Different from the Shorts finder: these are <15s, single-sentence, high-impact lines.
- [ ] **Hook analyzer** — score the first 30 seconds of the video for engagement. Suggests cuts or rephrases to make the intro stronger. Shows a "hook score" 1–10.
- [ ] **Title + description generator** — suggest 5 titles and a YouTube description from the transcript + markers. One-click copy.
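A sketch of how the Shorts finder could prompt the bundled model; `runLlm` is a hypothetical bridge function and the prompt wording is illustrative:

```ts
// Hypothetical bridge to the bundled model; the real call lives behind Tauri.
declare function runLlm(prompt: string): Promise<string>;

interface ShortCandidate { start: number; end: number; reason: string }

export async function findShorts(transcript: string): Promise<ShortCandidate[]> {
  const prompt = [
    "From this transcript, pick up to 5 self-contained segments of 10-90 seconds",
    "that could stand alone as short videos. Prefer segments with a clear setup",
    'and payoff. Reply with JSON only: [{"start": s, "end": s, "reason": "..."}]',
    "",
    transcript,
  ].join("\n");
  const raw = await runLlm(prompt);
  try {
    // Small local models often wrap JSON in prose; extract the first array.
    const match = raw.match(/\[[\s\S]*\]/);
    return match ? (JSON.parse(match[0]) as ShortCandidate[]) : [];
  } catch {
    return []; // treat unparseable output as "no suggestions"
  }
}
```

Defensive JSON extraction matters here: 1.7B–4B models frequently decorate their answers, and a parse failure should degrade to "no suggestions" rather than an error dialog.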
#### Editing acceleration
- [ ] **AI auto-chapters** — detect topic shifts from the transcript → create timeline markers. Uses topic segmentation from the LLM, not just silence gaps.
- [ ] **AI sentence rephrase** — right-click word → rephrase with AI → replace in transcript (also done, uses backend)
- [ ] **AI pacing analysis** — flag segments where the speaker talks too fast or too slow for sustained periods. Suggests speed range adjustments or cuts. (Heuristic sketch after this list.)
- [ ] **AI dead-air finder** — finds moments where nothing interesting is said for >5s (rambling, off-topic, false starts). Different from the silence trimmer — this is content-based, not audio-level-based.
- [ ] **AI readability scan** — flag sentences that are too long, complex, or jargon-heavy for spoken word. Suggests simpler alternatives.
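For pacing analysis, a cheap words-per-minute pass could pre-filter segments before any LLM step; the window size and WPM thresholds below are assumptions:

```ts
interface Word { startSec: number; text: string }

// Flag sustained fast/slow speech with fixed words-per-minute windows.
// Thresholds are illustrative; an LLM pass could refine the flagged spans.
export function flagPacing(words: Word[], windowSec = 30, fastWpm = 190, slowWpm = 100) {
  if (words.length === 0) return [];
  const durationSec = words[words.length - 1].startSec;
  const flags: { start: number; end: number; wpm: number }[] = [];
  for (let start = 0; start + windowSec <= durationSec; start += windowSec) {
    const count = words.filter(w => w.startSec >= start && w.startSec < start + windowSec).length;
    const wpm = (count / windowSec) * 60;
    if (wpm > fastWpm || wpm < slowWpm) {
      flags.push({ start, end: start + windowSec, wpm: Math.round(wpm) });
    }
  }
  return flags; // downstream: merge adjacent windows into sustained ranges
}
```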
#### Metadata & distribution
- [ ] **AI show notes** — generate title, description, key moments, and timestamps from transcript + markers
- [ ] **AI keyword/tag extraction** — pull out 5–10 topic tags from the transcript. Useful for YouTube SEO or categorization
- [ ] **AI question finder** — detect all questions asked in the video (speaker or guest). Useful for Q&A videos, AMAs, interviews — jump to each question instantly
- [ ] **AI thumbnail text suggestion** — suggest short overlay text for video thumbnails based on the most compelling line in the video
- [ ] **AI call-to-action finder** — detect where the speaker asks for likes/subscribes/comments. Lets you trim or reposition CTAs
#### Accessibility & compliance
- [ ] **AI content flagging** — flag profanity, sensitive topics, or copyrighted references in the transcript. Color-coded by category. Useful before publishing to restricted platforms
- [ ] **AI language leveling** — rewrite transcript segments at a target reading grade level (e.g., "simplify to 8th-grade level"). Useful for educational content or broad audiences
### Bundled local LLM (the friction-killer)
The biggest UX gap: users must set up Ollama or paste an API key to use AI features. Bundle two small models — download on first AI use, just like Whisper models.
**Three-tier AI provider choice** (set once, persisted):

| Provider | Model | Download | Setup |
|---|---|---|---|
| Bundled (default) | Qwen3 4B or 1.7B | ~2.5GB / ~1GB | None (auto-downloads on first AI use) |
| Ollama (bring your own) | Any | None | User starts Ollama themselves |
| Cloud API | OpenAI / Claude | None | Paste an API key |
Default to Qwen3 4B. If the machine can't meet the RAM threshold (checked via Tauri at runtime), fall back to 1.7B or prompt to set up Ollama.
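A sketch of that runtime check from the frontend, assuming a hypothetical `get_total_ram_bytes` Tauri command (import path per Tauri v2). The hardware-detection item below talks about free RAM; this simplifies to total RAM:

```ts
import { invoke } from "@tauri-apps/api/core"; // Tauri v2 import path

type BundledModel = "qwen3-4b" | "qwen3-1.7b" | null;

// `get_total_ram_bytes` is a hypothetical Rust command; thresholds follow the
// 8GB guidance in this section and are otherwise assumptions.
export async function pickBundledModel(): Promise<BundledModel> {
  const totalRam = await invoke<number>("get_total_ram_bytes");
  const gib = totalRam / 1024 ** 3;
  if (gib >= 8) return "qwen3-4b";   // ~2.5GB Q4_K_M model fits comfortably
  if (gib >= 4) return "qwen3-1.7b"; // ~1GB model for smaller machines
  return null; // fall back to the Ollama / cloud-API prompt
}
```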
- [ ] **Integrate llama.cpp Rust bindings** (`llama-cpp-rs` or `candle`) — replace Python `ai_provider.py` calls with native inference for bundled models
- [ ] **Auto-download Qwen3 4B or 1.7B** on first AI action based on hardware check (GGUF Q4_K_M format, ~2.5GB / ~1GB). Same UX as the Whisper download: progress bar, resume on interrupt (resume sketch after this list)
- [ ] **Model selector** in Settings: "Qwen3 4B (fast, no setup)" vs "Qwen3 1.7B (lightweight)" vs Ollama vs OpenAI vs Claude. Default to Qwen3 4B
- [ ] **Hardware detection on first AI use** — check total system RAM, recommend 4B if ≥8GB free, 1.7B if less. Skip the download entirely if the machine can't run either (fall back to the Ollama/API prompt)
- [ ] **GPU acceleration** — llama.cpp supports CUDA/Metal/Vulkan. Detect at runtime and enable if available
- [ ] **Lightweight AI tasks** (filler detection, chapter titles, summarization) run on the bundled model directly; only tasks requiring heavier reasoning (rephrase, smart speed) get the full model
- [ ] **Remove the Python backend dependency for AI** — once the bundled LLM + Whisper.cpp handle everything, there's no need to ship Python for AI features. One less runtime dependency
- [ ] GGUF model files are cached in app data dir, same as Whisper models
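The resume-on-interrupt behavior can ride on HTTP Range requests. A sketch where the URL handling is browser-style `fetch` and `appendToFile`/`fileSize` stand in for the real Tauri file APIs; in practice this would likely live behind a Rust command:

```ts
declare function appendToFile(path: string, bytes: Uint8Array): Promise<void>; // placeholder
declare function fileSize(path: string): Promise<number>; // placeholder, 0 if missing

export async function downloadModel(
  url: string,
  dest: string,
  onProgress: (done: number, total: number) => void,
): Promise<void> {
  const have = await fileSize(dest);
  // Ask the server to continue from the bytes we already have on disk.
  const res = await fetch(url, { headers: have > 0 ? { Range: `bytes=${have}-` } : {} });
  if (have > 0 && res.status !== 206) throw new Error("server ignored Range; restart download");
  if (!res.ok) throw new Error(`download failed: HTTP ${res.status}`);
  const total = have + Number(res.headers.get("Content-Length") ?? 0);
  const reader = res.body!.getReader();
  let done = have;
  for (;;) {
    const { value, done: eof } = await reader.read();
    if (eof) break;
    await appendToFile(dest, value);
    done += value.length;
    onProgress(done, total); // drives the same progress bar UX as Whisper downloads
  }
}
```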
**Why this wins:**
- New users open the app, click "Detect filler words" and it just works. No API signup. No Docker. No "install Ollama" README steps.
- Descript charges $24/mo and still requires internet. CapCut's AI features are cloud-only. TalkEdit gives you local AI with zero setup.
- The same download-on-demand pattern already works for Whisper models — users understand it.
- Two size options mean it works on everything from a 16GB gaming laptop to an 8GB office machine.
---
## Phase 3: Launch prep
### Marketing assets
- [ ] Compare page: "TalkEdit vs Descript" — focus on offline, no subscription, long file support
- [ ] Demo video showing a 2hr podcast cleaned in 3 clicks
- [ ] Tagline: *"The offline video editor that doesn't slow down on long files"*
- [ ] Landing page at talked.it with clear pricing (one-time license)
### Distribution
- [ ] Product Hunt launch
- [ ] Post in r/podcasting, r/VideoEditing, r/selfhosted
- [ ] GitHub release with binaries for Windows/macOS/Linux
- [ ] Offer podcasters free licenses in exchange for public feedback and testimonials
### Pricing
- [ ] **Free trial**: 30 days, full features (already done)
- [ ] **Pro**: $39 one-time — permanent license, includes all current + future features
- [ ] **Business**: $79 one-time — same but with priority support + volume licensing
- [ ] No subscriptions, no recurring charges. One purchase = owned forever.
---
## Phase 4: Post-launch
### Retention
- [ ] In-app changelog on update
- [ ] Email list for major releases (optional, no account required)
- [ ] Community templates/sharing for export presets and filler lists
### Growth features
- [ ] **Sample video download** — "Try without your own media" button downloads a test file + pre-made transcript
- [ ] **Built-in free music library** — 5–10 CC0 loops shipped with the app
- [ ] **Export presets** — community-contributed, loaded from a JSON file
---
## Non-goals (explicitly defer)
- Cloud sync / collaboration
- Voice cloning / TTS
- Full multi-track NLE timeline (transitions, picture-in-picture, etc.)
- Mobile app
- Subscription model
- Training/fine-tuning models in-app
- Image/video generation models (Stable Diffusion, etc.) — text-only LLM is sufficient for transcript tasks