TalkEdit/plan.md — 2026-05-06
# TalkEdit — Launch Plan
## Niche: "Descript for long-form content"
TalkEdit's defensible position: **works on hour+ files without degrading**, fully offline, one-time payment. No competitor owns this — Descript chokes on long content, CapCut limits mobile uploads, and both require accounts.
---
## Phase 1: Polish (pre-launch, do this first)
### Reliability & error handling
- [ ] Handle backend crashes gracefully — if Python backend dies, show a reconnect banner, don't leave UI in broken state
- [ ] Transcription failure recovery — show which model/download step failed, suggest alternatives
- [ ] Export failure reporting — surface FFmpeg stderr to user in a readable way
- [ ] File locking / concurrent access — prevent exporting while transcription is running and vice versa
### UX roughness
- [ ] Drag-and-drop file import onto the welcome screen
- [ ] Loading spinners for every async action with descriptive messages ("Downloading model...", "Analyzing silence...")
- [ ] Undo/redo visual feedback — toast notification "Undo: removed cut"
- [ ] `?` keyboard shortcut opens a proper cheat sheet modal
- [ ] Save indicator — dot or "unsaved" badge next to project name
- [ ] Disabled state for all buttons during export/transcription to prevent double-clicks
- [ ] Empty states for every panel ("Add your first marker", "No silence detected", etc.)
### Trial & licensing UX
- [ ] Wire up the "Activate" link to a real payment page (not placeholder `talked.it`)
- [ ] Show days remaining in the welcome screen bar (done)
- [ ] After trial expires, clearly explain what still works (export, loading) vs what's locked (editing, AI)
- [ ] License key field should handle paste + validate format client-side before sending to Rust
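A minimal sketch of the client-side format check, shown in Python for brevity. The key shape (four groups of four alphanumerics, e.g. `AB12-CD34-EF56-GH78`) is an assumption, not the real format — adjust the pattern to whatever the Rust licensing code actually issues:

```python
import re

# Hypothetical key format: four dash-separated groups of four
# uppercase alphanumerics. The real format is whatever the Rust
# side issues; change the pattern to match it.
KEY_RE = re.compile(r"^[A-Z0-9]{4}(-[A-Z0-9]{4}){3}$")

def normalize_key(raw):
    """Uppercase, strip whitespace/punctuation, and re-group a pasted key."""
    chars = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
    if len(chars) != 16:
        return None
    return "-".join(chars[i:i + 4] for i in range(0, 16, 4))

def looks_valid(raw):
    """Cheap format check before sending the key to Rust for real validation."""
    key = normalize_key(raw)
    return key is not None and KEY_RE.fullmatch(key) is not None
```

Normalizing first means a key pasted with stray spaces or missing dashes still activates on the first try.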
### Performance
- [ ] Lazy-load the waveform for very long files (>2hr) — don't fetch entire WAV at once
- [ ] Virtualize the waveform canvas rendering (only draw visible portion), not just transcript
- [ ] Debounce project auto-save
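The debounce item can be sketched as a small timer wrapper; the 2-second quiet period and the `save_fn` callback name are illustrative assumptions:

```python
import threading

class DebouncedSaver:
    """Coalesce rapid edits into a single save after a quiet period."""

    def __init__(self, save_fn, delay_s=2.0):
        self._save_fn = save_fn
        self._delay_s = delay_s
        self._timer = None
        self._lock = threading.Lock()

    def request_save(self):
        # Each call cancels the pending save and restarts the countdown,
        # so only the last edit in a burst triggers a real disk write.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay_s, self._save_fn)
            self._timer.daemon = True
            self._timer.start()

    def flush(self):
        # Force an immediate save (e.g. on window close).
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
        self._save_fn()
```

`flush()` matters for correctness: without it, closing the app inside the quiet window would drop the last edits.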
---
## Phase 2: Standout features (own the niche)
### Long-form content (win here)
- [ ] **Chapter-based navigation** — markers auto-sorted, click to jump, usable on 3hr files (partially done)
- [ ] **Per-segment re-transcription** without losing surrounding context (done)
- [ ] **Append multiple clips** into one timeline (done)
- [ ] **Project stitching** — load multiple `.aive` projects, combine into one export
- [ ] **Session memory** — re-open last project on launch, auto-restore cursor position and scroll
- [ ] **Smart chunking** for transcription — for files >2hr, transcribe in overlapping chunks and stitch seamlessly
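The smart-chunking item above can be sketched as two pure functions. The 10-minute chunk size, 15 s overlap, and cut-at-the-overlap-midpoint stitch rule are assumptions for illustration, not the shipped algorithm:

```python
def plan_chunks(duration_s, chunk_s=600.0, overlap_s=15.0):
    """Split a long recording into overlapping transcription windows.

    Each chunk overlaps its neighbour by `overlap_s`, so words falling
    on a boundary are transcribed twice with full acoustic context.
    """
    chunks = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start += step
    return chunks

def stitch(chunks, words_per_chunk):
    """Merge per-chunk word lists, cutting each pair at the overlap midpoint.

    `words_per_chunk` is a list (one entry per chunk) of
    (start_s, end_s, text) tuples with absolute timestamps. Each word is
    kept from exactly one chunk: the one whose midpoint window covers it.
    """
    merged = []
    for i, words in enumerate(words_per_chunk):
        lo = 0.0 if i == 0 else (chunks[i][0] + chunks[i - 1][1]) / 2
        hi = float("inf") if i == len(chunks) - 1 else (chunks[i + 1][0] + chunks[i][1]) / 2
        merged.extend(w for w in words if lo <= w[0] < hi)
    return merged
```

Cutting at the overlap midpoint keeps each retained word well inside its chunk, away from the edge where Whisper output is least reliable.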
### Export differentiation
- [ ] **YouTube chapters** — auto-generate from markers, copy as timestamps (done)
- [ ] **Export chapter markers** — embed in MP4/MKV metadata for chapter skip in players
- [ ] **Batch export** — export multiple projects or multiple cuts from one project in sequence
- [ ] **Export transcript format presets** — SRT, VTT, TXT, TXT with timestamps, markdown (done)
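The YouTube-chapters format is simple enough to sketch; marker times and titles here are placeholders, and the only hard constraints are YouTube's (first chapter at 0:00, at least three chapters, each ≥10 s):

```python
def youtube_chapters(markers):
    """Render (seconds, title) markers as YouTube chapter timestamps.

    Markers are sorted first; hours are omitted for videos under an
    hour, matching YouTube's accepted "M:SS" / "H:MM:SS" forms.
    """
    lines = []
    for secs, title in sorted(markers):
        h, rem = divmod(int(secs), 3600)
        m, s = divmod(rem, 60)
        stamp = f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"
        lines.append(f"{stamp} {title}")
    return "\n".join(lines)
```

The returned string pastes straight into a YouTube description.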
### AI features (local-first moat)
All powered by the bundled Qwen3 LLM. No API keys, no cloud calls. Features are grouped by how much they contribute to the core workflow.
#### Content creation (biggest value)
- [ ] **Smart Shorts finder** — scan the full transcript for self-contained, engaging segments 10-90s long. Ranks by: narrative completeness (has a beginning/end), energy cues from transcript sentiment, and topic boundaries. Results shown as a list of suggested cut ranges with preview. One-click export as a separate short video or copy timecodes.
- [ ] **Sound bite / quotable moment finder** — find punchy, standalone sentences that work as clips for social media. Ranks by quotability. Different from Shorts finder: these are <15s, single-sentence, high-impact lines.
- [ ] **Hook analyzer** — score the first 30 seconds of the video for engagement. Suggests cuts or rephrases to make the intro stronger. Shows a "hook score" of 1-10.
- [ ] **Title + description generator** — suggest 5 titles and a YouTube description from the transcript + markers. One-click copy.
#### Editing acceleration
- [ ] **AI auto-chapters** — detect topic shifts from the transcript and create timeline markers. Uses topic segmentation from the LLM, not just silence gaps.
- [ ] **AI Smart Clean** — one-pass: filler removal + silence trim + normalize (done)
- [ ] **AI sentence rephrase** — right-click a word to rephrase with AI and replace it in the transcript (also done, uses backend)
- [ ] **AI pacing analysis** — flag segments where the speaker talks too fast or too slow for sustained periods. Suggests speed-range adjustments or cuts.
- [ ] **AI dead-air finder** — finds moments where nothing interesting is said for >5s (rambling, off-topic, false starts). Different from the silence trimmer — this is content-based, not audio-level-based.
- [ ] **AI readability scan** — flag sentences that are too long, complex, or jargon-heavy for spoken word. Suggests simpler alternatives.
#### Metadata & distribution
- [ ] **AI show notes** — generate title, description, key moments, and timestamps from transcript + markers
- [ ] **AI keyword/tag extraction** — pull out 5-10 topic tags from the transcript. Useful for YouTube SEO or categorization
- [ ] **AI question finder** — detect all questions asked in the video (speaker or guest). Useful for Q&A videos, AMAs, interviews — jump to each question instantly
- [ ] **AI thumbnail text suggestion** — suggest short overlay text for video thumbnails based on the most compelling line in the video
- [ ] **AI call-to-action finder** — detect where the speaker asks for likes/subscribes/comments. Lets you trim or reposition CTAs
#### Accessibility & compliance
- [ ] **AI content flagging** — flag profanity, sensitive topics, or copyrighted references in the transcript. Color-coded by category. Useful before publishing to restricted platforms
- [ ] **AI language leveling** — rewrite transcript segments at a target reading grade level (e.g., "simplify to 8th-grade level"). Useful for educational content or broad audiences
### Bundled local LLM (the friction-killer)
The biggest UX gap: users must set up Ollama or paste an API key to use AI features. Bundle two small models — download on first AI use, just like Whisper models.
**Three-tier AI provider choice** (set once, persisted):
| Option | Hardware needed | Download | Setup |
|--------|----------------|----------|-------|
| Qwen3 4B (recommended) | 8GB+ free RAM | 2.5 GB | None — auto-download |
| Qwen3 1.7B (lightweight) | 4GB+ free RAM | 1.0 GB | None — auto-download |
| Ollama (bring your own) | Any | None | User starts Ollama themselves |
Default to Qwen3 4B. If the machine can't meet the RAM threshold (checked via Tauri at runtime), fall back to 1.7B or prompt to set up Ollama.
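The fallback logic above can be sketched as a pure function over the RAM value Tauri reports. The GB thresholds come from the table; the tier labels returned are hypothetical identifiers:

```python
def pick_model_tier(free_ram_gb, gb_4b=8.0, gb_1_7b=4.0):
    """Choose a bundled-model tier from available RAM.

    Thresholds mirror the provider table: Qwen3 4B wants 8 GB+ free,
    Qwen3 1.7B wants 4 GB+; below that, skip the download and prompt
    the user to configure Ollama or an API provider instead.
    """
    if free_ram_gb >= gb_4b:
        return "qwen3-4b"
    if free_ram_gb >= gb_1_7b:
        return "qwen3-1.7b"
    return "external"  # prompt for Ollama / API key instead of downloading
```

Keeping this a pure function makes the hardware-check path trivially testable without mocking Tauri.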
- [ ] **Integrate llama.cpp Rust bindings** (`llama-cpp-rs` or `candle`) — replace Python `ai_provider.py` calls with native inference for bundled models
- [ ] **Auto-download Qwen3 4B or 1.7B** on first AI action based on hardware check (GGUF Q4_K_M format, ~2.5GB / ~1GB). Same UX as Whisper download: progress bar, resume on interrupt
- [ ] **Model selector** in Settings: "Qwen3 4B (fast, no setup)" vs "Qwen3 1.7B (lightweight)" vs Ollama vs OpenAI vs Claude. Default to Qwen3 4B
- [ ] **Hardware detection on first AI use** — check total system RAM, recommend 4B if ≥8GB free, 1.7B if less. Skip download entirely if machine can't run either (fall back to Ollama/API prompt)
- [ ] **GPU acceleration** — llama.cpp supports CUDA/Metal/Vulkan. Detect at runtime and enable if available
- [ ] **Route lightweight AI tasks** (filler detection, chapter titles, summarization) to the bundled model directly; only tasks requiring heavier reasoning (rephrase, smart speed) get the full model
- [ ] **Remove the Python backend dependency for AI** — once bundled LLM + Whisper.cpp handle everything, no need to ship Python for AI features. One less runtime dependency
- [ ] GGUF model files are cached in app data dir, same as Whisper models
**Why this wins:**
- New users open the app, click "Detect filler words" and it just works. No API signup. No Docker. No "install Ollama" README steps.
- Descript charges $24/mo and still requires internet. CapCut's AI features are cloud-only. TalkEdit gives you local AI with zero setup.
- The same download-on-demand pattern already works for Whisper models — users understand it.
- Two size options mean it works on everything from a 16GB gaming laptop to an 8GB office machine.
---
## Phase 3: Launch prep
### Marketing assets
- [ ] Compare page: "TalkEdit vs Descript" — focus on offline, no subscription, long file support
- [ ] Demo video showing a 2hr podcast cleaned in 3 clicks
- [ ] Tagline: *"The offline video editor that doesn't slow down on long files"*
- [ ] Landing page at talked.it with clear pricing (one-time license)
### Distribution
- [ ] Product Hunt launch
- [ ] Post in r/podcasting, r/VideoEditing, r/selfhosted
- [ ] GitHub release with binaries for Windows/macOS/Linux
- [ ] Offer free licenses to podcasters in exchange for public feedback and testimonials
### Pricing
- [ ] **Free trial**: 30 days, full features (already done)
- [ ] **Pro**: $39 one-time — permanent license, includes all current + future features
- [ ] **Business**: $79 one-time — same but with priority support + volume licensing
- [ ] No subscriptions, no recurring charges. One purchase = owned forever.
---
## Phase 4: Post-launch
### Retention
- [ ] In-app changelog on update
- [ ] Email list for major releases (optional, no account required)
- [ ] Community templates/sharing for export presets and filler lists
### Growth features
- [ ] **Sample video download** — "Try without your own media" button downloads a test file + pre-made transcript
- [ ] **Built-in free music library** — 5-10 CC0 loops shipped with the app
- [ ] **Export presets** — community-contributed, loaded from a JSON file
---
## Non-goals (explicitly defer)
- Cloud sync / collaboration
- Voice cloning / TTS
- Full multi-track NLE timeline (transitions, picture-in-picture, etc.)
- Mobile app
- Subscription model
- Training/fine-tuning models in-app
- Image/video generation models (Stable Diffusion, etc.) — text-only LLM is sufficient for transcript tasks