updated features and docs

2026-05-06 16:15:38 -06:00
parent e4484a57f9
commit 813877a7b4
4 changed files with 276 additions and 269 deletions

plan.md

TalkEdit's defensible position: **works on hour+ files without degrading**, fully offline, one-time payment. No competitor owns this — Descript chokes on long content, CapCut limits mobile uploads, and both require accounts.
---
**Current status (May 2026):** All core editing features are built and stable. Polish pass completed. 107 automated tests (95 frontend + 12 Rust). Ready for beta testing.
---
## Phase 1: Polish ✅ COMPLETED
### Reliability & error handling ✅
- [x] Backend health check — polls `/health` every 30s, shows reconnecting banner
- [x] Export failure reporting — surfaces FFmpeg stderr with copy-to-clipboard
- [x] React ErrorBoundary catches render crashes, shows fallback with reload
- [x] Global JS error logging — `window.onerror` + `onunhandledrejection` logged to Rust backend
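The health-check item above reduces to a small poll loop plus a state reducer. A sketch of that shape (the reducer and function names are illustrative, not TalkEdit's actual store code) — the banner appears only after two consecutive failed probes so a single flaky request doesn't flash the UI:

```typescript
type HealthState = { failures: number; showBanner: boolean };

// Pure reducer: require two consecutive failed probes before showing
// the reconnect banner; any successful probe clears it immediately.
export function nextHealthState(prev: HealthState, ok: boolean): HealthState {
  const failures = ok ? 0 : prev.failures + 1;
  return { failures, showBanner: failures >= 2 };
}

// Wiring: probe the backend every 30s and push each new state to the UI.
export function startHealthPoll(
  probe: () => Promise<boolean>,
  onChange: (s: HealthState) => void,
  intervalMs = 30_000,
): ReturnType<typeof setInterval> {
  let state: HealthState = { failures: 0, showBanner: false };
  return setInterval(async () => {
    const ok = await probe().catch(() => false);
    state = nextHealthState(state, ok);
    onChange(state);
  }, intervalMs);
}
```

A probe can simply wrap `fetch` against the backend's `/health` endpoint and resolve to `response.ok`.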
### UX polish ✅
- [x] Tooltips on every button/control across all panels
- [x] Loading spinners for waveform, waveform retry button
- [x] Export progress bar (visual, not just text)
- [x] Help panel with full feature documentation
- [x] Keyboard cheatsheet overlay with close button and preset indicator
- [x] First-run welcome overlay with 3-step guide
- [x] `?` keyboard shortcut opens cheatsheet (accessible from Help panel)
- [x] Empty states: MarkersPanel, AIPanel, WaveformTimeline
- [x] Error states: AIPanel with retry, WaveformTimeline with retry
- [x] Auto-save crash recovery every 60s, restore prompt on next launch
- [x] Confirmation dialogs for zone/marker deletion
- [x] Disabled state for all buttons during export/transcription
- [x] Export button disabled when no video loaded
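The 60s auto-save and restore prompt above boil down to a periodic snapshot writer plus one decision on launch. A minimal sketch, assuming a caller-supplied storage sink (the `Snapshot` shape and names are invented for illustration):

```typescript
export interface Snapshot { savedAt: number; project: unknown }

// Offer the restore prompt only when a crash snapshot exists and is
// newer than the last clean save (a clean exit would have cleared it).
export function shouldOfferRestore(
  snapshot: Snapshot | null,
  lastCleanSaveAt: number,
): boolean {
  return snapshot !== null && snapshot.savedAt > lastCleanSaveAt;
}

// Writer side: persist a timestamped snapshot every 60s.
export function startAutoSave(
  getProject: () => unknown,
  persist: (s: Snapshot) => void,
  intervalMs = 60_000,
): ReturnType<typeof setInterval> {
  return setInterval(
    () => persist({ savedAt: Date.now(), project: getProject() }),
    intervalMs,
  );
}
```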
### Consistency ✅
- [x] Mute zone color unified (blue everywhere)
- [x] Disabled opacity unified (40% everywhere)
- [x] Zone list items border radius unified (`rounded-lg`)
- [x] Toolbar button groups separated with visual dividers
- [x] Labels simplified: "Sound Gain", "Speed Adjust", "Trim Silence", "Chapter Marks", "Edit Zones", "Add Clips", "Bkg. Music", "AI Tools"
- [x] Model selector moved to AIPanel reprocess tab
- [x] Orphaned VolumePanel.tsx removed
### Trial & licensing ✅
- [x] Trial duration: 7 days
- [x] Trial bar on welcome screen with days remaining
- [x] Sentinel file prevents deleting trial.json to reset trial
- [x] XOR integrity check prevents editing trial.json timestamp
- [x] `canEdit` defaults to `false` (locked until status check confirms)
- [x] Email confirmation step before license activation (deters key sharing)
- [x] `verify_license` command (verify without caching)
- [x] Expired banner explains what still works (export, loading)
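The XOR integrity check runs in the Rust backend; purely to illustrate the idea (the `KEY` constant and trial-file fields below are invented, not the real scheme), a sketch: fold the start timestamp into 16 bits, XOR with an app constant, and reject the file if a hand-edited timestamp no longer matches the stored checksum.

```typescript
const KEY = 0x5eed; // assumption: an app-specific constant

// Fold the millisecond timestamp into 16 bits, then XOR with the key.
export function checksum(startedAt: number): number {
  let h = 0;
  let t = Math.floor(startedAt);
  while (t > 0) {
    h ^= t & 0xffff;             // mix in the low 16 bits
    t = Math.floor(t / 0x10000); // shift right 16 (safe above 2^32)
  }
  return h ^ KEY;
}

// On load: recompute and compare against the stored value.
export function isTamperFree(trial: { startedAt: number; check: number }): boolean {
  return checksum(trial.startedAt) === trial.check;
}
```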
### Robustness ✅
- [x] React ErrorBoundary
- [x] Store-level input validation (reject NaN, clamp bounds, enforce min zone duration)
- [x] Runtime assertions in critical paths (TranscriptEditor, WaveformTimeline, ExportDialog)
- [x] Auto-save crash recovery
- [x] CI pipeline (GitHub Actions: Rust + Frontend + Python)
- [x] Bad project state recovery (auto-prunes invalid zones on load, Dev Panel reset button)
- [x] 95 frontend tests (editorStore, licenseStore, aiStore, assert)
- [x] 12 Rust tests (licensing, models)
- [x] Canvas zone handles enlarged (r=6), hit area increased
- [x] Search match contrast improved
- [x] Split panes keyboard-accessible (arrow keys, tabIndex, ARIA)
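The store-level validation above (reject NaN, clamp bounds, enforce min zone duration) can be sketched as one sanitizer; `MIN_ZONE_SEC` and the `Zone` shape here are illustrative, not the app's real constants:

```typescript
const MIN_ZONE_SEC = 0.05; // assumed minimum zone duration

export interface Zone { start: number; end: number }

// Returns a repaired zone, or null when the input can't be salvaged
// (NaN/Infinity, or too short after clamping to the media duration).
export function sanitizeZone(z: Zone, durationSec: number): Zone | null {
  if (!Number.isFinite(z.start) || !Number.isFinite(z.end)) return null;
  const start = Math.min(Math.max(z.start, 0), durationSec);
  const end = Math.min(Math.max(z.end, 0), durationSec);
  if (end - start < MIN_ZONE_SEC) return null;
  return { start, end };
}
```

Load-time pruning of bad project state then becomes a `flatMap` over saved zones that drops the `null`s.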
---
## Phase 2: Standout features (post-beta)
### Long-form content
- [x] Chapter-based navigation — markers auto-sorted, click to jump (done)
- [x] Per-segment re-transcription (done)
- [x] Append multiple clips into one timeline (done)
- [ ] Project stitching — load multiple `.aive` projects, combine into one export
- [ ] Smart chunking for transcription — for files >2hr
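A deliberately naive sketch of the overlap stitching: keep everything from the earlier chunk, then keep only the later chunk's words that start at or after the earlier chunk's last word-end. A production version would likely also align on matching text inside the overlap window; this only shows the shape of the data.

```typescript
export interface Word { text: string; start: number; end: number }

// Merge two transcriptions of overlapping audio chunks into one list,
// dropping the later chunk's words that fall inside the overlap.
export function stitchChunks(a: Word[], b: Word[]): Word[] {
  if (a.length === 0) return b;
  const cutoff = a[a.length - 1].end;
  return [...a, ...b.filter((w) => w.start >= cutoff)];
}
```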
### Export
- [x] YouTube chapters from markers (done)
- [x] Export transcript formats: SRT, VTT, TXT (done)
- [ ] Batch export — multiple projects/cuts in sequence
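The chapters export above is essentially marker → timestamp formatting. A sketch (the `Marker` shape is illustrative): YouTube expects ascending timestamps with a chapter at 0:00, so markers are sorted before formatting.

```typescript
export interface Marker { sec: number; title: string }

// Format seconds as YouTube-style "M:SS" or "H:MM:SS".
function fmt(sec: number): string {
  const s = Math.floor(sec);
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const ss = String(s % 60).padStart(2, "0");
  return h > 0 ? `${h}:${String(m).padStart(2, "0")}:${ss}` : `${m}:${ss}`;
}

// One line per marker, sorted by time, ready to paste into a description.
export function youtubeChapters(markers: Marker[]): string {
  return [...markers]
    .sort((x, y) => x.sec - y.sec)
    .map((mk) => `${fmt(mk.sec)} ${mk.title}`)
    .join("\n");
}
```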
### AI features
- [x] AI Smart Clean — filler removal + silence trim + normalize (done)
- [x] AI sentence rephrase (done)
- [x] AI clip suggestions for social media (done)
- [ ] Smart Shorts finder — scan transcript for 10–90s segments
- [ ] AI auto-chapters — topic detection from transcript
- [ ] AI show notes — title, description, key moments
- [ ] AI dead-air finder — content-based silence detection
### Bundled local LLM
- [ ] Integrate llama.cpp Rust bindings
- [ ] Auto-download Qwen3 on first AI use (4B: 2.5GB / 1.7B: 1GB)
- [ ] Hardware detection at runtime, model selection in Settings
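The hardware check above might reduce to a tiny policy function. The RAM thresholds below are assumptions for illustration (e.g. 8GB+ free for the 4B model), and the actual free-RAM figure would come from the Tauri/Rust side at runtime:

```typescript
export type ModelChoice = "qwen3-4b" | "qwen3-1.7b" | "external";

// Pick the largest bundled model the machine can comfortably run,
// falling back to an external provider (Ollama / API key) otherwise.
export function pickModel(freeRamGb: number): ModelChoice {
  if (freeRamGb >= 8) return "qwen3-4b";   // ~2.5 GB download
  if (freeRamGb >= 4) return "qwen3-1.7b"; // ~1 GB download
  return "external";                       // prompt for Ollama or API key
}
```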
---
## Phase 3: Marketing & launch (post-beta)
### Messaging pillars
1. **"The offline video editor that doesn't slow down on long files"** — core positioning
2. **"No subscription. One price, owned forever."** — pricing differentiator
3. **"Zero-setup AI"** — bundled Qwen3, no API keys, no Docker, no Ollama
4. **"Your podcast → 10 TikToks in one click"** — Smart Shorts finder hook
### Channels
- [ ] r/podcasting, r/VideoEditing, r/selfhosted
- [ ] Product Hunt, Hacker News "Show HN"
- [ ] YouTube demo (3-5 min walkthrough)
- [ ] Free licenses to 20 podcasters for testimonials
- [ ] GitHub v1.0.0 release with binaries
- [ ] Compare page: TalkEdit vs Descript
### Pricing
- 7-day free trial (no credit card, no account)
- Pro: $39 one-time
- Business: $79 one-time (priority support, volume licensing)
---
## Phase 4: Post-launch
### Retention
- [ ] In-app changelog on update
- [ ] Email list for major releases (optional, no account required)
- [ ] Community templates/sharing for export presets and filler lists
### Growth features
- [ ] **Sample video download** — "Try without your own media" button downloads a test file + pre-made transcript
- [ ] **Built-in free music library** — 5-10 CC0 loops shipped with the app
- [ ] **Export presets** — community-contributed, loaded from a JSON file
---
## Non-goals (explicitly deferred)
- Cloud sync / collaboration
- Voice cloning / TTS
- Full multi-track NLE timeline
- Mobile app
- Subscription model
- Training/fine-tuning models in-app
- Image/video generation models