updated features and docs

2026-05-06 16:15:38 -06:00
parent e4484a57f9
commit 813877a7b4
4 changed files with 276 additions and 269 deletions


These aren't features to build — they're things to make more visible in the UI and README:
- **7-day free trial (no CC required)** — full feature access for 7 days with no credit card. Try everything risk-free.
- **One-time purchase, no subscription** — pay once, own forever. No recurring fees, no cloud dependency.
- **100% offline / no account required** — CapCut requires login and sends data to servers. Descript is cloud-first. TalkEdit never leaves the machine.
- **Local AI models** — Ollama support means no API costs and no data leaving the device.
- **Word-level precision editing** — editing by deleting words (not dragging razor cuts) is faster for talking-head content than any timeline-based editor. Double-click any word to correct its text in-place.
- **Per-segment re-transcription** — select any word range and re-run Whisper on just that segment instead of re-transcribing the entire file.
- **Auto-ducking background music** — add a second audio track that automatically lowers when speech is detected, no manual keyframing needed.
- **Works on long files** — virtualized transcript + chunked waveform handles 1hr+ content that bogs down CapCut.
---
- [#038] Keyboard shortcuts (Space, J/K/L, arrows, Ctrl+Z/Shift+Z, Ctrl+S, Ctrl+E)
- [#039] Settings panel: AI provider config (Ollama, OpenAI, Claude)
- [#040] Cut/mute range creation on timeline with draggable zone edits and Delete-to-remove
- [#058] Help panel with full feature documentation and searchable index
- [#059] First-run welcome overlay with quick-start guides
- [#060] Auto-save crash recovery — periodic project snapshots, restore on next launch
- [#061] Error boundary + global error logging to `~/.TalkEdit/logs/`
- [#062] Store input validation — Zustand middleware guards against invalid state
- [#063] Trial system — 7-day free trial, sentinel file with integrity check, no CC required
- [#064] License activation — email confirmation flow, offline validation
- [#065] Model management — view/delete downloaded Whisper and DeepFilterNet models
- [#066] Keyboard cheatsheet — press `?` overlay with close button and active preset indicator
- [#067] Visual toolbar — grouped icon buttons with section dividers
- [#068] Backend health check — reconnecting banner when backend is unreachable

README.md

# TalkEdit
**Edit video by editing text.** An offline, local-first desktop video editor where deleting a word from the transcript cuts it from the video.
<img width="1034" height="661" alt="TalkEdit screenshot" src="https://github.com/user-attachments/assets/b1ed9505-792e-42ca-bb73-85458d0f02a5" />
---
## Features
- **Text-based editing** — delete, reorder, or correct words in the transcript to edit the underlying video. No razor tool, no timeline slicing.
- **Word-level transcription** — Whisper.cpp with per-word timestamps and confidence scores. Low-confidence words get a visual warning.
- **Four zone types** — Cut, Mute, Sound Gain, and Speed Adjust. Create zones on the waveform timeline and drag edges to refine.
- **Waveform timeline** — zoomable, scrollable waveform with playhead scrubbing, zone visualization, markers, chapters, and thumbnail strips.
- **AI-powered editing**
- Filler word detection and removal
- Smart Clean: one-click filler removal + silence trim + noise reduction + loudness normalization
- Clip suggestions for social media shorts
- Sentence rephrase with AI alternatives
- Supports **Ollama** (local), **OpenAI**, and **Claude** backends
- **Background music** — import a second audio track with auto-ducking via sidechain compression.
- **Export** — fast stream-copy or full re-encode to MP4, MOV, WebM, or WAV. Resolution up to 4K.
- **Captions** — generate SRT, VTT, or burn-in ASS subtitles with configurable font, color, and position.
- **Speaker diarization** — identify and label multiple speakers.
- **Audio tools** — noise reduction (DeepFilterNet), loudness normalization (LUFS targeting), background removal (MediaPipe), batch silence removal, video zoom/punch-in.
- **Project save/load** — `.aive` JSON format preserves all edits, zones, markers, and AI config.
- **Customizable hotkeys** — two presets (Standard / Left-hand) with per-key remapping and conflict detection.
- **100% offline, no account required** — everything runs on your machine. No telemetry, no cloud dependency.
- **7-day free trial** with one-time license key purchase. No subscription.
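The core text-to-cut mapping described above can be sketched in a few lines. This is an illustrative toy under assumed shapes (the `keep_segments` helper and the word dictionaries are hypothetical, not TalkEdit's shipped code): deleting words yields the list of media spans to keep, which an FFmpeg concat or trim step could then consume.

```python
# Sketch: turn word deletions into keep-segments for a cut list.
# Word shape (hypothetical): {"id": int, "start": float, "end": float}.

def keep_segments(words, deleted_ids, total_duration):
    """Return (start, end) spans of media to keep after deleting words."""
    segments = []
    cursor = 0.0
    for w in words:
        if w["id"] in deleted_ids:
            # Close the running segment just before the deleted word.
            if w["start"] > cursor:
                segments.append((cursor, w["start"]))
            cursor = w["end"]
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments

words = [
    {"id": 0, "start": 0.0, "end": 0.4},   # "Hello"
    {"id": 1, "start": 0.5, "end": 0.9},   # "um"
    {"id": 2, "start": 1.0, "end": 1.6},   # "welcome"
]
print(keep_segments(words, {1}, 2.0))  # [(0.0, 0.5), (0.9, 2.0)]
```

A real implementation would additionally merge adjacent spans and apply the crossfades at each seam.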
---
## Tech Stack
| Layer | Technology |
|-------|------------|
| Desktop shell | **Tauri 2.0** (Rust) |
| Frontend | **React** + **TypeScript** + **Tailwind CSS** |
| State management | **Zustand** with Zundo undo/redo |
| Transcription | **Whisper.cpp** (word-level timestamps) |
| AI / LLM | **Ollama**, **OpenAI**, **Claude** (pluggable backends) |
| Media processing | **FFmpeg** |
| Python services | **FastAPI** (spawned as a child process) |
---
## Quick Start
### Prerequisites
- **Node.js** 18+
- **Python** 3.10+
- **FFmpeg** (in PATH)
- **Rust** toolchain (for Tauri)
- **Ollama** (optional, for local AI features)
### Install
### Run (Development)
```bash
# Start everything: backend + frontend + Tauri
npm run dev:tauri
```
Or run components separately:
```bash
# Terminal 1: Python backend
npm run dev:backend
# Terminal 2: Frontend + Tauri
cd frontend && cargo tauri dev
```
### Build
```bash
npm run build:tauri
```
---
## Project Structure
```
talkedit/
├── src-tauri/                     # Tauri 2.0 Rust runtime
│   ├── Cargo.toml
│   └── src/
│       ├── main.rs                # App entry, backend spawner
│       ├── lib.rs                 # Command handlers (IPC bridge)
│       ├── transcription.rs       # Whisper.cpp integration
│       ├── video_editor.rs        # FFmpeg-based editing
│       ├── caption_generator.rs
│       ├── diarization.rs
│       ├── ai_provider.rs         # Ollama / OpenAI / Claude
│       ├── audio_cleaner.rs
│       ├── background_removal.rs
│       ├── licensing.rs           # Trial + key activation
│       ├── models.rs              # Shared data types
│       └── paths.rs
├── frontend/                      # React + Vite + Tailwind
│   └── src/
│       ├── components/            # UI components
│       │   ├── TranscriptEditor.tsx
│       │   ├── WaveformTimeline.tsx
│       │   ├── VideoPlayer.tsx
│       │   ├── AIPanel.tsx
│       │   ├── ExportDialog.tsx
│       │   ├── SettingsPanel.tsx
│       │   ├── BackgroundMusicPanel.tsx
│       │   ├── MarkersPanel.tsx
│       │   ├── ZoneEditor.tsx
│       │   ├── SilenceTrimmerPanel.tsx
│       │   ├── AppendClipPanel.tsx
│       │   ├── LicenseDialog.tsx
│       │   └── DevPanel.tsx
│       ├── store/                 # Zustand state (editorStore, aiStore, settingsStore)
│       ├── hooks/                 # Custom React hooks
│       ├── lib/                   # Utilities and Tauri bridge
│       └── types/                 # TypeScript interfaces
├── backend/                       # FastAPI Python services
│   ├── main.py
│   ├── routers/                   # API endpoints
│   │   ├── transcribe.py
│   │   ├── ai.py
│   │   ├── audio.py
│   │   ├── captions.py
│   │   └── export.py
│   └── services/                  # Core logic
│       ├── video_editor.py
│       ├── caption_generator.py
│       ├── ai_provider.py
│       ├── diarization.py
│       ├── audio_cleaner.py
│       ├── background_removal.py
│       └── license_server.py
├── shared/                        # Schema definitions (project format)
├── models/                        # Whisper model storage
└── docs/                          # Documentation
```
---
## Keyboard Shortcuts
|-----|--------|
| Space | Play / Pause |
| J / K / L | Reverse / Pause / Forward |
| I / O | Mark In / Mark Out |
| ← / → | Seek ±5 seconds |
| Delete | Delete selected words or zones |
| Ctrl+Z | Undo |
| Ctrl+Shift+Z | Redo |
| Ctrl+S | Save project |
| Ctrl+E | Export |
| Ctrl+F | Search transcript |
| Ctrl+Scroll | Zoom waveform |
| ? | Shortcut cheatsheet |
---
## License
Source code is MIT — see [LICENSE](LICENSE) for details. The distributed binary includes a 7-day free trial requiring a one-time license key purchase for continued use.

# TalkEdit — Tech Stack, Tools, and Features
This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.
## Overview
- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.
## Tech Stack
- Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
- Backend: Tauri 2.0 (Rust) for file I/O, licensing crypto (Ed25519), model management, error logging
- Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
- Audio/Video Processing: FFmpeg invoked from the Python services (video_editor.py, audio_cleaner.py, caption_generator.py)
- AI: Ollama, OpenAI, Claude through Python ai_provider.py. Bundled Qwen3 LLM planned.
- State: Zustand (in-frontend store) + zundo middleware for undo/redo history
- Packaging: Tauri `tauri build` for cross-platform installers
- Optional local tools: Ollama (optional local LLMs) for advanced on-device heuristics
## Developer Tools
- Rust toolchain (cargo, rustc)
- Node.js + npm for frontend
- Python 3.11+ (faster-whisper, WhisperX, AI providers)
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server
- Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
- CI: GitHub Actions (Rust clippy/test, Frontend tsc/vitest, Python pytest)
## Implemented Features
- [x] 1. Media import via file dialog (audio/video auto audio-extract)
- [x] 2. One-click local transcription with model selector (tiny/base → larger models) and model-size chooser
- [x] 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
- Click word → seek video/audio
- Select words → cut corresponding media segment (smart 150–250ms fades)
- [x] 4. Smart Cleanup
- Filler word removal (configurable list per-project)
- Silence trimming
- [x] 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
- [x] 6. Preview with synced playback, undo/redo (zundo), project save/load
- [x] 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
- [x] 8. Speaker diarization
- [x] 9. Custom filler lists per-project
- [x] 10. Background music with auto-ducking
- [x] 11. Append clips (concatenation)
- [x] 12. Settings: AI provider config (Ollama, OpenAI, Claude)
- [x] 13. Keyboard shortcuts with custom remapping
- [x] 14. Help panel + cheatsheet
- [x] 15. 7-day licensing with Ed25519-signed license keys
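As a rough sketch of item 4 above, filler-word detection reduces to scanning the word list against a configurable per-project set. The data shapes and the `find_fillers` helper here are hypothetical, not the shipped implementation:

```python
# Sketch: flag filler words for removal against a per-project list.
DEFAULT_FILLERS = {"um", "uh", "like"}  # configurable per project

def find_fillers(words, fillers=DEFAULT_FILLERS):
    """Return indices of words whose normalized text is a filler."""
    hits = []
    for i, w in enumerate(words):
        # Normalize: lowercase and strip trailing punctuation.
        if w["text"].lower().strip(".,!?") in fillers:
            hits.append(i)
    return hits

words = [{"text": "So,"}, {"text": "um,"}, {"text": "welcome"}]
print(find_fillers(words))  # [1]
```

Flagged indices would then feed the same cut machinery as manual word deletion. Multi-word fillers ("you know") need token-window matching, which this sketch omits.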
## Recommended Additions (near-term, high ROI)
- [ ] Local GPU/CPU detection & recommended model/settings UI
- [ ] Per-project incremental transcription: re-run only edited segments
- [ ] "Preview cleaning" dry-run that highlights candidate removals before applying
- [ ] Export size/time estimator and suggested export presets
- [ ] Accessibility export presets (podcast vs YouTube presets)
- [ ] Bundled Qwen3 LLM for offline AI features
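Per-project incremental transcription amounts to splicing a re-run segment's words back into the word list while leaving everything outside the range untouched. A minimal sketch, with hypothetical word shapes and a made-up `splice_words` helper:

```python
# Sketch: replace the words inside [seg_start, seg_end) with freshly
# re-transcribed words, keeping everything outside the range untouched.

def splice_words(words, new_words, seg_start, seg_end):
    before = [w for w in words if w["end"] <= seg_start]
    after = [w for w in words if w["start"] >= seg_end]
    return before + new_words + after

old = [
    {"text": "a", "start": 0.0, "end": 0.5},
    {"text": "b", "start": 0.5, "end": 1.0},
    {"text": "c", "start": 1.0, "end": 1.5},
]
new = [{"text": "B", "start": 0.5, "end": 1.0}]
print([w["text"] for w in splice_words(old, new, 0.5, 1.0)])  # ['a', 'B', 'c']
```

The interesting part in practice is choosing segment boundaries that fall on silence so the re-run's alignment agrees with the surrounding words.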
## Remove / Defer (Back Burner)
These broaden scope or add legal/privacy surface — defer for now.
- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER
@ -52,18 +64,20 @@ These broaden scope or add legal/privacy surface — defer for now.
## Risks & Mitigations
- Large model sizes: don't bundle large models; download on-demand and document storage location.
- Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.
## Prioritized Quick Wins
1. Per-project incremental transcription
2. "Preview cleaning" dry-run UI
3. Export presets (podcast vs YouTube)
## Next Steps for Implementation
- Bundle Qwen3 LLM for offline AI processing.
- Implement incremental transcription to speed up re-editing workflows.
- Add export presets and size estimation.
- Improve GPU/CPU detection and model recommendations.
---
Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.

plan.md

TalkEdit's defensible position: **works on hour+ files without degrading**, fully offline, one-time payment. No competitor owns this — Descript chokes on long content, CapCut limits mobile uploads, and both require accounts.
---
**Current status (May 2026):** All core editing features are built and stable. Polish pass completed. 107 automated tests (95 frontend + 12 Rust). Ready for beta testing.
---
## Phase 1: Polish ✅ COMPLETED
### Reliability & error handling ✅
- [x] Backend health check — polls `/health` every 30s, shows reconnecting banner
- [x] Export failure reporting — surfaces FFmpeg stderr with copy-to-clipboard
- [x] React ErrorBoundary catches render crashes, shows fallback with reload
- [x] Global JS error logging — `window.onerror` + `onunhandledrejection` logged to Rust backend
### UX polish ✅
- [x] Tooltips on every button/control across all panels
- [x] Loading spinners for waveform, waveform retry button
- [x] Export progress bar (visual, not just text)
- [x] Help panel with full feature documentation
- [x] Keyboard cheatsheet overlay with close button and preset indicator
- [x] First-run welcome overlay with 3-step guide
- [x] `?` keyboard shortcut opens cheatsheet (accessible from Help panel)
- [x] Empty states: MarkersPanel, AIPanel, WaveformTimeline
- [x] Error states: AIPanel with retry, WaveformTimeline with retry
- [x] Auto-save crash recovery every 60s, restore prompt on next launch
- [x] Confirmation dialogs for zone/marker deletion
- [x] Disabled state for all buttons during export/transcription
- [x] Export button disabled when no video loaded
### Consistency ✅
- [x] Mute zone color unified (blue everywhere)
- [x] Disabled opacity unified (40% everywhere)
- [x] Zone list items border radius unified (`rounded-lg`)
- [x] Toolbar button groups separated with visual dividers
- [x] Labels simplified: "Sound Gain", "Speed Adjust", "Trim Silence", "Chapter Marks", "Edit Zones", "Add Clips", "Bkg. Music", "AI Tools"
- [x] Model selector moved to AIPanel reprocess tab
- [x] Orphaned VolumePanel.tsx removed
### Trial & licensing ✅
- [x] Trial duration: 7 days
- [x] Trial bar on welcome screen with days remaining
- [x] Sentinel file prevents deleting trial.json to reset trial
- [x] XOR integrity check prevents editing trial.json timestamp
- [x] `canEdit` defaults to `false` (locked until status check confirms)
- [x] Email confirmation step before license activation (deters key sharing)
- [x] `verify_license` command (verify without caching)
- [x] Expired banner explains what still works (export, loading)
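The XOR integrity idea above can be illustrated with a toy sketch. Everything here (the key, the tag layout, the function names) is made up for illustration; the shipped check's scheme is not shown in this document:

```python
# Toy sketch of an XOR-keyed integrity tag over a trial timestamp.
# 'KEY' is invented for this example; the real app's key and format differ.
KEY = b"talkedit-demo-key"

def tag(timestamp: str) -> str:
    """XOR the timestamp bytes against a repeating key, return hex."""
    data = timestamp.encode()
    x = bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))
    return x.hex()

def verify(timestamp: str, stored_tag: str) -> bool:
    """Recompute the tag and compare; edits to the timestamp break it."""
    return tag(timestamp) == stored_tag

t = tag("2026-05-06T16:15:38")
print(verify("2026-05-06T16:15:38", t))   # True
print(verify("2099-01-01T00:00:00", t))   # False
```

An XOR tag only deters casual tampering (the key ships with the binary); the sentinel file covers the delete-and-reset path.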
### Robustness ✅
- [x] React ErrorBoundary
- [x] Store-level input validation (reject NaN, clamp bounds, enforce min zone duration)
- [x] Runtime assertions in critical paths (TranscriptEditor, WaveformTimeline, ExportDialog)
- [x] Auto-save crash recovery
- [x] CI pipeline (GitHub Actions: Rust + Frontend + Python)
- [x] Bad project state recovery (auto-prunes invalid zones on load, Dev Panel reset button)
- [x] 95 frontend tests (editorStore, licenseStore, aiStore, assert)
- [x] 12 Rust tests (licensing, models)
- [x] Canvas zone handles enlarged (r=6), hit area increased
- [x] Search match contrast improved
- [x] Split panes keyboard-accessible (arrow keys, tabIndex, ARIA)
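The store-level validation item above lives in Zustand middleware in the real app; the same idea as a language-neutral sketch (hypothetical helper and constant, not the shipped guard) is to reject NaN and clamp to sane bounds before state is committed:

```python
import math

MIN_ZONE_SECONDS = 0.05  # hypothetical minimum zone duration

def sanitize_zone(start, end, duration):
    """Reject NaN, clamp to [0, duration], enforce a minimum zone length."""
    if any(isinstance(v, float) and math.isnan(v) for v in (start, end, duration)):
        raise ValueError("NaN in zone bounds")
    start = min(max(start, 0.0), duration)
    end = min(max(end, 0.0), duration)
    if end - start < MIN_ZONE_SECONDS:
        raise ValueError("zone shorter than minimum duration")
    return start, end

print(sanitize_zone(-1.0, 2.0, 10.0))  # (0.0, 2.0)
```

Raising on invalid input (rather than silently clamping everything) is what makes bad-state bugs visible in the error log instead of corrupting the project.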
---
## Phase 2: Standout features (post-beta)
### Long-form content
- [x] Chapter-based navigation — markers auto-sorted, click to jump (partially done)
- [x] Per-segment re-transcription (done)
- [x] Append multiple clips into one timeline (done)
- [ ] Project stitching — load multiple `.aive` projects, combine into one export
- [ ] Smart chunking for transcription — for files >2hr
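Smart chunking could stitch overlapping windows by cutting each seam at the overlap midpoint. A rough sketch under assumed chunk shapes (the `stitch` helper is hypothetical):

```python
# Sketch: merge word lists from two overlapping transcription chunks by
# keeping words on either side of the overlap's midpoint.

def stitch(chunk_a, chunk_b, overlap_start, overlap_end):
    mid = (overlap_start + overlap_end) / 2
    kept_a = [w for w in chunk_a if w["start"] < mid]
    kept_b = [w for w in chunk_b if w["start"] >= mid]
    return kept_a + kept_b

a = [{"text": "one", "start": 0.0}, {"text": "two", "start": 4.8}]
b = [{"text": "two", "start": 4.8}, {"text": "three", "start": 6.0}]
print([w["text"] for w in stitch(a, b, 4.0, 6.0)])  # ['one', 'two', 'three']
```

A production version would align the two chunks' overlapping words (they can disagree) before picking a cut point, which is the actual hard part of seamless stitching.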
### Export
- [x] YouTube chapters from markers (done)
- [x] Export transcript formats: SRT, VTT, TXT (done)
- [ ] Batch export — multiple projects/cuts in sequence
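The YouTube-chapters item above is essentially formatting sorted markers as timestamp lines. A sketch with an assumed `(seconds, title)` marker shape:

```python
def youtube_chapters(markers):
    """Format (seconds, title) markers as YouTube chapter lines."""
    lines = []
    for sec, title in sorted(markers):
        m, s = divmod(int(sec), 60)
        lines.append(f"{m}:{s:02d} {title}")
    return "\n".join(lines)

print(youtube_chapters([(0, "Intro"), (754, "Main topic")]))
# 0:00 Intro
# 12:34 Main topic
```

Note YouTube expects the first chapter at 0:00, and files past an hour should use `H:MM:SS`, which this sketch does not handle.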
### AI features
- [x] AI Smart Clean — filler removal + silence trim + normalize (done)
- [x] AI sentence rephrase (done)
- [x] AI clip suggestions for social media (done)
- [ ] Smart Shorts finder — scan transcript for 10–90s segments
- [ ] AI auto-chapters — topic detection from transcript
- [ ] AI show notes — title, description, key moments
- [ ] AI dead-air finder — content-based silence detection
### Bundled local LLM
- [ ] Integrate llama.cpp Rust bindings
- [ ] Auto-download Qwen3 on first AI use (4B: 2.5GB / 1.7B: 1GB)
- [ ] Hardware detection at runtime, model selection in Settings
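The hardware check above reduces to a RAM threshold. A sketch assuming 8 GB / 4 GB cutoffs and made-up model identifiers (the detection method and names are not specified here):

```python
def pick_model(free_ram_gb: float) -> str:
    """Pick a bundled model tier from available RAM (assumed thresholds)."""
    if free_ram_gb >= 8:
        return "qwen3-4b-q4_k_m"    # ~2.5 GB download
    if free_ram_gb >= 4:
        return "qwen3-1.7b-q4_k_m"  # ~1.0 GB download
    return "ollama-or-api"          # too little RAM: prompt for Ollama/API

print(pick_model(16))  # qwen3-4b-q4_k_m
print(pick_model(5))   # qwen3-1.7b-q4_k_m
```

Free RAM (not total) is the safer signal, since the editor itself plus Whisper already hold memory while a clip is open.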
---
## Phase 3: Marketing & launch (post-beta)
### Messaging pillars
1. **"The offline video editor that doesn't slow down on long files"** — core positioning
2. **"No subscription. One price, owned forever."** — pricing differentiator
3. **"Zero-setup AI"** — bundled Qwen3, no API keys, no Docker, no Ollama
4. **"Your podcast → 10 TikToks in one click"** — Smart Shorts finder hook
### Launch channels
#### Creator communities (highest ROI)
- [ ] **r/podcasting** — post a demo video: "I edited a 1hr podcast in 4 minutes with this free tool." Free trial link. Emphasize offline + no sub.
- [ ] **r/VideoEditing** — comparison post: "TalkEdit vs Descript for long-form." Let the features speak. Include benchmarks (2hr file load time, export speed).
- [ ] **r/selfhosted** — this audience cares deeply about offline/local. Post: "Fully offline Descript alternative I've been building. Built-in local AI, no cloud." Free license giveaways to top commenters.
- [ ] **r/SaaS** — post the journey, get feedback. Good for building awareness among builders who might recommend it.
- [ ] **Hacker News** — "Show HN: I built an offline video editor with bundled local AI." The technical audience will appreciate the Rust + llama.cpp + Whisper stack. Be ready for technical questions.
#### Video demos (the product is visual — show it)
- [ ] **Product Hunt launch** — video + GIF-heavy listing. Tagline: *"Descript for long-form content, 100% offline."* Give away 50 free Pro licenses on launch day.
- [ ] **YouTube demo** — 3-5 min screencap: open 1hr file → auto-transcribe → Smart Clean → Smart Shorts finds 10 clips → export all. No cuts, real-time.
- [ ] **TikTok/Shorts** — 30s clips of the Smart Shorts finder in action. "This tool turned my 1hr podcast into 10 TikToks automatically." Each short is itself a demo of the feature.
#### Earned / low-cost
- [ ] **Offer free Pro licenses** to 20 podcasters with >10K followers in exchange for a public review or mention. Target: Joe Rogan-style solo podcasters who edit their own content.
- [ ] **GitHub release** — tag v1.0.0 with detailed release notes, screenshots, and binary downloads for all three platforms. Encourage issues/feature requests.
- [ ] **Write a "why I built this" post** — submit to Indie Hackers, Medium, Dev.to. Focus on: frustration with Descript pricing, desire for offline tools, the technical challenge of bundling an LLM.
- [ ] **Comparison landing page** at `talked.it/vs/descript` with a feature table, pricing comparison, and "privacy" as a highlighted column. SEO target: "Descript alternative".
#### Paid (only after product-market fit is validated)
- [ ] YouTube ads targeting "how to edit a podcast" and "Descript tutorial" search terms
- [ ] Podcast sponsorship on indie podcasting shows (target audience overlap is perfect)
### Free-to-paid funnel
- [ ] 7-day full-feature trial — no credit card required, no account signup
- [ ] After trial: locked editing + AI, but export still works (people can still get value from completed projects)
- [ ] Pro license: $39 one-time. Business license: $79 one-time (priority support, volume licensing)
- [ ] No subscriptions. Emphasize: *"Buy once, don't think about it again."*
### Launch checklist
- [ ] Landing page at talked.it with: feature list, screenshots, pricing, download buttons
- [ ] Demo video (3-5 min walkthrough)
- [ ] 30s Smart Shorts demo clip for TikTok/Shorts
- [ ] Product Hunt listing ready (logo, description, GIFs, launch day plan)
- [ ] Reddit drafts for r/podcasting, r/VideoEditing, r/selfhosted
- [ ] HN Show HN draft
- [ ] 20 free Pro licenses queued for influencers
### Pricing
- 7-day free trial (no CC, no account)
- Pro: $39 one-time
- Business: $79 one-time (priority support, volume licensing)
---
## Phase 4: Post-launch
### Retention
- [ ] In-app changelog on update
- [ ] Email list for major releases (optional, no account required)
- [ ] Community templates/sharing for export presets and filler lists
### Growth features
- [ ] **Sample video download** — "Try without your own media" button downloads a test file + pre-made transcript
- [ ] **Built-in free music library** — 5-10 CC0 loops shipped with the app
- [ ] **Export presets** — community-contributed, loaded from a JSON file
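A community export preset could be as small as one JSON object per preset. This shape is hypothetical, not a committed schema — field names and units would be finalized with the export pipeline:

```json
{
  "name": "YouTube 1080p",
  "container": "mp4",
  "video_codec": "h264",
  "resolution": "1920x1080",
  "video_bitrate_kbps": 8000,
  "audio_codec": "aac",
  "audio_bitrate_kbps": 192
}
```

Keeping presets as plain JSON files means users can share them with no marketplace infrastructure: drop a file in a folder, and it shows up in the export dialog.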
---
## Non-goals (explicitly deferred)
- Cloud sync / collaboration
- Voice cloning / TTS
- Full multi-track NLE timeline (transitions, picture-in-picture, etc.)
- Mobile app
- Subscription model
- Training/fine-tuning models in-app
- Image/video generation models (Stable Diffusion, etc.) — text-only LLM is sufficient for transcript tasks