diff --git a/FEATURES.md b/FEATURES.md
index fb26ccf..561a3ae 100644
--- a/FEATURES.md
+++ b/FEATURES.md
@@ -98,6 +98,35 @@ All AI features use the existing Ollama/OpenAI/Claude provider config — no new
 ---
+## 🎬 OpenShot-inspired features
+
+Features adapted from OpenShot Video Editor that would fill capability gaps.
+
+### High-value
+
+- [ ] **Keyframe animations** — animate clip position, scale, opacity, and effects over time. Quadratic bezier, linear, or constant interpolation.
+- [ ] **Unlimited tracks / layers** — overlay images, watermarks, background videos. Any transparency shows through to the layer below.
+- [ ] **Video transitions** — crossfade, wipe, dissolve between clips. Over 400 transition types. Auto-create a transition when two clips overlap.
+- [ ] **Title / text overlays** — built-in title editor with 40+ vector templates. SVG-based. Adjust font, color, size, position.
+- [ ] **Digital video effects** — brightness, contrast, hue, greyscale, saturation, chroma key (bluescreen/greenscreen). Drag onto a clip, animate properties.
+- [ ] **Chroma key / greenscreen** — wire up the existing `background_removal.rs` as a per-clip effect with a color picker + tolerance.
+
+### Medium-value
+
+- [ ] **Time-mapping and speed ramps** — animate speed up/down within a clip using keyframes (vs. the current constant-speed zones).
+- [ ] **Audio split & per-channel mixing** — split audio from a video clip, adjust each channel independently.
+- [ ] **Frame-accurate stepping** — arrow keys to step frame by frame through the video.
+- [ ] **Clip trimming on timeline** — drag clip edges to trim in/out points directly on the waveform timeline.
+- [ ] **Snapping** — magnetic snap to markers, clip edges, playhead, zone boundaries.
+
+### Lower priority (partial overlap with existing features)
+
+- [ ] **Cross-platform project files** — save and open `.aive` projects across OSes (already works via JSON).
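A minimal sketch of how the three interpolation modes named under Keyframe animations (constant, linear, quadratic bezier) could evaluate an animated property. `Interp`, `Keyframe`, and `sample` are illustrative names, not existing TalkEdit APIs, and the single-control bezier is one possible parameterization:

```rust
#[derive(Clone, Copy)]
pub enum Interp {
    /// Hold the previous value until the next keyframe.
    Constant,
    Linear,
    /// Quadratic bezier with endpoints 0 and 1 and middle control `control`.
    Bezier { control: f32 },
}

#[derive(Clone, Copy)]
pub struct Keyframe {
    pub time: f32,  // seconds
    pub value: f32, // e.g. opacity in 0.0..=1.0
    pub interp: Interp,
}

/// Value of the animated property at time `t`; `keys` must be sorted by time.
pub fn sample(keys: &[Keyframe], t: f32) -> f32 {
    match keys.iter().position(|k| k.time > t) {
        None => keys.last().map(|k| k.value).unwrap_or(0.0), // past the last key
        Some(0) => keys[0].value,                            // before the first key
        Some(i) => {
            let (a, b) = (keys[i - 1], keys[i]);
            let u = (t - a.time) / (b.time - a.time); // normalized 0..1
            let eased = match a.interp {
                Interp::Constant => 0.0, // stay on a.value
                Interp::Linear => u,
                // B(u) = (1-u)^2*0 + 2u(1-u)*c + u^2*1
                Interp::Bezier { control } => 2.0 * u * (1.0 - u) * control + u * u,
            };
            a.value + (b.value - a.value) * eased
        }
    }
}

fn main() {
    // Two-keyframe linear fade-in over two seconds.
    let fade = [
        Keyframe { time: 0.0, value: 0.0, interp: Interp::Linear },
        Keyframe { time: 2.0, value: 1.0, interp: Interp::Linear },
    ];
    println!("opacity at 1s = {}", sample(&fade, 1.0));
}
```

The same `sample` shape works for any keyframed property (position, scale, effect amount); only the `value` semantics change.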
+- [ ] **Desktop drag-and-drop** — drag video/audio/image files from the file manager directly onto the timeline or welcome screen.
+- [ ] **Multi-format export presets** — common format presets in the export dialog (YouTube, Twitter/Instagram, Podcast).
+
+---
+
 ## 💡 TalkEdit competitive advantages to lean into

 These aren't features to build — they're things to make more visible in the UI and README:
diff --git a/plan.md b/plan.md
index 4f0d3b1..dfe9499 100644
--- a/plan.md
+++ b/plan.md
@@ -1,81 +1,158 @@
-# Plan for Building TalkEdit (Whisper.cpp + Tauri)
+# TalkEdit — Launch Plan
-Based on your original idea summary and our discussions, here's a detailed plan to build a standalone, local audio/video editor app. We'll modify CutScript as the base, migrate to **Tauri 2.0** (Rust backend + React frontend) for tiny, dependency-free installers, and use **Whisper.cpp** for fast, accurate transcription. This keeps the scope minimal, focuses on text-based editing for spoken content, and targets podcasters/YouTubers.
+## Niche: "Descript for long-form content"
-## 1. Overview
-- **Goal**: Create an offline Descript alternative with word-level editing, transcription, and export. Users download one file (~10–20MB), install, and run—no Python, FFmpeg, or external deps.
-- **Why This Stack**: Tauri bundles everything into a native app; Whisper.cpp (C++ lib) integrates seamlessly with Rust for CPU-efficient transcription. Faster than rebuilding from scratch.
-- **Target Users**: Creators editing podcasts/videos; free core + Pro upgrades.
-- **Key Differentiators**: Fully local, text-based editing like Google Docs, smart cuts with fades.
+TalkEdit's defensible position: **works on hour+ files without degrading**, fully offline, one-time payment. No competitor owns this — Descript chokes on long content, CapCut limits mobile uploads, and both require accounts.
-## 2. Tech Stack
-- **Frontend**: React + Vite + Tailwind CSS + shadcn/ui (from CutScript; minimal changes).
-- **Backend**: Tauri 2.0 (Rust) – handles file I/O, FFmpeg calls, Whisper.cpp integration.
-- **Transcription**: Whisper.cpp (via Rust bindings like `whisper-cpp-sys` or `whisper-rs`).
-- **Audio/Video Processing**: FFmpeg (bundled or called via Rust wrappers like `ffmpeg-next`).
-- **State Management**: Zustand (from CutScript).
-- **Packaging**: Tauri's `tauri build` for cross-platform installers.
-- **AI Features**: Local models only (no APIs); optional Ollama for fillers.
+---
-## 3. Step-by-Step Development Plan
-1. **Set Up Tauri in CutScript** (1–2 weeks):
-   - Install `tauri-cli` globally.
-   - In CutScript root: `npx tauri init` (choose Rust backend, link to existing React frontend).
-   - Migrate Electron main.js to Tauri's `src/main.rs` (handle window, file dialogs).
-   - Update `tauri.conf.json` for app metadata, bundle settings.
+## Phase 1: Polish (pre-launch, do this first)
-2. **Integrate Whisper.cpp in Rust** (2–3 weeks):
-   - Add `whisper-cpp` as a dependency in `Cargo.toml`.
-   - Create a Rust module for transcription: Load models, process audio, return word-level timestamps.
-   - Replace Python backend calls with Tauri commands (e.g., `invoke` from frontend to Rust for transcription).
-   - Handle model downloads on first run (store in app data dir).
+### Reliability & error handling
+- [ ] Handle backend crashes gracefully — if the Python backend dies, show a reconnect banner; don't leave the UI in a broken state
+- [ ] Transcription failure recovery — show which model/download step failed, suggest alternatives
+- [ ] Export failure reporting — surface FFmpeg stderr to the user in a readable way
+- [ ] File locking / concurrent access — prevent exporting while transcription is running and vice versa
-3. **Migrate Audio/Video Logic** (2 weeks):
-   - Port FFmpeg calls to Rust (use `ffmpeg-next` for cutting/export).
-   - Implement segment calculation: From edited transcript, build keep_segments with padding/fades.
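The keep-segment calculation described above could look something like the following: pad each surviving word's timestamps and merge touching ranges, leaving fades to be applied per boundary at export time. A hedged sketch with illustrative names (`Word`, `keep_segments`, the padding value); this is not the project's actual implementation:

```rust
#[derive(Clone, Copy)]
pub struct Word {
    pub start: f64, // seconds
    pub end: f64,
}

/// Build (start, end) keep-segments from the words that survived the
/// transcript edit. `pad` widens each word so cuts don't clip speech.
pub fn keep_segments(kept: &[Word], pad: f64) -> Vec<(f64, f64)> {
    let mut out: Vec<(f64, f64)> = Vec::new();
    for w in kept {
        let (s, e) = ((w.start - pad).max(0.0), w.end + pad);
        if let Some(last) = out.last_mut() {
            // Padded ranges that touch or overlap merge into one segment,
            // so tiny inter-word gaps don't become micro-cuts.
            if s <= last.1 {
                last.1 = last.1.max(e);
                continue;
            }
        }
        out.push((s, e));
    }
    out
}

fn main() {
    let kept = [
        Word { start: 0.5, end: 0.8 },
        Word { start: 0.9, end: 1.2 },
        Word { start: 5.0, end: 5.4 },
    ];
    // Two segments: the first two words merge, the third stands alone.
    println!("{:?}", keep_segments(&kept, 0.15));
}
```

Each resulting segment boundary is where a short crossfade (the plan's 150 ms fades) would be applied by the FFmpeg export step.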
-   - Add audio cleaning (noise reduction via bundled tools or Rust libs).
+### UX roughness
+- [ ] Drag-and-drop file import onto the welcome screen
+- [ ] Loading spinners for every async action, with descriptive messages ("Downloading model...", "Analyzing silence...")
+- [ ] Undo/redo visual feedback — toast notification "Undo: removed cut"
+- [ ] `?` keyboard shortcut opens a proper cheat-sheet modal
+- [ ] Save indicator — dot or "unsaved" badge next to the project name
+- [ ] Disabled state for all buttons during export/transcription to prevent double-clicks
+- [ ] Empty states for every panel ("Add your first marker", "No silence detected", etc.)
-4. **Frontend Polish** (1–2 weeks):
-   - Update UI for Tauri (file dialogs via `tauri-plugin-dialog`).
-   - Refine transcript editor: Better timestamp syncing, manual adjustments.
-   - Add export options (MP4 with subs, audio-only).
+### Trial & licensing UX
+- [ ] Wire up the "Activate" link to a real payment page (not the placeholder `talked.it`)
+- [ ] Show days remaining in the welcome screen bar (done)
+- [ ] After the trial expires, clearly explain what still works (export, loading) vs. what's locked (editing, AI)
+- [ ] License key field should handle paste + validate the format client-side before sending to Rust
-5. **Testing & Packaging** (1 week):
-   - Test on Windows/macOS/Linux; ensure Whisper runs offline.
-   - Bundle with `tauri build`; verify no external deps.
-   - Add auto-updater for Pro features.
+### Performance
+- [ ] Lazy-load the waveform for very long files (>2hr) — don't fetch the entire WAV at once
+- [ ] Virtualize the waveform canvas rendering (only draw the visible portion), not just the transcript
+- [ ] Debounce project auto-save
-6. **Launch & Iterate** (Ongoing):
-   - Open-source core on GitHub.
-   - Market on Product Hunt, Reddit; gather feedback.
+---
-## 4. MVP Features (Minimal but Useful)
-Focus on what creators need for spoken content:
-- **Drag-and-drop import**: Audio/video files; auto-extract audio.
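The waveform-virtualization item under Performance ("only draw the visible portion") reduces to mapping the scrolled viewport onto a slice of a precomputed peak array. A sketch of that window math, with illustrative names and no claim about the actual rendering code:

```rust
/// Half-open index range of waveform peaks visible in the viewport.
/// `peaks_per_sec` is sample_rate / samples_per_peak for the precomputed
/// peak array; both ends are clamped to the array bounds.
pub fn visible_peaks(
    scroll_sec: f64,   // left edge of the viewport, in seconds
    viewport_sec: f64, // visible duration at the current zoom level
    peaks_per_sec: f64,
    total_peaks: usize,
) -> (usize, usize) {
    let first = (scroll_sec * peaks_per_sec).floor().max(0.0) as usize;
    let last = ((scroll_sec + viewport_sec) * peaks_per_sec).ceil() as usize;
    (first.min(total_peaks), last.min(total_peaks))
}

fn main() {
    // Viewport showing seconds 10..15 at 100 peaks/sec over a long file.
    println!("{:?}", visible_peaks(10.0, 5.0, 100.0, 100_000));
}
```

Only `peaks[first..last]` gets drawn per frame, so a 3-hour file costs the same per paint as a 3-minute one; the same range drives which WAV chunks to lazy-load.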
-- **One-click transcription**: Whisper.cpp with model choice (Fast - less accurate: tiny/base; Slow - more accurate: small/medium/large).
-- **Text-based editing**: Scrollable transcript; click word → jump to video; select/delete words → auto-cut audio with 150ms fades.
-- **Smart cleanup**: Remove fillers ("um", pauses >0.8s) via local AI.
-- **Preview & Export**: Synced preview; export MP4/audio with optional SRT subs.
-- **Undo/Redo**: Full edit history.
+## Phase 2: Standout features (own the niche)
-No multi-track, voice cloning, or collaboration—keep it simple.
+### Long-form content (win here)
+- [ ] **Chapter-based navigation** — markers auto-sorted, click to jump, usable on 3hr files (partially done)
+- [ ] **Per-segment re-transcription** without losing surrounding context (done)
+- [ ] **Append multiple clips** into one timeline (done)
+- [ ] **Project stitching** — load multiple `.aive` projects, combine into one export
+- [ ] **Session memory** — re-open the last project on launch, auto-restore cursor position and scroll
+- [ ] **Smart chunking** for transcription — for files >2hr, transcribe in overlapping chunks and stitch seamlessly
-## 4. Notes
-- Consider adding Parakeet TDT as a transcription option in the future for users who want alternatives to Whisper.
+### Export differentiation
+- [ ] **YouTube chapters** — auto-generate from markers, copy as timestamps (done)
+- [ ] **Export chapter markers** — embed in MP4/MKV metadata for chapter skip in players
+- [ ] **Batch export** — export multiple projects or multiple cuts from one project in sequence
+- [ ] **Export transcript format presets** — SRT, VTT, TXT, TXT with timestamps, markdown (done)
-## 5. Monetization Model
-- **Free Forever**: Core editing/transcription (unlimited local use).
-- **Pro License** ($29–49 one-time): Batch processing, high-quality voices (if adding TTS), custom presets, priority support.
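The "Smart chunking" item above (overlapping chunks for >2hr files) boils down to laying out fixed-length windows that each re-cover the tail of the previous one; stitching then keeps words from the first half of each overlap. A hedged sketch of the window layout only, with illustrative, untuned parameter values:

```rust
/// (start, end) transcription windows in seconds. Each window after the
/// first re-covers the last `overlap_sec` seconds of its predecessor so
/// the stitcher has duplicate words to align on.
pub fn chunk_bounds(total_sec: f64, chunk_sec: f64, overlap_sec: f64) -> Vec<(f64, f64)> {
    let step = chunk_sec - overlap_sec;
    assert!(step > 0.0, "overlap must be shorter than the chunk length");
    let mut out = Vec::new();
    let mut start = 0.0;
    loop {
        let end = (start + chunk_sec).min(total_sec);
        out.push((start, end));
        if end >= total_sec {
            break;
        }
        start += step; // advance by chunk length minus overlap
    }
    out
}

fn main() {
    // A 25s file with 10s chunks and 2s of overlap.
    println!("{:?}", chunk_bounds(25.0, 10.0, 2.0));
}
```

Short files degenerate to a single window, so the same code path serves every file length.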
-- **Optional Add-Ons**: Cloud credits for long videos (rarely needed).
+### AI features (local-first moat)
-## 6. Timeline & Milestones
-- **Weeks 1–4**: Tauri setup + Whisper integration.
-- **Weeks 5–6**: Audio logic migration + frontend tweaks.
-- **Weeks 7–8**: Testing, packaging, launch prep.
-- **Total**: 6–10 weeks to MVP (solo dev + AI).
+All powered by the bundled Qwen3 LLM. No API keys, no cloud calls. Features are grouped by how much they contribute to the core workflow.
-## 7. Risks & Tips
-- **Risks**: Whisper.cpp compilation issues; Rust learning curve if new to it.
-- **Tips**: Start with small models (base ~70MB); test timestamp accuracy early. Use Tauri's docs for migration. If stuck, fall back to bundling Python for Whisper (but avoid for true standalone).
-- **Resources**: Tauri docs, Whisper.cpp GitHub, Rust audio crates.
-/home/dillon/_code/audio_editor/plan.md
\ No newline at end of file
+#### Content creation (biggest value)
+- [ ] **Smart Shorts finder** — scan the full transcript for self-contained, engaging segments 10–90s long. Ranks by: narrative completeness (has a beginning/end), energy cues from transcript sentiment, and topic boundaries. Results shown as a list of suggested cut ranges with preview. One-click export as a separate short video, or copy timecodes.
+- [ ] **Sound bite / quotable moment finder** — find punchy, standalone sentences that work as clips for social media. Ranks by quotability. Different from the Shorts finder: these are <15s, single-sentence, high-impact lines.
+- [ ] **Hook analyzer** — score the first 30 seconds of the video for engagement. Suggests cuts or rephrases to make the intro stronger. Shows a "hook score" 1–10.
+- [ ] **Title + description generator** — suggest 5 titles and a YouTube description from the transcript + markers. One-click copy.
+
+#### Editing acceleration
+- [ ] **AI auto-chapters** — detect topic shifts from the transcript → create timeline markers.
Uses topic segmentation from the LLM, not just silence gaps.
+- [ ] **AI Smart Clean** — one pass: filler removal + silence trim + normalize (done)
+- [ ] **AI sentence rephrase** — right-click a word → rephrase with AI → replace in transcript (also done, uses the backend)
+- [ ] **AI pacing analysis** — flag segments where the speaker talks too fast or too slow for sustained periods. Suggests speed range adjustments or cuts.
+- [ ] **AI dead-air finder** — finds moments where nothing interesting is said for >5s (rambling, off-topic, false starts). Different from the silence trimmer — this is content-based, not audio-level-based.
+- [ ] **AI readability scan** — flag sentences that are too long, complex, or jargon-heavy for spoken word. Suggests simpler alternatives.
+
+#### Metadata & distribution
+- [ ] **AI show notes** — generate title, description, key moments, and timestamps from transcript + markers
+- [ ] **AI keyword/tag extraction** — pull out 5–10 topic tags from the transcript. Useful for YouTube SEO or categorization
+- [ ] **AI question finder** — detect all questions asked in the video (speaker or guest). Useful for Q&A videos, AMAs, interviews — jump to each question instantly
+- [ ] **AI thumbnail text suggestion** — suggest short overlay text for video thumbnails based on the most compelling line in the video
+- [ ] **AI call-to-action finder** — detect where the speaker asks for likes/subscribes/comments. Lets you trim or reposition CTAs
+
+#### Accessibility & compliance
+- [ ] **AI content flagging** — flag profanity, sensitive topics, or copyrighted references in the transcript. Color-coded by category. Useful before publishing to restricted platforms
+- [ ] **AI language leveling** — rewrite transcript segments at a target reading grade level (e.g., "simplify to 8th-grade level").
Useful for educational content or broad audiences
+
+### Bundled local LLM (the biggest friction-killer)
+The biggest UX gap: users must set up Ollama or paste an API key to use AI features. Bundle two small models — download on first AI use, just like Whisper models.
+
+**Three-tier AI provider choice** (set once, persisted):
+
+| Option | Hardware needed | Download | Setup |
+|--------|----------------|----------|-------|
+| Qwen3 4B (recommended) | 8GB+ free RAM | 2.5 GB | None — auto-download |
+| Qwen3 1.7B (lightweight) | 4GB+ free RAM | 1.0 GB | None — auto-download |
+| Ollama (bring your own) | Any | None | User starts Ollama themselves |
+
+Default to Qwen3 4B. If the machine can't meet the RAM threshold (checked via Tauri at runtime), fall back to 1.7B or prompt to set up Ollama.
+
+- [ ] **Integrate llama.cpp Rust bindings** (`llama-cpp-rs` or `candle`) — replace Python `ai_provider.py` calls with native inference for the bundled models
+- [ ] **Auto-download Qwen3 4B or 1.7B** on the first AI action, based on the hardware check (GGUF Q4_K_M format, ~2.5GB / ~1GB). Same UX as the Whisper download: progress bar, resume on interrupt
+- [ ] **Model selector** in Settings: "Qwen3 4B (fast, no setup)" vs "Qwen3 1.7B (lightweight)" vs Ollama vs OpenAI vs Claude. Default to Qwen3 4B
+- [ ] **Hardware detection on first AI use** — check total system RAM, recommend 4B if ≥8GB free, 1.7B if less. Skip the download entirely if the machine can't run either (fall back to the Ollama/API prompt)
+- [ ] **GPU acceleration** — llama.cpp supports CUDA/Metal/Vulkan. Detect at runtime and enable if available
+- [ ] **For lightweight AI tasks** (filler detection, chapter titles, summarization), the bundled model handles them directly. Only tasks requiring heavier reasoning (rephrase, smart speed) get the full model
+- [ ] **Remove the Python backend dependency for AI** — once the bundled LLM + Whisper.cpp handle everything, there's no need to ship Python for AI features.
One less runtime dependency.
+- [ ] GGUF model files are cached in the app data dir, same as Whisper models
+
+**Why this wins:**
+- New users open the app, click "Detect filler words", and it just works. No API signup. No Docker. No "install Ollama" README steps.
+- Descript charges $24/mo and still requires internet. CapCut's AI features are cloud-only. TalkEdit gives you local AI with zero setup.
+- The same download-on-demand pattern already works for Whisper models — users understand it.
+- Two size options mean it works on everything from a 16GB gaming laptop to an 8GB office machine.
+
+---
+
+## Phase 3: Launch prep
+
+### Marketing assets
+- [ ] Compare page: "TalkEdit vs Descript" — focus on offline use, no subscription, long-file support
+- [ ] Demo video showing a 2hr podcast cleaned in 3 clicks
+- [ ] Tagline: *"The offline video editor that doesn't slow down on long files"*
+- [ ] Landing page at talked.it with clear pricing (one-time license)
+
+### Distribution
+- [ ] Product Hunt launch
+- [ ] Post in r/podcasting, r/VideoEditing, r/selfhosted
+- [ ] GitHub release with binaries for Windows/macOS/Linux
+- [ ] Offer free licenses to podcasters in exchange for public feedback and testimonials
+
+### Pricing
+- [ ] **Free trial**: 30 days, full features (already done)
+- [ ] **Pro**: $39 one-time — permanent license, includes all current + future features
+- [ ] **Business**: $79 one-time — same, but with priority support + volume licensing
+- [ ] No subscriptions, no recurring charges. One purchase = owned forever.
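The first-use hardware check behind the bundled-LLM tiers in Phase 2 ("recommend 4B if ≥8GB free, 1.7B if less, else fall back to Ollama") can be a plain threshold on free RAM. A sketch with the RAM probe stubbed as a parameter; in the real app that number would come from a sysinfo-style call in the Tauri backend, and the names here are illustrative:

```rust
#[derive(Debug, PartialEq)]
pub enum AiProvider {
    /// ~2.5 GB download, wants 8 GB+ free RAM.
    Qwen3_4B,
    /// ~1.0 GB download, wants 4 GB+ free RAM.
    Qwen3_1_7B,
    /// Machine can't run either bundled model; ask the user
    /// to set up Ollama or an API key instead.
    PromptOllama,
}

/// Pick a provider tier from the free-RAM estimate, mirroring the
/// thresholds in the three-tier table above.
pub fn pick_provider(free_ram_gb: f64) -> AiProvider {
    if free_ram_gb >= 8.0 {
        AiProvider::Qwen3_4B
    } else if free_ram_gb >= 4.0 {
        AiProvider::Qwen3_1_7B
    } else {
        AiProvider::PromptOllama
    }
}

fn main() {
    println!("6 GB free -> {:?}", pick_provider(6.0));
}
```

Keeping the decision pure (RAM in, tier out) makes it trivial to unit-test and to re-run if the user's available memory changes between sessions.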
+
+---
+
+## Phase 4: Post-launch
+
+### Retention
+- [ ] In-app changelog on update
+- [ ] Email list for major releases (optional, no account required)
+- [ ] Community templates/sharing for export presets and filler lists
+
+### Growth features
+- [ ] **Sample video download** — a "Try without your own media" button downloads a test file + a pre-made transcript
+- [ ] **Built-in free music library** — 5–10 CC0 loops shipped with the app
+- [ ] **Export presets** — community-contributed, loaded from a JSON file
+
+---
+
+## Non-goals (explicitly defer)
+
+- Cloud sync / collaboration
+- Voice cloning / TTS
+- Full multi-track NLE timeline (transitions, picture-in-picture, etc.)
+- Mobile app
+- Subscription model
+- Training/fine-tuning models in-app
+- Image/video generation models (Stable Diffusion, etc.) — a text-only LLM is sufficient for transcript tasks