updated features and docs

2026-05-06 16:15:38 -06:00
parent e4484a57f9
commit 813877a7b4
4 changed files with 276 additions and 269 deletions
--- a/README.md
+++ b/README.md
@ -1,26 +1,58 @@
-# CutScript
+# TalkEdit

-An open-source, local-first, Descript-like text-based audio and video editor powered by AI. Edit audio/video by editing text — delete a word from the transcript and it's cut from the audio/video.
+**Edit video by editing text.** An offline, local-first desktop video editor where deleting a word from the transcript cuts it from the video.

-<img width="1034" height="661" alt="image" src="https://github.com/user-attachments/assets/b1ed9505-792e-42ca-bb73-85458d0f02a5" />
+<img width="1034" height="661" alt="TalkEdit screenshot" src="https://github.com/user-attachments/assets/b1ed9505-792e-42ca-bb73-85458d0f02a5" />

+---

-## Architecture
+## Features

- **Tauri + React** desktop app with Tailwind CSS
- **FastAPI** Python backend (spawned as child process)
- **WhisperX** for word-level transcription with alignment
- **FFmpeg** for video processing (stream-copy and re-encode)
- **Ollama / OpenAI / Claude** for AI features (filler removal, clip creation)
+- **Text-based editing** — delete, reorder, or correct words in the transcript to edit the underlying video. No razor tool, no timeline slicing.
+- **Word-level transcription** — Whisper.cpp with per-word timestamps and confidence scores. Low-confidence words get a visual warning.
+- **Four zone types** — Cut, Mute, Sound Gain, and Speed Adjust. Create zones on the waveform timeline and drag edges to refine.
+- **Waveform timeline** — zoomable, scrollable waveform with playhead scrubbing, zone visualization, markers, chapters, and thumbnail strips.
+- **AI-powered editing**
+  - Filler word detection and removal
+  - Smart Clean: one-click filler removal + silence trim + noise reduction + loudness normalization
+  - Clip suggestions for social media shorts
+  - Sentence rephrase with AI alternatives
+  - Supports **Ollama** (local), **OpenAI**, and **Claude** backends
+- **Background music** — import a second audio track with auto-ducking via sidechain compression.
+- **Export** — fast stream-copy or full re-encode to MP4, MOV, WebM, or WAV. Resolution up to 4K.
+- **Captions** — generate SRT, VTT, or burn-in ASS subtitles with configurable font, color, and position.
+- **Speaker diarization** — identify and label multiple speakers.
+- **Audio tools** — noise reduction (DeepFilterNet), loudness normalization (LUFS targeting), background removal (MediaPipe), batch silence removal, video zoom/punch-in.
+- **Project save/load** — `.aive` JSON format preserves all edits, zones, markers, and AI config.
+- **Customizable hotkeys** — two presets (Standard / Left-hand) with per-key remapping and conflict detection.
+- **100% offline, no account required** — everything runs on your machine. No telemetry, no cloud dependency.
+- **7-day free trial** with one-time license key purchase. No subscription.
+
+---
+
+## Tech Stack
+
+| Layer | Technology |
+|-------|------------|
+| Desktop shell | **Tauri 2.0** (Rust) |
+| Frontend | **React** + **TypeScript** + **Tailwind CSS** |
+| State management | **Zustand** with Zundo undo/redo |
+| Transcription | **Whisper.cpp** (word-level timestamps) |
+| AI / LLM | **Ollama**, **OpenAI**, **Claude** (plugable backends) |
+| Media processing | **FFmpeg** |
+| Python services | **FastAPI** (spawned as a child process) |
+
+---

 ## Quick Start

 ### Prerequisites

- Node.js 18+
- Python 3.10+
- FFmpeg (in PATH)
- (Optional) Ollama for local AI features
+- **Node.js** 18+
+- **Python** 3.10+
+- **FFmpeg** (in PATH)
+- **Rust** toolchain (for Tauri)
+- **Ollama** (optional, for local AI features)

 ### Install

@ -36,65 +68,89 @@ cd backend && pip install -r requirements.txt && cd ..
 ### Run (Development)

 ```bash
-# Start Tauri dev environment (includes backend + frontend)
+# Start everything: backend + frontend + Tauri
 npm run dev:tauri
 ```

-Or run them separately:
+Or run components separately:

 ```bash
-# Terminal 1: Backend
-cd backend && python -m uvicorn main:app --reload --port 8642
+# Terminal 1: Python backend
+npm run dev:backend

 # Terminal 2: Frontend + Tauri
 cd frontend && cargo tauri dev
 ```

+### Build
+
+```bash
+npm run build:tauri
+```
+
+---
+
 ## Project Structure

 ```
 talkedit/
-├── src-tauri/       # Tauri Rust runtime
+├── src-tauri/             # Tauri 2.0 Rust runtime
 │   ├── Cargo.toml
-│   ├── src/
-│   │   ├── main.rs         # App entry & backend spawner
-│   │   └── commands/        # Tauri IPC handlers
-├── frontend/        # React + Vite + Tailwind
 │   └── src/
-│       ├── components/      # VideoPlayer, TranscriptEditor, etc.
-│       ├── store/           # Zustand state (editorStore, aiStore)
-│       ├── lib/tauri-bridge.ts  # Tauri API polyfill
-│       └── types/       # TypeScript interfaces
-├── backend/           # FastAPI Python backend
+│       ├── main.rs             # App entry, backend spawner
+│       ├── lib.rs              # Command handlers (IPC bridge)
+│       ├── transcription.rs    # Whisper.cpp integration
+│       ├── video_editor.rs     # FFmpeg-based editing
+│       ├── caption_generator.rs
+│       ├── diarization.rs
+│       ├── ai_provider.rs      # Ollama / OpenAI / Claude
+│       ├── audio_cleaner.rs
+│       ├── background_removal.rs
+│       ├── licensing.rs        # Trial + key activation
+│       ├── models.rs           # Shared data types
+│       └── paths.rs
+├── frontend/              # React + Vite + Tailwind
+│   └── src/
+│       ├── components/         # UI components
+│       │   ├── TranscriptEditor.tsx
+│       │   ├── WaveformTimeline.tsx
+│       │   ├── VideoPlayer.tsx
+│       │   ├── AIPanel.tsx
+│       │   ├── ExportDialog.tsx
+│       │   ├── SettingsPanel.tsx
+│       │   ├── BackgroundMusicPanel.tsx
+│       │   ├── MarkersPanel.tsx
+│       │   ├── ZoneEditor.tsx
+│       │   ├── SilenceTrimmerPanel.tsx
+│       │   ├── AppendClipPanel.tsx
+│       │   ├── LicenseDialog.tsx
+│       │   └── DevPanel.tsx
+│       ├── store/             # Zustand state (editorStore, aiStore, settingsStore)
+│       ├── hooks/             # Custom React hooks
+│       ├── lib/               # Utilities and Tauri bridge
+│       └── types/             # TypeScript interfaces
+├── backend/               # FastAPI Python services
 │   ├── main.py
-│   ├── routers/       # API endpoints
-│   ├── services/      # Core logic (transcription, editing, AI)
-│   └── utils/         # GPU, cache, audio helpers
-└── shared/            # Project schema
+│   ├── routers/               # API endpoints
+│   │   ├── transcribe.py
+│   │   ├── ai.py
+│   │   ├── audio.py
+│   │   ├── captions.py
+│   │   └── export.py
+│   ├── services/              # Core logic
+│   ├── video_editor.py
+│   ├── caption_generator.py
+│   ├── ai_provider.py
+│   ├── diarization.py
+│   ├── audio_cleaner.py
+│   ├── background_removal.py
+│   └── license_server.py
+├── shared/                # Schema definitions (project format)
+├── models/                # Whisper model storage
+└── docs/                  # Documentation
 ```

-## Features
-
-| Feature | Status |
-|---------|--------|
-| Word-level transcription (WhisperX) | Done |
-| Text-based video editing | Done |
-| Undo/redo | Done |
-| Waveform timeline | Done |
-| FFmpeg stream-copy export | Done |
-| FFmpeg re-encode (up to 4K) | Done |
-| AI filler word removal | Done |
-| AI clip creation (Shorts) | Done |
-| Ollama + OpenAI + Claude | Done |
-| Word-level captions (SRT/VTT/ASS) | Done |
-| Caption burn-in on export | Done |
-| Studio Sound (DeepFilterNet) | Done |
-| Keyboard shortcuts (J/K/L) | Done |
-| Speaker diarization | Done |
-| Virtualized transcript (react-virtuoso) | Done |
-| Encrypted API key storage | Done |
-| Project save/load (.cutscript) | Done |
-| AI background removal | Planned |
+---

 ## Keyboard Shortcuts

@ -102,28 +158,19 @@ talkedit/
 |-----|--------|
 | Space | Play / Pause |
 | J / K / L | Reverse / Pause / Forward |
+| I / O | Mark In / Mark Out |
 | ← / → | Seek ±5 seconds |
-| Delete | Delete selected words |
+| Delete | Delete selected words or zones |
 | Ctrl+Z | Undo |
 | Ctrl+Shift+Z | Redo |
 | Ctrl+S | Save project |
 | Ctrl+E | Export |
+| Ctrl+F | Search transcript |
+| Ctrl+Scroll | Zoom waveform |
 | ? | Shortcut cheatsheet |

-## API Endpoints
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| GET | /health | Health check |
-| POST | /transcribe | Transcribe video with WhisperX |
-| POST | /export | Export edited video (stream copy or re-encode) |
-| POST | /ai/filler-removal | Detect filler words via LLM |
-| POST | /ai/create-clip | AI-suggested clips for shorts |
-| GET | /ai/ollama-models | List local Ollama models |
-| POST | /captions | Generate SRT/VTT/ASS captions |
-| POST | /audio/clean | Noise reduction (DeepFilterNet) |
-| GET | /audio/capabilities | Check audio processing availability |
+---

 ## License

-MIT License — see [LICENSE](LICENSE) for details.
+Source code is MIT — see [LICENSE](LICENSE) for details. The distributed binary includes a 7-day free trial requiring a one-time license key purchase for continued use.