130 lines
3.8 KiB
Markdown
130 lines
3.8 KiB
Markdown
# CutScript
|
|
|
|
An open-source, local-first, Descript-like text-based audio and video editor powered by AI. Edit audio/video by editing text — delete a word from the transcript and it's cut from the audio/video.
|
|
|
|
<img width="1034" height="661" alt="image" src="https://github.com/user-attachments/assets/b1ed9505-792e-42ca-bb73-85458d0f02a5" />
|
|
|
|
|
|
## Architecture
|
|
|
|
- **Tauri + React** desktop app with Tailwind CSS
|
|
- **FastAPI** Python backend (spawned as child process)
|
|
- **WhisperX** for word-level transcription with alignment
|
|
- **FFmpeg** for video processing (stream-copy and re-encode)
|
|
- **Ollama / OpenAI / Claude** for AI features (filler removal, clip creation)
|
|
|
|
## Quick Start
|
|
|
|
### Prerequisites
|
|
|
|
- Node.js 18+
|
|
- Python 3.10+
|
|
- FFmpeg (in PATH)
|
|
- (Optional) Ollama for local AI features
|
|
|
|
### Install
|
|
|
|
```bash
|
|
# Root and frontend dependencies
|
|
npm install
|
|
cd frontend && npm install && cd ..
|
|
|
|
# Backend dependencies
|
|
cd backend && pip install -r requirements.txt && cd ..
|
|
```
|
|
|
|
### Run (Development)
|
|
|
|
```bash
|
|
# Start Tauri dev environment (includes backend + frontend)
|
|
npm run dev:tauri
|
|
```
|
|
|
|
Or run them separately:
|
|
|
|
```bash
|
|
# Terminal 1: Backend
|
|
cd backend && python -m uvicorn main:app --reload --port 8642
|
|
|
|
# Terminal 2: Frontend + Tauri
|
|
cd frontend && cargo tauri dev
|
|
```
|
|
|
|
## Project Structure
|
|
|
|
```
|
|
talkedit/
|
|
├── src-tauri/ # Tauri Rust runtime
|
|
│ ├── Cargo.toml
|
|
│ ├── src/
|
|
│ │ ├── main.rs # App entry & backend spawner
|
|
│ │ └── commands/ # Tauri IPC handlers
|
|
├── frontend/ # React + Vite + Tailwind
|
|
│ └── src/
|
|
│ ├── components/ # VideoPlayer, TranscriptEditor, etc.
|
|
│ ├── store/ # Zustand state (editorStore, aiStore)
|
|
│ ├── lib/tauri-bridge.ts # Tauri API polyfill
|
|
│ └── types/ # TypeScript interfaces
|
|
├── backend/ # FastAPI Python backend
|
|
│ ├── main.py
|
|
│ ├── routers/ # API endpoints
|
|
│ ├── services/ # Core logic (transcription, editing, AI)
|
|
│ └── utils/ # GPU, cache, audio helpers
|
|
└── shared/ # Project schema
|
|
```
|
|
|
|
## Features
|
|
|
|
| Feature | Status |
|
|
|---------|--------|
|
|
| Word-level transcription (WhisperX) | Done |
|
|
| Text-based video editing | Done |
|
|
| Undo/redo | Done |
|
|
| Waveform timeline | Done |
|
|
| FFmpeg stream-copy export | Done |
|
|
| FFmpeg re-encode (up to 4K) | Done |
|
|
| AI filler word removal | Done |
|
|
| AI clip creation (Shorts) | Done |
|
|
| Ollama + OpenAI + Claude | Done |
|
|
| Word-level captions (SRT/VTT/ASS) | Done |
|
|
| Caption burn-in on export | Done |
|
|
| Studio Sound (DeepFilterNet) | Done |
|
|
| Keyboard shortcuts (J/K/L) | Done |
|
|
| Speaker diarization | Done |
|
|
| Virtualized transcript (react-virtuoso) | Done |
|
|
| Encrypted API key storage | Done |
|
|
| Project save/load (.cutscript) | Done |
|
|
| AI background removal | Planned |
|
|
|
|
## Keyboard Shortcuts
|
|
|
|
| Key | Action |
|
|
|-----|--------|
|
|
| Space | Play / Pause |
|
|
| J / K / L | Reverse / Pause / Forward |
|
|
| ← / → | Seek ±5 seconds |
|
|
| Delete | Delete selected words |
|
|
| Ctrl+Z | Undo |
|
|
| Ctrl+Shift+Z | Redo |
|
|
| Ctrl+S | Save project |
|
|
| Ctrl+E | Export |
|
|
| ? | Shortcut cheatsheet |
|
|
|
|
## API Endpoints
|
|
|
|
| Method | Endpoint | Description |
|
|
|--------|----------|-------------|
|
|
| GET | /health | Health check |
|
|
| POST | /transcribe | Transcribe video with WhisperX |
|
|
| POST | /export | Export edited video (stream copy or re-encode) |
|
|
| POST | /ai/filler-removal | Detect filler words via LLM |
|
|
| POST | /ai/create-clip | AI-suggested clips for shorts |
|
|
| GET | /ai/ollama-models | List local Ollama models |
|
|
| POST | /captions | Generate SRT/VTT/ASS captions |
|
|
| POST | /audio/clean | Noise reduction (DeepFilterNet) |
|
|
| GET | /audio/capabilities | Check audio processing availability |
|
|
|
|
## License
|
|
|
|
MIT License — see [LICENSE](LICENSE) for details.
|