# TalkEdit — Tech Stack, Tools, and Features

This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.

## Overview

- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

## Tech Stack

- Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
- Backend: Tauri 2.0 (Rust) for file I/O, licensing and license-key crypto (Ed25519), model management, error logging
- Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
- Audio/Video Processing: FFmpeg, invoked from Rust via Python scripts (`video_editor.py`, `audio_cleaner.py`, `caption_generator.py`)
- AI: Ollama, OpenAI, Claude through Python `ai_provider.py`. Bundled Qwen3 LLM planned.
- State: Zustand (in-frontend store) + zundo middleware for undo/redo history
- Packaging: Tauri `tauri build` for cross-platform installers

## Developer Tools

- Rust toolchain (cargo, rustc)
- Node.js + npm for frontend
- Python 3.11+ (faster-whisper, WhisperX, AI providers)
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server
- Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
- CI: GitHub Actions (Rust clippy/test, Frontend tsc/vitest, Python pytest)

## Implemented Features

- [x] 1. Media import via file dialog (audio or video; audio is extracted automatically from video)
- [x] 2. One-click local transcription with a model-size selector (tiny/base → larger models)
- [x] 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
  - Click word → seek video/audio
  - Select words → cut corresponding media segment (smart 150–250ms fades)
- [x] 4. Smart Cleanup
  - Filler word removal (configurable list per-project)
  - Silence trimming
- [x] 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
- [x] 6. Preview with synced playback, undo/redo (zundo), project save/load
- [x] 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
- [x] 8. Speaker diarization
- [x] 9. Custom filler lists per-project
- [x] 10. Background music with auto-ducking
- [x] 11. Append clips (concatenation)
- [x] 12. Settings: AI provider config (Ollama, OpenAI, Claude)
- [x] 13. Keyboard shortcuts with custom remapping
- [x] 14. Help panel + cheatsheet
- [x] 15. 7-day licensing with Ed25519-signed license keys

## Recommended Additions (near-term, high ROI)

- [ ] Local GPU/CPU detection & recommended model/settings UI
- [ ] Per-project incremental transcription: re-run only edited segments
- [ ] "Preview cleaning" dry-run that highlights candidate removals before applying
- [ ] Export size/time estimator and suggested export presets
- [ ] Accessibility export presets (podcast vs YouTube)
- [ ] Bundled Qwen3 LLM for offline AI features

## Remove / Defer (Back Burner)

These broaden scope or add legal/privacy surface; defer for now.

- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER
- Built-in cloud processing by default: DEFER (make optional add-on later)

## Risks & Mitigations

- Large model sizes: don't bundle large models; download on demand and document the storage location.
- Timestamp accuracy: WhisperX word-level alignment, with a manual per-segment re-run available.
- FFmpeg packaging/licensing: ship platform-specific binaries or follow Tauri bundling guidance; document license compliance.

## Prioritized Quick Wins

1. Per-project incremental transcription
2. "Preview cleaning" dry-run UI
3. Export presets (podcast vs YouTube)

## Next Steps for Implementation

- Bundle Qwen3 LLM for offline AI processing.
- Implement incremental transcription to speed up re-editing workflows.
- Add export presets and size estimation.
- Improve GPU/CPU detection and model recommendations.

---

Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.
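
The transcript-driven cut listed under Implemented Features (select words → cut the corresponding media segment) comes down to turning deleted word indices into time ranges. The following is a minimal Python sketch of that mapping, assuming each word carries `start`/`end` timestamps in seconds from the alignment step. The `Word` class and the `cut_ranges` and `keep_ranges` helpers are hypothetical names chosen for illustration, not TalkEdit's actual code, and the 150–250ms fades would be applied separately at each boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """Hypothetical transcript word with aligned timestamps in seconds."""
    text: str
    start: float
    end: float


def cut_ranges(words, deleted):
    """Merge runs of consecutively deleted word indices into (start, end) cuts."""
    ranges = []
    prev = None
    for i in sorted(deleted):
        w = words[i]
        if prev is not None and i == prev + 1:
            # Adjacent to the previous deleted word: extend the current cut.
            start, _ = ranges[-1]
            ranges[-1] = (start, w.end)
        else:
            ranges.append((w.start, w.end))
        prev = i
    return ranges


def keep_ranges(cuts, duration):
    """Invert sorted cut ranges into the media ranges that should be kept."""
    keeps = []
    cursor = 0.0
    for start, end in cuts:
        if start > cursor:
            keeps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keeps.append((cursor, duration))
    return keeps


words = [
    Word("so", 0.0, 0.3),
    Word("um", 0.3, 0.6),
    Word("uh", 0.6, 0.9),
    Word("hello", 0.9, 1.4),
    Word("like", 1.4, 1.7),
    Word("world", 1.7, 2.2),
]
cuts = cut_ranges(words, {1, 2, 4})
keeps = keep_ranges(cuts, duration=2.5)
```

Merging by index adjacency rather than by time gap means a kept word between two deleted words is never swallowed into a cut, even when it is very short.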
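
The "Preview cleaning" dry-run in Recommended Additions could start from a pure function that only reports candidate filler words and leaves the transcript untouched. The sketch below is one possible shape for it, assuming the per-project filler list is a list of strings that may include multi-word phrases. The `normalize` and `filler_candidates` names are hypothetical, and the matching rule (case-insensitive, surrounding punctuation ignored) is an assumption rather than documented TalkEdit behavior.

```python
import string


def normalize(token):
    """Lowercase a token and strip surrounding punctuation and whitespace."""
    return token.strip(string.punctuation + string.whitespace).lower()


def filler_candidates(tokens, fillers):
    """Return sorted indices of tokens that match any filler word or phrase.

    Nothing is removed here: the caller can highlight these indices in the
    transcript and apply the deletion only after the user confirms.
    """
    norm = [normalize(t) for t in tokens]
    hits = set()
    for filler in fillers:
        phrase = tuple(normalize(part) for part in filler.split())
        n = len(phrase)
        if n == 0:
            continue  # ignore empty entries in the filler list
        for i in range(len(norm) - n + 1):
            if tuple(norm[i:i + n]) == phrase:
                hits.update(range(i, i + n))
    return sorted(hits)


tokens = ["So,", "um,", "you", "know,", "this", "is", "Like", "great"]
candidates = filler_candidates(tokens, ["um", "you know", "like"])
```

Returning indices instead of a modified transcript keeps the dry-run side-effect free, so the same result can drive both the highlight UI and the eventual cut.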
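
For the speaker-labeled caption export listed under Implemented Features, the SRT side can be illustrated with a short sketch. It assumes diarized segments with `start`, `end`, `speaker`, and `text` fields; the `Segment` class, the `to_srt` helper, and the `Speaker: text` label style are hypothetical choices for illustration and may differ from what `caption_generator.py` does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical caption segment; times are in seconds."""
    start: float
    end: float
    speaker: str
    text: str


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = max(0, round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render segments as numbered SRT cues, prefixing each with its speaker."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        label = f"{seg.speaker}: " if seg.speaker else ""
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{label}{seg.text}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)


srt_text = to_srt([
    Segment(0.0, 2.5, "Alice", "Hello"),
    Segment(2.5, 4.0, "Bob", "Hi"),
])
```

Converting to integer milliseconds before splitting into fields avoids rounding a value such as 59.9996 seconds into an invalid `60` in the seconds position.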