TalkEdit — Tech Stack, Tools, and Features

This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.

Overview

Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
Backend: Tauri 2.0 (Rust) for file I/O, licensing, licensing crypto (Ed25519), model management, error logging
Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
Audio/Video Processing: FFmpeg invoked from Rust via Python scripts (video_editor.py, audio_cleaner.py, caption_generator.py)
AI: Ollama, OpenAI, Claude through Python ai_provider.py. Bundled Qwen3 LLM planned.
State: Zustand (in-frontend store) + zundo middleware for undo/redo history
Packaging: Tauri tauri build for cross-platform installers

1. Media import via file dialog (audio/video auto audio-extract)
2. One-click local transcription with model selector (tiny/base → larger models) and model-size chooser
3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
- Click word → seek video/audio
- Select words → cut corresponding media segment (smart 150–250ms fades)
4. Smart Cleanup
- Filler word removal (configurable list per-project)
- Silence trimming
5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
6. Preview with synced playback, undo/redo (zundo), project save/load
7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
8. Speaker diarization
9. Custom filler lists per-project
10. Background music with auto-ducking
11. Append clips (concatenation)
12. Settings: AI provider config (Ollama, OpenAI, Claude)
13. Keyboard shortcuts with custom remapping
14. Help panel + cheatsheet
15. 7-day licensing with Ed25519-signed license keys

These broaden scope or add legal/privacy surface — defer for now.

Large model sizes: don't bundle large models; download on-demand and document storage location.
Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.

Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.

Powered by Gitea Version: 1.24.0 Page: 549ms Template: 4ms

English