TalkEdit/TECH_FEATURES.md

# TalkEdit — Tech Stack, Tools, and Features

This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.

## Overview
- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

## Tech Stack
- Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
- Backend: Tauri 2.0 (Rust) for file I/O, licensing, licensing crypto (Ed25519), model management, error logging
- Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
- Audio/Video Processing: FFmpeg invoked from Rust via Python scripts (video_editor.py, audio_cleaner.py, caption_generator.py)
- AI: Ollama, OpenAI, Claude through Python ai_provider.py. Bundled Qwen3 LLM planned.
- State: Zustand (in-frontend store) + zundo middleware for undo/redo history
- Packaging: Tauri `tauri build` for cross-platform installers

## Developer Tools
- Rust toolchain (cargo, rustc)
- Node.js + npm for frontend
- Python 3.11+ (faster-whisper, WhisperX, AI providers)
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server
- Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
- CI: GitHub Actions (Rust clippy/test, Frontend tsc/vitest, Python pytest)

## Implemented Features

- [x] 1. Media import via file dialog (audio/video auto audio-extract)
- [x] 2. One-click local transcription with model selector (tiny/base → larger models) and model-size chooser
- [x] 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
   - Click word → seek video/audio
   - Select words → cut corresponding media segment (smart 150–250ms fades)
- [x] 4. Smart Cleanup
   - Filler word removal (configurable list per-project)
   - Silence trimming
- [x] 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
- [x] 6. Preview with synced playback, undo/redo (zundo), project save/load
- [x] 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
- [x] 8. Speaker diarization
- [x] 9. Custom filler lists per-project
- [x] 10. Background music with auto-ducking
- [x] 11. Append clips (concatenation)
- [x] 12. Settings: AI provider config (Ollama, OpenAI, Claude)
- [x] 13. Keyboard shortcuts with custom remapping
- [x] 14. Help panel + cheatsheet
- [x] 15. 7-day licensing with Ed25519-signed license keys

## Recommended Additions (near-term, high ROI)

- [ ] Local GPU/CPU detection & recommended model/settings UI
- [ ] Per-project incremental transcription: re-run only edited segments
- [ ] "Preview cleaning" dry-run that highlights candidate removals before applying
- [ ] Export size/time estimator and suggested export presets
- [ ] Accessibility export presets (podcast vs YouTube presets)
- [ ] Bundled Qwen3 LLM for offline AI features

## Remove / Defer (Back Burner)
These broaden scope or add legal/privacy surface — defer for now.

- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER
- Built-in cloud processing by default: DEFER (make optional add-on later)

## Risks & Mitigations
- Large model sizes: don't bundle large models; download on-demand and document storage location.
- Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.

## Prioritized Quick Wins
1. Per-project incremental transcription
2. "Preview cleaning" dry-run UI
3. Export presets (podcast vs YouTube)

## Next Steps for Implementation
- Bundle Qwen3 LLM for offline AI processing.
- Implement incremental transcription to speed up re-editing workflows.
- Add export presets and size estimation.
- Improve GPU/CPU detection and model recommendations.

---

Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								# TalkEdit — Tech Stack, Tools, and Features
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
 								## Overview
 								- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.
 								## Tech Stack
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								- Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
 								- Backend: Tauri 2.0 (Rust) for file I/O, licensing, licensing crypto (Ed25519), model management, error logging
 								- Transcription: Python faster-whisper with WhisperX for word-level alignment. Models downloaded on demand.
 								- Audio/Video Processing: FFmpeg invoked from Rust via Python scripts (video_editor.py, audio_cleaner.py, caption_generator.py)
 								- AI: Ollama, OpenAI, Claude through Python ai_provider.py. Bundled Qwen3 LLM planned.
 								- State: Zustand (in-frontend store) + zundo middleware for undo/redo history
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+								- Packaging: Tauri `tauri build` for cross-platform installers
 								## Developer Tools
 								- Rust toolchain (cargo, rustc)
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								- Node.js + npm for frontend
 								- Python 3.11+ (faster-whisper, WhisperX, AI providers)
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+								- FFmpeg binaries (platform-specific; bundled or downloaded at install)
 								- Build/test: Tauri CLI, Vite dev server
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								- Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
 								- CI: GitHub Actions (Rust clippy/test, Frontend tsc/vitest, Python pytest)
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								## Implemented Features
 								- [x] 1. Media import via file dialog (audio/video auto audio-extract)
 								- [x] 2. One-click local transcription with model selector (tiny/base → larger models) and model-size chooser
 								- [x] 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+								   - Click word → seek video/audio
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								   - Select words → cut corresponding media segment (smart 150–250ms fades)
 								- [x] 4. Smart Cleanup
 								   - Filler word removal (configurable list per-project)
 								   - Silence trimming
 								- [x] 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
 								- [x] 6. Preview with synced playback, undo/redo (zundo), project save/load
 								- [x] 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
 								- [x] 8. Speaker diarization
 								- [x] 9. Custom filler lists per-project
 								- [x] 10. Background music with auto-ducking
 								- [x] 11. Append clips (concatenation)
 								- [x] 12. Settings: AI provider config (Ollama, OpenAI, Claude)
 								- [x] 13. Keyboard shortcuts with custom remapping
 								- [x] 14. Help panel + cheatsheet
 								- [x] 15. 7-day licensing with Ed25519-signed license keys
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
 								## Recommended Additions (near-term, high ROI)
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
 								- [ ] Local GPU/CPU detection & recommended model/settings UI
 								- [ ] Per-project incremental transcription: re-run only edited segments
 								- [ ] "Preview cleaning" dry-run that highlights candidate removals before applying
 								- [ ] Export size/time estimator and suggested export presets
 								- [ ] Accessibility export presets (podcast vs YouTube presets)
 								- [ ] Bundled Qwen3 LLM for offline AI features
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
 								## Remove / Defer (Back Burner)
 								These broaden scope or add legal/privacy surface — defer for now.
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+								- Voice cloning / TTS: DEFER
 								- Multi-track, full timeline NLE features: DEFER
 								- Real-time collaboration / cloud sync: DEFER
 								- Built-in cloud processing by default: DEFER (make optional add-on later)
 								## Risks & Mitigations
 								- Large model sizes: don't bundle large models; download on-demand and document storage location.
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								- Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+								- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.
 								## Prioritized Quick Wins
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+. Per-project incremental transcription
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
+. "Preview cleaning" dry-run UI
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+. Export presets (podcast vs YouTube)
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
 								## Next Steps for Implementation
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
+								- Bundle Qwen3 LLM for offline AI processing.
 								- Implement incremental transcription to speed up re-editing workflows.
 								- Add export presets and size estimation.
 								- Improve GPU/CPU detection and model recommendations.
-												added features doc

											
										
										
											2026-03-25 00:11:35 -06:00
 								---
-												updated features and docs

											
										
										
											2026-05-06 16:15:38 -06:00
 								Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.