updated features and docs

# TalkEdit — Tech Stack, Tools, and Features

This document summarizes the chosen technology, tooling, the full feature set, recommended additions, and items on the back burner.

## Overview

- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

## Tech Stack

- Frontend: React 19 + Vite + TypeScript + Tailwind CSS + Zustand (with zundo undo/redo) + Virtuoso (virtualized transcript)
- Backend: Tauri 2.0 (Rust) for file I/O, licensing crypto (Ed25519), model management, and error logging
- Transcription: Python faster-whisper with WhisperX for word-level alignment; models downloaded on demand
- Audio/Video Processing: FFmpeg invoked from Rust via Python scripts (video_editor.py, audio_cleaner.py, caption_generator.py)
- AI: Ollama, OpenAI, and Claude through a Python ai_provider.py; a bundled Qwen3 LLM is planned
- State: Zustand (in-frontend store) + zundo middleware for undo/redo history
- Packaging: Tauri `tauri build` for cross-platform installers
- Optional local tools: Ollama (optional local LLMs) for advanced on-device heuristics
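The transcription step hands word-level timestamps from Python to the frontend. A minimal sketch of that handoff, assuming a simple JSON shape (the field names here are illustrative, not the real schema):

```python
import json

# Hypothetical shape of the word-level transcript JSON that the Python
# transcription step hands to the frontend (field names are assumptions).
TRANSCRIPT_JSON = """
{
  "segments": [
    {"speaker": "SPEAKER_00",
     "words": [{"word": "Hello", "start": 0.32, "end": 0.58},
               {"word": "everyone", "start": 0.6, "end": 1.04}]}
  ]
}
"""

def flat_words(transcript: dict) -> list[dict]:
    """Flatten segments into one word list so the UI can map a clicked
    word index straight to a seek position."""
    return [w for seg in transcript["segments"] for w in seg["words"]]

words = flat_words(json.loads(TRANSCRIPT_JSON))
print(words[1]["start"])  # clicked word 1 → seek to 0.6
```

Keeping a flat word list makes click-word-to-seek a constant-time index lookup on the frontend side.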
## Developer Tools

- Rust toolchain (cargo, rustc)
- Node.js + npm for frontend
- Python 3.11+ (faster-whisper, WhisperX, AI providers)
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server
- Testing: Vitest (frontend), cargo test (Rust), pytest (Python)
- CI: GitHub Actions (Rust clippy/test, frontend tsc/vitest, Python pytest)


## Implemented Features

- [x] 1. Media import via file dialog (audio/video auto audio-extract)
- [x] 2. One-click local transcription with model-size chooser (tiny/base → larger models)
- [x] 3. Scrollable, Google-Doc-style transcript editor (Virtuoso virtualized)
  - Click word → seek video/audio
  - Select words → cut corresponding media segment (smart 150–250ms fades)
- [x] 4. Smart Cleanup
  - Filler word removal (configurable list per-project)
  - Silence trimming
- [x] 5. Audio Polish chain (FFmpeg): normalize, compression, noise reduction
- [x] 6. Preview with synced playback, undo/redo (zundo), project save/load
- [x] 7. Export MP4/audio with SRT/VTT/ASS captions (speaker-labeled)
- [x] 8. Speaker diarization
- [x] 9. Custom filler lists per-project
- [x] 10. Background music with auto-ducking
- [x] 11. Append clips (concatenation)
- [x] 12. Settings: AI provider config (Ollama, OpenAI, Claude)
- [x] 13. Keyboard shortcuts with custom remapping
- [x] 14. Help panel + cheatsheet
- [x] 15. 7-day licensing with Ed25519-signed license keys
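Several of the features above reduce to timestamp arithmetic. A sketch of the Smart Cleanup pass (item 4), assuming word dicts like those produced by word-level alignment; the 0.8 s pause threshold and the filler list are illustrative defaults, not the shipped configuration:

```python
# Drop filler words, close pauses longer than a threshold, and merge what
# remains into keep-segments. When rendered, each cut boundary would get a
# short (150–250 ms) fade to avoid clicks.
MAX_PAUSE = 0.8  # seconds; longer silences are trimmed out

def cleanup_segments(words, fillers=frozenset({"um", "uh", "like"})):
    kept = [w for w in words if w["word"].lower().strip(",.") not in fillers]
    segments = []
    for w in kept:
        if segments and w["start"] - segments[-1][1] <= MAX_PAUSE:
            segments[-1][1] = w["end"]               # extend current segment
        else:
            segments.append([w["start"], w["end"]])  # start a new segment
    return [(round(s, 3), round(e, 3)) for s, e in segments]

words = [
    {"word": "So", "start": 0.0, "end": 0.2},
    {"word": "um", "start": 0.3, "end": 0.5},    # filler → removed
    {"word": "welcome", "start": 0.6, "end": 1.1},
    {"word": "back", "start": 3.0, "end": 3.4},  # 1.9 s pause → new segment
]
print(cleanup_segments(words))  # [(0.0, 1.1), (3.0, 3.4)]
```

The same keep-segment list can drive both the preview player and the final FFmpeg cut.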

## Recommended Additions (near-term, high ROI)

- [ ] Local GPU/CPU detection & recommended model/settings UI
- [ ] Per-project incremental transcription: re-run only edited segments
- [ ] "Preview cleaning" dry-run that highlights candidate removals before applying
- [ ] Export size/time estimator and suggested export presets
- [ ] Accessibility export presets (podcast vs YouTube presets)
- [ ] Bundled Qwen3 LLM for offline AI features
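Incremental transcription is mostly cache bookkeeping: reuse results for audio spans that have not changed, and queue only the rest. A sketch, assuming spans are keyed by source file and time range (the cache shape and key format are assumptions):

```python
import hashlib

def span_key(source: str, start: float, end: float) -> str:
    """Stable cache key for one audio span of one source file."""
    raw = f"{source}:{start:.3f}-{end:.3f}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]

def plan_transcription(spans, cache):
    """Split spans into (reused, to_transcribe) by cache hit."""
    reused, todo = [], []
    for span in spans:
        (reused if span_key(*span) in cache else todo).append(span)
    return reused, todo

cache = {span_key("ep1.wav", 0.0, 30.0): "cached transcript"}
spans = [("ep1.wav", 0.0, 30.0), ("ep1.wav", 30.0, 47.5)]
reused, todo = plan_transcription(spans, cache)
print(len(reused), len(todo))  # 1 1
```

Only the spans in `todo` would be sent back through faster-whisper, which is what makes re-editing fast.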
## Remove / Defer (Back Burner)

These broaden scope or add legal/privacy surface — defer for now.

- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER

## Risks & Mitigations

- Large model sizes: don't bundle large models; download on demand and document the storage location.
- Timestamp accuracy: WhisperX word-level alignment + manual per-segment re-run available.
- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.
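Since FFmpeg ships as a platform binary, the audio polish chain is argument construction rather than custom DSP. A sketch mapping normalize / compression / noise reduction onto stock FFmpeg filters; the specific parameter values are illustrative, not the shipped defaults:

```python
# Build one FFmpeg invocation for the polish chain. afftdn, acompressor,
# and loudnorm are standard FFmpeg audio filters; the numbers below are
# illustrative starting points.
def polish_cmd(src: str, dst: str) -> list[str]:
    filters = ",".join([
        "afftdn=nr=12",                        # light FFT-based denoise
        "acompressor=threshold=0.1:ratio=3",   # gentle compression
        "loudnorm=I=-16:TP=-1.5",              # EBU R128 loudness normalize
    ])
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]

print(" ".join(polish_cmd("raw.wav", "polished.wav")))
```

Keeping the chain as a single `-af` filtergraph means one decode/encode pass per polish run.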
## Prioritized Quick Wins

1. Per-project incremental transcription
2. "Preview cleaning" dry-run UI
3. Export presets (podcast vs YouTube)
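The podcast-vs-YouTube quick win is essentially a table of FFmpeg arguments keyed by preset name. A sketch with two hypothetical presets (codec and bitrate choices are assumptions, not decided defaults):

```python
# Hypothetical export presets: audio-only MP3 for podcast feeds, H.264 +
# AAC with faststart for YouTube uploads.
EXPORT_PRESETS = {
    "podcast": ["-vn", "-c:a", "libmp3lame", "-b:a", "128k"],
    "youtube": ["-c:v", "libx264", "-preset", "medium", "-crf", "20",
                "-c:a", "aac", "-b:a", "192k", "-movflags", "+faststart"],
}

def export_cmd(src: str, dst: str, preset: str) -> list[str]:
    return ["ffmpeg", "-y", "-i", src, *EXPORT_PRESETS[preset], dst]

print(export_cmd("cut.mp4", "episode.mp3", "podcast"))
```

A size/time estimator can then be derived from the preset's bitrate times the project duration.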
## Next Steps for Implementation

- Bundle Qwen3 LLM for offline AI processing.
- Implement incremental transcription to speed up re-editing workflows.
- Add export presets and size estimation.
- Improve GPU/CPU detection and model recommendations.
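Hardware-based model recommendation can start as a simple tier table. A CPU-only sketch using model names from the Whisper family; the tier thresholds are illustrative, and real detection would also probe for a GPU:

```python
import os

# (min_cores, min_ram_gb, model) tiers, checked top-down; illustrative only.
TIERS = [
    (8, 16, "large-v3"),
    (8, 8, "medium"),
    (4, 8, "small"),
    (2, 4, "base"),
]

def recommend_model(cores: int, ram_gb: float) -> str:
    """Pick the largest model the machine comfortably supports."""
    for min_cores, min_ram, model in TIERS:
        if cores >= min_cores and ram_gb >= min_ram:
            return model
    return "tiny"

print(recommend_model(os.cpu_count() or 1, 8.0))
```

The same table can seed the model chooser's default and its progressive-fallback order.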
---

Generated to capture tech, tools, implemented features, and the recommended add/remove/defer list.