Files
TalkEdit/TECH_FEATURES.md

70 lines
3.5 KiB
Markdown
Raw Normal View History

2026-03-25 00:11:35 -06:00
# TalkEdit — Tech Stack, Tools, and Planned Features
This document summarizes the chosen technology, tooling, the full planned feature set for the MVP, recommended additions, removals, and items to put on the back burner.
## Overview
- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.
## Tech Stack
- Frontend: React + Vite + Tailwind CSS + shadcn/ui
- Backend: Tauri 2.0 (Rust) for file I/O, invoking native binaries, and exposing commands to the UI
- Transcription: Whisper.cpp (Rust bindings like `whisper-rs` / `whisper-cpp-sys`) — word-level timestamps
- Audio/Video Processing: FFmpeg invoked from Rust (or `ffmpeg-next` Rust crate)
- State: Zustand (in-frontend store)
- Packaging: Tauri `tauri build` for cross-platform installers
- Optional local tools: Ollama (optional local LLMs) for advanced on-device heuristics
## Developer Tools
- Rust toolchain (cargo, rustc)
- Node.js + npm/yarn for frontend
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server
## MVP Feature List (Planned)
1. Drag-and-drop import (audio/video auto audio-extract)
2. One-click local transcription (model selector: tiny/base → larger models)
3. Scrollable, Google-Doc-style transcript editor
- Click word → seek video/audio
- Highlight + Delete → remove corresponding media segment (smart 150250ms fades)
4. One-click "Clean it" button
- Remove fillers (configurable list)
- Remove long pauses (>0.8s) by default
5. One-click audio polish chain (FFmpeg): normalize, light compression, basic noise reduction
6. Preview with synced playback, undo/redo, project save/load
7. Export MP4/audio with optional SRT/VTT captions and burned-in captions
## Recommended Additions (near-term, high ROI)
- Model-size chooser + progressive fallback (start fast, upgrade model later)
- Local GPU/CPU detection & recommended model/settings UI
- Per-project incremental transcription: re-run only edited segments
- "Preview cleaning" dry-run that highlights candidate removals before applying
- Export size/time estimator and suggested export presets
- Custom filler lists per-project and import/export of filler lists
- High-quality offline captions export (SRT + VTT + speaker labels)
- Accessibility export presets (podcast vs YouTube presets)
## Remove / Defer (Back Burner)
These broaden scope or add legal/privacy surface — defer for now.
- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER
- Built-in cloud processing by default: DEFER (make optional add-on later)
## Risks & Mitigations
- Large model sizes: don't bundle large models; download on-demand and document storage location.
- Timestamp accuracy: provide manual word-adjust UI and per-segment re-run.
- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.
## Prioritized Quick Wins
1. Model chooser UI + auto-fallback settings
2. "Preview cleaning" dry-run UI
3. Per-project incremental transcription saving
## Next Steps for Implementation
- Add model chooser UI and capability detection early in the frontend iteration.
- Implement Rust transcription command and a compact API for incremental transcription.
- Implement FFmpeg polish templates and a minimal preview pipeline.
---
Generated as requested to capture tech, tools, planned features, and the recommended add/remove/defer list.