diff --git a/TECH_FEATURES.md b/TECH_FEATURES.md new file mode 100644 index 0000000..fa6fc82 --- /dev/null +++ b/TECH_FEATURES.md @@ -0,0 +1,69 @@ +# TalkEdit — Tech Stack, Tools, and Planned Features + +This document summarizes the chosen technology, tooling, the full planned feature set for the MVP, recommended additions, removals, and items to put on the back burner. + +## Overview +- Goal: Offline, local text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer. + +## Tech Stack +- Frontend: React + Vite + Tailwind CSS + shadcn/ui +- Backend: Tauri 2.0 (Rust) for file I/O, invoking native binaries, and exposing commands to the UI +- Transcription: Whisper.cpp (Rust bindings like `whisper-rs` / `whisper-cpp-sys`) — word-level timestamps +- Audio/Video Processing: FFmpeg invoked from Rust (or `ffmpeg-next` Rust crate) +- State: Zustand (in-frontend store) +- Packaging: Tauri `tauri build` for cross-platform installers +- Optional local tools: Ollama (optional local LLMs) for advanced on-device heuristics + +## Developer Tools +- Rust toolchain (cargo, rustc) +- Node.js + npm/yarn for frontend +- FFmpeg binaries (platform-specific; bundled or downloaded at install) +- Build/test: Tauri CLI, Vite dev server + +## MVP Feature List (Planned) +1. Drag-and-drop import (audio/video auto audio-extract) +2. One-click local transcription (model selector: tiny/base → larger models) +3. Scrollable, Google-Doc-style transcript editor + - Click word → seek video/audio + - Highlight + Delete → remove corresponding media segment (smart 150–250ms fades) +4. One-click "Clean it" button + - Remove fillers (configurable list) + - Remove long pauses (>0.8s) by default +5. One-click audio polish chain (FFmpeg): normalize, light compression, basic noise reduction +6. Preview with synced playback, undo/redo, project save/load +7. Export MP4/audio with optional SRT/VTT captions and burned-in captions + +## Recommended Additions (near-term, high ROI) +- Model-size chooser + progressive fallback (start fast, upgrade model later) +- Local GPU/CPU detection & recommended model/settings UI +- Per-project incremental transcription: re-run only edited segments +- "Preview cleaning" dry-run that highlights candidate removals before applying +- Export size/time estimator and suggested export presets +- Custom filler lists per-project and import/export of filler lists +- High-quality offline captions export (SRT + VTT + speaker labels) +- Accessibility export presets (podcast vs YouTube presets) + +## Remove / Defer (Back Burner) +These broaden scope or add legal/privacy surface — defer for now. +- Voice cloning / TTS: DEFER +- Multi-track, full timeline NLE features: DEFER +- Real-time collaboration / cloud sync: DEFER +- Built-in cloud processing by default: DEFER (make optional add-on later) + +## Risks & Mitigations +- Large model sizes: don't bundle large models; download on-demand and document storage location. +- Timestamp accuracy: provide manual word-adjust UI and per-segment re-run. +- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance. + +## Prioritized Quick Wins +1. Model chooser UI + auto-fallback settings +2. "Preview cleaning" dry-run UI +3. Per-project incremental transcription saving + +## Next Steps for Implementation +- Add model chooser UI and capability detection early in the frontend iteration. +- Implement Rust transcription command and a compact API for incremental transcription. +- Implement FFmpeg polish templates and a minimal preview pipeline. + +--- +Generated as requested to capture tech, tools, planned features, and the recommended add/remove/defer list.