added features doc
69
TECH_FEATURES.md
Normal file
@ -0,0 +1,69 @@
# TalkEdit: Tech Stack, Tools, and Planned Features

This document summarizes the chosen technology and tooling, the full planned feature set for the MVP, recommended additions, and the items to remove or defer.

## Overview

- Goal: Offline, local, text-based audio/video editor (Descript-style) focused on spoken-word creators (podcasters, YouTubers). Fast, privacy-first, single-file installer.

## Tech Stack

- Frontend: React + Vite + Tailwind CSS + shadcn/ui
- Backend: Tauri 2.0 (Rust) for file I/O, invoking native binaries, and exposing commands to the UI
- Transcription: whisper.cpp (via Rust bindings such as `whisper-rs` / `whisper-cpp-sys`) with word-level timestamps
- Audio/Video Processing: FFmpeg invoked from Rust (or the `ffmpeg-next` Rust crate)
- State: Zustand (frontend store)
- Packaging: `tauri build` for cross-platform installers
- Optional local tools: Ollama (local LLMs) for advanced on-device heuristics

## Developer Tools

- Rust toolchain (cargo, rustc)
- Node.js + npm/yarn for frontend
- FFmpeg binaries (platform-specific; bundled or downloaded at install)
- Build/test: Tauri CLI, Vite dev server

## MVP Feature List (Planned)

1. Drag-and-drop import of audio or video, with automatic audio extraction from video files
2. One-click local transcription (model selector ranging from tiny/base up to larger models)
3. Scrollable, Google-Doc-style transcript editor
   - Click word → seek video/audio
   - Highlight + Delete → remove corresponding media segment (smart 150–250 ms fades)
4. One-click "Clean it" button
   - Remove fillers (configurable list)
   - Remove long pauses (longer than 0.8 s by default)
5. One-click audio polish chain (FFmpeg): normalize, light compression, basic noise reduction
6. Preview with synced playback, undo/redo, project save/load
7. Export MP4/audio with optional SRT/VTT captions and burned-in captions
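
The "Highlight + Delete" behavior in item 3 amounts to turning deleted transcript words into a list of media ranges to keep. The following is a minimal sketch of that step, not the project's actual implementation: the `Word` and `Segment` shapes and the `keepSegments` name are hypothetical, words are assumed to be sorted by start time, and fades are left out.

```typescript
// Hypothetical shape of one transcript word; times are in seconds.
interface Word {
  text: string;
  start: number;
  end: number;
  deleted: boolean;
}

// A half-open range of media to keep or cut, in seconds.
interface Segment {
  start: number;
  end: number;
}

// Collapse runs of consecutive deleted words into cut ranges, then
// invert those cuts into the ranges of media that should survive.
// Assumes `words` is sorted by start time.
function keepSegments(words: Word[], duration: number): Segment[] {
  const cuts: Segment[] = [];
  let previousDeleted = false;
  for (const w of words) {
    if (w.deleted) {
      const last = cuts[cuts.length - 1];
      if (previousDeleted && last !== undefined) {
        // Extend the current cut across the gap between deleted words.
        last.end = w.end;
      } else {
        cuts.push({ start: w.start, end: w.end });
      }
    }
    previousDeleted = w.deleted;
  }

  const keeps: Segment[] = [];
  let cursor = 0;
  for (const cut of cuts) {
    if (cut.start > cursor) {
      keeps.push({ start: cursor, end: cut.start });
    }
    cursor = Math.max(cursor, cut.end);
  }
  if (cursor < duration) {
    keeps.push({ start: cursor, end: duration });
  }
  return keeps;
}
```

The resulting keep list could then be handed to the Rust side to drive the FFmpeg cut, with the short fades applied at each boundary.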

## Recommended Additions (near-term, high ROI)

- Model-size chooser + progressive fallback (start fast, upgrade model later)
- Local GPU/CPU detection & recommended model/settings UI
- Per-project incremental transcription: re-run only edited segments
- "Preview cleaning" dry-run that highlights candidate removals before applying
- Export size/time estimator and suggested export presets
- Custom filler lists per-project and import/export of filler lists
- High-quality offline captions export (SRT + VTT + speaker labels)
- Accessibility export presets (podcast vs YouTube presets)
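
To make the captions export item more concrete, here is a small sketch of SRT serialization. It is illustrative only: the `Cue` shape and the function names are hypothetical, and prefixing the caption text with `Speaker: ` is just one common convention, since SRT has no dedicated speaker field.

```typescript
// Hypothetical caption cue; times are in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
  speaker?: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm.
function srtTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return zeroPad(h, 2) + ":" + zeroPad(m, 2) + ":" + zeroPad(s, 2) + "," + zeroPad(ms, 3);
}

// Serialize cues as numbered SRT blocks separated by blank lines.
function toSrt(cues: Cue[]): string {
  return cues
    .map((cue, i) => {
      const label = cue.speaker ? cue.speaker + ": " : "";
      return (
        String(i + 1) + "\n" +
        srtTimestamp(cue.start) + " --> " + srtTimestamp(cue.end) + "\n" +
        label + cue.text + "\n"
      );
    })
    .join("\n");
}
```

A VTT exporter would follow the same structure, with a `WEBVTT` header line and a `.` instead of `,` before the milliseconds.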

## Remove / Defer (Back Burner)

These broaden scope or add legal/privacy surface; defer for now.

- Voice cloning / TTS: DEFER
- Multi-track, full timeline NLE features: DEFER
- Real-time collaboration / cloud sync: DEFER
- Built-in cloud processing by default: DEFER (make optional add-on later)

## Risks & Mitigations

- Large model sizes: don't bundle large models; download on-demand and document storage location.
- Timestamp accuracy: provide manual word-adjust UI and per-segment re-run.
- FFmpeg packaging/licensing: ship platform-specific binaries or use Tauri bundling guidance; document license compliance.

## Prioritized Quick Wins

1. Model chooser UI + auto-fallback settings
2. "Preview cleaning" dry-run UI
3. Per-project incremental transcription saving
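
The "Preview cleaning" dry-run can be thought of as a pure function that reports candidate removals without touching the transcript. The sketch below assumes a hypothetical `TimedWord` shape, an example filler list, and the 0.8 s default pause threshold from the MVP list; the normalization only handles ASCII letters, so it is a starting point rather than a finished detector.

```typescript
// Hypothetical transcript word; times are in seconds.
interface TimedWord {
  text: string;
  start: number;
  end: number;
}

// A removal the UI could highlight before anything is applied.
type Candidate =
  | { kind: "filler"; index: number; start: number; end: number }
  | { kind: "pause"; afterIndex: number; start: number; end: number };

// Example list only; the real list is meant to be configurable.
const DEFAULT_FILLERS: string[] = ["um", "uh", "er", "hmm"];

// Scan the transcript and report fillers and long pauses.
// The input array is never modified.
function previewCleaning(
  words: TimedWord[],
  fillers: string[] = DEFAULT_FILLERS,
  maxPauseSeconds: number = 0.8
): Candidate[] {
  const candidates: Candidate[] = [];
  words.forEach((word, i) => {
    // Strip punctuation and case so "Um," matches "um".
    const normalized = word.text.toLowerCase().replace(/[^a-z']/g, "");
    if (fillers.indexOf(normalized) !== -1) {
      candidates.push({ kind: "filler", index: i, start: word.start, end: word.end });
    }
    const next = words[i + 1];
    if (next !== undefined && next.start - word.end > maxPauseSeconds) {
      candidates.push({ kind: "pause", afterIndex: i, start: word.end, end: next.start });
    }
  });
  return candidates;
}
```

Because the function only returns candidates, the same result can feed both the highlight preview and the eventual "Clean it" action.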

## Next Steps for Implementation

- Add model chooser UI and capability detection early in the frontend iteration.
- Implement Rust transcription command and a compact API for incremental transcription.
- Implement FFmpeg polish templates and a minimal preview pipeline.
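
One way to picture an FFmpeg polish template is as a function that assembles an `-af` filter chain from a few toggles. The sketch below is an assumption-laden example, not the planned implementation: `PolishOptions`, `polishFilterChain`, and `polishArgs` are hypothetical names, and the parameter values for FFmpeg's `afftdn`, `acompressor`, and `loudnorm` filters are illustrative starting points rather than tuned settings.

```typescript
// Hypothetical toggles for the one-click polish chain.
interface PolishOptions {
  denoise: boolean;
  compress: boolean;
  normalize: boolean;
}

// Build the comma-separated audio filter chain in processing order:
// noise reduction first, then compression, then loudness normalization.
function polishFilterChain(options: PolishOptions): string {
  const filters: string[] = [];
  if (options.denoise) {
    filters.push("afftdn=nr=12"); // FFT-based denoiser, 12 dB reduction
  }
  if (options.compress) {
    filters.push("acompressor=ratio=3:attack=20:release=250"); // light compression
  }
  if (options.normalize) {
    filters.push("loudnorm=I=-16:TP=-1.5:LRA=11"); // single-pass loudness target
  }
  return filters.join(",");
}

// Assemble the argument list to pass to the ffmpeg binary.
function polishArgs(input: string, output: string, options: PolishOptions): string[] {
  const chain = polishFilterChain(options);
  const args: string[] = ["-i", input];
  if (chain.length > 0) {
    args.push("-af", chain);
  }
  args.push(output);
  return args;
}
```

Keeping the chain as data makes it straightforward to reuse the same template for a short preview render and for the final export.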

---

Generated as requested to capture tech, tools, planned features, and the recommended add/remove/defer list.