Fix export handling for audio-only inputs and improve waveform load error reporting

2026-04-15 21:51:05 -06:00
parent 168676a9e9
commit 0c7a4c94c2
6 changed files with 157 additions and 45 deletions

View File

@@ -4,41 +4,23 @@ Features are grouped by priority. Check off items as they are implemented.
---
## 🔴 High Priority — Core editing gaps
## 🔴 Highest Impact Next — Conversion and retention
- [x] [#001] **Cut / Mute sections** — select a time range and choose to cut (remove entirely) or mute (silence audio while video continues). Cut sections show as red overlays, mute sections as transparent blue overlays on the timeline over the transcript text and audio waveform. Backend: `ffmpeg -af volume=0` for mute, time-based cutting for removal.
- [x] [#002] **Silence / pause trimmer (in progress)** — detect pauses using min duration (ms) + amplitude threshold (dB), then apply detected pauses as cut ranges. Initial endpoint: `/audio/detect-silence`; UI includes filter controls and an "Apply As Cuts" action.
- [x] [#003] **Operation-level undo for batch actions** — explicit undo entry for actions like "Apply Silence Trim" so one shortcut/click reverts the whole operation, while still allowing normal fine-grained undo/redo steps.
- [x] [#004] **Grouped silence-trim zones (editable batch)** — when pauses are applied, tag them as a batch (`trim_group_id`) so the user can: (1) delete all zones from that auto-trim pass at once, and (2) still select/resize/delete individual zones independently.
- [x] [#005] **Edit silence-trim group settings after apply** — allow reopening a trim group and changing its detection settings (min pause ms, threshold dB, pre/post buffers), then reapplying updates to that group without affecting unrelated edits.
- [x] [#006] **Volume / gain control** — per-selection or global audio gain slider. Every editor has this. Descript users constantly complain it's missing. Backend: `ffmpeg -af volume=Xdb`.
- [x] [#007] **Speed adjustment (4th zone type)** — add speed zones as the fourth editable timeline/transcript zone type (after cut, mute, gain), allowing slow/fast playback per range or globally. Backend: `ffmpeg -filter:v setpts` + `atempo`. Common use case: slightly speed up boring sections.
- [x] [#008] **Cut preview** — before committing a delete, play what the audio will sound like with that section removed (pre-listen across the edit point). Pure frontend using Web Audio API — splice the AudioBuffer and play the join.
- [x] [#009] **Timeline shows output length** — deleted regions should visually collapse (or show as narrow gaps) so the user sees the *output* duration, not just the source duration.
---
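Item #002's pause detection is driven by two parameters, a minimum duration (ms) and an amplitude threshold (dB). A minimal sketch of that kind of detector over decoded PCM samples (the function name and sample-array input are assumptions for illustration, not the actual `/audio/detect-silence` implementation):

```python
def detect_pauses(samples, sample_rate, min_pause_ms=300, threshold_db=-40.0):
    """Return (start_s, end_s) ranges where |sample| stays under threshold_db."""
    threshold = 10 ** (threshold_db / 20.0)  # convert dBFS to linear amplitude
    min_len = int(sample_rate * min_pause_ms / 1000)
    pauses, run_start = [], None
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i  # quiet run begins
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start / sample_rate, i / sample_rate))
            run_start = None
    # Close out a quiet run that extends to the end of the file
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start / sample_rate, len(samples) / sample_rate))
    return pauses
```

Each returned range could then feed the "Apply As Cuts" action as a cut zone.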
## 🟡 Medium Priority — Widely expected features
- [x] [#010] **Transcript search (Ctrl+F)** — find words/phrases in the transcript and highlight matches. Pure frontend. Critical for long-form content. Jump between matches with Enter.
- [x] [#011] **Mark In / Out + delete (I / O keys)** — keyboard shortcuts to mark a time range on the timeline, then delete it. Faster than click-dragging words. Store the in/out points in state, `Delete` removes them.
- [ ] [#012] **Low-confidence word highlighting** — WhisperX already returns `confidence` per word. Words below a threshold (e.g. < 0.6) should be visually underlined or tinted so the user knows where to double-check.
- [ ] [#015] **Word text correction** — allow editing the transcript text of a word without affecting its timing. Whisper gets homophones/proper nouns wrong constantly. Pure frontend state change; no backend needed.
- [ ] [#013] **Re-transcribe selection** — if Whisper gets a section wrong, let the user select a word range and re-run transcription on just that segment (optionally with a different model or language).
- [ ] [#014] **Optional VibeVoice-ASR-HF transcription backend (future)** — evaluate as an alternate transcription mode for long-form, speaker-attributed transcripts. Keep WhisperX as the default for word-level timestamp editing.
- [ ] [#018] **Audio normalization / loudness targeting** — single "Normalize" button that targets a LUFS level (-14 for YouTube, -16 for Spotify). Backend: `ffmpeg -af loudnorm`. Very high value for podcasters, ~23 hours of work.
- [ ] [#024] **Export to transcript text / SRT only** — some users just want a clean `.txt` or `.srt` of the edited transcript without rendering video.
- [ ] [#023] **Batch silence removal** — full-file scan + remove all pauses above threshold in one click. Distinct from the manual trimmer above; this is a "fix the whole file" operation.
---
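For item #018, ffmpeg's `loudnorm` filter takes the integrated-loudness target directly, so per-platform presets reduce to a single argument. A hedged sketch (the helper name and the TP/LRA defaults are illustrative choices, not confirmed project values):

```python
def loudnorm_args(target_lufs=-14.0):
    """ffmpeg arguments for one-pass EBU R128 loudness normalization."""
    # I = integrated loudness target, TP = true-peak ceiling, LRA = loudness range
    return ["-af", f"loudnorm=I={target_lufs:g}:TP=-1.5:LRA=11"]
```

e.g. `loudnorm_args(-14)` for YouTube and `loudnorm_args(-16)` for Spotify.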
## 🟡 Medium Impact — Workflow completeness
- [ ] [#016] **Named timeline markers** — drop named marker pins on the waveform (like Resolve markers). Store as `{ id, time, label, color }` in the project. Rendered as colored triangles on the timeline canvas.
@@ -46,23 +28,35 @@ Features are grouped by priority. Check off items as they are implemented.
- [ ] [#041] **Customizable hotkeys / keymap editor (left-hand focused)** — allow users to view, remap, and reset keyboard shortcuts (transport, edit, save/export, zone tools), with a default preset optimized for left-hand reach (Q/W/E/R/A/S/D/F/Z/X/C/V + modifiers). Include conflict detection, an alternate standard preset, and one-click "restore defaults".
- [ ] [#022] **Clip thumbnail strip** — video frame thumbnails along the timeline so users can navigate visually, not only by waveform. Backend: `ffmpeg` thumbnail extraction at regular intervals.
---
## 🟢 Lower Priority — Differentiating / power features
- [ ] [#019] **Background music track** — a second audio track for background music with volume ducking. Major gap in Descript that TalkEdit could own. Backend: `ffmpeg` amix + `asendcmd` for auto-ducking.
## 🟢 Lower Impact — Expansion and advanced scope
- [ ] [#020] **Video zoom / punch-in** — scale and position the video (crop, zoom, pan). Used constantly on talking-head videos for emphasis. Backend: `ffmpeg -vf crop/scale/zoompan`.
- [ ] [#021] **Multi-clip / append** — load a second video and append it to the timeline. Even without a full multi-track timeline, "append clip" is a heavily used workflow.
---
## ✅ Completed high-impact foundation
- [x] [#001] **Cut / Mute sections**
- [x] [#002] **Silence / pause trimmer**
- [x] [#003] **Operation-level undo for batch actions**
- [x] [#004] **Grouped silence-trim zones (editable batch)**
- [x] [#005] **Edit silence-trim group settings after apply**
- [x] [#006] **Volume / gain control**
- [x] [#007] **Speed adjustment (4th zone type)**
- [x] [#008] **Cut preview**
- [x] [#009] **Timeline shows output length**
- [x] [#010] **Transcript search (Ctrl+F)**
- [x] [#011] **Mark In / Out + delete (I / O keys)**
---
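Item #001 names `ffmpeg -af volume=0` as the mute backend; applied per range, that maps naturally onto the volume filter's timeline (`enable`) option. A minimal sketch of building such a chain (the helper is hypothetical, not code from this commit):

```python
def mute_filter(ranges):
    """Build an ffmpeg -af expression silencing each (start_s, end_s) range."""
    return ",".join(
        f"volume=enable='between(t,{start:g},{end:g})':volume=0"
        for start, end in ranges
    )
```

The resulting string can be passed as `-af` while the video stream is stream-copied with `-c:v copy`.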

View File

@@ -13,6 +13,24 @@ from typing import List
logger = logging.getLogger(__name__)


def _input_has_video_stream(ffmpeg_cmd: str, input_path: str) -> bool:
    """Return True if the input contains at least one video stream."""
    ffprobe = ffmpeg_cmd.replace("ffmpeg", "ffprobe")
    cmd = [
        ffprobe,
        "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=index",
        "-of", "csv=p=0",
        str(input_path),
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.returncode == 0 and bool(result.stdout.strip())
    except Exception:
        return False


def _clamp_speed(speed: float) -> float:
    return max(0.25, min(4.0, float(speed)))
@@ -120,6 +138,10 @@ def export_stream_copy(
        # Mute ranges require audio filtering, so fall back to re-encoding
        return export_reencode(input_path, output_path, keep_segments, "1080p", "mp4", mute_ranges)

    ffmpeg = _find_ffmpeg()
    if not _input_has_video_stream(ffmpeg, input_path):
        # Audio-only inputs cannot use TS segment stream-copy concat reliably.
        return export_reencode(input_path, output_path, keep_segments)

    input_path = str(Path(input_path).resolve())
    output_path = str(Path(output_path).resolve())
@@ -222,10 +244,66 @@ def export_reencode(
        return ",".join(filters) if filters else "anull"

    has_audio_filters = bool(mute_ranges) or bool(gain_ranges) or abs(float(global_gain_db)) > 1e-6
    has_video = _input_has_video_stream(ffmpeg, input_path)
    speed_segments = _split_keep_segments_by_speed(keep_segments, speed_ranges)
    has_speed = any(abs(seg.get("speed", 1.0) - 1.0) > 1e-6 for seg in speed_segments)

    if not has_video:
        if not keep_segments:
            raise ValueError("No segments to export")
        segments_for_concat = speed_segments if speed_segments else _split_keep_segments_by_speed(keep_segments, None)
        if not segments_for_concat:
            raise ValueError("No segments to export")
        filter_parts = []
        for i, seg in enumerate(segments_for_concat):
            speed = _clamp_speed(seg.get("speed", 1.0))
            a_chain = f"atrim=start={seg['start']}:end={seg['end']},asetpts=PTS-STARTPTS"
            if abs(speed - 1.0) > 1e-6:
                a_chain += f",{_build_atempo_chain(speed)}"
            filter_parts.append(f"[0:a]{a_chain}[a{i}];")
        n = len(segments_for_concat)
        concat_inputs = "".join(f"[a{i}]" for i in range(n))
        filter_parts.append(f"{concat_inputs}concat=n={n}:v=0:a=1[outa_raw]")
        audio_filter = build_audio_filter()
        if audio_filter != "anull":
            filter_parts.append(f";[outa_raw]{audio_filter}[outa]")
            audio_map = "[outa]"
        else:
            audio_map = "[outa_raw]"
        filter_complex = "".join(filter_parts)
        audio_codec_args = ["-c:a", "aac", "-b:a", "192k"]
        if format_hint == "webm":
            audio_codec_args = ["-c:a", "libopus", "-b:a", "160k"]
        cmd = [
            ffmpeg, "-y",
            "-i", input_path,
            "-filter_complex", filter_complex,
            "-map", audio_map,
            *audio_codec_args,
            output_path,
        ]
        logger.info(
            "Re-encoding audio-only input (%s segments, speed-adjusted=%s) -> %s",
            n,
            has_speed,
            output_path,
        )
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"FFmpeg audio-only export failed: {result.stderr[-500:]}")
        return output_path

    # Handle filtered full-timeline audio case (mute/gain/global gain) when no speed warping is needed
    if has_audio_filters and not has_speed:
        audio_filter = build_audio_filter()
@@ -344,6 +422,9 @@ def export_reencode_with_subs(
    If mute_ranges are provided, applies audio muting instead of cutting.
    """
    ffmpeg = _find_ffmpeg()
    if not _input_has_video_stream(ffmpeg, input_path):
        raise ValueError("Burn-in captions require a video track")
    input_path = str(Path(input_path).resolve())
    output_path = str(Path(output_path).resolve())
    subtitle_path = str(Path(subtitle_path).resolve())
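`_build_atempo_chain` is called above but its body is outside this diff. Since ffmpeg's `atempo` filter only accepts tempo factors in [0.5, 2.0], a plausible implementation (an assumption about the helper, not its actual code) decomposes the clamped speed into a chain of in-range stages:

```python
def build_atempo_chain(speed: float) -> str:
    """Decompose a speed factor into atempo stages within ffmpeg's 0.5-2.0 limit."""
    factors, remaining = [], speed
    while remaining > 2.0:   # e.g. 4.0 -> atempo=2,atempo=2
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:   # e.g. 0.25 -> atempo=0.5,atempo=0.5
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={f:g}" for f in factors)
```

With `_clamp_speed` bounding input to [0.25, 4.0], at most two stages are ever needed.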

View File

@@ -16,6 +16,24 @@ export default function ExportDialog() {
    enhanceAudio: false,
    captions: 'none',
  });
  const [exportError, setExportError] = useState<string | null>(null);

  const handleExport = useCallback(async () => {
    if (!videoPath) return;
@@ -31,6 +32,7 @@ export default function ExportDialog() {
    if (!outputPath) return;
    setExporting(true, 0);
    setExportError(null);
    try {
      const keepSegments = getKeepSegments();
@@ -58,10 +60,20 @@ export default function ExportDialog() {
          ...options,
        }),
      });
      if (!res.ok) {
        let detail = res.statusText;
        try {
          const body = await res.json();
          if (body?.detail) detail = String(body.detail);
        } catch {
          // Keep statusText fallback when response body is not JSON.
        }
        throw new Error(`Export failed: ${detail}`);
      }
      setExporting(false, 100);
    } catch (err) {
      console.error('Export error:', err);
      setExportError(err instanceof Error ? err.message : 'Export failed');
      setExporting(false);
    }
  }, [videoPath, options, backendUrl, setExporting, getKeepSegments, cutRanges, muteRanges, gainRanges, speedRanges, globalGainDb, words]);
@@ -159,6 +171,12 @@ export default function ExportDialog() {
        )}
      </button>

      {exportError && (
        <div className="rounded border border-red-500/40 bg-red-500/10 px-3 py-2 text-xs text-red-300">
          {exportError}
        </div>
      )}

      {options.mode === 'fast' && !hasCuts && (
        <p className="text-[10px] text-editor-text-muted text-center">
          Fast mode uses stream copy &mdash; no quality loss, exports in seconds.

View File

@@ -354,7 +354,11 @@ export default function WaveformTimeline({
        encodedPath: encodeURIComponent(videoPath ?? ''),
      });
      const waveformUrl2 = `${backendUrl}/audio/waveform?path=${encodeURIComponent(videoPath ?? '')}`;
      const errMessage = err instanceof Error ? err.message : 'audio could not be decoded';
      const userMessage = errMessage === 'Load failed'
        ? `Cannot reach backend at ${backendUrl}. Start the backend server and reload the project.`
        : errMessage;
      setAudioError(`Waveform unavailable — ${userMessage} [URL: ${waveformUrl2}]`);
    }
  };

View File

@@ -11,6 +11,9 @@ import { invoke } from '@tauri-apps/api/core';
import { open, save } from '@tauri-apps/plugin-dialog';
import { readTextFile, writeTextFile } from '@tauri-apps/plugin-fs';

const backendPort = import.meta.env.VITE_BACKEND_PORT || '8642';
const backendUrl = `http://127.0.0.1:${backendPort}`;

const VIDEO_FILTERS = [
  { name: 'Audio and Video Files', extensions: ['mp4', 'avi', 'mov', 'mkv', 'webm', 'm4a', 'wav', 'mp3', 'flac'] },
  { name: 'All Files', extensions: ['*'] },
@@ -35,9 +38,13 @@ window.electronAPI = {
    return typeof result === 'string' ? result : null;
  },
  saveFile: async (options?: Record<string, unknown>): Promise<string | null> => {
    const result = await save({
      defaultPath: typeof options?.defaultPath === 'string' ? options.defaultPath : undefined,
      filters: Array.isArray(options?.filters)
        ? (options.filters as Array<{ name: string; extensions: string[] }>)
        : EXPORT_FILTERS,
    });
    return result ?? null;
  },
@@ -61,8 +68,8 @@
  },
  getBackendUrl: (): Promise<string> => {
    // Avoid invoke() here because Linux/WebKit2GTK can log noisy ipc:// CSP warnings.
    return Promise.resolve(backendUrl);
  },
  encryptString: (data: string): Promise<string> => {
View File

@@ -1,5 +1,13 @@
/// <reference types="vite/client" />

interface ImportMetaEnv {
  readonly VITE_BACKEND_PORT?: string;
}

interface ImportMeta {
  readonly env: ImportMetaEnv;
}

interface DesktopAPI {
  openFile: (options?: Record<string, unknown>) => Promise<string | null>;
  saveFile: (options?: Record<string, unknown>) => Promise<string | null>;