Enhance file selection to support an additional audio format (M4A) and update README.md to reflect the new supported formats for video and audio recordings.

This commit is contained in:
Your Name
2025-08-05 11:18:36 -04:00
parent 3346b0df0f
commit f04853eba9
7 changed files with 203 additions and 16 deletions


@@ -11,7 +11,6 @@ on:
 env:
   REGISTRY: ghcr.io
-  IMAGE_NAME: ${{ github.repository }}
 
 jobs:
   build:
@@ -34,11 +33,15 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Convert repository name to lowercase
+        id: lowercase-repo
+        run: echo "repository=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
       - name: Extract metadata
         id: meta
         uses: docker/metadata-action@v5
         with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          images: ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}
           tags: |
             type=ref,event=branch
             type=ref,event=pr
@@ -66,8 +69,8 @@ jobs:
           platforms: linux/amd64
           push: true
           tags: |
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-gpu
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}-gpu
+            ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}:latest-gpu
+            ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}:${{ github.sha }}-gpu
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=gha
           cache-to: type=gha,mode=max

FIX-GITHUB-ACTIONS.md Normal file

@@ -0,0 +1,63 @@
# 🔧 GitHub Actions Fix Applied
## What Happened
The GitHub Actions build failed because Docker registry tags must be lowercase, but the repository name "DataAnts-AI" contains uppercase letters.
**Error**:
```
invalid tag "ghcr.io/DataAnts-AI/VideoTranscriber:latest-gpu": repository name must be lowercase
```
## What I Fixed
1. **Added Lowercase Conversion Step**:
```yaml
- name: Convert repository name to lowercase
id: lowercase-repo
run: echo "repository=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
```
2. **Updated All References** to use the lowercase repository name:
- Metadata extraction
- Main image build
- GPU image build
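The conversion in step 1 is a plain `tr` substitution, so you can sanity-check it locally with the repository name from the error message above:

```shell
# Same substitution the workflow step performs on ${{ github.repository }}
echo "DataAnts-AI/VideoTranscriber" | tr '[:upper:]' '[:lower:]'
# → dataants-ai/videotranscriber
```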
## Current Status
- ✅ **Fix Applied**: All workflow files updated
- ⏳ **Waiting**: Next push will trigger corrected build
- 📦 **Images**: Will be available at `ghcr.io/dataants-ai/videotranscriber:latest`
## What You Should Do Now
### Option 1: Wait for Prebuilt Images (Recommended)
```bash
# Once the GitHub Actions complete successfully, you can use:
docker-compose -f docker-compose.prebuilt.yml up -d
```
### Option 2: Use Local Build (Immediate Fix)
```bash
# Use the fixed local build process:
docker-compose down
docker-compose up -d --build
```
## How to Check if Prebuilt Images are Ready
```bash
# Check if the image is available
docker pull ghcr.io/dataants-ai/videotranscriber:latest
# If successful, you can use prebuilt images
docker-compose -f docker-compose.prebuilt.yml up -d
```
## Expected Timeline
- **Immediate**: Local builds work with fixed dependencies
- **~20-30 minutes**: GitHub Actions will complete and publish images
- **Future**: All builds will use reliable prebuilt images
The core application fix (PyTorch version compatibility) is already in place, so local builds should work perfectly now!

M4A-SUPPORT.md Normal file

@@ -0,0 +1,115 @@
# 🎵 M4A Audio File Support
VideoTranscriber now supports M4A audio files! This format is commonly used by:
- Apple devices (iPhone, iPad, Mac)
- Voice recording apps
- Audio podcasts and interviews
- High-quality audio recordings
## What's New
### Supported Formats
- **Video**: MP4, AVI, MOV, MKV
- **Audio**: M4A ✨ (NEW!)
### How It Works
1. **Direct Processing**: M4A files are processed directly without audio extraction
2. **Same Features**: All transcription and summarization features work with M4A
3. **Optimized Performance**: Faster processing since no video-to-audio conversion needed
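The routing described above comes down to one suffix check. A minimal sketch — the helper name `needs_audio_extraction` is hypothetical, but the extension set matches the one used in the transcriber code:

```python
from pathlib import Path

# Video containers whose audio track must be extracted before transcription.
# M4A files are already audio, so they skip this step entirely.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}

def needs_audio_extraction(path: Path) -> bool:
    """Return True if the file is a video container needing audio extraction."""
    return path.suffix.lower() in VIDEO_EXTENSIONS

print(needs_audio_extraction(Path("meeting_notes.m4a")))  # False: processed directly
print(needs_audio_extraction(Path("demo.MP4")))           # True: extract audio first
```

Note the `.lower()` call: it makes the check tolerant of uppercase extensions like `.MP4`.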
## Usage Examples
### Common M4A Sources
#### iPhone Voice Memos
```
/Users/yourname/Desktop/recordings/
├── meeting_notes.m4a
├── interview.m4a
└── lecture.m4a
```
#### Podcast Recordings
```
/path/to/podcast/
├── episode_001.m4a
├── episode_002.m4a
└── bonus_content.m4a
```
#### Professional Audio
```
/path/to/audio/
├── conference_call.m4a
├── client_interview.m4a
└── webinar_recording.m4a
```
## Technical Details
### File Processing
- **M4A Files**: Processed directly by Whisper (no conversion needed)
- **Video Files**: Audio extracted first, then processed
- **Performance**: M4A processing is faster due to no extraction step
### Quality Considerations
- M4A supports high-quality audio compression
- Better quality = more accurate transcription
- Recommended: 44.1kHz, 16-bit or higher
### Supported Features with M4A
- ✅ **Transcription**: Full Whisper model support
- ✅ **Summarization**: Ollama and HuggingFace models
- ✅ **Speaker Diarization**: Identify different speakers
- ✅ **Translation**: Multi-language support
- ✅ **Keyword Extraction**: Important terms with timestamps
- ✅ **Export Formats**: SRT, ASS, VTT subtitles
- ✅ **Caching**: Faster re-processing
## Getting Started
1. **Place M4A files** in your designated recordings folder
2. **Launch VideoTranscriber**
3. **Select folder** containing your M4A files
4. **Choose file** from dropdown (M4A files will appear alongside videos)
5. **Process** as normal - all features work identically!
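The folder scan behind steps 3-4 mirrors the glob-pattern list added to `app.py`; here is a self-contained sketch (the sample file names are made up):

```python
from pathlib import Path
import tempfile

# Same patterns app.py uses for the file dropdown, now including M4A.
SUPPORTED_PATTERNS = ["*.mp4", "*.avi", "*.mov", "*.mkv", "*.m4a"]

def find_recordings(base_folder: str) -> list[Path]:
    """Collect every supported video/audio file directly inside base_folder."""
    base_path = Path(base_folder)
    recordings: list[Path] = []
    for pattern in SUPPORTED_PATTERNS:
        recordings.extend(base_path.glob(pattern))
    return sorted(recordings)

with tempfile.TemporaryDirectory() as folder:
    for name in ("interview.m4a", "demo.mp4", "notes.txt"):
        (Path(folder) / name).touch()
    print([p.name for p in find_recordings(folder)])  # → ['demo.mp4', 'interview.m4a']
```

Because `Path.glob` is case-sensitive on Linux, a file named `MEMO.M4A` would not match `*.m4a` — which is exactly the lowercase-extension caveat in the Troubleshooting section below.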
## Troubleshooting
### M4A Files Not Appearing
- Check file extension is `.m4a` (lowercase)
- Ensure files are in the selected folder
- Verify no file corruption
### Processing Issues
- M4A files with DRM protection may not work
- Very large files (>2GB) may need more memory
- Ensure FFmpeg is properly installed
### Quality Issues
- Low-quality recordings may have poor transcription
- Background noise affects accuracy
- Consider noise reduction before processing
## Comparison: M4A vs Video Files
| Feature | M4A Files | Video Files |
|---------|-----------|-------------|
| Processing Speed | ⚡ Faster | Slower (audio extraction needed) |
| File Size | 📦 Smaller | Larger |
| Quality | 🎵 Audio-optimized | May have video artifacts |
| Compatibility | ✅ Direct support | ✅ Supported (via extraction) |
| Use Cases | Interviews, podcasts, meetings | Screen recordings, presentations |
## Future Enhancements
Planned improvements for audio file support:
- Additional audio formats (WAV, FLAC, AAC)
- Batch processing for multiple M4A files
- Audio quality analysis and optimization suggestions
- Integration with cloud audio services
---
M4A support makes VideoTranscriber more versatile for pure audio content while maintaining all the powerful transcription and analysis features you expect!


@@ -17,13 +17,16 @@ docker-compose up -d --build
 ## Better Solution: Use Prebuilt Images
-Once available, use the prebuilt images instead:
+⚠️ **Note**: GitHub Actions had a naming issue that's now fixed. See [FIX-GITHUB-ACTIONS.md](FIX-GITHUB-ACTIONS.md) for details.
+Once prebuilt images are available, use them instead:
 ```bash
-# Stop current container
-docker-compose down
-# Use prebuilt image (no build required)
+# Check if images are ready
+docker pull ghcr.io/dataants-ai/videotranscriber:latest
+# If successful, stop current container and use prebuilt image
+docker-compose down
 docker-compose -f docker-compose.prebuilt.yml up -d
 ```


@@ -1,7 +1,9 @@
 # Video Transcriber
 ## Project Overview
-The video Recording Transcriber is a Python application built with Streamlit that processes video recordings (particularly from OBS Studio) to generate transcripts and summaries using AI models. The application uses Whisper for transcription and Hugging Face Transformers for summarization.
+The Video Recording Transcriber is a Python application built with Streamlit that processes video and audio recordings to generate transcripts and summaries using AI models. The application uses Whisper for transcription and Hugging Face Transformers for summarization.
+**Supported Formats**: MP4, AVI, MOV, MKV (video) and M4A (audio)
 ![SuiteQL_query_UI-1-Thumbnail](https://github.com/user-attachments/assets/72aaf238-6615-4739-b77f-c4eb9ff96996)
@@ -84,8 +86,8 @@ streamlit run app.py
 ```
 ## Usage
-1. Set your base folder where OBS recordings are stored
-2. Select a recording from the dropdown
+1. Set your base folder where video/audio recordings are stored
+2. Select a recording from the dropdown (supports MP4, AVI, MOV, MKV, M4A)
 3. Choose transcription and summarization models
 4. Configure performance settings (GPU acceleration, caching)
 5. Select export formats and compression options

app.py

@@ -319,15 +319,15 @@ def main():
             st.markdown(f"- {error}")
         return
 
-    # File selection - support multiple video formats
-    supported_extensions = ["*.mp4", "*.avi", "*.mov", "*.mkv"]
+    # File selection - support multiple video and audio formats
+    supported_extensions = ["*.mp4", "*.avi", "*.mov", "*.mkv", "*.m4a"]
     recordings = []
     for extension in supported_extensions:
         recordings.extend(base_path.glob(extension))
 
     if not recordings:
         st.warning(f"📂 No recordings found in the folder: {base_folder}!")
-        st.info("💡 Supported formats: MP4, AVI, MOV, MKV")
+        st.info("💡 Supported formats: MP4, AVI, MOV, MKV, M4A")
         return
 
     selected_file = st.selectbox("Choose a recording", recordings)


@@ -51,8 +51,9 @@ def transcribe_audio(audio_path: Path, model=WHISPER_MODEL, use_cache=True, cach
         logger.info(f"Using cached transcription for {audio_path}")
         return cached_data.get("segments", []), cached_data.get("transcript", "")
 
-    # Extract audio if the input is a video file
-    if audio_path.suffix.lower() in ['.mp4', '.avi', '.mov', '.mkv']:
+    # Extract audio if the input is a video file (M4A is already audio)
+    video_extensions = ['.mp4', '.avi', '.mov', '.mkv']
+    if audio_path.suffix.lower() in video_extensions:
         audio_path = extract_audio(audio_path)
 
     # Configure GPU if available and requested