Enhance file selection to support an additional audio format (M4A) and update README.md to reflect the new supported formats for video and audio recordings.

This commit is contained in:
Your Name
2025-08-05 11:18:36 -04:00
parent 3346b0df0f
commit f04853eba9
7 changed files with 203 additions and 16 deletions


@@ -11,7 +11,6 @@ on:
 env:
   REGISTRY: ghcr.io
-  IMAGE_NAME: ${{ github.repository }}
 
 jobs:
   build:
@@ -34,11 +33,15 @@ jobs:
           username: ${{ github.actor }}
           password: ${{ secrets.GITHUB_TOKEN }}
+      - name: Convert repository name to lowercase
+        id: lowercase-repo
+        run: echo "repository=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
       - name: Extract metadata
         id: meta
         uses: docker/metadata-action@v5
         with:
-          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+          images: ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}
           tags: |
             type=ref,event=branch
             type=ref,event=pr
@@ -66,8 +69,8 @@ jobs:
           platforms: linux/amd64
           push: true
           tags: |
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:latest-gpu
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}-gpu
+            ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}:latest-gpu
+            ${{ env.REGISTRY }}/${{ steps.lowercase-repo.outputs.repository }}:${{ github.sha }}-gpu
           labels: ${{ steps.meta.outputs.labels }}
           cache-from: type=gha
           cache-to: type=gha,mode=max

FIX-GITHUB-ACTIONS.md Normal file

@@ -0,0 +1,63 @@
# 🔧 GitHub Actions Fix Applied
## What Happened
The GitHub Actions build failed because Docker registry tags must be lowercase, but the repository name "DataAnts-AI" contains uppercase letters.
**Error**:
```
invalid tag "ghcr.io/DataAnts-AI/VideoTranscriber:latest-gpu": repository name must be lowercase
```
## What I Fixed
1. **Added Lowercase Conversion Step**:
```yaml
- name: Convert repository name to lowercase
id: lowercase-repo
run: echo "repository=$(echo ${{ github.repository }} | tr '[:upper:]' '[:lower:]')" >> $GITHUB_OUTPUT
```
2. **Updated All References** to use the lowercase repository name:
- Metadata extraction
- Main image build
- GPU image build
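The conversion in step 1 is a plain `tr` substitution, so you can sanity-check it locally with the repository name from the error message above:

```shell
# Same substitution the workflow step performs on ${{ github.repository }}
echo "DataAnts-AI/VideoTranscriber" | tr '[:upper:]' '[:lower:]'
# → dataants-ai/videotranscriber
```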
## Current Status
- ✅ **Fix Applied**: All workflow files updated
- ⏳ **Waiting**: Next push will trigger corrected build
- 📦 **Images**: Will be available at `ghcr.io/dataants-ai/videotranscriber:latest`
## What You Should Do Now
### Option 1: Wait for Prebuilt Images (Recommended)
```bash
# Once the GitHub Actions complete successfully, you can use:
docker-compose -f docker-compose.prebuilt.yml up -d
```
### Option 2: Use Local Build (Immediate Fix)
```bash
# Use the fixed local build process:
docker-compose down
docker-compose up -d --build
```
## How to Check if Prebuilt Images are Ready
```bash
# Check if the image is available
docker pull ghcr.io/dataants-ai/videotranscriber:latest
# If successful, you can use prebuilt images
docker-compose -f docker-compose.prebuilt.yml up -d
```
## Expected Timeline
- **Immediate**: Local builds work with fixed dependencies
- **~20-30 minutes**: GitHub Actions will complete and publish images
- **Future**: All builds will use reliable prebuilt images
The core application fix (PyTorch version compatibility) is already in place, so local builds should work perfectly now!

M4A-SUPPORT.md Normal file

@@ -0,0 +1,115 @@
# 🎵 M4A Audio File Support
VideoTranscriber now supports M4A audio files! This format is commonly used by:
- Apple devices (iPhone, iPad, Mac)
- Voice recording apps
- Audio podcasts and interviews
- High-quality audio recordings
## What's New
### Supported Formats
- **Video**: MP4, AVI, MOV, MKV
- **Audio**: M4A ✨ (NEW!)
### How It Works
1. **Direct Processing**: M4A files are processed directly without audio extraction
2. **Same Features**: All transcription and summarization features work with M4A
3. **Optimized Performance**: Faster processing since no video-to-audio conversion needed
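The routing described above comes down to one suffix check. A minimal sketch — the helper name `needs_audio_extraction` is hypothetical, but the extension set matches the one used in the transcriber code:

```python
from pathlib import Path

# Video containers whose audio track must be extracted before transcription.
# M4A files are already audio, so they skip this step entirely.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}

def needs_audio_extraction(path: Path) -> bool:
    """Return True if the file is a video container needing audio extraction."""
    return path.suffix.lower() in VIDEO_EXTENSIONS

print(needs_audio_extraction(Path("meeting_notes.m4a")))  # False: processed directly
print(needs_audio_extraction(Path("demo.MP4")))           # True: extract audio first
```

Note the `.lower()` call: it makes the check tolerant of uppercase extensions like `.MP4`.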
## Usage Examples
### Common M4A Sources
#### iPhone Voice Memos
```
/Users/yourname/Desktop/recordings/
├── meeting_notes.m4a
├── interview.m4a
└── lecture.m4a
```
#### Podcast Recordings
```
/path/to/podcast/
├── episode_001.m4a
├── episode_002.m4a
└── bonus_content.m4a
```
#### Professional Audio
```
/path/to/audio/
├── conference_call.m4a
├── client_interview.m4a
└── webinar_recording.m4a
```
## Technical Details
### File Processing
- **M4A Files**: Processed directly by Whisper (no conversion needed)
- **Video Files**: Audio extracted first, then processed
- **Performance**: M4A processing is faster due to no extraction step
### Quality Considerations
- M4A supports high-quality audio compression
- Better quality = more accurate transcription
- Recommended: 44.1kHz, 16-bit or higher
### Supported Features with M4A
- ✅ **Transcription**: Full Whisper model support
- ✅ **Summarization**: Ollama and HuggingFace models
- ✅ **Speaker Diarization**: Identify different speakers
- ✅ **Translation**: Multi-language support
- ✅ **Keyword Extraction**: Important terms with timestamps
- ✅ **Export Formats**: SRT, ASS, VTT subtitles
- ✅ **Caching**: Faster re-processing
## Getting Started
1. **Place M4A files** in your designated recordings folder
2. **Launch VideoTranscriber**
3. **Select folder** containing your M4A files
4. **Choose file** from dropdown (M4A files will appear alongside videos)
5. **Process** as normal - all features work identically!
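The folder scan behind steps 3-4 mirrors the glob-pattern list added to `app.py`; here is a self-contained sketch (the sample file names are made up):

```python
from pathlib import Path
import tempfile

# Same patterns app.py uses for the file dropdown, now including M4A.
SUPPORTED_PATTERNS = ["*.mp4", "*.avi", "*.mov", "*.mkv", "*.m4a"]

def find_recordings(base_folder: str) -> list[Path]:
    """Collect every supported video/audio file directly inside base_folder."""
    base_path = Path(base_folder)
    recordings: list[Path] = []
    for pattern in SUPPORTED_PATTERNS:
        recordings.extend(base_path.glob(pattern))
    return sorted(recordings)

with tempfile.TemporaryDirectory() as folder:
    for name in ("interview.m4a", "demo.mp4", "notes.txt"):
        (Path(folder) / name).touch()
    print([p.name for p in find_recordings(folder)])  # → ['demo.mp4', 'interview.m4a']
```

Because `Path.glob` is case-sensitive on Linux, a file named `MEMO.M4A` would not match `*.m4a` — which is exactly the lowercase-extension caveat in the Troubleshooting section below.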
## Troubleshooting
### M4A Files Not Appearing
- Check file extension is `.m4a` (lowercase)
- Ensure files are in the selected folder
- Verify no file corruption
### Processing Issues
- M4A files with DRM protection may not work
- Very large files (>2GB) may need more memory
- Ensure FFmpeg is properly installed
### Quality Issues
- Low-quality recordings may have poor transcription
- Background noise affects accuracy
- Consider noise reduction before processing
## Comparison: M4A vs Video Files
| Feature | M4A Files | Video Files |
|---------|-----------|-------------|
| Processing Speed | ⚡ Faster | Slower (audio extraction needed) |
| File Size | 📦 Smaller | Larger |
| Quality | 🎵 Audio-optimized | May have video artifacts |
| Compatibility | ✅ Direct support | ✅ Supported (via extraction) |
| Use Cases | Interviews, podcasts, meetings | Screen recordings, presentations |
## Future Enhancements
Planned improvements for audio file support:
- Additional audio formats (WAV, FLAC, AAC)
- Batch processing for multiple M4A files
- Audio quality analysis and optimization suggestions
- Integration with cloud audio services
---
M4A support makes VideoTranscriber more versatile for pure audio content while maintaining all the powerful transcription and analysis features you expect!


@@ -17,13 +17,16 @@ docker-compose up -d --build
 ## Better Solution: Use Prebuilt Images
-Once available, use the prebuilt images instead:
+⚠️ **Note**: GitHub Actions had a naming issue that's now fixed. See [FIX-GITHUB-ACTIONS.md](FIX-GITHUB-ACTIONS.md) for details.
+Once prebuilt images are available, use them instead:
 ```bash
-# Stop current container
-docker-compose down
-# Use prebuilt image (no build required)
+# Check if images are ready
+docker pull ghcr.io/dataants-ai/videotranscriber:latest
+# If successful, stop current container and use prebuilt image
+docker-compose down
 docker-compose -f docker-compose.prebuilt.yml up -d
 ```


@@ -1,7 +1,9 @@
 # Video Transcriber
 ## Project Overview
-The video Recording Transcriber is a Python application built with Streamlit that processes video recordings (particularly from OBS Studio) to generate transcripts and summaries using AI models. The application uses Whisper for transcription and Hugging Face Transformers for summarization.
+The Video Recording Transcriber is a Python application built with Streamlit that processes video and audio recordings to generate transcripts and summaries using AI models. The application uses Whisper for transcription and Hugging Face Transformers for summarization.
+**Supported Formats**: MP4, AVI, MOV, MKV (video) and M4A (audio)
 ![SuiteQL_query_UI-1-Thumbnail](https://github.com/user-attachments/assets/72aaf238-6615-4739-b77f-c4eb9ff96996)
@@ -84,8 +86,8 @@ streamlit run app.py
 ```
 ## Usage
-1. Set your base folder where OBS recordings are stored
-2. Select a recording from the dropdown
+1. Set your base folder where video/audio recordings are stored
+2. Select a recording from the dropdown (supports MP4, AVI, MOV, MKV, M4A)
 3. Choose transcription and summarization models
 4. Configure performance settings (GPU acceleration, caching)
 5. Select export formats and compression options

app.py

@@ -319,15 +319,15 @@ def main():
             st.markdown(f"- {error}")
         return
 
-    # File selection - support multiple video formats
-    supported_extensions = ["*.mp4", "*.avi", "*.mov", "*.mkv"]
+    # File selection - support multiple video and audio formats
+    supported_extensions = ["*.mp4", "*.avi", "*.mov", "*.mkv", "*.m4a"]
     recordings = []
     for extension in supported_extensions:
         recordings.extend(base_path.glob(extension))
 
     if not recordings:
         st.warning(f"📂 No recordings found in the folder: {base_folder}!")
-        st.info("💡 Supported formats: MP4, AVI, MOV, MKV")
+        st.info("💡 Supported formats: MP4, AVI, MOV, MKV, M4A")
         return
 
     selected_file = st.selectbox("Choose a recording", recordings)


@@ -51,8 +51,9 @@ def transcribe_audio(audio_path: Path, model=WHISPER_MODEL, use_cache=True, cach
         logger.info(f"Using cached transcription for {audio_path}")
         return cached_data.get("segments", []), cached_data.get("transcript", "")
 
-    # Extract audio if the input is a video file
-    if audio_path.suffix.lower() in ['.mp4', '.avi', '.mov', '.mkv']:
+    # Extract audio if the input is a video file (M4A is already audio)
+    video_extensions = ['.mp4', '.avi', '.mov', '.mkv']
+    if audio_path.suffix.lower() in video_extensions:
         audio_path = extract_audio(audio_path)
 
     # Configure GPU if available and requested