Enhance README.md with Docker installation instructions and update Ollama API endpoint to be configurable via environment variable.

Your Name
2025-07-17 00:05:23 -04:00
parent c2ee2394d2
commit dcf13c1423
7 changed files with 568 additions and 2 deletions

.dockerignore (new file, 77 lines)

@@ -0,0 +1,77 @@
# Git and version control
.git
.gitignore
.gitattributes
# Docker files
Dockerfile
docker-compose.yml
.dockerignore
# Environment and config files
.env
.env.*
docker.env.example
# Documentation
*.md
docs/
DOCKER.md
README.md
INSTALLATION.md
GEMINI_INSIGHTS.md
# Python cache and virtual environments
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
# OS generated files
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Local directories that will be mounted as volumes
videos/
outputs/
cache/
config/
# Logs
*.log
logs/
# Temporary files
tmp/
temp/
*.tmp
# Test files
tests/
*_test.py
test_*.py
# Build artifacts
build/
dist/
*.egg-info/
# Jupyter notebooks
*.ipynb
.ipynb_checkpoints/

DOCKER.md (new file, 305 lines)

@@ -0,0 +1,305 @@
# Docker Deployment Guide for VideoTranscriber
This guide explains how to run VideoTranscriber in a Docker container while using Ollama models on your host system.
## Architecture Overview
```
┌────────────────────────────────────────────────┐
│                  Host System                   │
│                                                │
│   ┌─────────────────┐    ┌─────────────────┐   │
│   │  Ollama Service │    │   Video Files   │   │
│   │  (port 11434)   │    │    Directory    │   │
│   └────────▲────────┘    └────────▲────────┘   │
│            │                      │            │
│   ┌────────┴──────────────────────┴────────┐   │
│   │            Docker Container            │   │
│   │                                        │   │
│   │   VideoTranscriber                     │   │
│   │   - Streamlit App                      │   │
│   │   - Whisper Models                     │   │
│   │   - ML Dependencies                    │   │
│   └────────────────────────────────────────┘   │
│                                                │
│    Mounted Volumes: videos, outputs, cache     │
└────────────────────────────────────────────────┘
```
## Quick Start
### Prerequisites
1. **Docker & Docker Compose** installed
2. **Ollama running on host**:
```bash
# Install Ollama (if not already installed)
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
ollama serve
# Pull a model (in another terminal)
ollama pull llama3
```
### 1. Set Up the Environment
```bash
# Copy environment template
cp docker.env.example .env
# Edit .env file with your paths
# Key settings to update:
VIDEO_PATH=/path/to/your/videos
OUTPUT_PATH=/path/to/save/outputs
HF_TOKEN=your_huggingface_token_if_needed
```
### 2. Create Required Directories
```bash
# Create directories for mounting
mkdir -p videos outputs cache config
```
### 3. Build and Run
```bash
# Build and start the container
docker-compose up -d
# View logs
docker-compose logs -f
# Access the application
# Open browser to: http://localhost:8501
```
## Configuration Options
### Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `VIDEO_PATH` | Host directory containing video files | `./videos` | Yes |
| `OUTPUT_PATH` | Host directory for outputs | `./outputs` | Yes |
| `CACHE_PATH` | Host directory for model cache | `./cache` | No |
| `OLLAMA_API_URL` | Ollama API endpoint | `http://host.docker.internal:11434/api` | No |
| `HF_TOKEN` | HuggingFace token for advanced features | - | No |
| `CUDA_VISIBLE_DEVICES` | GPU devices to use | - | No |
### Volume Mounts
| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `${VIDEO_PATH}` | `/app/data/videos` | Input video files |
| `${OUTPUT_PATH}` | `/app/data/outputs` | Generated transcripts/summaries |
| `${CACHE_PATH}` | `/app/data/cache` | Model and processing cache |
| `${CONFIG_PATH}` | `/app/config` | Configuration files |
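Before starting anything, you can check how these variables resolve by rendering the effective configuration on the host:
```bash
# Print the fully resolved compose file, with volumes and environment expanded
docker-compose config
```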
## Platform-Specific Setup
### Windows (Docker Desktop)
```yaml
# In docker-compose.yml - use bridge networking (service-level keys)
networks:
  - videotranscriber-network
environment:
  - OLLAMA_API_URL=http://host.docker.internal:11434/api
```
### macOS (Docker Desktop)
Same as Windows - uses `host.docker.internal` to access host services.
### Linux
**Option 1 - Host Networking (Recommended):**
```yaml
# In docker-compose.yml
network_mode: host
environment:
  - OLLAMA_API_URL=http://localhost:11434/api
```
**Option 2 - Bridge Networking:**
```yaml
environment:
  - OLLAMA_API_URL=http://172.17.0.1:11434/api  # default docker0 bridge IP
```
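The bridge gateway IP varies between setups; one way to look it up on the host:
```bash
# Print the gateway IP of Docker's default bridge network
docker network inspect bridge \
  --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'

# Alternative: read the docker0 interface address directly
ip -4 addr show docker0
```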
## GPU Support
### NVIDIA GPU Setup
1. **Install NVIDIA Container Toolkit**:
```bash
# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
2. **Enable in docker-compose.yml**:
```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```
## Usage in Container
### Application Settings
When running in Docker, update these settings in the VideoTranscriber UI:
1. **Base Folder**: Set to `/app/data/videos`
2. **Ollama Models**: Should auto-detect from host
3. **GPU Settings**: Will use container GPU if configured
### File Access
- **Input Videos**: Place in your `${VIDEO_PATH}` directory on host
- **Outputs**: Generated files appear in `${OUTPUT_PATH}` on host
- **Cache**: Models cached in `${CACHE_PATH}` for faster subsequent runs
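A quick way to confirm all of this is to list and write through the mounts inside the running container:
```bash
# List the mounted input and output directories as the container sees them
docker-compose exec videotranscriber ls -la /app/data/videos /app/data/outputs

# Confirm the container can write to the output mount
docker-compose exec videotranscriber touch /app/data/outputs/.write-test
```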
## Troubleshooting
### Common Issues
#### 1. Can't Connect to Ollama
**Symptoms**: "Ollama service is not available" message
**Solutions**:
- Verify Ollama is running: `curl http://localhost:11434/api/tags`
- Check firewall settings
- For Linux, try host networking mode
- Verify OLLAMA_API_URL in environment
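These two checks narrow down where the connection fails (the second assumes the default bridge setup with `host.docker.internal`):
```bash
# 1. From the host: is Ollama itself responding?
curl http://localhost:11434/api/tags

# 2. From inside the container: can it reach the host's Ollama?
docker-compose exec videotranscriber \
  curl -f http://host.docker.internal:11434/api/tags
```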
#### 2. No Video Files Detected
**Symptoms**: "No recordings found" message
**Solutions**:
- Check VIDEO_PATH points to correct directory
- Ensure directory contains supported formats (.mp4, .avi, .mov, .mkv)
- Check file permissions
#### 3. GPU Not Detected
**Symptoms**: Processing is slow, no GPU utilization
**Solutions**:
- Install NVIDIA Container Toolkit
- Uncomment GPU section in docker-compose.yml
- Verify: `docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi`
#### 4. Permission Issues
**Symptoms**: Cannot write to output directory
**Solutions**:
```bash
# Fix permissions
sudo chown -R $(id -u):$(id -g) outputs cache config
chmod -R 755 outputs cache config
```
### Debugging
```bash
# View container logs
docker-compose logs -f videotranscriber
# Execute shell in container
docker-compose exec videotranscriber bash
# Check Ollama connectivity from container
docker-compose exec videotranscriber curl -f $OLLAMA_API_URL/tags
# Monitor resource usage
docker stats videotranscriber
```
## Advanced Configuration
### Custom Dockerfile
For specialized requirements, modify the Dockerfile:
```dockerfile
# Add custom dependencies
RUN pip install your-custom-package
# Set custom environment variables
ENV YOUR_CUSTOM_VAR=value
# Copy custom configuration
COPY custom-config.yaml /app/config/
```
### Multi-Instance Deployment
Run multiple instances for different use cases:
```bash
# Copy docker-compose.yml to docker-compose.prod.yml
# Modify ports and paths
docker-compose -f docker-compose.prod.yml up -d
```
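As a lighter-weight sketch, the `CONTAINER_NAME` and `HOST_PORT` variables from `docker.env.example` let a second copy run from the same compose file under its own project name:
```bash
# Second instance on port 8502, isolated under its own compose project
CONTAINER_NAME=videotranscriber-2 HOST_PORT=8502 \
  docker-compose -p videotranscriber-2 up -d
```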
### CI/CD Integration
```yaml
# .github/workflows/docker.yml
name: Build and Deploy
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Docker image
        run: docker build -t videotranscriber .
```
## Performance Optimization
### Memory Management
```yaml
# In docker-compose.yml
deploy:
  resources:
    limits:
      memory: 8G
    reservations:
      memory: 4G
```
### Model Caching
- Use persistent volumes for `/app/data/cache`
- Pre-download models to reduce startup time
- Configure appropriate cache size limits
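A sketch of pre-warming the cache, assuming the `openai-whisper` package from `requirements.txt` and the cache mount defined in this compose file:
```bash
# Download the "base" Whisper model into the persistent cache volume once
docker-compose exec videotranscriber python -c \
  "import whisper; whisper.load_model('base', download_root='/app/data/cache/whisper')"
```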
### Network Optimization
- Use host networking on Linux for better performance
- Consider running Ollama and VideoTranscriber on same machine
- Use SSD storage for cache directories

Dockerfile (new file, 44 lines)

@@ -0,0 +1,44 @@
FROM python:3.11-slim
# Set working directory
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
ffmpeg \
git \
wget \
curl \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Copy requirements first for better Docker layer caching
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Install PyTorch with CUDA support (adjust based on your needs)
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Copy application code
COPY . .
# Create directories for mounted volumes
RUN mkdir -p /app/data/videos /app/data/outputs /app/data/cache
# Set environment variables
ENV STREAMLIT_SERVER_PORT=8501
ENV STREAMLIT_SERVER_ADDRESS=0.0.0.0
ENV STREAMLIT_SERVER_HEADLESS=true
ENV STREAMLIT_BROWSER_GATHER_USAGE_STATS=false
# Expose Streamlit port
EXPOSE 8501
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8501/_stcore/health || exit 1
# Start the application
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]

README.md (modified, 26 lines added)

@@ -13,6 +13,32 @@ https://github.com/user-attachments/assets/990e63fc-232e-46a0-afdf-ca8836d46a13
## Installation
### 🐳 Docker Installation (Recommended)
**Benefits**: Isolated environment, no dependency conflicts, easy deployment
```bash
# 1. Clone repository
git clone https://github.com/DataAnts-AI/VideoTranscriber.git
cd VideoTranscriber
# 2. Setup environment
cp docker.env.example .env
# Edit .env with your video directory paths
# 3. Ensure Ollama is running on host
ollama serve # In separate terminal
ollama pull llama3
# 4. Start with Docker Compose
docker-compose up -d
# 5. Access application
# Open browser to: http://localhost:8501
```
See [DOCKER.md](DOCKER.md) for complete Docker setup guide.
### Easy Installation (Recommended)
#### Windows

docker-compose.yml (new file, 51 lines)

@@ -0,0 +1,51 @@
version: '3.8'

services:
  videotranscriber:
    build: .
    container_name: ${CONTAINER_NAME:-videotranscriber}
    ports:
      - "${HOST_PORT:-8501}:8501"
    volumes:
      # Mount your video files directory (change the left path to your actual videos folder)
      - "${VIDEO_PATH:-./videos}:/app/data/videos"
      # Mount output directory for transcripts and summaries
      - "${OUTPUT_PATH:-./outputs}:/app/data/outputs"
      # Mount cache directory for model caching (optional, improves performance)
      - "${CACHE_PATH:-./cache}:/app/data/cache"
      # Mount a config directory if needed
      - "${CONFIG_PATH:-./config}:/app/config"
    environment:
      # Ollama configuration for host access
      - OLLAMA_API_URL=${OLLAMA_API_URL:-http://host.docker.internal:11434/api}
      # Optional: HuggingFace token for advanced features
      - HF_TOKEN=${HF_TOKEN:-}
      # GPU configuration
      - CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-}
      # Cache settings
      - TRANSFORMERS_CACHE=/app/data/cache/transformers
      - WHISPER_CACHE=/app/data/cache/whisper
    # For GPU access (uncomment if you have an NVIDIA GPU and the NVIDIA Container Toolkit)
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #         - driver: nvidia
    #           count: 1
    #           capabilities: [gpu]
    restart: unless-stopped
    # For Linux hosts, you might prefer host networking for better Ollama access
    # network_mode: host  # Uncomment for Linux hosts (and remove the networks section)
    # Bridge networking works on Windows/Mac via host.docker.internal
    networks:
      - videotranscriber-network
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8501/_stcore/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

networks:
  videotranscriber-network:
    driver: bridge

docker.env.example (new file, 63 lines)

@@ -0,0 +1,63 @@
# VideoTranscriber Docker Configuration
# Copy this file to .env and modify the values as needed
# =============================================================================
# DOCKER VOLUME PATHS (Host Directories)
# =============================================================================
# Path to your video files directory on the host
# This directory will be mounted into the container at /app/data/videos
VIDEO_PATH=./videos
# Path where outputs (transcripts, summaries) will be saved on the host
# This directory will be mounted into the container at /app/data/outputs
OUTPUT_PATH=./outputs
# Path for caching ML models and processed files (improves performance)
# This directory will be mounted into the container at /app/data/cache
CACHE_PATH=./cache
# Optional: Configuration directory for custom settings
CONFIG_PATH=./config
# =============================================================================
# OLLAMA CONFIGURATION
# =============================================================================
# Ollama API URL - how the container accesses your host Ollama service
# For Windows/Mac with Docker Desktop: use host.docker.internal
# For Linux: use host networking or the actual host IP
OLLAMA_API_URL=http://host.docker.internal:11434/api
# =============================================================================
# ML MODEL CONFIGURATION
# =============================================================================
# HuggingFace token for advanced features (speaker diarization, etc.)
# Get your token at: https://huggingface.co/settings/tokens
# Leave empty if not using advanced features
HF_TOKEN=
# GPU Configuration
# Specify which GPU devices to use (leave empty for all available)
# Examples: "0" for first GPU, "0,1" for first two GPUs
CUDA_VISIBLE_DEVICES=
# =============================================================================
# DOCKER-SPECIFIC SETTINGS
# =============================================================================
# Container name (change if you want to run multiple instances)
CONTAINER_NAME=videotranscriber
# Port mapping (host:container)
HOST_PORT=8501
# =============================================================================
# EXAMPLE USAGE
# =============================================================================
# 1. Copy this file: cp docker.env.example .env
# 2. Edit the paths to match your system
# 3. Make sure Ollama is running on your host: ollama serve
# 4. Start the container: docker-compose up -d
# 5. Access the app at: http://localhost:8501

Ollama API client module (modified)

@@ -13,8 +13,8 @@ import os
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

- # Default Ollama API endpoint
- OLLAMA_API_URL = "http://localhost:11434/api"
+ # Default Ollama API endpoint - configurable via environment variable
+ OLLAMA_API_URL = os.environ.get("OLLAMA_API_URL", "http://localhost:11434/api")

def check_ollama_available():
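With this change the endpoint can be redirected at launch without editing code; for example, running `app.py` (the entry point from the Dockerfile's CMD) outside Docker against a remote Ollama host, where the IP is a placeholder:
```bash
# Point the app at an Ollama instance on another machine (example address)
OLLAMA_API_URL=http://192.168.1.50:11434/api streamlit run app.py
```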