Installation Guide for OBS Recording Transcriber

This guide will help you install all the necessary dependencies for the OBS Recording Transcriber application, including the advanced features from Phase 3.

Prerequisites

Before installing the Python packages, you need to set up some prerequisites:

1. Python 3.8 or higher

Make sure you have Python 3.8 or higher installed. You can download it from python.org.
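You can confirm the interpreter meets this requirement with `python --version`, or with a short check like the following sketch:

```python
import sys

def check_python(min_version=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version

if not check_python():
    raise SystemExit("Python 3.8 or higher is required")
```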

2. FFmpeg

FFmpeg is required for audio processing:

  • Windows:

    Download a release build from ffmpeg.org, extract it, and add the
    bin folder to your PATH, or install it with a package manager such
    as Chocolatey:

    choco install ffmpeg

  • macOS:

    brew install ffmpeg
    
  • Linux:

    sudo apt update
    sudo apt install ffmpeg
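Whichever platform you are on, you can confirm that FFmpeg is discoverable on your PATH with a quick standard-library check:

```python
import shutil

def ffmpeg_available():
    """Return the path to the ffmpeg executable, or None if not on PATH."""
    return shutil.which("ffmpeg")

path = ffmpeg_available()
print(f"ffmpeg found at {path}" if path else "ffmpeg not found; check your PATH")
```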
    

3. Visual C++ Build Tools (Windows only)

Some packages like tokenizers require C++ build tools:

  1. Download and install Visual C++ Build Tools
  2. During installation, select "Desktop development with C++"

Installation Steps

1. Create a Virtual Environment

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate

2. Install PyTorch

For better performance, install PyTorch with CUDA support if you have an NVIDIA GPU:

# Windows/Linux with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# macOS or CPU-only
pip install torch torchvision torchaudio
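Once installed, you can check which device PyTorch will use. This sketch degrades gracefully if PyTorch is missing:

```python
def detect_device():
    """Report the compute device PyTorch will use, if it is installed."""
    try:
        import torch
    except ImportError:
        return "not installed"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(f"PyTorch device: {detect_device()}")
```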

3. Install Dependencies

# Install all dependencies from requirements.txt
pip install -r requirements.txt

4. Troubleshooting Common Issues

Tokenizers Installation Issues

If you encounter issues with tokenizers installation:

  1. Make sure you have Visual C++ Build Tools installed (Windows)
  2. Install the Rust toolchain from rustup.rs (tokenizers may need it to build from source)
  3. Install tokenizers separately:
    pip install tokenizers --no-binary tokenizers
    

PyAnnote.Audio Access

To use speaker diarization, you need a HuggingFace token with access to the pyannote models:

  1. Create an account on HuggingFace
  2. Generate an access token at huggingface.co/settings/tokens
  3. Request access to pyannote/speaker-diarization-3.0
  4. Set the token in the application when prompted or as an environment variable:
    # Windows
    set HF_TOKEN=your_token_here
    # macOS/Linux
    export HF_TOKEN=your_token_here
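In application code, the token set above can be read back from the environment. A minimal sketch (the HF_TOKEN variable name matches the commands above):

```python
import os

def get_hf_token():
    """Read the HuggingFace token from the environment, if set."""
    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or enter the token in the app"
        )
    return token
```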
    

Memory Issues with Large Files

If you encounter memory issues with large files:

  1. Use a smaller Whisper model (e.g., "base" instead of "large")
  2. Reduce the GPU memory fraction in the application settings
  3. Increase your system's swap space/virtual memory
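One way to apply the first tip programmatically is a small helper that maps available GPU memory to a model size. This is a hypothetical sketch; the thresholds are illustrative, not taken from the application:

```python
def pick_whisper_model(free_vram_gb):
    """Pick a Whisper model size for the available GPU memory.

    Thresholds are illustrative; tune them for your hardware.
    """
    if free_vram_gb >= 10:
        return "large"
    if free_vram_gb >= 5:
        return "medium"
    if free_vram_gb >= 2:
        return "small"
    return "base"

print(pick_whisper_model(4))
```

For the second tip, PyTorch exposes `torch.cuda.set_per_process_memory_fraction` to cap how much GPU memory a process may allocate.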

Running the Application

After installation, run the application with:

streamlit run app.py

Optional: Ollama Setup for Local Summarization

To use Ollama for local summarization:

  1. Install Ollama from ollama.ai
  2. Pull a model:
    ollama pull llama3
    
  3. Uncomment the Ollama line in requirements.txt and install:
    pip install ollama
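With the model pulled and the package installed, a summarization call might look like the following sketch. It assumes the Ollama server is running locally and falls back gracefully if it is not:

```python
def summarize(text, model="llama3"):
    """Summarize text via a local Ollama model; return None if unavailable."""
    try:
        import ollama
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": f"Summarize:\n{text}"}],
        )
        return response["message"]["content"]
    except Exception:
        # Ollama not installed, or the local server is not running.
        return None
```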
    

Verifying Installation

To verify that all components are working correctly:

  1. Run the application
  2. Check that GPU acceleration is available (if applicable)
  3. Test a small video file with basic transcription
  4. Gradually enable advanced features like diarization and translation

If you encounter any issues, check the application logs for specific error messages.