# Installation Guide for OBS Recording Transcriber

This guide will help you install all the necessary dependencies for the OBS Recording Transcriber application, including the advanced features from Phase 3.

## Prerequisites

Before installing the Python packages, you need to set up some prerequisites:

### 1. Python 3.8 or higher

Make sure you have Python 3.8 or higher installed. You can download it from [python.org](https://www.python.org/downloads/).

### 2. FFmpeg

FFmpeg is required for audio processing:

- **Windows**:
  - Download from [gyan.dev/ffmpeg/builds](https://www.gyan.dev/ffmpeg/builds/)
  - Extract the ZIP file
  - Add the `bin` folder to your system PATH
- **macOS**:

  ```bash
  brew install ffmpeg
  ```

- **Linux**:

  ```bash
  sudo apt update
  sudo apt install ffmpeg
  ```

### 3. Visual C++ Build Tools (Windows only)

Some packages, such as `tokenizers`, require C++ build tools:

1. Download and install [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
2. During installation, select "Desktop development with C++"

## Installation Steps

### 1. Create a Virtual Environment (Recommended)

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# Windows
venv\Scripts\activate

# macOS/Linux
source venv/bin/activate
```

### 2. Install PyTorch

For better performance, install PyTorch with CUDA support if you have an NVIDIA GPU:

```bash
# Windows/Linux with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# macOS or CPU-only
pip install torch torchvision torchaudio
```

### 3. Install Dependencies

```bash
# Install all dependencies from requirements.txt
pip install -r requirements.txt
```

### 4. Troubleshooting Common Issues

#### Tokenizers Installation Issues

If you encounter issues with the `tokenizers` installation:

1. Make sure you have Visual C++ Build Tools installed (Windows)
2. Try installing Rust: [rustup.rs](https://rustup.rs/)
3.
Install `tokenizers` separately:

   ```bash
   pip install tokenizers --no-binary tokenizers
   ```

#### PyAnnote.Audio Access

To use speaker diarization, you need a HuggingFace token with access to the pyannote models:

1. Create an account on [HuggingFace](https://huggingface.co/)
2. Generate an access token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
3. Request access to [pyannote/speaker-diarization-3.0](https://huggingface.co/pyannote/speaker-diarization-3.0)
4. Set the token in the application when prompted, or as an environment variable:

   ```bash
   # Windows
   set HF_TOKEN=your_token_here

   # macOS/Linux
   export HF_TOKEN=your_token_here
   ```

#### Memory Issues with Large Files

If you encounter memory issues with large files:

1. Use a smaller Whisper model (e.g., "base" instead of "large")
2. Reduce the GPU memory fraction in the application settings
3. Increase your system's swap space/virtual memory

## Running the Application

After installation, run the application with:

```bash
streamlit run app.py
```

## Optional: Ollama Setup for Local Summarization

To use Ollama for local summarization:

1. Install Ollama from [ollama.ai](https://ollama.ai/)
2. Pull a model:

   ```bash
   ollama pull llama3
   ```

3. Uncomment the Ollama line in requirements.txt and install:

   ```bash
   pip install ollama
   ```

## Verifying Installation

To verify that all components are working correctly:

1. Run the application
2. Check that GPU acceleration is available (if applicable)
3. Test a small video file with basic transcription
4. Gradually enable advanced features like diarization and translation

If you encounter any issues, check the application logs for specific error messages.
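The pre-flight checks above can also be scripted. The sketch below is a hypothetical helper (not part of the application) that uses only the Python standard library to report whether the key dependencies are visible — it checks for FFmpeg on the PATH and for installed packages without actually importing the heavy libraries:

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which prerequisites are visible, without importing them."""
    results = {}
    # FFmpeg must be on the system PATH for audio processing
    results["ffmpeg"] = shutil.which("ffmpeg") is not None
    # Core Python packages expected after `pip install -r requirements.txt`
    for pkg in ("torch", "streamlit", "tokenizers"):
        results[pkg] = importlib.util.find_spec(pkg) is not None
    return results


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'OK' if found else 'MISSING'}")
```

Using `find_spec` rather than `import` keeps the check fast and avoids loading PyTorch just to confirm it is installed; any entry reported as `MISSING` points back to the corresponding installation step above.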