# Installation Guide for OBS Recording Transcriber
This guide will help you install all the necessary dependencies for the OBS Recording Transcriber application, including the advanced features from Phase 3.
## Prerequisites
Before installing the Python packages, you need to set up some prerequisites:
### 1. Python 3.8 or higher
Make sure you have Python 3.8 or higher installed. You can download it from [python.org](https://www.python.org/downloads/).
### 2. FFmpeg
FFmpeg is required for audio processing; install it for your platform (a quick check that it is on your PATH follows these instructions):
- **Windows**:
  - Download from [gyan.dev/ffmpeg/builds](https://www.gyan.dev/ffmpeg/builds/)
  - Extract the ZIP file
  - Add the `bin` folder to your system PATH
- **macOS**:
  ```bash
  brew install ffmpeg
  ```
- **Linux**:
  ```bash
  sudo apt update
  sudo apt install ffmpeg
  ```
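Once FFmpeg is installed, you can confirm it is reachable before moving on. This is a minimal Python check, not part of the application itself:

```python
import shutil
import subprocess

# Confirm the ffmpeg executable is on the system PATH.
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg was not found on PATH; revisit the steps above.")
else:
    # Print the installed FFmpeg version.
    subprocess.run(["ffmpeg", "-version"], check=True)
```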
### 3. Visual C++ Build Tools (Windows only)
Some packages like `tokenizers` require C++ build tools:
1. Download and install [Visual C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
2. During installation, select "Desktop development with C++"
## Installation Steps
### 1. Create a Virtual Environment (Recommended)
```bash
# Create a virtual environment
python -m venv venv
# Activate the virtual environment
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
```
### 2. Install PyTorch
For better performance, install PyTorch with CUDA support if you have an NVIDIA GPU:
```bash
# Windows/Linux with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# macOS or CPU-only
pip install torch torchvision torchaudio
```
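After installing PyTorch, you can quickly verify whether CUDA acceleration is available. This is a generic PyTorch check, independent of the application:

```python
import torch

# Report the installed PyTorch version and whether a CUDA GPU is visible.
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # Name of the first visible GPU.
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```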
### 3. Install Dependencies
```bash
# Install all dependencies from requirements.txt
pip install -r requirements.txt
```
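As a quick sanity check after the install finishes, you can try importing the main libraries. The exact package set depends on requirements.txt; the names below (streamlit, whisper, pyannote.audio) are assumptions based on the features described in this guide:

```python
# Quick import check; adjust the names to match requirements.txt.
for name in ("streamlit", "torch", "whisper", "pyannote.audio"):
    try:
        __import__(name)
        print(f"OK: {name}")
    except ImportError as exc:
        print(f"MISSING: {name} ({exc})")
```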
### 4. Troubleshooting Common Issues
#### Tokenizers Installation Issues
If you encounter issues with `tokenizers` installation:
1. Make sure you have Visual C++ Build Tools installed (Windows)
2. Try installing Rust: [rustup.rs](https://rustup.rs/)
3. Install `tokenizers` from source (this requires the Rust toolchain from step 2):
```bash
pip install tokenizers --no-binary tokenizers
```
#### PyAnnote.Audio Access
To use speaker diarization, you need a HuggingFace token with access to the pyannote models:
1. Create an account on [HuggingFace](https://huggingface.co/)
2. Generate an access token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
3. Request access to [pyannote/speaker-diarization-3.0](https://huggingface.co/pyannote/speaker-diarization-3.0)
4. Set the token in the application when prompted or as an environment variable (a short verification sketch follows):
```bash
# Windows
set HF_TOKEN=your_token_here
# macOS/Linux
export HF_TOKEN=your_token_here
```
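To confirm that your token actually grants access to the gated model, you can try loading the diarization pipeline directly. This is a minimal sketch assuming the pyannote.audio 3.x API; the application may load the pipeline differently:

```python
import os

from pyannote.audio import Pipeline

# Read the token from the environment variable set above.
hf_token = os.environ.get("HF_TOKEN")

# Loading the gated model fails if your HuggingFace account
# has not been granted access to it.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.0",
    use_auth_token=hf_token,
)
print("Diarization pipeline loaded successfully.")
```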
#### Memory Issues with Large Files
If you encounter memory issues with large files, try the following (a short sketch of the first two options appears after this list):
1. Use a smaller Whisper model (e.g., "base" instead of "large")
2. Reduce the GPU memory fraction in the application settings
3. Increase your system's swap space/virtual memory
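A minimal sketch of options 1 and 2 using openai-whisper and PyTorch directly; the model size, memory fraction, and file name are illustrative, and the application exposes its own settings for these:

```python
import torch
import whisper

# Option 2: cap the fraction of GPU memory this process may use
# (only meaningful when CUDA is available).
if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.5)

# Option 1: load a smaller Whisper model, e.g. "base" instead of "large".
model = whisper.load_model("base")
result = model.transcribe("recording.mp4")  # hypothetical input file
print(result["text"])
```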
## Running the Application
After installation, run the application with:
```bash
streamlit run app.py
```
## Optional: Ollama Setup for Local Summarization
To use Ollama for local summarization (a short usage sketch follows these steps):
1. Install Ollama from [ollama.ai](https://ollama.ai/)
2. Pull a model:
```bash
ollama pull llama3
```
3. Uncomment the Ollama line in requirements.txt and install:
```bash
pip install ollama
```
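To confirm the local model responds before using it from the application, here is a minimal sketch with the `ollama` Python client (the prompt is only an example, and the Ollama service must be running):

```python
import ollama

# Ask the locally pulled llama3 model to summarize a short excerpt.
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "user", "content": "Summarize in one sentence: <transcript excerpt>"},
    ],
)
print(response["message"]["content"])
```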
## Verifying Installation
To verify that all components are working correctly:
1. Run the application
2. Check that GPU acceleration is available (if applicable)
3. Test a small video file with basic transcription
4. Gradually enable advanced features like diarization and translation
If you encounter any issues, check the application logs for specific error messages.