# Image Labeling Guide for Knot Detection

## Quick Start: Label Studio (Recommended)

### 1. Install and Launch
```bash
# Install (a separate virtual environment from your project's is fine)
pip install label-studio

# Start the server
label-studio start
```

Open http://localhost:8080 in your browser.

### 2. Create Your Project
1. Click "Create Project"
2. Project Name: "Wood Knot Detection"
3. Data Import: Click "Upload Files" → select your wood images
4. Labeling Setup:
   - Template: "Object Detection with Bounding Boxes"
   - Add label: `knot` (or multiple types: `sound_knot`, `dead_knot`, etc.)

### 3. Label Images
**Keyboard Shortcuts** (a big time-saver; defaults can differ between Label Studio versions, so confirm them in the hotkey settings):
- `Alt + Click` = Create bounding box
- `Alt + R` = Select rectangle tool
- `Ctrl + Enter` = Submit and move to next
- `Ctrl + Z` = Undo

**Best Practices:**
- Draw boxes tight around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality

### 4. Export to COCO Format
1. Click project name → **Export**
2. Format: **"COCO"**
3. Download the zip file
4. Extract and organize for RF-DETR:
   ```bash
   # After extracting export.zip:
   unzip export.zip -d exported/
   
   # Organize into RF-DETR format
   mkdir -p dataset/train dataset/valid dataset/test
   
   # Move images and rename JSON
   mv exported/images/* dataset/train/
   mv exported/result.json dataset/train/_annotations.coco.json
   
   # Split 80/10/10 manually or use a script
   # Move ~10% of images + their annotations to valid/
   # Move ~10% to test/
   ```

**Tip**: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The `validate_coco_dataset.py` script can help verify the structure.
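The 80/10/10 split described above can be sketched in Python. This is a minimal illustration, not the project's own tooling: the function names `split_coco` and `write_splits` are hypothetical, and the code assumes the export contains `result.json` plus an `images/` folder as in the commands above. It also assumes `file_name` entries may carry an `images/` prefix, which it flattens so each split folder is self-contained.

```python
import json
import random
import shutil
from pathlib import Path


def split_coco(coco, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split one COCO dict into train/valid/test dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    parts = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # remainder goes to test
    }

    splits = {}
    for name, imgs in parts.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            # Keep only the annotations that belong to this split's images
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits


def write_splits(export_dir, dataset_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Copy images and write _annotations.coco.json into each split folder."""
    export_dir, dataset_dir = Path(export_dir), Path(dataset_dir)
    coco = json.loads((export_dir / "result.json").read_text(encoding="utf-8"))

    for name, part in split_coco(coco, ratios, seed).items():
        out_dir = dataset_dir / name
        out_dir.mkdir(parents=True, exist_ok=True)
        for img in part["images"]:
            src = export_dir / img["file_name"]
            if not src.exists():
                # Assumption: fall back to the flat images/ folder of the export
                src = export_dir / "images" / Path(img["file_name"]).name
            img["file_name"] = src.name  # store a bare file name in the JSON
            shutil.copy2(src, out_dir / src.name)
        (out_dir / "_annotations.coco.json").write_text(
            json.dumps(part), encoding="utf-8"
        )


# Example (run against the extracted export):
# write_splits("exported", "dataset")
```

If you use this, run it against the extracted `exported/` folder in place of the manual `mv` commands, since it copies the images and writes one annotation file per split.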

---

## Alternative: CVAT (More Powerful)

### Setup with Docker
```bash
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d
```

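Open http://localhost:8080. CVAT does not ship with a default login; create an admin account first with `docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'`. Label Studio also defaults to port 8080, so stop it (or start it on another port) before running CVAT.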

### Features
- **Keyboard shortcuts**: `N` (new box), `Shift+arrows` (adjust box)
- **Interpolation**: Auto-label between frames (for video)
- **Team mode**: Multiple annotators on same project
- **Quality control**: Review mode for double-checking labels

### Export
1. Actions → Export task dataset
2. Format: "COCO 1.0"
3. Restructure files to match RF-DETR's expected format

---

## Alternative: labelImg (Desktop App)

### Quick Setup
```bash
pip install labelImg
labelImg /path/to/images
```

**Pros:**
- No web server needed
- Works offline
- Very simple interface

**Cons:**
- Exports Pascal VOC by default (not COCO)
- Need to convert format:
  ```bash
  # Install a converter such as pylabel, then convert VOC → COCO
  # (see "Converting Other Formats to COCO" below)
  pip install pylabel
  ```

---

## Model-Assisted Labeling Workflow

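Once you have a trained model, let it pre-label new images so that you only correct its predictions. This can cut labeling time substantially (a rough estimate is 5-10x):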

### Step 1: Auto-label new images
```bash
# Run from the project root so the project venv's interpreter is used
.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3
```

**Use a low threshold (0.3)** to capture more candidates: deleting a false positive is quicker than drawing a missed knot from scratch.
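To see how the threshold changes the number of candidate boxes, here is a small sketch. The prediction format (a list of dicts with a `score` key and a COCO-style `bbox`) and the sample values are assumptions made for illustration; the actual output of `auto_label_images.py` may differ.

```python
def filter_by_score(predictions, threshold=0.3):
    """Keep predictions whose confidence score is at least `threshold`."""
    return [p for p in predictions if p["score"] >= threshold]


# Hypothetical predictions for one image (bbox is [x, y, width, height])
predictions = [
    {"bbox": [10, 20, 50, 40], "score": 0.92},
    {"bbox": [200, 80, 30, 30], "score": 0.41},
    {"bbox": [400, 300, 25, 20], "score": 0.12},
]

for threshold in (0.3, 0.5):
    kept = filter_by_score(predictions, threshold)
    print(f"threshold={threshold}: {len(kept)} candidate boxes")
```

With these sample values, the lower threshold keeps the 0.41 box as a candidate for review, while 0.5 would silently drop it.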

### Step 2: Convert to Label Studio format
```bash
.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
```
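For orientation, the core of such a conversion can be sketched as follows. This is not the contents of `convert_to_label_studio.py`; it is an illustration under stated assumptions: the labeling config uses the tag names `label` (RectangleLabels) and `image` (Image), the COCO image entries include `width` and `height`, and images are served through Label Studio's local-files URL scheme. The function names are hypothetical. The key step is that COCO boxes are in pixels while Label Studio expects percentages of the image size.

```python
def coco_box_to_ls_result(bbox, img_w, img_h, label, score=None):
    """Convert a COCO [x, y, w, h] pixel box to a Label Studio rectangle result."""
    x, y, w, h = bbox
    result = {
        "from_name": "label",  # assumption: name of the RectangleLabels tag
        "to_name": "image",    # assumption: name of the Image tag
        "type": "rectanglelabels",
        "original_width": img_w,
        "original_height": img_h,
        "value": {
            # Label Studio stores box geometry as percentages of image size
            "x": 100.0 * x / img_w,
            "y": 100.0 * y / img_h,
            "width": 100.0 * w / img_w,
            "height": 100.0 * h / img_h,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }
    if score is not None:
        result["score"] = score
    return result


def coco_to_ls_tasks(coco, url_prefix, model_version="rfdetr"):
    """Build one Label Studio task (with a prediction) per COCO image."""
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)

    tasks = []
    for img in coco["images"]:
        results = [
            coco_box_to_ls_result(
                a["bbox"], img["width"], img["height"],
                cat_names[a["category_id"]], a.get("score"),
            )
            for a in anns_by_image.get(img["id"], [])
        ]
        tasks.append({
            "data": {"image": url_prefix + img["file_name"]},
            "predictions": [{"model_version": model_version, "result": results}],
        })
    return tasks


# Example prefix (assumption: path relative to the local-files document root)
# tasks = coco_to_ls_tasks(coco, "/data/local-files/?d=unlabeled_images/")
```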

### Step 3: Import predictions into Label Studio
1. In Label Studio, open your project
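2. **Settings → Storage → Add Source Storage**:
   - Storage Type: Local files
   - Absolute local path: `/full/path/to/unlabeled_images`
   - Local file serving must be enabled before the server starts: set `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true` and point `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` at a parent directory of the images
   - Click "Add Storage". If your tasks JSON already references the images, skip "Sync Storage": syncing creates a bare task per image, which would duplicate the tasks imported in the next step
3. **Import → Upload Files** → select `label_studio_tasks.json`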
4. Each image now loads with **pre-drawn boxes from your model**
5. Click through images, fixing mistakes (much faster than labeling from scratch!)

### Step 4: Active Learning Loop with Label Studio
1. **Initial labeling**: Label 50-100 images manually in Label Studio
2. **Export & prepare**:
   ```bash
   # Export from Label Studio as COCO format
   # Split into train/valid/test folders
   ```
3. **Train RF-DETR**: Run for just 10 epochs (faster iteration)
   ```bash
   .venv/bin/python train_rfdetr.py \
     --dataset-dir dataset/ \
     --output-dir runs/iteration_1 \
     --model medium \
     --epochs 10
   ```
4. **Auto-label new batch**: Get predictions on 500 unlabeled images
   ```bash
   .venv/bin/python auto_label_images.py \
     --weights runs/iteration_1/checkpoint_best_total.pth \
     --images-dir batch_2_images/ \
     --output-json batch_2_predictions.json \
     --threshold 0.3
   
   .venv/bin/python convert_to_label_studio.py \
     --coco-json batch_2_predictions.json \
     --images-dir batch_2_images/ \
     --output-json batch_2_ls_tasks.json
   ```
5. **Review in Label Studio**: Import `batch_2_ls_tasks.json` → review/correct (much faster than labeling from scratch)
6. **Export & retrain**: Add corrected labels to dataset, retrain for 20-50 epochs
7. **Repeat**: Continue with batch 3, 4, etc.

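This iterative approach can reach high detection quality with roughly **5-10x less manual effort** than labeling everything from scratch. Treat these figures as rough estimates and measure actual performance on your own held-out test set.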
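The "add corrected labels to dataset" step amounts to merging a newly exported COCO file into the existing training annotations. A minimal sketch, assuming both files are standard COCO detection JSON loaded as dicts; `merge_coco` is a hypothetical helper, not part of the project or of any library. IDs are reassigned so the two files cannot collide, categories are matched by name, and images whose `file_name` already exists in the base set are skipped.

```python
def merge_coco(base, new):
    """Return a new COCO dict with images/annotations from `new` added to `base`."""
    merged = {
        "images": list(base["images"]),
        "annotations": list(base["annotations"]),
        "categories": list(base["categories"]),
    }

    # Match categories by name; add any that the base set does not have yet
    cat_by_name = {c["name"]: c["id"] for c in merged["categories"]}
    next_cat = max(cat_by_name.values(), default=-1) + 1
    cat_map = {}
    for cat in new["categories"]:
        if cat["name"] not in cat_by_name:
            cat_by_name[cat["name"]] = next_cat
            merged["categories"].append({**cat, "id": next_cat})
            next_cat += 1
        cat_map[cat["id"]] = cat_by_name[cat["name"]]

    next_img = max((i["id"] for i in merged["images"]), default=-1) + 1
    next_ann = max((a["id"] for a in merged["annotations"]), default=-1) + 1
    existing_files = {i["file_name"] for i in merged["images"]}

    img_map = {}
    for img in new["images"]:
        if img["file_name"] in existing_files:
            continue  # already in the dataset; skip to avoid duplicates
        img_map[img["id"]] = next_img
        merged["images"].append({**img, "id": next_img})
        next_img += 1

    for ann in new["annotations"]:
        if ann["image_id"] not in img_map:
            continue  # annotation belongs to a skipped image
        merged["annotations"].append({
            **ann,
            "id": next_ann,
            "image_id": img_map[ann["image_id"]],
            "category_id": cat_map[ann["category_id"]],
        })
        next_ann += 1

    return merged
```

Remember to copy the new image files into the same split folder as the merged `_annotations.coco.json`.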

---

## Tips for High-Quality Labels

### Consistency is Key
- **Same criteria every time**: Decide upfront if you label tiny knots, damaged areas, etc.
- **Box boundaries**: Tight around knot, or include some margin? Pick one and stick to it.
- **Occlusions**: Label partially visible knots? Document your decision.

### Speed vs. Quality
- **First 100 images**: Take your time, establish consistency
- **After 100**: Speed up; the model will help catch inconsistencies later
- **Every 500 images**: Audit 20-30 random labels to check quality
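Picking the audit sample at random, rather than by eye, avoids re-checking only the images you remember. A minimal sketch, assuming your labels are in a COCO dict; `sample_for_audit` is a hypothetical helper name and the default of 25 simply falls inside the 20-30 range suggested above.

```python
import random


def sample_for_audit(coco, k=25, seed=None):
    """Pick up to k random images and return (file_name, annotations) pairs."""
    rng = random.Random(seed)
    images = rng.sample(coco["images"], min(k, len(coco["images"])))

    # Group annotations by image so each sampled image lists its boxes
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)

    return [(img["file_name"], anns_by_image.get(img["id"], [])) for img in images]


# Example: print the files to re-open in your labeling tool
# for file_name, anns in sample_for_audit(coco, k=25):
#     print(file_name, len(anns), "boxes")
```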

### Common Mistakes
1. ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
2. ❌ Missing small knots in some images but labeling them in others
3. ❌ Labeling knot-like wood grain patterns
4. ❌ Fatigue errors after 2+ hours - take breaks!

### Dataset Size Guidelines
- **Minimum**: 200 labeled images (80/10/10 split: 160 train, 20 valid, 20 test)
- **Good**: 500-1000 images
- **Excellent**: 2000+ images
- **With active learning**: Start with 100, grow to 500+ iteratively

---

## Converting Other Formats to COCO

If you have labels in another format:

### From YOLO format:
```python
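from pathlib import Path

from pylabel import importer

# Note: pylabel resolves img_path relative to the annotation folder (path),
# so adjust it to where your images sit relative to yolo_labels/
dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    img_path="images/",
    cat_names=["knot"],
)

# ExportToCoco writes a single JSON file, so pass a file path, not a folder
Path("coco_format").mkdir(exist_ok=True)
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")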
```

### From Pascal VOC:
```python
from pylabel import importer

dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()
```

---

## Troubleshooting

**Label Studio won't start:**
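- Check that port 8080 is free (CVAT uses it too), or start on another port: `label-studio start --port 8081`
- If your version provides it, try `label-studio reset` then `label-studio start` (run `label-studio --help` to see the available commands)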

**CVAT Docker issues:**
- Check: `docker compose logs`
- Ensure ports 8080, 8070 are free

**Export format doesn't match RF-DETR:**
- See [validate_coco_dataset.py](validate_coco_dataset.py) to check your format
- Your JSON needs: `images`, `annotations`, `categories` keys
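A quick structural check along these lines can catch the most common export problems before training. This is an illustrative sketch only, not the contents of `validate_coco_dataset.py`; `check_coco_structure` is a hypothetical name, and the checks cover just the three required keys, ID references, and box shape.

```python
def check_coco_structure(coco):
    """Return a list of problems found in a COCO detection dict (empty if none)."""
    problems = []
    for key in ("images", "annotations", "categories"):
        if not isinstance(coco.get(key), list):
            problems.append(f"missing or non-list key: {key}")
    if problems:
        return problems  # cannot check references without the basic keys

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox", [])
        # COCO boxes are [x, y, width, height] with positive width and height
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann_id}: invalid bbox {bbox}")
    return problems


# Example:
# import json
# with open("dataset/train/_annotations.coco.json") as f:
#     print(check_coco_structure(json.load(f)) or "structure looks OK")
```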

**Need help?**
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/