# Image Labeling Guide for Knot Detection

## Quick Start: Label Studio (Recommended)

### 1. Install and Launch
```bash
# Install (outside your project venv is fine)
pip install label-studio

# Start the server
label-studio start
```

Open http://localhost:8080 in your browser.

### 2. Create Your Project
1. Click "Create Project"
2. Project Name: "Wood Knot Detection"
3. Data Import: Click "Upload Files" → select your wood images
4. Labeling Setup:
   - Template: "Object Detection with Bounding Boxes"
   - Add label: `knot` (or multiple types: `sound_knot`, `dead_knot`, etc.)

### 3. Label Images
**Keyboard Shortcuts (speeds up 3-5x):**
- `Alt + Click` = Create bounding box
- `Alt + R` = Select rectangle tool
- `Ctrl + Enter` = Submit and move to next
- `Ctrl + Z` = Undo

**Best Practices:**
- Draw boxes tight around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality

### 4. Export to COCO Format
1. Click project name → **Export**
2. Format: **"COCO"**
3. Download the zip file
4. Extract and organize for RF-DETR:
   ```bash
   # Extract the downloaded zip (its real name will differ; adjust export.zip)
   unzip export.zip -d exported/
   
   # Organize into RF-DETR format
   mkdir -p dataset/train dataset/valid dataset/test
   
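   # Move images and rename JSON. Afterwards, check that each "file_name" in the
   # JSON matches the image's path relative to dataset/train/ (strip any folder prefix)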
   mv exported/images/* dataset/train/
   mv exported/result.json dataset/train/_annotations.coco.json
   
   # Split 80/10/10 manually or use a script
   # Move ~10% of images + their annotations to valid/
   # Move ~10% to test/
   ```

**Tip**: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The `validate_coco_dataset.py` script can help verify the structure.
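
If you'd rather script the split, here is a minimal sketch using only the Python standard library. It is not one of this repo's scripts: `split_coco` is a hypothetical helper. It assumes a single COCO JSON plus an image folder, i.e. the freshly extracted `exported/` folder *before* running the `mv` commands. It shuffles images with a fixed seed, copies each image into its split folder, writes `<split>/_annotations.coco.json` with only that split's annotations, and reduces `file_name` to the bare file name in case the export prefixes it with a folder. Check the result with `validate_coco_dataset.py`.

```python
import json
import random
import shutil
from pathlib import Path


def split_coco(ann_path, images_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split one COCO JSON into train/valid/test folders (hypothetical helper)."""
    coco = json.loads(Path(ann_path).read_text())
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed -> same split every run

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    splits = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # remainder goes to test
    }

    counts = {}
    for name, split_images in splits.items():
        split_dir = Path(out_dir) / name
        split_dir.mkdir(parents=True, exist_ok=True)
        ids = {img["id"] for img in split_images}
        new_images = []
        for img in split_images:
            fname = Path(img["file_name"]).name  # drop any folder prefix
            shutil.copy2(Path(images_dir) / fname, split_dir / fname)
            new_images.append({**img, "file_name": fname})
        anns = [a for a in coco["annotations"] if a["image_id"] in ids]
        out = {**coco, "images": new_images, "annotations": anns}
        (split_dir / "_annotations.coco.json").write_text(json.dumps(out))
        counts[name] = len(new_images)
    return counts
```

For example: `split_coco("exported/result.json", "exported/images", "dataset")`. Splitting by image rather than by annotation keeps every box from one board in the same split, so validation images never leak into training.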

---

## Alternative: CVAT (More Powerful)

### Setup with Docker
```bash
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d
```

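Open http://localhost:8080. CVAT has no default login; create an admin account first with `docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'`.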

### Features
- **Keyboard shortcuts**: `N` (new box), `Shift+arrows` (adjust box)
- **Interpolation**: Auto-label between frames (for video)
- **Team mode**: Multiple annotators on same project
- **Quality control**: Review mode for double-checking labels

### Export
1. Actions → Export task dataset
2. Format: "COCO 1.0"
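3. Restructure the files to match RF-DETR's expected layout (each `dataset/<split>/` folder holds its images plus `_annotations.coco.json`). Tick "Save images" during export if you need the image files; the annotations are typically in `annotations/instances_default.json`.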

---

## Alternative: labelImg (Desktop App)

### Quick Setup
```bash
pip install labelImg
labelImg /path/to/images
```

**Pros:**
- No web server needed
- Works offline
- Very simple interface

**Cons:**
- Exports Pascal VOC by default (not COCO)
- Needs a conversion step: use Roboflow or the pylabel library to convert VOC → COCO (see [Converting Other Formats to COCO](#converting-other-formats-to-coco))

---

## Model-Assisted Labeling Workflow

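After you have a trained model, let it pre-label new images so you only fix its boxes instead of drawing every one, which is much faster than labeling from scratch. Run the commands below from the project root (the project's virtualenv is in `.venv/`).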

### Step 1: Auto-label new images
```bash
.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3
```

**Use a low threshold (0.3)** to capture more candidates: it is easier to delete false positives than to add missed knots.
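
If you later want a stricter cutoff, you don't necessarily have to re-run the model. A sketch, assuming `predictions.json` is COCO-style and each annotation carries a `score` field (check what `auto_label_images.py` actually writes). `filter_by_score` is a hypothetical helper, and annotations without a score are kept:

```python
import json
from pathlib import Path


def filter_by_score(pred_path, out_path, min_score):
    """Write a copy of a COCO-style predictions file keeping boxes with score >= min_score."""
    coco = json.loads(Path(pred_path).read_text())
    # Annotations without a "score" field are treated as certain and kept.
    coco["annotations"] = [
        a for a in coco["annotations"] if a.get("score", 1.0) >= min_score
    ]
    Path(out_path).write_text(json.dumps(coco))
    return len(coco["annotations"])
```

For example, `filter_by_score("predictions.json", "predictions_0.5.json", 0.5)`, then pass the new file to `--coco-json` in the next step.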

### Step 2: Convert to Label Studio format
```bash
.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
```
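
For reference, the core of a COCO → Label Studio conversion is a coordinate change: COCO boxes are `[x, y, width, height]` in pixels from the top-left corner, while Label Studio rectangles use percentages of the image size. The sketch below illustrates that idea; it is not the code in `convert_to_label_studio.py`. It assumes the default "Object Detection with Bounding Boxes" labeling config (a `RectangleLabels` tag named `label` and an `Image` tag named `image`), COCO images that include `width`/`height`, files served through local-file storage under a hypothetical `unlabeled_images/` path relative to the storage document root, and a made-up `model_version` string.

```python
from pathlib import Path


def coco_to_ls_tasks(coco, image_prefix="/data/local-files/?d=unlabeled_images/",
                     from_name="label", to_name="image"):
    """Build Label Studio tasks with pre-annotations from a COCO dict (sketch)."""
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)

    tasks = []
    for img in coco["images"]:
        w, h = img["width"], img["height"]
        results = []
        for ann in anns_by_image.get(img["id"], []):
            x, y, bw, bh = ann["bbox"]  # COCO: pixels, top-left corner
            result = {
                "from_name": from_name,
                "to_name": to_name,
                "type": "rectanglelabels",
                "original_width": w,
                "original_height": h,
                "value": {  # Label Studio: percent of image width/height
                    "x": 100.0 * x / w,
                    "y": 100.0 * y / h,
                    "width": 100.0 * bw / w,
                    "height": 100.0 * bh / h,
                    "rotation": 0,
                    "rectanglelabels": [cat_names[ann["category_id"]]],
                },
            }
            if "score" in ann:
                result["score"] = ann["score"]
            results.append(result)
        tasks.append({
            "data": {"image": image_prefix + Path(img["file_name"]).name},
            "predictions": [{"model_version": "rfdetr-sketch", "result": results}],
        })
    return tasks
```

Write the list with `json.dump(tasks, f)` and import that file. If you renamed the tags in your labeling config, pass the matching `from_name`/`to_name`.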

### Step 3: Import predictions into Label Studio
1. In Label Studio, open your project
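2. **Settings → Cloud Storage → Add Source Storage**:
   - Storage Type: Local files
   - Absolute local path: `/full/path/to/unlabeled_images`
   - Click "Add Storage" then "Sync Storage"
   - Local-file storage requires Label Studio to be started with `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true` and `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` set to a parent directory of your images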
3. **Import → Upload Files** → select `label_studio_tasks.json`
4. Each image now loads with **pre-drawn boxes from your model**
5. Click through images, fixing mistakes (much faster than labeling from scratch!)

### Step 4: Active Learning Loop with Label Studio
1. **Initial labeling**: Label 50-100 images manually in Label Studio
2. **Export & prepare**: export from Label Studio in COCO format and split into train/valid/test folders (see [4. Export to COCO Format](#4-export-to-coco-format))
3. **Train RF-DETR**: Run for just 10 epochs (faster iteration)
   ```bash
   .venv/bin/python train_rfdetr.py \
     --dataset-dir dataset/ \
     --output-dir runs/iteration_1 \
     --model medium \
     --epochs 10
   ```
4. **Auto-label new batch**: Get predictions on 500 unlabeled images
   ```bash
   .venv/bin/python auto_label_images.py \
     --weights runs/iteration_1/checkpoint_best_total.pth \
     --images-dir batch_2_images/ \
     --output-json batch_2_predictions.json \
     --threshold 0.3
   
   .venv/bin/python convert_to_label_studio.py \
     --coco-json batch_2_predictions.json \
     --images-dir batch_2_images/ \
     --output-json batch_2_ls_tasks.json
   ```
5. **Review in Label Studio**: Import `batch_2_ls_tasks.json` → review/correct (much faster than labeling from scratch)
6. **Export & retrain**: Export the corrected labels, merge them into your existing train/valid/test splits, and retrain for 20-50 epochs
7. **Repeat**: Continue with batch 3, 4, etc.

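This iterative approach can cut manual effort substantially, often several-fold, because each round you correct the model's boxes instead of drawing them all. How much you save depends on how well the model already handles your boards.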
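
Retraining means adding the corrected batch to an existing split. COCO ids must stay unique within a file, so the batch's image and annotation ids need renumbering. A minimal sketch: `merge_coco` is a hypothetical helper that assumes both files use identical categories (e.g. just `knot`) and that you copy the batch's images into the split folder yourself. It does not check for duplicate file names.

```python
def merge_coco(base, extra):
    """Return a new COCO dict with `extra` appended to `base`, renumbering ids.

    Assumes both datasets use identical category ids and names.
    """
    base_cats = {(c["id"], c["name"]) for c in base["categories"]}
    extra_cats = {(c["id"], c["name"]) for c in extra["categories"]}
    if base_cats != extra_cats:
        raise ValueError("category lists differ; remap categories first")

    # Start new ids after the largest ids already used in `base`.
    next_img = max((i["id"] for i in base["images"]), default=0) + 1
    next_ann = max((a["id"] for a in base["annotations"]), default=0) + 1

    img_id_map = {}
    images = list(base["images"])
    for img in extra["images"]:
        img_id_map[img["id"]] = next_img
        images.append({**img, "id": next_img})
        next_img += 1

    annotations = list(base["annotations"])
    for ann in extra["annotations"]:
        annotations.append(
            {**ann, "id": next_ann, "image_id": img_id_map[ann["image_id"]]}
        )
        next_ann += 1

    # Copies other top-level keys (e.g. "info") from `base` unchanged.
    return {**base, "images": images, "annotations": annotations}
```

Load both JSON files with `json.load`, merge, and write the result back to `dataset/train/_annotations.coco.json` (keep a backup of the old file).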

---

## Tips for High-Quality Labels

### Consistency is Key
- **Same criteria every time**: Decide upfront if you label tiny knots, damaged areas, etc.
- **Box boundaries**: Tight around knot, or include some margin? Pick one and stick to it.
- **Occlusions**: Label partially visible knots? Document your decision.

### Speed vs. Quality
- **First 100 images**: Take your time, establish consistency
- **After 100**: Speed up; the model will help catch inconsistencies later
- **Every 500 images**: Audit 20-30 random labels to check quality
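
For the periodic audit, a reproducible random sample keeps you from re-checking only the images you remember. A sketch with a hypothetical `sample_for_audit` helper: it samples images rather than individual boxes and reports each image's box count, so images with zero boxes (possible missed knots) stand out. The default of 25 is just a value in the 20-30 range above.

```python
import json
import random
from pathlib import Path


def sample_for_audit(ann_path, k=25, seed=None):
    """Return up to k random (file_name, box_count) pairs from a COCO JSON."""
    coco = json.loads(Path(ann_path).read_text())
    box_counts = {img["id"]: 0 for img in coco["images"]}
    for ann in coco["annotations"]:
        if ann["image_id"] in box_counts:
            box_counts[ann["image_id"]] += 1
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    # Sorting the ids first makes the sample depend only on the seed.
    chosen = random.Random(seed).sample(sorted(names), min(k, len(names)))
    return [(names[i], box_counts[i]) for i in chosen]
```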

### Common Mistakes
1. ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
2. ❌ Missing small knots in some images but labeling them in others
3. ❌ Labeling knot-like wood grain patterns
4. ❌ Fatigue errors after 2+ hours: take breaks!

### Dataset Size Guidelines
- **Minimum**: 200 labeled images (split: 150 train, 30 valid, 20 test)
- **Good**: 500-1000 images
- **Excellent**: 2000+ images
- **With active learning**: Start with 100, grow to 500+ iteratively
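
To see where you stand against these numbers, here is a short sketch that counts images and boxes per split. It assumes the `dataset/<split>/_annotations.coco.json` layout used earlier in this guide; `dataset_summary` is a hypothetical helper.

```python
import json
from pathlib import Path


def dataset_summary(dataset_dir):
    """Map each existing split name to (image_count, box_count)."""
    summary = {}
    for split in ("train", "valid", "test"):
        ann_file = Path(dataset_dir) / split / "_annotations.coco.json"
        if not ann_file.exists():
            continue  # skip splits that haven't been created yet
        coco = json.loads(ann_file.read_text())
        summary[split] = (len(coco["images"]), len(coco["annotations"]))
    return summary
```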

---

## Converting Other Formats to COCO

If you have labels in another format:

### From YOLO format:
```python
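from pathlib import Path

from pylabel import importer  # pip install pylabel

dataset = importer.ImportYoloV5(
    path="yolo_labels/",          # folder with YOLO .txt label files
    path_to_images="../images/",  # image folder, relative to the labels folder
    cat_names=["knot"],           # class names in YOLO class-index order
)
Path("coco_format").mkdir(exist_ok=True)
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")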
```

### From Pascal VOC:
```python
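from pathlib import Path

from pylabel import importer  # pip install pylabel

# If the images live elsewhere, also pass path_to_images (relative to the annotations folder)
dataset = importer.ImportVOC(path="voc_annotations/")
Path("coco_format").mkdir(exist_ok=True)
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")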
```

---

## Troubleshooting

**Label Studio won't start:**
- Try: `label-studio reset` then `label-studio start`

**CVAT Docker issues:**
- Check: `docker compose logs`
- Ensure ports 8080, 8070 are free

**Export format doesn't match RF-DETR:**
- See [validate_coco_dataset.py](validate_coco_dataset.py) to check your format
- Your JSON needs: `images`, `annotations`, `categories` keys
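
For a quick check of those keys and the most common cross-reference errors, a small sketch follows. `coco_problems` is a hypothetical helper and not a replacement for `validate_coco_dataset.py`, which may check more:

```python
def coco_problems(coco):
    """Return a list of human-readable problems found in a COCO dict."""
    required = ("images", "annotations", "categories")
    problems = [f"missing key: {k}" for k in required if k not in coco]
    if problems:
        return problems  # can't cross-check without all three lists

    image_ids = {img["id"] for img in coco["images"]}
    cat_ids = {c["id"] for c in coco["categories"]}
    for a in coco["annotations"]:
        if a["image_id"] not in image_ids:
            problems.append(f"annotation {a['id']}: unknown image_id {a['image_id']}")
        if a["category_id"] not in cat_ids:
            problems.append(f"annotation {a['id']}: unknown category_id {a['category_id']}")
        if len(a.get("bbox", [])) != 4:
            problems.append(f"annotation {a['id']}: bbox must be [x, y, width, height]")
    return problems
```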

**Need help?**
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/