# Image Labeling Guide for Knot Detection
## Quick Start: Label Studio (Recommended)
### 1. Install and Launch
```bash
# Install (outside your project venv is fine)
pip install label-studio
# Start the server
label-studio start
```
Open http://localhost:8080 in your browser.
### 2. Create Your Project
1. Click "Create Project"
2. Project Name: "Wood Knot Detection"
3. Data Import: Click "Upload Files" → select your wood images
4. Labeling Setup:
   - Template: "Object Detection with Bounding Boxes"
   - Add label: `knot` (or multiple types: `sound_knot`, `dead_knot`, etc.)
### 3. Label Images
**Keyboard Shortcuts (speeds up 3-5x):**
- `Alt + Click` = Create bounding box
- `Alt + R` = Select rectangle tool
- `Ctrl + Enter` = Submit and move to next
- `Ctrl + Z` = Undo
**Best Practices:**
- Draw boxes tight around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality
### 4. Export to COCO Format
1. Click project name → **Export**
2. Format: **"COCO"**
3. Download the zip file
4. Extract and organize for RF-DETR:
```bash
# After extracting export.zip:
unzip export.zip -d exported/
# Organize into RF-DETR format
mkdir -p dataset/train dataset/valid dataset/test
# Move images and rename JSON
mv exported/images/* dataset/train/
mv exported/result.json dataset/train/_annotations.coco.json
# Split 80/10/10 manually or use a script
# Move ~10% of images + their annotations to valid/
# Move ~10% to test/
```
**Tip**: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The `validate_coco_dataset.py` script can help verify the structure.
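The manual split described in the tip can be scripted. The following is a minimal sketch, not part of the repository: `split_coco` is a hypothetical helper that shuffles the images with a fixed seed and partitions images and their annotations 80/10/10. It assumes the export follows the standard COCO layout (`images`, `annotations`, `categories`).

```python
import random


def split_coco(coco, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split one COCO dict into train/valid/test dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    chunks = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # whatever remains goes to test
    }

    splits = {}
    for name, imgs in chunks.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            # keep only the annotations whose image landed in this split
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits
```

Each returned dict can then be written with `json.dump` to `dataset/<split>/_annotations.coco.json`, with the matching image files moved into the same folder.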
---
## Alternative: CVAT (More Powerful)
### Setup with Docker
```bash
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d
```
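Open http://localhost:8080. CVAT has no default account, so create an admin user first with `docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'`.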
### Features
- **Keyboard shortcuts**: `N` (new box), `Shift+arrows` (adjust box)
- **Interpolation**: Auto-label between frames (for video)
- **Team mode**: Multiple annotators on same project
- **Quality control**: Review mode for double-checking labels
### Export
1. Actions → Export task dataset
2. Format: "COCO 1.0"
3. Restructure files to match RF-DETR's expected format
---
## Alternative: labelImg (Desktop App)
### Quick Setup
```bash
pip install labelImg
labelImg /path/to/images
```
**Pros:**
- No web server needed
- Works offline
- Very simple interface
**Cons:**
- Exports Pascal VOC by default (not COCO)
- Needs a VOC → COCO conversion afterwards, e.g. with Roboflow or the pylabel library (see "Converting Other Formats to COCO")
---
## Model-Assisted Labeling Workflow
After you have a trained model, speed up labeling 10-20x:
### Step 1: Auto-label new images
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
--weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
--images-dir unlabeled_images/ \
--output-json predictions.json \
--threshold 0.3
```
**Use a low threshold (0.3)** to capture more candidates: deleting a false positive is easier than drawing a missed knot.
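To see what a given threshold costs or saves, count how many candidate boxes survive at each value. This is a small sketch under one assumption: that each prediction carries a COCO-style `score` field (the output format of `auto_label_images.py` is not shown here). `count_by_threshold` and the scores below are hypothetical.

```python
def count_by_threshold(annotations, thresholds=(0.3, 0.5, 0.7)):
    """Return {threshold: number of predictions with score >= threshold}."""
    return {
        t: sum(1 for a in annotations if a.get("score", 0.0) >= t)
        for t in thresholds
    }


# Made-up scores, for illustration only
preds = [{"score": s} for s in (0.92, 0.71, 0.48, 0.35, 0.31, 0.12)]
print(count_by_threshold(preds))  # {0.3: 5, 0.5: 2, 0.7: 2}
```

In this toy example, raising the threshold from 0.3 to 0.5 drops three candidates that a reviewer would otherwise have had the chance to confirm or delete.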
### Step 2: Convert to Label Studio format
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
--coco-json predictions.json \
--images-dir unlabeled_images/ \
--output-json label_studio_tasks.json
```
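For reference, the core of such a conversion is a change of units: COCO stores boxes as pixel `[x, y, width, height]`, while Label Studio pre-annotations use percentages of the image size. The sketch below is not the repository's `convert_to_label_studio.py`; `coco_to_ls_tasks` is a hypothetical function. It assumes the labeling config uses the names `label` and `image` (as in the bounding-box template), and the `model_version` string is a placeholder.

```python
def coco_to_ls_tasks(coco, image_url_prefix, from_name="label", to_name="image"):
    """Convert COCO boxes (pixels) into Label Studio pre-annotation tasks (percent)."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    boxes_by_image = {}
    for ann in coco["annotations"]:
        boxes_by_image.setdefault(ann["image_id"], []).append(ann)

    tasks = []
    for img in coco["images"]:
        w, h = img["width"], img["height"]
        results = []
        for ann in boxes_by_image.get(img["id"], []):
            x, y, bw, bh = ann["bbox"]  # COCO: top-left x, y, width, height in pixels
            results.append({
                "from_name": from_name,
                "to_name": to_name,
                "type": "rectanglelabels",
                "original_width": w,
                "original_height": h,
                "value": {
                    "x": 100.0 * x / w,
                    "y": 100.0 * y / h,
                    "width": 100.0 * bw / w,
                    "height": 100.0 * bh / h,
                    "rotation": 0,
                    "rectanglelabels": [categories[ann["category_id"]]],
                },
            })
        tasks.append({
            "data": {"image": image_url_prefix + img["file_name"]},
            "predictions": [{"model_version": "rfdetr", "result": results}],
        })
    return tasks
```

With local file storage, the prefix would typically look like `/data/local-files/?d=unlabeled_images/`, where the part after `?d=` is relative to the configured document root.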
### Step 3: Import predictions into Label Studio
1. In Label Studio, open your project
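2. **Settings → Storage → Add Source Storage**:
   - Storage Type: Local files
   - Absolute local path: `/full/path/to/unlabeled_images`
   - Click "Add Storage" then "Sync Storage"
   - Note: local file serving must be enabled before Label Studio starts, by setting `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true` and pointing `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` at a parent of this folder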
3. **Import → Upload Files** → select `label_studio_tasks.json`
4. Each image now loads with **pre-drawn boxes from your model**
5. Click through images, fixing mistakes (much faster than labeling from scratch!)
### Step 4: Active Learning Loop with Label Studio
1. **Initial labeling**: Label 50-100 images manually in Label Studio
2. **Export & prepare**:
```bash
# Export from Label Studio as COCO format
# Split into train/valid/test folders
```
3. **Train RF-DETR**: Run for just 10 epochs (faster iteration)
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python train_rfdetr.py \
--dataset-dir dataset/ \
--output-dir runs/iteration_1 \
--model medium \
--epochs 10
```
4. **Auto-label new batch**: Get predictions on 500 unlabeled images
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
--weights runs/iteration_1/checkpoint_best_total.pth \
--images-dir batch_2_images/ \
--output-json batch_2_predictions.json \
--threshold 0.3
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
--coco-json batch_2_predictions.json \
--images-dir batch_2_images/ \
--output-json batch_2_ls_tasks.json
```
5. **Review in Label Studio**: Import `batch_2_ls_tasks.json` → review/correct (10x faster than from scratch)
6. **Export & retrain**: Add corrected labels to dataset, retrain for 20-50 epochs
7. **Repeat**: Continue with batch 3, 4, etc.
This iterative approach can reach **95%+ accuracy with 5-10x less manual effort**, though the actual gain depends on how varied the images are and how consistent the labels stay.
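Adding a corrected batch to the existing dataset means merging two COCO files whose ids usually collide, since each export numbers its images and annotations from scratch. A minimal sketch, assuming both files use the same category names; `merge_coco` is a hypothetical helper, not a script in this repository:

```python
def merge_coco(base, new):
    """Append the images and annotations of `new` to `base`, re-numbering ids."""
    # Match categories by name so both files agree on what "knot" means.
    # A category that exists only in `new` raises KeyError below.
    name_to_id = {c["name"]: c["id"] for c in base["categories"]}
    new_cat_name = {c["id"]: c["name"] for c in new["categories"]}

    next_img = max((i["id"] for i in base["images"]), default=-1) + 1
    next_ann = max((a["id"] for a in base["annotations"]), default=-1) + 1

    merged = {
        "images": list(base["images"]),
        "annotations": list(base["annotations"]),
        "categories": list(base["categories"]),
    }
    img_id_map = {}
    for img in new["images"]:
        img_id_map[img["id"]] = next_img
        merged["images"].append({**img, "id": next_img})
        next_img += 1
    for ann in new["annotations"]:
        merged["annotations"].append({
            **ann,
            "id": next_ann,
            "image_id": img_id_map[ann["image_id"]],
            "category_id": name_to_id[new_cat_name[ann["category_id"]]],
        })
        next_ann += 1
    return merged
```

Merging into the training file only, and leaving `valid/` and `test/` fixed across iterations, keeps the metrics comparable from one round to the next.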
---
## Tips for High-Quality Labels
### Consistency is Key
- **Same criteria every time**: Decide upfront if you label tiny knots, damaged areas, etc.
- **Box boundaries**: Tight around knot, or include some margin? Pick one and stick to it.
- **Occlusions**: Label partially visible knots? Document your decision.
### Speed vs. Quality
- **First 100 images**: Take your time, establish consistency
- **After 100**: Speed up - model will help catch inconsistencies later
- **Every 500 images**: Audit 20-30 random labels to check quality
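The periodic audit is easy to automate up to the point of drawing the sample. A small sketch, assuming a standard COCO dict; `sample_for_audit` is a hypothetical helper:

```python
import random


def sample_for_audit(coco, k=25, seed=None):
    """Pick up to k random annotations and pair each with its image file name."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    anns = coco["annotations"]
    picked = random.Random(seed).sample(anns, min(k, len(anns)))
    return [(file_names[a["image_id"]], a["bbox"]) for a in picked]
```

Sampling annotations rather than images gives every labeled knot the same chance of being checked, however many knots a single board contains.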
### Common Mistakes
1. ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
2. ❌ Missing small knots in some images but labeling them in others
3. ❌ Labeling knot-like wood grain patterns
4. ❌ Fatigue errors after 2+ hours - take breaks!
### Dataset Size Guidelines
- **Minimum**: 200 labeled images (split: 150 train, 30 valid, 20 test)
- **Good**: 500-1000 images
- **Excellent**: 2000+ images
- **With active learning**: Start with 100, grow to 500+ iteratively
---
## Converting Other Formats to COCO
If you have labels in another format:
### From YOLO format:
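```python
from pylabel import importer

# path: folder containing the YOLO .txt label files
# path_to_images: image folder, given relative to `path`
# (assumes images/ sits next to yolo_labels/; adjust to your layout)
dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    path_to_images="../images/",
    cat_names=["knot"],
)
# output_path is the JSON file to write, not a directory
dataset.export.ExportToCoco(output_path="_annotations.coco.json")
```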
### From Pascal VOC:
```python
from pylabel import importer
dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()
```
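If pylabel is not an option, the box conversion itself is simple arithmetic: Pascal VOC stores corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while COCO stores the top-left corner plus width and height. A standard-library sketch, where `voc_boxes_to_coco` is a hypothetical helper that handles one annotation file and leaves building the full COCO JSON to the caller:

```python
import xml.etree.ElementTree as ET


def voc_boxes_to_coco(xml_text):
    """Return [(label, [x, y, width, height]), ...] from one Pascal VOC XML string."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bb.findtext(key)) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        # COCO wants the top-left corner plus width and height
        boxes.append((obj.findtext("name"), [xmin, ymin, xmax - xmin, ymax - ymin]))
    return boxes
```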
---
## Troubleshooting
**Label Studio won't start:**
- Check whether another service (such as CVAT) is already using port 8080; if so, try `label-studio start --port 8081`
**CVAT Docker issues:**
- Check: `docker compose logs`
- Ensure ports 8080, 8070 are free
**Export format doesn't match RF-DETR:**
- See [validate_coco_dataset.py](validate_coco_dataset.py) to check your format
- Your JSON needs: `images`, `annotations`, `categories` keys
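A quick structural check along these lines can be done in a few lines of Python. This is a sketch only and does not reproduce what `validate_coco_dataset.py` actually checks; `check_coco` is a hypothetical function:

```python
def check_coco(coco):
    """Return a list of problems found in a COCO dict (an empty list means none found)."""
    problems = [
        f"missing key: {key}"
        for key in ("images", "annotations", "categories")
        if key not in coco
    ]
    if problems:
        return problems

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann.get('id')}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann.get('id')}: unknown category_id")
        bbox = ann.get("bbox", [])
        # a valid COCO box has four numbers and a positive width and height
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann.get('id')}: bad bbox {bbox}")
    return problems
```

Running it on each split's `_annotations.coco.json` (loaded with `json.load`) before training catches dangling ids left over from a manual split.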
**Need help?**
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/