# Image Labeling Guide for Knot Detection

## Quick Start: Label Studio (Recommended)

### 1. Install and Launch
```bash
# Install (outside your project venv is fine)
pip install label-studio

# Start the server
label-studio start
```

Open http://localhost:8080 in your browser.

### 2. Create Your Project
1. Click "Create Project"
2. Project Name: "Wood Knot Detection"
3. Data Import: Click "Upload Files" → select your wood images
4. Labeling Setup:
   - Template: "Object Detection with Bounding Boxes"
   - Add label: `knot` (or multiple types: `sound_knot`, `dead_knot`, etc.)

### 3. Label Images
**Keyboard Shortcuts (can speed up labeling 3-5x):**
- `Alt + Click` = Create bounding box
- `Alt + R` = Select rectangle tool
- `Ctrl + Enter` = Submit and move to next
- `Ctrl + Z` = Undo

**Best Practices:**
- Draw boxes tightly around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality

### 4. Export to COCO Format
1. Click project name → **Export**
2. Format: **"COCO"**
3. Download the zip file
4. Extract and organize for RF-DETR:
```bash
# Extract the downloaded archive
unzip export.zip -d exported/

# Organize into RF-DETR format
mkdir -p dataset/train dataset/valid dataset/test

# Move images and rename JSON
mv exported/images/* dataset/train/
mv exported/result.json dataset/train/_annotations.coco.json

# Split 80/10/10 manually or use a script
# Move ~10% of images + their annotations to valid/
# Move ~10% to test/
```

**Tip**: Label Studio exports all annotations in a single JSON file, so you need to split it into train/valid/test yourself; each split folder needs its own `_annotations.coco.json` listing only that folder's images. The `validate_coco_dataset.py` script can help verify the structure.
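
The 80/10/10 split can be scripted. The following is a minimal sketch and not a script from this repository: `split_coco` and `write_splits` are hypothetical helper names, and the code assumes that all images and the combined `_annotations.coco.json` currently sit in `dataset/train/`. It also assumes that `file_name` entries may carry a folder prefix from the export, which it strips.

```python
import json
import random
import shutil
from collections import defaultdict
from pathlib import Path


def split_coco(coco, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split one COCO dict into train/valid/test dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed keeps the split reproducible

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    groups = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # whatever is left over
    }

    # Index annotations by image so each split keeps only its own boxes.
    by_image = defaultdict(list)
    for ann in coco["annotations"]:
        by_image[ann["image_id"]].append(ann)

    return {
        name: {
            "images": imgs,
            "annotations": [a for img in imgs for a in by_image[img["id"]]],
            "categories": coco["categories"],
        }
        for name, imgs in groups.items()
    }


def write_splits(dataset_dir, seed=0):
    """Split dataset_dir/train in place into train/, valid/ and test/."""
    root = Path(dataset_dir)
    coco = json.loads((root / "train" / "_annotations.coco.json").read_text())
    for name, part in split_coco(coco, seed=seed).items():
        split_dir = root / name
        split_dir.mkdir(parents=True, exist_ok=True)
        for img in part["images"]:
            # Assumption: the export may prefix file names with a folder.
            img["file_name"] = Path(img["file_name"]).name
            src = root / "train" / img["file_name"]
            if name != "train" and src.exists():
                shutil.move(str(src), str(split_dir / img["file_name"]))
        (split_dir / "_annotations.coco.json").write_text(json.dumps(part))
```

Calling `write_splits("dataset")` once after the `mv` commands would keep roughly 80% of the images in `train/` and move the rest, with each folder getting its own annotation file. Splitting by image rather than by annotation keeps all boxes of one board in the same split.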

---

## Alternative: CVAT (More Powerful)

### Setup with Docker
```bash
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d

# Create an admin account (CVAT does not ship with a default login)
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'
```

Open http://localhost:8080 and log in with the superuser account you just created. CVAT and Label Studio both default to port 8080, so stop one before starting the other.

### Features
- **Keyboard shortcuts**: `N` (new box), `Shift+arrows` (adjust box)
- **Interpolation**: Auto-label between frames (for video)
- **Team mode**: Multiple annotators on same project
- **Quality control**: Review mode for double-checking labels

### Export
1. Actions → Export task dataset
2. Format: "COCO 1.0"
3. Restructure files to match RF-DETR's expected format

---

## Alternative: labelImg (Desktop App)

### Quick Setup
```bash
pip install labelImg
labelImg /path/to/images
```

**Pros:**
- No web server needed
- Works offline
- Very simple interface

**Cons:**
- Exports Pascal VOC by default (not COCO)
- Need to convert format:
  ```bash
  # Use roboflow or pylabel library to convert VOC → COCO
  ```
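
If you would rather not add a dependency, the core of the VOC → COCO conversion is small. The sketch below uses only the standard library. `voc_to_coco` is a hypothetical helper, and it assumes the usual Pascal VOC layout (`filename`, `size`, and one `object`/`bndbox` per box) and zero-based category ids.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, class_names=("knot",)):
    """Convert Pascal VOC XML documents (as strings) into one COCO-style dict."""
    cat_ids = {name: i for i, name in enumerate(class_names)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()],
    }
    for image_id, xml_text in enumerate(xml_strings):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            w, h = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": len(coco["annotations"]),
                "image_id": image_id,
                "category_id": cat_ids[obj.findtext("name")],
                "bbox": [xmin, ymin, w, h],  # COCO stores [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
    return coco
```

To use it on a labelImg output folder, read each `.xml` file into a string (for example with `Path.read_text()`), pass the list to `voc_to_coco`, and write the result with `json.dump`. The key difference between the formats is that VOC stores corner coordinates while COCO stores the top-left corner plus width and height.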

---

## Model-Assisted Labeling Workflow

Once you have a trained model, you can use its predictions as draft labels and speed up labeling by roughly 10-20x:

### Step 1: Auto-label new images
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3
```

**Use a low threshold (0.3)** to capture more candidates: deleting a false positive is easier than drawing a missed knot from scratch.
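
To see why a low threshold suits pre-labeling, the sketch below filters a handful of made-up predictions at two thresholds. The `score` field and the example values are assumptions for illustration only; check what `auto_label_images.py` actually writes to `predictions.json`.

```python
def filter_by_score(predictions, threshold=0.3):
    """Keep predicted boxes whose confidence is at least `threshold`."""
    return [p for p in predictions if p["score"] >= threshold]


# Made-up predictions for a single image (bbox is [x, y, width, height]).
predictions = [
    {"image_id": 1, "bbox": [40, 60, 30, 25], "score": 0.92},
    {"image_id": 1, "bbox": [200, 80, 18, 20], "score": 0.41},
    {"image_id": 1, "bbox": [310, 15, 12, 10], "score": 0.33},
    {"image_id": 1, "bbox": [5, 5, 8, 8], "score": 0.12},
]

for threshold in (0.3, 0.5):
    kept = filter_by_score(predictions, threshold)
    print(threshold, len(kept))  # prints "0.3 3" then "0.5 1"
```

At 0.5 the reviewer would have to find and draw the two medium-confidence knots by hand; at 0.3 they are already on screen and only need a quick accept or delete.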

### Step 2: Convert to Label Studio format
```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
```
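
For orientation, here is a sketch of the kind of transformation this step performs. It is not the code of `convert_to_label_studio.py`. It assumes the labeling config uses the control name `label` and the image name `image` (as in the bounding-box template), that the COCO `images` entries carry `width` and `height`, and that images are served through Label Studio's local-files URL. The function names and the `model_version` string are hypothetical.

```python
from urllib.parse import quote


def coco_box_to_ls_result(bbox, img_w, img_h, label="knot", score=None):
    """Turn one COCO pixel box [x, y, w, h] into a Label Studio rectangle result."""
    x, y, w, h = bbox
    result = {
        "from_name": "label",  # assumed name of the RectangleLabels control
        "to_name": "image",    # assumed name of the Image object
        "type": "rectanglelabels",
        "original_width": img_w,
        "original_height": img_h,
        "value": {
            # Label Studio stores boxes as percentages of the image size.
            "x": 100.0 * x / img_w,
            "y": 100.0 * y / img_h,
            "width": 100.0 * w / img_w,
            "height": 100.0 * h / img_h,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }
    if score is not None:
        result["score"] = score
    return result


def coco_to_ls_tasks(coco, local_dir, model_version="rfdetr-draft"):
    """Build one Label Studio task (with a prediction) per COCO image."""
    boxes = {}
    for ann in coco["annotations"]:
        boxes.setdefault(ann["image_id"], []).append(ann)
    names = {c["id"]: c["name"] for c in coco["categories"]}

    tasks = []
    for img in coco["images"]:
        results = [
            coco_box_to_ls_result(
                a["bbox"], img["width"], img["height"],
                names[a["category_id"]], a.get("score"),
            )
            for a in boxes.get(img["id"], [])
        ]
        # local_dir is the image folder relative to the local-files document root.
        url = "/data/local-files/?d=" + quote(f"{local_dir}/{img['file_name']}")
        tasks.append({
            "data": {"image": url},
            "predictions": [{"model_version": model_version, "result": results}],
        })
    return tasks
```

The main thing to get right is the unit change: COCO boxes are in pixels, while Label Studio expects percentages, so a box that looks shifted or tiny after import usually points to a missed conversion.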

### Step 3: Import predictions into Label Studio
1. In Label Studio, open your project
2. **Settings → Storage → Add Source Storage → Local files**:
   - Storage Type: Local files
   - Absolute local path: `/full/path/to/unlabeled_images`
   - Click "Add Storage" then "Sync Storage"
   - Local file serving has to be enabled before the server starts: set `LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true` and point `LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT` at a parent directory of the images
3. **Import → Upload Files** → select `label_studio_tasks.json`
4. Each image now loads with **pre-drawn boxes from your model**. If every image shows up twice (once empty from the sync, once with predictions), delete the empty tasks or skip the sync next time
5. Click through images, fixing mistakes (much faster than labeling from scratch!)

### Step 4: Active Learning Loop with Label Studio
1. **Initial labeling**: Label 50-100 images manually in Label Studio
2. **Export & prepare**:
   ```bash
   # Export from Label Studio as COCO format
   # Split into train/valid/test folders
   ```
3. **Train RF-DETR**: Run for just 10 epochs (faster iteration)
   ```bash
   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python train_rfdetr.py \
     --dataset-dir dataset/ \
     --output-dir runs/iteration_1 \
     --model medium \
     --epochs 10
   ```
4. **Auto-label new batch**: Get predictions on 500 unlabeled images
   ```bash
   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
     --weights runs/iteration_1/checkpoint_best_total.pth \
     --images-dir batch_2_images/ \
     --output-json batch_2_predictions.json \
     --threshold 0.3

   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
     --coco-json batch_2_predictions.json \
     --images-dir batch_2_images/ \
     --output-json batch_2_ls_tasks.json
   ```
5. **Review in Label Studio**: Import `batch_2_ls_tasks.json` → review/correct (10x faster than from scratch)
6. **Export & retrain**: Add corrected labels to dataset, retrain for 20-50 epochs
7. **Repeat**: Continue with batch 3, 4, etc.

As a rough guide, this iterative approach can reach **95%+ accuracy with 5-10x less manual effort** than labeling everything by hand; confirm the figure on your own held-out `test/` split.
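
Adding a corrected batch to the existing dataset (step 6) amounts to merging two COCO files without id collisions. The following is a minimal sketch, assuming both files use the same label names; `merge_coco` is a hypothetical helper and not a script in this repository.

```python
def merge_coco(base, new):
    """Return base plus the images and boxes of new, with ids re-numbered."""
    # Map category ids of the new file onto the base file's ids by label name.
    name_to_id = {c["name"]: c["id"] for c in base["categories"]}
    cat_map = {c["id"]: name_to_id[c["name"]] for c in new["categories"]}

    next_img = max((i["id"] for i in base["images"]), default=-1) + 1
    next_ann = max((a["id"] for a in base["annotations"]), default=-1) + 1
    known_files = {i["file_name"] for i in base["images"]}

    merged = {
        "images": list(base["images"]),
        "annotations": list(base["annotations"]),
        "categories": list(base["categories"]),
    }

    img_map = {}
    for img in new["images"]:
        if img["file_name"] in known_files:
            continue  # already part of the dataset, skip it and its boxes
        img_map[img["id"]] = next_img
        merged["images"].append({**img, "id": next_img})
        next_img += 1

    for ann in new["annotations"]:
        if ann["image_id"] not in img_map:
            continue
        merged["annotations"].append({
            **ann,
            "id": next_ann,
            "image_id": img_map[ann["image_id"]],
            "category_id": cat_map[ann["category_id"]],
        })
        next_ann += 1
    return merged
```

A label name that exists only in the new file raises a `KeyError`, which is deliberate here: a silently added class is harder to notice than a failed merge. Merge into the training annotations only, so that `valid/` and `test/` stay fixed across iterations and the scores remain comparable.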

---

## Tips for High-Quality Labels

### Consistency is Key
- **Same criteria every time**: Decide upfront if you label tiny knots, damaged areas, etc.
- **Box boundaries**: Tight around knot, or include some margin? Pick one and stick to it.
- **Occlusions**: Label partially visible knots? Document your decision.

### Speed vs. Quality
- **First 100 images**: Take your time, establish consistency
- **After 100**: Speed up - model will help catch inconsistencies later
- **Every 500 images**: Audit 20-30 random labels to check quality
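
Picking the audit set at random matters, since hand-picked images tend to be the easy ones. A small sketch of drawing such a sample; `sample_for_audit` is a hypothetical helper and the default of 25 is simply the midpoint of the 20-30 range above.

```python
import random


def sample_for_audit(image_names, k=25, seed=None):
    """Pick up to k distinct images uniformly at random for a manual re-check."""
    names = list(image_names)
    rng = random.Random(seed)  # pass a seed to make the audit list repeatable
    return rng.sample(names, min(k, len(names)))
```

For example, `sample_for_audit([img["file_name"] for img in coco["images"]])` would return 25 file names from a loaded COCO dict, which can then be reopened in the labeling tool.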

### Common Mistakes
1. ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
2. ❌ Missing small knots in some images but labeling them in others
3. ❌ Labeling knot-like wood grain patterns
4. ❌ Fatigue errors after 2+ hours - take breaks!

### Dataset Size Guidelines
- **Minimum**: 200 labeled images (at an 80/10/10 split: 160 train, 20 valid, 20 test)
- **Good**: 500-1000 images
- **Excellent**: 2000+ images
- **With active learning**: Start with 100, grow to 500+ iteratively

---

## Converting Other Formats to COCO

If you have labels in another format:

### From YOLO format:
```python
from pathlib import Path

from pylabel import importer

dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    path_to_images="images/",  # pylabel names this argument path_to_images
    cat_names=['knot']
)

# output_path is the JSON file to write, so create its folder first
Path("coco_format").mkdir(exist_ok=True)
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")
```

### From Pascal VOC:
```python
from pylabel import importer

dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()
```

---

## Troubleshooting

**Label Studio won't start:**
- Try: `label-studio reset` then `label-studio start` (run `label-studio --help` to confirm which subcommands your version supports)
- If port 8080 is already taken, for example by CVAT, start on another port: `label-studio start --port 8081`

**CVAT Docker issues:**
- Check: `docker compose logs`
- Ensure ports 8080, 8070 are free

**Export format doesn't match RF-DETR:**
- See [validate_coco_dataset.py](validate_coco_dataset.py) to check your format
- Your JSON needs: `images`, `annotations`, `categories` keys
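
A quick structural check along these lines takes only a few lines of Python. This is a sketch and not the contents of `validate_coco_dataset.py`; `check_coco` is a hypothetical helper that tests the three required keys plus a few cross-references that commonly break after manual splitting.

```python
def check_coco(coco):
    """Return a list of problems found in a COCO detection dict (empty if none)."""
    problems = [
        f"missing top-level key: {key}"
        for key in ("images", "annotations", "categories")
        if key not in coco
    ]
    if problems:
        return problems  # the remaining checks need all three keys

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}
    for ann in coco["annotations"]:
        label = f"annotation {ann.get('id')}"
        if ann.get("image_id") not in image_ids:
            problems.append(f"{label}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in category_ids:
            problems.append(f"{label}: unknown category_id {ann.get('category_id')}")
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"{label}: bbox must be [x, y, width, height] with positive size")
    return problems
```

Load each split's `_annotations.coco.json` with `json.load` and pass the result in; an empty list means the basic structure is consistent. An "unknown image_id" message after splitting usually means an annotation ended up in a different folder than its image.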

**Need help?**
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/