Image Labeling Guide for Knot Detection
Quick Start: Label Studio (Recommended)
1. Install and Launch
# Install (outside your project venv is fine)
pip install label-studio
# Start the server
label-studio start
Open http://localhost:8080 in your browser.
2. Create Your Project
- Click "Create Project"
- Project Name: "Wood Knot Detection"
- Data Import: Click "Upload Files" → select your wood images
- Labeling Setup:
  - Template: "Object Detection with Bounding Boxes"
  - Add label: knot (or multiple types: sound_knot, dead_knot, etc.)
3. Label Images
Keyboard Shortcuts (can speed up labeling 3-5x):
- Alt + Click = Create bounding box
- Alt + R = Select rectangle tool
- Ctrl + Enter = Submit and move to next
- Ctrl + Z = Undo
Best Practices:
- Draw boxes tight around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality
4. Export to COCO Format
- Click project name → Export
- Format: "COCO"
- Download the zip file
- Extract and organize for RF-DETR:
# After extracting export.zip:
unzip export.zip -d exported/
# Organize into RF-DETR format
mkdir -p dataset/train dataset/valid dataset/test
# Move images and rename JSON
mv exported/images/* dataset/train/
mv exported/result.json dataset/train/_annotations.coco.json
# Split 80/10/10 manually or use a script
# Move ~10% of images + their annotations to valid/
# Move ~10% to test/
Tip: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The validate_coco_dataset.py script can help verify the structure.
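The 80/10/10 split described above can be sketched in Python. This is a minimal illustration, not the project's own tooling: the function name `split_coco` and its parameters are hypothetical, and it assumes the export is a standard COCO dict with `images`, `annotations`, and `categories` keys. It splits by image so that every annotation stays with its image.

```python
import random

def split_coco(coco, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split one COCO dict into train/valid/test dicts, grouping by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed gives a reproducible split

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    parts = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # remainder goes to test
    }

    splits = {}
    for name, imgs in parts.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            # keep only the annotations that belong to this split's images
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits
```

Each returned dict could then be written with `json.dump` to `dataset/<split>/_annotations.coco.json`, with the image files listed in its `images` entries moved into the same folder.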
Alternative: CVAT (More Powerful)
Setup with Docker
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d
Open http://localhost:8080. CVAT does not ship with a default account, so create a superuser first (see the CVAT installation guide) and log in with it.
Features
- Keyboard shortcuts: N (new box), Shift+arrows (adjust box)
- Interpolation: Auto-label between frames (for video)
- Team mode: Multiple annotators on same project
- Quality control: Review mode for double-checking labels
Export
- Actions → Export task dataset
- Format: "COCO 1.0"
- Restructure files to match RF-DETR's expected format
Alternative: labelImg (Desktop App)
Quick Setup
pip install labelImg
labelImg /path/to/images
Pros:
- No web server needed
- Works offline
- Very simple interface
Cons:
- Exports Pascal VOC by default (not COCO)
- Need to convert format:
# Use roboflow or pylabel library to convert VOC → COCO
Model-Assisted Labeling Workflow
After you have a trained model, speed up labeling 10-20x:
Step 1: Auto-label new images
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
--weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
--images-dir unlabeled_images/ \
--output-json predictions.json \
--threshold 0.3
Use a low threshold (0.3) to capture more candidates: it is easier to delete false positives than to add missed knots.
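To get a feel for the trade-off, it can help to count how many candidate boxes survive at different thresholds. The sketch below assumes each prediction is a dict with a `score` field, as in COCO-style detection results; the actual layout of `predictions.json` depends on `auto_label_images.py`, and the helper name `count_candidates` is hypothetical.

```python
def count_candidates(predictions, thresholds=(0.3, 0.5, 0.7)):
    """Return {threshold: number of predictions with score >= threshold}."""
    return {t: sum(1 for p in predictions if p["score"] >= t) for t in thresholds}

# Hypothetical scores, for illustration only.
preds = [{"score": s} for s in (0.92, 0.61, 0.45, 0.33, 0.12)]
print(count_candidates(preds))  # {0.3: 4, 0.5: 2, 0.7: 1}
```

In this toy example, the lower threshold keeps more boxes to review, while the higher ones drop boxes that may be real knots.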
Step 2: Convert to Label Studio format
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
--coco-json predictions.json \
--images-dir unlabeled_images/ \
--output-json label_studio_tasks.json
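The core of such a conversion is a coordinate change: COCO stores boxes as `[x, y, width, height]` in pixels, while Label Studio rectangle results use percentages of the image size. The following is a rough sketch of that mapping, not the contents of `convert_to_label_studio.py`. The function name, the `image_url_prefix` parameter, and the `model_version` string are hypothetical; `from_name` and `to_name` are assumed to match the names in your labeling config (`label` and `image` in the bounding-box template); and each image entry is assumed to carry `width` and `height`.

```python
def coco_to_label_studio(coco, image_url_prefix, from_name="label", to_name="image"):
    """Build Label Studio tasks with pre-annotations from a COCO-style dict."""
    cat_names = {c["id"]: c["name"] for c in coco["categories"]}

    # Group annotations by image so each task gets its own boxes.
    anns_by_image = {}
    for ann in coco["annotations"]:
        anns_by_image.setdefault(ann["image_id"], []).append(ann)

    tasks = []
    for img in coco["images"]:
        w, h = img["width"], img["height"]
        results = []
        for ann in anns_by_image.get(img["id"], []):
            x, y, bw, bh = ann["bbox"]  # COCO: top-left corner + size, in pixels
            results.append({
                "from_name": from_name,
                "to_name": to_name,
                "type": "rectanglelabels",
                "original_width": w,
                "original_height": h,
                "value": {
                    # Label Studio: percentages of image width/height
                    "x": 100.0 * x / w,
                    "y": 100.0 * y / h,
                    "width": 100.0 * bw / w,
                    "height": 100.0 * bh / h,
                    "rotation": 0,
                    "rectanglelabels": [cat_names[ann["category_id"]]],
                },
            })
        tasks.append({
            "data": {"image": image_url_prefix + img["file_name"]},
            "predictions": [{"model_version": "sketch", "result": results}],
        })
    return tasks
```

For images served from local storage, the prefix would typically be a local-files URL for that storage path. The exact form depends on how Label Studio's local file serving is configured, so treat it as something to verify against your setup.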
Step 3: Import predictions into Label Studio
- In Label Studio, open your project
- Settings → Storage → Add Source Storage → Local files:
  - Storage Type: Local files
  - Absolute local path: /full/path/to/unlabeled_images
  - Click "Add Storage" then "Sync Storage"
- Import → Upload Files → select label_studio_tasks.json
- Each image now loads with pre-drawn boxes from your model
- Click through images, fixing mistakes (much faster than labeling from scratch!)
Step 4: Active Learning Loop with Label Studio
- Initial labeling: Label 50-100 images manually in Label Studio
- Export & prepare:
  # Export from Label Studio as COCO format
  # Split into train/valid/test folders
- Train RF-DETR: Run for just 10 epochs (faster iteration)
  /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python train_rfdetr.py \
  --dataset-dir dataset/ \
  --output-dir runs/iteration_1 \
  --model medium \
  --epochs 10
- Auto-label new batch: Get predictions on 500 unlabeled images
  /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
  --weights runs/iteration_1/checkpoint_best_total.pth \
  --images-dir batch_2_images/ \
  --output-json batch_2_predictions.json \
  --threshold 0.3
  /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
  --coco-json batch_2_predictions.json \
  --images-dir batch_2_images/ \
  --output-json batch_2_ls_tasks.json
- Review in Label Studio: Import batch_2_ls_tasks.json → review/correct (10x faster than from scratch)
- Export & retrain: Add corrected labels to dataset, retrain for 20-50 epochs
- Repeat: Continue with batch 3, 4, etc.
This iterative approach typically achieves 95%+ accuracy with 5-10x less manual effort.
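The "add corrected labels to dataset" step means merging a new COCO export into the existing annotation file without ID collisions. A minimal sketch of that merge follows. The function name `merge_coco` is hypothetical, and it assumes both files are standard COCO dicts and that every category in the new export already exists by name in the base dataset.

```python
def merge_coco(base, new):
    """Return a copy of `base` with images/annotations from `new` appended.

    Image and annotation IDs from `new` are re-assigned to avoid collisions,
    and category IDs are mapped by category name. Raises KeyError if `new`
    uses a category name that `base` does not have.
    """
    merged = {
        "images": list(base["images"]),
        "annotations": list(base["annotations"]),
        "categories": list(base["categories"]),
    }
    next_img = max((i["id"] for i in merged["images"]), default=0) + 1
    next_ann = max((a["id"] for a in merged["annotations"]), default=0) + 1

    name_to_id = {c["name"]: c["id"] for c in merged["categories"]}
    cat_map = {c["id"]: name_to_id[c["name"]] for c in new["categories"]}

    img_map = {}
    for img in new["images"]:
        img_map[img["id"]] = next_img
        merged["images"].append({**img, "id": next_img})
        next_img += 1

    for ann in new["annotations"]:
        merged["annotations"].append({
            **ann,
            "id": next_ann,
            "image_id": img_map[ann["image_id"]],
            "category_id": cat_map[ann["category_id"]],
        })
        next_ann += 1
    return merged
```

After merging, re-split the combined file (or merge only into the train split) so that images already in the validation and test sets stay there between iterations.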
Tips for High-Quality Labels
Consistency is Key
- Same criteria every time: Decide upfront if you label tiny knots, damaged areas, etc.
- Box boundaries: Tight around knot, or include some margin? Pick one and stick to it.
- Occlusions: Label partially visible knots? Document your decision.
Speed vs. Quality
- First 100 images: Take your time, establish consistency
- After 100: Speed up - model will help catch inconsistencies later
- Every 500 images: Audit 20-30 random labels to check quality
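Picking the audit labels at random, rather than by eye, avoids re-checking only the images you remember. A small sketch, assuming the annotations are available as a list (for example the `annotations` array of a COCO export); the helper name `sample_for_audit` is hypothetical.

```python
import random

def sample_for_audit(annotations, k=25, seed=None):
    """Return up to k annotations chosen uniformly at random, without repeats."""
    k = min(k, len(annotations))
    return random.Random(seed).sample(annotations, k)
```

Passing a fixed `seed` makes the audit set reproducible, which is useful if a second person reviews the same sample.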
Common Mistakes
- ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
- ❌ Missing small knots in some images but labeling them in others
- ❌ Labeling knot-like wood grain patterns
- ❌ Fatigue errors after 2+ hours - take breaks!
Dataset Size Guidelines
- Minimum: 200 labeled images (split: 150 train, 30 valid, 20 test)
- Good: 500-1000 images
- Excellent: 2000+ images
- With active learning: Start with 100, grow to 500+ iteratively
Converting Other Formats to COCO
If you have labels in another format:
From YOLO format:
from pylabel import importer
dataset = importer.ImportYoloV5(
path="yolo_labels/",
img_path="images/",
cat_names=['knot']
)
dataset.export.ExportToCoco(output_path="coco_format/")
From Pascal VOC:
from pylabel import importer
dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()
Troubleshooting
Label Studio won't start:
- Try: label-studio reset then label-studio start
CVAT Docker issues:
- Check: docker compose logs
- Ensure ports 8080, 8070 are free
Export format doesn't match RF-DETR:
- See validate_coco_dataset.py to check your format
- Your JSON needs these keys: images, annotations, categories
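A quick structural check along these lines can be sketched as follows. This is a standalone illustration, not the contents of `validate_coco_dataset.py`; the function name `check_coco` is hypothetical, and it only covers the top-level keys plus basic cross-references and box sanity.

```python
def check_coco(coco):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [f"missing top-level key: {key}"
                for key in ("images", "annotations", "categories")
                if key not in coco]
    if problems:
        return problems  # cannot check references without all three keys

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id}: unknown category_id")
        bbox = ann.get("bbox")
        # COCO boxes are [x, y, width, height]; width and height must be positive
        if not (isinstance(bbox, list) and len(bbox) == 4 and bbox[2] > 0 and bbox[3] > 0):
            problems.append(f"annotation {ann_id}: bad bbox {bbox}")
    return problems
```

Load the file with `json.load` and pass the resulting dict in. Any returned messages point at entries worth fixing before training.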
Need help?
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/