Image Labeling Guide for Knot Detection

1. Install and Launch

# Install (outside your project venv is fine)
pip install label-studio

# Start the server
label-studio start

Open http://localhost:8080 in your browser.

2. Create Your Project

  1. Click "Create Project"
  2. Project Name: "Wood Knot Detection"
  3. Data Import: Click "Upload Files" → select your wood images
  4. Labeling Setup:
    • Template: "Object Detection with Bounding Boxes"
    • Add label: knot (or multiple types: sound_knot, dead_knot, etc.)

3. Label Images

Keyboard Shortcuts (speeds up 3-5x):

  • Alt + Click = Create bounding box
  • Alt + R = Select rectangle tool
  • Ctrl + Enter = Submit and move to next
  • Ctrl + Z = Undo

Best Practices:

  • Draw boxes tight around each knot
  • Include partial knots at image edges
  • Label consistently (all knots, or only specific types)
  • Take breaks every 30-50 images to maintain quality

4. Export to COCO Format

  1. Click project name → Export
  2. Format: "COCO"
  3. Download the zip file
  4. Extract and organize for RF-DETR:
    # Extract the downloaded archive (saved here as export.zip):
    unzip export.zip -d exported/
    
    # Organize into RF-DETR format
    mkdir -p dataset/train dataset/valid dataset/test
    
    # Move images and rename JSON
    mv exported/images/* dataset/train/
    mv exported/result.json dataset/train/_annotations.coco.json
    
    # Split 80/10/10 manually or use a script
    # Move ~10% of images + their annotations to valid/
    # Move ~10% to test/
    

Tip: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The validate_coco_dataset.py script can help verify the structure.
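
The split described in the tip can be sketched as follows. This is a minimal example, not part of this repository: the helper names `split_coco` and `write_splits` are hypothetical, and it assumes each `file_name` in result.json is a path relative to the extracted `exported/` folder. It splits by image, so every annotation stays with its image.

```python
import json
import random
import shutil
from pathlib import Path


def split_coco(coco, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split one COCO dict into train/valid/test dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded, so the split is repeatable

    n_train = int(len(images) * fractions[0])
    n_valid = int(len(images) * fractions[1])
    parts = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # whatever is left over
    }

    splits = {}
    for name, imgs in parts.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits


def write_splits(splits, src_dir, dst_dir):
    """Copy images and write one _annotations.coco.json per split folder."""
    for name, coco in splits.items():
        out = Path(dst_dir) / name
        out.mkdir(parents=True, exist_ok=True)
        flat_images = []
        for img in coco["images"]:
            src = Path(src_dir) / img["file_name"]
            shutil.copy2(src, out / src.name)
            # Store only the base name so the JSON matches the flat folder.
            flat_images.append({**img, "file_name": src.name})
        payload = {**coco, "images": flat_images}
        (out / "_annotations.coco.json").write_text(json.dumps(payload))


# Example usage (paths follow the commands above):
# coco = json.loads(Path("exported/result.json").read_text())
# write_splits(split_coco(coco), "exported", "dataset")
```

Because this copies files instead of moving them, the `mv` commands above are not needed if you use it.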


Alternative: CVAT (More Powerful)

Setup with Docker

git clone https://github.com/cvat-ai/cvat
cd cvat
docker compose up -d

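CVAT does not ship with a default login, so create an admin account first:

docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'

Then open http://localhost:8080 and log in with that account. Label Studio also defaults to port 8080, so stop it (or move it to another port) before starting CVAT.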

Features

  • Keyboard shortcuts: N (new box), Shift+arrows (adjust box)
  • Interpolation: Auto-label between frames (for video)
  • Team mode: Multiple annotators on same project
  • Quality control: Review mode for double-checking labels

Export

  1. Actions → Export task dataset
  2. Format: "COCO 1.0"
  3. Restructure files to match RF-DETR's expected format
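
The restructuring step can be sketched like this. The layout is an assumption to check against your own export: the annotation file is taken to be at `annotations/instances_default.json` and the images somewhere under `images/`. The helper name `restructure_cvat_export` is hypothetical.

```python
import json
import shutil
from pathlib import Path


def restructure_cvat_export(export_dir, split_dir,
                            ann_rel="annotations/instances_default.json",
                            images_rel="images"):
    """Flatten an extracted CVAT COCO export into one RF-DETR split folder.

    ann_rel and images_rel are assumed defaults; adjust them to match
    the archive you actually downloaded.
    """
    export_dir, split_dir = Path(export_dir), Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)

    # Rewrite file_name to the base name, since images end up in one flat folder.
    coco = json.loads((export_dir / ann_rel).read_text())
    for img in coco["images"]:
        img["file_name"] = Path(img["file_name"]).name
    (split_dir / "_annotations.coco.json").write_text(json.dumps(coco))

    # Copy every image file found below images_rel (subfolders included).
    copied = 0
    for src in sorted((export_dir / images_rel).rglob("*")):
        if src.is_file():
            shutil.copy2(src, split_dir / src.name)
            copied += 1
    return copied


# Example usage:
# restructure_cvat_export("cvat_export", "dataset/train")
```

Flattening only works if image base names are unique. Images from different subfolders that share a name would overwrite each other.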

Alternative: labelImg (Desktop App)

Quick Setup

pip install labelImg
labelImg /path/to/images

Pros:

  • No web server needed
  • Works offline
  • Very simple interface

Cons:

  • Exports Pascal VOC by default (not COCO)
  • Need to convert format:
    # Use roboflow or pylabel library to convert VOC → COCO
    

Model-Assisted Labeling Workflow

After you have a trained model, speed up labeling 10-20x:

Step 1: Auto-label new images

/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3

Use a low threshold (0.3) to capture more candidates: deleting false positives is easier than adding missed knots.
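
To see what a threshold keeps or drops, you can count detections at a few cut-offs. This sketch assumes each detection is a dict with a `score` field, which may not match what auto_label_images.py writes. The function name `count_by_threshold` is hypothetical.

```python
def count_by_threshold(detections, thresholds=(0.3, 0.5, 0.7)):
    """Map each threshold to the number of detections scoring at or above it."""
    return {t: sum(1 for d in detections if d["score"] >= t) for t in thresholds}


# Example: four candidate boxes with different confidences.
example = [{"score": 0.9}, {"score": 0.6}, {"score": 0.35}, {"score": 0.1}]
print(count_by_threshold(example))  # {0.3: 3, 0.5: 2, 0.7: 1}
```

A lower threshold leaves more boxes to review, but fewer knots are missed entirely.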

Step 2: Convert to Label Studio format

/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
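
For orientation, here is a rough sketch of the kind of conversion this step performs. It is not the code of convert_to_label_studio.py. It assumes the COCO file has `width` and `height` for every image, and that the labeling config uses the control names `label` and `image` from the bounding-box template. The function name and the `model_version` string are placeholders. Label Studio expects box coordinates as percentages of the image size, so pixel values are rescaled.

```python
def coco_to_ls_tasks(coco, image_url_prefix, from_name="label",
                     to_name="image", model_version="rfdetr-sketch"):
    """Build Label Studio tasks with pre-annotations from a COCO dict."""
    categories = {c["id"]: c["name"] for c in coco["categories"]}
    by_image = {}
    for ann in coco["annotations"]:
        by_image.setdefault(ann["image_id"], []).append(ann)

    tasks = []
    for img in coco["images"]:
        w, h = img["width"], img["height"]
        anns = by_image.get(img["id"], [])
        results = []
        for ann in anns:
            x, y, bw, bh = ann["bbox"]  # COCO: top-left x, y, width, height (pixels)
            results.append({
                "from_name": from_name,
                "to_name": to_name,
                "type": "rectanglelabels",
                "original_width": w,
                "original_height": h,
                "value": {
                    "x": 100.0 * x / w,        # percent of image width
                    "y": 100.0 * y / h,        # percent of image height
                    "width": 100.0 * bw / w,
                    "height": 100.0 * bh / h,
                    "rotation": 0,
                    "rectanglelabels": [categories[ann["category_id"]]],
                },
            })
        prediction = {"model_version": model_version, "result": results}
        scores = [a["score"] for a in anns if "score" in a]
        if scores:
            prediction["score"] = sum(scores) / len(scores)  # mean box confidence
        tasks.append({
            "data": {"image": image_url_prefix + img["file_name"]},
            "predictions": [prediction],
        })
    return tasks
```

With local file storage, image URLs usually take the form `/data/local-files/?d=<path relative to the document root>`, so that would be the prefix to pass in.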

Step 3: Import predictions into Label Studio

  1. In Label Studio, open your project
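  2. Settings → Storage → Add Source Storage → Local files:
    • Prerequisite: Label Studio only serves local files if it was started with LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true and LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT set to a parent folder of your images
    • Storage Type: Local files
    • Absolute local path: /full/path/to/unlabeled_images
    • Click "Add Storage" then "Sync Storage"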
  3. Import → Upload Files → select label_studio_tasks.json
  4. Each image now loads with pre-drawn boxes from your model
  5. Click through images, fixing mistakes (much faster than labeling from scratch!)

Step 4: Active Learning Loop with Label Studio

  1. Initial labeling: Label 50-100 images manually in Label Studio
  2. Export & prepare:
    # Export from Label Studio as COCO format
    # Split into train/valid/test folders
    
  3. Train RF-DETR: Run for just 10 epochs (faster iteration)
    /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python train_rfdetr.py \
      --dataset-dir dataset/ \
      --output-dir runs/iteration_1 \
      --model medium \
      --epochs 10
    
  4. Auto-label new batch: Get predictions on 500 unlabeled images
    /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
      --weights runs/iteration_1/checkpoint_best_total.pth \
      --images-dir batch_2_images/ \
      --output-json batch_2_predictions.json \
      --threshold 0.3
    
    /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
      --coco-json batch_2_predictions.json \
      --images-dir batch_2_images/ \
      --output-json batch_2_ls_tasks.json
    
  5. Review in Label Studio: Import batch_2_ls_tasks.json → review/correct (10x faster than from scratch)
  6. Export & retrain: Add corrected labels to dataset, retrain for 20-50 epochs
  7. Repeat: Continue with batch 3, 4, etc.

This iterative approach typically achieves 95%+ accuracy with 5-10x less manual effort.
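
Step 6 ("add corrected labels to dataset") means merging two COCO files whose ids may collide. A minimal sketch, assuming both files use the standard `images` / `annotations` / `categories` keys and that categories are matched by name. The function name `merge_coco` is hypothetical.

```python
def merge_coco(base, new):
    """Append images and annotations from `new` to `base`, re-numbering ids."""
    # Reuse base category ids where names match; add unseen categories.
    name_to_id = {c["name"]: c["id"] for c in base["categories"]}
    categories = list(base["categories"])
    next_cat = max(name_to_id.values(), default=-1) + 1
    cat_map = {}
    for c in new["categories"]:
        if c["name"] not in name_to_id:
            name_to_id[c["name"]] = next_cat
            categories.append({**c, "id": next_cat})
            next_cat += 1
        cat_map[c["id"]] = name_to_id[c["name"]]

    images = list(base["images"])
    annotations = list(base["annotations"])
    next_img = max((i["id"] for i in images), default=-1) + 1
    next_ann = max((a["id"] for a in annotations), default=-1) + 1

    # Give every incoming image a fresh id and remember the mapping.
    img_map = {}
    for img in new["images"]:
        img_map[img["id"]] = next_img
        images.append({**img, "id": next_img})
        next_img += 1

    # Re-point incoming annotations at the new image and category ids.
    for ann in new["annotations"]:
        annotations.append({
            **ann,
            "id": next_ann,
            "image_id": img_map[ann["image_id"]],
            "category_id": cat_map[ann["category_id"]],
        })
        next_ann += 1

    return {"images": images, "annotations": annotations, "categories": categories}
```

The inputs are left unmodified, so the previous dataset file can be kept as a backup while the merged result is written to a new `_annotations.coco.json`.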


Tips for High-Quality Labels

Consistency is Key

  • Same criteria every time: Decide upfront if you label tiny knots, damaged areas, etc.
  • Box boundaries: Tight around knot, or include some margin? Pick one and stick to it.
  • Occlusions: Label partially visible knots? Document your decision.

Speed vs. Quality

  • First 100 images: Take your time, establish consistency
  • After 100: Speed up - model will help catch inconsistencies later
  • Every 500 images: Audit 20-30 random labels to check quality
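
The periodic audit can be as simple as drawing a random sample of annotations to re-inspect. A minimal sketch, assuming a COCO-format annotation file. The function name `sample_for_audit` is hypothetical.

```python
import random


def sample_for_audit(coco, k=25, seed=None):
    """Return up to k random (file_name, bbox) pairs to review by hand."""
    rng = random.Random(seed)
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    annotations = coco["annotations"]
    picked = rng.sample(annotations, min(k, len(annotations)))
    return [(file_names[a["image_id"]], a["bbox"]) for a in picked]


# Example usage:
# import json
# coco = json.load(open("dataset/train/_annotations.coco.json"))
# for file_name, bbox in sample_for_audit(coco, k=25):
#     print(file_name, bbox)
```

Passing a fixed `seed` makes the sample reproducible, which is useful if two people audit the same batch.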

Common Mistakes

  1. Inconsistent box sizes (sometimes tight, sometimes loose)
  2. Missing small knots in some images but labeling them in others
  3. Labeling knot-like wood grain patterns
  4. Fatigue errors after 2+ hours - take breaks!

Dataset Size Guidelines

  • Minimum: 200 labeled images (split: 150 train, 30 valid, 20 test)
  • Good: 500-1000 images
  • Excellent: 2000+ images
  • With active learning: Start with 100, grow to 500+ iteratively

Converting Other Formats to COCO

If you have labels in another format:

From YOLO format:

from pylabel import importer

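# pylabel's keyword for the image folder is path_to_images (not img_path)
dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    path_to_images="images/",
    cat_names=["knot"],
)
# output_path is the JSON file to write; its parent folder must already exist
dataset.export.ExportToCoco(output_path="coco_format/annotations.json")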

From Pascal VOC:

from pylabel import importer

dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()

Troubleshooting

Label Studio won't start:

  • Try: label-studio reset then label-studio start

CVAT Docker issues:

  • Check: docker compose logs
  • Ensure ports 8080, 8070 are free

Export format doesn't match RF-DETR:

Need help?