dillon_stuff/saw_mill_knot_detection

dillonj aed092f09c Initial commit: Wood knot detection model and GUI

2025-12-22 14:11:39 -07:00

7.2 KiB

Image Labeling Guide for Knot Detection

Quick Start: Label Studio (Recommended)

1. Install and Launch

# Install (outside your project venv is fine)
pip install label-studio

# Start the server
label-studio start

Open http://localhost:8080 in your browser.

2. Create Your Project

Click "Create Project"
Project Name: "Wood Knot Detection"
Data Import: Click "Upload Files" → select your wood images
Labeling Setup:
- Template: "Object Detection with Bounding Boxes"
- Add label: knot (or multiple types: sound_knot, dead_knot, etc.)

3. Label Images

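Keyboard Shortcuts (can speed up labeling 3-5x; hotkeys can differ between Label Studio versions, so confirm them in the labeling interface's shortcut list):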

Alt + Click = Create bounding box
Alt + R = Select rectangle tool
Ctrl + Enter = Submit and move to next
Ctrl + Z = Undo

Best Practices:

Draw boxes tight around each knot
Include partial knots at image edges
Label consistently (all knots, or only specific types)
Take breaks every 30-50 images to maintain quality

4. Export to COCO Format

Click project name → Export
Format: "COCO"
Download the zip file

Extract and organize for RF-DETR:

# Extract the downloaded export (substitute your zip's actual file name)
unzip export.zip -d exported/

# Organize into RF-DETR format
mkdir -p dataset/train dataset/valid dataset/test

# Move images and rename JSON
mv exported/images/* dataset/train/
mv exported/result.json dataset/train/_annotations.coco.json

# Split 80/10/10 manually or use a script:
# move ~10% of images to valid/ and ~10% to test/,
# and give each folder its own _annotations.coco.json covering only its images

Tip: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The validate_coco_dataset.py script can help verify the structure.
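The split can be scripted. Below is a minimal sketch, assuming the layout produced by the commands above (all images plus the single `_annotations.coco.json` in `dataset/train/`). `split_coco` is a hypothetical helper, not one of this repo's scripts. It shuffles images with a fixed seed, moves about 10% each into `valid/` and `test/`, and writes a separate JSON per folder that holds only that folder's images and annotations. It also strips any folder prefix from `file_name` (an assumption, in case the export stores paths such as `images/...`), so each name resolves next to its JSON.

```python
import json
import random
import shutil
from pathlib import Path


def split_coco(dataset_dir, valid_frac=0.1, test_frac=0.1, seed=42):
    """Split <dataset_dir>/train/_annotations.coco.json into train/valid/test.

    Hypothetical helper: assumes every image and the single exported JSON
    already sit in <dataset_dir>/train/. Moves files, so run it only once.
    """
    dataset_dir = Path(dataset_dir)
    train_dir = dataset_dir / "train"
    coco = json.loads((train_dir / "_annotations.coco.json").read_text())

    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # fixed seed -> reproducible split
    n_valid = round(len(images) * valid_frac)
    n_test = round(len(images) * test_frac)
    splits = {
        "valid": images[:n_valid],
        "test": images[n_valid:n_valid + n_test],
        "train": images[n_valid + n_test:],  # last: overwrites the original JSON
    }

    counts = {}
    for split, split_images in splits.items():
        out_dir = dataset_dir / split
        out_dir.mkdir(parents=True, exist_ok=True)
        keep_ids = {img["id"] for img in split_images}
        new_images = []
        for img in split_images:
            # Keep only the base file name (drops e.g. an "images/" prefix)
            name = img["file_name"].replace("\\", "/").split("/")[-1]
            if split != "train":
                shutil.move(str(train_dir / name), str(out_dir / name))
            new_images.append({**img, "file_name": name})
        subset = {
            **coco,
            "images": new_images,
            "annotations": [a for a in coco["annotations"] if a["image_id"] in keep_ids],
        }
        (out_dir / "_annotations.coco.json").write_text(json.dumps(subset, indent=2))
        counts[split] = len(new_images)
    return counts
```

Calling `split_coco("dataset")` once after the `mv` commands returns the number of images in each split. Because it moves files, keep a copy of the original export until the result checks out.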

Alternative: CVAT (More Powerful)

Setup with Docker

git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d

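Open http://localhost:8080. CVAT has no default login; create an admin account first:

docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'

Label Studio also defaults to port 8080, so stop it before starting CVAT on the default port.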

Features

Keyboard shortcuts: N (new box), Shift+arrows (adjust box)
Interpolation: Auto-label between frames (for video)
Team mode: Multiple annotators on same project
Quality control: Review mode for double-checking labels

Export

Actions → Export task dataset
Format: "COCO 1.0"
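Restructure files to match RF-DETR's expected format: the exported annotations file (typically annotations/instances_default.json) becomes _annotations.coco.json inside each of dataset/train, dataset/valid, and dataset/test, next to that split's images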

Alternative: labelImg (Desktop App)

Quick Setup

pip install labelImg
labelImg /path/to/images

Pros:

No web server needed
Works offline
Very simple interface

Cons:

Exports Pascal VOC by default (not COCO)

Labels must be converted to COCO before training:

# Use roboflow or pylabel library to convert VOC → COCO

Model-Assisted Labeling Workflow

After you have a trained model, speed up labeling 10-20x:

Step 1: Auto-label new images

.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3

Use a low threshold (0.3) to capture more candidates: it is easier to delete false positives than to add missed knots.

Step 2: Convert to Label Studio format

.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
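For intuition about what this conversion involves, here is a minimal sketch; it is not the repo's convert_to_label_studio.py. COCO stores a box as `[x, y, width, height]` in pixels from the top-left corner, while Label Studio stores rectangle predictions as percentages of the image size inside a `"predictions"` list on each task. The `from_name="label"` / `to_name="image"` defaults assume the stock bounding-box template and must match your labeling config. `coco_box_to_ls_result`, `make_task`, and the `"rfdetr"` model version string are hypothetical names.

```python
def coco_box_to_ls_result(bbox, label, img_w, img_h, score=None,
                          from_name="label", to_name="image"):
    """Convert one COCO pixel box into a Label Studio rectangle result (sketch)."""
    x, y, w, h = bbox  # COCO: top-left corner plus size, in pixels
    result = {
        "from_name": from_name,  # assumed to match <RectangleLabels name=...>
        "to_name": to_name,      # assumed to match <Image name=...>
        "type": "rectanglelabels",
        "original_width": img_w,
        "original_height": img_h,
        "value": {
            # Label Studio expresses rectangles as percentages of image size
            "x": 100.0 * x / img_w,
            "y": 100.0 * y / img_h,
            "width": 100.0 * w / img_w,
            "height": 100.0 * h / img_h,
            "rotation": 0,
            "rectanglelabels": [label],
        },
    }
    if score is not None:
        result["score"] = score  # model confidence for this box
    return result


def make_task(image_url, results, model_version="rfdetr"):
    """Wrap results into one importable task with pre-drawn predictions."""
    return {
        "data": {"image": image_url},
        "predictions": [{"model_version": model_version, "result": results}],
    }
```

With local-file storage, `image_url` usually takes the form `/data/local-files/?d=<path relative to the document root>`, for example `/data/local-files/?d=unlabeled_images/board_001.jpg` (an illustrative file name).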

Step 3: Import predictions into Label Studio

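In Label Studio, open your project
Settings → Cloud Storage → Add Source Storage → Local files:
- Storage Type: Local files
- Absolute local path: /full/path/to/unlabeled_images
- Click "Add Storage" then "Sync Storage"
- Local-file storage only works if Label Studio was started with LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true and LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT set to a parent directory of unlabeled_images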
Import → Upload Files → select label_studio_tasks.json
Each image now loads with pre-drawn boxes from your model
Click through images, fixing mistakes (much faster than labeling from scratch!)

Step 4: Active Learning Loop with Label Studio

Initial labeling: Label 50-100 images manually in Label Studio

Export & prepare:

# Export from Label Studio as COCO format
# Split into train/valid/test folders

Train RF-DETR: Run for just 10 epochs (faster iteration)

.venv/bin/python train_rfdetr.py \
  --dataset-dir dataset/ \
  --output-dir runs/iteration_1 \
  --model medium \
  --epochs 10

Auto-label new batch: Get predictions on 500 unlabeled images

.venv/bin/python auto_label_images.py \
  --weights runs/iteration_1/checkpoint_best_total.pth \
  --images-dir batch_2_images/ \
  --output-json batch_2_predictions.json \
  --threshold 0.3

.venv/bin/python convert_to_label_studio.py \
  --coco-json batch_2_predictions.json \
  --images-dir batch_2_images/ \
  --output-json batch_2_ls_tasks.json

Review in Label Studio: Import batch_2_ls_tasks.json → review/correct (10x faster than from scratch)
Export & retrain: Add corrected labels to dataset, retrain for 20-50 epochs
Repeat: Continue with batch 3, 4, etc.

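This iterative approach often reaches 95%+ accuracy with 5-10x less manual effort, but these figures depend heavily on your images and labeling criteria; check validation mAP after each round to confirm the model is actually improving.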
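The "add corrected labels to dataset" step means merging two COCO files, and image and annotation ids must not collide. A minimal sketch, assuming both files name their categories identically. `merge_coco` is a hypothetical helper: it renumbers the batch's ids after the existing ones and remaps category ids by name. It does not check for duplicate image file names.

```python
def merge_coco(base, batch):
    """Append a reviewed COCO batch to an existing COCO dataset (sketch).

    Assumes every batch category name also exists in base (else KeyError).
    Inputs are not modified; a new dict is returned.
    """
    merged = {**base,
              "images": list(base["images"]),
              "annotations": list(base["annotations"])}
    # Map batch category ids onto base category ids via their names
    base_cat = {c["name"]: c["id"] for c in base["categories"]}
    batch_cat = {c["id"]: base_cat[c["name"]] for c in batch["categories"]}

    # Continue numbering after the largest existing ids
    next_img = max((im["id"] for im in base["images"]), default=0) + 1
    next_ann = max((a["id"] for a in base["annotations"]), default=0) + 1

    img_map = {}
    for im in batch["images"]:
        img_map[im["id"]] = next_img
        merged["images"].append({**im, "id": next_img})
        next_img += 1
    for a in batch["annotations"]:
        merged["annotations"].append({**a,
                                      "id": next_ann,
                                      "image_id": img_map[a["image_id"]],
                                      "category_id": batch_cat[a["category_id"]]})
        next_ann += 1
    return merged
```

After merging, write the result back to `dataset/train/_annotations.coco.json` and copy the batch's images into the same folder before retraining.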

Tips for High-Quality Labels

Consistency is Key

Same criteria every time: Decide upfront if you label tiny knots, damaged areas, etc.
Box boundaries: Tight around knot, or include some margin? Pick one and stick to it.
Occlusions: Label partially visible knots? Document your decision.

Speed vs. Quality

First 100 images: Take your time, establish consistency
After 100: Speed up; the model will help catch inconsistencies later
Every 500 images: Audit 20-30 random labels to check quality
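Picking the audit sample at random, rather than by eye, keeps the check honest. A minimal sketch over a loaded COCO dict. `audit_sample` is a hypothetical helper that also reports images with zero boxes, which often point to missed knots.

```python
import random
from collections import Counter


def audit_sample(coco, n=25, seed=None):
    """Return (file_name, box_count) for up to n randomly chosen images."""
    boxes = Counter(a["image_id"] for a in coco["annotations"])
    k = min(n, len(coco["images"]))
    picked = random.Random(seed).sample(coco["images"], k)
    # Counter returns 0 for images with no annotations
    return [(im["file_name"], boxes[im["id"]]) for im in picked]
```

Load `dataset/train/_annotations.coco.json` with `json.load`, pass the result in, then open each listed image in Label Studio to review its boxes.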

Common Mistakes

❌ Inconsistent box sizes (sometimes tight, sometimes loose)
❌ Missing small knots in some images but labeling them in others
❌ Labeling knot-like wood grain patterns
❌ Fatigue errors after 2+ hours: take breaks!

Dataset Size Guidelines

Minimum: 200 labeled images (split: 150 train, 30 valid, 20 test)
Good: 500-1000 images
Excellent: 2000+ images
With active learning: Start with 100, grow to 500+ iteratively

Converting Other Formats to COCO

If you have labels in another format:

From YOLO format:

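# pip install pylabel
from pylabel import importer

# path_to_images is interpreted relative to the label folder
dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    path_to_images="../images/",
    cat_names=['knot']
)
# output_path is the JSON file to write, not a directory
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")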

From Pascal VOC:

from pylabel import importer

dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco(output_path="coco_format/_annotations.coco.json")
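If you would rather avoid an extra dependency for the labelImg case, Pascal VOC XML is simple enough to convert with the standard library. A minimal sketch follows. `voc_to_coco` is a hypothetical helper that reads the `filename`, `size`, and `object/bndbox` elements labelImg writes and turns corner coordinates into COCO `[x, y, width, height]`. It assumes category ids start at 1 and that every object name appears in `categories`. It also does not apply the 1-pixel offset some converters use for VOC's 1-based coordinates.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_sources, categories=("knot",)):
    """Build a COCO dict from Pascal VOC XML files (paths or file objects)."""
    cat_ids = {name: i for i, name in enumerate(categories, start=1)}
    coco = {"images": [], "annotations": [],
            "categories": [{"id": i, "name": n} for n, i in cat_ids.items()]}
    ann_id = 0
    for img_id, source in enumerate(xml_sources):
        root = ET.parse(source).getroot()
        size = root.find("size")
        coco["images"].append({
            "id": img_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (float(box.findtext(k))
                                      for k in ("xmin", "ymin", "xmax", "ymax"))
            w, h = xmax - xmin, ymax - ymin  # VOC corners -> COCO width/height
            coco["annotations"].append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cat_ids[obj.findtext("name")],  # KeyError if unknown
                "bbox": [xmin, ymin, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco
```

For example, pass `sorted(Path("voc_annotations").glob("*.xml"))` and write the result with `json.dump`.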

Troubleshooting

Label Studio won't start:

Try: label-studio reset then label-studio start

CVAT Docker issues:

Check: docker compose logs
Ensure ports 8080, 8070 are free

Export format doesn't match RF-DETR:

Run validate_coco_dataset.py to check your format
Each split's JSON needs top-level images, annotations, and categories keys
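For a quick check outside the repo's validator, here is a minimal sketch of the most common structural problems. `check_coco` is a hypothetical helper, not validate_coco_dataset.py. It flags missing top-level keys, annotations pointing at unknown image or category ids, and boxes without a positive width and height.

```python
def check_coco(coco):
    """Return a list of problems in a COCO detection dict (empty list = looks OK)."""
    problems = [f"missing top-level key: {key}"
                for key in ("images", "annotations", "categories") if key not in coco]
    if problems:
        return problems
    image_ids = {im["id"] for im in coco["images"]}
    cat_ids = {c["id"] for c in coco["categories"]}
    for a in coco["annotations"]:
        ann = a.get("id")
        if a.get("image_id") not in image_ids:
            problems.append(f"annotation {ann}: unknown image_id {a.get('image_id')}")
        if a.get("category_id") not in cat_ids:
            problems.append(f"annotation {ann}: unknown category_id {a.get('category_id')}")
        bbox = a.get("bbox") or []
        # COCO boxes are [x, y, width, height]; width and height must be positive
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"annotation {ann}: bad bbox {bbox}")
    return problems
```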

Need help?

Label Studio docs: https://labelstud.io/guide/
CVAT docs: https://opencv.github.io/cvat/docs/
