# Image Labeling Guide for Knot Detection

## Quick Start: Label Studio (Recommended)

### 1. Install and Launch

```bash
# Install (outside your project venv is fine)
pip install label-studio

# Start the server
label-studio start
```

Open http://localhost:8080 in your browser.

### 2. Create Your Project

1. Click "Create Project"
2. Project Name: "Wood Knot Detection"
3. Data Import: Click "Upload Files" → select your wood images
4. Labeling Setup:
   - Template: "Object Detection with Bounding Boxes"
   - Add label: `knot` (or multiple types: `sound_knot`, `dead_knot`, etc.)

### 3. Label Images

**Keyboard Shortcuts (speeds up 3-5x):**
- `Alt + Click` = Create bounding box
- `Alt + R` = Select rectangle tool
- `Ctrl + Enter` = Submit and move to next
- `Ctrl + Z` = Undo

**Best Practices:**
- Draw boxes tight around each knot
- Include partial knots at image edges
- Label consistently (all knots, or only specific types)
- Take breaks every 30-50 images to maintain quality

### 4. Export to COCO Format

1. Click project name → **Export**
2. Format: **"COCO"**
3. Download the zip file
4. Extract and organize for RF-DETR:

```bash
# After extracting export.zip:
unzip export.zip -d exported/

# Organize into RF-DETR format
mkdir -p dataset/train dataset/valid dataset/test

# Move images and rename JSON
mv exported/images/* dataset/train/
mv exported/result.json dataset/train/_annotations.coco.json

# Split 80/10/10 manually or use a script
# Move ~10% of images + their annotations to valid/
# Move ~10% to test/
```

**Tip**: Label Studio keeps all annotations in one JSON. You'll need to split it into train/valid/test. The `validate_coco_dataset.py` script can help verify the structure.

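The 80/10/10 split can be scripted rather than done by hand. The following is a minimal sketch, not part of this repository: `split_coco` and `write_splits` are hypothetical helper names, and it assumes each `file_name` in `result.json` is a path relative to the extracted export folder (for example `images/board_01.jpg`). Check your own export before relying on that. It reads from `exported/` directly, so run it before moving any files.

```python
import json
import random
import shutil
from pathlib import Path


def split_coco(coco, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split one COCO dict into train/valid/test dicts, image by image."""
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded so the split is repeatable

    n_train = int(len(images) * ratios[0])
    n_valid = int(len(images) * ratios[1])
    chunks = {
        "train": images[:n_train],
        "valid": images[n_train:n_train + n_valid],
        "test": images[n_train + n_valid:],  # remainder goes to test
    }

    splits = {}
    for name, imgs in chunks.items():
        ids = {img["id"] for img in imgs}
        splits[name] = {
            "images": imgs,
            # Keep only the annotations that belong to this split's images
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            "categories": coco["categories"],
        }
    return splits


def write_splits(export_dir, dataset_dir, splits):
    """Copy each split's images and write its _annotations.coco.json."""
    export_dir, dataset_dir = Path(export_dir), Path(dataset_dir)
    for name, coco in splits.items():
        out = dataset_dir / name
        out.mkdir(parents=True, exist_ok=True)
        images = []
        for img in coco["images"]:
            # ASSUMPTION: file_name is relative to the export folder
            src = export_dir / img["file_name"]
            shutil.copy2(src, out / src.name)
            # Store the bare file name, since images sit next to the JSON
            images.append({**img, "file_name": src.name})
        with open(out / "_annotations.coco.json", "w") as f:
            json.dump({**coco, "images": images}, f)
```

To use it, load `exported/result.json` with `json.load`, pass the result to `split_coco`, then call `write_splits("exported", "dataset", splits)`. Splitting by image rather than by annotation keeps every box for a given board in the same split.
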

---

## Alternative: CVAT (More Powerful)

### Setup with Docker

```bash
git clone https://github.com/opencv/cvat
cd cvat
docker compose up -d

# CVAT has no default account; create an admin user before logging in
docker exec -it cvat_server bash -ic 'python3 ~/manage.py createsuperuser'
```

Open http://localhost:8080 and log in with the superuser account you just created.

### Features

- **Keyboard shortcuts**: `N` (new box), `Shift+arrows` (adjust box)
- **Interpolation**: Auto-label between frames (for video)
- **Team mode**: Multiple annotators on same project
- **Quality control**: Review mode for double-checking labels

### Export

1. Actions → Export task dataset
2. Format: "COCO 1.0"
3. Restructure files to match RF-DETR's expected format

---

## Alternative: labelImg (Desktop App)

### Quick Setup

```bash
pip install labelImg
labelImg /path/to/images
```

**Pros:**
- No web server needed
- Works offline
- Very simple interface

**Cons:**
- Exports Pascal VOC by default (not COCO)
- Need to convert format:

```bash
# Use roboflow or pylabel library to convert VOC → COCO
```

---

## Model-Assisted Labeling Workflow

After you have a trained model, speed up labeling 10-20x:

### Step 1: Auto-label new images

```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
  --weights runs/knot_rfdetr_medium/checkpoint_best_total.pth \
  --images-dir unlabeled_images/ \
  --output-json predictions.json \
  --threshold 0.3
```

**Use a low threshold (0.3)** to capture more candidates. It is easier to delete false positives than to add missed knots.

### Step 2: Convert to Label Studio format

```bash
/home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
  --coco-json predictions.json \
  --images-dir unlabeled_images/ \
  --output-json label_studio_tasks.json
```

### Step 3: Import predictions into Label Studio

1. In Label Studio, open your project
2. **Settings → Storage → Add Source Storage → Local files**:
   - Storage Type: Local files
   - Absolute local path: `/full/path/to/unlabeled_images`
   - Click "Add Storage" then "Sync Storage"
3. **Import → Upload Files** → select `label_studio_tasks.json`
4. Each image now loads with **pre-drawn boxes from your model**
5. Click through images, fixing mistakes (much faster than labeling from scratch!)

### Step 4: Active Learning Loop with Label Studio

1. **Initial labeling**: Label 50-100 images manually in Label Studio
2. **Export & prepare**:
   ```bash
   # Export from Label Studio as COCO format
   # Split into train/valid/test folders
   ```
3. **Train RF-DETR**: Run for just 10 epochs (faster iteration)
   ```bash
   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python train_rfdetr.py \
     --dataset-dir dataset/ \
     --output-dir runs/iteration_1 \
     --model medium \
     --epochs 10
   ```
4. **Auto-label new batch**: Get predictions on 500 unlabeled images
   ```bash
   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python auto_label_images.py \
     --weights runs/iteration_1/checkpoint_best_total.pth \
     --images-dir batch_2_images/ \
     --output-json batch_2_predictions.json \
     --threshold 0.3

   /home/dillon/_code/saw_mill_knot_detection/.venv/bin/python convert_to_label_studio.py \
     --coco-json batch_2_predictions.json \
     --images-dir batch_2_images/ \
     --output-json batch_2_ls_tasks.json
   ```
5. **Review in Label Studio**: Import `batch_2_ls_tasks.json` → review/correct (10x faster than from scratch)
6. **Export & retrain**: Add corrected labels to dataset, retrain for 20-50 epochs
7. **Repeat**: Continue with batch 3, 4, etc.

This iterative approach typically achieves **95%+ accuracy with 5-10x less manual effort**.

---

## Tips for High-Quality Labels

### Consistency is Key

- **Same criteria every time**: Decide upfront if you label tiny knots, damaged areas, etc.
- **Box boundaries**: Tight around knot, or include some margin? Pick one and stick to it.
- **Occlusions**: Label partially visible knots? Document your decision.

### Speed vs. Quality

- **First 100 images**: Take your time, establish consistency
- **After 100**: Speed up - model will help catch inconsistencies later
- **Every 500 images**: Audit 20-30 random labels to check quality

### Common Mistakes

1. ❌ Inconsistent box sizes (sometimes tight, sometimes loose)
2. ❌ Missing small knots in some images but labeling them in others
3. ❌ Labeling knot-like wood grain patterns
4. ❌ Fatigue errors after 2+ hours - take breaks!

### Dataset Size Guidelines

- **Minimum**: 200 labeled images (split: 150 train, 30 valid, 20 test)
- **Good**: 500-1000 images
- **Excellent**: 2000+ images
- **With active learning**: Start with 100, grow to 500+ iteratively

---

## Converting Other Formats to COCO

If you have labels in another format:

### From YOLO format:

```python
from pylabel import importer

dataset = importer.ImportYoloV5(
    path="yolo_labels/",
    img_path="images/",
    cat_names=['knot']
)
dataset.export.ExportToCoco(output_path="coco_format/")
```

### From Pascal VOC:

```python
from pylabel import importer

dataset = importer.ImportVOC(path="voc_annotations/")
dataset.export.ExportToCoco()
```

---

## Troubleshooting

**Label Studio won't start:**
- Try: `label-studio reset` then `label-studio start`

**CVAT Docker issues:**
- Check: `docker compose logs`
- Ensure ports 8080, 8070 are free

**Export format doesn't match RF-DETR:**
- See [validate_coco_dataset.py](validate_coco_dataset.py) to check your format
- Your JSON needs: `images`, `annotations`, `categories` keys

**Need help?**
- Label Studio docs: https://labelstud.io/guide/
- CVAT docs: https://opencv.github.io/cvat/docs/
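
For the export-format check mentioned under Troubleshooting, a quick standalone sanity test can narrow down what is wrong with a JSON file. The sketch below is an illustration only: `check_coco` is a hypothetical helper, not the repository's `validate_coco_dataset.py`, and it tests only the three required top-level keys plus basic cross-references between them.

```python
import json

REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco(coco):
    """Return a list of problems found in a COCO dict; empty means none."""
    problems = [f"missing top-level key: {k}" for k in REQUIRED_KEYS if k not in coco]
    if problems:
        return problems  # cross-checks below need all three keys

    image_ids = {img["id"] for img in coco["images"]}
    category_ids = {cat["id"] for cat in coco["categories"]}

    for ann in coco["annotations"]:
        label = f"annotation {ann.get('id')}"
        if ann.get("image_id") not in image_ids:
            problems.append(f"{label}: unknown image_id {ann.get('image_id')}")
        if ann.get("category_id") not in category_ids:
            problems.append(f"{label}: unknown category_id {ann.get('category_id')}")
        # COCO boxes are [x, y, width, height] with positive width and height
        bbox = ann.get("bbox", [])
        if len(bbox) != 4 or bbox[2] <= 0 or bbox[3] <= 0:
            problems.append(f"{label}: invalid bbox {bbox}")
    return problems


def check_coco_file(path):
    """Load a COCO JSON file from disk and run check_coco on it."""
    with open(path) as f:
        return check_coco(json.load(f))
```

Calling `check_coco_file("dataset/train/_annotations.coco.json")` returns an empty list when nothing is flagged, or one message per problem otherwise.
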