saw_mill_knot_detection/RTDETR_README.md
2025-12-23 18:24:40 -07:00

RT-DETR Training for OAK-D Camera Deployment

RT-DETR (Real-Time Detection Transformer) is released under the Apache 2.0 license, so it is free for commercial use. It is designed for real-time detection and runs well on edge devices like the OAK-D 4 Pro.

Why RT-DETR?

  • Apache 2.0 license - truly free for commercial use
  • Excellent OAK camera compatibility - exports cleanly to OpenVINO
  • Real-time performance - roughly 20-40 FPS on OAK-D, depending on model size (see Model Comparison)
  • Modern transformer architecture - accuracy competitive with YOLO
  • Easy deployment - direct export to OpenVINO format

Quick Start

1. Annotate Images

Use the Tkinter annotation GUI:

.venv/bin/python tk_annotation_gui.py --images-dir IMAGE/
  • Load your images from Settings
  • Annotate knots manually or use auto-labeling
  • Aim for 100+ annotated images for good results

2. Train Model

Train from the command line:

.venv/bin/python train_model.py \
   --framework rtdetr \
   --dataset dataset_prepared \
   --output runs/rtdetr_training \
   --model-size small \
   --epochs 100

3. Test Model

.venv/bin/python predict_rtdetr.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --image test_image.jpg

4. Export for OAK-D

Export to OpenVINO format:

.venv/bin/python export_rtdetr_oak.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --img-size 640

This creates:

  • best_openvino_model/ - OpenVINO IR format (.xml + .bin files)
  • best.onnx - ONNX format (intermediate)
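Before moving on to blob conversion, it can help to confirm that the export produced a complete .xml/.bin pair. This is a minimal sketch, assuming the directory layout listed above; `find_openvino_pairs` is a hypothetical helper and not one of the repository scripts.

```python
from pathlib import Path


def find_openvino_pairs(model_dir):
    """Return (xml, bin) path pairs for every complete IR model in model_dir.

    An OpenVINO IR model is an .xml topology file plus a .bin weights file
    with the same stem. An .xml without its matching .bin is skipped.
    """
    pairs = []
    for xml_path in sorted(Path(model_dir).glob("*.xml")):
        bin_path = xml_path.with_suffix(".bin")
        if bin_path.is_file():
            pairs.append((xml_path, bin_path))
    return pairs


if __name__ == "__main__":
    found = find_openvino_pairs("best_openvino_model")
    if not found:
        print("No complete .xml/.bin pair found; re-run the export step.")
    for xml_path, bin_path in found:
        print(f"OK: {xml_path.name} + {bin_path.name}")
```

If nothing is reported as OK, the converter in the next step has nothing valid to upload.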

5. Convert to Blob for OAK

Option A: Online converter (easiest)

  1. Go to https://blobconverter.luxonis.com/
  2. Upload best_openvino_model/model.xml
  3. Select "OAK-D 4 Pro"
  4. Download .blob file

Option B: Command line

pip install blobconverter
blobconverter --openvino-xml best_openvino_model/model.xml \
              --openvino-bin best_openvino_model/model.bin \
              --shaves 6
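The same conversion can also be driven from Python through the `blobconverter` package's `from_openvino` function. The sketch below assumes the .bin file sits next to the .xml with the same stem; `convert_to_blob` is a hypothetical wrapper name. Like the web page in Option A, the package sends the model to Luxonis's online conversion service, so it needs network access.

```python
from pathlib import Path


def convert_to_blob(xml_path, shaves=6):
    """Compile an OpenVINO IR model to a .blob and return the blob's path."""
    xml_path = Path(xml_path)
    # Assumption: the weights file shares the .xml file's name and directory.
    bin_path = xml_path.with_suffix(".bin")
    for path in (xml_path, bin_path):
        if not path.is_file():
            raise FileNotFoundError(f"Missing OpenVINO file: {path}")

    # Imported lazily so the path checks work without the package installed.
    import blobconverter

    return blobconverter.from_openvino(
        xml=str(xml_path),
        bin=str(bin_path),
        data_type="FP16",
        shaves=shaves,
    )


if __name__ == "__main__":
    print(convert_to_blob("best_openvino_model/model.xml"))
```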

6. Deploy to OAK-D Camera

Example DepthAI script:

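import depthai as dai

# Build the pipeline (DepthAI v2 API)
pipeline = dai.Pipeline()

# Color camera: the preview size must match the model input (640x640)
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 640)
cam.setInterleaved(False)  # planar frames rather than interleaved pixels

# Neural network node running the converted blob
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("best.blob")
cam.preview.link(nn.input)

# Send the raw network output to the host
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

# Run
with dai.Device(pipeline) as device:
    queue = device.getOutputQueue("detections")

    while True:
        nn_data = queue.get()  # blocks until the next result arrives
        # Decode boxes and scores from nn_data here...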
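The processing step left open in the loop above depends on how the exported model lays out its outputs, which this README does not specify. As a rough sketch, assume the network yields one box per query as normalized (cx, cy, w, h) plus per-class scores already in the 0..1 range. Both the layout and the `decode_detections` helper are assumptions for illustration; check the output layer names and shapes of your own blob before relying on them.

```python
def decode_detections(boxes, scores, img_w=640, img_h=640, conf_thresh=0.5):
    """Convert normalized center-format boxes to pixel corner-format boxes.

    boxes:  list of [cx, cy, w, h], each value in 0..1 (assumed layout)
    scores: list of per-class score lists, one list per box (assumed layout)
    """
    results = []
    for box, class_scores in zip(boxes, scores):
        # Pick the highest-scoring class for this query.
        best_class = max(range(len(class_scores)), key=class_scores.__getitem__)
        confidence = class_scores[best_class]
        if confidence < conf_thresh:
            continue
        cx, cy, w, h = box
        results.append({
            "class_id": best_class,
            "confidence": confidence,
            "box": (
                (cx - w / 2) * img_w,  # x1
                (cy - h / 2) * img_h,  # y1
                (cx + w / 2) * img_w,  # x2
                (cy + h / 2) * img_h,  # y2
            ),
        })
    return results


if __name__ == "__main__":
    demo_boxes = [[0.5, 0.5, 0.5, 0.25], [0.1, 0.1, 0.1, 0.1]]
    demo_scores = [[0.9], [0.2]]  # single "knot" class
    for det in decode_detections(demo_boxes, demo_scores):
        print(det)
```

The sketch applies only a confidence threshold, since DETR-style models are designed to work without a non-maximum suppression step.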

Model Comparison

| Model | Size | Speed (OAK-D) | Accuracy | License |
|---|---|---|---|---|
| RT-DETR r18 | ~15 MB | 30-40 FPS | Good | Apache 2.0 |
| RT-DETR r34 | ~30 MB | 20-30 FPS | Better | Apache 2.0 |
| YOLOv11n | ~6 MB | 50-60 FPS | Good | AGPL-3.0 |
| YOLOv6n | ~10 MB | 40-50 FPS | Good | GPL-3.0 |
| RF-DETR nano | ~15 MB | 10-20 FPS* | Good | Check repo |

*May have compatibility issues with OpenVINO
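The FPS figures above are approximate, so it is worth measuring throughput on your own device. One simple way is a sliding-window counter that you tick once per received result. `FpsMeter` below is a hypothetical helper and a minimal sketch, not part of the repository.

```python
import time
from collections import deque


class FpsMeter:
    """Sliding-window FPS estimate over the last `window` frame timestamps."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame. `now` can be injected to make testing easy."""
        self.stamps.append(time.monotonic() if now is None else now)

    def fps(self):
        """Frames per second across the window, or 0.0 with too few samples."""
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Call `tick()` right after each result arrives from the device queue and print `fps()` every few seconds.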

Training Tips

  1. Dataset size:

    • Minimum: 50 images
    • Good: 200+ images
    • Excellent: 1000+ images
  2. Data diversity:

    • Different wood types
    • Various lighting conditions
    • Multiple knot sizes/types
    • Different angles
  3. Training settings:

    • Start with rtdetr-r18 for fastest iteration
    • Use batch-size=8 if your GPU has 8 GB or more of memory
    • Train for 100-200 epochs
    • Use early stopping (patience=20)
  4. Data augmentation (automatic):

    • Flips, rotations
    • Color adjustments
    • Crops and scales
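The early-stopping tip above (patience=20) means: stop once the validation metric has not improved for 20 consecutive epochs. How `train_model.py` implements this is not shown here, so the class below is only an illustrative sketch with hypothetical names, assuming a metric where higher is better (for example mAP).

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest gain that counts as improvement
        self.best = None
        self.bad_epochs = 0

    def step(self, metric):
        """Record this epoch's validation metric; return True to stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    for epoch, val_map in enumerate([0.50, 0.60, 0.60, 0.59, 0.58]):
        if stopper.step(val_map):
            print(f"Stopping at epoch {epoch}, best metric {stopper.best}")
            break
```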

Troubleshooting

Training is slow:

  • Reduce batch size if GPU memory is exhausted
  • Use smaller model (r18)
  • Check GPU usage with nvidia-smi
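To log GPU usage from a script rather than watching `nvidia-smi` by hand, you can use its CSV query mode. This is a small sketch; the function names and dictionary keys are made up for illustration, and it assumes every queried field comes back as a number.

```python
import subprocess

# nvidia-smi query mode: one CSV line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text):
    """Parse lines like '45, 3021, 8192' into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, used, total = (float(field) for field in line.split(","))
        stats.append({
            "util_pct": util,
            "mem_used_mib": used,
            "mem_total_mib": total,
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi and return the parsed stats (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(read_gpu_stats()):
        print(index, gpu)
```

As a rule of thumb, consistently low utilization during training often points to a data-loading bottleneck, while memory close to the total suggests a smaller batch size.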

Low accuracy:

  • Add more training data
  • Train longer (more epochs)
  • Use larger model (r34 or r50)
  • Check your annotations for errors

OAK deployment fails:

  • Ensure OpenVINO export succeeded
  • Check blob size (<200MB for OAK-D)
  • Verify input size matches training (640x640)
  • Try FP16 instead of FP32 to reduce size
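The size checks above are easy to script. The sketch below reports a blob's size on disk and gives a back-of-the-envelope estimate of weight storage at FP32 (4 bytes per weight) versus FP16 (2 bytes per weight). Both helper names are hypothetical, and the 20 million parameter count in the demo is an illustrative figure, not a measured value for any RT-DETR variant.

```python
from pathlib import Path

MIB = 1024 * 1024


def blob_size_mb(path):
    """Size of a file on disk in MiB."""
    return Path(path).stat().st_size / MIB


def estimated_weights_mb(num_params, bytes_per_weight):
    """Rough weight storage in MiB: FP32 uses 4 bytes, FP16 uses 2."""
    return num_params * bytes_per_weight / MIB


if __name__ == "__main__":
    params = 20_000_000  # hypothetical parameter count
    print(f"FP32: ~{estimated_weights_mb(params, 4):.0f} MiB")
    print(f"FP16: ~{estimated_weights_mb(params, 2):.0f} MiB")
    if Path("best.blob").is_file():
        print(f"best.blob: {blob_size_mb('best.blob'):.1f} MiB")
```

Halving the bytes per weight halves the estimate, which is why FP16 is the first thing to try when a blob is too large.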

Resources

License

RT-DETR is Apache 2.0 licensed - you can use it for:

  • Personal projects
  • Commercial products
  • Internal business tools
  • Proprietary software

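No paid license is required. The only conditions are the standard Apache 2.0 ones, such as keeping the license and attribution notices when you redistribute the code.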