# RT-DETR Training for OAK-D Camera Deployment

RT-DETR (Real-Time Detection Transformer) is Apache 2.0 licensed and free for commercial use. It's designed for real-time detection and runs well on edge devices like the OAK-D 4 Pro.
## Why RT-DETR?
- ✅ Apache 2.0 license - truly free for commercial use
- ✅ Excellent OAK camera compatibility - exports cleanly to OpenVINO
- ✅ Real-time performance - 30-60 FPS on OAK-D 4 Pro
- ✅ Modern transformer architecture - competitive accuracy with YOLO
- ✅ Easy deployment - direct export to OpenVINO format
## Quick Start

### 1. Annotate Images

Use the Tkinter annotation GUI:

```bash
.venv/bin/python tk_annotation_gui.py --images-dir IMAGE/
```
- Load your images from Settings
- Annotate knots manually or use auto-labeling
- Aim for 100+ annotated images for good results
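If the annotation GUI exports YOLO-style text labels (an assumption: one `class cx cy w h` line per box, all values normalized to [0, 1]), a label line maps back to pixel coordinates like this:

```python
# Sketch: convert one YOLO-format label line to pixel coordinates.
# Assumes normalized "class cx cy w h" labels -- adjust if your
# annotation export uses a different format.

def yolo_to_pixels(line: str, img_w: int, img_h: int):
    """Return (class_id, x_min, y_min, x_max, y_max) in pixels."""
    cls, cx, cy, w, h = line.split()
    cx, cy, w, h = float(cx), float(cy), float(w), float(h)
    x_min = (cx - w / 2) * img_w
    y_min = (cy - h / 2) * img_h
    x_max = (cx + w / 2) * img_w
    y_max = (cy + h / 2) * img_h
    return int(cls), x_min, y_min, x_max, y_max

# Example: a knot centered in a 640x640 image, 10% of its size
print(yolo_to_pixels("0 0.5 0.5 0.1 0.1", 640, 640))
# -> (0, 288.0, 288.0, 352.0, 352.0)
```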
### 2. Train Model

Train from the command line:

```bash
.venv/bin/python train_model.py \
    --framework rtdetr \
    --dataset dataset_prepared \
    --output runs/rtdetr_training \
    --model-size small \
    --epochs 100
```
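If `train_model.py` wraps the Ultralytics trainer (an assumption), the equivalent direct call looks roughly like this. The checkpoint name and dataset YAML path are illustrative; Ultralytics' RT-DETR checkpoints use their own size naming, which may not match this project's `--model-size` values:

```python
# Sketch of training RT-DETR directly via the Ultralytics API
# (assumes the `ultralytics` package and a YOLO-style dataset YAML).
from ultralytics import RTDETR

model = RTDETR("rtdetr-l.pt")  # pretrained checkpoint to fine-tune
model.train(
    data="dataset_prepared/data.yaml",  # hypothetical dataset YAML path
    epochs=100,
    imgsz=640,
    batch=8,
    patience=20,  # early stopping, matching the tips below
)
```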
### 3. Test Model

```bash
.venv/bin/python predict_rtdetr.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --image test_image.jpg
```
### 4. Export for OAK-D

Export to OpenVINO format:

```bash
.venv/bin/python export_rtdetr_oak.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --img-size 640
```

This creates:
- `best_openvino_model/` - OpenVINO IR format (`.xml` + `.bin` files)
- `best.onnx` - ONNX format (intermediate)
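If the export script delegates to Ultralytics (an assumption), the underlying call is roughly this; `half=True` produces FP16 weights, which roughly halves the resulting blob size:

```python
# Sketch of the OpenVINO export step via the Ultralytics API
# (assumes the `ultralytics` package backs this project's export script).
from ultralytics import RTDETR

model = RTDETR("runs/rtdetr_training/training/weights/best.pt")
model.export(format="openvino", imgsz=640, half=True)  # half=True -> FP16
```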
### 5. Convert to Blob for OAK

**Option A: Online converter (easiest)**
- Go to https://blobconverter.luxonis.com/
- Upload `best_openvino_model/model.xml`
- Select "OAK-D 4 Pro"
- Download the `.blob` file

**Option B: Command line**

```bash
pip install blobconverter
blobconverter --openvino-xml best_openvino_model/model.xml \
    --shaves 6
```
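The `blobconverter` package also has a Python API, which is convenient inside a build script. Note it calls Luxonis' online conversion service, so it needs internet access; the file paths below assume the export layout from step 4:

```python
# Sketch using blobconverter's Python API instead of the CLI.
import blobconverter

blob_path = blobconverter.from_openvino(
    xml="best_openvino_model/model.xml",
    bin="best_openvino_model/model.bin",
    data_type="FP16",
    shaves=6,
)
print(blob_path)  # path to the downloaded .blob file
```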
### 6. Deploy to OAK-D Camera

Example DepthAI script:

```python
import depthai as dai
import cv2

# Create pipeline
pipeline = dai.Pipeline()

# Camera
cam = pipeline.createColorCamera()
cam.setPreviewSize(640, 640)
cam.setInterleaved(False)

# Neural network
nn = pipeline.createNeuralNetwork()
nn.setBlobPath("best.blob")
cam.preview.link(nn.input)

# Output
xout = pipeline.createXLinkOut()
xout.setStreamName("detections")
nn.out.link(xout.input)

# Run
with dai.Device(pipeline) as device:
    queue = device.getOutputQueue("detections")
    while True:
        detections = queue.get()
        # Process detections...
```
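The "Process detections" step depends on the exact tensor layout your export produces, so treat this as a sketch: it assumes each candidate row is `[cx, cy, w, h, score_class0, ...]` with coordinates normalized to [0, 1], and filters by a confidence threshold. Verify the layout against your own model before relying on it:

```python
# Sketch: decode raw candidate rows from the NN output queue.
# Assumed row layout: [cx, cy, w, h, per-class scores...], normalized.

def decode(rows, img_w=640, img_h=640, conf_thresh=0.5):
    """Return (class_id, confidence, x_min, y_min, x_max, y_max) tuples."""
    results = []
    for row in rows:
        cx, cy, w, h = row[:4]
        scores = row[4:]
        cls = max(range(len(scores)), key=lambda i: scores[i])
        conf = scores[cls]
        if conf < conf_thresh:
            continue
        results.append((
            cls, conf,
            (cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h,
        ))
    return results

# One confident candidate and one below the threshold
print(decode([[0.5, 0.5, 0.2, 0.2, 0.9], [0.1, 0.1, 0.05, 0.05, 0.2]]))
# -> [(0, 0.9, 256.0, 256.0, 384.0, 384.0)]
```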
## Model Comparison
| Model | Size | Speed (OAK-D) | Accuracy | License |
|---|---|---|---|---|
| RT-DETR r18 | ~15MB | 30-40 FPS | Good | Apache 2.0 ✅ |
| RT-DETR r34 | ~30MB | 20-30 FPS | Better | Apache 2.0 ✅ |
| YOLOv11n | ~6MB | 50-60 FPS | Good | AGPL ❌ |
| YOLOv6n | ~10MB | 40-50 FPS | Good | MIT ✅ |
| RF-DETR nano | ~15MB | 10-20 FPS* | Good | Check repo |
*May have compatibility issues with OpenVINO
## Training Tips

- Dataset size:
  - Minimum: 50 images
  - Good: 200+ images
  - Excellent: 1000+ images
- Data diversity:
  - Different wood types
  - Various lighting conditions
  - Multiple knot sizes/types
  - Different angles
- Training settings:
  - Start with `rtdetr-r18` for fastest iteration
  - Use `batch-size=8` if you have an 8GB+ GPU
  - Train for 100-200 epochs
  - Use early stopping (`patience=20`)
- Data augmentation (automatic):
  - Flips, rotations
  - Color adjustments
  - Crops and scales
## Troubleshooting

**Training is slow:**
- Reduce the batch size
- Use a smaller model (r18)
- Check GPU usage with `nvidia-smi`
**Low accuracy:**
- Add more training data
- Train longer (more epochs)
- Use a larger model (r34 or r50)
- Check your annotations for errors
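A quick way to check annotations for errors is an automated pass over the label files. A minimal sketch, assuming YOLO-style normalized `class cx cy w h` label lines:

```python
# Sketch: sanity-check YOLO-format label lines for obvious annotation
# errors (wrong field count, values outside [0, 1]). Adapt to your
# dataset layout if the label format differs.

def find_bad_labels(lines):
    """Return (line_number, reason) for each malformed label line."""
    problems = []
    for i, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 5:
            problems.append((i, "expected 5 fields"))
            continue
        coords = [float(p) for p in parts[1:]]
        if any(c < 0.0 or c > 1.0 for c in coords):
            problems.append((i, "coordinate outside [0, 1]"))
    return problems

print(find_bad_labels(["0 0.5 0.5 0.1 0.1", "0 1.3 0.5 0.1 0.1", "0 0.5"]))
# -> [(2, 'coordinate outside [0, 1]'), (3, 'expected 5 fields')]
```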
**OAK deployment fails:**
- Ensure the OpenVINO export succeeded
- Check the blob size (<200MB for OAK-D)
- Verify the input size matches training (640x640)
- Try FP16 instead of FP32 to reduce size
## License

RT-DETR is Apache 2.0 licensed. You can use it for:
- ✅ Personal projects
- ✅ Commercial products
- ✅ Internal business tools
- ✅ Proprietary software
No restrictions, no paid licenses required!