# RT-DETR Training for OAK-D Camera Deployment

RT-DETR (Real-Time Detection Transformer) is released under the Apache 2.0 license, so it is **free for commercial use**. It is designed for real-time detection and works well on edge devices like the OAK-D 4 Pro.

## Why RT-DETR?

- ✅ **Apache 2.0 license** - free for commercial use
- ✅ **Excellent OAK camera compatibility** - exports cleanly to OpenVINO
- ✅ **Real-time performance** - 30-40 FPS on OAK-D 4 Pro with the r18 model
- ✅ **Modern transformer architecture** - accuracy competitive with YOLO
- ✅ **Easy deployment** - direct export to OpenVINO format

## Quick Start

### 1. Annotate Images

Use the annotation GUI:
```bash
.venv/bin/python annotation_gui.py
```

- Load your images from Settings
- Annotate knots manually or use auto-labeling
- Aim for 100+ annotated images for good results

### 2. Train Model

From the GUI:
1. Go to the **Training** tab
2. Click "Prepare Dataset" (creates train/valid/test splits)
3. Select the **RT-DETR** framework
4. Choose a model size:
   - `nano` (r18): Fastest, 30-40 FPS on OAK
   - `small` (r34): Balanced
   - `medium` (r50): More accurate
   - `base` (l): Best accuracy, slower
5. Click "Start Training"

Or from the command line:
```bash
.venv/bin/python train_rtdetr.py \
    --dataset-dir dataset_prepared \
    --model rtdetr-r18 \
    --epochs 100 \
    --batch-size 8
```
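
The "Prepare Dataset" step above creates train/valid/test splits. The sketch below shows one way such a split can be made reproducible. The function name `split_dataset`, the 70/20/10 ratios, and the seed are assumptions for illustration, not taken from the GUI's actual implementation.

```python
import random


def split_dataset(items, train=0.7, valid=0.2, seed=42):
    """Shuffle items deterministically and split them into train/valid/test lists.

    The test split receives whatever remains after train and valid.
    The default ratios are illustrative assumptions.
    """
    if not 0 < train < 1 or not 0 <= valid < 1 or train + valid >= 1:
        raise ValueError("train and valid must be fractions that leave room for test")
    shuffled = sorted(items)  # fixed starting order, independent of input order
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


# Hypothetical file names, only to show the call
names = [f"img_{i:03d}.jpg" for i in range(100)]
splits = split_dataset(names)
print({name: len(files) for name, files in splits.items()})
```

Fixing the seed keeps the test images out of the training set across repeated runs, which makes accuracy numbers comparable between experiments.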

### 3. Test Model

```bash
.venv/bin/python predict_rtdetr.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --image test_image.jpg
```

### 4. Export for OAK-D

Export to OpenVINO format:
```bash
.venv/bin/python export_rtdetr_oak.py \
    --weights runs/rtdetr_training/training/weights/best.pt \
    --img-size 640
```

This creates:
- `best_openvino_model/` - OpenVINO IR format (.xml + .bin files)
- `best.onnx` - ONNX format (intermediate)
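
Before converting, it can help to confirm that the export really produced a complete IR pair. The sketch below is a minimal check. The helper name `find_ir_pairs` is hypothetical, and it assumes each `.xml` file sits next to a `.bin` file with the same base name.

```python
from pathlib import Path


def find_ir_pairs(model_dir):
    """Return (xml, bin) path pairs for each complete OpenVINO IR model in model_dir."""
    pairs = []
    for xml_path in sorted(Path(model_dir).glob("*.xml")):
        bin_path = xml_path.with_suffix(".bin")  # weights share the base name
        if bin_path.is_file():
            pairs.append((xml_path, bin_path))
    return pairs


# Prints nothing if the directory is missing or holds no complete pair
for xml_path, bin_path in find_ir_pairs("best_openvino_model"):
    size_mb = bin_path.stat().st_size / 1e6
    print(f"{xml_path.name} + {bin_path.name} ({size_mb:.1f} MB of weights)")
```

An empty result usually means the export step failed or wrote its files somewhere else.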

### 5. Convert to Blob for OAK

**Option A: Online converter** (easiest)
1. Go to https://blobconverter.luxonis.com/
2. Upload `best_openvino_model/model.xml` together with its matching `.bin` weights file
3. Select "OAK-D 4 Pro"
4. Download the `.blob` file

**Option B: Command line**
```bash
pip install blobconverter
# An OpenVINO IR model is a pair of files: pass both the .xml and the .bin
python -m blobconverter \
    --openvino-xml best_openvino_model/model.xml \
    --openvino-bin best_openvino_model/model.bin \
    --shaves 6
```

### 6. Deploy to OAK-D Camera

Example DepthAI script:
```python
import depthai as dai  # written against the DepthAI v2 API

# Create pipeline
pipeline = dai.Pipeline()

# Camera: 640x640 planar preview frames, matching the exported input size
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(640, 640)
cam.setInterleaved(False)

# Neural network: runs the converted blob on the device
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("best.blob")
cam.preview.link(nn.input)

# Output: stream the raw network results to the host
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("detections")
nn.out.link(xout.input)

# Run
with dai.Device(pipeline) as device:
    queue = device.getOutputQueue("detections")

    while True:
        detections = queue.get()  # dai.NNData holding the raw output tensors
        # Process detections...
```
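
The "Process detections..." step depends on the output layout of the exported model, which is not specified here. As a sketch only, the function below assumes a single output tensor flattened into rows of `[cx, cy, w, h, score_0, ..., score_{C-1}]` with box values normalized to `[0, 1]`. The name `decode_detections` and its parameters are hypothetical, so verify the layout of your own export before relying on it.

```python
def decode_detections(flat, num_classes, conf_threshold=0.5, img_size=640):
    """Convert a flat list of floats into (x1, y1, x2, y2, score, class_id) tuples.

    Assumes rows of [cx, cy, w, h, score_0, ..., score_{C-1}] with box values
    normalized to [0, 1]. This layout is an assumption, not a guarantee.
    """
    stride = 4 + num_classes
    if len(flat) % stride != 0:
        raise ValueError("output length is not a multiple of 4 + num_classes")
    results = []
    for start in range(0, len(flat), stride):
        cx, cy, w, h = flat[start:start + 4]
        scores = flat[start + 4:start + stride]
        class_id = max(range(num_classes), key=lambda i: scores[i])
        score = scores[class_id]
        if score < conf_threshold:
            continue  # drop low-confidence queries
        # Center/size format to pixel corner coordinates
        results.append((
            (cx - w / 2) * img_size,
            (cy - h / 2) * img_size,
            (cx + w / 2) * img_size,
            (cy + h / 2) * img_size,
            score,
            class_id,
        ))
    return results
```

Inside the loop above, this could be called as `decode_detections(detections.getFirstLayerFp16(), num_classes=1)`, assuming a model trained on a single knot class. RT-DETR does not need non-maximum suppression, so thresholding on the score is the only filtering step in this sketch.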

## Model Comparison

| Model | Size | Speed (OAK-D) | Accuracy | License |
|-------|------|---------------|----------|---------|
| RT-DETR r18 | ~15MB | 30-40 FPS | Good | Apache 2.0 ✅ |
| RT-DETR r34 | ~30MB | 20-30 FPS | Better | Apache 2.0 ✅ |
| YOLO11n | ~6MB | 50-60 FPS | Good | AGPL-3.0 ❌ |
| YOLOv6n | ~10MB | 40-50 FPS | Good | GPL-3.0 ❌ |
| RF-DETR nano | ~15MB | 10-20 FPS* | Good | Check repo |

*May have compatibility issues with OpenVINO
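
The speed column is best confirmed on your own device and model. A rolling counter like the sketch below can be ticked once per received result to measure throughput. The class name `FPSCounter` and the 30-frame window are assumptions for illustration.

```python
import time
from collections import deque


class FPSCounter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.timestamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one frame and return the current FPS (0.0 until two frames are seen)."""
        self.timestamps.append(time.monotonic() if now is None else now)
        if len(self.timestamps) < 2:
            return 0.0
        elapsed = self.timestamps[-1] - self.timestamps[0]
        # N timestamps span N - 1 frame intervals
        return (len(self.timestamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling `tick()` right after each `queue.get()` in the deployment loop measures end-to-end throughput, including transfer to the host, rather than inference time alone.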

## Training Tips

1. **Dataset size**:
   - Minimum: 50 images
   - Good: 200+ images
   - Excellent: 1000+ images

2. **Data diversity**:
   - Different wood types
   - Various lighting conditions
   - Multiple knot sizes/types
   - Different angles

3. **Training settings**:
   - Start with `rtdetr-r18` for fastest iteration
   - Use `--batch-size 8` if your GPU has 8GB+ of memory
   - Train for 100-200 epochs
   - Use early stopping (patience=20)

4. **Data augmentation** (automatic):
   - Flips, rotations
   - Color adjustments
   - Crops and scales
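
For readers unfamiliar with the early-stopping tip above: training ends once the validation metric has failed to improve for `patience` consecutive epochs. The sketch below illustrates the idea in isolation. The class `EarlyStopping` is a hypothetical helper, not the training script's actual implementation, and it assumes a metric where higher is better (such as mAP).

```python
class EarlyStopping:
    """Signal a stop when the metric has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta  # smallest change that counts as an improvement
        self.best = None
        self.stale_epochs = 0

    def step(self, metric):
        """Record one epoch's validation metric; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

With `patience=20` and a 100-200 epoch budget, a run that plateaus early stops well before the limit, while `best.pt` still holds the weights from the best epoch.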

## Troubleshooting

**Training is slow:**
- Reduce the batch size if GPU memory is the bottleneck
- Use a smaller model (r18)
- Check GPU usage with `nvidia-smi`

**Low accuracy:**
- Add more training data
- Train longer (more epochs)
- Use a larger model (r34 or r50)
- Check your annotations for errors

**OAK deployment fails:**
- Ensure the OpenVINO export succeeded
- Check blob size (<200MB for OAK-D)
- Verify that the input size matches training (640x640)
- Try FP16 instead of FP32 to reduce size
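
The FP16 tip follows from simple arithmetic: each weight takes 4 bytes in FP32 and 2 bytes in FP16, so halving the precision roughly halves the file. The sketch below estimates the weight size from a parameter count and compares it to the 200MB limit mentioned above. The function names and the 60-million-parameter figure are illustrative assumptions, and a real blob carries some extra overhead beyond the raw weights.

```python
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2}


def estimate_weights_mb(num_params, precision="fp16"):
    """Approximate size of the weights in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e6


def fits_on_device(num_params, precision="fp16", limit_mb=200):
    """True if the estimated weight size is under the given limit."""
    return estimate_weights_mb(num_params, precision) < limit_mb


# Hypothetical model with 60 million parameters
for precision in ("fp32", "fp16"):
    size = estimate_weights_mb(60_000_000, precision)
    print(f"{precision}: ~{size:.0f} MB, fits: {fits_on_device(60_000_000, precision)}")
```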

## Resources

- [RT-DETR Paper](https://arxiv.org/abs/2304.08069)
- [Ultralytics RT-DETR Docs](https://docs.ultralytics.com/models/rtdetr/)
- [OAK-D Docs](https://docs.luxonis.com/)
- [DepthAI Examples](https://github.com/luxonis/depthai-experiments)
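
## License

RT-DETR is Apache 2.0 licensed - you can use it for:
- ✅ Personal projects
- ✅ Commercial products
- ✅ Internal business tools
- ✅ Proprietary software

No paid license is required. Apache 2.0 does ask that redistributed copies keep the license text and attribution notices. Note that the Ultralytics package, whose RT-DETR docs are linked under Resources, is AGPL-3.0 licensed, so check which implementation your training scripts depend on.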