Object Detection
This package lists contributed object detection models.
Faster R-CNN
- class pl_bolts.models.detection.faster_rcnn.faster_rcnn_module.FasterRCNN(learning_rate=0.0001, num_classes=91, backbone=None, fpn=True, pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs)
Bases:
pytorch_lightning.
PyTorch Lightning implementation of Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.
Paper authors: Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
- Model implemented by:
Teddy Koker <https://github.com/teddykoker>
- During training, the model expects both the input tensors and targets (a list of dictionaries), each containing:
boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format.
labels (Int64Tensor[N]): the class label for each ground-truth box.
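As a rough illustration of the target format described above, the following sketch builds the per-image dictionaries from hypothetical [x, y, width, height] annotations using plain Python lists. The helper names xywh_to_xyxy and make_target are assumptions made for this example and are not part of pl_bolts.

```python
from typing import Dict, List, Sequence


def xywh_to_xyxy(box: Sequence[float]) -> List[float]:
    """Convert a hypothetical [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = box
    return [float(x), float(y), float(x + w), float(y + h)]


def make_target(boxes_xywh: Sequence[Sequence[float]], labels: Sequence[int]) -> Dict[str, list]:
    """Build one target dictionary with N boxes and N labels (illustrative helper)."""
    if len(boxes_xywh) != len(labels):
        raise ValueError("each box needs exactly one label")
    boxes = [xywh_to_xyxy(b) for b in boxes_xywh]
    for x1, y1, x2, y2 in boxes:
        # A box in [x1, y1, x2, y2] format must have positive width and height.
        if x2 <= x1 or y2 <= y1:
            raise ValueError("degenerate box: expected x2 > x1 and y2 > y1")
    return {"boxes": boxes, "labels": [int(label) for label in labels]}


# One dictionary per image; the list has the same length as the image batch.
targets = [
    make_target([[10, 20, 30, 40], [0, 0, 5, 5]], [3, 7]),
    make_target([[50, 60, 20, 10]], [1]),
]
```

Before being passed to the model, the lists would be converted to tensors of the documented types, for example with torch.tensor(target["boxes"], dtype=torch.float32) and torch.tensor(target["labels"], dtype=torch.int64).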
CLI command:
# PascalVOC
python faster_rcnn_module.py --gpus 1 --pretrained True
- Parameters
num_classes (int) – Number of detection classes (including background).
backbone (Union[str, Module, None]) – Pretrained backbone CNN architecture or torch.nn.Module instance.
fpn (bool) – If True, creates a Feature Pyramid Network on top of ResNet-based CNNs.
pretrained (bool) – If True, returns a model pre-trained on COCO train2017.
pretrained_backbone (bool) – If True, returns a model with a backbone pre-trained on ImageNet.
trainable_backbone_layers (int) – Number of trainable ResNet layers, starting from the final block.
RetinaNet
- class pl_bolts.models.detection.retinanet.retinanet_module.RetinaNet(learning_rate=0.0001, num_classes=91, backbone=None, fpn=True, pretrained=False, pretrained_backbone=True, trainable_backbone_layers=3, **kwargs)
Bases:
pytorch_lightning.
PyTorch Lightning implementation of RetinaNet.
Paper: Focal Loss for Dense Object Detection.
Paper authors: Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár
- Model implemented by:
Aditya Oke <https://github.com/oke-aditya>
- During training, the model expects both the input tensors and targets (a list of dictionaries), each containing:
boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format.
labels (Int64Tensor[N]): the class label for each ground-truth box.
CLI command:
# PascalVOC using LightningCLI
python retinanet_module.py --trainer.gpus 1 --model.pretrained True
- Parameters
num_classes (int) – Number of detection classes (including background).
backbone (Optional[str]) – Pretrained backbone CNN architecture.
fpn (bool) – If True, creates a Feature Pyramid Network on top of ResNet-based CNNs.
pretrained (bool) – If True, returns a model pre-trained on COCO train2017.
pretrained_backbone (bool) – If True, returns a model with a backbone pre-trained on ImageNet.
trainable_backbone_layers (int) – Number of trainable ResNet layers, starting from the final block.
YOLO
- class pl_bolts.models.detection.yolo.yolo_module.YOLO(network, optimizer=torch.optim.SGD, optimizer_params={'lr': 0.001, 'momentum': 0.9, 'weight_decay': 0.0005}, lr_scheduler=<class 'pl_bolts.optimizers.lr_scheduler.LinearWarmupCosineAnnealingLR'>, lr_scheduler_params={'max_epochs': 300, 'warmup_epochs': 1, 'warmup_start_lr': 0.0}, confidence_threshold=0.2, nms_threshold=0.45, max_predictions_per_image=-1)
Bases:
pytorch_lightning.
PyTorch Lightning implementation of YOLOv3 and YOLOv4.
YOLOv3 paper: Joseph Redmon and Ali Farhadi
YOLOv4 paper: Alexey Bochkovskiy, Chien-Yao Wang, and Hong-Yuan Mark Liao
Implementation: Seppo Enarvi
The network architecture can be read from a Darknet configuration file using the YOLOConfiguration class, or created by some other means, and provided as a list of PyTorch modules.
The input from the data loader is expected to be a list of images. Each image is a tensor with shape [channels, height, width]. The images from a single batch will be stacked into a single tensor, so their sizes have to match. Different batches can have different image sizes, as long as the size is divisible by the ratio in which the network downsamples the input.
During training, the model expects both the input tensors and a list of targets. Each target is a dictionary containing:
boxes (FloatTensor[N, 4]): the ground-truth boxes in (x1, y1, x2, y2) format
labels (Int64Tensor[N]): the class label for each ground-truth box
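The image size constraints described above can be checked before a batch is stacked. The sketch below is only an illustration: the downsampling ratio of 32 is an assumption that should be replaced with the ratio of the actual network, and the helper names are hypothetical, not part of pl_bolts.

```python
from typing import List, Tuple


def round_up_to_multiple(size: int, ratio: int) -> int:
    """Smallest multiple of `ratio` that is greater than or equal to `size`."""
    return ((size + ratio - 1) // ratio) * ratio


def padded_size(height: int, width: int, ratio: int = 32) -> Tuple[int, int]:
    """Image size after padding each side up to a multiple of `ratio` (assumed ratio)."""
    return round_up_to_multiple(height, ratio), round_up_to_multiple(width, ratio)


def batch_is_stackable(shapes: List[Tuple[int, int, int]], ratio: int = 32) -> bool:
    """Check that all [channels, height, width] shapes match and are divisible by `ratio`."""
    if not shapes:
        return False
    first = shapes[0]
    _, height, width = first
    same_shape = all(shape == first for shape in shapes)
    return same_shape and height % ratio == 0 and width % ratio == 0


print(padded_size(375, 500))
print(batch_is_stackable([(3, 416, 416), (3, 416, 416)]))
print(batch_is_stackable([(3, 416, 416), (3, 320, 320)]))
```

Images of different sizes can still be used in different batches, since the check applies to one batch at a time.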
The forward() method returns all predictions from all detection layers in all images in one tensor with shape [images, predictors, classes + 5]. The coordinates are scaled to the input image size. During training it also returns a dictionary containing the classification, box overlap, and confidence losses.
During inference, the model requires only the input tensors. The infer() method filters and processes the predictions. The processed output includes the following tensors:
boxes (FloatTensor[N, 4]): predicted bounding box (x1, y1, x2, y2) coordinates in image space
scores (FloatTensor[N]): detection confidences
labels (Int64Tensor[N]): the predicted labels for each image
Weights can be loaded from a Darknet model file using load_darknet_weights().
CLI command:
# PascalVOC
wget https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny-3l.cfg
python yolo_module.py --config yolov4-tiny-3l.cfg --data_dir . --gpus 8 --batch_size 8
- Parameters
network (ModuleList) – A list of network modules. This can be obtained from a Darknet configuration using the get_network() method.
optimizer (Type[Optimizer]) – Which optimizer class to use for training.
optimizer_params (Dict[str, Any]) – Parameters to pass to the optimizer constructor.
lr_scheduler (Type[LRScheduler]) – Which learning rate scheduler class to use for training.
lr_scheduler_params (Dict[str, Any]) – Parameters to pass to the learning rate scheduler constructor.
confidence_threshold (float) – Postprocessing will remove bounding boxes whose confidence score is not higher than this threshold.
nms_threshold (float) – Non-maximum suppression will remove bounding boxes whose IoU with a higher confidence box is higher than this threshold, if the predicted categories are equal.
max_predictions_per_image (int) – If non-negative, keep at most this number of highest-confidence predictions per image.
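To make the three postprocessing parameters concrete, here is a minimal pure-Python sketch of how they could interact. It is not the library's implementation: the function names and the order of the steps (threshold, then class-aware NMS, then truncation) are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    boxes: Sequence[Box],
    scores: Sequence[float],
    labels: Sequence[int],
    confidence_threshold: float = 0.2,
    nms_threshold: float = 0.45,
    max_predictions_per_image: int = -1,
) -> List[int]:
    """Return indices of the kept detections, highest confidence first."""
    # Keep only detections whose confidence is strictly higher than the threshold.
    candidates = [i for i, score in enumerate(scores) if score > confidence_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Suppress a box only if a higher-confidence box of the same class overlaps it too much.
        suppressed = any(
            labels[i] == labels[j] and iou(boxes[i], boxes[j]) > nms_threshold for j in kept
        )
        if not suppressed:
            kept.append(i)
    # A negative limit means that all remaining detections are kept.
    if max_predictions_per_image >= 0:
        kept = kept[:max_predictions_per_image]
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.6, 0.1]
labels = [0, 0, 1, 0, 0]
print(postprocess(boxes, scores, labels))
```

In this example the second box is suppressed because it overlaps the first box of the same class, while the third box survives because its class differs, and the last box is dropped by the confidence threshold.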
- configure_optimizers()
Constructs the optimizer and learning rate scheduler.
- forward(images, targets=None)
Runs a forward pass through the network (all layers listed in self.network), and if training targets are provided, computes the losses from the detection layers.
Detections are concatenated from the detection layers. Each image will produce N * num_anchors * grid_height * grid_width detections, where N depends on the number of detection layers. For one detection layer N = 1, and each detection layer increases it by a number that depends on the size of the feature map on that layer. For example, if the feature map is twice as wide and high as the grid, the layer will add four times more features.
- Parameters
- Returns
Detections, and if targets were provided, a dictionary of losses. Detections are shaped [batch_size, num_predictors, num_classes + 5], where num_predictors is the total number of cells in all detection layers times the number of boxes predicted by one cell. The predicted box coordinates are in (x1, y1, x2, y2) format and scaled to the input image size.
- Return type
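The size of the detection tensor follows from simple arithmetic over the detection layers. The sketch below assumes a hypothetical network with three detection layers at strides 32, 16, and 8 and three anchors per cell; these values are illustrative assumptions and should be read from the actual network configuration.

```python
from typing import Sequence, Tuple


def num_predictors(
    image_height: int,
    image_width: int,
    strides: Sequence[int],
    anchors_per_cell: Sequence[int],
) -> int:
    """Total number of cells in all detection layers times the boxes predicted per cell."""
    total = 0
    for stride, anchors in zip(strides, anchors_per_cell):
        grid_height = image_height // stride
        grid_width = image_width // stride
        total += anchors * grid_height * grid_width
    return total


def detections_shape(
    batch_size: int,
    image_height: int,
    image_width: int,
    strides: Sequence[int],
    anchors_per_cell: Sequence[int],
    num_classes: int,
) -> Tuple[int, int, int]:
    """Expected shape [batch_size, num_predictors, num_classes + 5] of the detections."""
    predictors = num_predictors(image_height, image_width, strides, anchors_per_cell)
    return batch_size, predictors, num_classes + 5


# Assumed configuration: 416x416 input, three layers, three anchors each, 20 classes.
strides = (32, 16, 8)
anchors = (3, 3, 3)
print(num_predictors(416, 416, strides, anchors))
print(detections_shape(8, 416, 416, strides, anchors, num_classes=20))
```

With these assumed values the 13x13 grid contributes 507 predictors, and each layer with twice the width and height contributes four times as many as the previous one (2028 and 8112), for 10647 in total.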
- infer(image)
Feeds an image to the network and returns the detected bounding boxes, confidence scores, and class labels.
- load_darknet_weights(weight_file)
Loads weights to layer modules from a pretrained Darknet model.
One may want to continue training from the pretrained weights on a dataset with a different number of object categories. The number of kernels in the convolutional layers just before each detection layer depends on the number of output classes. The Darknet solution is to truncate the weight file and stop reading weights at the first incompatible layer. For this reason, the function silently leaves the rest of the layers unchanged when the weight file ends.
- Parameters
weight_file – A file object containing model weights in the Darknet binary format.
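The behavior with a truncated weight file can be sketched with a simplified stand-in for the file format. The example below assumes a flat stream of little-endian float32 values with no header and one flat parameter list per layer; it does not model the real Darknet binary layout, and load_flat_weights is a hypothetical helper written only for this illustration.

```python
import io
import struct
from typing import BinaryIO, List


def load_flat_weights(weight_file: BinaryIO, layers: List[List[float]]) -> int:
    """Fill layers in order from the file and stop quietly when the file ends.

    Returns the number of layers that were loaded. Layers after the end of
    the file keep their current values.
    """
    loaded = 0
    for layer in layers:
        num_bytes = 4 * len(layer)  # float32 values take four bytes each
        data = weight_file.read(num_bytes)
        if len(data) < num_bytes:
            # The (truncated) file has ended: leave this and later layers unchanged.
            break
        layer[:] = struct.unpack(f"<{len(layer)}f", data)
        loaded += 1
    return loaded


# Three layers, but the file only holds enough values for the first two.
layers = [[0.0] * 2, [0.0] * 3, [9.0] * 4]
truncated_file = io.BytesIO(struct.pack("<5f", 1.0, 2.0, 3.0, 4.0, 5.0))
loaded = load_flat_weights(truncated_file, layers)
print(loaded, layers)
```

The last layer keeps its initial values, which mirrors how the layers that depend on the number of classes can be left to train from scratch.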
- test_step(batch, batch_idx)
Evaluates a batch of data from the test set.
- training_step(batch, batch_idx)
Computes the training loss.
- Parameters
- Return type
- Returns
A dictionary that includes the training loss in ‘loss’.
- validation_step(batch, batch_idx)
Evaluates a batch of data from the validation set.