Object Detection Under Rainy Conditions for Autonomous Vehicles
Self-driving vehicles, also known as autonomous or "driverless" cars, are typically cars or trucks in which human drivers are never required to take control to operate the vehicle safely. These vehicles use a combination of sensors and software to control, navigate, and drive. To maneuver safely in their environments, they rely on visual data, in the form of images or video feeds, for object detection and classification, which is accomplished using cameras and deep learning-based methods. The underlying neural networks are usually trained on large amounts of visual data captured in favorable, good-weather conditions. However, the performance of such systems degrades in challenging weather, such as rain, fog, or snow, because the captured images are impaired and distorted.
This degradation of visual data manifests as reduced scene contrast and visibility, and as intensity variations across images and video frames caused by differences in raindrop size and speed. These distortions significantly impair the vehicle's ability to detect critical objects in its environment. It is quite difficult to detect and isolate raindrops, and it is equally problematic to restore the information that is lost or occluded by rain. One way to address this problem is to combine deraining algorithms with object detection, feeding the restored (derained) images to the detector. Although straightforward, this approach has limitations: deraining algorithms can achieve reasonable restoration metrics, yet such metrics are not a reliable measure of how well the system performs on more complex downstream tasks, such as object detection.
The scope of this article is limited to surveying object detection methods that are being considered for integration into autonomous vehicles' artificial intelligence (AI) platforms.
OBJECT DETECTION FOR AUTONOMOUS VEHICLES IN CLEAR AND RAINY CONDITIONS
The level of degradation in the performance of an object detection method, trained under certain conditions, is heavily influenced by:
1) how different the training and testing domains are, and
2) the type of deep learning-based architecture used for object detection.
While there are many deep learning-based object detection algorithms, such as the Single Shot MultiBox Detector (SSD), You Only Look Once (YOLO), and Faster R-CNN, our focus is on Faster R-CNN and YOLO. These represent two major classes of object detection frameworks. Faster R-CNN is widely used and is based on a two-stage deep learning architecture: the first stage identifies region proposals (RPs), and the second refines them and assigns class probabilities to the corresponding regions. YOLO, on the other hand, is representative of detection frameworks that operate directly on the whole image.
A. Faster R-CNN
Convolutional neural networks have long been used for object detection, initially in the form of R-CNN, where the R stands for "regions." A faster version, Fast R-CNN, followed, and the subsequent idea of merging region proposal networks (RPNs) with Fast R-CNN led to a unified architecture known as Faster R-CNN, which is computationally more efficient at detection. At a high level, an image is fed into the network to generate feature maps. The RPN part of the network then predicts regions that may contain objects of interest. Because many of these regions overlap, an operation called non-maximum suppression (NMS) removes redundant regions while keeping those with high prediction scores. Next, each region proposal that survives NMS is used by a region-of-interest (RoI) pooling layer to crop the corresponding features from the feature map. This cropping produces a feature vector that is fed to two fully connected layers: one predicts the offsets of an object's bounding box with respect to the region proposal, and the other predicts class probabilities for the predicted bounding box.
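As a concrete illustration, the minimal sketch below runs a pretrained Faster R-CNN from torchvision on a single image; the image path ("road_scene.jpg") and the 0.5 score threshold are illustrative assumptions, and the RPN, NMS, and RoI steps described above all execute inside the single model call (a recent torchvision with the `weights` argument is assumed).

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a Faster R-CNN with a ResNet-50 FPN backbone pretrained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()  # inference mode: RPN, NMS, and the RoI heads run inside the call

# Hypothetical input image; any RGB road scene would do.
image = to_tensor(Image.open("road_scene.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([image])[0]  # dict with "boxes", "labels", and "scores"

keep = pred["scores"] > 0.5  # illustrative confidence threshold
boxes, labels = pred["boxes"][keep], pred["labels"][keep]
```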
B. YOLO
YOLO, on the other hand, was developed as a unified neural network model that predicts bounding boxes and class probabilities directly from the full image in a single evaluation. In YOLO, the image is divided into a grid, and each grid cell is responsible for detecting objects whose centers fall within it. Because this can produce many wrongly predicted boxes, a threshold is applied to the confidence scores of the bounding boxes, as sketched below.
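The sketch below illustrates YOLO-style post-processing: given flattened box predictions and their confidence scores, low-confidence boxes are dropped and overlapping duplicates are removed with torchvision's NMS. The tensor shapes and the 0.25/0.45 thresholds are illustrative assumptions, not values from any specific YOLO version.

```python
import torch
from torchvision.ops import nms

def filter_predictions(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,) confidence per box."""
    keep = scores > conf_thresh            # drop low-confidence grid-cell predictions
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)  # suppress overlapping duplicate boxes
    return boxes[kept], scores[kept]
```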
DERAINING IN CONJUNCTION WITH OBJECT DETECTION
This section briefly describes three deep learning-based methods for restoring an image that has been distorted by rain while preserving its important visual details.
A. Deep Detail Network
This approach uses a convolutional neural network (CNN) with a ResNet-style structure to predict the difference (residual) between a clean image and its rain-affected counterpart. This predicted residual is then used to remove the distortions from the image.
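A minimal sketch of this residual-learning idea is shown below, assuming a small ResNet-style CNN; the layer widths and block count are illustrative, and the actual deep detail network is substantially deeper.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))  # ResNet-style skip connection

class DerainNet(nn.Module):
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(ch) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)
    def forward(self, rainy):
        residual = self.tail(self.blocks(self.head(rainy)))  # predicted rain layer
        return rainy - residual  # restored image = input minus predicted rain
```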
B. Attentive Generative Adversarial Network
In this method, a generative adversarial network (GAN) with visual attention is employed to learn raindrop regions and their surroundings. The first part of the generative network, known as the attentive-recurrent network (ARN), produces an attention map that guides the next stage of the framework; the ARN comprises ResNet, LSTM, and CNN layers. Together, the two stages of the generative network are expected to remove raindrops from the image. The architecture also includes a discriminative network that, during training, compares the generated images with real, clean images.
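The stripped-down sketch below conveys only the attentive-recurrent idea: a convolutional block repeatedly refines a single-channel attention map over a few time steps. The real ARN uses ResNet and LSTM layers and feeds the final map into a separate encoder-decoder stage; the layer sizes and step count here are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionStep(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Sigmoid())  # attention in [0, 1]
    def forward(self, image, attention):
        # condition each refinement on the image and the previous attention map
        return self.net(torch.cat([image, attention], dim=1))

def attention_maps(image, steps=4):
    """Recurrently refine a raindrop attention map for a (B, 3, H, W) image."""
    step = AttentionStep()  # untrained here; purely illustrative
    attention = torch.zeros(image.size(0), 1, *image.shape[2:])
    maps = []
    for _ in range(steps):
        attention = step(image, attention)  # each pass sharpens the map
        maps.append(attention)
    return maps
```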
C. Progressive Image Deraining Network
In this architecture, rain is removed from an image progressively over several iterations. In each iteration, some rain is removed, and as the iterations proceed, the rain is expected to be removed completely, leading to a rain-free image. The network combines ResNet blocks, CNN layers, and an LSTM.
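The sketch below captures this progressive structure, assuming a single weight-shared stage applied for a fixed number of iterations; the LSTM state that the actual network carries across stages is omitted, and the stage count and layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        # input: rainy image (3 ch) concatenated with the current estimate (3 ch)
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1))
    def forward(self, rainy, current):
        # each stage removes a bit more rain from the current estimate
        return current - self.net(torch.cat([rainy, current], dim=1))

def derain(rainy, stages=6):
    stage = Stage()      # the same weights are reused at every stage
    estimate = rainy
    for _ in range(stages):
        estimate = stage(rainy, estimate)
    return estimate
```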
ALTERNATIVE TRAINING APPROACHES FOR DEEP LEARNING-BASED OBJECT DETECTION
The biggest challenge in achieving good object detection results on images affected by rain is the unavailability of data: either rainy-weather data is not available at all, or the available data lacks proper annotations. This section discusses some alternative techniques for object detection under rainy conditions.
A. Unsupervised image-to-image translation
In this technique, an image is translated from one domain to another (for example, from clear weather to rain) while its visual content is preserved. GANs have proven effective in this area of image translation.
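As one concrete formulation, the sketch below shows the cycle-consistency loss used in CycleGAN-style unpaired translation: mapping an image to the other domain and back should reproduce the original. The generator names G and F and the loss weight are assumptions for illustration.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, clear, rainy, weight=10.0):
    """G: clear -> rainy generator; F: rainy -> clear generator.
    clear and rainy are unpaired image batches from the two domains."""
    rec_clear = F(G(clear))  # clear -> fake rainy -> reconstructed clear
    rec_rainy = G(F(rainy))  # rainy -> fake clear -> reconstructed rainy
    return weight * (l1(rec_clear, clear) + l1(rec_rainy, rainy))
```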
B. Domain adaptation
This technique employs an adversarial training strategy to learn robust features that are domain-invariant. In other words, it makes the distributions of features extracted from images in the two domains indistinguishable.
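A common way to realize this adversarial strategy is a gradient reversal layer, as in domain-adversarial training: a domain classifier learns to distinguish clear-weather from rainy features, while the reversed gradient pushes the feature extractor toward domain-invariant features. The sketch below, including the lambda coefficient, is illustrative rather than a specific published implementation.

```python
import torch
from torch.autograd import Function

class GradReverse(Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity in the forward pass
    @staticmethod
    def backward(ctx, grad_output):
        # negate (and scale) the gradient flowing back to the feature extractor
        return -ctx.lambd * grad_output, None

def domain_adversarial_features(features, lambd=1.0):
    """Pass detector features through gradient reversal before the domain head."""
    return GradReverse.apply(features, lambd)
```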
Reference
- https://arxiv.org/pdf/2006.16471.pdf