International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(2): 105-113
Published online June 25, 2024
https://doi.org/10.5391/IJFIS.2024.24.2.105
© The Korean Institute of Intelligent Systems
Jeongmin Kim1 and Hyukdoo Choi2
1AI Technology Development Team 2, Autonomous A2Z, Anyang, Korea
2Department of Electronic Materials, Devices, and Equipment Engineering, Soonchunhyang University, Asan, Korea
Correspondence to :
Hyukdoo Choi (hyukdoo.choi@sch.ac.kr)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Lane detection is a critical component of autonomous driving technologies that face challenges such as varied road conditions and diverse lane orientations. In this study, we aim to address these challenges by proposing PolyLaneDet, a novel lane detection model that utilizes a freeform polyline, termed ‘polylane,’ which adapts to both vertical and horizontal lane orientations without the need for post-processing. Our method builds on the YOLOv4 architecture to avoid restricting the number of detectable lanes. This model can regress both vertical and horizontal coordinates, thereby improving the adaptability and accuracy of lane detection in various scenarios. We conducted extensive experiments using the CULane benchmark and a custom dataset to validate the effectiveness of the proposed approach. The results demonstrate that PolyLaneDet achieves a competitive performance, particularly in detecting horizontal lane markings and stop lines, which are often omitted in traditional models. In conclusion, PolyLaneDet advances lane detection technology by combining flexible lane representation with robust detection capabilities, making it suitable for real-world applications with diverse road geometries.
Keywords: Lane detection, CNN, Deep learning
Lane detection is a fundamental component of advanced driver assistance systems (ADAS) and self-driving systems. Given its importance, numerous studies have been conducted in the computer vision community to enhance lane detection [1, 2]. Because lane markings are thin lines, cameras are the only sensors that can reliably recognize lanes, making convolutional neural networks (CNNs) the foundational framework for lane detection. However, lane detectors are vulnerable to camera limitations, including severe occlusion, adverse lighting and weather conditions, and blurry markings.
Existing studies can be categorized by lane representation: segmentation representations and parametric representations. The segmentation representation produced by semantic segmentation models [3, 4] is a pixel-wise classification map. Although it is easy to train and does not limit the number or shape of lane markings, it is readable for humans but not directly usable by machines. An additional post-processing step is required to extract a parametric representation before the segmentation map can be used by an autonomous driving system. Although this approach may achieve high benchmark performance, its practical application is challenging. In contrast, a parametric representation defines individual lane-marking instances with parameters. This is straightforward to use but is limited to a certain number and shape of lane markings. Most parametric representations assume that a vehicle is driving in a long, regular lane, without considering lane changes or crossroads. Although such situations are infrequent in benchmarks, they occur often in practice and are critical to self-driving vehicles.
In this study, we present a form-free representation of lane markings designed to reliably detect lane markings in various practical cases. Unlike previous works [5, 6], our representation takes the form of a polyline but does not fix the y-axis coordinates on the images. The use of fixed y-coordinates assumes only vertical lane markings, making them unsuitable for handling horizontal lane markings. Our lane detector can be applied to stop lines or crosswalks that are more appropriately represented by lines than by bounding boxes. Furthermore, our model, which is based on the object detector YOLOv4 [7], can detect all lane markings in an image, unlike existing parametric representation models that are designed to output up to four lanes [8, 9].
The main contributions of this paper are summarized as follows: First, we propose a novel representation of lane markings, termed a polylane, which is a free-form polyline that has not previously been used for lane detection and does not require any post-processing. Second, our model is based on the widely used YOLOv4, which has a relatively simple structure compared with existing models [10, 11]. Third, we develop a new nonmaximal suppression (NMS) algorithm specifically for lane marking. Because our model densely predicts lane markings from the output feature map, an NMS algorithm is necessary. Our PolyLane NMS (PL-NMS) algorithm effectively filters out overlapping predictions. Finally, our algorithms were validated using a custom road image dataset and the widely recognized CULane dataset.
Because lane markings are detected from camera images, most lane detection models are based on CNNs. Lane detectors can be categorized by their output representations: lane markings are typically represented using pixel-wise segmentation maps or instance-wise parameters. Existing studies are reviewed below according to this criterion.
Since the early deep learning studies on lane detection, semantic segmentation has been a popular approach to lane detection tasks [8–14]. The segmentation map consists of (C+1) channels that predict the probability of each pixel belonging to one of the C lane classes or the background class. Pan et al. [10] proposed a spatial CNN for propagating local information globally. It treats the row or column slices of a feature map as internal layers and performs convolutions through the slices in four directions. As a result, each output pixel secures a global field of view; however, the many sequential slice operations slow down inference. The recurrent feature-shift aggregator (RESA) [11] developed this idea more efficiently: the RESA module removes sequential operations within a layer by parallelizing row- or column-wise convolutions and iterating over layers with different slice strides.
A different way to produce a lane segmentation map exploits the prior knowledge that lane markings are long, thin, and roughly vertical: instead of traditional channel-wise classification, lane pixels are classified among the pixels of the same row. Ultra-fast lane detection (UFLD) [15] introduced this approach to achieve high-speed segmentation, formulating lane detection as a column-selection problem in each row of a low-resolution feature map, as sketched below. In segmentation-based approaches, the segmentation map is easy to interpret and visualize; however, heuristic post-processing is still required to recognize connected lane markings.
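To make the row-wise formulation concrete, the following is a minimal sketch of our reading of this idea, not the authors' implementation; the gridding sizes, layer choices, and names are illustrative assumptions.

```python
import tensorflow as tf

# Minimal sketch of row-wise column selection (our reading of UFLD [15]);
# gridding sizes and layers are illustrative, not the original implementation.
NUM_ROWS, NUM_COLS, NUM_LANES = 18, 200, 4   # hypothetical row anchors, column bins, lanes

def row_wise_head(feature_map):
    """For every row anchor and lane, predict a distribution over columns plus an 'absent' bin."""
    x = tf.keras.layers.Flatten()(feature_map)
    logits = tf.keras.layers.Dense(NUM_ROWS * (NUM_COLS + 1) * NUM_LANES)(x)
    logits = tf.reshape(logits, (-1, NUM_ROWS, NUM_COLS + 1, NUM_LANES))
    return tf.nn.softmax(logits, axis=2)      # column selection within each row
```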
Unlike segmentation representations, parametric representations of lane markings take diverse forms. LineCNN [16] is an anchor-based lane detector: lane markings are detected from the border grid cells of the feature map, and multiple line proposals with predefined angles are generated for each grid cell. The line with the highest score is selected, and its detailed shape is tuned using the horizontal offset parameters of the selected line. LaneATT [17] developed an anchor-based lane detector with an attention mechanism. It extracts feature vectors along predefined anchor lines on a feature map and fuses them with attention weights to obtain global features. The global features help detect occluded lane markings. These anchor-based models are vulnerable to occlusions at the image borders because they detect lane markings starting from the borders. In addition, the use of many anchor lines leads to computational inefficiency.
CondLaneNet [5] inherits the row-wise segmentation map from UFLD; however, lane markings are located using the weighted mean of grid-cell locations rather than classification. Lane instances are detected at the starting points of the lane markings at the bottom of the image, as in LineCNN [16]. The authors proposed a recurrent instance module (RIM) to handle complex shapes such as forked lanes; however, its long short-term memory (LSTM)-based dynamic kernels overly complicate the model structure. PolyLaneNet [18] directly outputs polynomial coefficients for lane markings from fully connected layers on top of the CNN backbone. Most recently, CLRNet [6] represented lane markings using vertically equally spaced 2D points derived from lane priors. A lane prior consists of a starting point and an angle, and the lane shape is fine-tuned using horizontal offsets. CLRNet contains a cross-layer refinement structure that propagates high-level features down to low-level features and is trained with a line intersection-over-union (IoU) loss that reduces the horizontal distance to the ground-truth (GT) lane.
Although various parametric representations of lane markings exist, all of them assume vertical lanes. This becomes an issue when entering a new lane on the left or right side. In contrast, our polylane representation makes no prior assumption about orientation and can detect road markings beyond lane lines, such as stop lines.
Our model was designed to overcome the limitations of parametric representation approaches with respect to the number and form of lane markings. The detector-based design places no practical limit on the number of lane markings, and the polyline representation, referred to as a polylane, discards the assumption of vertical lines. The network architecture is shown in Figure 1. An RGB image is fed to the CSPDarkNet53 backbone from YOLOv4, and the backbone feature maps are aggregated by the path aggregation network (PAN) [19]. The PAN structure fuses high- and low-level features using both upward and downward paths. The lane detection head is attached to the mid-level PAN feature and outputs the lane parameters in each grid cell. Because the lane class is closely tied to location and lane markings are represented by relative coordinates, a spatial encoding is appended to the PAN feature before the head convolutions. The resolution of the output feature map is 1/16 that of the input image.
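To illustrate the head described above, the following is a minimal TensorFlow sketch in which normalized grid coordinates are concatenated to the mid-level PAN feature as the spatial encoding before the head convolutions; the channel widths, number of convolutions, and function names are assumptions of this sketch rather than the exact configuration.

```python
import tensorflow as tf

# Sketch of the lane detection head: normalized x/y grid coordinates are appended to the
# mid-level PAN feature as a spatial encoding before the head convolutions.
# Channel widths and the number of convolutions are illustrative assumptions.
def lane_head(pan_feature, num_output_params):
    h, w = pan_feature.shape[1], pan_feature.shape[2]        # 1/16 of the input resolution
    ys, xs = tf.linspace(0.0, 1.0, h), tf.linspace(0.0, 1.0, w)
    yy, xx = tf.meshgrid(ys, xs, indexing="ij")
    coords = tf.stack([yy, xx], axis=-1)[tf.newaxis]          # (1, H, W, 2) spatial encoding
    coords = tf.tile(coords, tf.stack([tf.shape(pan_feature)[0], 1, 1, 1]))
    x = tf.concat([pan_feature, coords], axis=-1)
    x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    return tf.keras.layers.Conv2D(num_output_params, 1)(x)    # lane parameters per grid cell
```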
The output parameters include lane points {
In this equation, the output parameter
We modified only the output parameters of YOLOv4, so there are minimal constraints on the number of output objects as well as on their position and direction. Furthermore, this approach can be applied to any object detection model, not just YOLOv4.
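As an illustration of how per-grid-cell predictions could be turned into image-space polylanes, the following sketch assumes K cell-relative vertices followed by a centerness score used for filtering; the exact parameterization follows the equation above and is not reproduced here, so the layout, names, and threshold below are hypothetical.

```python
import numpy as np

# Hypothetical decoding of one grid cell's prediction into image coordinates.
# Assumes K cell-relative (x, y) vertices followed by a centerness score; the exact
# parameterization is given by the equation above and may differ from this layout.
K = 5            # assumed number of polylane vertices
STRIDE = 16      # the output feature map is 1/16 of the input image

def decode_cell(pred, row, col, center_thresh=0.3):
    """pred: flat vector [x1, y1, ..., xK, yK, centerness, ...] for one grid cell."""
    if pred[2 * K] < center_thresh:           # centerness thresholding (see PL-NMS section)
        return None
    rel = pred[:2 * K].reshape(K, 2)          # cell-relative (x, y) pairs
    origin = np.array([col, row], dtype=np.float32) * STRIDE
    return origin + rel * STRIDE              # polylane vertices in image pixels
```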
Given that there are four types of output parameters, the training loss comprises four components. For the lane-point regression task, we employed the SmoothL1 loss [20] as follows:
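As a concrete reference for the SmoothL1 term defined above, a minimal sketch follows; the beta value and the mean reduction are assumptions of this sketch.

```python
import tensorflow as tf

# Minimal sketch of the SmoothL1 point-regression term; the beta value and the
# mean reduction are assumptions of this sketch.
def smooth_l1(pred_points, gt_points, beta=1.0):
    diff = tf.abs(pred_points - gt_points)
    loss = tf.where(diff < beta, 0.5 * diff * diff / beta, diff - 0.5 * beta)
    return tf.reduce_mean(loss)
```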
The GT coordinates for lane points are derived by uniformly spacing
The total loss function is the sum of four losses with weights, as expressed by the following equation:
where the
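A generic sketch of this weighted sum is shown below; the component ordering and weight values are placeholders, not the settings used in our experiments.

```python
# Generic weighted sum of the four loss components; the ordering and weight values
# are placeholders, not the settings used in our experiments.
def total_loss(losses, weights=(1.0, 1.0, 1.0, 1.0)):
    """losses: the four component losses, e.g., lane-point regression and the remaining terms."""
    return sum(w * l for w, l in zip(weights, losses))
```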
For our free-form polylane, we devised a novel NMS algorithm, PL-NMS. Calculating the distance between complex-shaped polylines is challenging in general. Fortunately, lane markings typically have simple shapes with low curvature and are sparse and almost evenly distributed. Consequently, simplifying polylanes into line segments poses no problem when eliminating overlapping polylanes.
Polylanes are densely predicted from the output feature map, and most are filtered out by centerness thresholding. However, overlapping polylanes that are close in position and direction remain. Typically, in object detection, a model produces bounding boxes and the NMS algorithm eliminates overlapping boxes. In our case, the NMS algorithm required modifications to accommodate polylanes. The key deviation from existing NMS lies in the method used to calculate the overlap between predictions. CLRNet [6] introduced the Line IoU, which computes the intersection of two fixed-width lines solely along the horizontal axis. Although it is simple to compute and differentiable, it assumes vertical lines, rendering it inapplicable to horizontal lines.
The process for simplifying a polylane into a line segment is illustrated in Figure 2(a). Given
When the polylanes are simplified into line segments, the calculation of the distances between these segments becomes straightforward, as illustrated in Figure 2(b). The distance between line segments
Here,
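The following is a minimal sketch of the PL-NMS procedure described above: each polylane is simplified to a line segment, the distance between two segments is derived from the orthogonal distances of one segment's endpoints to the other line, and lower-scoring nearby polylanes are suppressed greedily. The least-squares fitting method, the use of the minimum endpoint distance, and the distance threshold are assumptions of this sketch.

```python
import numpy as np

# Sketch of PL-NMS: simplify each polylane to a line segment, measure segment-to-segment
# distance via orthogonal endpoint-to-line distances, and greedily suppress nearby
# lower-scoring polylanes. Fitting method and threshold are assumptions of this sketch.
def fit_segment(points):
    """Fit a line segment to polylane vertices along the principal direction through their mean."""
    mean = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - mean)
    d = vt[0]
    t = (points - mean) @ d
    return mean + t.min() * d, mean + t.max() * d

def point_to_line_dist(p, a, b):
    d = (b - a) / (np.linalg.norm(b - a) + 1e-9)
    return np.linalg.norm((p - a) - ((p - a) @ d) * d)

def segment_distance(seg1, seg2):
    (a1, b1), (a2, b2) = seg1, seg2
    return min(point_to_line_dist(a1, a2, b2), point_to_line_dist(b1, a2, b2),
               point_to_line_dist(a2, a1, b1), point_to_line_dist(b2, a1, b1))

def pl_nms(polylanes, scores, dist_thresh=20.0):
    """Keep the highest-scoring polylane and drop others whose segments lie too close to it."""
    segments = [fit_segment(np.asarray(p, dtype=np.float64)) for p in polylanes]
    keep = []
    for i in np.argsort(scores)[::-1]:
        if all(segment_distance(segments[i], segments[j]) > dist_thresh for j in keep):
            keep.append(i)
    return keep
```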
Our model was verified using the popular lane-detection dataset CULane [10]. CULane provides camera images and lane labels in various environments, including difficult conditions such as night, dazzle, and shadow. The dataset contains 133,235 images with a resolution of 1640 × 590. The model was evaluated using the CULane benchmark tool.
Because our research began as part of a private project, we also created a custom dataset for lane detection. Although the data cannot be disclosed for confidentiality reasons, the custom dataset contains approximately 20,000 frames. Road marks, including stop lines and lane marks, were annotated in the dataset, and our model was trained to detect lane marks and stop lines.
For training, a single Nvidia RTX 3090 was used, and the model was optimized using Adam. The learning rate was initially set to 0.001 and reduced by 1/10 every 10 epochs. Training was conducted for 40 epochs with a batch size of 2. Our model was implemented using TensorFlow 2.8.
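A sketch of this training configuration in TensorFlow is shown below; steps_per_epoch is a placeholder that depends on the dataset split.

```python
import tensorflow as tf

# Training configuration as stated above: Adam, initial learning rate 0.001, reduced by a
# factor of 10 every 10 epochs, 40 epochs, batch size 2. steps_per_epoch is a placeholder.
steps_per_epoch = 10_000                          # placeholder; depends on the dataset split
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3,
    decay_steps=10 * steps_per_epoch,             # every 10 epochs
    decay_rate=0.1,                               # multiply by 1/10
    staircase=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```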
The qualitative results on the CULane dataset are shown in Figure 3. The figure comprises eight cells corresponding to eight distinct scenes, with each cell stacking four images to facilitate comparison of lane markings within the same scene. From top to bottom, the images are the original image, the GT label image, the CLRNet result, and the PolyLaneDet result. CLRNet, one of the latest lane detection models, was trained from scratch. Each column, from left to right, shows a different scene category: Normal, Shadow, No Line, and Cross. These composite images were collated from the entire test dataset, which encompasses over 23,000 frames. The images indicate that although our model occasionally overlooks a few lane markings, it performs effectively in most cases where sufficiently discernible visual cues are available. Nonetheless, numerous frames are labeled without any apparent visual cues. In the Shadow category, lane markings are often too faint to be discernible, even for human observers. The No Line category entirely lacks visible lane markings, yet lane markings were labeled intuitively by the human annotators. Additionally, the Cross category contains only empty labels. The presence of numerous frames labeled without visible cues, together with the absence of labels for the Cross category despite clearly visible lane markings, is confusing. The CULane dataset was chosen for evaluation because of its popularity in lane detection and its extensive collection of images; however, the reliability of its labels remains debatable.
For quantitative performance, our model was evaluated on the CULane benchmark, as summarized in Table 1. We evaluated only the F1 score of the Normal category, in which most lane markings are visible, whereas a large portion of the other categories contains invisible lane markings. The F1 score with an IoU threshold of 0.5 is referred to as F1@50. Our model appears inferior to state-of-the-art models, but the context is important. Our model is not specifically optimized for detecting vertical lines; rather, it offers extended flexibility because it does not restrict the number of detected lanes or their orientations. By contrast, the CULane benchmark restricts lane detection to a maximum of four lanes per image, and other models align with this benchmark by focusing on limited starting points and predefined directions. The lack of such constraints allows our model to detect more potential lane candidates, resulting in excessive false positives. In addition, when lane visual features are faint or ambiguous, our model may struggle to identify them effectively. The value of our approach is demonstrated on the custom dataset.
An ablation study was conducted with different numbers of polylane vertices, as summarized in Table 2. One might expect more vertices to help polylanes fit the lane labels more closely, but the table shows the opposite result. More vertices make the polylane shape unnecessarily complex, resulting in poorer performance. Because most lane markings are straight lines, a simpler parameterization is more advantageous.
Our custom road image dataset encompasses approximately 20,000 annotated frames, 1,000 of which form the test split. The annotations include two classes: stop lines and lane markings. Image frames were gathered during the daytime, with annotations made only on the visible portions of the lane markings and stop lines. Our model was trained to detect both classes. The qualitative results are shown in Figure 4. True positives (TP), represented by green lines, indicate instances in which the GT labels correspond to predictions. The red lines in the middle row represent false negatives (FN), and those in the bottom row represent false positives (FP). Despite the presence of red lines signifying FNs and FPs, many FNs correspond closely to FPs. In line with the CULane methodology, GT labels and predictions were matched based on the IoU of the two lane masks, with the lane width set to 10 pixels and the IoU threshold set to 0.5, as sketched below. In contrast to CULane, the number of lanes was not restricted.
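The following is a hedged sketch of this matching protocol: lanes are rasterized as 10-pixel-wide polylines, a prediction counts as a TP when its mask IoU with an unmatched GT lane reaches 0.5, and precision, recall, and F1 follow from the TP/FP/FN counts. The mask size and the greedy matching order are assumptions of this sketch.

```python
import numpy as np
import cv2

# Sketch of the matching protocol: rasterize GT and predicted lanes as 10-pixel-wide
# polylines, match at mask IoU >= 0.5, and compute precision/recall/F1 from TP/FP/FN.
# The mask size and greedy matching order are assumptions of this sketch.
H, W = 800, 1200      # illustrative mask size (1,200-pixel width as stated; height assumed)

def lane_mask(points, width=10):
    mask = np.zeros((H, W), np.uint8)
    pts = np.round(np.asarray(points)).astype(np.int32)
    cv2.polylines(mask, [pts], False, 1, thickness=width)
    return mask

def mask_iou(m1, m2):
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union else 0.0

def f1_at_50(gt_lanes, pred_lanes, thresh=0.5):
    gt_masks = [lane_mask(g) for g in gt_lanes]
    pred_masks = [lane_mask(p) for p in pred_lanes]
    matched, tp = set(), 0
    for pm in pred_masks:
        ious = [mask_iou(pm, gm) for gm in gt_masks]
        if ious and max(ious) >= thresh and int(np.argmax(ious)) not in matched:
            matched.add(int(np.argmax(ious)))
            tp += 1
    fp, fn = len(pred_masks) - tp, len(gt_masks) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```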
The quantitative results are presented in Table 3. The detection performance for stop lines is lower than that for lane markings because stop lines are generally located farther away and appear shorter owing to occlusion. The lower lane detection performance on the custom dataset compared with the CULane benchmark is due to the difference in the evaluation protocol: in the CULane benchmark, lanes are drawn with a width of 30 pixels on low-resolution images with a width of 800 pixels, whereas in the custom dataset, lanes are drawn with a width of 10 pixels on high-resolution images with a width of 1,200 pixels, and the IoU is calculated on these dimensions. Owing to our model's capacity to output free-form polylines, the qualitative and quantitative results demonstrate that it can effectively detect both lane markings and horizontal stop lines.
In this study, we introduced a novel lane detector, PolyLaneDet, which is equipped with a unique parametric representation of lane markings. This novel representation, called a polylane, operates as a free-form polyline that does not constrain the
The authors declare no conflicts of interest regarding this study.
PolyLaneDet architecture. CSPDarkNet53 is used as the backbone network, and PAN is employed as the neck network. A couple of convolutions are applied to the middle-level PAN feature in the head.
PolyLane NMS. (a) A polylane is represented by a polyline and is simplified to a line segment before the NMS process. (b) Orthogonal distances from a line segment's endpoints to the other line are used to compute the distance between polylanes.
Qualitative results from CULane. There are eight grid cells, and each cell shows a stack of four images: from the top, the source image, GT label, CLRNet result, and our PolyLaneDet result.
Qualitative results from the custom dataset. There are four samples, and each sample consists of three rows: (from the top) a source image, GT label, and our PolyLaneDet result.
Table 1. Quantitative results from CULane benchmark.
Method | Backbone | F1@50 |
---|---|---|
SCNN [10] | VGG16 | 90.60 |
RESA [11] | ResNet50 | 92.10 |
UFLD [15] | ResNet34 | 90.70 |
CondLane [5] | ResNet101 | 93.47 |
CLRNet [6] | DLA34 | 93.73 |
PolyLaneDet | CSPDarkNet53 | 84.29 |
The metric is F1@50 of the Normal category in the CULane benchmark.
Table 2. F1@50 of the Normal category with different numbers of polylane vertices.
Parameter | F1@50 |
---|---|
 | 84.29 |
 | 61.54 |
 | 53.43 |
Table 3. Quantitative results (%) from the custom road dataset.
Class | Recall | Precision | F1@50 |
---|---|---|---|
Lane | 68.78 | 68.94 | 68.86 |
Stop line | 62.90 | 45.24 | 52.63 |