Original Article
International Journal of Fuzzy Logic and Intelligent Systems 2024; 24(2): 105-113

Published online June 25, 2024

https://doi.org/10.5391/IJFIS.2024.24.2.105

© The Korean Institute of Intelligent Systems

PolyLaneDet: Lane Detection with Free-Form Polyline

Jeongmin Kim1 and Hyukdoo Choi2

1AI Technology Development Team 2, Autonomous A2Z, Anyang, Korea
2Department of Electronic Materials, Devices, and Equipment Engineering, Soonchunhyang University, Asan, Korea

Correspondence to :
Hyukdoo Choi (hyukdoo.choi@sch.ac.kr)

Received: August 1, 2023; Revised: May 4, 2024; Accepted: May 27, 2024

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Lane detection is a critical component of autonomous driving technologies that face challenges such as varied road conditions and diverse lane orientations. In this study, we address these challenges by proposing PolyLaneDet, a novel lane detection model that utilizes a free-form polyline, termed a 'polylane,' which adapts to both vertical and horizontal lane orientations without the need for post-processing. Our method builds on the YOLOv4 architecture to avoid restricting the number of detectable lanes. The model can regress both vertical and horizontal coordinates, thereby improving the adaptability and accuracy of lane detection in various scenarios. We conducted extensive experiments on the CULane benchmark and a custom dataset to validate the effectiveness of the proposed approach. The results demonstrate that PolyLaneDet achieves competitive performance, particularly in detecting horizontal lane markings and stop lines, which are often omitted by traditional models. In conclusion, PolyLaneDet advances lane detection technology by combining a flexible lane representation with robust detection capabilities, making it suitable for real-world applications with diverse road geometries.

Keywords: Lane detection, CNN, Deep learning

1. Introduction

Lane detection is a fundamental component of advanced driver assistance systems (ADAS) and self-driving systems. Given its importance, numerous studies have been conducted in the computer vision community to enhance lane detection [1, 2]. Because lane markings are thin lines, cameras are the only sensors that can reliably recognize lanes, making convolutional neural networks (CNNs) the foundational framework for lane detection. However, lane detectors are vulnerable to camera limitations, including severe occlusion, poor lighting, adverse weather, and blurry markings.

Existing studies can be categorized by lane representation: segmentation or parametric. The segmentation representation produced by semantic segmentation models [3, 4] is a pixel-wise classification map. Although it is easy to train and does not limit the number or shape of lane markings, it is friendly to humans rather than to machines: an additional post-processing step is required to extract a parametric representation that an autonomous driving system can use. Although this approach may yield high benchmark scores, its practical application is challenging. In contrast, a parametric representation defines each lane-marking instance with parameters. It is straightforward to use but is limited to a certain number and shape of lane markings. Most parametric representations assume that the vehicle is driving in a long, regular lane, without considering lane changes or crossroads. Although such situations are infrequent in benchmarks, they occur often in practice and are critical to self-driving vehicles.

In this study, we present a form-free representation of lane markings designed to detect them reliably in various practical cases. Unlike previous works [5, 6], our representation takes the form of a polyline but does not fix the y-axis coordinates in the image. Fixed y-coordinates assume strictly vertical lane markings and are therefore unsuitable for horizontal ones. Our lane detector can also be applied to stop lines or crosswalks, which are more appropriately represented by lines than by bounding boxes. Furthermore, our model, which is based on the object detector YOLOv4 [7], can detect all lane markings in an image, unlike existing parametric representation models that are designed to output at most four lanes [8, 9].

The main contributions of this paper are summarized as follows. First, we propose a novel representation of lane markings, termed a polylane, which is a free-form polyline that has not previously been used for lane detection and requires no post-processing. Second, our model is based on the widely used YOLOv4 and has a relatively simple structure compared with existing models [10, 11]. Third, we develop a new non-maximum suppression (NMS) algorithm specifically for lane markings; because our model densely predicts lane markings from the output feature map, such an algorithm is necessary, and our PolyLane NMS (PL-NMS) effectively filters out overlapping predictions. Finally, our algorithms were validated on a custom road image dataset and the widely recognized CULane dataset.

2. Related Work

Because lane markings are detected from camera images, most lane detection models are based on CNNs. Lane detectors can be categorized by their output representations: lane markings are typically represented by pixel-wise segmentation maps or by instance-wise parameters. We review existing studies according to these criteria.

2.1 Segmentation Representation

Since the early studies on deep learning-based lane detection, semantic segmentation has been a popular approach to the task [8-14]. The segmentation map consists of (C+1) channels that predict the probability of each pixel belonging to one of the C lane classes or the background. Pan et al. [10] proposed a spatial CNN that propagates local information globally: it treats the row or column slices of a feature map as internal layers and performs convolutions through the slices in four directions. As a result, each output pixel secures a global field of view; however, inference is slowed by the many slice operations. The recurrent feature-shift aggregator (RESA) [11] developed this idea more efficiently: the RESA module removes sequential operations within a layer by parallelizing row-wise or column-wise convolutions and iterating layers with different slice strides.

Other approaches output the lane segmentation map differently. Using the prior knowledge that lane markings are long, thin, and nearly vertical, they classify lane pixels among the pixels of the same row rather than performing traditional channel-wise classification. Ultra-fast lane detection (UFLD) [15] introduced this approach to achieve high-speed segmentation: it formulates lane detection as a column selection problem in each row of a low-resolution feature map. In segmentation-based approaches, the segmentation map outputs are easy to interpret and visualize; however, heuristic post-processing is required to recognize connected lane markings.
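To make the row-wise idea concrete, the following minimal NumPy sketch (hypothetical shapes and function name, not UFLD's actual implementation) selects at most one column per row from a score map whose extra last column represents "no lane in this row":

```python
import numpy as np

def rowwise_lane_positions(row_scores):
    """Pick at most one column per row from a (rows, cols + 1) score map.
    The extra last column acts as a 'no lane in this row' class, mirroring
    the row-wise classification idea."""
    rows, cols_plus_1 = row_scores.shape
    choices = row_scores.argmax(axis=1)            # best column index per row
    valid = choices < (cols_plus_1 - 1)            # rows where a lane is present
    return [(r, int(c)) for r, (c, v) in enumerate(zip(choices, valid)) if v]

# Toy example: 4 rows, 5 lane-position columns plus one background column.
scores = np.random.rand(4, 6)
print(rowwise_lane_positions(scores))
```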

2.2 Parametric Representation

In contrast to segmentation, parametric approaches represent lane markings in diverse ways. LineCNN [16] is an anchor-based lane detector: lane markings are detected from the border grid cells of the feature map, and multiple line proposals with predefined angles are generated for each grid cell. The line with the highest score is selected, and its detailed shape is tuned using the horizontal offset parameters of the selected line. LaneATT [17] developed an anchor-based lane detector with an attention mechanism: it extracts feature vectors along predefined anchor lines on a feature map and fuses them with attention weights to obtain global features, which help detect occluded lane markings. These anchor-based models are vulnerable to occlusions at the image borders because they detect lane markings starting from the borders. In addition, the use of many anchor lines leads to computational inefficiency.

CondLaneNet [5] inherits the row-wise segmentation map from UFLD; however, lane markings are located using the weighted mean of grid-cell locations rather than classification. Lane instances are detected at the starting points of the lane markings at the bottom of the image, as in LineCNN [16]. The authors proposed a recurrent instance module (RIM) to handle complex shapes such as forked lanes; however, the long short-term memory (LSTM)-based dynamic kernels overly complicate the model structure. PolyLaneNet [18] directly outputs polynomial coefficients for lane markings from fully connected layers on top of the CNN backbone. Most recently, CLRNet [6] represented lane markings as vertically equally spaced 2D points derived from lane priors; a lane prior consists of a starting point and an angle, and the lane shape is fine-tuned using horizontal offsets. CLRNet contains a cross-layer refinement structure that propagates high-level features to low-level features and is trained with a line intersection-over-union (IoU) loss that reduces the horizontal distance to the ground-truth (GT) lane.

Although various parametric representations of lane markings exist, they all assume nearly vertical lanes. This can be an issue when entering a new lane on the left or right. In contrast, our polylane representation makes no such prior assumption and can detect more than just lane markings.

3. Methodology

3.1 Network Architecture

Our model is designed to overcome the limitations of parametric representation approaches with respect to the number and form of lane markings. Being detector-based, it can output an unrestricted number of lane markings, and its polyline representation, referred to as a polylane, dismisses the vertical-line assumption. The network architecture is shown in Figure 1. An RGB image is fed to the CSPDarkNet53 backbone from YOLOv4, and the backbone feature maps are aggregated by the path aggregation network (PAN) [19]. The PAN structure fuses high- and low-level features through both upward and downward paths. The lane detection head is attached to the mid-level PAN feature and outputs the lane parameters for each grid cell. Because the lane class is closely tied to location and lane markings are represented by relative coordinates, a spatial encoding is appended to the PAN feature before the head convolutions. The resolution of the output feature map is 1/16 that of the input image.
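As an illustration of this head design, the following TensorFlow sketch (with assumed layer widths and channel layout, not the authors' exact configuration) appends a normalized coordinate grid to a PAN feature map and applies a couple of convolutions to produce the per-cell lane parameters:

```python
import tensorflow as tf

def lane_head(pan_feature, num_points=5, num_classes=2):
    """Sketch of the lane detection head on the mid-level PAN feature.

    pan_feature: (B, H, W, C_feat) feature map at 1/16 of the input resolution.
    Returns (B, H, W, 2*num_points + 2 + num_classes) raw lane parameters
    per grid cell (points, centerness, laneness, class scores).
    """
    b = tf.shape(pan_feature)[0]
    h, w = pan_feature.shape[1], pan_feature.shape[2]

    # Spatial encoding: normalized (x, y) coordinates of each grid cell,
    # concatenated to the feature map before the head convolutions.
    ys = tf.linspace(0.0, 1.0, h)
    xs = tf.linspace(0.0, 1.0, w)
    gy, gx = tf.meshgrid(ys, xs, indexing="ij")          # each (H, W)
    grid = tf.stack([gx, gy], axis=-1)[tf.newaxis]       # (1, H, W, 2)
    grid = tf.tile(grid, [b, 1, 1, 1])
    x = tf.concat([pan_feature, grid], axis=-1)

    x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.Conv2D(256, 3, padding="same", activation="relu")(x)
    out_ch = 2 * num_points + 2 + num_classes
    return tf.keras.layers.Conv2D(out_ch, 1, padding="same")(x)

# Example: a 40x64 mid-level feature map with 256 channels.
feat = tf.random.normal([1, 40, 64, 256])
print(lane_head(feat).shape)   # (1, 40, 64, 2*5 + 2 + 2)
```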

The output parameters comprise the lane points {(x_i, y_i)}_{i=1,...,N}, the centerness c, the laneness ℓ, and the class probabilities {p_i}_{i=1,...,C}. The lane points are the coordinates of the polylane vertices. Centerness denotes the probability that the center point of a lane marking resides within a specific grid cell, whereas laneness is the likelihood that a polyline crosses the grid cell. The output parameters are derived from a raw feature map. Because the values produced by the convolutions on this raw feature map are unbounded, suitable activation functions must be applied. The activation functions assigned to each type of parameter are defined as follows:

$$(x_i, y_i) = 2\left(\sigma\left((x_i', y_i')\right) - 0.5\right) + (g_x, g_y), \quad c = \sigma(c'), \quad \ell = \sigma(\ell'), \quad p_i = \sigma(p_i').$$

In these equations, each output parameter is derived from its corresponding raw feature, denoted by a prime (e.g., x from x'). (g_x, g_y) are the coordinates of the upper-left corner of the grid cell, normalized to the range [0, 1], and σ(·) denotes the sigmoid function. Because 2(σ(·) − 0.5) spans [−1, 1], the points (x_i, y_i) can cover any position in the image.
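The decoding of one grid cell can be sketched as follows; the channel ordering and the helper name decode_cell are illustrative assumptions:

```python
import numpy as np

def decode_cell(raw, gx, gy, num_points=5, num_classes=2):
    """Map raw per-cell head outputs to bounded lane parameters, following the
    activation scheme above (the channel layout is an assumption).

    raw: 1D vector of length 2*num_points + 2 + num_classes for one grid cell.
    (gx, gy): upper-left corner of the grid cell, normalized to [0, 1].
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    pts_raw = raw[: 2 * num_points].reshape(num_points, 2)
    # 2*(sigma(x') - 0.5) spans [-1, 1], so points can reach anywhere in the image.
    points = 2.0 * (sigmoid(pts_raw) - 0.5) + np.array([gx, gy])
    centerness = sigmoid(raw[2 * num_points])
    laneness = sigmoid(raw[2 * num_points + 1])
    class_probs = sigmoid(raw[2 * num_points + 2:])
    return points, centerness, laneness, class_probs

raw = np.random.randn(2 * 5 + 2 + 2)
pts, c, l, p = decode_cell(raw, gx=0.25, gy=0.5)
print(pts.shape, c, l, p)
```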

We modified the output parameters from YOLOv4, which implies that there are minimal constraints on the number of output objects as well as their position and direction. Furthermore, this approach can be applied to any object detection model and not just YOLOv4.

3.2 Loss

Because there are four types of output parameters, the training loss comprises four components. For the lane-point regression task, we employ the SmoothL1 loss [20]:

$$\mathcal{L}_{\text{point}} = \sum_{i=1}^{N} \sum_{x \in \{x, y\}} \text{SmoothL1}(\hat{x}_i, x_i).$$

The GT coordinates of the lane points are obtained by uniformly spacing N points along a labeled lane marking. The centerness and laneness parameters, which correspond to binary classification tasks, are trained with the focal loss [21], which emphasizes the binary cross-entropy in regions of high loss. The class probabilities are trained with a sum of binary cross-entropies, following YOLOv3. The classification losses are defined in Eqs. (6), (7), and (8):

$$\mathcal{L}_{\text{center}} = -(\hat{c} - c)^2 \left\{\hat{c}\log c + (1-\hat{c})\log(1-c)\right\}, \tag{6}$$
$$\mathcal{L}_{\text{lane}} = -(\hat{\ell} - \ell)^2 \left\{\hat{\ell}\log \ell + (1-\hat{\ell})\log(1-\ell)\right\}, \tag{7}$$
$$\mathcal{L}_{\text{class}} = -\sum_{i=1}^{C} \left[\hat{p}_i \log p_i + (1-\hat{p}_i)\log(1-p_i)\right]. \tag{8}$$

The total loss function is the sum of four losses with weights, as expressed by the following equation:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{point}}\mathcal{L}_{\text{point}} + \lambda_{\text{center}}\mathcal{L}_{\text{center}} + \lambda_{\text{lane}}\mathcal{L}_{\text{lane}} + \lambda_{\text{class}}\mathcal{L}_{\text{class}},$$

where the λ parameters are the weights of the corresponding loss terms.
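A minimal NumPy sketch of how the four terms could be assembled is given below; the loss weights, the reduction over grid cells, and the dictionary layout are assumptions, as the paper does not specify them:

```python
import numpy as np

def smooth_l1(pred, gt, beta=1.0):
    """Element-wise SmoothL1 as in Fast R-CNN."""
    d = np.abs(pred - gt)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)

def focal_bce(gt, pred, eps=1e-7):
    """Focal-style BCE: the (gt - pred)^2 factor down-weights easy examples,
    mirroring the centerness/laneness losses in Eqs. (6) and (7)."""
    pred = np.clip(pred, eps, 1.0 - eps)
    bce = -(gt * np.log(pred) + (1.0 - gt) * np.log(1.0 - pred))
    return (gt - pred) ** 2 * bce

def total_loss(pred, gt, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms for one matched grid cell.
    The dict layout and the unit weights are assumptions for illustration."""
    l_point = smooth_l1(pred["points"], gt["points"]).sum()
    l_center = focal_bce(gt["center"], pred["center"])
    l_lane = focal_bce(gt["lane"], pred["lane"])
    p = np.clip(pred["classes"], 1e-7, 1.0 - 1e-7)
    q = gt["classes"]
    l_class = -(q * np.log(p) + (1.0 - q) * np.log(1.0 - p)).sum()
    w_pt, w_c, w_l, w_cls = weights
    return w_pt * l_point + w_c * l_center + w_l * l_lane + w_cls * l_class

pred = {"points": np.random.rand(5, 2), "center": 0.7, "lane": 0.6,
        "classes": np.array([0.8, 0.2])}
gt = {"points": np.random.rand(5, 2), "center": 1.0, "lane": 1.0,
      "classes": np.array([1.0, 0.0])}
print(total_loss(pred, gt))
```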

For our free-form polylanes, we devised a novel NMS algorithm, PL-NMS. Calculating the distance between polylines of complex shape is challenging; fortunately, lane markings typically have simple shapes with low curvature and are sparse and almost evenly distributed. Therefore, simplifying polylanes into line segments does not cause problems when eliminating overlapping polylanes.

3.3 PolyLane NMS

Polylanes are densely predicted from the output feature map, and most are filtered out by centerness thresholding. However, overlapping polylanes remain that are close in position and direction. In object detection, a model typically produces bounding boxes, and an NMS algorithm eliminates the overlapping ones. In our case, the NMS algorithm requires modification to accommodate polylanes; the key deviation from existing NMS lies in how the overlap is measured. CLRNet [6] introduced the Line IoU, which computes the intersection of two fixed-width lines solely along the horizontal axis. Although it is simple to compute and differentiable, it assumes vertical lines, rendering it inapplicable to horizontal lines.
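For reference, a rough sketch of the Line IoU idea (not CLRNet's exact implementation) is shown below; it samples both lanes at the same fixed rows and measures only horizontal overlap, which is why it cannot handle horizontal lines:

```python
import numpy as np

def line_iou(xs_a, xs_b, half_width=7.5):
    """Row-wise Line IoU in the spirit of CLRNet: both lanes are sampled at
    the same fixed rows (one x per row) and widened horizontally.
    Horizontal lanes cannot be sampled this way, hence the limitation."""
    inter = np.minimum(xs_a + half_width, xs_b + half_width) - \
            np.maximum(xs_a - half_width, xs_b - half_width)
    union = np.maximum(xs_a + half_width, xs_b + half_width) - \
            np.minimum(xs_a - half_width, xs_b - half_width)
    return np.clip(inter, 0.0, None).sum() / union.sum()

# Two nearly identical vertical-ish lanes, sampled at three rows.
print(line_iou(np.array([100.0, 102.0, 104.0]), np.array([101.0, 103.0, 105.0])))
```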

The process for simplifying a polylane into a line segment is illustrated in Figure 2(a). Given the N points of a polylane, a line of the form ax + by = 1 is fitted to minimize the mean distance to these points. The projection of the first point (x_1, y_1) onto this line is designated as the starting point (x̄_s, ȳ_s), and the projection of the final point (x_N, y_N) is the end point (x̄_e, ȳ_e).
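A least-squares sketch of this simplification is shown below; it minimizes the algebraic residual of ax + by = 1 as a close proxy for the mean point-to-line distance, and degenerate lines through the origin are ignored:

```python
import numpy as np

def fit_segment(points):
    """Simplify a polylane into a line segment, as in Figure 2(a).
    points: (N, 2) array of polylane vertices.
    Returns the first and last vertices projected onto the fitted line
    ax + by = 1 (lines through the origin are ignored in this sketch)."""
    pts = np.asarray(points, dtype=float)
    # Least-squares fit of pts @ [a, b]^T ~= 1.
    (a, b), *_ = np.linalg.lstsq(pts, np.ones(len(pts)), rcond=None)
    n = np.array([a, b])

    def project(p):
        # Orthogonal projection of p onto the line {x : n . x = 1}.
        return p + (1.0 - n @ p) / (n @ n) * n

    return project(pts[0]), project(pts[-1])

start, end = fit_segment([[0.10, 0.90], [0.12, 0.70], [0.15, 0.50], [0.18, 0.30]])
print(start, end)
```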

When the polylanes are simplified into line segments, the calculation of the distances between these segments becomes straightforward, as illustrated in Figure 2(b). The distance between line segments i and k is formulated as follows:

$$d_{i,k} = d_{k,i} = \min\left\{\max\left(d^{s}_{i,k}, d^{e}_{i,k}\right), \max\left(d^{s}_{k,i}, d^{e}_{k,i}\right)\right\}.$$

Here, d^s_{i,k} denotes the orthogonal distance from the starting point of segment i to line k, and d^e_{i,k} denotes the orthogonal distance from the end point of segment i to line k. Thus, max(d^s_{i,k}, d^e_{i,k}) is the distance from segment i to line k, and the outer minimum is used so that short line segments lying close to a longer one are also eliminated. In PL-NMS, it is not necessary to compute an IoU between polylanes; rather, the distance between polylanes is used, which plays the inverse role of the bounding-box IoU. Two polylanes are deemed to overlap when the distance between them is small. In the case of overlap, the polylane with the higher centerness is retained, and the other is eliminated.
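The following sketch combines the segment distance above with a greedy suppression loop; the distance threshold is an assumed value, as the paper does not report it:

```python
import numpy as np

def point_to_line_dist(p, q1, q2):
    """Orthogonal distance from point p to the infinite line through q1, q2."""
    d, v = q2 - q1, p - q1
    cross = d[0] * v[1] - d[1] * v[0]          # 2D cross-product magnitude
    return abs(cross) / (np.linalg.norm(d) + 1e-9)

def segment_distance(seg_i, seg_k):
    """Distance between two simplified segments, following the equation above."""
    (si, ei), (sk, ek) = seg_i, seg_k
    d_ik = max(point_to_line_dist(si, sk, ek), point_to_line_dist(ei, sk, ek))
    d_ki = max(point_to_line_dist(sk, si, ei), point_to_line_dist(ek, si, ei))
    return min(d_ik, d_ki)

def pl_nms(segments, centerness, dist_thresh=0.02):
    """Greedy suppression: keep the higher-centerness polylane whenever two
    simplified segments are closer than dist_thresh (threshold assumed)."""
    order = np.argsort(centerness)[::-1]
    keep = []
    for i in order:
        if all(segment_distance(segments[i], segments[j]) > dist_thresh for j in keep):
            keep.append(int(i))
    return keep

segs = [(np.array([0.10, 0.9]), np.array([0.20, 0.1])),
        (np.array([0.11, 0.9]), np.array([0.21, 0.1])),   # overlaps the first
        (np.array([0.60, 0.9]), np.array([0.70, 0.1]))]
print(pl_nms(segs, centerness=np.array([0.9, 0.8, 0.7])))  # [0, 2]
```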

4. Results

4.1 Datasets and Training

Our model was verified on the popular lane-detection dataset CULane [10]. CULane provides camera images and lane labels from various environments, including difficult conditions such as night, dazzle light, and shadows. The dataset contains 133,235 images with a resolution of 1640 × 590. The model was evaluated using the CULane benchmark tool.

Because our research began as a private project, we also created a custom dataset for lane detection. Although we cannot disclose the data for security reasons, the custom dataset contains approximately 20,000 frames in which road markings, including stop lines and lane markings, are annotated. Our model was trained to detect lane markings and stop lines.

For training, a single NVIDIA RTX 3090 was used, and the model was optimized with Adam. The learning rate was initially set to 0.001 and reduced by a factor of 10 every 10 epochs. Training was conducted for 40 epochs with a batch size of 2. Our model was implemented in TensorFlow 2.8.
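A minimal sketch of this schedule in TensorFlow 2.x is shown below; steps_per_epoch is a placeholder and the overall training loop is omitted:

```python
import tensorflow as tf

# Sketch of the reported schedule: Adam, initial LR 0.001, divided by 10 every
# 10 epochs, 40 epochs in total. `steps_per_epoch` is a placeholder value.
steps_per_epoch = 1000
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[10 * steps_per_epoch, 20 * steps_per_epoch, 30 * steps_per_epoch],
    values=[1e-3, 1e-4, 1e-5, 1e-6],
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
```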

4.2 CULane Results

The qualitative results for the CULane dataset are shown in Figure 3. The figure comprises eight cells corresponding to eight distinct scenes, and each cell stacks four images to facilitate comparison of lane markings within the same scene. From top to bottom, the images are the original image, the GT label image, the CLRNet results, and the PolyLaneDet results. CLRNet, one of the latest lane detection models, was trained from scratch. The columns, from left to right, show different scene categories: Normal, Shadow, No Line, and Cross. These composite images were collected from the entire test set, which encompasses over 23,000 frames. The images indicate that although our model occasionally overlooks a few lane markings, it performs effectively in most cases when sufficiently discernible visual cues are available. Nonetheless, numerous frames are labeled without any apparent visual cues. In the Shadow category, lane markings are often too faint to be discernible, even for human observers. The No Line category contains no visible lane markings at all; nevertheless, lane markings were labeled intuitively by the human annotators. Additionally, the Cross category contains only empty labels. The presence of numerous frames labeled without visible cues, together with the absence of labels in the Cross category despite clearly visible lane markings, introduces a measure of confusion. The CULane dataset was chosen for evaluation because of its popularity in lane detection and its extensive collection of images; however, the reliability of these labels remains debatable.

For quantitative evaluation, our model was assessed on the CULane benchmark, as summarized in Table 1. We evaluated only the F1 score of the Normal category, in which most lane markings are visible, whereas a large portion of the other categories contains invisible lane markings. The F1 score with an IoU threshold of 0.5 is referred to as F1@50. Our model's performance appears inferior to that of state-of-the-art models, but the context is important. Our model is not specifically optimized for detecting vertical lines; rather, it offers extended flexibility because it imposes no restrictions on the number of detected lanes or their orientations. By contrast, the CULane benchmark restricts lane detection to a maximum of four lane markings per image, and other models align with this benchmark by focusing on limited starting points and predefined directions. The lack of constraints in our model allows it to detect more potential lane candidates, resulting in considerably more false positives. In addition, when lane visual features are faint or ambiguous, our model may struggle to identify them effectively. The value of our approach is revealed on the custom dataset.

An ablation study was conducted with different numbers of polylane vertices, as summarized in Table 2. One might expect more vertices to help a polylane fit the lane labels more closely, but the table shows the opposite. More vertices make the polylane shape unnecessarily complex, resulting in poorer performance. Because most lane markings are straight lines, a simpler parameterization is more advantageous.

4.3 Custom Dataset Results

Our custom road image dataset comprises approximately 20,000 annotated frames, 1,000 of which form the test split. The annotations include two classes: stop lines and lane markings. The image frames were gathered during the daytime, and annotations were made only on the visible portions of the lane markings and stop lines. Our model was trained to detect lane markings and stop lines. The qualitative results are shown in Figure 4. True positives (TP), represented by green lines, indicate instances in which the GT labels correspond to predictions. The red lines in the middle row represent false negatives (FN), and those in the bottom row represent false positives (FP). Although red lines signify FNs and FPs, many FNs correspond closely to FPs. In line with the CULane methodology, GT labels and predictions were matched based on the IoU of the two lane masks. The lane width was set to 10 pixels, and the IoU threshold was set to 0.5. In contrast to CULane, the number of lanes was not restricted.
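A sketch of this mask-based matching is given below, using OpenCV to rasterize the polylines; the image size and example coordinates are illustrative:

```python
import numpy as np
import cv2  # OpenCV is used here only to rasterize the polylines

def lane_mask(polyline, shape, width=10):
    """Rasterize an (N, 2) polyline of pixel coordinates into a binary mask."""
    mask = np.zeros(shape, dtype=np.uint8)
    pts = np.round(np.asarray(polyline)).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(mask, [pts], isClosed=False, color=1, thickness=width)
    return mask

def mask_iou(poly_a, poly_b, shape=(600, 1200), width=10):
    """IoU of two lane masks drawn with a fixed stroke width."""
    a, b = lane_mask(poly_a, shape, width), lane_mask(poly_b, shape, width)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

gt = [[100, 580], [400, 300], [600, 100]]
pred = [[102, 580], [402, 300], [602, 100]]
print(mask_iou(gt, pred) > 0.5)   # True -> counted as a true positive
```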

The quantitative results are presented in Table 3. The detection performance for stop lines is lower than that for lane markings because stop lines are generally located farther away and appear shorter owing to occlusion. The lower lane detection performance on the custom dataset compared with the CULane benchmark is due to the difference in how the performance is calculated. In the CULane benchmark, lanes are drawn with a width of 30 pixels on low-resolution images with a width of 800 pixels, and the IoU is computed accordingly; in the custom dataset, lanes are drawn with a width of 10 pixels on high-resolution images with a width of 1,200 pixels. Owing to our model's capacity to output free-form polylines, the qualitative and quantitative results demonstrate that it can effectively detect both lane markings and horizontal stop lines.

5. Conclusion

In this study, we introduced a novel lane detector, PolyLaneDet, equipped with a unique parametric representation of lane markings. This representation, called a polylane, is a free-form polyline that does not constrain the y-axis coordinates. Because the polylane representation dispenses with the vertical-lane assumption, it effectively handles intersections and lane changes. To avoid limiting the number of detected lane markings, we employed the YOLOv4 architecture. Lane markings are detected at their center points, and a new algorithm, PL-NMS, was proposed to eliminate overlaps between the detected lane markings. Our model was evaluated on both the CULane benchmark and our custom road image dataset and demonstrated commendable results. Notably, our model can be trained to detect horizontal stop lines, a feature that differentiates it from existing models, which generally assume nearly vertical lines. In future work, we aim to enhance lane detection performance using public and more reliable datasets while retaining the current advantages.

Fig. 1.

PolyLaneDet architecture. CSPDarkNet53 is used as the backbone network, and PAN is employed as the neck network. A couple of convolutions are applied to the mid-level PAN feature in the head.


Fig. 2.

PolyLane NMS. (a) A polylane is represented by a polyline and is simplified to a line segment before the NMS process. (b) The orthogonal distances from the endpoints of one line segment to the other line are used to compute the distance between polylanes.


Fig. 3.

Qualitative results from CULane. There are eight cells, and each cell shows a stack of four images: from the top, the source image, GT label, CLRNet result, and our PolyLaneDet result.


Fig. 4.

Qualitative results from the custom dataset. There are four samples, and each sample consists of three rows: (from the top) a source image, GT label, and our PolyLaneDet result.


Table 1. Quantitative results from the CULane benchmark.

Method          Backbone        F1@50
SCNN [10]       VGG16           90.60
RESA [11]       ResNet50        92.10
UFLD [15]       ResNet34        90.70
CondLane [5]    ResNet101       93.47
CLRNet [6]      DLA34           93.73
PolyLaneDet     CSPDarkNet53    84.29

The metric is F1@50 of the Normal category in the CULane benchmark.


Table 2. F1@50 metrics of the Normal category with different numbers of polylane vertices.

Parameter    F1@50
n = 5        84.29
n = 10       61.54
n = 20       53.43

Table 3. Quantitative results (%) from the custom road dataset.

Class        Recall    Precision    F1@50
Lane         68.78     68.94        68.86
Stop line    62.90     45.24        52.63

References

1. Tan, H, Zhou, Y, Zhu, Y, Yao, D, and Li, K. A novel curve lane detection based on Improved River Flow and RANSAC. Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems, Qingdao, China, 2014, pp. 133-138. https://doi.org/10.1109/ITSC.2014.6957679
2. Liu, G, Worgotter, F, and Markelic, I. Combining statistical Hough transform and particle filter for robust lane detection and tracking. Proceedings of the 2010 IEEE Intelligent Vehicles Symposium, La Jolla, CA, USA, 2010, pp. 993-997. https://doi.org/10.1109/IVS.2010.5548021
3. Long, J, Shelhamer, E, and Darrell, T. Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 2015, pp. 3431-3440. https://doi.org/10.1109/CVPR.2015.7298965
4. Ronneberger, O, Fischer, P, and Brox, T (2015). U-Net: convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention - MICCAI 2015, pp. 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
5. Liu, L, Chen, X, Zhu, S, and Tan, P. CondLaneNet: a top-to-down lane detection framework based on conditional convolution. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021, pp. 3753-3762. https://doi.org/10.1109/ICCV48922.2021.00375
6. Zheng, T, Huang, Y, Liu, Y, Tang, W, Yang, Z, Cai, D, and He, X. CLRNet: cross layer refinement network for lane detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 2022, pp. 898-907. https://doi.org/10.1109/CVPR52688.2022.00097
7. Bochkovskiy, A, Wang, CY, and Liao, HYM (2020). YOLOv4: optimal speed and accuracy of object detection. Available: https://arxiv.org/abs/2004.10934
8. Yoo, S, Lee, HS, Myeong, H, Yun, S, Park, H, Cho, J, and Kim, DH. End-to-end lane marker detection via row-wise classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 2020, pp. 4335-4343. https://doi.org/10.1109/CVPRW50498.2020.00511
9. Abualsaud, H, Liu, S, Lu, DB, Situ, K, Rangesh, A, and Trivedi, MM (2021). LaneAF: robust multi-lane detection with affinity fields. IEEE Robotics and Automation Letters, 6, 7477-7484. https://doi.org/10.1109/LRA.2021.3098066
10. Pan, X, Shi, J, Luo, P, Wang, X, and Tang, X. Spatial as deep: spatial CNN for traffic scene understanding. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, pp. 7276-7283. https://doi.org/10.1609/aaai.v32i1.12301
11. Zheng, T, Fang, H, Zhang, Y, Tang, W, Yang, Z, Liu, H, and Cai, D. RESA: recurrent feature-shift aggregator for lane detection. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, pp. 3547-3554. https://doi.org/10.1609/aaai.v35i4.16469
12. Philion, J. FastDraw: addressing the long tail of lane detection by adapting a sequential prediction network. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 2019, pp. 11582-11591. https://doi.org/10.1109/CVPR.2019.01185
13. Su, J, Chen, C, Zhang, K, Luo, J, Wei, X, and Wei, X (2021). Structure guided lane detection. Available: https://arxiv.org/abs/2105.05403
14. Ko, Y, Lee, Y, Azam, S, Munir, F, Jeon, M, and Pedrycz, W (2022). Key points estimation and point instance segmentation approach for lane detection. IEEE Transactions on Intelligent Transportation Systems, 23, 8949-8958. https://doi.org/10.1109/TITS.2021.3088488
15. Qin, Z, Wang, H, and Li, X (2020). Ultra fast structure-aware deep lane detection. Computer Vision - ECCV 2020. Cham, Switzerland: Springer, pp. 276-291. https://doi.org/10.1007/978-3-030-58586-0_17
16. Li, X, Li, J, Hu, X, and Yang, J (2020). Line-CNN: end-to-end traffic line detection with line proposal unit. IEEE Transactions on Intelligent Transportation Systems, 21, 248-258. https://doi.org/10.1109/TITS.2019.2890870
17. Tabelini, L, Berriel, R, Paixao, TM, Badue, C, De Souza, AF, and Oliveira-Santos, T. Keep your eyes on the lane: real-time attention-guided lane detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 2021, pp. 294-302. https://doi.org/10.1109/CVPR46437.2021.00036
18. Tabelini, L, Berriel, R, Paixao, TM, Badue, C, De Souza, AF, and Oliveira-Santos, T. PolyLaneNet: lane estimation via deep polynomial regression. Proceedings of the 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2021, pp. 6150-6156. https://doi.org/10.1109/ICPR48806.2021.9412265
19. Liu, S, Qi, L, Qin, H, Shi, J, and Jia, J. Path aggregation network for instance segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 8759-8768. https://doi.org/10.1109/CVPR.2018.00913
20. Girshick, R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015, pp. 1440-1448. https://doi.org/10.1109/ICCV.2015.169
21. Lin, TY, Goyal, P, Girshick, R, He, K, and Dollar, P. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2980-2988. https://doi.org/10.1109/ICCV.2017.324

Jeongmin Kim received his B.S. and M.S. degrees in electronics and information engineering from Soonchunhyang University in 2013 and 2023, respectively. He has been working at Autonomous A2Z since 2023. His research interests include computer vision, object detection, deep learning, and unsupervised learning.

Hyukdoo Choi received his B.S. and Ph.D. degrees in electrical and electronics engineering from Yonsei University, Seoul, Republic of Korea, in 2009 and 2014, respectively. From 2014 to 2017, he worked at LG Electronics as a senior research engineer. Since 2018, he has been an assistant professor in the Department of Electronics and Information Engineering at Soonchunhyang University. His research interests include simultaneous localization and mapping (SLAM), visual odometry, computer vision, object detection, deep learning, and unsupervised learning.
