Performance Evaluation of Feature Extraction Algorithms for Vehicle Shape Classification

Vehicle classification is a classic application of automotive image processing and is necessary for a variety of modern vehicle safety and comfort features. Machine learning-based solutions are effective in this field and are currently employed extensively in various automotive applications. However, the most challenging aspect of automotive image processing with machine learning methods is gathering datasets of adequate quality and quantity to develop such applications. Additionally, imbalanced datasets are common in multiclass automotive image processing, as is the case with the current topic of vehicle platoon management. The effectiveness of available handcrafted feature extractors and classifiers employed for vehicle class categorization varies greatly due to the effect of dataset imbalance. This study examines how the performance of four prominent feature extractors changes when they are used with an imbalanced dataset for vehicle shape classification. The use of image augmentation techniques to increase the dataset size for three vehicle classes (car, bus, and truck) is also presented. Further, experimental analysis was performed with the Support Vector Machine (SVM) classifier using four feature extractors: Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Haar. Vehicle shape classification, which is an important characteristic in vehicle platoon management, has been evaluated using the Receiver Operating Characteristic (ROC) for both the unbalanced dataset and the augmented dataset. The experimental results demonstrate that the HOG feature extractor performs better than the SIFT, SURF, and Haar feature extractors on the imbalanced dataset. After an image augmentation technique was used to add images, performance improved significantly, reaching 95% for HOG, 91% for SIFT, 91% for SURF, and 96% for Haar.


Introduction
The automotive industry relies heavily on vision-based vehicle detection and classification systems for its many vehicle safety and comfort features. Vehicle shapes can be classified using a variety of methods, including feature-based, model-based, and machine learning-based approaches. Chen, Pan, and Zou (2016) used a vision-based method to perform lane-line recognition in conjunction with vehicle detection, following a feature-based strategy that uses an edge feature and vertical projection to estimate vehicles in the image. Li and Wang (2015) developed a new hybrid image template for describing vehicle features through the use of texture and color. Zakria et al. (2018) worked on a model-based approach and proposed a new probabilistic framework for vehicle detection. Chen et al. (2018) proposed a vehicle detection model for complex traffic video surveillance. Penghua and Yanping (2019) proposed an improved feature extractor using an SVM classifier. Machine learning-based vehicle shape classification has gained popularity in recent years due to its simplicity and increased accuracy. SVM is one of the popular machine learning algorithms used for image processing-based classification problems. Local Binary Pattern (LBP) (Wen et al. 2015), Gist (Sikirić, Brkić, and Šegvić 2013), HOG (Lee et al. 2015), SIFT (Chen and Meng 2013), SURF (Song et al. 2019), and Haar (Wei et al. 2019) are some of the prominent handcrafted feature extraction techniques. According to the literature review, feature extractors like HOG, SIFT, SURF, and Haar have been used in a variety of vehicle applications and have produced notable results for various scale changes. The current work focuses on using popular handcrafted feature extractors for machine learning-based vehicle shape categorization. Feature extractors that are invariant to image augmentation approaches have been chosen for the current study.
In addition, the current effort aims to improve the dataset for bus and truck images by incorporating multiple image augmentation techniques such as rotation, crop, blur, and tilt. The rest of the article is organized as follows: Section 2 provides an overview of the image processing problem statement for vehicle platoon management, Section 3 describes the algorithm's implementation approach, Section 4 offers simulation results for the given problem using various feature extraction techniques, Section 5 summarizes and discusses the results, and Section 6 wraps up the study with concluding remarks.

Problem Statement
Vehicle platoon management entails the coordination of many vehicles driving in the same longitudinal direction, as shown in Figure 1. Vehicles in platooning systems typically feature at least one front camera for Advanced Driver Assistance Systems (ADAS) and communicate with one another using advanced telematics systems. In platoon management, the lead vehicle directs the follower vehicles to reap the benefits of platooning. Reduced fuel or energy consumption, reduced carbon dioxide (CO2) emissions, reduced traffic congestion, and reduced accidents are all important advantages of vehicle platoon management. Various researchers have demonstrated the potential advantages of vehicle platooning in their work. Liu et al. (2019) presented a new platoon technique that maximizes fuel economy during lane changes. Ballarin and Zeilinger (2017) showed how platoon management benefits society and customers by lowering fuel consumption, CO2 emissions, traffic congestion, and road accidents. Peng, Bonsangue, and Xu (2019) suggested a novel timed automata model for safe vehicle platooning. Each following vehicle transmits the image of its front vehicle to the lead vehicle through a front vehicle camera. As illustrated in Figure 1, after a new merger, the platoon consists of four vehicles: V1 is the lead vehicle, while V2, V3, and V4 are follower vehicles. Follower vehicles V2, V3, and V4 capture front images of their respective front vehicles and transmit them to lead vehicle V1 via vehicle-to-vehicle communication. It is assumed in the current work that all vehicles travel at a constant distance d. The lead vehicle V1 is responsible for identifying the various types of vehicles in the platoon and arranging them in a certain order based on vehicle area. To optimize fuel efficiency, however, it is important to group vehicles according to their shape before arranging them in a prescribed manner.
Numerous studies have been conducted to date utilizing various feature extraction algorithms as a pre-processing step for vehicle shape detection in automotive image processing using single-class and multiclass machine learning classifiers. Nayak, Gopalswamy, and Rathinam (2019) conducted a study employing various features and different classifiers on a variety of vehicles, demonstrating that HOG with AdaBoost delivers a 40% more accurate and precise classification result. In Amraoui et al. (2019), the accuracy of SURF was compared with that of other extractors and reported as 100 percent relative to other bio-inspired approaches; however, the authors proposed improving the average response time before offering the bio-inspired approach. Kapoor et al. (2007) showed that, utilizing SVM and SIFT, the Gaussian Process with Pyramid Match Kernel may be used to recognize a wide variety of target vehicles. Wei et al. (2019) demonstrated and discussed the importance of performing research in multi-vehicle detection settings, as well as the advantages of combining HOG and Haar feature extraction with an SVM classifier. Bougharriou, Hamdaoui, and Mtibaa (2018) demonstrated that linear SVM with HOG features benefits advanced driver assistance systems by improving the performance of their proposed algorithms. Lin et al. (2012) researched blind spot identification using a front and sidecar model with SURF as the feature extractor and a probabilistic classifier. In Sun, Bebis, and Miller (2006), the HOG and Gabor feature extractors were compared using the SVM classifier and demonstrated their superiority. According to the literature review, the feature extraction approach in image processing frequently results in the loss of original image features. Furthermore, each feature extractor has its own set of advantages and disadvantages; therefore, a comprehensive investigation of the specific application should be undertaken before selecting a feature extractor.
The goal of this study is to learn more about the performance of four popular feature extractors for vehicle shape detection, which will be employed in vehicle platoon management. Furthermore, the study demonstrates how image augmentation may be used to improve the quality of image datasets, hence boosting the performance of machine learning systems.

Algorithm Implementation
The experimental work has been carried out using Python 3.7.9, with the PyCharm IDE and the OpenCV library. PyCharm is used because it is a dedicated Integrated Development Environment (IDE) for Python that provides essential tools for the development of machine learning and data science-based applications; OpenCV (Open Source Computer Vision Library) is an open-source software library for computer vision, machine learning, and image processing. The implementation flow of the proposed algorithm is depicted in Figure 2, with the first stage being the creation of the required image dataset, followed by filtering and training with multiple feature extractors. Following the extraction of features depending on the region of interest, the vehicles are classified using an SVM classifier before being arranged in a specified manner for platoon management.

Vehicle dataset creation
A brief survey was undertaken to ascertain the availability of datasets for the stated topic, as identifying or developing datasets is a vital step in achieving the best results for any classification challenge involving machine learning algorithms. Zakria et al. (2018) developed datasets of rear-view car images, and a few vehicle datasets have been made open source. Additionally, the datasets mentioned in these papers have also been made public (Chang et al. 2018; Lou et al. 2019; Naphade et al. 2021). According to the results of the survey, open repository datasets may not contain all of the images required to conduct an experimental study on a certain issue. As a result, data augmentation techniques were used to expand the dataset size for the current problem study. Currently, the effort is being made to recognize certain vehicles based on their size and organize them suitably so that the platoon's lead vehicle can influence the rearrangement of other vehicles to improve fuel or energy economy, which benefits consumers and society. For the proposed problem statement, the vehicles investigated are cars, trucks, and buses, and the associated image collection was used for the preliminary study. The images have been gathered and created as shown in Figure 3. The images were taken from various open-source repositories, and 5000 images of car types were obtained from these open datasets (Krause et al. 2013). However, only about 3500 images were available for trucks (Tabassum et al. 2020) and nearly 2900 images for buses (Tabassum et al. 2020). Initially, an analysis was carried out using the minimal image quantity common to all classes, that is, 2900 images each for car, bus, and truck.
Additionally, studies were carried out utilizing an unbalanced image dataset acquired from an open-source repository (Arróspide, Salgado, and Nieto 2012), which included 5000, 2900, and 3500 images of car, bus, and truck, respectively. The dataset size was increased for the final analysis, with an equal number of images for each group, resulting in 5000 images each for the car, truck, and bus vehicle types. Table 1 presents the number of images considered for analysis in the three separate scenarios. An earlier study (2017) suggested a technique for smart data augmentation in which the proposed network learns to generate enhanced data during the learning process. Because fewer images were accessible for buses and trucks than for cars in the current study, the number of images for buses and trucks was increased through the use of image augmentation techniques. Approximately 400 photos each of different types of buses and trucks on the road were captured. By utilizing different image augmentation techniques, as shown in Figure 3, the quantity of captured images was increased. Even though image augmentation techniques greatly aid in increasing the data size, one should be careful when selecting the augmentation technique for a given problem, because an unsuitable technique can degrade the quality of the dataset, which in turn degrades learning capability.
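As a rough illustration of how a smaller class can be topped up to a target size, the following minimal sketch applies crop, blur, and rotation to grayscale images stored as nested lists. It is not the OpenCV pipeline used in the study: the helper names are hypothetical, the 90-degree rotation is a simple stand-in for the rotation and tilt operations, and the random selection policy is an assumption.

```python
import random

def crop(img, top, left, height, width):
    """Return the height x width sub-image starting at (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def rotate90(img):
    """Rotate the image 90 degrees clockwise (stand-in for rotation/tilt)."""
    return [list(col) for col in zip(*img[::-1])]

def box_blur(img):
    """3x3 mean blur; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def augment_to_target(images, target, seed=0):
    """Append randomly augmented copies until the class holds `target` images.

    Assumes every image is at least 3x3 so the one-pixel border crop is valid.
    """
    rng = random.Random(seed)
    ops = [
        rotate90,
        box_blur,
        lambda im: crop(im, 1, 1, len(im) - 2, len(im[0]) - 2),
    ]
    out = list(images)
    while len(out) < target:
        out.append(rng.choice(ops)(rng.choice(images)))
    return out

if __name__ == "__main__":
    # Toy data: three 4x4 "bus" images grown to five images.
    toy = [[[v] * 4 for _ in range(4)] for v in (10, 20, 30)]
    print(len(augment_to_target(toy, 5)))
```

In the setting described above, `target` would be 5000 for both the bus and the truck class. Which operations are appropriate depends on the problem, in line with the caution about unsuitable augmentation.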

Training, validation, and testing data set
In the proposed work, 80% of the total image dataset has been used to train the learning algorithm and the remaining 20% has been used to test the performance of the classifier. For tuning the SVM classifier, approximately 20% of the training set has been utilized. Dataset modeling, training, and testing are illustrated in Figure 4.
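The split described above can be sketched as follows. This is a minimal illustration, assuming a random shuffle with a fixed seed and assuming that the tuning (validation) portion is carved out of the 80% training portion; the function name and parameters are hypothetical.

```python
import random

def split_dataset(items, test_frac=0.2, val_frac=0.2, seed=42):
    """Shuffle, hold out `test_frac` for testing, then reserve `val_frac`
    of the remaining training portion for tuning the classifier."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_frac)
    test, train_full = items[:n_test], items[n_test:]
    n_val = round(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

if __name__ == "__main__":
    # 15000 image indices, as in the augmented scenario (3 classes x 5000).
    train, val, test = split_dataset(range(15000))
    print(len(train), len(val), len(test))
```

For 15000 images this yields 9600 training, 2400 tuning, and 3000 test images. The split here is not stratified by class; a per-class split may be preferable for the unbalanced scenario.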

Feature extraction algorithms
Many feature extraction techniques are available to execute the aforementioned task; in the current work, HOG, SIFT, SURF, and Haar have been examined for experimental analysis, and these algorithms are explained as follows. Dalal and Triggs (2005) first proposed the HOG algorithm for human detection. Since then, several studies have applied it in the automotive field for pedestrian detection, vehicle detection, etc. HOG operates using a sliding window of fixed width and height that slides across an image. Usually, the sliding window is eight times the HOG cell size in each dimension; that is, the sliding window will be 32x32, 48x48, and 64x64 pixels for HOG cell sizes of 4x4, 6x6, and 8x8 pixels, respectively. HOG extraction computes the magnitude and orientation of the gradient, from which the orientation histogram is derived. If p(x,y) is the pixel value at position (x,y) of an image, the HOG feature first calculates the difference values dx and dy in the x and y directions, respectively, as given in Equations (1) and (2). The gradient magnitude and gradient orientation can then be calculated as given in Equations (3) and (4).
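The gradient and histogram steps can be sketched as below. This is a simplified illustration, not the implementation used in the study: it assumes central differences for dx and dy, unsigned orientations in the range 0 to 180 degrees, and 9 histogram bins (a common choice that the text does not specify). It omits the interpolation between bins and the block normalization of a full HOG descriptor.

```python
import math

def gradients(img, x, y):
    """Central differences dx, dy at pixel (x, y); img is indexed img[y][x]."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    return dx, dy

def cell_histogram(img, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations
    over the interior pixels of one cell (border pixels are skipped)."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            dx, dy = gradients(img, x, y)
            magnitude = math.hypot(dx, dy)                     # sqrt(dx^2 + dy^2)
            angle = math.degrees(math.atan2(dy, dx)) % 180.0   # unsigned angle
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

if __name__ == "__main__":
    # Horizontal intensity ramp: all gradient energy falls in the 0-degree bin.
    ramp = [[0, 1, 2, 3] for _ in range(4)]
    print(cell_histogram(ramp))
```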

Scale-Invariant Feature Transform (SIFT)
The SIFT technique, published in 1999 by David Lowe (Lowe 1999), extracts scale-invariant features from an image. An object in a new image is recognized by individually comparing each of its features with those extracted from a set of reference images. A Gaussian function is applied in the scale space to locate maxima and minima in the horizontal and vertical directions, as mentioned in Equation (5).
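For reference, the scale-space construction commonly used in SIFT can be written as follows. This is the standard formulation from Lowe's method, given here as a sketch; its notation and numbering may differ from the equations of this article.

```latex
% Scale space: the image I convolved with a Gaussian G of scale \sigma
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)

% Difference of Gaussians between neighbouring scales (k is a constant scale factor)
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
```

Candidate keypoints are the local maxima and minima of D, found by comparing each sample with its neighbours in the current and adjacent scales.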

Speeded-Up Robust Features (SURF)
The SURF detector-descriptor was proposed by Bay et al. (2008) to improve speed and accuracy compared to SIFT. A blob detector based on the Hessian matrix is used to detect interest points, as mentioned in Equation (8).
Once an interest point is detected, its local neighborhood is described using the Hessian matrix; for a given pixel, the Hessian is as shown in Equation (9).
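For reference, the Hessian used by SURF is usually written as below. This is the standard formulation from Bay et al., offered as a sketch whose symbols may not match the article's Equations (8) and (9).

```latex
% Hessian at point p = (x, y) and scale \sigma; L_{xx}, L_{xy}, L_{yy} are the
% responses of the image to second-order Gaussian derivatives
\mathcal{H}(p, \sigma) =
\begin{bmatrix}
L_{xx}(p, \sigma) & L_{xy}(p, \sigma) \\
L_{xy}(p, \sigma) & L_{yy}(p, \sigma)
\end{bmatrix}

% Blob response with box-filter approximations D_{xx}, D_{yy}, D_{xy}
\det(\mathcal{H}_{\text{approx}}) = D_{xx} D_{yy} - (w\, D_{xy})^{2}, \qquad w \approx 0.9
```

Interest points are taken at local maxima of this determinant across position and scale.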

Haar
A Haar feature is a rectangular feature, defined as the difference between the sums of image pixels inside adjacent rectangles at any position within the original image. This simple rectangular feature was later extended to 2-, 3-, and 4-rectangle features. Viola and Jones (2004) are credited with defining the 3-rectangle and 4-rectangle features. The image features are calculated from the integral image by retrieving the values at the four corner coordinates of each rectangle. Following Equation (10), each point in the image is scanned and the integral image is computed so that Haar features can be calculated more efficiently.
Here, AI(x, y) is the integral image of the input image A(x', y').
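The role of the integral image can be illustrated with a short sketch. It assumes the usual definition, in which each integral-image entry holds the sum of all input pixels above and to the left of it, and it uses a zero-padded first row and column for convenience. The function names and the sign convention of the two-rectangle feature (right half minus left half) are illustrative assumptions.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle using only four corner look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

def haar_two_rect(ii, top, left, height, width):
    """Two-rectangle feature: sum of the right half minus sum of the left half."""
    half = width // 2
    return (rect_sum(ii, top, left + half, height, half)
            - rect_sum(ii, top, left, height, half))

if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8]]
    ii = integral_image(img)
    print(rect_sum(ii, 0, 0, 2, 4), haar_two_rect(ii, 0, 0, 2, 4))
```

Once the integral image is built in a single pass, every rectangle sum costs four look-ups regardless of the rectangle size, which is what makes Haar features cheap to evaluate at many positions and scales.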

Vehicle classification
Many machine learning classifiers are available for image processing applications, and multiclass SVM has been chosen in the current work because of its widespread use in automotive image processing. The SVM classifier is a supervised learning algorithm that separates two classes with the maximum margin possible. The support vectors in the classifier construct the hyperplane that optimally acts as a class boundary between two datasets. In the final SVM equation (Equation 11), Xi is the input image and X is the dataset image used for training, with hyperplane parameters w and w0. The architecture for SVM-based vehicle classification is shown in Figure 5.
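How hyperplane parameters w and w0 turn a feature vector into a class label can be sketched as follows. The sketch assumes a linear decision function and a one-vs-rest scheme for the three classes; the text does not state which kernel or multiclass strategy was used, and the weights below are made-up values for illustration only, not trained parameters.

```python
def decision_value(w, w0, x):
    """Signed score of feature vector x relative to the hyperplane (w, w0)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def predict_one_vs_rest(models, x):
    """models maps class label -> (w, w0); return the label with the top score."""
    return max(models, key=lambda label: decision_value(*models[label], x))

if __name__ == "__main__":
    # Hypothetical 2-D feature space with one hyperplane per class.
    models = {
        "car": ([1.0, 0.0], -0.5),
        "bus": ([0.0, 1.0], -0.5),
        "truck": ([-1.0, -1.0], 0.5),
    }
    print(predict_one_vs_rest(models, [1.0, 0.0]))
```

In practice the feature vector x would be the HOG, SIFT, SURF, or Haar descriptor of the image, and each (w, w0) pair would be learned from the training set.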

Vehicle detection and grouping
The classification model assigns vehicles to the car, bus, or truck class, and the lead vehicle in platoon management groups the vehicles according to the classification model's results. In the current study, classification is limited to three vehicle classes (truck, bus, and car); therefore, grouping is constrained as well. Based on the grouping outcome, the lead vehicle in the platoon decides on the optimal placement of vehicles to achieve better fuel or energy efficiency. The current research effort does not focus on optimal vehicle placement; this is being considered for future work, pending the completion of the current study's learning algorithm.

Results and Discussion
The accuracy of the HOG, SIFT, SURF, and Haar feature extraction methods has been evaluated. Accuracy can be calculated as given in Equation (12).
Here, true positives and true negatives represent the correctly predicted classes, whereas false positives and false negatives are those incorrectly predicted as positive or negative. Total accuracy has been computed using Equation (12); the model achieved 83%, which can be seen in the experimental result in Figure 6 for the HOG feature extractor.
For scenario-2, the results are shown in Table 3 for the HOG feature; here, the correctly classified images are 1995, 4800, and 2800 for the bus, car, and truck classes, respectively. Misclassified images are shown in the red cells. Based on the misclassified images, the false positives are 560, 300, and 845 images for the bus, car, and truck classes, respectively. The false negatives are 905, 200, and 600 images for the bus, car, and truck classes, respectively. Using Equation (12), the total accuracy achieved is 85%, which can be seen in the experimental result in Figure 7 for the HOG feature extractor.
As mentioned in Table 1 for scenario-3 (Added Augmented Image Dataset), the bus, car, and truck classes have 2900, 5000, and 3500 images in the dataset before augmentation. Additional images for the bus and truck classes were obtained by taking nearly 400 photos each of buses and trucks and then applying image augmentation techniques to increase the dataset size. After the increase, each class contains 5000 images. The results are shown in Table 4. For the HOG feature extractor, the correctly classified images are 4680, 4900, and 4710 for the bus, car, and truck classes, respectively. Misclassified images are shown in the red cells. Based on the misclassified images, the false positives are 310, 60, and 340 images for the bus, car, and truck classes, respectively. For the bus, car, and truck classes, the false negatives are 320, 100, and 290 images, respectively.
Using Equation (12), the total accuracy achieved is 95%, which can be seen in the experimental result in Figure 8 for the HOG feature extractor. A similar approach has been used to calculate the average accuracy for the other feature extractors (SIFT, SURF, and Haar), and the consolidated values for all the mentioned feature extraction methods are shown in Table 5. Figure 6, Figure 7, and Figure 8 show the experimental results and compare the true-positive versus false-positive rates of HOG, SIFT, SURF, and Haar through the ROC curve for scenario-1, scenario-2, and scenario-3, respectively. In particular, for scenario-2 and scenario-3, the SIFT and SURF feature extractors performed approximately identically. This can be seen in Figure 7 and Figure 8, where the ROC curve of SURF overlaps the ROC curve of SIFT, so that only SURF's performance curve is visible.
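The accuracy and ROC-style rates can be recomputed from per-class counts with a short sketch. It assumes that total accuracy is the share of correctly classified images over all images, and that per-class rates treat each class one-vs-rest; the function names are hypothetical. The input values are the scenario-3 HOG counts reported above.

```python
def overall_accuracy(correct, false_neg):
    """Correctly classified images divided by all images."""
    n_correct = sum(correct.values())
    return n_correct / (n_correct + sum(false_neg.values()))

def per_class_rates(label, correct, false_pos, false_neg):
    """One-vs-rest true-positive rate and false-positive rate for `label`."""
    total = sum(correct.values()) + sum(false_neg.values())
    tp, fp, fn = correct[label], false_pos[label], false_neg[label]
    tn = total - tp - fp - fn
    return tp / (tp + fn), fp / (fp + tn)

if __name__ == "__main__":
    # Scenario-3 HOG counts (augmented dataset, 5000 images per class).
    correct = {"bus": 4680, "car": 4900, "truck": 4710}
    false_pos = {"bus": 310, "car": 60, "truck": 340}
    false_neg = {"bus": 320, "car": 100, "truck": 290}
    print(round(overall_accuracy(correct, false_neg), 4))
    print(per_class_rates("bus", correct, false_pos, false_neg))
```

With these counts, 14290 of 15000 images are classified correctly, which is about 0.953 and matches the 95% reported for HOG on the augmented dataset.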

Conclusion
Because gathering appropriate image datasets for machine learning-based solutions in automotive image processing takes a lot of time and effort, this work shows how image augmentation can be used to create the necessary dataset and perform vehicle classification in automotive image processing. For various dataset sizes, analysis was conducted using prominent feature extractors such as HOG, SIFT, SURF, and Haar. In comparison to previous algorithms, HOG with SVM has shown improved performance for the minimal, unbalanced, and augmented image datasets. As a result, the image augmentation technique can be used to increase the machine learning algorithm's accuracy. However, researchers have to carefully analyze and select appropriate image augmentation techniques for their research problem to avoid adverse effects. Based on the outcome of the current work, further work is being carried out to extend the classification to more vehicle classes in the vehicle platoon management system. The future study will expand the set of vehicle classes and group them according to their diverse shapes, thereby improving the performance of vehicle platoon management utilizing HOG and SVM.