A PyTorch Operations Based Approach for Computing Local Binary Patterns

Advances in machine learning frameworks like PyTorch provide users with various machine learning algorithms together with general-purpose operations. The PyTorch framework provides NumPy-like functions and makes it practical to use computational resources for accelerating computations. Users may also define custom layers or operations for feature extraction algorithms based on tensor operations. In this paper, Local Binary Patterns (LBP), one of the important feature extraction approaches in computer vision, were realized using tensor operations of the PyTorch framework. The algorithm was written both in Python with standard libraries and in Python with PyTorch tensor operations. According to experimental measurements carried out on various batches of images, the algorithm based on tensor operations considerably reduced the computation time and provided significant acceleration over the Python implementation with standard libraries.


Introduction
Machine learning applications usually involve special preprocessing for feature extraction, and this can have a significant effect on the success of the designed model. In particular, developing a deep learning model may require defining custom layers for better feature extraction, and training the model usually involves large datasets. When programs are written as Python scripts with standard libraries, computation times may increase considerably, depending on the size of the dataset and the type of the processed data. Because Python is an interpreted language, evaluating the script line by line usually increases the computation time, specifically for loop operations. Libraries such as PyTorch are provided to make computations for machine learning algorithms both faster and practical. The PyTorch framework includes compiled algorithms mainly designed for machine learning and deep learning applications such as natural language processing, sequence processing and computer vision. Writing a custom operation with PyTorch tensors eliminates most of the loops, where possible, and enables efficient utilization of computational resources. Custom operations can be used to build custom layers in machine learning and deep learning applications (Paszke et al. 2019). Various researchers design frameworks, custom layers, custom loss functions or custom operations based on the available PyTorch operations, such as a kernel design for xnor and bitcount computations (Xu and Pedersoli 2019), a Bayesian optimization framework in PyTorch (Balandat et al. 2019) and a normalized convolutional neural network (Kim et al. 2020).
Some deep learning applications require special preprocessing layers for feature extraction that do not exist among the algorithms of the PyTorch library. In such cases programmers may define custom layers based on the existing operations without writing a tensor operation from scratch. In computer vision applications, one of the important feature extraction approaches is the LBP transform (Pietikäinen et al. 2011; Pietikäinen 2010). Due to its efficient and computationally lightweight structure, it has been used in various machine learning applications such as an LBP network for face recognition (Xi et al. 2016), facial expression recognition (Rahul, Kohli, and Agarwal 2020), image forgery detection (Alahmadi et al. 2017), acoustic scene classification (Yang and Krishnan 2017), dermoscopic skin lesion image segmentation (Pereira et al. 2020), hyperspectral image classification (Tu et al. 2019) and on-road vehicle detection in urban traffic scenes (Hassaballah, Kenk, and El-Henawy 2020).
Computer vision algorithms like LBP usually require high CPU usage because of the large number of pixel-wise comparisons, multiplications and additions. In machine learning or deep learning model development, the need for computational power increases further as the dataset and the dimensions of the processed images grow. In this work, a custom function for accelerating the computation of the LBP transform was implemented based on the available PyTorch operations. Images with dimensions of up to 448×448 were used in the performance evaluations. The effect of batch size was also evaluated by changing the number of images in the test batches up to 256. In order to obtain acceleration results, the LBP algorithm was also implemented using standard Python functions. In the following section, the details of the LBP transform and its PyTorch implementation are introduced. Experimental running time evaluations are presented in Section 3. Discussions and concluding remarks are given in the final section.

Local binary patterns
LBP is an efficient pattern descriptor that has been used in various computer vision applications (Ojala, Pietikäinen, and Mäenpää 2002), including machine learning and deep learning applications. The LBP transform compares the neighbors of a pixel within a given radius with the center pixel, and the results of these comparisons form the LBP value. A general representation of the LBP transform for a given pixel is given by Formula 1. In this equation, R is the radius of the area, K is the number of neighbors, and p_c and p_k represent the center pixel and a selected neighbor pixel, respectively. The function f(.) represents a comparison function. In applying this equation, the sum operator selects a neighbor pixel and compares it with the center pixel. If the center pixel is not greater than the neighbor, the corresponding term is set to 2^k; otherwise it is set to zero. This is repeated for all neighbors to complete the transform.
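Formula 1 is referred to above but is not reproduced in this text. A standard form consistent with the definitions given (K neighbors p_k at radius R around the center p_c, weights 2^k with the first neighbor as the LSB) and with the greater-or-equal comparison performed by ge() in the implementation is sketched below; the notation of the original formula may differ.

```latex
\mathrm{LBP}_{K,R}(p_c) = \sum_{k=0}^{K-1} f(p_k - p_c)\, 2^{k},
\qquad
f(x) =
\begin{cases}
1, & x \ge 0,\\
0, & x < 0.
\end{cases}
```

For the 3×3 case with K = 8 neighbors, the resulting code lies between 0 and 2^8 − 1 = 255.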
In the present study, the 3×3 window, which is the common case in practice, was used. Hence, each window contains 9 pixels: the center pixel and its K = 8 neighbors at radius R = 1. Figure 1 shows an example computation for a selected pixel. Once a neighbor is compared with the center, the weighted result is added to the sum. According to Formula 1, the first neighbor forms the Least Significant Bit (LSB) and the last neighbor forms the Most Significant Bit (MSB) when the result is considered in binary form. This operation is repeated for all pixels in order to obtain the LBP transform. An example image and its LBP equivalent are shown in Figure 2.
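The per-pixel procedure can be sketched in plain Python, in the spirit of the standard-library implementation used as the baseline. This is a minimal sketch, not the authors' code: the clockwise neighbor order starting at the top-left pixel, the helper names lbp_pixel and lbp_python, and the list-of-lists image format are assumptions made for illustration.

```python
# Assumed neighbor order: clockwise from the top-left pixel; k = 0 is the LSB.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_pixel(window):
    """LBP code of the center of a 3x3 window given as a list of 3 rows."""
    center = window[1][1]
    code = 0
    for k, (dy, dx) in enumerate(OFFSETS):
        # f(p_k - p_c) = 1 when the neighbor is greater than or equal to the center
        if window[1 + dy][1 + dx] >= center:
            code += 2 ** k
    return code


def lbp_python(image):
    """LBP transform of a 2-D list of pixel values, zero padded to keep its size."""
    h, w = len(image), len(image[0])
    # Zero padding: one extra row/column of zeros on each side
    padded = ([[0] * (w + 2)]
              + [[0] + list(row) + [0] for row in image]
              + [[0] * (w + 2)])
    out = [[0] * w for _ in range(h)]
    # Two nested loops over the pixels: the part a tensor version removes
    for y in range(h):
        for x in range(w):
            window = [padded[y + r][x:x + 3] for r in range(3)]
            out[y][x] = lbp_pixel(window)
    return out
```

For example, lbp_pixel([[6, 5, 2], [7, 6, 1], [9, 8, 7]]) returns 241: the top-left, bottom-right, bottom, bottom-left and left neighbors are greater than or equal to the center value 6, giving 1 + 16 + 32 + 64 + 128.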

PyTorch implementation
Python is one of the most widely used programming languages for machine learning and scientific computing. On the other hand, Python programs usually have long running times when costly algorithms are written as scripts instead of using compiled functions. PyTorch is one of the popular frameworks that provide various accelerated algorithms for developing machine learning and deep learning applications. It is designed around tensor-based algorithms that enable accelerated computation. As its name implies, PyTorch is primarily designed for the Python programming language, but it also provides language bindings for Java and C++ and works on various operating systems, as shown by Figure 3, where the hierarchical structure of the framework is given. Data in PyTorch are represented by multi-dimensional arrays called tensors.
Tensors enable the use of optimized algorithms that run, where available, on multicore CPU and GPU devices. There are various tensor operations such as add() for adding tensors, mul() for element-wise multiplication and ge() for greater-than-or-equal comparison. These can be used to design custom operations for specific purposes. The LBP algorithm consists of independent per-pixel operations, so it can be computed using matrix definitions instead of two nested for-loops that traverse the pixels of the image. The center and neighbor pixels can be selected in the form of matrices. After the first comparison is done, the resulting matrix is multiplied by 2^0, as given by Formula 1, and the result is written to a temporary variable. Similarly, after the second comparison and multiplication, the obtained result is accumulated in the temporary variable. The computations for the remaining neighbors are carried out in the same way, and the results are accumulated in the temporary variable to form the LBP transform. Element-wise multiplication and addition of the resulting matrices can be done using the mul() and add() tensor operations, respectively.
A verification test was done by comparing the results of each method for a given random input matrix, as shown by the console outputs in Figure 6. For both approaches, the input matrices are zero padded to maintain the image size. According to the outputs, the tensor-based approach produces the desired results. The sizes of the test batches were selected as 2^n, where n was varied between 0 and 7. Images for the performance measurements were taken from the ImageNet [19] dataset and scaled to the test dimensions in each case. Although there may be small differences in the computation of Formula 1 due to if-else blocks, the number of pixel operations is independent of the pixel contents.
Hence, image content has a negligible effect on the processing time, and the times measured for images of the same dimensions and batches of the same size are close to each other. Table 1 shows example execution durations for the Python implementation with standard libraries. For small images and small batches, the measured running times are short. As the number of images or the image dimensions increase, the execution times also increase significantly. In the case of the PyTorch-based implementation, computation times are considerably reduced, as shown by Table 2. The CPU utilization graphs in Figure 7 show that PyTorch uses the CPU significantly more than the standard implementation does. The speedups provided by PyTorch over the Python implementation with standard libraries are given in Figure 8. The acceleration mainly results from compiled code and multicore CPU utilization. Differences among the speedup results for different batch sizes and image sizes mainly depend on thread management, the behavior of the multicore CPU and the number of pixel operations.
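The shifted-matrix procedure described in this section, together with the verification and speedup measurement, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the input is assumed to be a grayscale batch of shape (N, H, W), the neighbor order is assumed to be clockwise from the top-left pixel, and the names lbp_torch, lbp_loop and speedup are hypothetical. Only documented PyTorch calls are used (torch.nn.functional.pad, torch.ge, torch.mul, torch.add, torch.equal).

```python
import time

import torch
import torch.nn.functional as F

# Assumed neighbor order: clockwise from the top-left pixel; k = 0 is the LSB.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_torch(images: torch.Tensor) -> torch.Tensor:
    """LBP codes for a batch of grayscale images of shape (N, H, W)."""
    n, h, w = images.shape
    # Zero pad one pixel on each side of H and W so the output keeps the input size
    padded = F.pad(images, (1, 1, 1, 1), mode="constant", value=0)
    center = padded[:, 1:h + 1, 1:w + 1]
    # Temporary accumulator for the weighted comparison results
    result = torch.zeros((n, h, w), dtype=torch.int64, device=images.device)
    for k, (dy, dx) in enumerate(OFFSETS):
        # Shifted view: neighbor k of every pixel in the batch at once
        neighbor = padded[:, 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        # ge() gives a 0/1 matrix, mul() weights it by 2**k, add() accumulates
        bits = torch.ge(neighbor, center).to(torch.int64)
        result = torch.add(result, torch.mul(bits, 2 ** k))
    return result


def lbp_loop(images: torch.Tensor) -> torch.Tensor:
    """Reference version with explicit per-pixel Python loops (slow)."""
    n, h, w = images.shape
    data = images.tolist()
    out = [[[0] * w for _ in range(h)] for _ in range(n)]
    for i in range(n):
        for y in range(h):
            for x in range(w):
                c = data[i][y][x]
                code = 0
                for k, (dy, dx) in enumerate(OFFSETS):
                    yy, xx = y + dy, x + dx
                    # Out-of-range neighbors behave like zero padding
                    v = data[i][yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
                    if v >= c:
                        code += 2 ** k
                out[i][y][x] = code
    return torch.tensor(out, dtype=torch.int64)


def speedup(images: torch.Tensor) -> float:
    """Loop time divided by tensor time for one batch, after checking both agree."""
    start = time.perf_counter()
    expected = lbp_loop(images)
    t_loop = time.perf_counter() - start
    start = time.perf_counter()
    actual = lbp_torch(images)
    t_torch = time.perf_counter() - start
    if not torch.equal(expected, actual):
        raise ValueError("tensor and loop results differ")
    return t_loop / t_torch
```

For example, speedup(torch.randint(0, 256, (4, 32, 32))) checks that both versions agree on a random batch and returns the measured ratio; its value depends on the machine. Calling torch.set_num_threads(1) beforehand is one way to separate the contribution of multicore execution from that of compiled code. Note that a Python loop over the eight offsets remains, but it runs eight times per batch regardless of the image dimensions.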

Discussions and Conclusions
The running times of Python scripts are usually longer than those of compiled algorithms, since Python is an interpreted language that executes the code by interpreting it line by line. The comparison with the PyTorch implementation shows that compiled operations, together with multicore execution, provide significant acceleration. The script implementation usually utilizes one CPU core, and its computation time increases approximately linearly with the number of pixel operations, which is determined by the image dimensions and the batch size. For example, the computation time for a batch of 16 images of 112×112 pixels is about 4.04 seconds.
If the batch size is increased to 32 for the same image size, the computation time increases to about 8.17 seconds, nearly twice the result for the batch size of 16. Similar behavior is not always observed in the PyTorch results. For example, the computation times for 112×112 images are 0.0009 and 0.0014 seconds for batch sizes 1 and 2, respectively, so doubling the batch size increases the time by a factor of about 1.6 rather than 2. This can be related to the small computation times, multicore computation, variations in CPU frequency and cache memory size. In general, the PyTorch library reduces computation times to practical levels, and with more CPU cores or GPU support the results are expected to improve further, especially when processing large amounts of data. The implemented LBP algorithm can be utilized as a feature extraction layer in various machine learning and deep learning algorithms for computer vision applications.
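As a sketch of how the tensor-based LBP might be used as a feature extraction layer, the computation can be wrapped in a parameter-free torch.nn.Module. The class name LBPLayer, the (N, 1, H, W) input layout, the neighbor order and the scaling of the codes to [0, 1] are assumptions made for illustration, not part of the described implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LBPLayer(nn.Module):
    """Hypothetical parameter-free layer: 3x3 LBP codes of an (N, 1, H, W) batch."""

    # Assumed neighbor order: clockwise from the top-left pixel; k = 0 is the LSB
    OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        # Zero padding on the last two dimensions keeps the H x W size
        padded = F.pad(x, (1, 1, 1, 1))
        center = padded[..., 1:h + 1, 1:w + 1]
        codes = torch.zeros_like(center)
        for k, (dy, dx) in enumerate(self.OFFSETS):
            neighbor = padded[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
            codes = codes + torch.ge(neighbor, center).to(x.dtype) * (2 ** k)
        # Scale codes from [0, 255] to [0, 1] before they reach later layers
        return codes / 255.0
```

A possible use is nn.Sequential(LBPLayer(), nn.Conv2d(1, 8, kernel_size=3, padding=1)) on float grayscale images. Because the comparison provides no useful gradient, the layer acts as fixed preprocessing, while the layers after it remain trainable.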