Image Classification: A Survey

Image classification is a central topic in artificial vision systems and has drawn a notable amount of interest over the past years. The field aims to classify an input image based on its visual content. Traditionally, practitioners relied on hand-crafted features to describe an image in a particular way; learnable classifiers, such as random forests and decision trees, were then applied to the extracted features to reach a final decision. The problem arises when large numbers of photos are concerned: extracting useful features from them by hand becomes too difficult. This is one of the reasons deep neural network models were introduced. Owing to deep learning, it becomes feasible to represent the hierarchical nature of features using multiple layers and their corresponding weights. Existing image classification methods have gradually been applied to real-world problems, but various issues arise in the process, such as unsatisfactory results, low classification accuracy, and weak adaptive ability. Deep learning models have a robust learning ability that combines feature extraction and classification into a single pipeline completing the image classification task, which can improve classification accuracy effectively. Convolutional Neural Networks (CNNs) are a powerful deep neural network technique. These networks preserve the spatial structure of the problem and were built for object recognition tasks such as classifying an image into its respective class. Neural networks are well known because they achieve state-of-the-art results on complex computer vision and natural language processing tasks, and convolutional neural networks in particular have been used extensively.


Machine learning
Machine learning is the process of teaching a computer system to make accurate predictions when data is fed into it [1]. These predictions could answer whether a piece of vegetable in a photo is broccoli or a beetroot, classify Twitter comments as negative, positive, or neutral, predict stock market prices, decide whether an email is spam or not, or recognize speech accurately to generate captions for a video.

A. Singh et al., Journal of Informatics Electrical and Electronics Engineering (JIEEE), A2Z Journals, ISSN (Online): 2582-7006, Open Access
The important difference between traditional computer software and a machine learning model is that no human has written code telling the computer system how to tell the difference between broccoli and beetroot. Instead, the machine learning model has been taught to differentiate between the vegetables accurately by training it on a large amount of data, for instance, a great number of photos of vegetables.

Deep learning
Deep learning is a subdivision of machine learning. It is a branch founded on learning and improving on its own by examining computer algorithms. While machine learning uses simpler concepts, deep learning works with artificial neural networks that are constructed to emulate how a human learns and thinks. Until recently, these neural networks were limited by the available computing power and were consequently confined in complexity [11]. Advancements in big data analytics have allowed larger, more sophisticated neural networks, enabling computers to observe, learn, and react to complex situations faster than humans. Deep learning has assisted in the classification of images, the translation of languages, and more.

Supervised Learning
In machine learning, supervised learning is the type of learning that is assisted by a supervisor, much like a professor. Here, a labeled dataset acts as the professor, and its role is to train the model or the machine. Once the model is trained, it can start making predictions or decisions when new data is fed to it.
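The idea can be sketched with a tiny nearest-neighbour classifier in plain NumPy; the data points and class labels below are made up purely for illustration:

```python
import numpy as np

# Labeled "training" data: two features per sample, with known classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.2]])
y_train = np.array([0, 0, 1, 1])  # 0 = class A, 1 = class B

def predict(x):
    """Assign the label of the closest training sample (1-nearest neighbour)."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict(np.array([1.1, 0.9])))  # near the class-0 samples
print(predict(np.array([8.1, 7.9])))  # near the class-1 samples
```

The labeled dataset plays the role of the "professor": the new points are classified only by comparison with the examples the model was trained on.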

Unsupervised Learning
A model learns through observation and finds structure in the data. When the model is given a dataset, it automatically finds patterns and relationships in it by constructing clusters. What the model cannot do is add labels to the clusters; for example, it cannot say that one group contains vegetables and another fruits, but it can separate all the vegetables from the fruits. For instance, if we present images of cars, trucks, and planes to a model, it will, depending on the patterns and relationships it finds, create clusters and divide the dataset among them. Later, if fresh data is fed to the model as input, it assigns it to one of the already built clusters.
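This clustering behaviour can be illustrated with a very small k-means implementation in NumPy; the two groups of 2-D points below are synthetic and stand in for, say, images of two object types:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated groups of 2-D points (made-up data for illustration).
data = np.vstack([rng.normal(0, 0.5, (20, 2)), rng.normal(5, 0.5, (20, 2))])

def kmeans(points, k, iterations=10):
    """Tiny k-means: alternate assigning points to the nearest
    centroid and moving each centroid to the mean of its points."""
    centroids = points[:k].copy()
    for _ in range(iterations):
        # Distance of every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

labels, centroids = kmeans(data, k=2)
# The first 20 points end up in one cluster and the last 20 in the other,
# even though no class labels were ever provided.
```

Note that the algorithm only outputs cluster indices; it has no way of naming what each cluster actually contains, exactly as described above.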

Reinforcement learning
Reinforcement learning is the ability of an agent to interact with its environment and figure out the best possible outcome.
The agent follows the notion of trial and error. Here, the agent is rewarded with one point for a correct answer and penalized by one point for an incorrect one, and based on the reward points gained, the model trains itself. Once the model is trained, it is prepared to make predictions on fresh data given to it [1].
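The reward-driven trial-and-error loop can be sketched with a two-action "bandit" problem in plain Python; the payoff probabilities and the epsilon-greedy strategy below are illustrative choices, not part of the original text:

```python
import random

random.seed(42)

# Hypothetical environment: action 1 pays off more often than action 0.
reward_prob = [0.2, 0.8]

def pull(action):
    """Environment feedback: +1 point for a correct outcome, -1 otherwise."""
    return 1 if random.random() < reward_prob[action] else -1

# The agent keeps a running average score per action and mostly
# exploits the best-scoring one, exploring at random 10% of the time.
scores = [0.0, 0.0]
counts = [0, 0]
for step in range(500):
    if random.random() < 0.1:            # explore occasionally
        action = random.randrange(2)
    else:                                # exploit the best action so far
        action = 0 if scores[0] > scores[1] else 1
    reward = pull(action)
    counts[action] += 1
    scores[action] += (reward - scores[action]) / counts[action]

best_action = max(range(2), key=lambda a: scores[a])
```

After training, the agent has learned from rewards alone that action 1 is the better choice, without ever being told so explicitly.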

Applications of Machine Learning
Machine learning is probably the most exhilarating technology one comes across. As is obvious from its name, it makes a computer behave more like a human being [3]. Machine learning is used extensively all around. Some of its applications are:

d) Virtual Personal Assistants:
We have various virtual personal assistants such as Alexa and Siri. These assistants help us find information using our instructions. They can also assist us through voice instructions, such as playing songs, making calls, sending emails, scheduling appointments, etc.

e) Image Recognition:
Image recognition is probably the most common application of machine learning. It finds use in identifying objects, people, venues, digital images, etc. The most common use of image recognition is the automatic friend-tagging suggestion that Facebook provides.

Steps for machine learning
The process of machine learning occurs in four important steps:

a) Gathering of data:
The first and foremost step is to feed data to the computer, from which our model learns. The data can be in any format, such as text files or Excel sheets.

b) Data preparation:
This step involves taking only the useful information from the whole dataset fed into the computer. Only the data useful to the model for processing is taken into consideration, and the rest is discarded. This step also involves removing unwanted data, checking for missing values, and treating outliers.

c) Training set and testing set:
After the data is filtered in the second step, it is divided into two sets. One set, called the training set, is used for creating the model, and the second, the test set, is used to check the accuracy of the created model [3].
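The split can be sketched in NumPy; the 80/20 proportion, the random seed, and the toy data below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
data = np.arange(100).reshape(100, 1)   # 100 made-up samples, one feature each
labels = np.arange(100) % 2             # made-up binary labels

# Shuffle indices so the split is not biased by the original ordering.
idx = rng.permutation(len(data))
split = int(0.8 * len(data))            # 80% training, 20% testing
train_idx, test_idx = idx[:split], idx[split:]

X_train, y_train = data[train_idx], labels[train_idx]
X_test, y_test = data[test_idx], labels[test_idx]
```

Keeping the two index sets disjoint is what makes the later accuracy check on the test set a fair measure of the model.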

d) Evaluation:
This step involves evaluating the model. To verify the accuracy of the created model, it is tested on data that was not present in the data used for its creation; this is where the test set is used.

Convolutional Neural Network
A Convolutional Neural Network, also called a ConvNet or CNN, is a deep learning algorithm that takes an image as input, assigns importance (i.e. learnable weights and biases) to certain aspects or objects in the image, and can differentiate one from another [2][3]. The pre-processing required in a convolutional neural network is lower than in the other available classification algorithms. Technically, deep learning models take each image as input and then pass it through several layers that extract features and produce a classification.

Steps for building a Convolutional neural network
1) First, we provide an input image to the convolution layer.
2) Next, we choose the parameters and apply filters with padding and strides, if required. We then perform convolution on the input image and apply activation functions [1].
3) We then perform pooling to reduce the dimensionality.
4) As many convolution layers as needed can be added.
5) Flatten the output and feed it into a dense layer.
6) Finally, train the model using the training set.
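The convolution and pooling steps above can be illustrated in plain NumPy: a single 3x3 filter applied with stride 1 and no padding, a ReLU activation, then 2x2 max pooling. The image and filter values are arbitrary, chosen only to make the arithmetic easy to follow:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D convolution of a single channel."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Sum of the element-wise product of the filter and the patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling to reduce dimensionality."""
    h, w = feature_map.shape[0] // size, feature_map.shape[1] // size
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = feature_map[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.ones((3, 3)) / 9.0                     # simple averaging filter
features = np.maximum(conv2d(image, kernel), 0)    # convolution + ReLU (step 2)
pooled = max_pool(features)                        # pooling (step 3): 4x4 -> 2x2
```

In a real network the filter values are learned during training; here they are fixed so the shapes (6x6 input, 4x4 feature map, 2x2 pooled output) are easy to verify by hand.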

Dataset
In this project, I have used the CIFAR-10 dataset (Figure 1). The dataset consists of 60,000 32x32 RGB images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The dataset was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The classes are completely mutually exclusive.
There is no overlap between the classes; for example, automobiles and trucks are disjoint.

a) TensorFlow:
TensorFlow is a Python library imported for fast numerical computing. It was created and released by Google. TensorFlow is a foundation library that can be used to build deep learning models directly or through wrapper libraries built on top of TensorFlow that simplify the process [14].

b) OS:
The OS library in Python provides useful functions for interacting with one's operating system. The OS module comes under Python's standard utility modules and provides a portable way of using operating-system-dependent functionality. The 'os' and 'os.path' modules include various functions to interact with the file system.
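A few of these functions are shown below; the file path is a made-up example, so nothing needs to exist on disk:

```python
import os

# Build a path in an OS-independent way, then inspect it with os.path helpers.
path = os.path.join("data", "cifar10", "train.bin")  # hypothetical path
print(os.path.basename(path))      # "train.bin"
print(os.path.splitext(path)[1])   # ".bin"
print(os.getcwd())                 # prints the current working directory
```

Because `os.path.join` uses the platform's own separator, the same code works unchanged on Windows, Linux, and macOS.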

c) Numpy:
The Numpy library is the core library used for scientific computing in Python. This library provides a high-performance multidimensional array object and various useful tools for working with such arrays.
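A short taste of that array object and its vectorized operations (the values are arbitrary):

```python
import numpy as np

# A 2-D array (matrix) and a few of the vectorized operations NumPy offers.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.shape)        # (2, 3)
print(a.mean())       # 3.5 -- mean over all six elements
doubled = a * 2       # element-wise arithmetic, no explicit loops
b = a.reshape(3, 2)   # same data viewed with a new shape
```

Operations like `a * 2` apply to every element at once, which is what makes NumPy suitable for manipulating whole image arrays efficiently.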

d) Matplotlib:
This useful library is a visualization library. One of the many benefits of visualization is that it gives us visual access to large amounts of data through easily digestible visuals such as line, bar, scatter, and histogram plots, etc. [14].
Next, I created a function called get_three_classes to pre-process the CIFAR-10 dataset. For simplification, I have used only the first three classes in the dataset, i.e. 'Airplane', 'Bird', and 'Automobiles'. After selecting the first three classes, it is necessary to remove the unwanted data, i.e. the images of the other classes, as they would be superfluous to our model. Using functions from the Numpy module, the images corresponding to the three selected classes are selected, shuffled randomly, and stored. Lastly, the labels are converted to their numerical representation using one-hot encoding. Machines understand numbers and not text, hence we need to convert each text category, i.e. our classes, to numbers so the machine can process them using mathematical equations.
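What one-hot encoding does can be sketched in NumPy (Keras' `to_categorical` utility performs the same mapping); the label values below are illustrative:

```python
import numpy as np

labels = np.array([0, 2, 1, 0])   # numeric class ids, e.g. 0 = airplane
num_classes = 3

# One-hot: a vector of zeros with a single 1 at the class index.
# Indexing the identity matrix by the labels picks the matching rows.
one_hot = np.eye(num_classes)[labels]
# one_hot[0] -> [1., 0., 0.]   (class 0)
# one_hot[1] -> [0., 0., 1.]   (class 2)
```

Each label becomes a vector the same length as the number of classes, which matches the shape of the network's softmax output layer.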
The next step was to load the CIFAR-10 dataset from Keras. Deep neural networks being the current passion, the complexity of their extensive frameworks has always been a hurdle for fledgling machine learning practitioners.

3.
Batch normalization is a technique for improving the performance, speed, and stability of deep neural networks. As the name suggests, it normalizes a layer's inputs by re-centering and re-scaling them.

4.
Dropout is simply a regularization technique, a simple way to avoid over-fitting: randomly selected neurons are ignored during training.

5.
Flattening is used to transform a 2-dimensional array of features into a 1-D array that can be used as input to a fully connected (FC) neural network layer. It is applied before the FC layer.

6.
A dense layer is a basic densely connected neural network layer. It applies the operation below to the input and returns an output:

output = activation(dot(input, kernel) + bias)
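The dense-layer operation can be written out in NumPy; the input values, the 3x2 weight matrix, and the choice of ReLU as the activation are arbitrary illustration values:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def dense(inputs, kernel, bias, activation=relu):
    """output = activation(dot(input, kernel) + bias)"""
    return activation(np.dot(inputs, kernel) + bias)

inputs = np.array([1.0, -2.0, 3.0])      # 3 input features
kernel = np.array([[0.5, -1.0],          # 3x2 weight matrix: 2 output units
                   [0.25, 0.5],
                   [1.0, 0.0]])
bias = np.array([0.1, 0.2])

out = dense(inputs, kernel, bias)
# dot(input, kernel) = [3.0, -2.0]; + bias = [3.1, -1.8]; ReLU -> [3.1, 0.0]
```

In a real network, `kernel` and `bias` are the parameters learned during training; the layer itself is just this matrix multiplication followed by the activation.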
After importing the layers from Keras, I created two more functions, add_conv_block and create_model, with which the model is created. Then the summary of the model is displayed (Table 1). As the final step, I loaded the best model, i.e. the one with the highest accuracy, and using the predict function, my model made its predictions; with the help of the show_random_examples function, the predictions are displayed (Figure 4). I also displayed the model execution (Table 2), showing the epoch, time, loss, accuracy, validation loss, and validation accuracy.

Conclusion
Machine learning is a rapidly developing field in computer engineering. It has applications in almost every other field of study and is already being adopted commercially, because machine learning can take care of problems too difficult or time-consuming for people to solve. An advantage of using a convolutional neural network is that it is designed to better handle image and speech recognition tasks. Instead of only fully connected hidden layers, convolutional neural networks have convolutional and pooling layers. It is because of these layers that convolutional neural networks are favored for image classification tasks.