Shulou (Shulou.com) — SLTechnology News & Howtos, updated 2025-02-24
This article shows how to build a simple convolutional neural network with the Keras framework. The walkthrough is fairly detailed; I hope interested readers find it helpful.
Problem introduction
Before discussing the details of the problem, let me introduce the business context. Avito.ma is a leading Moroccan classified-ads platform where users can advertise and sell second-hand or new products such as mobile phones, laptops, cars, and motorcycles.
Now to the problem itself: to advertise and sell a product, you must first fill out a form describing the product, set the price, and upload the corresponding photos. After completing these fields, you must wait about 60 minutes for the site's moderators to review and verify the pictures before your ad is published.
In the era of deep learning and computer vision, checking web content manually is a weakness: it is very time-consuming and produces many errors. For example, site reviewers have approved ads for laptops listed in the phone category, which is wrong and degrades search quality, while a deep learning model could catch the mistake in a second.
Figure: laptop ads published under the mobile phone category.
In this blog post, I'll show how to streamline this process by building a simple convolutional neural network with the Keras framework that analyzes whether an uploaded image belongs to a mobile phone or a laptop ad, and tells us whether the image matches the chosen ad category.
The post breaks the task into five steps:
Data collection
Data preprocessing
Data modeling
Use TensorBoard to analyze the model
Model deployment and evaluation
1. Data collection
As in any data science project, the first thing we need is data. In this case we scrape a set of images from Avito.ma for both laptop and mobile phone products; the resulting folder contains two subdirectories named "laptop" and "phone". The downloaded images range in size from roughly 67 × 90 to 120 × 90 pixels, each with three RGB channels. A snapshot of the code performing this task follows, and the complete code is provided in the notebook (https://github.com/PaacMaan/avito_upload_classifier/blob/master/avito_image_classifier.ipynb).
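The original snapshot is not reproduced here, but the download step can be sketched with the standard library alone. This is a minimal sketch under assumptions: the URL lists are assumed to have been scraped from the listing pages already, and `filename_from_url` is a hypothetical helper, not part of the original notebook.

```python
import os
import urllib.request

def filename_from_url(url: str) -> str:
    """Derive a local file name from an image URL (hypothetical helper)."""
    return url.rstrip("/").split("/")[-1]

def download_images(urls, dest_dir):
    """Download each image URL into dest_dir, skipping broken links."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        path = os.path.join(dest_dir, filename_from_url(url))
        try:
            urllib.request.urlretrieve(url, path)
        except OSError:
            continue  # skip unreachable images rather than aborting the crawl

# Assumed usage, with laptop_urls / phone_urls scraped beforehand:
# download_images(laptop_urls, "raw_data/laptop")
# download_images(phone_urls,  "raw_data/phone")
```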
Once this process completes, we have 2097 laptop images and 2180 mobile phone images. To keep the classification accurate and unbiased, we want the two classes to have roughly the same number of observations; as the figure below shows, the class counts are roughly balanced.
Figure: image distribution across the two classes.
2. Data preprocessing
For the preprocessing task, we divide it into three subtasks, as follows:
2.1 Delete noise data
If you manually inspect the downloaded images, you will notice some noisy images that have nothing to do with their category; for example, the mobile phone folder contains phone chargers, phone cases, and virtual reality glasses:
Figure: noisy images found among the mobile phone images.
Unfortunately, there is no automatic way to solve this, so we have to review the images manually and delete the noise, keeping only the images that belong to the corresponding class.
2.2 Image resizing
This step depends entirely on the deep learning architecture used: for example, AlexNet expects input images of 227 × 227, while VGG-19 expects 224 × 224.
Since we are not using a pre-built architecture, we will build our own convolutional neural network with an input size of 64 × 64, as shown in the following code snapshot.
To perform this task we create another directory called preprocessed_data with the two subdirectories phone and laptop, then loop through each image in the original raw_data folder, resize it, and save it in the newly created directory.
As a result, we end up with newly generated datasets for both classes in 64 × 64 format.
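The resize loop described above can be sketched as follows. Using Pillow is an assumption (the original snapshot may have used another imaging library); the directory names `raw_data` and `preprocessed_data` come from the text.

```python
import os
from PIL import Image

SIZE = (64, 64)

def resize_folder(src_dir, dst_dir, size=SIZE):
    """Resize every image in src_dir to `size` and save it under dst_dir."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        try:
            with Image.open(src_path) as img:
                img.convert("RGB").resize(size).save(os.path.join(dst_dir, name))
        except OSError:
            continue  # skip files that are not readable images

# for label in ("laptop", "phone"):
#     resize_folder(f"raw_data/{label}", f"preprocessed_data/{label}")
```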
2.3 Data split
After resizing, we split the dataset: 80% for the training set, leaving the rest for validation. To do this we create a new directory called data containing two subdirectories, train and validation, each of which holds the two class folders for mobile phones and laptops.
More specifically, we define the current source and target directories, then fix the training ratio at 0.8 and the validation ratio at 0.2 to determine how many images to move from the original path to the target path.
Code snapshot that performs data split
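A minimal sketch of the split just described, using only the standard library. One deliberate deviation: the text moves files, while this sketch copies them so the original folder stays intact; the directory names follow the project tree.

```python
import os
import random
import shutil

def split_dataset(src_dir, train_dir, val_dir, train_ratio=0.8, seed=42):
    """Copy a fixed ratio of files from src_dir into train_dir, the rest into val_dir."""
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(val_dir, exist_ok=True)
    files = sorted(os.listdir(src_dir))
    random.Random(seed).shuffle(files)      # reproducible shuffle before splitting
    n_train = int(len(files) * train_ratio)
    for i, name in enumerate(files):
        dest = train_dir if i < n_train else val_dir
        shutil.copy(os.path.join(src_dir, name), dest)

# e.g. split_dataset("preprocessed_data/phone", "data/train/phone", "data/validation/phone")
```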
To better visualize the folder hierarchy, here is the project tree view:
Global project structure
3. Data modeling
Now we reach the main step of this pipeline, data modeling, for which we will build a convolutional neural network and train it on the thousands of mobile phone and laptop images we processed earlier.
In computer vision, the convolution operation is one of the basic building blocks of a convolutional neural network, which requires four essential components:
The main components of convolution neural network
For this model we will discuss how each component is implemented in Keras, with its parameters, from the convolution through to the fully connected layer. But first, let's look at the complete architecture of the model we'll build.
Convolutional neural network (CNN) model architecture
Convolutional layer
After instantiating a Sequential object as the model, we use the add method to add a convolutional layer, Conv2D. The first parameter is filters, the number of output feature maps; as the model summary shows, the first layer's output shape is (None, 62, 62, 32).
The second parameter, kernel_size, specifies the height and width of the 2D convolution window; here we choose a 3 × 3 window to convolve over the input volume.
The third parameter is input_shape, which is 64 × 64 × 3, corresponding to image_width × image_height × color channels (RGB). Last but not least is the activation function, which adds a nonlinear transformation; in this case we choose relu.
Illustration of convolution with kernel_size = (3, 3)
Max pooling layer
The reason for adding a max pooling layer after the convolution is to downsample the features extracted by the previous convolutional layer; in other words, we care about whether a feature is present rather than its exact location.
Intuitively, if a vertical edge is detected somewhere in the input, a 2 × 2 max pool shrinks the feature map's height and width by a factor of two while still preserving that edge's activation.
All of this is done in one line of Keras code:
Here we use the add method to add a max pooling layer, MaxPooling2D, where pool_size is the (2, 2) window; by default strides = None and padding = 'valid'.
Illustration of max pooling with pool_size = (2, 2)
Flattening the output
Before ending the convolutional part of the network, a necessary step is to flatten the max-pooled output into a contiguous one-dimensional vector.
In Keras this is done by adding a Flatten layer to the network, which is simply equivalent to numpy's reshape with 'C' ordering.
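The shapes quoted above can be sanity-checked with the 'valid'-padding output-size formula, out = (in − kernel) / stride + 1, traced through the three layers:

```python
def conv_out(size, kernel, stride=1):
    """Output size of a 'valid' convolution or pooling along one dimension."""
    return (size - kernel) // stride + 1

w = conv_out(64, 3)            # Conv2D, 3x3 kernel, stride 1: 64 -> 62
p = conv_out(w, 2, stride=2)   # MaxPooling2D, 2x2 window, stride 2: 62 -> 31
flat = p * p * 32              # Flatten: 31 * 31 * 32 feature values

print(w, p, flat)  # 62 31 30752
```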
Fully connected layer
Finally, we add the last layer of the network, the fully connected layer, which you can think of as an inexpensive way to learn nonlinear combinations of the features extracted by the previous convolutions.
This is easy in Keras: we add a Dense layer to the network. It requires only two parameters, units and activation, which are respectively the number of output units — 2 here, since we are doing binary classification — and the activation function to use.
Compile the network
Finally, we must compile the network we just built by calling the compile function, which is a necessary step for every model built using Keras.
As for the loss parameter: since we have a binary classification, where the number of classes M equals 2, the cross-entropy for one sample can be computed as:
loss = −(y · log(p) + (1 − y) · log(1 − p))
where p is the predicted probability and y is the binary indicator (0 or 1).
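As a quick numeric illustration of this objective (not code from the original post), here is the per-sample binary cross-entropy in plain Python; a confident correct prediction gives low loss, a confident wrong one gives high loss:

```python
import math

def binary_cross_entropy(y, p, eps=1e-12):
    """BCE for one sample: -(y*log(p) + (1-y)*log(1-p))."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(round(binary_cross_entropy(1, 0.9), 3))  # low loss when y=1, p=0.9
print(round(binary_cross_entropy(1, 0.1), 3))  # high loss when y=1, p=0.1
```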
To minimize this objective function we need an optimizer such as adam, short for Adaptive Moment Estimation, whose learning rate defaults to 0.001 — though that still leaves the door open for hyperparameter tuning. To summarize what we have done, here is the complete code for the model.
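The complete model appeared as an image in the original post; the following is a hedged reconstruction consistent with the shapes and parameters discussed above (32 filters of 3 × 3, a 2 × 2 max pool, a 2-unit softmax output). With two output units, categorical cross-entropy is equivalent to the binary form for M = 2. The modern `tensorflow.keras` import path is an assumption; the original likely used standalone Keras.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Reconstruction of the architecture described in the text, not the author's exact code.
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),          # strides=None, padding='valid' by default
    Flatten(),                               # 31 * 31 * 32 = 30752 features
    Dense(units=2, activation="softmax"),    # two classes: laptop / phone
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()  # first layer output shape: (None, 62, 62, 32)
```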
Loading images and transforming data
To feed images into our compiled model, we use the ImageDataGenerator class, which generates batches of tensor image data with real-time data augmentation; the data is looped over in batches.
Having created two ImageDataGenerator instances, we point them at the training and validation dataset paths using categorical class mode.
Once the train and validation generators are ready to feed the network, we call the fit_generator method to train the model on them.
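A sketch of the generator setup under the same assumptions (directory layout from the data-split step; the augmentation parameters are illustrative, not confirmed by the post). In current Keras, `model.fit` accepts generators directly and replaces the older `fit_generator`:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment the training images; only rescale the validation images.
train_datagen = ImageDataGenerator(rescale=1.0 / 255, shear_range=0.2,
                                   zoom_range=0.2, horizontal_flip=True)
val_datagen = ImageDataGenerator(rescale=1.0 / 255)

def make_generators(train_dir="data/train", val_dir="data/validation"):
    """Build directory iterators over the split created earlier."""
    train_gen = train_datagen.flow_from_directory(
        train_dir, target_size=(64, 64), batch_size=32, class_mode="categorical")
    val_gen = val_datagen.flow_from_directory(
        val_dir, target_size=(64, 64), batch_size=32, class_mode="categorical")
    return train_gen, val_gen

# train_gen, val_gen = make_generators()
# model.fit(train_gen, epochs=25, validation_data=val_gen)  # fit_generator in older Keras
```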
Usually we would also prepare a separate test set to evaluate the final trained model, but to keep things simple we evaluate only on the validation set.
Model evaluation
After training completes, our accuracy is about 87.7%, with a still fairly high loss of 0.352. High accuracy alone does not guarantee good model quality: we need to track and visualize the model's behavior over time, so we use the TensorBoard callback provided by Keras, running on the TensorFlow backend.
4. Use TensorBoard to analyze the model
In this step we will see how to analyze our model's behavior with TensorBoard, a tool for models built on the TensorFlow backend that lets us visualize training over time and compare training accuracy with validation accuracy, or training loss with validation loss.
With Keras, this takes only one line of code: call the TensorBoard function and inject the result as a callback when fitting the data.
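That one line might look like this (the log directory name is an assumption):

```python
from tensorflow.keras.callbacks import TensorBoard

# Write training/validation metrics to ./logs for TensorBoard to read.
tensorboard = TensorBoard(log_dir="logs")

# Passed as a callback when fitting:
# model.fit(train_gen, epochs=25, validation_data=val_gen, callbacks=[tensorboard])
# Then inspect the curves with:  tensorboard --logdir logs
```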
Loss
Loss curves for the training and validation sets
As the figure shows, the training loss decreases markedly, from 0.39 to 0.13, while the validation loss decreases gradually, starting from 0.42 and reaching 0.35 after 25 epochs.
Personally, whenever I evaluate a model I look at the validation loss. Here we can see that after epoch 19 the validation loss starts to increase slightly, which may mean the model is starting to memorize training samples. To check this hypothesis, we had better look at the accuracy plot.
Accuracy
Evolution of training and validation accuracy
As we can see, the validation accuracy increases until around epoch 19, then becomes somewhat unstable, with dips and rises that match the validation loss starting to increase over the same period.
To preserve model quality, it is recommended to use an early stopping callback in such cases; it stops training, with a configurable tolerance (patience), once the validation loss starts to increase or the accuracy starts to drop.
5. Using Flask to deploy the model
Before moving on to the deployment details, we first need to save our previously trained model, for which we call the save method, as follows:
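A minimal sketch of saving and reloading; the tiny stand-in model here is illustrative only — in the real pipeline you would call `save` on the trained CNN:

```python
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Stand-in for the trained network (the real pipeline saves the CNN built earlier).
model = Sequential([Dense(2, activation="softmax", input_shape=(4,))])
model.compile(optimizer="adam", loss="categorical_crossentropy")

model.save("model.h5")             # HDF5 file holding architecture + weights
restored = load_model("model.h5")  # reload it later, e.g. inside the Flask app
```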
Once our model is saved, we can use it later to predict new image classes.
Why use Flask?
Flask is a Python micro-framework inspired by the Unix philosophy of "Do One Thing and Do It Well", which is why I chose it to serve the model as a REST API.
The Flask application consists of two main components: the Python application (app.py) and the HTML templates. app.py contains the logic that performs the prediction and sends it back as an HTTP response. The file has three main parts:
Load the saved model.
Convert the uploaded image.
Use the loaded model to predict its appropriate class.
In the next section, we will discuss the most important components of these aspects.
Back to the main problem
When a user selects laptop as the ad category, we expect them to upload a laptop image, but as we saw earlier, something different happens: many ads with laptop pictures end up tagged with the mobile phone category.
After running the application, and assuming the model loaded successfully, users can upload images of any size, while our model only accepts 64 × 64 × 3 inputs, so we need to convert each upload to the correct size before the model can predict on it.
Code snapshots that deal with uploaded images
After converting the uploaded image, we pass it to the loaded model to get a prediction, and return the HTTP response as a JSON object with the following schema:
The first attribute is the predicted class of the image, and the second is a Boolean indicating whether the category selected by the user matches the uploaded image. Below is a snapshot of the code that performs this logic.
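A hedged sketch of the upload-and-predict logic, with a stub standing in for the loaded Keras model; the form field names (`image`, `category`), the JSON key names, and the route path are assumptions, not the author's exact code:

```python
import io

from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

def predict_class(image):
    """Stub for the loaded model; replace with preprocessing + model.predict."""
    return "laptop"

@app.route("/upload", methods=["POST"])
def upload():
    file = request.files["image"]                 # assumed form field name
    selected = request.form.get("category", "")   # category chosen in the form
    # Resize to the 64x64 RGB input the model expects.
    image = Image.open(io.BytesIO(file.read())).convert("RGB").resize((64, 64))
    predicted = predict_class(image)
    return jsonify({"class": predicted, "match": predicted == selected})
```

With the stub in place, posting a laptop image tagged with the laptop category returns `{"class": "laptop", "match": true}`.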
Application demonstration
To run the application, we simply change into the folder containing app.py and start the Flask app (typically with `python app.py`, assuming a standard `app.run()` entry point).
Then we browse to the URL shown on the console, http://127.0.0.1:5000. Once the index page loads, select an ad category and upload a related photo; behind the scenes the request is sent to the /upload route, which saves the photo and predicts its class.
This is a live demonstration of what we can build at the end of this project.
Web application demo
If the selected and predicted classes match, you get a success message saying everything is fine; otherwise you get a warning message and the selection box automatically switches to the predicted class.
Conclusion
Finally, this post demonstrated a complete computer vision pipeline applied to an e-commerce scenario: from data collection to data modeling, we built a deep learning model that predicts the category of an uploaded image, and deployed it as a web application.
Ways to improve:
1. Increase the data size and remove noise by deleting more images for both classes.
2. Hyperparameter tuning of the learning rate and beta values.
3. Try other architectures, such as Lenet-5. (yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf)
4. Use Dropout on a fully connected (dense) layer.