This article introduces how to achieve face recognition with Pytorch, and then uses the technique to automatically mask a specific person's face in a video. The steps below are meant to be simple and easy to follow.
I. Face Recognition
Face recognition is a fairly mature technology. You can see it everywhere: face-scan payment, identity verification, surveillance search, face masking, and so on. Most of the time it makes life more convenient: without leaving home, you can complete real-name authentication and identity checks in all kinds of apps. Some companies even open face-scan payment to their own employees, so that inside the company you can shop with nothing but your face, no phone or work badge required.
II. Face Masking
Beyond these routine uses, you can also mask out a specific person's face in a video. Tong Zhuo's on-air admission that he cheated in the college entrance examination could be called a "textbook-level" tutorial in deception. The brute-force way to mask him is simple and crude: process the video manually, frame by frame, which makes the masking task pure manual labor. Combined with face recognition technology, however, the task becomes much easier.
III. Face Recognition Technology
Face recognition involves a number of algorithms; the overall pipeline is as follows (a compact, runnable sketch follows the list):
1. Detect the location of the face using a detection algorithm.
2. Detect the key points of the face using a landmark algorithm.
3. Crop out the face region based on the bounding box, and correct the face image based on the key points to obtain a "standard face".
4. Compute the facial feature vector of the "standard face".
5. Compare it with the feature vectors in the "face database": compute the vector distances, find the closest person, and output the recognition result.
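Here is a minimal end-to-end sketch of those five steps, written with the face_recognition library that is introduced later in this article; the image file names are hypothetical placeholders.

import face_recognition
import numpy as np

# hypothetical enrollment photo and query photo
known = face_recognition.load_image_file('zhangsan.jpg')
gallery = {'Zhang San': face_recognition.face_encodings(known)[0]}

img = face_recognition.load_image_file('query.jpg')
boxes = face_recognition.face_locations(img)             # step 1: detect faces
encodings = face_recognition.face_encodings(img, boxes)  # steps 2-4: landmarks, alignment, features
for enc in encodings:                                    # step 5: nearest neighbor in the database
    dists = {name: np.linalg.norm(vec - enc) for name, vec in gallery.items()}
    print(min(dists, key=dists.get))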
1. Face detection
Input: the original image that may contain a human face.
Output: bounding box of face position.
This step is generally called face detection (Face Detection). There are many libraries that implement it, such as OpenCV, dlib, face_recognition, RetinaFace, CenterFace, and so on; too many to count. Of course, you can also implement one yourself with classic detection algorithms such as YOLO or SSD.
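As a taste of how simple this step can be, here is a minimal sketch using the frontal-face Haar cascade bundled with OpenCV (one of the libraries listed above); 'photo.jpg' is a hypothetical input file.

import cv2

img = cv2.imread('photo.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# load the frontal-face Haar cascade shipped with opencv-python
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:  # each detection is (x, y, width, height)
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite('detected.jpg', img)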
2. Face cropping and alignment
Input: original image + face bounding box.
Output: an "aligned" image containing only the face.
This step uses a landmark algorithm to detect the key points of the face, and then aligns and corrects the face according to those key points. The key points are the green dots shown in the picture below, usually the corners of the eyes, the position of the nose, the outline of the face, and so on. With these key points, we can "align" or "calibrate" the face. The reason is that the original face may be tilted: using the key points, an affine transformation "straightens" every face in a uniform way, eliminating errors caused by different poses as far as possible. This step is usually called Face Alignment.
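A minimal alignment sketch, assuming a landmark detector has already produced the two eye centers: it rotates the image so the eye line becomes horizontal. The function name and signature are illustrative, not from any particular library.

import cv2
import numpy as np

def align_face(img, left_eye, right_eye):
    # tilt of the line through the two eye centers, in degrees
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    # 2x3 affine matrix that rotates around the midpoint of the eyes
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))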
3. Facial features
Input: an aligned single-face image.
Output: a vector representation.
This step uses a deep convolutional network to convert the input face image into a vector representation, and this vector is the feature of the face. For example:
This dense 128-dimensional vector is the feature of the face; you can also call it a face encoding.
Convolutional neural networks are very good at extracting features. Take VGG16, a relatively simple baseline model in deep learning: the image is fed into the network, and after a series of convolutions, a fully connected classifier outputs the class probabilities. The whole process looks like this:
In effect, the repeated convolution and downsampling in the network is a feature-extraction process; the final fully connected layer merely turns the features into class probabilities.
For facial feature extraction we can do the same thing, except that we remove the fully connected layer and use the computed activations (usually those of the last convolutional layer, e.g. conv5_3 in the figure) as the extracted feature.
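A sketch of this idea in Pytorch: take torchvision's VGG16, drop its classifier, and keep the convolutional trunk as a feature extractor. The `weights=None` argument follows recent torchvision versions (older ones use `pretrained=`); in practice you would load trained weights.

import torch
import torchvision.models as models

vgg16 = models.vgg16(weights=None)  # random weights, for illustration only
extractor = vgg16.features          # convolutional trunk, ending after conv5_3
face = torch.randn(1, 3, 224, 224)  # a dummy, preprocessed face image
with torch.no_grad():
    feat = extractor(face)          # feature map of shape (1, 512, 7, 7)
print(feat.flatten(1).shape)        # flattened into a single feature vector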
What differs from a classification task is the loss function used at the end.
Ideally, we want the distance between the "vector representations" to directly reflect the similarity of faces:
For face images of the same person, the Euclidean distance between the corresponding vectors should be small.
For face images of different people, the Euclidean distance between the corresponding vectors should be large.
Therefore, the class center of each identity should be pushed as far from the others as possible, so that the features can distinguish different people. Losses commonly used for faces include center loss, ArcFace loss, and so on.
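To make the idea concrete, here is a simplified ArcFace-style head, a sketch only: it adds an angular margin m to the target class before scaling by s, which pushes class centers apart. Dimensions and hyperparameters are illustrative, not from this article.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    def __init__(self, feat_dim=128, num_classes=1000, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # cosine similarity between L2-normalized features and class centers
        cos = F.linear(F.normalize(feats), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        # add the angular margin only on the ground-truth class
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(self.s * logits, labels)

For example, ArcFaceHead()(torch.randn(4, 128), torch.randint(0, 1000, (4,))) returns a scalar training loss.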
Face recognition is similar to fine-grained classification. Anyone who has trained classification models knows this: the two classes "person" and "pig" are easy to train, because their features differ greatly, while the two classes "man" and "woman" are harder, because their features are very similar. To separate men from women well, you need a loss function that enforces a large distance between class centers. Face recognition is fine-grained discrimination: everyone is a person, yet you still have to tell Zhang San, Li Si, and Wang Erma apart.
4. Face recognition
Face recognition generally requires building a "retrieval database". Briefly: suppose we need to recognize Zhang San, Li Si, and Wang Erma. We pick, say, 10 pictures of each of them, then use our trained facial feature model to extract each person's facial features. Each person now has 10 feature vectors, and together they form the "retrieval database". For a picture to be recognized, we extract its facial features, compare them with the features already in the database, and vote for the closest person.
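A sketch of that retrieval step, assuming `gallery` maps each person's name to the list of feature vectors extracted from their 10 enrollment photos; the 0.6 rejection threshold is illustrative.

import numpy as np

def identify(query_feat, gallery, threshold=0.6):
    best_name, best_dist = 'unknown', float('inf')
    for name, feats in gallery.items():
        # Euclidean distance to each of this person's stored features
        d = np.linalg.norm(np.asarray(feats) - query_feat, axis=1).min()
        if d < best_dist:
            best_name, best_dist = name, d
    # reject the match if even the closest stored feature is too far away
    return best_name if best_dist < threshold else 'unknown'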
IV. Face Masking
With the principles of face recognition clear, we can use the technology to mask Tong Zhuo. As we have seen, face recognition involves many algorithms, and implementing each of them from scratch takes time. But that is no obstacle for a seasoned "library caller": there are plenty of open-source third-party libraries, such as face_recognition, which integrates face detection, face recognition, and other interfaces. We will use face recognition to mask Tong Zhuo's face in this short video.
Let's sort out the plan first. Programs like OpenCV process a video frame by frame: they handle only the pictures, not the sound. So we save the audio separately first, and at the end merge the processed video with the audio; that way the picture is masked and the sound is preserved. The audio handling can be done with ffmpeg.
Install ffmpeg, configure the environment variables, and write the following code:
import subprocess

def video2mp3(file_name):
    """Convert video to audio.
    :param file_name: path to the video file
    """
    outfile_name = file_name.split('.')[0] + '.mp3'
    cmd = 'ffmpeg -i ' + file_name + ' -f mp3 ' + outfile_name
    subprocess.call(cmd, shell=True)

def video_add_mp3(file_name, mp3_file):
    """Add audio to a video.
    :param file_name: path to the video file
    :param mp3_file: path to the audio file
    """
    outfile_name = file_name.split('.')[0] + '-f.mp4'
    subprocess.call('ffmpeg -i ' + file_name + ' -i ' + mp3_file +
                    ' -strict -2 -f mp4 ' + outfile_name, shell=True)
The video-to-audio and video-plus-audio functions are now written. Next, we write the automatic masking program.
First, install face_recognition:
python -m pip install face_recognition
face_recognition has detailed API documentation:
https://face-recognition.readthedocs.io/en/latest/face_recognition.html
Let's first save the video we want to process locally:
https://cuijiahua.com/wp-content/uploads/2020/07/cut.mp4
Then use OpenCV to read the video and detect every face in each frame.
import cv2
import face_recognition
import matplotlib.pyplot as plt
# %matplotlib inline  # uncomment when running in Jupyter

cap = cv2.VideoCapture('cut.mp4')
ret, frame = cap.read()
if ret:
    # each location is (top, right, bottom, left)
    face_locations = face_recognition.face_locations(frame)
    for (top, right, bottom, left) in face_locations:
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 0, 255), 10)
    plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    plt.show()
Running effect:
In this way, for every face detected in a frame, if face recognition says it is Tong Zhuo, we mask it.
For the mask image, let's also use something simple and crude.
Save mask.jpg locally.
Then crop a picture of Tong Zhuo's face to serve as the comparison library. More than one picture would also work; one is enough here.
We chose this picture.
Download the picture locally and write the following code to extract its facial features.
import face_recognition

known_image = face_recognition.load_image_file("tz.jpg")
biden_encoding = face_recognition.face_encodings(known_image)[0]
print(biden_encoding)
Running result:
As you can see, a few lines of code extract a 128-dimensional facial feature.
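With two encodings in hand, comparison is just as short. A sketch, assuming a hypothetical second photo 'tz2.jpg' of the same person; 0.6 is the library's default match threshold.

import face_recognition

enc1 = face_recognition.face_encodings(
    face_recognition.load_image_file('tz.jpg'))[0]
enc2 = face_recognition.face_encodings(
    face_recognition.load_image_file('tz2.jpg'))[0]  # hypothetical second photo
dist = face_recognition.face_distance([enc1], enc2)[0]
print(dist, dist < 0.6)  # smaller distance means more similar faces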
The overall process is:
Use ffmpeg to extract and save the audio.
Process the video, masking Tong Zhuo.
Add the audio back to the processed video.
Here is the full code:
# Author: Jack Cui
# Website: https://cuijiahua.com/
import cv2
import face_recognition
import subprocess

def video2mp3(file_name):
    """Convert video to audio.
    :param file_name: path to the video file
    """
    outfile_name = file_name.split('.')[0] + '.mp3'
    cmd = 'ffmpeg -i ' + file_name + ' -f mp3 ' + outfile_name
    print(cmd)
    subprocess.call(cmd, shell=True)

def video_add_mp3(file_name, mp3_file):
    """Add audio to a video.
    :param file_name: path to the video file
    :param mp3_file: path to the audio file
    """
    outfile_name = file_name.split('.')[0] + '-f.mp4'
    subprocess.call('ffmpeg -i ' + file_name + ' -i ' + mp3_file +
                    ' -strict -2 -f mp4 ' + outfile_name, shell=True)

def mask_video(input_video, output_video, mask_path='mask.jpg'):
    # mask image used to cover the face
    mask = cv2.imread(mask_path)
    # read the video parameters: fps, width, height
    cap = cv2.VideoCapture(input_video)
    CV_CAP_PROP_FPS = 5
    CV_CAP_PROP_FRAME_WIDTH = 3
    CV_CAP_PROP_FRAME_HEIGHT = 4
    v_fps = cap.get(CV_CAP_PROP_FPS)
    v_width = cap.get(CV_CAP_PROP_FRAME_WIDTH)
    v_height = cap.get(CV_CAP_PROP_FRAME_HEIGHT)
    # set the writer parameters; the output format is mp4
    size = (int(v_width), int(v_height))
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    out = cv2.VideoWriter(output_video, fourcc, v_fps, size)
    # the known face
    known_image = face_recognition.load_image_file("tz.jpg")
    biden_encoding = face_recognition.face_encodings(known_image)[0]
    # read the video
    cap = cv2.VideoCapture(input_video)
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            # detect the faces in this frame
            face_locations = face_recognition.face_locations(frame)
            # check every detected face; each location is (top, right, bottom, left)
            for (top, right, bottom, left) in face_locations:
                unknown_image = frame[top - 50:bottom + 50, left - 50:right + 50]
                unknown_encoding = face_recognition.face_encodings(unknown_image)[0]
                # compare with the known face
                results = face_recognition.compare_faces([biden_encoding], unknown_encoding)
                # if it is Tong Zhuo, cover the face with the mask image
                if results[0]:
                    mask = cv2.resize(mask, (right - left, bottom - top))
                    frame[top:bottom, left:right] = mask
            # write the processed frame
            out.write(frame)
        else:
            break

if __name__ == '__main__':
    # save the audio as cut.mp3
    video2mp3(file_name='cut.mp4')
    # process the video and mask automatically; the output video is output.mp4
    mask_video(input_video='cut.mp4', output_video='output.mp4')
    # add the sound back to the processed video output.mp4
    video_add_mp3(file_name='output.mp4', mp3_file='cut.mp3')

At this point, the study of "how to achieve face recognition with Pytorch" is over. I hope it has cleared up your doubts; theory paired with practice is the best way to learn, so go and try it!