
How to build a face recognition system with DL4J

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to use DL4J to build a face recognition system. The approach is practical, so it is shared here in the hope that you get something out of it. Let's take a look.

I. Overview

Face recognition is essentially a similarity problem: images of the same face are mapped to nearby points in a feature space, so the distance between them is small. This distance can be measured with cosine distance, Euclidean distance, or another metric. Consider the three face images below.

A B C

Obviously A and C are the same face, and A and B are different faces. How can this be described mathematically? Suppose there is a distance function d(x1, x2); then d(A, B) > d(A, C). In a real face recognition application, what range of d(x1, x2) should count as the same face? This threshold depends on the parameters of the trained model and will be given below. Note that if d is cosine similarity, the relationship is reversed: the higher the value, the more similar the faces. A general face recognition model should include two units: feature extraction (that is, feature mapping) and distance calculation.
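To make the relationship between the three faces concrete, here is a minimal sketch of the two distance measures mentioned above. The class name FaceDistanceSketch and the 3-dimensional vectors are assumptions made purely for illustration; real VGGFace features have far more dimensions and different values.

```java
// Minimal sketch: two common similarity measures between face embeddings.
// The vectors in main() are made-up toy values, not real VGGFace features.
final class FaceDistanceSketch {

    /** Cosine similarity in [-1, 1]; a higher value means more similar. */
    static double cosineSimilarity(double[] x1, double[] x2) {
        double dot = 0.0, norm1 = 0.0, norm2 = 0.0;
        for (int i = 0; i < x1.length; i++) {
            dot += x1[i] * x2[i];
            norm1 += x1[i] * x1[i];
            norm2 += x2[i] * x2[i];
        }
        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));
    }

    /** Euclidean distance; a lower value means more similar. */
    static double euclideanDistance(double[] x1, double[] x2) {
        double sum = 0.0;
        for (int i = 0; i < x1.length; i++) {
            double diff = x1[i] - x2[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {0.9, 0.1, 0.2};   // face A
        double[] b = {0.1, 0.8, 0.3};   // face B (a different person)
        double[] c = {0.8, 0.2, 0.2};   // face C (same person as A)

        // Expect d(A, B) > d(A, C) for the Euclidean distance ...
        System.out.println("d(A,B)   = " + euclideanDistance(a, b));
        System.out.println("d(A,C)   = " + euclideanDistance(a, c));
        // ... and the opposite ordering for cosine similarity.
        System.out.println("cos(A,B) = " + cosineSimilarity(a, b));
        System.out.println("cos(A,C) = " + cosineSimilarity(a, c));
    }
}
```

With Euclidean distance the same-person pair gives the smaller value, while with cosine similarity it gives the larger one, so any decision threshold has to be chosen with the direction of the measure in mind.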

II. Building the model

So how do we map an image to features? For image processing, convolutional neural networks are currently the most widely used approach. DeepLearning4J ships with a pretrained VGGFace model, which is based on VGG16. The download address of VGGFace is https://dl4jdata.blob.core.windows.net/models/vgg16_dl4j_vggface_inference.v1.zip. How do you find this address? Follow the DL4JResources.getURLString call in the pretrainedUrl method of the VGG16 source code. The download addresses of other pretrained models, such as VGG19 and ResNet50, can be found in the same way. The source code is as follows:

    public class VGG16 extends ZooModel {

        @Builder.Default private long seed = 1234;
        @Builder.Default private int[] inputShape = new int[] {3, 224, 224};
        @Builder.Default private int numClasses = 0;
        @Builder.Default private IUpdater updater = new Nesterovs();
        @Builder.Default private CacheMode cacheMode = CacheMode.NONE;
        @Builder.Default private WorkspaceMode workspaceMode = WorkspaceMode.ENABLED;
        @Builder.Default private ConvolutionLayer.AlgoMode cudnnAlgoMode = ConvolutionLayer.AlgoMode.PREFER_FASTEST;

        private VGG16() {}

        @Override
        public String pretrainedUrl(PretrainedType pretrainedType) {
            if (pretrainedType == PretrainedType.IMAGENET)
                return DL4JResources.getURLString("models/vgg16_dl4j_inference.zip");
            else if (pretrainedType == PretrainedType.CIFAR10)
                return DL4JResources.getURLString("models/vgg16_dl4j_cifar10_inference.v1.zip");
            else if (pretrainedType == PretrainedType.VGGFACE)
                return DL4JResources.getURLString("models/vgg16_dl4j_vggface_inference.v1.zip");
            else
                return null;
        }
        // ...

The model structure of vgg16 is as follows:

    =====================================================================================================
    VertexName (VertexType)      nIn,nOut     TotalParams   ParamsShape                    Vertex Inputs
    =====================================================================================================
    input_1 (InputVertex)        -,-          -             -                              -
    conv1_1 (ConvolutionLayer)   3,64         1792          W:{64,3,3,3}, b:{1,64}         [input_1]
    conv1_2 (ConvolutionLayer)   64,64        36928         W:{64,64,3,3}, b:{1,64}        [conv1_1]
    pool1 (SubsamplingLayer)     -,-          0             -                              [conv1_2]
    conv2_1 (ConvolutionLayer)   64,128       73856         W:{128,64,3,3}, b:{1,128}      [pool1]
    conv2_2 (ConvolutionLayer)   128,128      147584        W:{128,128,3,3}, b:{1,128}     [conv2_1]
    pool2 (SubsamplingLayer)     -,-          0             -                              [conv2_2]
    conv3_1 (ConvolutionLayer)   128,256      295168        W:{256,128,3,3}, b:{1,256}     [pool2]
    conv3_2 (ConvolutionLayer)   256,256      590080        W:{256,256,3,3}, b:{1,256}     [conv3_1]
    conv3_3 (ConvolutionLayer)   256,256      590080        W:{256,256,3,3}, b:{1,256}     [conv3_2]
    pool3 (SubsamplingLayer)     -,-          0             -                              [conv3_3]
    conv4_1 (ConvolutionLayer)   256,512      1180160       W:{512,256,3,3}, b:{1,512}     [pool3]
    conv4_2 (ConvolutionLayer)   512,512      2359808       W:{512,512,3,3}, b:{1,512}     [conv4_1]
    conv4_3 (ConvolutionLayer)   512,512      2359808       W:{512,512,3,3}, b:{1,512}     [conv4_2]
    pool4 (SubsamplingLayer)     -,-          0             -                              [conv4_3]
    conv5_1 (ConvolutionLayer)   512,512      2359808       W:{512,512,3,3}, b:{1,512}     [pool4]
    conv5_2 (ConvolutionLayer)   512,512      2359808       W:{512,512,3,3}, b:{1,512}     [conv5_1]
    conv5_3 (ConvolutionLayer)   512,512      2359808       W:{512,512,3,3}, b:{1,512}     [conv5_2]
    pool5 (SubsamplingLayer)     -,-          0             -                              [conv5_3]
    flatten (PreprocessorVertex) -,-          -             -                              [pool5]
    fc6 (DenseLayer)             25088,4096   102764544     W:{25088,4096}, b:{1,4096}     [flatten]
    fc7 (DenseLayer)             4096,4096    16781312      W:{4096,4096}, b:{1,4096}      [fc6]
    fc8 (DenseLayer)             4096,2622    10742334      W:{4096,2622}, b:{1,2622}      [fc7]
    -----------------------------------------------------------------------------------------------------
                Total Parameters:  145002878
            Trainable Parameters:  145002878
               Frozen Parameters:  0
    =====================================================================================================

For VGGFace, we only need the convolution and pooling layers to extract features; the fully connected layers can be discarded. Our model can therefore be set up as follows.

Note: StackVertex and UnstackVertex are used here because, by default, DL4J merges multiple inputs into a single tensor, which does not achieve weight sharing across the inputs. So we first use StackVertex to stack the input tensors along dimension 0, pass them through the shared convolution and pooling layers to extract features, and then use UnstackVertex to split the result back into separate tensors for the distance calculation.
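The stack, shared layers, unstack pattern from the note can be illustrated without any DL4J types. The sketch below is plain Java on 2-D arrays, not the DL4J graph configuration itself; the class StackUnstackSketch and the method sharedFeatures are hypothetical stand-ins for the shared convolution and pooling layers.

```java
import java.util.Arrays;

// Plain-Java illustration of stack -> shared layers -> unstack.
// No DL4J types are used; sharedFeatures is a hypothetical stand-in
// for the shared convolution and pooling layers.
final class StackUnstackSketch {

    /** Stacks two batches along dimension 0: [n, d] + [m, d] -> [n + m, d]. */
    static double[][] stack(double[][] first, double[][] second) {
        double[][] stacked = new double[first.length + second.length][];
        System.arraycopy(first, 0, stacked, 0, first.length);
        System.arraycopy(second, 0, stacked, first.length, second.length);
        return stacked;
    }

    /** Returns slice number index out of parts equal slices along dimension 0. */
    static double[][] unstack(double[][] stacked, int index, int parts) {
        int size = stacked.length / parts;
        return Arrays.copyOfRange(stacked, index * size, (index + 1) * size);
    }

    /** Toy "shared layers": the same weight vector is applied to every row. */
    static double[][] sharedFeatures(double[][] batch, double[] weights) {
        double[][] out = new double[batch.length][weights.length];
        for (int r = 0; r < batch.length; r++) {
            for (int c = 0; c < weights.length; c++) {
                out[r][c] = batch[r][c] * weights[c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input1 = {{1, 2}, {3, 4}};   // batch of first images
        double[][] input2 = {{5, 6}, {7, 8}};   // batch of second images
        double[] weights = {0.5, 2.0};          // one set of weights for both inputs

        // One forward pass over the stacked batch uses the same weights for both inputs.
        double[][] features = sharedFeatures(stack(input1, input2), weights);
        double[][] features1 = unstack(features, 0, 2);
        double[][] features2 = unstack(features, 1, 2);

        System.out.println(Arrays.deepToString(features1));
        System.out.println(Arrays.deepToString(features2));
    }
}
```

Because both inputs travel through a single call with a single weight vector, there is only one set of parameters to train or load, which is the point of stacking instead of merging.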

The next problem is that the transfer learning API in DL4J can only append new structure to the tail of a model, whereas our scenario places part of the pretrained model in the middle of the graph. What can we do? Let's look at the source code of the transfer learning API and see how DL4J encapsulates it. There are clues in the build method of org.deeplearning4j.nn.transferlearning.TransferLearning.

    public ComputationGraph build() {
        initBuilderIfReq();

        ComputationGraphConfiguration newConfig = editedConfigBuilder
                .validateOutputLayerConfig(validateOutputLayerConfig == null ? true : validateOutputLayerConfig)
                .build();
        if (this.workspaceMode != null)
            newConfig.setTrainingWorkspaceMode(workspaceMode);
        ComputationGraph newGraph = new ComputationGraph(newConfig);
        newGraph.init();

        int[] topologicalOrder = newGraph.topologicalSortOrder();
        org.deeplearning4j.nn.graph.vertex.GraphVertex[] vertices = newGraph.getVertices();
        if (!editedVertices.isEmpty()) {
            // set params from orig graph as necessary to new graph
            for (int i = 0; i < topologicalOrder.length; i++) {
                if (!vertices[topologicalOrder[i]].hasLayer())
                    continue;

                org.deeplearning4j.nn.api.Layer layer = vertices[topologicalOrder[i]].getLayer();
                String layerName = vertices[topologicalOrder[i]].getVertexName();
                long range = layer.numParams();
                if (range ...

[The source text is cut off at this point. It resumes partway through the code that builds the positive and negative face pairs for the test set.]

                ... > 1) {
                    String positiveFilePath = PARENT_PATH + "/pairs/1/" + name;
                    File positiveFileDir = new File(positiveFilePath);
                    if (positiveFileDir.exists()) {
                        positiveFileDir.delete();
                    }
                    positiveFileDir.mkdir();
                    FileUtils.copyFile(faceFileArray[0], new File(positiveFilePath + "/" + faceFileArray[0].getName()));
                    FileUtils.copyFile(faceFileArray[1], new File(positiveFilePath + "/" + faceFileArray[1].getName()));
                }
                // construct a negative example
                String negativeFilePath = PARENT_PATH + "/pairs/0/" + name;
                File negativeFileDir = new File(negativeFilePath);
                if (negativeFileDir.exists()) {
                    negativeFileDir.delete();
                }
                negativeFileDir.mkdir();
                FileUtils.copyFile(faceFileArray[0], new File(negativeFilePath + "/" + faceFileArray[0].getName()));
                // pick a random image of a different person (any index except i)
                File[] differentFaceArray = list.get(randomInt(list.size(), i)).listFiles();
                int differentFaceIndex = randomInt(differentFaceArray.length, -1);
                FileUtils.copyFile(differentFaceArray[differentFaceIndex],
                        new File(negativeFilePath + "/" + differentFaceArray[differentFaceIndex].getName()));
            }
        }

        // Returns a random int in [0, max) that is different from target.
        public static int randomInt(int max, int target) {
            Random random = new Random();
            while (true) {
                int result = random.nextInt(max);
                if (result != target) {
                    return result;
                }
            }
        }

After the test set is constructed, we build an iterator over it. The iterator uses NativeImageLoader to read the images; its usage is covered in the article "How to use DataVec in DeepLearning4J to process images".

    import java.util.List;

    import org.datavec.image.loader.NativeImageLoader;
    import org.datavec.image.transform.ResizeImageTransform;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.dataset.api.MultiDataSet;
    import org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor;
    import org.nd4j.linalg.dataset.api.iterator.MultiDataSetIterator;
    import org.nd4j.linalg.factory.Nd4j;
    import org.nd4j.linalg.indexing.INDArrayIndex;
    import org.nd4j.linalg.indexing.NDArrayIndex;

    public class DataSetForEvaluation implements MultiDataSetIterator {

        // NOTE: the generic type of this list was lost in the source. FacePair is a
        // placeholder name for an element type that exposes getList() and getLabel().
        private List<FacePair> facePairList;
        private int batchSize;
        private int totalBatches;
        private NativeImageLoader imageLoader;
        private int currentBatch = 0;

        public DataSetForEvaluation(List<FacePair> facePairList, int batchSize) {
            this.facePairList = facePairList;
            this.batchSize = batchSize;
            this.totalBatches = (int) Math.ceil((double) facePairList.size() / batchSize);
            // Resize every image to 224 x 224 with 3 channels, the VGG16 input shape.
            this.imageLoader = new NativeImageLoader(224, 224, 3, new ResizeImageTransform(224, 224));
        }

        @Override
        public boolean hasNext() {
            return currentBatch < totalBatches;
        }

        @Override
        public MultiDataSet next() {
            return next(batchSize);
        }

        @Override
        public MultiDataSet next(int num) {
            int i = currentBatch * batchSize;
            // The last batch may be smaller than batchSize.
            int currentBatchSize = Math.min(batchSize, facePairList.size() - i);
            INDArray input1 = Nd4j.zeros(currentBatchSize, 3, 224, 224);
            INDArray input2 = Nd4j.zeros(currentBatchSize, 3, 224, 224);
            INDArray label = Nd4j.zeros(currentBatchSize, 1);
            for (int j = 0; j < currentBatchSize; j++) {
                try {
                    // Load both images of the pair and scale pixel values to [0, 1].
                    input1.put(new INDArrayIndex[] {NDArrayIndex.point(j), NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.all()},
                            imageLoader.asMatrix(facePairList.get(i).getList().get(0)).div(255));
                    input2.put(new INDArrayIndex[] {NDArrayIndex.point(j), NDArrayIndex.all(), NDArrayIndex.all(), NDArrayIndex.all()},
                            imageLoader.asMatrix(facePairList.get(i).getList().get(1)).div(255));
                } catch (Exception e) {
                    e.printStackTrace();
                }
                label.putScalar((long) j, 0, facePairList.get(i).getLabel());
                ++i;
            }
            System.out.println(currentBatch);
            ++currentBatch;
            return new org.nd4j.linalg.dataset.MultiDataSet(new INDArray[] {input1, input2}, new INDArray[] {label});
        }

        @Override
        public void setPreProcessor(MultiDataSetPreProcessor preProcessor) {
        }

        @Override
        public MultiDataSetPreProcessor getPreProcessor() {
            return null;
        }

        @Override
        public boolean resetSupported() {
            return true;
        }

        @Override
        public boolean asyncSupported() {
            return false;
        }

        @Override
        public void reset() {
            currentBatch = 0;
        }
    }

Next, we can evaluate the performance of the model. The accuracy and precision are reasonable, but the recall is low, so the F1 score is also a little low.

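    ========================Evaluation Metrics========================
     # of classes:    2
     Accuracy:        0.8973
     Precision:       0.9119
     Recall:          0.6042
     F1 Score:        0.7268
    Precision, recall & F1: reported for positive class (class 1 - "1") only

    =========================Confusion Matrix=========================
        0    1
    -----------
     5651   98 | 0 = 0
      665 1015 | 1 = 1

    Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
    ==================================================================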
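The reported metrics can be cross-checked against the confusion matrix. The sketch below assumes the matrix reads as 5651 true negatives, 98 false positives, 665 false negatives and 1015 true positives (rows are actual classes, columns are predicted classes); the class name ConfusionMatrixMetrics is made up for this example.

```java
import java.util.Locale;

// Recomputes accuracy, precision, recall and F1 from confusion-matrix counts.
// The counts in main() are our reading of the evaluation output above.
final class ConfusionMatrixMetrics {

    static double accuracy(long tn, long fp, long fn, long tp) {
        return (double) (tp + tn) / (tn + fp + fn + tp);
    }

    /** Of all pairs predicted "same face", the fraction that really are. */
    static double precision(long tp, long fp) {
        return (double) tp / (tp + fp);
    }

    /** Of all pairs that really are the same face, the fraction that was found. */
    static double recall(long tp, long fn) {
        return (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        long tn = 5651, fp = 98, fn = 665, tp = 1015;

        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println(String.format(Locale.ROOT, "Accuracy:  %.4f", accuracy(tn, fp, fn, tp)));
        System.out.println(String.format(Locale.ROOT, "Precision: %.4f", p));
        System.out.println(String.format(Locale.ROOT, "Recall:    %.4f", r));
        System.out.println(String.format(Locale.ROOT, "F1 Score:  %.4f", f1(p, r)));
    }
}
```

The 665 same-face pairs that were classified as different are what hold the recall, and with it the F1 score, down. Loosening the distance threshold would likely recover some of them at the cost of more false positives.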

IV. Wrapping the model as a service with SpringBoot

Once the model is saved, it is just a set of static parameters. How can it become an online service? There are two kinds of face recognition services: 1:1 and 1:N.

1. 1:1 application

Typical 1:1 applications include face unlock on mobile phones and face recognition attendance in DingTalk. This kind of application is relatively simple: it only needs to verify that Zhang San really is Zhang San, so the amount of computation is very small and it is easy to implement.

2. 1:N application

Typical 1:N applications, such as face search by public security organs, find out who a target face belongs to by searching a massive face database without knowing the identity in advance. When the face database is large, the amount of computation becomes a serious problem.

If the results are not needed in real time, they can be computed offline with Hadoop MapReduce or Spark. All we need to do is wrap the model in a Hive UDF, a MapReduce jar, or a Spark RDD program.

However, when the result is needed in real time, the problem cannot be turned into an index lookup, so we need a computing framework that solves the global max or global top-N problem in a distributed way. The general structure is as follows:

The blue arrows indicate that the request is forwarded, and the green arrows indicate that the calculation results are returned. In the figure, a client request arrives at node Node3, which forwards it to the other nodes for parallel computation. If each node has enough memory, the tensors of the entire face database can be preloaded and kept resident in memory to speed up the computation.
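The global top-N idea can be sketched in plain Java under some simplifying assumptions: each node has already scored its own shard of the face database against the query, scores are similarities (higher is better), and the nodes are simulated by in-memory lists rather than real network calls. The names GlobalTopNSketch, Match, localTopN and globalTopN are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Scatter-gather sketch: every node returns its local top n, and the node
// that received the request merges those partial results into the global top n.
final class GlobalTopNSketch {

    /** A candidate face id together with its similarity to the query face. */
    static final class Match {
        final String faceId;
        final double score;

        Match(String faceId, double score) {
            this.faceId = faceId;
            this.score = score;
        }
    }

    /** Keeps the n highest-scoring matches of one shard, best first. */
    static List<Match> localTopN(List<Match> shard, int n) {
        // Min-heap on score: the weakest kept match is on top and is evicted first.
        PriorityQueue<Match> heap =
                new PriorityQueue<>(Comparator.comparingDouble((Match m) -> m.score));
        for (Match m : shard) {
            heap.add(m);
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<Match> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        return result;
    }

    /** Merges the per-node results; only n candidates per node have to be transferred. */
    static List<Match> globalTopN(List<List<Match>> perNode, int n) {
        List<Match> all = new ArrayList<>();
        for (List<Match> part : perNode) {
            all.addAll(part);
        }
        return localTopN(all, n);
    }

    public static void main(String[] args) {
        // Made-up similarity scores for three shards of the face database.
        List<Match> node1 = Arrays.asList(new Match("face-001", 0.41), new Match("face-002", 0.93));
        List<Match> node2 = Arrays.asList(new Match("face-101", 0.88), new Match("face-102", 0.12));
        List<Match> node3 = Arrays.asList(new Match("face-201", 0.97), new Match("face-202", 0.35));

        int n = 2;
        List<List<Match>> partial =
                Arrays.asList(localTopN(node1, n), localTopN(node2, n), localTopN(node3, n));
        for (Match m : globalTopN(partial, n)) {
            System.out.println(m.faceId + " " + m.score);
        }
    }
}
```

The merge is correct because any face in the global top n must also be in the top n of its own shard, so each node only needs to send back n candidates no matter how large its shard is.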

This blog does not implement a parallel computing framework; it only uses SpringBoot to wrap the model as a service. Run FaceRecognitionApplication and access http://localhost:8080/index. The service results are as follows:

The main purpose of this article is to show how to use DL4J in practice, including obtaining pretrained model parameters, implementing a custom layer, implementing a custom iterator, and wrapping the model as a service with SpringBoot.

Of course, a face recognition system that only computes image embeddings and tensor distances is not enough. It should also include face alignment, defense against adversarial attacks (later blogs will introduce how to use DL4J to carry out FGSM attacks), feature extraction for key facial regions, and a lot of other detailed work. There is also much more to do to turn face recognition into a general SaaS service.

To train a good face recognition model, several loss functions need to work together. For example, we can first train a classifier with SoftMax and then fine-tune with Center Loss and Triplet Loss. A follow-up blog will introduce how to use DL4J to implement Triplet Loss to train the face recognition model.


