
How to use Logistic regression

2025-03-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how to use Logistic regression. Many readers run into trouble when working through real cases like this one, so follow along as we walk through the steps; I hope you read it carefully and get something out of it.

The Spoken Digits dataset is a subset of the TensorFlow Speech Commands dataset, which also contains recordings of words other than the digits 0-9. Here we focus only on recognizing the spoken digits.

The dataset can be downloaded as follows.

import os
import tarfile
from torchvision.datasets.utils import download_url  # assumed source of download_url

data = download_url("http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz", "/content/")
with tarfile.open('/content/speech_commands_v0.01.tar.gz', 'r:gz') as tar:
    tar.extractall(path='./data')

Downloading http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz to /content/speech_commands_v0.01.tar.gz

digit = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
for x in digit:
    print(x, ":", len(os.listdir('/content/data/' + x)))  # check class balance

zero : 2376
one : 2370
two : 2373
three : 2356
four : 2372
five : 2357
six : 2369
seven : 2377
eight : 2352
nine : 2364

Evaluation metric

The classes are fairly balanced, with about 2300 samples each, so accuracy is a good metric for evaluating the model. Accuracy is the ratio of correct predictions to total predictions.

Accuracy is not a good metric for unbalanced datasets, where minority classes can be overshadowed.
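As a toy illustration (the tensors below are made up, not from the article), this mirrors how accuracy is computed later in validation_step:

import torch

# Toy example: accuracy = correct predictions / total predictions.
outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # made-up scores for 3 samples
labels = torch.tensor([1, 0, 0])
_, preds = torch.max(outputs, 1)    # predicted class per sample -> [1, 0, 1]
accuracy = torch.sum(preds == labels).item() / len(preds)
print(accuracy)                     # 2 of 3 correct -> 0.666...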

Cyclic learning rate

When training a model, the learning rate is usually decreased gradually to fine-tune the later stages of training. To improve learning efficiency, a cyclic learning rate can be used instead: the learning rate oscillates between a minimum and a maximum value over the course of training, rather than decreasing monotonically.

The initial learning rate matters for the final performance of the model: a low initial rate prevents the model from getting stuck at the beginning of training, and the subsequent oscillation helps it escape local minima.
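A minimal sketch of such a schedule using PyTorch's CyclicLR (the article's own fit() routine is not shown, so the model, optimizer, and learning-rate bounds here are assumptions used only to illustrate the oscillation):

import torch
from torch import nn

# Sketch only: placeholder model and data, assumed LR bounds.
model = nn.Linear(173, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.0001, max_lr=0.01,   # LR oscillates between these bounds
    step_size_up=50, mode='triangular')

x, y = torch.randn(32, 173), torch.randint(0, 10, (32,))
lrs = []
for step in range(300):
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                          # advance the cyclic schedule each batch
    lrs.append(optimizer.param_groups[0]['lr'])
# `lrs` rises and falls in triangular cycles instead of decaying monotonically.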

The project approaches the classification task in three ways:

Logistic regression on the five extracted features: 76.19% accuracy.

Logistic regression using only MFCCs: 95.56% accuracy.

CNN using Mel spectrograms: 95.81% accuracy.

Each model was trained repeatedly with different numbers of epochs and learning rates, as well as different numbers of hidden layers and nodes per layer. The best architecture and hyperparameters found for each method are described below. Because the training/validation split is random, retraining may give slightly different accuracy.

There are five .ipynb files:

Feature Extraction - extracts the features and CSV files needed by the three methods.

Feature Visualization - plots the features for each class.

Spokendigit Five Features - logistic regression using the five extracted features.

Spokendigit MFCC - logistic regression using only MFCCs.

Spokendigit CNN - CNN using Mel spectrograms.

1. Logistic regression using five extracted features

The extracted features include:

Mel Frequency Cepstral Coefficients (MFCCs) - coefficients that form a spectral representation of the sound, using frequency bands spaced according to the response of the human auditory system (the Mel scale).

Chroma - relates to the 12 pitch classes.

Mean of the Mel spectrogram - a spectrogram whose frequency axis uses the Mel scale.

Spectral contrast - represents the difference in level between peaks and valleys in the spectrum.

Tonnetz - represents the tonal space.

These are NumPy arrays of size (20,), (12,), (128,), (7,), and (6,) respectively. They are concatenated into a single feature array of size (173,). The label is prepended to the array, and one row per recording is written to a CSV file.

import librosa
import numpy as np

def extract_features(files):
    data, sr = librosa.load('/content/data/' + files.File)
    mfccs = np.mean(librosa.feature.mfcc(y=data, sr=sr).T, axis=0)
    stft = np.abs(librosa.stft(data))
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sr).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=data, sr=sr).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sr).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(data), sr=sr).T, axis=0)
    # print(mfccs.shape, stft.shape, chroma.shape, mel.shape, contrast.shape, tonnetz.shape)
    row = np.concatenate((mfccs, chroma, mel, contrast, tonnetz), axis=0).astype('float32')
    # prepend the numeric label, then write one row per recording
    csvwriter.writerow(np.concatenate(([digit.index(files.Label)], row)))
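The listing above assumes an open csvwriter and a files record with File and Label attributes, neither of which is shown in the article. A minimal sketch of that glue code, with assumed names and paths:

import csv
import os
import pandas as pd

# Sketch only: build a (File, Label) table and stream every recording through
# extract_features, which writes one CSV row per file.
digit = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine']
records = pd.DataFrame(
    [(d + '/' + f, d) for d in digit for f in os.listdir('/content/data/' + d)],
    columns=['File', 'Label'])

with open('/content/features.csv', 'w', newline='') as out:
    csvwriter = csv.writer(out)
    for files in records.itertuples(index=False):
        extract_features(files)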

Model

The model consists of an input layer, hidden layers, and an output layer, built from fully connected layers with ReLU activations.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.metrics import f1_score

class SpokenDigitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(173, 512)
        self.l2 = nn.Linear(512, 1024)
        self.l3 = nn.Linear(1024, 64)
        self.l4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = self.l4(x)
        return x

    def training_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        return loss

    def validation_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        _, pred = torch.max(outputs, 1)
        accuracy = torch.tensor(torch.sum(pred == labels).item() / len(pred))
        return [loss.detach(), accuracy.detach()]

Training

model = to_device(SpokenDigitModel(), device)
history = []

evaluate(model, val_dl)

{'accuracy': 0.10285229980945587, 'loss': 3.1926627159118652}

history.append(fit(model, train_dl, val_dl, 64, 0.01))

r = evaluate(model, val_dl)
yp, yt = predict_dl(model, val_dl)
print("Loss:", r['loss'], "\nAccuracy:", r['accuracy'], "\nF-score:", f1_score(yt, yp, average='micro'))

Loss: 2.0203850269317627
Accuracy: 0.7619398832321167
F-score: 0.7586644125105664
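The training code above calls to_device, evaluate, fit, and predict_dl, which the article does not define. A hedged sketch of what these helpers might look like (the device handling, Adam optimizer, and one-cycle schedule are assumptions, not the article's exact code):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def to_device(data, device):
    # Move a model, a tensor, or an (inputs, labels) batch to the target device.
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device)

@torch.no_grad()
def evaluate(model, val_dl):
    model.eval()
    outs = [model.validation_step(to_device(batch, device)) for batch in val_dl]
    return {'loss': torch.stack([o[0] for o in outs]).mean().item(),
            'accuracy': torch.stack([o[1] for o in outs]).mean().item()}

def fit(model, train_dl, val_dl, epochs, max_lr):
    optimizer = torch.optim.Adam(model.parameters(), lr=max_lr)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=len(train_dl))
    history = []
    for epoch in range(epochs):
        model.train()
        for batch in train_dl:
            loss = model.training_step(to_device(batch, device))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()        # advance the learning-rate schedule each batch
        history.append(evaluate(model, val_dl))
    return history

@torch.no_grad()
def predict_dl(model, dl):
    model.eval()
    yp, yt = [], []
    for inputs, labels in dl:
        _, preds = torch.max(model(to_device(inputs, device)), 1)
        yp += preds.cpu().tolist()
        yt += labels.tolist()
    return yp, yt

The losses and accuracies plotted below would come from this training history; the article's own fit additionally records the learning rate at the end of each epoch.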

The model was trained on CPU for about 3 minutes, and the accuracy was 76.19%.

plot(losses, 'Losses')

After reaching its minimum, the validation loss gradually increases toward the end of training.

plot(accuracies, 'Accuracy')

The above is the accuracy curve.

plot(last_lr, 'Last Learning Rate')

The above is the curve of the learning rate at the end of each epoch.

2. Logistic regression using only MFCCs

This model uses only the Mel frequency cepstral coefficients (MFCCs). The feature is a NumPy array of size (20,), retrieved from the CSV file that contains all of the features above.
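A hedged sketch of pulling just these 20 columns back out of the feature CSV (the file name and column layout are assumptions based on the extraction code above, with no header row):

import pandas as pd
import torch

# Assumed layout: label in column 0, then the 173 features in concatenation
# order, so the MFCCs occupy columns 1-20.
df = pd.read_csv('/content/features.csv', header=None)
labels = torch.tensor(df.iloc[:, 0].values, dtype=torch.long)
mfccs = torch.tensor(df.iloc[:, 1:21].values, dtype=torch.float32)  # shape (N, 20)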

Model

The model consists of an input layer, hidden layers, and an output layer, built from fully connected layers with ReLU activations.

class SpokenDigitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(20, 512)
        self.l2 = nn.Linear(512, 1024)
        self.l3 = nn.Linear(1024, 64)
        self.l4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = self.l4(x)
        return x

    def training_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        return loss

    def validation_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        _, pred = torch.max(outputs, 1)
        accuracy = torch.tensor(torch.sum(pred == labels).item() / len(pred))
        return [loss.detach(), accuracy.detach()]

Training

model = to_device(SpokenDigitModel(), device)
history = []

evaluate(model, val_dl)

{'accuracy': 0.08834186941385269, 'loss': 8.290132522583008}

history.append(fit(model, train_dl, val_dl, 128, 0.001))

r = evaluate(model, val_dl)
yp, yt = predict_dl(model, val_dl)
print("Loss:", r['loss'], "\nAccuracy:", r['accuracy'], "\nF-score:", f1_score(yt, yp, average='micro'))

Loss: 0.29120033979415894
Accuracy: 0.95561796477307
F-score: 0.9556213017751479

The model is trained on CPU for about 10 minutes, and the accuracy is 95.56%.

MFCCs are based on the Mel scale, in which frequencies are grouped according to the human auditory response rather than on a linear scale. The human ear is a well-proven speech recognition system, so the Mel scale gives good results.
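For intuition (this snippet is not from the article), librosa's hz_to_mel shows how the Mel scale compresses high frequencies:

import librosa
import numpy as np

# Above roughly 1 kHz, doubling the frequency in Hz adds only a roughly
# constant amount on the Mel axis, mirroring the ear's coarser resolution
# at high frequencies.
hz = np.array([100, 200, 400, 800, 1600, 3200, 6400])
print(np.round(librosa.hz_to_mel(hz), 1))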

On the other hand, MFCCs are easily affected by background noise, so they work best on clean speech data (no noise or minimal noise).

plot(losses, 'Losses')

The above is the validation loss curve.

plot(accuracies, 'Accuracy')

The above is the validation accuracy curve.

plot(last_lr, 'Last Learning Rate')

Above is the curve of the final learning rate of each epoch.

3. CNN using Mel spectrogram images

Features

The model uses Mel spectrograms. A Mel spectrogram is a spectrogram whose frequency axis is converted to the Mel scale. The spectrogram images are extracted from the recordings and stored on the drive, which took more than 4.5 hours.

import librosa.display
import matplotlib.pyplot as plt

def extract_mel(f, label):
    data, sr = librosa.load('/content/data/' + label + '/' + f)
    fig = plt.figure(figsize=[1, 1])
    ax = fig.add_subplot(111)
    ax.axes.get_xaxis().set_visible(False)
    ax.axes.get_yaxis().set_visible(False)
    ax.set_frame_on(False)
    S = librosa.feature.melspectrogram(y=data, sr=sr)
    librosa.display.specshow(librosa.power_to_db(S, ref=np.max), fmin=50, fmax=280)
    file = '/content/drive/My Drive/Dataset/spokendigit/' + label + '/' + str(f[:-4]) + '.jpg'
    plt.savefig(file, dpi=500, bbox_inches='tight', pad_inches=0)
    plt.close()

Model

class SpokenDigitModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, 10),
            nn.Sigmoid())

    def forward(self, x):
        return self.network(x)

    def training_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        return loss

    def validation_step(self, batch):
        inputs, labels = batch
        outputs = self(inputs)
        loss = F.cross_entropy(outputs, labels)
        _, pred = torch.max(outputs, 1)
        accuracy = torch.tensor(torch.sum(pred == labels).item() / len(pred))
        return [loss.detach(), accuracy.detach()]

Training

model = to_device(SpokenDigitModel(), device)
history = []

evaluate(model, val_dl)

{'accuracy': 0.09851787239313126, 'loss': 2.3029427528381348}

history.append(fit(model, train_dl, val_dl, 128, 0.001))

r = evaluate(model, val_dl)
yp, yt = predict_dl(model, val_dl)
print("Loss:", r['loss'], "\nAccuracy:", r['accuracy'], "\nF-score:", f1_score(yt, yp, average='micro'))

Loss: 1.492598056793213
Accuracy: 0.9581243991851807
F-score: 0.9573119188503804
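The training code above assumes train_dl and val_dl built from the saved spectrogram images; a minimal sketch of that pipeline with torchvision (the image size, batch size, and split ratio are assumptions):

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# Load the per-digit folders of spectrogram JPEGs saved by extract_mel.
tfms = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
dataset = datasets.ImageFolder('/content/drive/My Drive/Dataset/spokendigit/', transform=tfms)

val_size = int(0.2 * len(dataset))
train_ds, val_ds = random_split(dataset, [len(dataset) - val_size, val_size])
train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
val_dl = DataLoader(val_ds, batch_size=64)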

The model was trained on Colab GPU for about 5 hours, and the accuracy was 95.81%.

The high accuracy can be attributed again to the Mel scale.

plot(losses, 'Losses')

The above is the validation loss curve.

plot(accuracies, 'Accuracy')

Above is the validation accuracy curve.

plot(last_lr, 'Last Learning Rate')

Above is the curve of the final learning rate of each epoch.

That's all for "how to use Logistic regression". Thank you for reading. If you want to learn more, keep following the site for more practical articles.
