How to use TensorFlow for music genre classification
How can you use TensorFlow to classify music by genre? Many people without prior experience are unsure how to approach this, so this article walks through the problem and a complete solution. I hope it helps you solve the problem yourself.
Data source
Predicting the genre of an audio sample is a supervised learning problem: we need data containing labeled examples. FreeMusicArchive is a library of audio clips with associated tags and metadata, originally collected for a paper at the International Society for Music Information Retrieval (ISMIR) conference in 2017.
We focus our analysis on a small portion of the data provided: 8,000 audio clips, each 30 seconds long, divided among eight genres:
Hip-Hop
Pop
Folk
Experimental
Rock
International
Electronic
Instrumental
Each genre has 1,000 representative audio clips. The sampling rate is 44,100 Hz, which means each 30-second clip contains more than 1.3 million data points, or more than 10 billion data points in total. Using all of this data in a classifier is a challenge, which we will discuss in more detail in the following sections.
For instructions on how to download the data, see the README file included in the repository. We are very grateful to Michaël Defferrard, Kirell Benzi, Pierre Vandergheynst, and Xavier Bresson for putting this data together and providing it for free, though we can only imagine the insights a dataset the size of Spotify's or Pandora Radio's would provide. With this data, we can describe various models to perform the task at hand.
Model description
I will keep theoretical details to a minimum, but will link to relevant resources wherever possible. In addition, our report contains much more information than I can include here, especially about feature engineering.
Standard machine learning
We use logistic regression, k-nearest neighbors (kNN), Gaussian naive Bayes, and support vector machines (SVM). A minimal sketch of these baselines follows the descriptions below.
A support vector machine finds the best decision boundary by maximizing the margin over the training data; the kernel trick defines nonlinear boundaries by projecting the data into a high-dimensional space.
kNN assigns a label based on a majority vote among the k nearest training samples.
Naive Bayes predicts the probability of each class from the features; a conditional independence assumption greatly simplifies the calculation.
Logistic regression also predicts class probabilities, by modeling the probability directly with the logistic function.
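To make the baselines concrete, here is a minimal scikit-learn sketch of the four classifiers. The random features and labels are placeholders standing in for the real extracted audio features, and the hyperparameters are illustrative, not the tuned values from our report.
# A minimal sketch of the four baseline classifiers with scikit-learn.
# The random features and labels below are placeholders for the real
# extracted audio features; hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(800, 40))    # placeholder feature matrix
y = rng.integers(0, 8, size=800)  # placeholder labels for the 8 genres

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Gaussian naive Bayes": GaussianNB(),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))  # accuracy on the held-out 20%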
Deep learning
For deep learning, we use the TensorFlow framework, and we build different models depending on the type of input. For raw audio, each example is a 30-second audio sample, or about 1.3 million data points. These floating-point values (positive or negative) represent the displacement of the sound wave over time. To manage computing resources, less than 1% of the data is used. With these features and the associated labels (one-hot encoded), we can build a convolutional neural network. The overall architecture is as follows, with a minimal sketch after the list:
A one-dimensional convolution layer, in which filters combine information from adjacent data points
A max-pooling layer, which condenses the information from the convolution layer
A fully connected layer, which creates a linear combination of the extracted convolutional features and performs the final classification
A dropout layer, which helps the model generalize to unseen data
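Here is a minimal Keras sketch of that architecture. The filter counts, kernel sizes, and dense-layer width are illustrative assumptions, not the tuned settings from the report.
# A minimal Keras sketch of the 1D CNN described above; filter counts and
# kernel sizes are illustrative assumptions, not the report's tuned values.
import tensorflow as tf
from tensorflow import keras

window_size = 10000  # raw audio samples per example (set later in the article)
N_CLASSES = 8

model_1d = keras.Sequential([
    keras.Input(shape=(window_size, 1)),
    keras.layers.Conv1D(16, kernel_size=9, activation="relu"),  # filters combine adjacent samples
    keras.layers.MaxPooling1D(pool_size=4),                     # condense the convolution output
    keras.layers.Conv1D(32, kernel_size=9, activation="relu"),
    keras.layers.MaxPooling1D(pool_size=4),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),            # linear combination of extracted features
    keras.layers.Dropout(0.5),                            # helps generalization to unseen data
    keras.layers.Dense(N_CLASSES, activation="softmax"),  # final 8-way classification
])
model_1d.compile(optimizer="adam",
                 loss="categorical_crossentropy",  # labels are one-hot encoded
                 metrics=["accuracy"])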
The spectrogram, on the other hand, is a visual representation of the audio sample. This suggests treating the training data as images and making use of pre-trained models through transfer learning. For each example, we can form a matrix of Mel spectrogram values. If we size it correctly, this matrix can be represented as a 224x224x3 image, which is exactly the input size expected by MobileNetV2, a model with excellent performance on image classification tasks. The idea of transfer learning is to use the base layers of the pre-trained model to extract features, and to replace the last layer with a custom classifier (a dense layer in our case). This works because the base layers usually generalize well to all images, even ones they were not trained on. A sketch of this setup follows.
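Here is a minimal sketch of the transfer-learning model: a frozen MobileNetV2 base extracts features from the 224x224x3 spectrogram "images", and a custom dense layer performs the classification. The pooling head and optimizer are assumptions for illustration.
# A minimal sketch of the transfer-learning model: a frozen MobileNetV2 base
# plus a custom dense classifier; the pooling head is an assumption.
import tensorflow as tf
from tensorflow import keras

DATA_SIZE = (224, 224, 3)
N_CLASSES = 8

base = keras.applications.MobileNetV2(input_shape=DATA_SIZE,
                                      include_top=False,  # drop the ImageNet classifier
                                      weights="imagenet")
base.trainable = False  # keep the pre-trained base layers fixed

model_2d = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),                # pool base features per example
    keras.layers.Dense(N_CLASSES, activation="softmax"),  # custom classification layer
])
model_2d.compile(optimizer="adam",
                 loss="categorical_crossentropy",
                 metrics=["accuracy"])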
Model results
We evaluate model performance on a held-out test set of 20% of the data. The results can be summarized as follows:
The convolutional neural network using transfer learning on the spectrograms performs best, although the SVM and Gaussian naive Bayes come close (which is interesting, considering the simplifying assumptions of the latter). We describe the best hyperparameters and model architectures in the report.
Our analysis of the training and validation curves highlights a problem of overfitting (most of our models produced similar curves), and plotting these curves made the problem easy to identify. We have designed some remedies that could be implemented in future iterations of this project:
Reduce the dimensionality of the data: techniques such as PCA can combine the extracted features and limit the size of the feature vector for each example (see the sketch after this list)
Increase the size of the training data: the data source provides larger subsets. We limited our exploration to less than 10% of the full dataset; with more computing resources, or a successful reduction in the dimensionality of the data, we could consider using the complete dataset. This would most likely allow our methods to isolate more patterns and greatly improve performance
Make more use of the provided features: FreeMusicArchive includes a range of pre-computed features. When we used these features instead of our own, we did see an improvement in performance, which leads us to believe that domain knowledge and an extended feature set could achieve better results
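As an illustration of the first point, here is a minimal PCA sketch with scikit-learn; the array shapes are placeholders, not our actual feature dimensions.
# A minimal PCA sketch for the dimensionality-reduction idea above;
# shapes are placeholders, not our actual feature dimensions.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(800, 5000))  # placeholder: 800 examples, 5000 features

pca = PCA(n_components=200)              # keep the top 200 principal components
reduced = pca.fit_transform(features)
print(reduced.shape)                        # (800, 200)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained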
TensorFlow implementation
TensorFlow is a very powerful tool for building neural networks at scale, especially when combined with Google Colab's free GPU/TPU runtimes. A main lesson of this project is the importance of identifying bottlenecks: my initial implementation was slow even when using a GPU. I found that the problem was the I/O process (reading data from disk, which is very slow) rather than the training process. Using the TFRecord format speeds this up through parallelization, which makes model training and development faster; a sketch of the pattern follows.
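Here is a minimal tf.data sketch of that pattern; the feature schema and file pattern are assumptions for illustration, not the project's exact record layout.
# A minimal tf.data sketch of the TFRecord pattern described above; the
# feature schema and the bucket path are assumptions for illustration.
import tensorflow as tf

AUTO = tf.data.experimental.AUTOTUNE

def parse_example(serialized):
    # assumed schema: a fixed-length window of raw audio plus a one-hot label
    parsed = tf.io.parse_single_example(serialized, {
        "audio": tf.io.FixedLenFeature([10000], tf.float32),
        "label": tf.io.FixedLenFeature([8], tf.int64),
    })
    return parsed["audio"], parsed["label"]

filenames = tf.io.gfile.glob("gs://my-bucket/tfrecords-wav-1D/songs*")  # hypothetical bucket
dataset = (tf.data.TFRecordDataset(filenames, num_parallel_reads=AUTO)  # read shards in parallel
           .map(parse_example, num_parallel_calls=AUTO)                 # parse in parallel
           .batch(32)
           .prefetch(AUTO))                                             # overlap I/O with training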
Before getting started, an important note: although all the songs in the dataset are in MP3 format, I converted them to WAV files because TensorFlow has better built-in support for WAV (a conversion sketch follows). Please refer to the repository on GitHub to see all the code related to this project. The code also assumes that you have a Google Cloud Storage bucket containing all the WAV files, a Google Drive holding the metadata, and that you are using Google Colab. However, it should be relatively easy to adapt the code to another system (cloud-based or local).
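For reference, here is a minimal sketch of the MP3-to-WAV conversion using librosa (which needs ffmpeg or audioread available to decode MP3) and scipy; the file paths are hypothetical.
# A minimal sketch of the MP3-to-WAV conversion; librosa needs ffmpeg (or
# audioread) to decode MP3, and the paths below are hypothetical.
import librosa
from scipy.io import wavfile

def mp3_to_wav(mp3_path, wav_path, sr=44100):
    samples, _ = librosa.load(mp3_path, sr=sr)  # decode the MP3 to a float waveform
    wavfile.write(wav_path, sr, samples)        # write the waveform as a WAV file

mp3_to_wav("fma_small/000/000002.mp3", "fma_small_wav/000/000002.wav")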
Initial setup
This project requires a large number of libraries. The requirements.txt file in the repository handles the installation for you, but you can also find the detailed list below.
# import libraries
import pandas as pd
import tensorflow as tf
from IPython.display import Audio
import os
import matplotlib.pyplot as plt
import numpy as np
import math
import sys
from datetime import datetime
import pickle
import librosa
import ast
import scipy
import librosa.display
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras
from google.colab import files

keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)
The first step is to mount the drive (where the data has been uploaded) and authenticate against the GCS bucket where the audio files are stored. Technically, the metadata could also be uploaded to GCS, which would make mounting the drive unnecessary, but this is how my own project was built.
# mount the drive
# adapted from https://colab.sandbox.google.com/notebooks/io.ipynb#scrollTo=S7c8WYyQdh6i
from google.colab import drive
drive.mount('/content/drive')

# load the metadata to Colab from Drive; this will greatly speed up the I/O process
zip_path_metadata = "/content/drive/My Drive/master_degree/machine_learning/Project/fma_metadata.zip"
!cp "{zip_path_metadata}" .
!unzip -q fma_metadata.zip
!rm fma_metadata.zip

# authenticate for GCS access
if 'google.colab' in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
We also store some variables for future use:
# set some variables for creating the dataset
AUTO = tf.data.experimental.AUTOTUNE  # used in tf.data.Dataset API
GCS_PATTERN = 'gs://music-genre-classification-project-isye6740/fma_small_wav/*/*.wav'
GCS_OUTPUT_1D = 'gs://music-genre-classification-project-isye6740/tfrecords-wav-1D/songs'  # prefix for output file names, first type of model
GCS_OUTPUT_2D = 'gs://music-genre-classification-project-isye6740/tfrecords-wav-2D/songs'  # prefix for output file names, second type of model
GCS_OUTPUT_FEATURES = 'gs://music-genre-classification-project-isye6740/tfrecords-features/songs'  # prefix for output file names, models built with extracted features
SHARDS = 16
window_size = 10000  # number of raw audio samples
length_size_2d = 50176  # number of data points to form the Mel spectrogram
feature_size = 85210  # size of the feature vector
N_CLASSES = 8
DATA_SIZE = (224, 224, 3)  # required data size for transfer learning
Create TensorFlow dataset
The next step is to set up the functions needed to read in the data. I did not write this code myself; I adapted it from the FreeMusicArchive repository. This section will likely change in your own project, depending on the dataset you use.
# function to load metadata
# adapted from https://github.com/mdeff/fma/blob/master/utils.py
def metadata_load(filepath):
    filename = os.path.basename(filepath)

    if 'features' in filename:
        return pd.read_csv(filepath, index_col=0, header=[0, 1, 2])

    if 'echonest' in filename:
        return pd.read_csv(filepath, index_col=0, header=[0, 1, 2])

    if 'genres' in filename:
        return pd.read_csv(filepath, index_col=0)

    if 'tracks' in filename:
        tracks = pd.read_csv(filepath, index_col=0, header=[0, 1])

        COLUMNS = [('track', 'tags'), ('album', 'tags'), ('artist', 'tags'),
                   ('track', 'genres'), ('track', 'genres_all')]
        for column in COLUMNS:
            tracks[column] = tracks[column].map(ast.literal_eval)

        COLUMNS = [('track', 'date_created'), ('track', 'date_recorded'),
                   ('album', 'date_created'), ('album', 'date_released'),
                   ('artist', 'date_created'), ('artist', 'active_year_begin'),
                   ('artist', 'active_year_end')]
        for column in COLUMNS:
            tracks[column] = pd.to_datetime(tracks[column])

        SUBSETS = ('small', 'medium', 'large')
        try:
            tracks['set', 'subset'] = tracks['set', 'subset'].astype(
                'category', categories=SUBSETS, ordered=True)
        except (ValueError, TypeError):
            # the categories and ordered arguments were removed in pandas 0.25
            tracks['set', 'subset'] = tracks['set', 'subset'].astype(
                pd.CategoricalDtype(categories=SUBSETS, ordered=True))

        COLUMNS = [('track', 'genre_top'), ('track', 'license'),
                   ('album', 'type'), ('album', 'information'),
                   ('artist', 'bio')]
        for column in COLUMNS:
            tracks[column] = tracks[column].astype('category')

        return tracks
# function to get genre information for each track ID
def track_genre_information(GENRE_PATH, TRACKS_PATH, subset):
    """
    GENRE_PATH (str): path to the csv with the genre metadata
    TRACKS_PATH (str): path to the csv with the track metadata
    subset (str): the subset of the data desired
    """
    # get the genre information
    genres = pd.read_csv(GENRE_PATH)
    # load metadata on all the tracks
    tracks = metadata_load(TRACKS_PATH)
    # focus on the specific subset of tracks; the ordered categorical makes
    # "<= subset" select that subset and the smaller ones
    subset_tracks = tracks[tracks['set', 'subset'] <= subset]
    # assumed completion: return the filtered tracks
    return subset_tracks
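As a usage example, here is a hypothetical call on the small subset, assuming the metadata files unzipped earlier:
# hypothetical usage, assuming the metadata files unzipped earlier
GENRE_PATH = "fma_metadata/genres.csv"
TRACKS_PATH = "fma_metadata/tracks.csv"
small_tracks = track_genre_information(GENRE_PATH, TRACKS_PATH, "small")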