The text definition of fundamental frequency and how to extract the fundamental frequency of wav file with librosa 02/14 Update SLTechnology News&Howtos


2026-02-14 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article covers the definition of the fundamental frequency and how to extract the fundamental frequency of a wav file with librosa. Many readers may not be familiar with the topic, so the main points are summarized below.

1. Pitch and fundamental frequency

2. Fundamental frequency definition

3. Harmonics

In this article, overtones and harmonics are treated as the same thing (overtone == harmonic).

On this basis, timbre and content can be decoupled.

The positions of the first formant, the second formant, and so on, and especially their relative positions, determine the pronunciation content (TODO: needs to be discussed). Different vowels resonantly enhance overtones at different multiples of the fundamental, which also shows up as relative differences in energy.

The height of the fundamental frequency and the absolute height of the formants are also related to the pronunciation to some extent, but the speaker's average fundamental frequency has to be subtracted first.

The absolute height of a person's fundamental frequency, formants, and so on is related to timbre, for example the differences between genders and the differences in F0 range.

The relative positions of the formants mainly carry the pronunciation content. For the same content, however, each person has their own pronunciation habits and oral structure, so the relative positions also carry timbre information as a secondary signal. This is closer to speaker identity, the kind of feature used in ASV.

Qualities such as "thick, bright, sharp, beautiful" can also be counted as timbre, but the same person can imitate them, for example one narrator voicing several roles in an audiobook, or a singer. How energy is distributed across the formant frequencies changes how the voice is heard. This differs a lot between people, which is also something ASV can rely on.

Therefore, (1) if the energy in each frequency band of the mel spectrum is normalized independently with norm(0, 1) for each person, the individual voice characteristics can be removed to some extent while the pronunciation information is retained (this needs careful thought, since it involves both the energy and its position). This process is equivalent to removing the "per-person weighting" that a speaker's oral features apply to a particular pronunciation.
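A minimal sketch of idea (1), assuming that norm(0, 1) means standardizing each mel band to zero mean and unit variance over all of one speaker's frames. The function name `normalize_per_bin` and the synthetic "speakers" are made up for illustration; real input would be a log-mel spectrogram of shape (n_mels, n_frames).

```python
import numpy as np

def normalize_per_bin(mel_db, eps=1e-8):
    """Standardize each mel band over time for one speaker.

    mel_db: array of shape (n_mels, n_frames), assumed to hold
    all frames of a single speaker (hypothetical input).
    """
    mean = mel_db.mean(axis=1, keepdims=True)  # per-band average energy
    std = mel_db.std(axis=1, keepdims=True)    # per-band energy spread
    return (mel_db - mean) / (std + eps)

# Synthetic stand-in: the same "content" coloured by two different
# per-band offsets and gains, playing the role of two speakers.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 200))
speaker_a = content + np.array([[0.0], [3.0], [-2.0], [1.0]])
speaker_b = 2.0 * content + np.array([[5.0], [-1.0], [0.5], [2.0]])

norm_a = normalize_per_bin(speaker_a)
norm_b = normalize_per_bin(speaker_b)

# After normalization the per-band offset and gain are gone,
# so the two versions coincide up to the small eps term.
print(np.abs(norm_a - norm_b).max())
```

This only removes a fixed per-band offset and gain. Whether that is enough to strip real speaker characteristics is exactly the open question raised in the paragraph above.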

Then, (2) if f0 and the formants can be extracted at each moment and the formant positions are normalized by the position of f0, the timbre can be removed to some extent while the pronunciation information is retained. This process is equivalent to removing the "per-person emphasis" that a speaker's vocal cords give to a particular pronunciation.
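Idea (2) could look roughly like the sketch below, assuming that "normalize with the position of f0" means expressing each formant as a ratio to f0 (or, equivalently, as a semitone offset from f0). The helper `formants_relative_to_f0` is hypothetical, and the numbers are invented so that the two frames differ only by an octave shift; real formants do not scale this neatly with f0.

```python
import numpy as np

def formants_relative_to_f0(f0_hz, formants_hz):
    """Express formant positions relative to F0.

    f0_hz:       shape (n_frames,)
    formants_hz: shape (n_frames, n_formants)
    Returns (ratio, semitones), both of shape (n_frames, n_formants).
    """
    f0 = np.asarray(f0_hz, dtype=float)[:, None]
    formants = np.asarray(formants_hz, dtype=float)
    ratio = formants / f0                 # formant as a multiple of F0
    semitones = 12.0 * np.log2(ratio)     # same information on a log scale
    return ratio, semitones

# Invented values: frame 2 is frame 1 shifted up by one octave.
f0 = [110.0, 220.0]
formants = [[770.0, 1210.0],
            [1540.0, 2420.0]]

ratio, semitones = formants_relative_to_f0(f0, formants)
print(ratio)  # both rows are [7, 11]: the absolute pitch has been divided out
```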

A question: when singing in chorus, everyone hums "hum" to the same tune. This case can be used for the analysis.

(Timbre: differences in timbre arise because different kinds of vibration distribute the total energy differently across the overtones of different orders.)

Answer:

Having everyone hum the same syllable is like "reducing" people to instruments. By analogy with the violin, everyone is a violin, but men and women, tenors and basses, are naturally different violins. Even on the same syllable, the natural distribution of overtone energy differs between them, but the gap is small compared with humming different syllables.

One more question: when a boy and a girl both say "ba", are the fundamental frequency and the formants the same? Is normalization needed because the structure of the vocal cords and the strength of the chest muscles differ from person to person?

What a horizontal line in the picture means: a horizontal line indicates that the singer held a certain pitch for a period of time. The brighter the line, the larger the amplitude and the louder the sound.

A straight line is a long, sustained tone. The main thing to look at in a sustained tone is its stability: the straighter the whole line, the more stable the tone. A large shake means the voice cracked, and a small shake means the tone is unstable. If the line is crooked, the singer is unsure of the pitch (TODO: needs to be discussed).

A wavy line is vibrato. The bigger the wave, the wider the vibrato. Stability also shows in vibrato: if the waves are uneven, something is wrong with the breath and it is not well controlled.
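The "how straight is the line" judgement can be put into numbers. The sketch below is one possible way, not an established metric: it converts an F0 contour to cents around its median and reports the spread. The function names and the synthetic contours (a flat 440 Hz tone and a 5.5 Hz vibrato of plus or minus 50 cents) are assumptions made for illustration.

```python
import numpy as np

def pitch_deviation_cents(f0_hz):
    """Deviation of each voiced frame from the median pitch, in cents."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[~np.isnan(f0)]  # skip frames marked as NaN (unvoiced)
    return 1200.0 * np.log2(voiced / np.median(voiced))

def stability_report(f0_hz):
    """Summarize how far the contour strays from a straight line."""
    cents = pitch_deviation_cents(f0_hz)
    return {
        "std_cents": float(cents.std()),
        "peak_to_peak_cents": float(cents.max() - cents.min()),
    }

# Synthetic contours: 2 seconds at 100 frames per second.
t = np.arange(200) / 100.0
straight = np.full_like(t, 440.0)
vibrato = 440.0 * 2.0 ** (50.0 * np.sin(2 * np.pi * 5.5 * t) / 1200.0)

straight_report = stability_report(straight)
vibrato_report = stability_report(vibrato)
print(straight_report)
print(vibrato_report)
```

A straight tone gives values near zero, while a wider vibrato gives a larger peak-to-peak value. Telling an even vibrato from an uneven one would need a further step, such as looking at how regular the oscillation is over time.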

3.1. How to observe and evaluate overtone / harmonics

Judging the overtones mainly comes down to three points: how rich they are, how they are distributed, and how loud they are. The overtones are mainly compared against the fundamental frequency. Because the volume of an audio file can be adjusted, the absolute size of an overtone means little; it is more meaningful to take the fundamental frequency as the reference. (The resonance at low frequencies depends on the vowel, and each vowel has its own resonance characteristics. Generally speaking, the fundamental frequency is rarely enhanced by resonance.)

In this example, the first overtone is huge (as you can see, the first overtone is usually the biggest), and the three overtones around 3000 Hz are also very strong.

Another way to observe is to compare the voice with the accompaniment. The more clearly the voice stands out, the stronger it is and the better it cuts through the accompaniment. The original author gave an example after this, which is skipped here.
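Taking the fundamental as the reference can be sketched as follows. This is an illustrative approach using a plain FFT, assuming F0 is already known and roughly constant; `harmonic_levels_db`, the 20 Hz search tolerance, and the synthetic tone (whose second component is deliberately the strongest) are all assumptions, not part of the original article.

```python
import numpy as np

def harmonic_levels_db(y, sr, f0_hz, n_harmonics=5, tol_hz=20.0):
    """Level of each component at k * f0, in dB relative to the fundamental."""
    window = np.hanning(len(y))
    spectrum = np.abs(np.fft.rfft(y * window))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    peaks = []
    for k in range(1, n_harmonics + 1):
        # Strongest bin within tol_hz of the expected frequency k * f0.
        band = (freqs > k * f0_hz - tol_hz) & (freqs < k * f0_hz + tol_hz)
        peaks.append(spectrum[band].max())
    peaks = np.array(peaks)
    return 20.0 * np.log10(peaks / peaks[0])

# Synthetic 1-second tone at 200 Hz with components at 200, 400, 600, 800 Hz.
sr = 16000
t = np.arange(sr) / sr
f0 = 200.0
amps = [1.0, 2.0, 0.5, 0.25]
y = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))

levels = harmonic_levels_db(y, sr, f0, n_harmonics=4)
louder = harmonic_levels_db(3.0 * y, sr, f0, n_harmonics=4)
print(np.round(levels, 1))
print(np.round(louder, 1))
```

Scaling the whole signal leaves the relative levels unchanged, which is why the comparison against the fundamental is more informative than the raw overtone size.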

4. Overview of librosa

5. Extracting F0 with librosa

https://librosa.org/doc/main/generated/librosa.pyin.html

    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    # Load librosa's bundled trumpet example recording.
    y, sr = librosa.load(librosa.ex('trumpet'))

    # Estimate F0 with pYIN, searching between the notes C2 and C7.
    f0, voiced_flag, voiced_probs = librosa.pyin(y,
                                                 fmin=librosa.note_to_hz('C2'),
                                                 fmax=librosa.note_to_hz('C7'))
    times = librosa.times_like(f0)

    # Plot the F0 track on top of a log-frequency spectrogram.
    D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    fig, ax = plt.subplots()
    img = librosa.display.specshow(D, x_axis='time', y_axis='log', ax=ax)
    ax.set(title='pYIN fundamental frequency estimation')
    fig.colorbar(img, ax=ax, format="%+2.f dB")
    ax.plot(times, f0, label='f0', color='cyan', linewidth=3)
    ax.legend(loc='upper right')
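With its default settings, librosa.pyin fills unvoiced frames of f0 with NaN, so the track usually needs a little cleanup before it can be summarized. The sketch below shows one way to do that. `summarize_f0` is a hypothetical helper, and the short arrays stand in for the real `f0` and `voiced_flag` so the snippet runs without audio.

```python
import numpy as np

def summarize_f0(f0, voiced_flag):
    """Basic statistics over the voiced frames of a pYIN-style F0 track."""
    f0 = np.asarray(f0, dtype=float)
    voiced = np.asarray(voiced_flag, dtype=bool) & ~np.isnan(f0)
    if not voiced.any():
        nan = float("nan")
        return {"voiced_ratio": 0.0, "median_hz": nan, "min_hz": nan, "max_hz": nan}
    values = f0[voiced]
    return {
        "voiced_ratio": float(voiced.mean()),   # share of frames with a pitch
        "median_hz": float(np.median(values)),  # robust "typical" F0
        "min_hz": float(values.min()),
        "max_hz": float(values.max()),
    }

# Stand-in arrays shaped like the f0 / voiced_flag outputs of pYIN.
f0_demo = np.array([np.nan, 220.0, 221.0, 219.0, np.nan, 440.0])
voiced_demo = np.array([False, True, True, True, False, True])

summary = summarize_f0(f0_demo, voiced_demo)
print(summary)
```

The median is used here rather than the mean because octave errors, such as the 440 Hz frame in the stand-in data, pull the mean much more than the median.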

