2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)11/24 Report--
Beyond communication, recognition and warning, the human auditory system has also evolved an advanced capacity that may be unique to humans: music, including solo and choral singing and instrumental solos and ensembles.
Among these, singing is both the easiest and the hardest "musical instrument". It is the easiest because anyone can sing. It is the hardest because a good singer leaves "a sound that lingers around the rafters for three days", while a poor one may be "hoarse and jarring, painful to hear".
The difference between singing and speaking
When people speak, the vocal cords vibrate to produce the voice, and both tone and frequency stay in the most natural vocal range. Occasionally some people use abdominal breathing to give the voice more body and reduce vocal cord fatigue. Even when mood swings affect the voice, it generally does not change much. Singing demands more skill and differs from speaking in several significant ways.
First, the range in singing is far wider. For example, the Russian singer Vitas can sing four octaves from his lowest note to his highest, which is very impressive. But I can do it too: do-re-mi-fa-sol-la-ti-do, repeated five times, makes five octaves in one breath.
Second, singing and speaking use the resonant cavities very differently. Head resonance in singing, for example, can be placed at the nose or at the back of the head, and the two placements give very different timbres, so the choice depends on the style of the song. Bel canto, the style most listeners have heard, tends to bring head-cavity sound, bodily sensation and diction together.
If you watch pop stars closely, some frown and wrinkle their noses on high notes; they are actually searching for the position where the high notes resonate.
Head resonance alone is not enough to give a song body, because the sound would be relatively thin; chest resonance is needed to strengthen the middle and low registers.
To extend your range upward, you can also learn mask singing, pharyngeal voice and closed (covered) singing. For very low notes, as in khoomei (throat singing), the breath is driven to vibrate near the vocal cords.
Image source: Pexels
Third, breath control widens the gap between speaking and singing further. Some lyric lines are very long, and the shallow chest breathing used in everyday speech often cannot keep the melody stable and continuous, so combined chest-abdominal breathing and more complex breathing techniques are needed.
Fourth, whereas speech generally keeps a steady pace, the rhythm of a song is very rich: fast and slow passages can both appear in a single song.
Fifth, the understanding of the lyrics and emotional investment will also make a significant difference between singing and speaking.
Sixth, there is linking (liaison) between words. Chinese lyrics involve relatively little of it, but English lyrics are linked much more when sung.
People may also assume that someone who stutters cannot sing well. In fact, speaking and singing rely on different vocal mechanisms. To speak, you have to think about what you want to say, organize the language, and then say it. In singing, the intonation, tempo and pitch are usually already given, and the singer only has to reproduce them after repeated practice. People who stutter can therefore try to build confidence in fluent speech by learning to sing.
How to evaluate whether a song sounds good
Singing is one of the ways most people ease their mood: when they hear a song they like, they learn to sing it. But is it sung well? Many people cannot tell.
What kind of song counts as a good song? For judging whether music sounds good, there is a general rule related to frequency f.
This is the 1/f fluctuation principle, put forward by the well-known Japanese physicist Toshimitsu Musha in the article "Biological Information and 1/f Fluctuation", published in the Journal of the Applied Physics Society in 1965. Fluctuation refers to the random variation of a physical quantity around its macroscopic average, and the principle applies in many fields.
In music, 1/f means that a melody can be disordered locally while remaining correlated at the macroscopic scale, which listeners find comfortable and harmonious. Many lyrical songs on the market follow the 1/f fluctuation principle, which is why people like to listen to them.
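One way to make the 1/f idea concrete is to look at the power spectrum of a sequence. Under 1/f fluctuation, power is roughly proportional to 1/f, so log power plotted against log frequency falls near a line of slope -1, whereas white noise gives a slope near 0. The following is a minimal sketch under assumptions that are not in the original article: the "melody" is a synthetic sequence rather than a real song, and `fft`, `one_over_f_sequence` and `spectral_slope` are illustrative helper names.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugation identity."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in spectrum])]

def one_over_f_sequence(n, rng):
    """Synthetic sequence: white noise reshaped so power falls off as 1/f."""
    spectrum = fft([rng.gauss(0.0, 1.0) for _ in range(n)])
    spectrum[0] = 0j                      # drop the mean
    for k in range(1, n // 2 + 1):
        scale = 1.0 / math.sqrt(k)        # amplitude ~ f^(-1/2), so power ~ 1/f
        spectrum[k] *= scale
        if k != n - k:
            spectrum[n - k] *= scale      # mirror bin, keeps the signal real
    return [v.real for v in ifft(spectrum)]

def spectral_slope(x):
    """Least-squares slope of log power against log frequency."""
    n = len(x)
    mean = sum(x) / n
    spectrum = fft([v - mean for v in x])
    pts = [(math.log(k), math.log(abs(spectrum[k]) ** 2)) for k in range(1, n // 2)]
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    num = sum((a - mx) * (b - my) for a, b in pts)
    den = sum((a - mx) ** 2 for a, _ in pts)
    return num / den

rng = random.Random(0)
shaped = one_over_f_sequence(1024, rng)
white = [rng.gauss(0.0, 1.0) for _ in range(1024)]
print("1/f-shaped slope:", round(spectral_slope(shaped), 2))
print("white-noise slope:", round(spectral_slope(white), 2))
```

Applying `spectral_slope` to the pitch or loudness sequence of a real recording would be the analogous measurement, though extracting that sequence from audio is a separate step not shown here.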
Other styles, such as rock and rap, appeal for a different reason: their rhythm helps people vent and express their feelings.
There are also songs that depart from the 1/f fluctuation principle entirely, such as the experimental song "Fireworks" (the original is by Katy Perry), which is close to noise.
To help evaluate whether music sounds good, scientists have also proposed qualitative and quantitative psychoacoustic indicators. For example, composite indicators such as "annoyance" and "sensory pleasantness" are built from a combination of acoustic features such as roughness, sharpness, fluctuation strength and tonality.
Image source: Pexels
But whatever consensus exists, the perception of sound still rests on the subjective feelings of the individual, and what the public recognizes may not capture the aesthetic views of minorities.
In singing voices, some people like rough and low, some like clear as water, some like loud and bright, and some like soft and lilting.
In songs, some people like the quirky, some like the plain-spoken, some like catchy pop tunes, and some like highbrow art music.
Because musical styles are so diverse and individual tastes so varied, it is difficult to form a unified objective standard.
Applications related to songs and singing
Although analyzing songs and singing is clearly more complex and difficult than plain speech recognition, there are still some related applications in the field of artificial intelligence.
Here are a few of the more valuable ones. The first is humming recognition, a feature that most music platforms either offer or are trying to build. Its task is to identify likely tunes from the melody of a short fragment. The difficulty is twofold. First, not everyone can hum a melody accurately: most people search this way precisely because they cannot remember the title, or have only a distant memory of the tune. Second, the hum differs from the original recording in pitch, tone and articulation. The task of humming recognition is therefore to find a valid candidate set from imprecise humming.
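The idea of retrieving a candidate set from an imprecise hum can be sketched with a classic technique, dynamic time warping (DTW) over pitch sequences. This is only an illustration under stated assumptions, not how any particular platform works: melodies are assumed to be already transcribed to MIDI note numbers, the small `catalog` and the `hum` are made-up data, and `center`, `dtw_distance` and `rank_candidates` are hypothetical helper names. Subtracting the mean pitch tolerates humming in the wrong key, and DTW tolerates notes that are held too long or too short.

```python
def center(seq):
    """Remove the mean pitch so a transposed hum still lines up."""
    mean = sum(seq) / len(seq)
    return [p - mean for p in seq]

def dtw_distance(a, b):
    """Dynamic time warping cost between two pitch sequences."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Cheapest way to reach (x, y): stretch a, advance both, or stretch b.
            cur[j] = abs(x - y) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[-1]

def rank_candidates(hum, catalog, top_k=3):
    """Return the titles whose melodies are closest to the hum."""
    query = center(hum)
    scored = []
    for title, melody in catalog.items():
        cost = dtw_distance(query, center(melody))
        scored.append((cost / (len(query) + len(melody)), title))
    return [title for _, title in sorted(scored)[:top_k]]

# Made-up catalog: opening notes as MIDI note numbers (60 = middle C).
catalog = {
    "Twinkle Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}

# An imprecise hum: two semitones sharp, one extra held note, slightly off pitch.
hum = [62, 62, 62, 69, 69.4, 71, 71, 68.8]
print(rank_candidates(hum, catalog))
```

Returning a ranked list rather than a single answer matches the framing above: the goal is a useful candidate set, since the hum alone may not pin down one song.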
Besides humming recognition, another important application is automatic tuning. First, few people have absolute pitch, and even after professional training pitch can still be unstable. Second, most people have problems with pitch accuracy and stability, and yet many people like to sing. Automatic tuning therefore has a large market among both professional singers and amateurs. However, because musical styles are ever-changing and each person's recognizable, individual timbre has to be learned and enhanced, building an automatic tuner with artificial intelligence is clearly difficult.
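The simplest building block of pitch correction can be shown with a little arithmetic: find the nearest note of the equal-tempered scale and pull the sung frequency toward it. The sketch below assumes A4 = 440 Hz and twelve-tone equal temperament, and it only touches pitch; it does nothing about timbre or style, which is where the text above locates the real difficulty. The function names and the `strength` parameter are illustrative, not taken from any real tuning product.

```python
import math

A4_HZ = 440.0  # assumed reference pitch

def nearest_semitone_hz(freq_hz):
    """Frequency of the equal-tempered note closest to freq_hz."""
    semitones_from_a4 = round(12 * math.log2(freq_hz / A4_HZ))
    return A4_HZ * 2 ** (semitones_from_a4 / 12)

def cents_off(freq_hz):
    """Signed error in cents (100 cents = one semitone) from the nearest note."""
    return 1200 * math.log2(freq_hz / nearest_semitone_hz(freq_hz))

def retune(freq_hz, strength=1.0):
    """Pull a sung pitch toward the nearest note.

    strength=1.0 snaps fully onto the note; smaller values keep part of
    the singer's own deviation.
    """
    target = nearest_semitone_hz(freq_hz)
    return freq_hz * (target / freq_hz) ** strength

sung = 445.0  # a slightly sharp A4
print(round(cents_off(sung), 1), "cents sharp")
print(round(retune(sung, strength=0.5), 2), "Hz after half-strength correction")
```

A partial `strength` is one crude way to avoid flattening everything the singer does, but a real tuner would also have to track pitch over time and handle intentional slides and vibrato.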
In addition, separating music from the human voice is an extremely important research direction. Humans are remarkably capable here: they can easily pick out the voice they are attending to in a very noisy environment. In 1953, Cherry named this phenomenon, which arises from human auditory attention, the cocktail party effect.
Although the phenomenon was identified more than half a century ago, artificial intelligence still struggles to match human ability. The signal captured by a microphone is generally a one-dimensional audio signal mixed from multiple sound sources, so recovering the original sources is a one-to-many, ill-posed problem with no unique solution.
In fact, even humans lose the cocktail party effect when listening to a recording.
To make the problem tractable, work in artificial intelligence usually assumes that the sources are mutually independent and do not follow a Gaussian distribution, and that the recorded signal is a weighted combination of them. Separating the sources in this way is called blind source separation. The earlier approach was to use independent component analysis (ICA), or improved versions of it, from machine learning and pattern recognition, but this method converges slowly and has difficulty reaching a unique solution.
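The assumptions listed above (independent, non-Gaussian sources, linearly mixed) are enough to build a toy separator. The sketch below is a deliberately small stand-in for ICA, not a production algorithm: it assumes two microphones recording two sources (ICA needs at least as many recordings as sources, unlike the single-microphone case described earlier), whitens the two recordings, and then searches for the rotation that makes the outputs as non-Gaussian as possible, measured by excess kurtosis. The signals, the mixing weights and all function names are made up for illustration.

```python
import math
import random

def standardize(x):
    """Shift to zero mean and scale to unit variance."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]

def correlation(a, b):
    a, b = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(a, b)) / len(a)

def rotate(a, b, angle):
    c, s = math.cos(angle), math.sin(angle)
    return ([c * p + s * q for p, q in zip(a, b)],
            [-s * p + c * q for p, q in zip(a, b)])

def whiten(x1, x2):
    """Decorrelate two signals and give each unit variance."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    caa = sum(v * v for v in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(p * q for p, q in zip(a, b)) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cab, caa - cbb)
    y1, y2 = rotate(a, b, theta)
    return standardize(y1), standardize(y2)

def excess_kurtosis(z):
    """Zero for a Gaussian; z must have zero mean and unit variance."""
    return sum(v ** 4 for v in z) / len(z) - 3.0

def separate(x1, x2, steps=180):
    """Grid-search the rotation that maximizes total squared kurtosis."""
    y1, y2 = whiten(x1, x2)
    best = None
    for i in range(steps):
        z1, z2 = rotate(y1, y2, (math.pi / 2) * i / steps)
        score = excess_kurtosis(z1) ** 2 + excess_kurtosis(z2) ** 2
        if best is None or score > best[0]:
            best = (score, z1, z2)
    return best[1], best[2]

# Made-up sources: a pure tone and heavy-tailed (Laplace) noise.
rng = random.Random(0)
n = 2000
tone = [math.sin(0.05 * t) for t in range(n)]
voice = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]

# Two hypothetical microphones, each hearing a different weighted mix.
mic1 = [0.6 * a + 0.4 * b for a, b in zip(tone, voice)]
mic2 = [0.3 * a + 0.7 * b for a, b in zip(tone, voice)]

est1, est2 = separate(mic1, mic2)
for name, src in (("tone", tone), ("voice", voice)):
    match = max(abs(correlation(est1, src)), abs(correlation(est2, src)))
    print(name, "best |correlation| with an estimate:", round(match, 3))
```

The outputs come back in arbitrary order and with arbitrary sign and scale, which is one face of the non-uniqueness mentioned above; the comparison therefore uses the absolute correlation against whichever estimate matches best.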
Deep learning has made great progress in this direction. One example is the work published by Google Research in August 2018 in ACM Transactions on Graphics (ACM ToG), the top journal in computer graphics. Ephrat et al. combine audio and video, using two deep learning models to extract features from the video and the audio separately.
After the features are fused, a long short-term memory (LSTM) model, which accounts for variation over time, describes the temporal characteristics of the audio and video. Finally, separate decoders are used for each speaker to separate the audio and video. The model achieves the best results reported so far and is one step closer to simulating the human cocktail party effect. Two shortcomings remain. First, it needs video, so the face must appear in the frame to help locate the sound source, which is still much weaker than a person at a cocktail party who needs no visual help. Second, the study does not address the harder problem of separating singing from musical instruments.
Figure: (a) input video frames and audio; (b) processing idea: extract video and audio features separately, then perform audio-visual source separation; (c) output: clean audio for each speaker.
Of course, there are many other interesting AI-based applications of music analysis, such as computer-generated composition and lyrics, or designing singing robots like Luo Tianyi.
On the whole, though, lyrics and melodies written by human authors tend to be more coherent in artistic conception and more logical, while computer simulation can currently achieve only local approximation. Grasping the overall vision and overall emotion is still a long way off; at this stage, hybrid intelligence that keeps people in the loop may be a good approach.
Source: "The Agent Who Loves to Make Mistakes", slightly abridged. Author: Zhang Junping. Some images come from the internet; copyright belongs to the original authors.
★ About the book ★
"The Agent Who Loves to Make Mistakes", by Zhang Junping, Tsinghua University Press. The book gives a general introduction to frontier progress in artificial intelligence and, from the angle of making mistakes, explains in accessible terms the illusions and errors that arise in vision, hearing, language and other areas. It argues that understanding mistakes we pay little attention to is helpful for the research and development of intelligent agents. The book starts with an analysis of human perception, introducing the anatomy and basic principles of human vision, hearing, touch and somatosensation with vivid examples. It then enters the human emotional world, from emotions, memories and dreams to inspiration and delusions. The author moves effortlessly between life, computing, mathematics, physics and other disciplines, taking readers on a dizzying interdisciplinary tour of science.
★ About the author ★
Zhang Junping is a professor and doctoral supervisor at the School of Computer Science and Technology, Fudan University. His main research interests are artificial intelligence, machine learning, image processing, biometric authentication and intelligent transportation. He was a visiting scholar at the University of California, San Diego from September 2007 to March 2008, and a Research Associate at Pennsylvania State University from August 2014 to August 2015. He has presided over three National Natural Science Foundation projects, "863" projects and Pujiang Talent Program projects. He currently leads the 2018 Ministry of Science and Technology key project "Human-Robot Intelligent Fusion Technology" and a National Natural Science Foundation of China project. He is deputy director of the hybrid intelligence committee of the Chinese Association of Automation, a member of the artificial intelligence committee of the China Computer Federation, and a standing committee member of the machine learning committee of the Chinese Association for Artificial Intelligence. He has published more than 100 papers on artificial intelligence, in international journals including IEEE TPAMI, TNNLS, ToC, TAC, TITS and TVCG and at international conferences including ICML, AAAI and ECCV.
This article comes from the WeChat official account Origin Reading (ID: tupydread). Author: Zhang Junping. Editor: Zhang Runxin.