A video version of "Big Bang" is here: editing accurate to every word, Chinese supported, demo available to try. Lao Luo: pay up

Updated 2025-01-18. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)11/24 Report--

Video editing can now be accurate down to every single word!

Just click the words you want to delete or keep, and the AI will produce a new video within minutes.

The workflow is more than a little reminiscent of the "Big Bang" feature. Lao Luo: pay up!

Whether it is a music video, an interview, or a movie clip, and whether or not it has subtitles, any kind of video is handled easily.

Netizens have been trying it out one after another, for example by cutting the Rickroll video into a guichu-style remix.

It can even handle Chinese videos, although in the demo the transcript turned out to be translated directly into English.

This does not affect the editing itself, though; after all, the underlying model supports multiple languages, including Chinese.

Word-accurate video editing

Editing a video with word-level accuracy takes only three steps:

Upload the video, select the words to delete or keep, and download the result.

Three examples are provided: a cooking video, a Zach interview, and "Just Do It."

Of course, you can also try it with your own video, and a wide range of languages can be recognized. Take, for example, a classic dialogue from the film "Let the Bullets Fly."

Green marks words to keep and red marks words to delete. There are three options: cut the video, select all words, and reset.

After selecting the words to keep, click "Cut Video". Here, two non-adjacent lines were selected, and the edit took less than ten seconds.
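The selection step can be pictured as turning per-word keep/delete flags into time ranges to retain. The following is only a minimal sketch of that idea, not the demo's actual code: the `Word` class, the `keep_segments` function, and the sample timestamps are all hypothetical and assume each word already has a start and end time in seconds.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def keep_segments(words, keep):
    """Merge runs of consecutive kept words into (start, end) segments."""
    if len(words) != len(keep):
        raise ValueError("words and keep must have the same length")
    segments = []
    extend_previous = False  # True while we are inside a run of kept words
    for word, flag in zip(words, keep):
        if not flag:
            extend_previous = False
            continue
        if extend_previous:
            # Stretch the current segment to cover this word as well.
            start, _ = segments[-1]
            segments[-1] = (start, word.end)
        else:
            segments.append((word.start, word.end))
        extend_previous = True
    return segments


if __name__ == "__main__":
    # Made-up timestamps, for illustration only.
    words = [
        Word("let", 0.0, 0.4), Word("the", 0.4, 0.6),
        Word("bullets", 0.6, 1.1), Word("fly", 1.1, 1.5),
        Word("a", 1.5, 1.6), Word("while", 1.6, 2.0),
    ]
    keep = [True, True, False, False, True, True]
    print(keep_segments(words, keep))
```

Each resulting segment could then be handed to a video tool to cut and concatenate; how the demo performs that step is not described in the article.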

The transitions between cuts are very smooth, and the entire demo runs on a T4 GPU.

Based on the Whisper model

This is a new Whisper feature developed by Dutch developer Matthijs Hollemans on Hugging Face.

Whisper is a speech recognition neural network that OpenAI open-sourced in September 2022. Trained on 680,000 hours of multilingual, multitask supervised data collected from the web, it approaches human-level robustness and accuracy. It can transcribe speech in multiple languages and can also translate other languages into English.
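As a rough illustration of how word-level timestamps can be obtained from Whisper, the sketch below uses the Hugging Face `transformers` speech recognition pipeline with `return_timestamps="word"`. The checkpoint name `openai/whisper-base` and the helper names `transcribe_words` and `words_from_result` are assumptions made for this example; the article does not say which checkpoint or code the demo uses.

```python
def transcribe_words(audio_path, model_name="openai/whisper-base"):
    """Run Whisper via the transformers ASR pipeline with word timestamps.

    Assumes transformers and a backend such as PyTorch are installed;
    the model weights are downloaded on first use.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        chunk_length_s=30,  # process long audio in 30-second chunks
    )
    return asr(audio_path, return_timestamps="word")


def words_from_result(result):
    """Flatten the pipeline output into (text, start, end) tuples.

    The pipeline returns a dict with a "chunks" list; each chunk holds
    the word text and a (start, end) timestamp pair in seconds. The end
    time can be None if it could not be determined.
    """
    words = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        words.append((chunk["text"].strip(), start, end))
    return words
```

A typical call would be `words_from_result(transcribe_words("clip.wav"))`, where `clip.wav` is a placeholder file name.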

It is implemented as an end-to-end Transformer architecture and requires no fine-tuning. Input audio is split into 30-second chunks, converted to a log-Mel spectrogram, and passed to the encoder.
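The chunking arithmetic can be sketched in a few lines. The 30-second chunk length comes from the description above; the 16 kHz sample rate and the hop length of 160 samples are taken from OpenAI's reference implementation rather than from this article, and the function names are hypothetical.

```python
SAMPLE_RATE = 16_000   # samples per second (assumption: Whisper reference setting)
CHUNK_SECONDS = 30     # chunk length stated in the article
HOP_LENGTH = 160       # samples between spectrogram frames (assumption)


def chunk_bounds(num_samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Return (start, end) sample indices for consecutive fixed-length chunks."""
    chunk_len = sample_rate * chunk_seconds
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


def frames_per_chunk(sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS,
                     hop_length=HOP_LENGTH):
    """Number of spectrogram frames produced by one full chunk."""
    return sample_rate * chunk_seconds // hop_length


if __name__ == "__main__":
    # A 75-second clip splits into two full chunks and one 15-second remainder.
    print(chunk_bounds(75 * SAMPLE_RATE))
    print(frames_per_chunk())
```

In the reference implementation, a final chunk shorter than 30 seconds is padded to full length before the spectrogram is computed; this sketch only reports the boundaries.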

That's it. The demo is here for anyone interested:

https://huggingface.co/spaces/radames/whisper-word-level-trim

Reference link:

[1]https://openai.com/research/whisper

[2]https://twitter.com/mhollemans/status/1671812176842039296

This article comes from the WeChat official account Qubit (ID: QbitAI), by Yang Jing
