
Google's AI is devouring everything: an updated privacy policy lets it crawl all public content to train AI

2025-01-16 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

From now on, every word you say publicly online may be used by Google to train AI!

Yes, after artwork, written work will now be fed to large models too.

Technical blogs, code, papers, and every post you publish online can all be thrown into Google's model blender, even if they are copyrighted.

Just this week, Google updated its privacy policy, making it clear that it reserves the right to crawl all public content online to build its AI tools.

The reaction online was immediate. One commenter warned that "Google is grabbing everything":

Once Google can read what you write, it means it's all their "property".

Others take an even more pessimistic view:

Soon, all content producers will be AI.

So, what's going on with this privacy policy?

How public content ends up training AI products such as Bard starts with the privacy policy Google updated in recent days.

In its latest privacy policy, Google added a clause about AI models under "research and development":

Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public.

For example, we use publicly available information to help train Google's AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

In other words, any public information Google can collect may be used to train AI-related products or features such as Google Translate, Bard, and Cloud AI.

So, what exactly does this public information include?

It covers web and other activity information, including search terms, the apps and browsers used to interact with Google services, and the use of Google services on third-party websites and apps.

In other words, not only previously published content such as blogs, but also Google Docs made public online and posts containing personal information, may be collected by Google for large-model training.

Of course, for now this content is still limited to "public information".

Private services such as Google's Gmail should still not be crawled for this data.

Google has also made it clear in its privacy policy that such personal or public information can be used for other purposes, such as responding to security threats, content review, service maintenance, personalized advertising, or legal requirements.

But why is Google updating this policy now?

At a time when AI is challenging text copyright, the update may also have something to do with the "rate limits" imposed by companies such as Reddit and Twitter.

First, in April this year, Reddit announced that it would charge companies for access to its API.

The company's CEO said that Reddit's data is valuable and that the company does not want to give it away to big tech companies for free.

Later, Twitter began rate limiting its own platform, on the grounds that it did not want AI companies to take its data for free. An unverified user's daily viewing limit rises to 6,000 posts only after verification.

These policies have had a serious impact on users and third-party tools. Reddit's move triggered large-scale protests across its communities, with many moderators shutting down their own forums in protest. Many people on Twitter also denounced the limits, and some even said that "Twitter has been killed."

In any case, AI taking data for free has become a conflict that can no longer be ignored.

Some commenters questioned the objections to Google's AI crawling data:

Search engines and other internet services were crawling data long before this, so why are people resistant to "AI crawling"?

Others responded:

In essence, it is still a matter of copyright. Merely quoting copyrighted material does not necessarily infringe copyright, but if using AI to "blend and launder" copyrighted content is legalized, then copyright is essentially dead.

That is why this commenter is pessimistic about the matter:

Could you accept someone copying your blog without attribution, selling your open source code as a paid service, or using your Stack Overflow answers to answer other people's questions?

Everything I did before, I gave away for free. But if AI now wants me to disappear, then I will disappear.

Of course, some commenters have accepted the policy, while stressing that we still need to stay on guard ourselves:

Read the new policy carefully and notice how much information we have leaked onto the Internet.

So, what do you think of this?

Reference links:

[1] https://gizmodo.com/google-says-itll-scrape-everything-you-post-online-for-1850601486

[2] https://news.ycombinator.com/item?id=36577626

This article comes from the WeChat official account Quantum Bit (ID: QbitAI); author: Xiao Xiao
