Analyzing Language Spam in Google Analytics

2025-03-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Recently, while looking at my site's data in Google Analytics (hereinafter GA), I found some very suspicious entries:

What the heck is this?

The "language" field normally holds values such as "zh-tw", "zh-cn", "en-us", "es" or "fr", and it is set by the user's browser. No browser, however, would set the language to values like these:

secret.ɢoogle.com You are invited! Enter only with this ticket URL. Copy it. Vote for Trump!

o-o-8-o-o.com search shell is much better than google!

Vitaly rules google ☆: "(^)": "☆" _ (")" / "(") (°) "(") (") (= ^ ^ =) oO"

Congratulations to Trump and all americans

Analyzing the requests

Obviously, this is a new kind of spam that hopes to attract the attention of its target group (probably webmasters like us).

If you look closely at these requests, you will find several characteristics:

The number of requests spikes: it peaks within a few days and then falls again.

The proportion of New Sessions is very high, reaching more than 86%.

Analyzing other affected columns

If you take a closer look at these requests in the GA reports, you can see that equally suspicious entries appear in the referrer column:

motherboard.vice.com, addons.mozilla.org, webmasters.stackexchange.com, blackhatworld.com, thenextweb.com, abc.xyz, lifehacĸer.com...

Some of these are perfectly legitimate domain names. For example, abc.xyz is the official website of Alphabet, Google's parent company, and thenextweb.com is a technology media outlet.

There is nothing wrong with these sites themselves, but none of them contains a single link to my blog, so my visitors cannot have been "referred" from them. When I followed the referrers back, I found no actual links to my site, only advertisements; even more worrying, such pages may carry phishing or malware.

To sum up, this is a two-pronged wave of spam (fake language entries and fake referral entries) designed to get your attention.

How the spam is generated

Generally speaking, there are two kinds of GA spam. One comes from a web crawler that actually visits your site; the other never visits your site and instead sends fake hits directly to the GA servers. In practice, the second kind is more common because it is cheaper.

GA's tracking works like this: when a user visits your site, a piece of JavaScript runs in the browser and sends some information about the visit to the GA servers in an HTTP request, telling GA that a "visit" has occurred. This HTTP request is easy to forge, so spammers do not have to visit your site at all; they can simply send a large number of HTTP requests.

Besides the hits sent by the JavaScript tracker, GA also offers the Measurement Protocol, which lets developers send raw data to GA directly over HTTP, including many user interactions at once. The original purpose of this protocol is to let developers track user behavior in any environment. For example, an app can record user behavior while offline and send it all once it is back online; or, when an internal network does not allow outbound access, the behavior can be recorded first and then uploaded to GA periodically in one go.

The intention is good. Unfortunately, the protocol requires no authentication, which makes it just as convenient for spammers: they can send large amounts of fake data in a single request. All they need is your tracking ID (UA-XXXXXXX-XX).

In this raw payload, everything can be forged. Hostname? No problem! Referral? Whatever they like! URL path? Of course that can be changed too.
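
To make this concrete, here is a minimal sketch of what a single hit looks like as a Measurement Protocol payload. It only builds and prints the URL-encoded body; nothing is sent. The parameter names (v, tid, cid, t, dh, dp, dr, ul) are those of the Universal Analytics Measurement Protocol as I understand it, while build_hit is a hypothetical helper and every value passed to it is a placeholder.

```python
from urllib.parse import urlencode, parse_qs


def build_hit(tracking_id, client_id, hostname, path, referrer, language):
    """Return the URL-encoded body of one pageview hit (not sent anywhere)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the property's UA-ID; nothing else ties the hit to a site
        "cid": client_id,    # arbitrary client identifier chosen by the sender
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname
        "dp": path,          # document path
        "dr": referrer,      # document referrer
        "ul": language,      # user language
    }
    return urlencode(params)


body = build_hit(
    tracking_id="UA-XXXXXXX-XX",  # placeholder tracking ID
    client_id="555",              # placeholder client ID
    hostname="example.com",
    path="/",
    referrer="http://abc.xyz/",
    language="Vote for Trump!",
)
print(body)

# Decoding the body again shows that the receiver sees whatever the sender typed.
print(parse_qs(body)["ul"])
```

Every field is just a string supplied by the sender, which is why the language and referrer columns cannot be trusted on their own.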

How to avoid it

For website owners, this kind of spam does harm in several ways:

1. It wastes time, just like traditional spam.

2. It distorts GA statistics, especially when the site does not get much traffic (like mine).

3. It can spread malware.

So, is there a perfect solution? Actually, no.

Once data has entered GA, there is no way to delete it. You can only do two things: prevent further spam from being recorded in GA, and filter the spam that has already been recorded out of your reports.

Step 1: use filters to block future spam

A language value is usually 5-6 characters long and rarely more than 10, so we can safely treat any language value of 15 or more characters as spam.

In addition, some characters can never appear in a legitimate language value, yet spam uses them to spell out URLs, such as "secret google com", "secret,google,com" or "secretsecretgoogle.com". So we also exclude spaces, dots, commas, exclamation marks and slashes.

.{15,}|\s[^\s]*\s|\.|,|\!|\/

In Admin, select Filters and add a filter as shown in the figure.

Once it is set up, you can verify that what we want to intercept will be blocked:

Everything works: from now on, language spam will be blocked outright.
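
The pattern can also be checked offline before it goes into GA. The sketch below uses Python's re module with the Step 1 expression written without stray spaces, and assumes that the GA filter matches anywhere in the value, which re.search mimics. is_spam_language is a hypothetical helper, and the sample values are taken from the examples earlier in this article.

```python
import re

# Step 1 pattern: 15 or more characters, two whitespace characters with no
# whitespace between them, or any dot, comma, "!" or "/".
LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")


def is_spam_language(value):
    """True if the pattern matches anywhere in the value (partial match)."""
    return LANGUAGE_SPAM.search(value) is not None


samples = [
    "zh-tw", "zh-cn", "en-us", "es", "fr",
    "o-o-8-o-o.com search shell is much better than google!",
    "Congratulations to Trump and all americans",
    "secret,google,com",
]
for value in samples:
    verdict = "BLOCK" if is_spam_language(value) else "keep "
    print(f"{verdict}  {value}")
```

Ordinary language codes pass through, while the long or punctuated values are flagged.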

Step 2: purify existing data through Segment

The filter only takes effect from the moment it is set up; historical data cannot be modified. However, GA provides custom segments, which let you selectively exclude some data when generating reports.

A segment is a subset of the data: it means taking just the slice we need out of the complete data for analysis. For example, we can pull out the behavior of users under the age of 24 and compare it with the behavior of users over 24. This feature is exactly what we need to filter out the non-standard "language" values.

As shown in the figure below, there is a + Add Segment button next to All Users, which can be clicked to configure our segment.

Be sure to select "does not match regex" and fill in the regular expression from Step 1.

Once you have created a new segment, you will see a new filtered report.
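
What the segment does can be pictured with a small sketch: keep only the rows whose language does not match the spam pattern, then report on what is left. The rows below are made-up sample numbers standing in for a Language report, and clean_segment is a hypothetical helper, not a GA API.

```python
import re

LANGUAGE_SPAM = re.compile(r".{15,}|\s[^\s]*\s|\.|,|\!|\/")

# Made-up rows standing in for an exported Language report.
rows = [
    {"language": "zh-tw", "sessions": 120},
    {"language": "en-us", "sessions": 45},
    {"language": "Congratulations to Trump and all americans", "sessions": 30},
    {"language": "o-o-8-o-o.com search shell is much better than google!", "sessions": 12},
]


def clean_segment(report_rows):
    """Keep only rows whose language does NOT match the spam pattern."""
    return [row for row in report_rows if not LANGUAGE_SPAM.search(row["language"])]


kept = clean_segment(rows)
total_all = sum(row["sessions"] for row in rows)
total_clean = sum(row["sessions"] for row in kept)
print(f"all users: {total_all} sessions, clean segment: {total_clean} sessions")
```

As with a GA segment, the underlying rows are untouched; only the view used for reporting changes.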

The next time you open the report, it will show the All Users segment by default. You can then find our custom segment under Custom in the segment list and tick it.

If you need to check this segment often, it is recommended to click Shortcut to add a shortcut.

It saves the current segment, sorting and so on, and next time you can open the report directly from the Shortcuts section.

This is the complete filtering method, which should filter out most language spam. The filters and segments provided by GA are very powerful, and if you find any new spam later, you can keep updating and improving your filters with the methods covered here.
