Implementing a UA Pool with Redis

2025-04-03 Update From: SLTechnology News&Howtos

Recently I have been busy with business development, handovers and games, and with the occasional hesitation and confusion on top of that, I let my studies lapse for a while. The weather is turning cold, so it is time to pick things up again and start the next stage of learning. Some earlier data-collection projects involved simulating requests and, to get around anti-crawling checks, needed a random User Agent for each request, so I used Redis to implement a very simple UA pool.

Background

A recent requirement includes logic that simulates requests. The User Agent in the header of each request must satisfy the following points:

1. The User Agent obtained each time is random.
2. The User Agent obtained each time must not repeat within a short period of time.
3. Every User Agent obtained must carry mainstream operating system information (Unix, Windows, iOS, Android and so on).

All three points can be addressed at the source of the UA data, so the real focus is the concrete implementation. A brief analysis gives the following process:

[Figure: process flow diagram]

When designing a UA pool, its data structure turns out to be very similar to a circular queue:

[Figure: circular queue of UAs with a cursor]

In the figure above, assume that UAs of different colors are entirely different UAs, scattered around the circular queue by a shuffle algorithm. After each UA is taken out, the cursor only needs to move one slot forward or backward (the cursor could even be set to any element in the queue). In the end, this calls for a distributed queue (just a queue, not a message queue), which has to be implemented with middleware.
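To make the cursor idea concrete, here is a minimal in-memory sketch in Java. It is only a single-process model of the circular queue described above, not the distributed version; the class name `UaRing` and its method `next()` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Single-process model of the circular UA queue: a shuffled list plus a cursor. */
final class UaRing {
    private final List<String> slots;
    private int cursor = 0;

    UaRing(List<String> uas, Random random) {
        if (uas.isEmpty()) {
            throw new IllegalArgumentException("UA list must not be empty");
        }
        this.slots = new ArrayList<>(uas);
        // Scatter the UAs around the ring once, up front
        Collections.shuffle(this.slots, random);
    }

    /** Returns the UA under the cursor, then moves the cursor one slot forward. */
    synchronized String next() {
        String ua = slots.get(cursor);
        cursor = (cursor + 1) % slots.size(); // wrap around at the end of the list
        return ua;
    }

    public static void main(String[] args) {
        UaRing ring = new UaRing(Arrays.asList("UA-0", "UA-1", "UA-2"), new Random());
        for (int i = 0; i < 6; i++) {
            System.out.println(ring.next());
        }
    }
}
```

With n distinct UAs in the ring, any n consecutive calls return n different values, which is how the "no repeats within a short period" requirement is met in this model.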

Concrete implementation scheme

There is no doubt that a distributed, database-like piece of middleware is needed to store the prepared UAs, and Redis is the first candidate that comes to mind. Next, a Redis data type has to be chosen, mainly considering several aspects:

UA

The Redis data type that satisfies these aspects is List. Note, however, that a List cannot de-duplicate its own elements, so de-duplication has to be handled in application code. The process by which a client obtains a UA is then roughly as follows:

[Figure: client flow for obtaining a UA]

Combining this with the earlier analysis, the coding process consists of the following steps:

1. Prepare the UA data to be imported. It can be read from a data source or directly from a file.
2. Because the UA data set to be imported is generally not very large, shuffle the collection first. In Java, Collections#shuffle() can be used directly, although you can also implement your own algorithm for randomly distributing the data. This step is necessary in scenarios where the party being simulated strictly checks the validity of the UA.
3. Import the UA data into a Redis list.
4. Write a Lua script that combines RPOP and LPUSH to implement the distributed circular queue.
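The steps above can be tried out locally before Redis is involved. The following is a rough sketch that uses a `java.util.Deque` as a stand-in for the Redis list; `LocalUaPool` and `rotate()` are hypothetical names, and the de-duplication via `LinkedHashSet` is just one possible way to do it in code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

/** Local stand-in for the Redis list, used only to illustrate the steps. */
final class LocalUaPool {
    private final Deque<String> list = new ArrayDeque<>();

    /** Preparation: de-duplicate, shuffle, then push every UA at the head (like LPUSH). */
    LocalUaPool(List<String> rawUas, Random random) {
        List<String> unique = new ArrayList<>(new LinkedHashSet<>(rawUas));
        Collections.shuffle(unique, random);
        for (String ua : unique) {
            list.addFirst(ua);
        }
    }

    /** Rotation: take from the tail (like RPOP) and put it back at the head (like LPUSH). */
    synchronized String rotate() {
        String ua = list.pollLast(); // null when the pool is empty
        if (ua != null) {
            list.addFirst(ua);
        }
        return ua;
    }

    synchronized int size() {
        return list.size();
    }

    public static void main(String[] args) {
        LocalUaPool pool = new LocalUaPool(
                Arrays.asList("UA-0", "UA-1", "UA-1", "UA-2"), new Random());
        for (int i = 0; i < 2 * pool.size(); i++) {
            System.out.println(pool.rotate());
        }
    }
}
```

In this model the `synchronized` keyword plays the role that the Lua script plays in Redis: it keeps the pop and the push from being interleaved with another caller's rotation.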

Coding and test example

Add the dependency for Lettuce, the advanced Redis client:

<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>5.2.1.RELEASE</version>
</dependency>

Write the Lua script for RPOP + LPUSH. The script is named L_RPOP_LPUSH.lua for now and placed in the resources/scripts/lua directory:

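-- Rotate the list: pop from the tail and push back at the head
local key = KEYS[1]
local value = redis.call('RPOP', key)
-- RPOP on an empty or missing list yields false in Lua; skip LPUSH in that case
if value then
    redis.call('LPUSH', key, value)
end
return value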

This script is very simple, yet it already implements the circular queue. The rest of the test code is as follows:

// Imports inferred from the class names used below (JUnit 4, Guava, Spring, Lettuce)
import com.google.common.collect.Lists;
import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.StreamUtils;

import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.IntStream;

public class UaPoolTest {

    private static RedisCommands<String, String> COMMANDS;
    private static AtomicReference<String> LUA_SHA = new AtomicReference<>();
    private static final String KEY = "UA_POOL";

    @BeforeClass
    public static void beforeClass() throws Exception {
        // Initialize the Redis client
        RedisURI uri = RedisURI.builder().withHost("localhost").withPort(6379).build();
        RedisClient redisClient = RedisClient.create(uri);
        StatefulRedisConnection<String, String> connect = redisClient.connect();
        COMMANDS = connect.sync();
        // Simulate the raw data used to build the UA pool: 10 UAs, UA-0 ... UA-9
        List<String> uaList = Lists.newArrayList();
        IntStream.range(0, 10).forEach(e -> uaList.add(String.format("UA-%d", e)));
        // Shuffle
        Collections.shuffle(uaList);
        // Load the Lua script
        ClassPathResource resource = new ClassPathResource("/scripts/lua/L_RPOP_LPUSH.lua");
        String content = StreamUtils.copyToString(resource.getInputStream(), StandardCharsets.UTF_8);
        String sha = COMMANDS.scriptLoad(content);
        LUA_SHA.compareAndSet(null, sha);
        // Write the UA data into the Redis list. For large data sets, consider
        // writing in batches so that the Redis service is not blocked for long
        COMMANDS.lpush(KEY, uaList.toArray(new String[0]));
    }

    @AfterClass
    public static void afterClass() throws Exception {
        COMMANDS.del(KEY);
    }

    @Test
    public void testUaPool() {
        IntStream.range(1, 21).forEach(e -> {
            String result = COMMANDS.evalsha(LUA_SHA.get(), ScriptOutputType.VALUE, KEY);
            System.out.println(String.format("The UA obtained on attempt %d is: %s", e, result));
        });
    }
}

The result of one run is as follows:

The UA obtained on attempt 1 is: UA-0
The UA obtained on attempt 2 is: UA-8
The UA obtained on attempt 3 is: UA-2
The UA obtained on attempt 4 is: UA-4
The UA obtained on attempt 5 is: UA-7
The UA obtained on attempt 6 is: UA-5
The UA obtained on attempt 7 is: UA-1
The UA obtained on attempt 8 is: UA-3
The UA obtained on attempt 9 is: UA-6
The UA obtained on attempt 10 is: UA-9
The UA obtained on attempt 11 is: UA-0
The UA obtained on attempt 12 is: UA-8
The UA obtained on attempt 13 is: UA-2
The UA obtained on attempt 14 is: UA-4
The UA obtained on attempt 15 is: UA-7
The UA obtained on attempt 16 is: UA-5
The UA obtained on attempt 17 is: UA-1
The UA obtained on attempt 18 is: UA-3
The UA obtained on attempt 19 is: UA-6
The UA obtained on attempt 20 is: UA-9

As the output shows, the shuffle works reasonably well: the data is fairly well scattered, and the same order repeats once all 10 UAs have been handed out.
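The test code mentions writing in batches when the data set is large. A minimal sketch of the splitting part, assuming a hypothetical helper class `UaBatches` with a caller-chosen batch size, could look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class UaBatches {

    /** Splits the source list into consecutive chunks of at most batchSize elements. */
    static List<List<String>> partition(List<String> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            // Copy the view so each batch is independent of the source list
            batches.add(new ArrayList<>(source.subList(from, to)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> uas = Arrays.asList("UA-0", "UA-1", "UA-2", "UA-3", "UA-4");
        for (List<String> batch : partition(uas, 2)) {
            System.out.println(batch);
        }
    }
}
```

Each batch could then be written with its own `COMMANDS.lpush(KEY, batch.toArray(new String[0]))` call. Because LPUSH inserts its arguments at the head one after another, pushing the batches in order should leave the list in the same order as a single large LPUSH.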

Summary

In fact, designing a UA pool is not very difficult, but a few key points deserve attention:

1. Generally speaking, mainstream mobile and desktop devices do not have that many system versions, so the source UA data set will not be very large. The simplest implementation can keep it in a file and write it to Redis in one go.
2. The UA data must be shuffled so that UAs of the same device or system type are not clustered too densely, which could otherwise trigger the risk-control rules of the party whose requests are being simulated.
3. Some familiarity with Lua syntax is needed, since combining several Redis commands into one atomic operation generally relies on a Lua script.
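One way to check that UAs of the same system type are not clustered too densely after shuffling is to measure the longest run of consecutive entries from the same OS family. The sketch below is only illustrative: `UaSpread`, `osFamily` and `longestRun` are hypothetical names, and the OS detection relies on a handful of common User-Agent tokens rather than a complete parser.

```java
import java.util.Arrays;
import java.util.List;

final class UaSpread {

    /** Rough OS family guess from common User-Agent tokens (illustrative, not exhaustive). */
    static String osFamily(String ua) {
        if (ua.contains("Android")) return "Android";   // checked before Linux
        if (ua.contains("iPhone") || ua.contains("iPad")) return "iOS"; // checked before Mac OS X
        if (ua.contains("Windows")) return "Windows";
        if (ua.contains("Mac OS X")) return "macOS";
        if (ua.contains("Linux")) return "Linux";
        return "Other";
    }

    /** Length of the longest run of consecutive UAs that share one OS family. */
    static int longestRun(List<String> uas) {
        int longest = 0;
        int current = 0;
        String previous = null;
        for (String ua : uas) {
            String family = osFamily(ua);
            current = family.equals(previous) ? current + 1 : 1;
            longest = Math.max(longest, current);
            previous = family;
        }
        return longest;
    }

    public static void main(String[] args) {
        List<String> shuffled = Arrays.asList(
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                "Mozilla/5.0 (Linux; Android 10)",
                "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2 like Mac OS X)");
        System.out.println("Longest same-OS run: " + longestRun(shuffled));
    }
}
```

If the longest run exceeds whatever threshold is acceptable, the list can simply be shuffled again before it is loaded. Since the queue is circular, a run that spans the end and the start of the list is not counted by this simple version.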
