Using Redis sorted sets for IP attribution lookup: a detailed explanation

2025-07-01 Update. From: SLTechnology News&Howtos

A common requirement at work is to look up the attribution (location) information for an IP address, based on the address range it falls in. Running this lookup against a relational database incurs heavy I/O and cannot meet the speed requirement, so it is clearly not a good fit.

Are there better ways? I tried a few, and they are described in more detail below.

Build an index file

The ip2region project on GitHub implements fast lookups by generating a file that contains a secondary index, and queries complete at millisecond level. However, updating the address ranges or the attribution information is inconvenient, because the file has to be regenerated every time.
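The idea of a secondary index can be illustrated in a few lines. The sketch below is not ip2region's actual file format: it assumes an in-memory list of sorted, non-overlapping ranges, plain integers instead of IPs, and a hypothetical block size. It only shows how a sparse first-level index narrows the search to one block before the second-level search runs inside it.

```python
import bisect

BLOCK_SIZE = 4  # hypothetical: number of ranges summarized by one header entry


def build_header(ranges):
    """Keep the start value of every BLOCK_SIZE-th range as a sparse first-level index."""
    return [ranges[i][0] for i in range(0, len(ranges), BLOCK_SIZE)]


def find(ranges, header, value):
    """Return the label of the range that contains value, or None."""
    # Level 1: locate the last block whose first start is <= value.
    block = bisect.bisect_right(header, value) - 1
    if block < 0:
        return None
    # Level 2: search only inside that block.
    lo = block * BLOCK_SIZE
    for start, end, label in ranges[lo:lo + BLOCK_SIZE]:
        if start <= value <= end:
            return label
    return None


# Illustrative data: ten ranges of (start, end, label), sorted by start.
ranges = [(i * 10, i * 10 + 9, 'region-%d' % i) for i in range(10)]
header = build_header(ranges)
print(find(ranges, header, 57))  # region-5
```

Only the small header has to be searched in full; each lookup then touches a single block of the larger index.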

Still, the project is worth a look, because its approach to building an index is worth learning from. The author's open-source repository contains only the query code, not the code that generates the index file. Following the schematic of the file layout, I wrote a generator for the index file, as follows:

# Requires Python 3.
import time
import socket
import struct

IP_REGION_FILE = './data/ip_to_region.db'
SUPER_BLOCK_LENGTH = 8
INDEX_BLOCK_LENGTH = 12
HEADER_INDEX_LENGTH = 8192


def generate_db_file():
    # Region data starts right after the super block and the header index.
    pointer = SUPER_BLOCK_LENGTH + HEADER_INDEX_LENGTH
    region, index = bytearray(), bytearray()
    # File format:
    # 1.0.0.0|1.0.0.255|Australia|0|0|0|0
    # 1.0.1.0|1.0.3.255|China|0|Fujian Province|Fuzhou|Telecom
    with open('./ip.merge.txt', 'r', encoding='utf-8') as f:
        for line in f:
            item = line.strip().split('|')
            print(item[0], item[1], item[2], item[3], item[4], item[5], item[6])
            # Store both IPs as 4-byte unsigned integers in native byte order.
            start_ip = struct.pack('I', struct.unpack('!L', socket.inet_aton(item[0]))[0])
            end_ip = struct.pack('I', struct.unpack('!L', socket.inet_aton(item[1]))[0])
            region_item = '|'.join([item[2], item[3], item[4], item[5], item[6]]).encode('utf-8')
            region += region_item
            # High 8 bits: length of the region string; low 24 bits: its offset in the file.
            ptr = struct.pack('I', int(bin(len(region_item))[2:].zfill(8) + bin(pointer)[2:].zfill(24), 2))
            index += start_ip + end_ip + ptr
            pointer += len(region_item)
    # The super block points to the first and the last index block.
    index_start_ptr = pointer
    index_end_ptr = pointer + len(index) - INDEX_BLOCK_LENGTH
    super_block = struct.pack('I', index_start_ptr) + struct.pack('I', index_end_ptr)
    # Header index: one entry (start IP + pointer) per 8184 bytes of index data.
    n = 0
    header_index = bytearray()
    for index_block in range(pointer, index_end_ptr, 8184):
        header_index_block_ip = index[n * 8184: n * 8184 + 4]
        header_index_block_ptr = index_block
        header_index += header_index_block_ip + struct.pack('I', header_index_block_ptr)
        n += 1
    header_index += index[len(index) - 12: len(index) - 8] + struct.pack('I', index_end_ptr)
    with open(IP_REGION_FILE, 'wb') as f:
        f.write(super_block)
        f.write(header_index)
        f.seek(SUPER_BLOCK_LENGTH + HEADER_INDEX_LENGTH, 0)
        f.write(region)
        f.write(index)


if __name__ == '__main__':
    start_time = time.time()
    generate_db_file()
    print('cost time:', time.time() - start_time)
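The query side lives in the ip2region repository and is not reproduced here. As a rough illustration of how a file laid out this way could be read, the sketch below builds a tiny image in memory and binary-searches its index blocks. The names `build_demo_db` and `lookup` are hypothetical, the byte order is assumed to be little-endian (what the native `'I'` format produces on x86), and the header index is skipped entirely, so this is not the project's actual search routine.

```python
import socket
import struct

SUPER_BLOCK_LENGTH = 8
HEADER_INDEX_LENGTH = 8192
INDEX_BLOCK_LENGTH = 12  # start IP (4 bytes) + end IP (4) + packed pointer (4)


def ip_to_int(ip):
    return struct.unpack('!L', socket.inet_aton(ip))[0]


def build_demo_db(rows):
    """Hypothetical helper: build a tiny in-memory image with the same layout."""
    pointer = SUPER_BLOCK_LENGTH + HEADER_INDEX_LENGTH
    region, index = b'', b''
    for start, end, text in rows:
        data = text.encode('utf-8')
        # High 8 bits: region length; low 24 bits: region offset.
        index += struct.pack('<III', ip_to_int(start), ip_to_int(end),
                             (len(data) << 24) | pointer)
        region += data
        pointer += len(data)
    super_block = struct.pack('<II', pointer, pointer + len(index) - INDEX_BLOCK_LENGTH)
    # The header index is left zero-filled because this sketch does not use it.
    return super_block + b'\x00' * HEADER_INDEX_LENGTH + region + index


def lookup(db, ip):
    """Binary-search the index blocks for ip; return the region string or None."""
    target = ip_to_int(ip)
    index_start, index_end = struct.unpack_from('<II', db, 0)
    lo, hi = 0, (index_end - index_start) // INDEX_BLOCK_LENGTH
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end, ptr = struct.unpack_from(
            '<III', db, index_start + mid * INDEX_BLOCK_LENGTH)
        if target < start:
            hi = mid - 1
        elif target > end:
            lo = mid + 1
        else:
            length, offset = ptr >> 24, ptr & 0xFFFFFF
            return db[offset:offset + length].decode('utf-8')
    return None


db = build_demo_db([
    ('1.0.0.0', '1.0.0.255', 'Australia|0|0|0|0'),
    ('1.0.1.0', '1.0.3.255', 'China|0|Fujian Province|Fuzhou|Telecom'),
])
print(lookup(db, '1.0.2.7'))
```

Because every index block has a fixed length of 12 bytes, the position of the n-th block can be computed directly, which is what makes the binary search possible without loading the region data.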

Use Redis caching

There are currently two ways to cache IP and attribution information in Redis:

The first is to convert the start IP, the end IP, and every IP in between to integers, then store each one in Redis as a string, with the converted IP as the key and the attribution information as the value.

The second uses a sorted set together with a hash. First, add the start IP and end IP of each range to the sorted set ip2cityid, with the city ID as the member (the end IP's member is the city ID followed by #) and the converted IP as the score. Then add the city ID and attribution information to the hash cityid2city, with the city ID as the field and the attribution information as the value.
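Both approaches depend on converting a dotted IPv4 address to an integer. A minimal sketch of that conversion using the standard-library `ipaddress` module is shown here; the helper name `num_to_ip` is only an illustration and does not appear in the original scripts.

```python
import ipaddress


def ip_to_num(ip):
    """Convert a dotted-quad IPv4 string to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def num_to_ip(num):
    """Convert a 32-bit integer back to a dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(num))


print(ip_to_num('1.0.16.0'))   # 16781312
print(num_to_ip(16781312))     # 1.0.16.0
```

The integer preserves address order, which is why it can serve as a sorted-set score.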

The first approach needs little introduction: it is simple and crude, and I strongly advise against it. Queries are fast, at millisecond level, but the drawbacks are just as obvious. In my test with 1,000 records, loading the cache took about 20 minutes and used nearly 1 GB of space.
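To see why the first approach is so expensive, it helps to count how many keys a single range produces. The sketch below uses a plain dict as a stand-in for Redis string keys, and the function name `expand_range` is an assumption made for illustration.

```python
import ipaddress


def expand_range(start_ip, end_ip, region):
    """Yield one (key, value) pair per address in the inclusive range."""
    start = int(ipaddress.IPv4Address(start_ip))
    end = int(ipaddress.IPv4Address(end_ip))
    for num in range(start, end + 1):
        yield str(num), region


# A dict stands in for Redis here; each entry would be one SET in Redis.
cache = dict(expand_range('1.0.1.0', '1.0.3.255',
                          'China|0|Fujian Province|Fuzhou|Telecom'))
print(len(cache))  # 768 keys for a single input record
print(cache[str(int(ipaddress.IPv4Address('1.0.2.7')))])
```

One record covering three /24 blocks already turns into 768 keys, and wider ranges grow proportionally, which matches the long load time and large footprint described above.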

For the second approach, let's go straight to the code:

# generate_to_redis.py
# Requires Python 3 and redis-py 3.0 or later (zadd takes a mapping of member to score).
import time
import json
from redis import Redis


def ip_to_num(x):
    return sum([256 ** j * int(i) for j, i in enumerate(x.split('.')[::-1])])


# Connect to Redis
conn = Redis(host='127.0.0.1', port=6379, db=10)

start_time = time.time()
# File format:
# 1.0.0.0|1.0.0.255|Australia|0|0|0|0
# 1.0.1.0|1.0.3.255|China|0|Fujian Province|Fuzhou|Telecom
with open('./ip.merge.txt', 'r', encoding='utf-8') as f:
    i = 1
    for line in f:
        item = line.strip().split('|')
        # Add the start IP and end IP to the sorted set ip2cityid.
        # The members are the city ID and the city ID + '#', respectively;
        # the scores are the integer values calculated from the IPs.
        conn.zadd('ip2cityid', {
            str(i): ip_to_num(item[0]),
            str(i) + '#': ip_to_num(item[1]) + 1,
        })
        # Add the city information to the hash cityid2city.
        # The field is the city ID; the value is the JSON-serialized city information.
        conn.hset('cityid2city', str(i), json.dumps([item[2], item[3], item[4], item[5]]))
        i += 1
end_time = time.time()
print('start_time: ' + str(start_time) + ', end_time: ' + str(end_time) + ', cost time: ' + str(end_time - start_time))

# test.py
# Requires Python 3 and redis-py 3.0 or later.
import sys
import time
import json
import socket
import struct
from redis import Redis

# Connect to Redis; decode_responses=True returns str instead of bytes.
conn = Redis(host='127.0.0.1', port=6379, db=10, decode_responses=True)
# Convert the IP to an integer
ip = struct.unpack("!L", socket.inet_aton(sys.argv[1]))[0]
start_time = time.time()
# Walk the sorted set from high to low scores and take the first member
# whose score is less than or equal to the input IP value.
cityid = conn.zrevrangebyscore('ip2cityid', ip, 0, start=0, num=1)
# If nothing is returned, or the match is a '#' (end-of-range) member,
# the IP does not fall inside any known address range.
if not cityid or cityid[0].endswith('#'):
    print('no city info...')
else:
    # Fetch the city information from the hash by city ID.
    ret = json.loads(conn.hget('cityid2city', cityid[0]))
    print(ret[0], ret[1], ret[2])
end_time = time.time()
print('start_time: ' + str(start_time) + ', end_time: ' + str(end_time) + ', cost time: ' + str(end_time - start_time))

# python generate_to_redis.py
start_time: 1554300310.31, end_time: 1554300425.65, cost time: 115.33326005
# python test_2.py 1.0.16.0
Japan 0 0
start_time: 1555081532.44, end_time: 1555081532.45, cost time: 0.000912189483643

With about 500,000 test records, loading the cache took less than 2 minutes and used 182 MB of memory, and queries ran at millisecond level. This approach is clearly more worth trying.
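The lookup rule of the second approach can also be tried without a Redis server. The sketch below is only a simulation under stated assumptions: a sorted Python list of (score, member) pairs stands in for the sorted set, a dict stands in for the hash, and the two sample rows are illustrative rather than taken from the real data file.

```python
import bisect
import ipaddress


def ip_to_num(ip):
    return int(ipaddress.IPv4Address(ip))


def build(rows):
    """Return (entries, cities): entries mimic ip2cityid, cities mimic cityid2city."""
    entries, cities = [], {}
    for city_id, (start, end, info) in enumerate(rows, 1):
        entries.append((ip_to_num(start), str(city_id)))          # start marker
        entries.append((ip_to_num(end) + 1, str(city_id) + '#'))  # end marker
        cities[str(city_id)] = info
    entries.sort()
    return entries, cities


def lookup(entries, cities, ip):
    """Find the last entry whose score is <= ip, as the reverse range query does."""
    scores = [score for score, _ in entries]
    pos = bisect.bisect_right(scores, ip_to_num(ip)) - 1
    # No entry at all, or an end marker: the IP is outside every known range.
    if pos < 0 or entries[pos][1].endswith('#'):
        return None
    return cities[entries[pos][1]]


entries, cities = build([
    ('1.0.0.0', '1.0.0.255', 'Australia'),
    ('1.0.16.0', '1.0.31.255', 'Japan'),
])
print(lookup(entries, cities, '1.0.16.0'))  # Japan
print(lookup(entries, cities, '1.0.8.0'))   # None, the address falls in a gap
```

The end marker is what lets a gap between two ranges be detected: an address in the gap lands on a member ending in #, so it is reported as having no city information.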

The time complexity of zrevrangebyscore is O(log(N) + M), where N is the cardinality of the sorted set and M is the number of elements returned. Because the query here limits the result to a single element (num=1), M is 1 and the cost is effectively O(log(N)). Queries slow down as N grows, and how much data can still be queried efficiently remains to be verified. I don't think this needs to be worried about for now; it can be dealt with if it actually comes up.
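To get a feel for how slowly log(N) grows, the following back-of-the-envelope sketch prints the base-2 logarithm for a few set sizes. It assumes two members per record, so about 1,000,000 members for the 500,000 test records. It is arithmetic only, not a Redis benchmark, and the function name `search_steps` is made up for illustration.

```python
import math


def search_steps(n):
    """Rough number of halving steps for a set of n members: ceil(log2(n))."""
    return math.ceil(math.log2(n))


for n in (1_000_000, 10_000_000, 100_000_000):
    print(n, search_steps(n))
# 1000000 20
# 10000000 24
# 100000000 27
```

A hundredfold increase in members adds only a handful of steps, which is consistent with the view that set size is unlikely to be the first bottleneck.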
