2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how the dictionary (dict) is implemented internally in Python: how values are looked up, how entries are added and updated, and what these internals mean for everyday use.
The data structure behind Python's dictionary is the hash table.
I. Hash table concepts
A hash table is essentially a sparse array (an array that always has some empty elements). It is a data structure that accesses a storage location directly, based on the key.
Hash function: the mapping function of a hash table. It transforms an input of any length into a fixed-length output, the hash value. The hash function determines where an element is stored in the hash table, so a data element can be located quickly without scanning the whole sequence.
The units in a hash table are usually called buckets. In the dict hash table, each key-value pair occupies one table element, and each table element has two parts: a reference to the key and a reference to the value. Because all table elements are the same size, a table element can be read by its offset.
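As a small illustration of the properties described above, Python's built-in hash() can be called directly. The sketch below only uses built-ins; the sample values are arbitrary, and the bit width is read from sys.hash_info rather than assumed.

```python
import sys

# Inputs of different lengths all map to an integer of bounded width.
width = sys.hash_info.width          # number of bits in a hash value
short_hash = hash("a")
long_hash = hash("a" * 10_000)
low, high = -(2 ** (width - 1)), 2 ** (width - 1)
print(low <= short_hash < high, low <= long_hash < high)

# Objects that compare equal must have the same hash value.
print(hash(1) == hash(1.0))          # True

# Mutable built-ins such as list are not hashable, so they cannot be dict keys.
try:
    hash([1, 2])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
print(list_is_hashable)
```

Note that string hashes are usually randomized per process, so the concrete numbers differ between runs; only the properties shown here are stable.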
II. How the dictionary dict looks up a value
A value can be retrieved from a dictionary by its key with dict.get(key) or dict[key] (for a missing key, dict[key] raises KeyError, while dict.get(key) returns None or a supplied default). But how does the lookup work internally?
Python first calls hash(search_key) to compute the hash value of search_key, then takes the lowest few bits of that value as an offset to find a table element in the hash table (how many bits are used depends on the current size of the hash table). If the table element found is empty, a KeyError exception is raised. If it is not empty, the table element holds a found_key:found_value pair. Python then checks whether search_key == found_key is true, and if the keys are equal, it returns found_value.
If search_key and found_key do not match, this is called a hash collision. It happens because a hash table maps arbitrary elements to numbers with only a limited number of bits, and the index into the hash table depends on only part of that number. To resolve the collision, the algorithm takes a few more bits of the hash value, processes them in a special way, and uses the resulting number as a new index to find a table element. If the table element found this time is empty, KeyError is raised as before; if it is not empty and the keys match, the value is returned; if another collision is found, the steps above are repeated.
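The two access styles differ in how they treat a missing key. A minimal sketch, using a hypothetical `prices` dictionary that is not part of the original article:

```python
prices = {"apple": 3}                  # hypothetical example data

hit = prices["apple"]                  # key exists: both styles return the value
miss_default = prices.get("pear")      # missing key: get() returns None
miss_custom = prices.get("pear", 0)    # or the default you pass in

try:
    prices["pear"]                     # missing key: subscripting raises KeyError
    raised = False
except KeyError:
    raised = True

print(hit, miss_default, miss_custom, raised)
```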
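The lookup steps above can be sketched with a toy table. This is a simplified model, not CPython's code: the helper names (`probe_indices`, `lookup`, `place`) are made up for illustration, the table is a plain list of (key, value) tuples, and collisions are resolved by stepping to the next slot (linear probing) instead of mixing in further bits of the hash as the real implementation does.

```python
def probe_indices(key_hash, size):
    """Yield candidate slots: the low bits first, then the following slots."""
    mask = size - 1                    # size is a power of two, so this keeps the low bits
    index = key_hash & mask
    for _ in range(size):
        yield index
        index = (index + 1) & mask     # assumption: linear probing, for simplicity


def lookup(table, search_key):
    """Return the value stored for search_key, or raise KeyError."""
    for index in probe_indices(hash(search_key), len(table)):
        entry = table[index]
        if entry is None:              # empty table element: the key is absent
            raise KeyError(search_key)
        found_key, found_value = entry
        if found_key == search_key:    # same key: done
            return found_value
        # different key in this slot: a collision, so try the next candidate
    raise KeyError(search_key)


def place(table, key, value):
    """Store a pair in the first empty or matching slot (used to build the demo table)."""
    for index in probe_indices(hash(key), len(table)):
        if table[index] is None or table[index][0] == key:
            table[index] = (key, value)
            return
    raise RuntimeError("table is full")


table = [None] * 8
place(table, 1, "one")     # hash(1) & 7 == 1
place(table, 9, "nine")    # hash(9) & 7 == 1 as well, so it collides and moves on
print(lookup(table, 1), lookup(table, 9))
```

Looking up a key such as 17, which also maps to slot 1 but was never stored, walks past both occupied slots, reaches an empty one, and raises KeyError.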
III. Adding to and modifying the dictionary dict
Adding a new element and updating the value of an existing key work almost the same way as a lookup. The difference is that, for an addition, the new element is placed in the first empty table element found; for an update, once the matching table element is found, its value object is replaced with the new value.
In addition, when inserting a new value, Python may decide, based on how full the hash table is, to reallocate memory and expand the table. When the hash table grows, more bits of the hash value are used as the index, which reduces the probability of hash collisions.
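Insertion, update, and growth can be sketched together in a toy class. Everything here is an assumption made for illustration: the class name `ToyDict`, the initial size of 8, the two-thirds fullness threshold, the doubling on growth, and the linear probing. It shows the idea, not the details of the real dict.

```python
class ToyDict:
    """A toy open-addressing hash table that grows when it gets too full."""

    def __init__(self):
        self._slots = [None] * 8       # assumption: start with 8 table elements
        self._used = 0

    @staticmethod
    def _find_slot(slots, key):
        """Return the index holding key, or the empty slot where it belongs."""
        mask = len(slots) - 1
        index = hash(key) & mask       # low bits of the hash pick the first slot
        while slots[index] is not None and slots[index][0] != key:
            index = (index + 1) & mask # collision: step to the next slot
        return index

    def __setitem__(self, key, value):
        index = self._find_slot(self._slots, key)
        if self._slots[index] is None:
            self._used += 1            # addition: an empty slot gets a new entry
        self._slots[index] = (key, value)  # update: the old value is replaced
        if self._used * 3 >= len(self._slots) * 2:
            self._grow()               # assumption: grow at two-thirds full

    def _grow(self):
        """Build a table twice as large and reinsert every existing entry."""
        new_slots = [None] * (len(self._slots) * 2)
        for entry in self._slots:
            if entry is not None:
                new_slots[self._find_slot(new_slots, entry[0])] = entry
        self._slots = new_slots

    def __getitem__(self, key):
        entry = self._slots[self._find_slot(self._slots, key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def capacity(self):
        return len(self._slots)


d = ToyDict()
for i in range(20):
    d[i] = i * i
d[3] = "updated"                       # same key: replaces the value, no new slot
print(d[3], d[19], d.capacity())
```

Because the table always grows before it fills up, the probing loop is guaranteed to reach either the key or an empty slot.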
IV. Summary of the characteristics of the dictionary dict
Because the dictionary uses a hash table, which must be sparse, it is not space-efficient. For example, if you need to store a large number of records, a list of tuples or named tuples is a better choice than a JSON-style list of dictionaries. Replacing dictionaries with tuples saves space for two reasons:
One is to avoid the space consumed by hash tables.
The second is that there is no need to store the names of the fields in the record in each element.
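The difference in per-record overhead can be checked with sys.getsizeof. This is a rough sketch: the record contents are made up, getsizeof reports only the shallow size of the container (not the strings and numbers it refers to), and the exact byte counts vary between CPython versions.

```python
import sys

# The same hypothetical record stored two ways.
record_tuple = ("Tokyo", "JP", 36.933)
record_dict = {"name": "Tokyo", "country": "JP", "population": 36.933}

tuple_size = sys.getsizeof(record_tuple)
dict_size = sys.getsizeof(record_dict)

print("tuple container bytes:", tuple_size)
print("dict container bytes: ", dict_size)
print("dict is larger:", dict_size > tuple_size)
```

Multiplied over millions of records, this per-record gap is what makes a list of tuples noticeably cheaper than a list of dictionaries.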
The implementation of dict is a typical space-for-time trade-off: dictionaries have a large memory overhead, but they provide fast access regardless of the amount of data, as long as the dictionary fits in memory.
Whenever a new key is added to the dictionary, the Python interpreter may decide to expand it. Expansion creates a larger hash table and adds the elements already in the dictionary to the new table. New hash collisions may occur during this process, so the positions of keys in the new hash table can change. In older versions of Python this could also change the order in which keys were iterated; since Python 3.7, dict is guaranteed to preserve insertion order.
Whether and when the hash table is rebuilt depends on the implementation behind the dictionary, so you cannot rely on knowing exactly what happens internally. For this reason, do not modify a dictionary while iterating over its keys: the loop might skip keys, even keys that were already in the dictionary, and in Python 3 adding or removing keys during iteration raises RuntimeError.
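Growth can be observed indirectly from Python code. The sketch below, which assumes CPython (sys.getsizeof is implementation-specific) and uses made-up key names, records each time the reported size of the dictionary changes while keys are added, and then checks the iteration order that current Python versions (3.7 and later) guarantee.

```python
import sys

keys = [f"key{i}" for i in range(1000)]   # hypothetical keys
d = {}
sizes = []                                 # distinct container sizes seen while growing

for k in keys:
    d[k] = None
    size = sys.getsizeof(d)
    if not sizes or size != sizes[-1]:
        sizes.append(size)                 # the table was reallocated at this point

print("number of growth steps:", len(sizes) - 1)
print("insertion order kept:", list(d) == keys)
```

The size increases in a few large jumps rather than with every insertion, which matches the description of occasional expansion into a larger table.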
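A common way to avoid this problem is to iterate over a snapshot of the keys and modify the dictionary itself. A minimal sketch, where the function names and the sample data are made up for illustration:

```python
def unsafe_add(d):
    """Adds keys while iterating over d itself; Python 3 raises RuntimeError."""
    for key in d:
        d[key.upper()] = d[key]


def safe_add(d):
    """Iterates over a snapshot of the keys, so d can be modified freely."""
    for key in list(d):
        d[key.upper()] = d[key]


scores = {"a": 1, "b": 2}

try:
    unsafe_add(dict(scores))      # work on a copy so the original stays intact
    raised = False
except RuntimeError:
    raised = True

safe = dict(scores)
safe_add(safe)
print(raised, safe)
```

Another option is to collect the changes in a second dictionary during the loop and apply them with update() afterwards.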
The above is an overview of how the dictionary is implemented internally in Python. If you have had similar questions, the analysis above may help you understand what happens behind the scenes.
© 2024 shulou.com SLNews company. All rights reserved.