In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article is mainly about "how to understand dict and set in Python". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn how to understand dict and set in Python.
How efficient are dict and set in Python?
Experiments show that no matter how many dictionaries or collections of elements are queried, the time spent can be ignored (as long as the dictionary or collection does not exceed the memory size).
Hash tables in dictionaries
A hash table is actually a sparse array (an array with blank elements is called a sparse array). In general data structure textbooks, units in hash tables are usually called bucket.
In the hash table of dict, each key-value pair occupies a table element, and each table element has two parts, one is the reference to the key, the other is the reference to the value.
Because all table elements are of the same size, a table element can be read by offset.
Python will try to ensure that about 1/3 of the table elements are empty, so when this threshold is reached, the original hash table will be copied to a larger space.
If you want to put an object in a hash table, first calculate the hash value of the element key. You can use the hash () method to do this in Python.
1. Hash value and equality
The built-in hash () method can be used for all built-in type objects. If a custom object calls hash (), you are actually running a custom _ _ hash__.
If the two objects are equal when comparing, then their hash values must be equal, otherwise the hash table will not work properly.
For example, if 11.0 is true, then hash (1) hash (1.0) must also be true, but the internal structure of the two numbers (integer and floating point) is completely different.
Now that integers are mentioned, one of the implementation details of CPython is: if there is an integer object and it can be stored in a machine word, then its hash value is its own value.
In order for hash values to be suitable for the role of hash table index, they must be dispersed as much as possible in the index space. This means that in an ideal situation, the more similar but unequal objects, the greater the difference in their hash values.
"" import sys# obtains the maximum integer value of the operating system through sys.maxsize, converts it to binary, calculates the number of bits, plus a symbol bit MAX_BITS = len (format (sys.maxsize,'b')) print ('% s-bit Python build'% (MAX_BITS + 1)) def hash_diff (o1, O2): H2 ='{: > 0 {} b} '.format (hash (o1), MAX_BITS) # calculates the hash value of o1 The hash value of O2 is calculated with 0 filling vacancy h3 ='{: > 0 {} b} '.format (hash (O2), MAX_BITS) #, and each bit of H2 and h3 is compared with 0 filling vacancy #, with! Identify it, otherwise use''to denote diff =. Join ('!' If b1! = b2 else''for b1, b2 in zip (H2, h3) count ='! = {} '.format (diff.count ('!)) # shows different totals width = max (repr (o1)), len (repr (O2)) 8) # width of line sep ='_'* (width * 2 + MAX_BITS) # split line return'{! r: {width}\ n {: {width}} {}\ n {! r: {width}\ n {}\ n {} '.format (o1, H2,' * width, diff, count, O2, H2, sep, width=width) print (hash_diff (1) Print (hash_diff (1.0,1.0001)) print (hash_diff (1.0001, 1.0002)) print (hash_diff (1.0002, 1.0003))
Starting from Python3.3, random salt addition is added to the calculation of hash values of str,bytes and datetime objects.
The added salt value is a constant in the Python process, but a different salt value is generated each time you start the Python interpreter.
The addition of random salt value is a security measure to prevent DOS attacks.
Hash table algorithm
In order to get the value behind my_ search [search _ key], Python will first call hash (search_key) to calculate the hash value of search_key, take the lowest digits of this value as offset, and look for table elements in the hash table (depending on the size of the current hash table). If the table element found is empty, a KeyError exception is thrown.
If it is not empty, there will be a pair of found_key:found_value in the table element. At this point, Python verifies that search_key = = found_key is true, and if so, it returns found_value.
If search_key and found_key do not match, this situation is called hash conflict. This happens because what a hash table does is map random elements to numbers with only a few digits, and the index of the hash table itself can only rely on part of that number. In order to solve the hash conflict, the algorithm will take a few more bits in the hash value, and then use a special method to deal with it, and then use the new number as an index to find table elements.
If the table element found this time is empty, KeyError; is also thrown if it is not empty, or if the key matches, the value is returned, or if a hash conflict is found, repeat the above steps.
The algorithm for taking values from the dictionary is as follows: given a key, the algorithm either returns a value or throws a KeyError exception
| |-| | calculate the hash value of the key _ use another part of the hash value to locate the zero line in the hash table | | | ↑ | | No (hash conflict) | ↓ | use one of the hash values | Some table elements | locate one-→ in the hash table is empty?-No-→ key is equal? | | table elements | | | Yes | Yes | | ↓ ↓ | | the value in the table element returned by throwing KeyError |-" -|
The operation of adding new elements and updating existing key values is almost the same as above. It's just that for the former, a new element is placed when an empty table element is found.
For the latter, after finding the corresponding table element, the original table-interior value object will be replaced with the new value.
In addition, when inserting a new value, Python may decide whether to reallocate memory to expand it according to the crowded hash table. If you increase the size of the hash table, the number of bits occupied by the hash value and the number of bits used as an index will increase in order to reduce the probability of hash conflicts.
On the face of it, the algorithm seems laborious, but in fact, there are millions of elements in dict, and there are no conflicts in most searches. On average, there may be one or two conflicts per search.
Under normal circumstances, the number of conflicts encountered by even the most unlucky keys can be counted with one hand.
The realization of dict and its result 1. Keys must be dead and hash.
A hashable object must meet the following requirements:
1) the hash () function is supported, and the hash value obtained by the _ _ hash__ () method is invariant.
2) equality can be detected by the _ _ eq__ () method.
3) if a = b is true, then hash (a) = = hash (b) is also true
All user-defined objects are hashable by default because their hash values are obtained by id () and they are not equal.
If you implement the _ _ eq__ () method of a class and want it to be hashable, then it must have an appropriate _ _ hash__ method at one point, which ensures that hash (a) = hash (b) must also be true if aplomb is true.
Otherwise, the constant hash table algorithm will be destroyed and the dictionaries and collections made up of these objects will be completely unreliable, which is a terrible consequence.
On the other hand, if a class with a custom _ _ eq__ dependency is in a mutable state, do not implement the _ _ hash__ method in this class, because its instance cannot be hashed.
A: there are no answers to problems encountered in study. The editor has created a Python learning exchange group: 725638078 look for like-minded partners to help each other, and there are also good video learning tutorials and PDF e-books in the group! Class A: def _ init__ (self, a): self.a = a def _ hash__ (self): return 1 def _ eq__ (self, other): return hash_diff (self, other) def _ repr__ (self): return str (self.a) a = A (1) b = A (2) D1 = {a: 1, b: 2 1: 3} print (D1) # {1: 3} will find that there is only one key-value pair 2. Dictionaries spend a lot of money on memory
Because the dictionary uses hash tables, which must be sparse, it is spatially inefficient. For example. If you need to store a large number of records, it would be a better choice to put it in a list of tuples or named tuples
It is best not to use a list of dictionaries to store these records according to the style of JSON. There are two reasons why tuples can save space:
One is to avoid the space consumed by hash tables. The second is that there is no need to store the names of the fields in the record in each element.
In a user-defined type, the _ _ slots__ attribute can change the storage of instance properties from dict to tuple.
3. Key query is quick.
Dict implementations are typically space-for-time: dictionary types have huge memory overhead, but they provide fast access regardless of the amount of data-as long as dictionaries can be stored in memory.
4. The order of keys depends on the order of addition.
When a new key is added to the dict and a hash conflict occurs, the new key may be arranged to be stored in another location. So the following happens:
The two dictionaries obtained by dict ([(key1, value1), (key2, value2)]) and dict ([(key2, value2), (key1, value1)]) are equal in comparison.
But if there is a conflict when key1 and key2 are added to the dictionary, the order in which the two keys appear in the dictionary is different.
The following example shows this current oak. This example creates three dictionaries with the same data. The only difference is that the data appear in a different order. As you can see, although the order of the keys is out of order, the three dictionaries are still considered equal.
STUDENTS = [(89, 'Sun WuKong'), (79, 'Zhu Bajie'), (69, 'Sha Wujing'), (59, 'Little White Dragon'), (49, 'Tang Monk')] D1 = dict (STUDENTS) print ('d1print), d1.keys ()) D2 = dict (sorted (STUDENTS)) print (' d2Ranger, d2.keys ()) d3 = dict (sorted (STUDENTS, key=lambda x: X [1]) print ('d3') D3.keys () assert D1 = = D2 and D2 = D35. Adding new keys to the dictionary may change the order of existing keys
Whenever a new key is added to the dictionary, the Python interpreter may make the decision to expand the dictionary. The result of expansion is to create a larger hash table and add existing elements in the dictionary to the new table.
This process may cause new hash conflicts, resulting in a change in the order of keys in the new hash table.
It is important to note that whether and how these changes will happen depends on the implementation behind the dictionary, so you can't be confident that you know what's going on behind it.
If you modify the dictionary at the same time while iterating over all the keys in a dictionary, the loop is likely to skip some keys-or even those already in the dictionary.
Therefore, do not iterate and modify the dictionary at the same time. If you want to scan and modify a dictionary, it is best to do it in two steps:
First, iterate through the dictionary to get what needs to be added, put it in a new dictionary, and update the original dictionary after the iteration.
In Python3, the .keys () .items () .values () method returns a dictionary view. That is, the values returned by these methods are more like collections.
The implementation of set and its resulting results
Set and frozenset implementations also rely on hash tables, but only element references are stored in their hash tables. Before set was added to Python, we used dictionaries plus meaningless values as sets.
1. The elements in the collection must be hashable.
two。 Collections consume a lot of memory.
3. You can efficiently determine whether an element exists in a collection.
4. The order of elements depends on the order in which they are added to the collection.
5. Adding elements to the collection may change the order of the existing elements in the collection.
At this point, I believe you have a deeper understanding of "how to understand dict and set in Python". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.