A Case Study of String Interning in Python

Updated: 2025-01-28. Source: SLTechnology News&Howtos

This article explains string interning in Python: what it is, why it is useful, and how CPython implements it.

1. What is string interning?

String interning is a compiler and interpreter optimization that saves space and time in string processing by caching strings.

Instead of creating a new copy of a string each time, this optimization keeps just one copy of each distinct immutable string value and refers to it by pointer. That single copy is called the string's intern, hence the name string interning.

String interning is also described as "string residency" or "string retention". Some languages use the term string pool (string constant pool) instead, which is a different name for the same mechanism. As a noun, "intern" here means the resident object or resident value.

A function for looking up a string's intern may or may not be exposed as a public interface. Modern programming languages such as Java, Python, PHP, Ruby, and Julia all support string interning so that their compilers and interpreters perform well.

2. Why intern strings?

String interning speeds up string comparison. Without interning, checking whether two strings are equal takes O(n) time in the worst case, because every character of both strings may have to be compared.

With interning, equal strings share the same object reference, so comparing the pointers is enough to decide whether two strings are equal; there is no need to check the characters one by one. Because string comparison is such a common operation, it is typically implemented as a pointer equality check, which takes a single machine instruction and no memory reference at all.
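The difference between the two kinds of comparison can be seen from Python itself. The sketch below assumes CPython; the variable names are illustrative only. It builds two equal strings at run time, compares them by value and by identity, and then repeats the identity check after interning both with sys.intern.

```python
import sys

# Build two equal strings at run time so that CPython does not share them.
left = "".join(["inter", "ning", " demo"])
right = "".join(["inter", "ning", " demo"])

equal_before = left == right   # compares the characters
same_before = left is right    # compares object identity (the pointer)
print(equal_before, same_before)

# After explicit interning both names refer to one shared object,
# so the identity check alone is enough to establish equality.
left = sys.intern(left)
right = sys.intern(right)
print(left is right)
```

Identity checks like this are a way to observe interning, not a replacement for == in application code, since whether two equal strings share an object is an implementation detail.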

String interning also reduces memory footprint. Python avoids filling memory with redundant string objects by sharing and reusing objects that already exist, following the flyweight design pattern.
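As a rough illustration of the sharing, the following sketch counts how many separate objects back a list of equal strings before and after interning. It assumes CPython, and count_distinct_objects is a hypothetical helper written for this example.

```python
import sys

def count_distinct_objects(strings):
    """Return how many separate string objects the list refers to."""
    return len({id(s) for s in strings})

# 1000 equal strings built at run time: each one is a separate object.
raw = ["".join(["status", ":", "active"]) for _ in range(1000)]

# Interning maps every equal string onto one shared object.
shared = [sys.intern(s) for s in raw]

print(count_distinct_objects(raw))
print(count_distinct_objects(shared))
```

Once the raw list is dropped, only the single shared object needs to stay alive.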

3. String interning in Python

Like most modern programming languages, Python uses string interning to improve performance. In Python, the is operator checks whether two references point to the same object in memory.

So if two string references point to the same object, the is operator yields True; otherwise it yields False.

    >>> 'python' is 'python'
    True

We can use this operator to find out which strings are interned. In CPython, string interning is implemented by the following function, declared in unicodeobject.h and defined in unicodeobject.c.

    PyAPI_FUNC(void) PyUnicode_InternInPlace(PyObject **);

To check whether a string is interned, CPython provides a macro called PyUnicode_CHECK_INTERNED, which is also defined in unicodeobject.h.

The macro shows that CPython keeps a member named interned in the state field of the PyASCIIObject structure; its value records whether the string is interned.

    #define PyUnicode_CHECK_INTERNED(op) (((PyASCIIObject *)(op))->state.interned)

4. How string interning works

In CPython, references to interned strings are stored, accessed, and managed through a Python dictionary called interned. The dictionary is lazily initialized the first time a string is interned, and it holds references to all interned string objects.

4.1 How is a string interned?

The core function responsible for interning a string is PyUnicode_InternInPlace, which is defined in unicodeobject.c. When called, it creates the interned dictionary if it does not exist yet, then registers the object passed as the argument in it, so that the key and the value are the same object reference.

The following snippet shows how CPython implements string interning.

    void
    PyUnicode_InternInPlace(PyObject **p)
    {
        PyObject *s = *p;

        /* ... */

        // Lazily build the dictionary to hold interned Strings
        if (interned == NULL) {
            interned = PyDict_New();
            if (interned == NULL) {
                PyErr_Clear();
                return;
            }
        }

        PyObject *t;

        // Make an entry to the interned dictionary for the
        // given object
        t = PyDict_SetDefault(interned, s, s);

        /* ... */

        // The two references in interned dict (key and value) are
        // not counted by refcnt.
        // unicode_dealloc() and _PyUnicode_ClearInterned() take
        // care of this.
        Py_SET_REFCNT(s, Py_REFCNT(s) - 2);

        // Set the state of the string to be INTERNED
        _PyUnicode_STATE(s).interned = SSTATE_INTERNED_MORTAL;
    }

4.2 How are interned strings cleaned up?

The cleanup function iterates over all strings in the interned dictionary, adjusts their reference counts, and marks them NOT_INTERNED so that they can be garbage collected. Once every string is marked NOT_INTERNED, the interned dictionary is cleared and deleted.

The cleanup function is _PyUnicode_ClearInterned, which is defined in unicodeobject.c.

    void
    _PyUnicode_ClearInterned(PyThreadState *tstate)
    {
        /* ... */

        // Get all the keys to the interned dictionary
        PyObject *keys = PyDict_Keys(interned);

        // Interned Unicode strings are not forcibly deallocated;
        // rather, we give them their stolen references back
        // and then clear and DECREF the interned dict.
        for (Py_ssize_t i = 0; i < n; i++) {
            PyObject *s = PyList_GET_ITEM(keys, i);

            /* ... */

            switch (PyUnicode_CHECK_INTERNED(s)) {
            case SSTATE_INTERNED_IMMORTAL:
                Py_SET_REFCNT(s, Py_REFCNT(s) + 1);
                break;
            case SSTATE_INTERNED_MORTAL:
                // Restore the two references (key and value) ignored
                // by PyUnicode_InternInPlace().
                Py_SET_REFCNT(s, Py_REFCNT(s) + 2);
                break;
            case SSTATE_NOT_INTERNED:
                /* fall through */
            default:
                Py_UNREACHABLE();
            }

            // marking the string to be NOT_INTERNED
            _PyUnicode_STATE(s).interned = SSTATE_NOT_INTERNED;
        }

        // decreasing the reference to the initialized and
        // access keys object.
        Py_DECREF(keys);

        // clearing the dictionary
        PyDict_Clear(interned);

        // clearing the object interned
        Py_CLEAR(interned);
    }

5. Where string interning is applied

Now that we understand how interning and cleanup work internally, we can find out which strings Python interns.

To do this, we only need to search the CPython source code for calls to PyUnicode_InternInPlace and read the surrounding code. Here are some interesting findings about string interning in Python.

5.1 Variables, constants, and function names

CPython interns constants such as function names, variable names, and string literals.
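A small check from the Python side, assuming CPython: the names stored in a function's code object should be the interned objects, so interning an equal string built at run time should hand back the very object held in co_varnames. The function greet and its variable names are made up for this example.

```python
import sys

def greet(person_name):
    greeting_text = "hello"
    return greeting_text.upper() + person_name

code = greet.__code__
print(code.co_varnames)  # local variable names, including the parameter
print(code.co_names)     # other names referenced in the body

# Build an equal string at run time, then ask for the interned object.
runtime_copy = "".join(["person", "_", "name"])
stored_name = code.co_varnames[0]
is_distinct_before = runtime_copy is not stored_name
same_after_intern = sys.intern(runtime_copy) is stored_name
print(is_distinct_before, same_after_intern)
```

If the stored name were not interned, sys.intern would register and return the run-time copy instead, and the last check would be False.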

The following code, from codeobject.c, shows that when a new PyCode object is created, the interpreter interns all compile-time constants, names, and literals.

    PyCodeObject *
    PyCode_NewWithPosOnlyArgs(int argcount, int posonlyargcount, int kwonlyargcount,
                              int nlocals, int stacksize, int flags,
                              PyObject *code, PyObject *consts, PyObject *names,
                              PyObject *varnames, PyObject *freevars, PyObject *cellvars,
                              PyObject *filename, PyObject *name, int firstlineno,
                              PyObject *linetable)
    {
        /* ... */

        // Intern every name the code object refers to
        if (intern_strings(names) < 0) {
            return NULL;
        }
        if (intern_strings(varnames) < 0) {
            return NULL;
        }
        if (intern_strings(freevars) < 0) {
            return NULL;
        }
        if (intern_strings(cellvars) < 0) {
            return NULL;
        }
        // Intern the string constants as well
        if (intern_string_constants(consts, NULL) < 0) {
            return NULL;
        }

        /* ... */
    }

5.2 Dictionary keys

CPython also interns the string keys of dictionary objects.

When an item is inserted into a dictionary, the interpreter interns the string key under which it is stored. The following code, from dictobject.c, shows this behavior in PyDict_SetItemString.

Interestingly, the call to PyUnicode_InternInPlace carries a comment asking whether it is really necessary to intern the keys of every dictionary.

    int
    PyDict_SetItemString(PyObject *v, const char *key, PyObject *item)
    {
        PyObject *kv;
        int err;
        kv = PyUnicode_FromString(key);
        if (kv == NULL)
            return -1;

        // Invoking String Interning on the key
        PyUnicode_InternInPlace(&kv); /* XXX Should we really? */

        err = PyDict_SetItem(v, kv, item);
        Py_DECREF(kv);
        return err;
    }

5.3 Attributes of any object

An object's attributes in Python can be set explicitly with the setattr function, implicitly as members of a class, or predefined by its data type.

CPython interns all of these attribute names to make lookups fast. The following snippet is from the function PyObject_SetAttr, which is defined in object.c and is responsible for setting new attributes on a Python object.

    int
    PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value)
    {
        /* ... */
        PyUnicode_InternInPlace(&name);
    }

5.4 Explicit interning

Python also supports explicit string interning through the intern function in the sys module.
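A minimal usage sketch, assuming CPython: sys.intern returns the canonical object for a string, and it accepts only objects whose type is exactly str. The Label subclass below is hypothetical and exists only to trigger the rejection.

```python
import sys

class Label(str):
    """A str subclass; hypothetical, used only to show the exact-type check."""

plain = "".join(["config", ".", "path"])
canonical = sys.intern(plain)   # accepted: plain is exactly a str
again = sys.intern("".join(["config", ".", "path"]))
print(again is canonical)       # equal strings map to one object

try:
    sys.intern(Label("config.path"))   # rejected: not exactly a str
    subclass_rejected = False
except TypeError:
    subclass_rejected = True
print(subclass_rejected)
```

Keeping the value returned by sys.intern matters: the argument itself is not guaranteed to be the canonical object if an equal string was interned earlier.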

When this function is called with a string object, that string is interned. The following snippet from sysmodule.c shows the interning step in the sys_intern_impl function.

    static PyObject *
    sys_intern_impl(PyObject *module, PyObject *s)
    {
        /* ... */
        if (PyUnicode_CheckExact(s)) {
            Py_INCREF(s);
            PyUnicode_InternInPlace(&s);
            return s;
        }
        /* ... */
    }

6. Other findings about string interning

Only compile-time strings are interned. Strings that are fixed when the code is compiled, such as literals, are interned; strings created dynamically at run time are not.

Python Cat note: this rule is worth remembering, because I have been caught out by it. Two points are easy to miss. First, a string's join() method builds its result dynamically, so the string it creates is not interned. Second, constant folding also happens at compile time, so it is easy to confuse with string interning.
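The two points in the note can be checked with a short sketch, assuming CPython. The first part compiles a tiny statement and inspects the constants of the resulting code object to show folding; the second part compares a string produced by join() with an equal literal. The "<folded>" filename is just a placeholder.

```python
# "py" + "thon" uses only literals, so the compiler can fold it into one constant.
folded_code = compile('value = "py" + "thon"', "<folded>", "exec")
print(folded_code.co_consts)

# str.join runs at run time, so its result is a brand-new object.
joined = "".join(["py", "thon"])
literal = "python"
print(joined == literal)   # same characters
print(joined is literal)   # but not the same object
```

The folded constant appears in co_consts as a single string, which is why an expression made of literals can behave like an interned literal even though it looks like a run-time concatenation.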

Strings made up of ASCII letters, digits, and underscores are interned. When interning string literals during compilation, CPython makes sure that only constants matching the regular expression [a-zA-Z0-9_]* are interned, because they closely resemble Python identifiers.
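The rule can be expressed as a small predicate. This is only a model of the check described above, written with the re module; CPython performs the test in C, and looks_like_identifier_constant is a hypothetical name.

```python
import re

# Hypothetical helper that models the rule above; CPython does this check in C.
_NAME_CHARS = re.compile(r"[a-zA-Z0-9_]*")

def looks_like_identifier_constant(text):
    """Return True if text has only ASCII letters, digits, and underscores."""
    return _NAME_CHARS.fullmatch(text) is not None

for candidate in ["hello_world", "item42", "hello world", "naïve"]:
    print(candidate, looks_like_identifier_constant(candidate))
```

The predicate says which constants the rule would select; whether a particular string ends up interned can still differ between CPython versions and builds.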

Note: in Python 2, identifiers could contain only letters, digits, and underscores; Python 3.x also allows Unicode characters in identifiers.
