Explore the design and implementation of Redis 8: data structure robj connecting the bottom layer and the surface 07/02 Update SLTechnology News&Howtos

Explore the design and implementation of Redis 8: data structure robj connecting the bottom layer and the surface

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

This article is from the Internet.

This series of articles will be sorted out in my Java interview Guide warehouse on GitHub. Please check out more wonderful content in my warehouse.

Https://github.com/h3pl/Java-Tutorial

Have some trouble with Star if you like.

The article was first posted on my personal blog:

Www.how2playlife.com

This article is one of "exploring the Design and implementation of Java" on the official account of Wechat [Redis Technology]. Part of the content of this article comes from the network. In order to explain the topic of this article clearly and thoroughly, and integrate a lot of technical blog content that I think is good, quote some good blog articles, if there is any infringement, please contact the author.

This series of blog posts will show you how to start to advanced, the basic usage of Redis, the basic data structure of Redis, and some advanced usage. At the same time, you also need to further understand the underlying data structure of Redis. Then, it will also bring Redis master-slave replication, clustering, distributed locks and other related content, as well as some usage and precautions as a cache. So that you can have a more complete understanding of the whole Redis-related technology system and form your own knowledge framework.

If you have any suggestions or questions about this series of articles, you can also follow the official account [Java Technology jianghu] to contact the author. You are welcome to participate in the creation and revision of this series of blog posts.

This article is the third in a series of "detailed explanation of Redis internal data structures", which describes a basic data structure in Redis implementation: robj.

So what on earth is robj? What's its use?

From the perspective of Redis consumers, a Redis node contains multiple database (16 by default in non-cluster mode and only 1 in cluster mode), while a database maintains a mapping from key space to object space. The key of this mapping relationship is of string type, while value can be of multiple data types, such as string, list, hash, etc. As we can see, the type of key is always string, while the possible types of value are multiple.

From the perspective of the internal implementation of Redis, as we mentioned in the first article, this mapping within a database is maintained by a dict. The key of dict is always expressed in one data structure, which is the dynamic string sds. Value is more complex, in order to store different types of value in the same dict, it requires a general data structure, this general data structure is robj (full name is redisObject). For example: if value is a list, then its internal storage structure is a quicklist (the specific implementation of quicklist is discussed in a later article); if value is a string, then its internal storage structure is generally a sds. Of course, the actual situation is a little more complicated, such as a value of type string, if its value is a number, then the Redis will convert it to a long type for storage, thus reducing memory usage. A robj can represent both a sds, a quicklist, and even a long.

Data structure definition of robj

We found the code related to the robj definition in server.h, as follows (note that the code snippets in this series all come from the 3.2branch of the Redis source code):

/ * Object types * / # define OBJ_STRING 0#define OBJ_LIST 1#define OBJ_SET 2#define OBJ_ZSET 3#define OBJ_HASH 4amp * Objects encoding. Some kind of objects like Strings and Hashes can be * internally represented in multiple ways. The 'encoding' field of the object * is set to one of this fields for this object. * / # define OBJ_ENCODING_RAW 0 / * Raw representation * / # define OBJ_ENCODING_INT 1 / * Encoded as integer * / # define OBJ_ENCODING_HT 2 / * Encoded as hash table * / # define OBJ_ENCODING_ZIPMAP 3 / * Encoded as zipmap * / # define OBJ_ENCODING_LINKEDLIST 4 / * Encoded as regular linked list * / # define OBJ_ENCODING_ZIPLIST 5 / * Encoded as ziplist * / # define OBJ_ENCODING_INTSET 6 / * Encoded as intset * / # define OBJ_ENCODING_SKIPLIST 7 / * Encoded as skiplist * / # define OBJ_ENCODING_EMBSTR 8 / * Embedded sds string encoding * / # define OBJ_ENCODING_QUICKLIST 9 / * Encoded as linked list of ziplists * / # define LRU_BITS 24typedef struct redisObject {unsigned type:4 Unsigned encoding:4; unsigned lru:LRU_BITS; / * lru time (relative to server.lruclock) * / int refcount; void * ptr;} robj

A robj contains the following five fields:

Type: the data type of the object. Accounted for 4 bit. There are five possible values: OBJ_STRING, OBJ_LIST, OBJ_SET, OBJ_ZSET, OBJ_HASH, corresponding to the five data structures exposed by Redis (that is, the five data structures at the first level we mentioned in the first article). Encoding: the internal representation of the object (which can also be called encoding). Accounted for 4 bit. There are 10 possible values, namely the 10 OBJ_ENCODING_XXX constants in the previous code. Lru: used for LRU replacement algorithm, accounting for 24 bit. This is not the focus of our discussion here. We will ignore it for the time being. Refcount: reference count. It allows robj objects to be shared in some cases. Ptr: data pointer. Point to the real data. For example, a robj that represents string, whose ptr may point to a sds structure, and a robj that represents list, whose ptr may point to a quicklist.

What is particularly important to look at here is the encoding field. For the same type, it may also correspond to different encoding, indicating that there may be different internal representations for the same data type. Different internal representations vary in memory footprint and lookup performance.

For example, when type = OBJ_STRING, it means that the robj stores a string, and encoding can be one of the following three:

OBJ_ENCODING_RAW: string is expressed in the native way, that is, in sds. OBJ_ENCODING_INT: string is expressed as a number and is actually a long type. OBJ_ENCODING_EMBSTR: string is represented by a special embedded sds. We will discuss this detail next.

To give another example: when type = OBJ_HASH, the robj stores a hash, and encoding can be one of the following two types:

OBJ_ENCODING_HT: hash is represented by a dict. OBJ_ENCODING_ZIPLIST: hash is represented by a ziplist (the specific implementation of ziplist will be discussed in a later article).

The rest of this article will focus on the robj object that represents string, focusing on its three different encoding. All of the 10 encoding that appear in the previous code snippet are briefly explained here, and we should still have a chance to encounter them in later articles in this series.

OBJ_ENCODING_RAW: the most native representation. In fact, only the string type will use this encoding value (represented as sds). OBJ_ENCODING_INT: expressed as a number. It is actually expressed in long. OBJ_ENCODING_HT: expressed as dict. OBJ_ENCODING_ZIPMAP: it's an old expression that's no longer in use. It is only available in versions less than Redis 2.6. OBJ_ENCODING_LINKEDLIST: it's also an old expression, and it's no longer used. OBJ_ENCODING_ZIPLIST: expressed as ziplist. OBJ_ENCODING_INTSET: expressed as intset. For set data structures. OBJ_ENCODING_SKIPLIST: expressed as skiplist. For sorted set data structures. OBJ_ENCODING_EMBSTR: represented as a special embedded sds. OBJ_ENCODING_QUICKLIST: expressed as quicklist. For list data structures.

Let's summarize the role of robj:

Provides a unified representation of multiple data types. Allow different internal representations of the same type of data, thus saving memory as much as possible in some cases. Object sharing and reference counting are supported. When objects are shared, only one copy of memory is occupied, further saving memory. The coding process of string robj

When we execute the set command of Redis, Redis first represents the received value (string type) as a robj object of type = OBJ_STRING and encoding = OBJ_ENCODING_RAW, and then executes an encoding process before storing it in internal storage, trying to represent it as another more memory-saving encoding way. The core code of this process is the tryObjectEncoding function in object.c.

Robj * tryObjectEncoding (robj * o) {long value; sds s = o-> ptr; size_t len; / * Make sure this is a string object, the only type we encode * in this function. Other types use encoded memory efficient * representations but are handled by the commands implementing * the type. * / serverAssertWithInfo (NULL,o,o- > type = = OBJ_STRING); / * We try some specialized encoding only for objects that are * RAW or EMBSTR encoded, in other words objects that are still * in represented by an actually array of chars. * / if (! sdsEncodedObject (o)) return o; / * It's not safe to encode shared objects: shared objects can be shared * everywhere in the "object space" of Redis and may end in places where * they are not handled. We handle them only as values in the keyspace. * / if (o-> refcount > 1) return o; / * Check if we can represent this string as a long integer. * Note that we are sure that a string larger than 21 chars is not * representable as a 32 nor 64 bit integer. * / len = sdslen (s); if (len = 0 & & value)

< OBJ_SHARED_INTEGERS) { decrRefCount(o); incrRefCount(shared.integers[value]); return shared.integers[value]; } else { if (o->

Encoding = = OBJ_ENCODING_RAW) sdsfree (o-> ptr); o-> encoding = OBJ_ENCODING_INT; o-> ptr = (void*) value; return o;}} / * If the string is small and is still RAW encoded, * try the EMBSTR encoding which is more efficient. * In this representation the object and the SDS string are allocated * in the same chunk of memory to save space and cache misses. * / if (len encoding = = OBJ_ENCODING_EMBSTR) return o; emb = createEmbeddedStringObject (dslen (s)); decrRefCount (o); return emb;} / * We can't encode the object... * * Do the last try, and at least optimize the SDS string inside * the string object to require little space, in case there * is more than 10% of free space at the end of the SDS string. * * We do that only for relatively large strings as this branch * is only entered if the length of the string is greater than * OBJ_ENCODING_EMBSTR_SIZE_LIMIT. * / if (o-> encoding = = OBJ_ENCODING_RAW & & sdsavail (s) > len/10) {o-> ptr = sdsRemoveFreeSpace (o-> ptr);} / * Return the original object. * / return o;}

The operation performed by this code is complex, so it is necessary to take a closer look at each step of the operation:

Step 1 check, check type. Make sure that only objects of type string are manipulated. Step 2 check, check encoding. SdsEncodedObject is a macro defined in server.h that ensures that only OBJ_ENCODING_RAW and OBJ_ENCODING_EMBSTR-encoded string objects are manipulated. Both encoded string are stored in sds, so you can try further coding processing. # define sdsEncodedObject (objptr) (objptr- > encoding = = OBJ_ENCODING_RAW | | objptr- > encoding = = OBJ_ENCODING_EMBSTR) step 3 check, check refcount. Shared objects with a reference count greater than 1 are referenced in many places. Because the object pointer of robj may change after the coding process (we mentioned a similar interface usage pattern when we introduced the sdscatlen function in the previous article), for objects with a reference count greater than 1, you need to update references everywhere, which is not easy to do. Therefore, no coding processing is done for objects with a count greater than 1. An attempt was made to convert a string to a 64-bit long. 64-bit long can express a range of data from-2 ^ 63 to 2 ^ 63-1, with a maximum of 20 digits (including minus signs) expressed in decimal system. Here the judgment of less than or equal to 21 seems to be too much, and the actual judgment of less than or equal to 20 is enough (please let me know if I am wrong). If string2l converts the string to long successfully, it returns 1 and stores the improved long in the value variable. When the conversion to long is successful, it is divided into two situations. The first case: if the configuration of Redis does not require the LRU replacement algorithm to run, and the value of the converted long number is relatively small (less than OBJ_SHARED_INTEGERS, in the current implementation, the value is 10000), then a shared digital object will be used. The reason why the judgment here is related to LRU is that the LRU algorithm requires that each robj have a different lru field value, so you cannot share robj with LRU. Shared.integers is an array of 10000 in length with 10000 small digital objects stored in it. These small number objects are string robj objects of encoding = OBJ_ENCODING_INT. The second case: if the previous step cannot be represented by a shared small object, the original robj is encoded as encoding = OBJ_ENCODING_INT, and the ptr field is directly stored as the long value. Note that the ptr field is originally a void * pointer (that is, the memory address is stored), so it has a 64-bit width on a 64-bit machine, just enough to store a 64-bit long value. In this way, in addition to robj itself, it no longer needs additional memory space to store string values. The next step is to deal with strings that cannot be converted to 64-bit long. Finally, do two more steps: if the string length is small enough (less than or equal to OBJ_ENCODING_EMBSTR_SIZE_LIMIT, defined as 44), then call createEmbeddedStringObject to encode as encoding = OBJ_ENCODING_EMBSTR; if all previous encoding attempts are unsuccessful (still OBJ_ENCODING_RAW) and there are too many free bytes in sds, then make one last effort and call sds's sdsRemoveFreeSpace interface to release the free bytes.

For the createEmbeddedStringObject called, we need to take a look at its code:

Robj * createEmbeddedStringObject (const char * ptr, size_t len) {robj * o = zmalloc (sizeof (robj) + sizeof (struct sdshdr8) + len+1); struct sdshdr8 * sh = (void*) (oval1); o-> type = OBJ_STRING; o-> encoding = OBJ_ENCODING_EMBSTR; o-> ptr = sh+1; o-> refcount = 1; o-> lru = LRU_CLOCK (); sh- > len = len; sh- > alloc = len; sh- > flags = SDS_TYPE_8 If (ptr) {memcpy (sh- > buf,ptr,len); sh- > buf [len] ='\ 0mm;} else {memset (sh- > buf,0,len+1);} return o;}

CreateEmbeddedStringObject reallocates memory to sds and allocates robj and sds in a contiguous block of memory, which helps to reduce memory fragmentation for short strings. This contiguous block of memory contains the following parts:

16-byte robj structure. A 3-byte sdshdr8 header. An array of sds characters up to 44 bytes. 1 NULL Terminator.

It adds up to no more than 64 bytes, so such a short string can be fully allocated in a 64-byte block of memory.

Decoding process of string robj

When we need to get the value of a string, such as executing the get command, we need to do the opposite of the encoding process mentioned earlier-decoding.

The core code of this decoding process is the getDecodedObject function in object.c.

Robj * getDecodedObject (robj * o) {robj * dec; if (sdsEncodedObject (o)) {incrRefCount (o); return o;} if (o-> type = = OBJ_STRING & & o-> encoding = = OBJ_ENCODING_INT) {char buf [32]; ll2string (buf,32, (long) o-> ptr); dec = createStringObject (buf,strlen (buf); return dec) } else {serverPanic ("Unknown encoding type");}}

This process is relatively simple, and we need to pay attention to the following points:

The string robj object encoded as OBJ_ENCODING_RAW and OBJ_ENCODING_EMBSTR is returned intact without change. From the user's point of view, there is no difference between the two codes, with encapsulated sds inside.

A string robj object that is encoded as a number, reconverts long to a decimal string, and then calls createStringObject to a representation of sds. Note: the length of sds strings converted from long here must not exceed 20, and according to the implementation of createStringObject, they will definitely be encoded as objects of OBJ_ENCODING_EMBSTR. The code for createStringObject is as follows:

Robj createStringObject (const char ptr, size_t len) {

If (len type = = OBJ_STRING); if (o-> refcount! = 1 | | o-> encoding! = OBJ_ENCODING_RAW) {robj * decoded = getDecodedObject (o); o = createRawStringObject (decoded- > ptr, sdslen (decoded- > ptr)); decrRefCount (decoded); dbOverwrite (db,key,o);} return o;} robj reference counting operation

The operation of adding and subtracting 1 from the reference count of robj is defined in object.c:

Void incrRefCount (robj * o) {o-> refcount++;} void decrRefCount (robj * o) {if (o-> refcount refcount = = 1) {switch (o-> type) {case OBJ_STRING: freeStringObject (o); break; case OBJ_LIST: freeListObject (o); break; case OBJ_SET: freeSetObject (o); break; case OBJ_ZSET: freeZsetObject (o); break; case OBJ_HASH: freeHashObject (o); break Default: serverPanic ("Unknown object type"); break;} zfree (o);} else {o-> refcount--;}}

Let's pay special attention to the operation decrRefCount, which subtracts the reference count by 1. If there is only one reference left (refcount is already 1), the entire robj will be released after the decrRefCount is called.

Note: Redis's del command relies on the decrRefCount operation to release the value.

After the discussion in this article, it is easy to see that robj represents the first level of data structures exposed by Redis: string, list, hash, set, sorted set, while the underlying implementation of each data structure corresponds to which (or which) second level data structures (dict, sds, ziplist, quicklist, skiplist, etc.) are distinguished by different encoding. It can be said that robj is a bridge between two levels of data structures.

This article introduces in detail the underlying implementation of string objects of type OBJ_STRING, whose encoding and decoding process is very important and widely used in Redis, which we may encounter in a later discussion. Now that we have the conceptual basis of robj, we will discuss ziplist and its relationship to hash in the next article.

Postscript (appended to 2016-07-09): the problem of judging 21 bytes mentioned in this article when parsing the code "encoding string into long" has been submitted to @ antirez and merged into the unstable branch, as detailed in commit f648c5a.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.