Operation Interface and path expression of JSON in MySQL5.7 07/06 Update SLTechnology News&Howtos

Operation Interface and path expression of JSON in MySQL5.7

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)06/01 Report--

This article mainly explains "the operation interface and path expression of JSON in MySQL5.7". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "the operation interface and path expression of JSON in MySQL5.7".

Why the native support of JSON

Document legitimacy

Before MySQL5.7.7 provided native type support for JSON, users could use TEXT or BLOB types to store JSON documents. But for MySQL, the data inserted by the user is only a common string after serialization, and the syntax validity of the JSON document itself will not be checked. The legitimacy of the document needs to be guaranteed by the user. After introducing a new JSON type, insert a syntactically incorrect JSON document, and MySQL prompts the error, and normalizes it after insertion to ensure that each key corresponds to a value.

More efficient acc

MySQL 5.7.7 + itself provides many native functions and path expressions to make it easy for users to access JSON data. For example, for the following JSON document:

{"a": [[3,2], [{"c": "d"}, 1]], "b": {"c": 6}, "one potato": 7, "b.c": 8}

Users can use the

$.a [1] [0] get {"c": "d"}

$.a [1] get [{"c": "d"}, 1]

You can also use wildcards * and * * for fuzzy matching, as detailed in the next paragraph.

Performance optimization

Before MySQL provides native JSON support, if users need to obtain or modify the key value of a JSON document, they need to read out the entire string of TEXT or BLOB and deserialize it into JSON objects, and then access JSON data through various library functions. This is obviously very inefficient, especially for larger documents. The performance of native JSON, especially the read performance, is very good. According to the performance test of 200K + data documents done by Oracle company, the query performance difference between TEXT and JSON types for the same data is more than two orders of magnitude, and users can also index the frequently accessed JSON keys to further improve the performance. The improvement in the performance of JSON data operations is based on the storage structure of the JSON data itself, which is described further below.

Operation Interface and path expression of JSON

Operation interface of JSON

According to the official documentation of MySQL, the implementation of server-side JSON function needs to meet the following conditions:

Requirements:

Non-requirements:

The list of functions provided is as follows:

JSON_APPEND () JSON_ARRAY_INSERT () JSON_UNQUOTE () JSON_ARRAY () JSON_REPLACE () JSON_CONTAINS () JSON_EXTRACT () JSON_INSERT () JSON_KEYS () JSON_LENGTH () JSON_VALID () JSON_MERGE () JSON_OBJECT () JSON_QUOTE () JSON_REMOVE () JSON_CONTAINS_PATH () JSON_SEARCH () JSON_SET () JSON_TYPE ()

Most of the calling rules for the above functions look like this:

JSON_APPEND (json_doc, path, val [, path, val]...)

The first parameter json_doc is an JSON document, or a column in a table, or a nested subdocument variable in an JSON document.

The second parameter, path, is a path expression, which is used to locate the key to be accessed. Path (that is, the path expression) is described next.

Some functions of the third parameter val may not have, if there is an Operand value corresponding to the key.

May produce surprising results on strings which are not utf8mb4 encoded.

There is limited support for decimal values nested inside JSON documents.

Performance may not be suitable for write-intensive applications.

Lets users construct JSON data values from other relational data.

Lets users extract relational data from JSON data values.

Lets users minimally introspect the structure of JSON values and text (validity, length, depth, keys).

Works on strings which are utf8mb4 encoded.

Performance should be suitable for read-intensive applications.

JSON path expression

For easier and faster access to the key values of JSON, MySQL 5.7.7 + provides new support for path expression syntax. The $.a [1] [0] mentioned earlier is a concrete example of a path expression. The complete path expression syntax is:

PathExpression >:: = scope [(pathLeg) *] scope:: = [columnReference] dollarSigncolumnReference:: = [[databaseIdentifier period] tableIdentifier period] columnIdentifierdatabaseIdentifier:: = sqlIdentifiertableIdentifier:: = sqlIdentifiercolumnIdentifier:: = sqlIdentifierpathLeg:: = member | arrayLocation | doubleAsteriskmember:: = period (keyName | asterisk) arrayLocation:: = leftBracket (non-negative-integer | asterisk) rightBracketkeyName:: = ECMAScript-identifier | double-quoted-string-literaldoubleAsterisk:: = * *

Or with

{"a": [[3,2], [{"c": "d"}, 1]], "b": {"c": 6}, "one potato": 7, "b.c": 8}

For example, let me give you a few more examples:

The value obtained by $.a [1] is [{"c": "d"}, 1]

The value obtained by $.b.c is 6

$. "b.c" gets a value of 8

Comparing the last two examples above, you can see that the expression enclosed in quotation marks is treated as a string key value.

Further explanation is needed about the wildcard characters * and * for fuzzy matching.

Two consecutive asterisks * * cannot be used as the end of an expression, and three consecutive asterisks cannot appear.

A single asterisk * matches all members of a JSON object

[*] means to match all the elements in a JSON array

Prefix**suffix represents all paths that start with prefix and end with suffix

For a specific example, enter it directly on the MySQL command line:

Select json_extract ('{"a": [[3,2], [{"c": "d"}, 1]], "b": {"c": 6}, "one potato": 7, "b.c": 8}','$* * .c')

The display result is obtained: ["d", 6].

Storage structure and implementation of JSON

The utf8mb4 character set that MySQL uses when dealing with JSON, and utf8mb4 is a superset of utf8 and ascii. Due to historical reasons, utf8 is not the UTF-8 Unicode variable length coding scheme we often talk about, but the utf8 coding scheme defined by MySQL itself, with a maximum length of three bytes. The specific difference is not the focus of this article, please Google to understand.

MySQL represents JSON documents in the form of DOM in memory, and when MySQL parses a specific path expression, it only needs to deserialize and parse the objects on the path, and it is extremely fast. To figure out how MySQL does this, we need to understand the storage structure of JSON on the hard disk. An interesting point is that the JSON object is a subclass of BLOB and is specialized based on it.

According to the official MySQL documentation:

On a high level, we will store the contents of the JSON document in three sections:

A table of pointers to all the keys and values, in the order in which the keys and values are stored. Each pointer contains information about where the data associated with the key or the value is located, as well as type information about the key or value pointed to.

* All the keys. The keys are sorted, so that lookup can use binary search to locate the key quickly.

All the values, in the same order as their corresponding keys.

If the document is an array, it has two sections only: the dictionary and the values.

If the document is a scalar, it has a single section which contains the scalar value

Let's use the schematic diagram to show its structure more clearly:

The JSON document itself is hierarchical, so MySQL is also hierarchical to JSON storage. For each level of objects, the first thing to store is the number of elements that store the current object, as well as the overall size. It is important to note that:

The Key indexes of JSON objects (the orange part in the figure) are sorted, first by length, and by code point if the length is the same; the Value index (yellow part in the figure) is sorted according to the position of the corresponding Key, and so is the real data store (the white part in the figure).

The index pairs of Key and Value store the offset and size within the object, and the size of a single index is fixed, and you can jump to an index with a distance of N by simple arithmetic.

As you can see from the MySQL5.7.16 source code, when serializing an JSON document, MySQL dynamically detects the size of a single object, if it is less than the 64KB uses an offset of two bytes, otherwise an offset of four bytes is used to save space. At the same time, dynamically checking whether a single object is a large object will result in two parsing of the large object, which is also pointed out in the source code as a point that needs to be optimized in the future.

Now limited by the offset in the index and the storage size of four bytes, the size of a single JSON document cannot exceed 4G, and the size of a single KEY cannot exceed two bytes, that is, 64K.

The offset within the index storage object is to facilitate movement. If a key value is changed, you only need to modify the offset of the affected object as a whole.

The size of the index is now redundant information, because the storage size can be easily obtained through adjacent offsets, mainly to cope with the update of variable length JSON object values. If the length becomes smaller, the JSON document as a whole does not need to be moved, only the current object size needs to be modified.

Now MySQL has no extra space for the smaller value, that is, if the length of the value becomes larger, the subsequent storage will be affected.

Combined with the path expression of JSON, we can see that the search operation of JSON only uses the elements involved in the deserialization path, which is very fast and achieves high performance of read operation.

However, MySQL's update of variable length keys for large documents may be slow and may not be suitable for write-intensive requirements

Index of JSON

Currently, MySQL does not support indexing JSON columns. The description of the official website document is:

JSON columns cannot be indexed. You can work around this restriction by creating an index on a generated column that extracts a scalar value from the JSON column.

Although indexing directly on JSON columns is not supported, MySQL states that you can first use path expressions to create virtual columns on scalar values in JSON documents, and then index virtual columns. This allows users to use expressions to index the key values they are interested in. Give a specific example to illustrate:

CREATE TABLE features (id INT NOT NULL AUTO_INCREMENT, feature JSON NOT NULL, PRIMARY KEY (id))

The format of the JSON data inserted into it is:

{"type": "Feature", "properties": {"TO_ST": "0", "BLKLOT": "0001001", "STREET": "UNKNOWN", "FROM_ST": "0", "LOT_NUM": "001", "ST_TYPE": null, "ODD_EVEN": "E", "BLOCK_NUM": "0001", "MAPBLKLOT": "0001001"}}

Use:

ALTER TABLE features ADD feature_street VARCHAR (30) AS (JSON_UNQUOTE (feature- > "$.properties.Street"); ALTER TABLE features ADD INDEX (feature_street)

In two steps, you can create an index on the stream key (feature- > "$.properties.Street") under the properties key value in the feature column.

Where the feature_street column is the newly added virtual column. The virtual column is named because it is accompanied by a storage column (stored column). The biggest difference between them is that the virtual column only modifies the metadata of the database and does not store the real data on the hard disk, and the reading process is also a way of real-time calculation, while the stored column stores the column of the expression on the hard disk. The scenarios used by the two are different, and the columns generated by expressions are virtual columns by default.

In this way, the addition and deletion of virtual columns will be very fast, and indexing on virtual columns is no different from the traditional way of indexing, which will improve the performance of virtual column reading and slow down the performance of overall insertion. The characteristics of virtual columns combined with the path expression of JSON can easily provide users with efficient key-value indexing function.

JSON comparison and sorting

JSON values can use the =, =,!, etc. Operators, such as BETWEEN, IN,GREATEST, LEAST, etc., are not yet supported. The JSON value uses a two-level collation. The first level is based on the type of JSON, and different types use a unique collation for each type.

The JSON type is from highest to lowest priority

BLOBBITOPAQUEDATETIMETIMEDATEBOOLEANARRAYOBJECTSTRINGINTEGER, DOUBLENULL

The types with high priority are large, and no other comparison operations are required; if the types are the same, each type is sorted according to its own rules. The specific rules are as follows:

BLOB/BIT/OPAQUE: compare the first N bytes of two values. If the first N bytes are the same, the short value is small.

DATETIME/TIME/DATE: sort by the expressed point in time

BOOLEAN: false is less than true

ARRAY: if the length of two arrays and the value at each position are equal, if you don't want to wait, take the sort result of the first different element, and the empty element is the smallest.

OBJECT: if two objects have the same KEY, and the KEY corresponds to the same VALUE, they are equal. Otherwise, the sizes of the two are different, but the relative sizes are not specified.

STRING: take the shorter length of the two STRING as N, and compare the first N bytes of the utf8mb4 encoding of the two values. The shorter one is smaller, and the null value is the lowest.

INTEGER/DOUBLE: including the comparison of exact and approximate values, which is slightly more complex and may produce counterintuitive results, as described in the official documentation.

If any JSON value is compared to the NULL constant of SQL, the result is UNKNOWN. For the comparison of JSON values and non-JSON values, convert non-JSON values into JSON values according to certain rules, and then compare them according to the above rules.

At this point, I believe you have a deeper understanding of "the operation interface and path expression of JSON in MySQL5.7". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.