In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
Today, I would like to share with you the relevant knowledge points about how to use the Decimal type in the mysql database. The content is detailed and the logic is clear. I believe most people still know too much about this knowledge, so share this article for your reference. I hope you can get something after reading this article.
1 background
Digital operation is a very common requirement in database, such as calculating quantity, weight, price and so on. In order to meet various requirements, database systems usually support accurate digital types and similar digital types. Accurate number types include int, decimal and so on. The decimal point of these types is fixed in the calculation process, and their results and behavior are more predictable. This issue is especially important when it comes to money, so some databases implement special money types. Approximate number types include float, double, etc., and the precision of these numbers is floating.
2 use of Decimal type
The use of decimal is similar in most databases. Take decimal of MySQL as an example to introduce the basic use of decimal.
2.1 describe Decimal
Different from float and double, decimal needs to specify two numbers to describe the precision, namely precision and scale. Precision refers to the total number of digits in the whole decimal, including integers and decimals, and scale refers to the number of digits in the decimal part of decimal. For example, 123.45 is a precision=5, the decimal of scale=2. We can define the decimal we want in this way when creating the table.
2.2 define Decimal when creating a table
You can define a decimal like this when creating a table:
Create table t (d decimal (5,2)); 2.3Writing decimal data
You can insert legitimate data into it, such as
Insert into t values (123.45); insert into t values (123.4)
Execute select * from t at this time and you will get
+-+ | d | +-+ | 123.45 | | 123.40 | +-+
Notice that 123.4 becomes 123.40, which is the characteristic of the exact type. Every row of data in column d requires scale=2, that is, two digits after the decimal point.
When inserting data that does not meet the definition of precision and scale
Insert into t values (1123.45); ERROR 1264 (22003): Out of range value for column 'd'at row 1insert into t values (123.456); Query OK, 1 row affected, 1 warningshow warnings +-- + | Level | Code | Message | +- -+ | Note | 1265 | Data truncated for column 'd'at row 1 | +-- + select * from t +-+ | d | +-+ | 123.46 | +-+
A number such as 1234.5 (precision=5, scale=1) seems to meet the requirements, but in fact it needs to meet the requirements of scale=2, so it will become 1234.50 (precision=6, scale=2) does not meet the requirements.
2.4 take out the deimcal for calculation
The calculation result is not limited by definition, but affected by the internal implementation format. For MySQL results, the maximum can be precision=81, scale=30, but this size can not be achieved in all cases due to the memory format of MySQL decimal and the implementation of calculation functions, which will be described in detail later. Continue with the example above:
Select d + 9999.999 from tbomachi + | d + 9999.999 | +-+ | 10123.459 | +-+
As a result, it breaks through the restrictions of precision=5 and scale=2. This involves the change of scale during operation. The basic rules are as follows:
Addition / subtraction / sum: take the largest scale on both sides
Multiplication: the addition of scale on both sides
Division: scale + div_precision_increment of the divisor (depending on the database implementation)
3 implementation of Decimal type
In this part, we mainly introduce the decimal implementation of MySQL. In addition, we will compare ClickHouse to see the differences in the design and implementation of decimal in different systems.
To implement decimal, you need to think about the following questions
How much precision and scale are supported?
Where to store scale
In continuous multiplication or division, the scale continues to grow, the integer part continues to expand, and the size of the stored buffer is always limited, what should be done at this time?
Division may produce infinite decimals, how to determine the scale of the division result?
Whether there is a conflict between the presentation range of decimal and computing performance, and whether it can be taken into account.
3.1 MySQL
Let's first take a look at the data structures related to MySQL decimal.
Typedef int32 decimal_digit_t;struct decimal_t {int intg, frac, len; bool sign; decimal_digit_t * buf;}
MySQL's decimal uses an array buf of length len decimal_digit_t (int32) to store decimal digits. Each decimal_digit_t stores up to 9 digits. Intg represents the number of digits in the integer part, frac represents the number of digits in the decimal part, and sign represents symbols. Decimal and integer parts need to be stored separately, can not be mixed in a decimal_digit_t, the two parts are aligned to the decimal point, this is because integers and decimals usually need to be calculated separately, so this format can more easily align different decimal_t decimals and integers respectively to facilitate addition and subtraction operations. Len is always 9 in MySQL implementation, which represents the upper limit of storage, while the actual effective part of buf is determined by both intg and frac. For example:
/ / 123.45 decimal (5,2) the integer part is 3, the decimal part is 2decimal_t dec_123_45 = {int intg = 3; int frac = 2; int len = 9; bool sign = false; decimal_digit_t * buf = {123,450000000,.};}
MySQL needs to use two decimal_digit_t (int32) to store 123.45, the first of which is 123. combined with intg=3, it means that the integer part is 123and the second number is 450000000 (a total of 9 digits). Because of frac=2, it represents a decimal part of .45
Let's look at a larger example:
/ decimal (81,18) 63 integer digits, 18 decimal digits, using the whole buffer// 123456789012345678901234567890123456789012345678901234567890123.012345678901234567decimal_t dec_81_digit = {int intg = 63; int frac = 18; int len = 9; bool sign = false; buf = {123456789,12345678,901234567,890123456, 789012345,678901234,567890123,1234567901234567}}
This example uses 81 digits, but there are also scenarios where 81 digits cannot be used. This is because integers and decimals are stored separately, so a decimal_digit_t (int32) may only store a valid decimal number, but the rest cannot be used for integer parts. For example, an decimal integer part has 62 digits and a fractional part has 19 digits (precision=81, scale=19). Then the decimal part needs to use 3 decimal_digit_t (int32), the integer part has 54 digits left, can not save 62 digits. In this case, MySQL will give priority to the integer part, automatically truncate the part after the decimal point and change it to decimal (80,18).
Let's take a look at how MySQL operates on this data structure. MySQL through a series of decimal_digit_t (int32) to represent a larger decimal, its calculation is also carried out on each decimal_digit_t in this array, just as we calculate a number in primary school math calculation, MySQL will calculate each decimal_digit_t as a number, carry. Because of the long code, there is no complete analysis of the specific code, only the core part of the code is analyzed. If you are interested, you can directly refer to the MySQL source code strings/decimal.h and the decimal_add, decimal_mul, decimal_div code in strings/decimal.cc.
Preparation steps
There is some preparatory work that needs to be done before the real calculation:
MySQL will ROUND_UP the number of numbers to an integer multiple of 9, so that it can be calculated in units of decimal_digit_t later.
In addition, according to the specific situation of the two decimal involved in the operation, the precision and scale of the result will be reduced by the decimal number according to the decimal_digit_t if the precision of the result exceeds the supported upper limit.
In the process of multiplication, if the reduction behavior in 2 occurs, two operands TRUNCATE are needed to prevent the intermediate result from going out of range.
Main steps of addition
First, because the precision and scale of the two numbers may be different, you need to do some preparatory work, align the decimal point, and then start calculating, adding from the last decimal to the high place, which is divided into three steps:
Copy the decimal numbers from the decimal with more decimals to the result
Add the common parts of two decimal
Add the extra integer numbers and carry of decimal with more integers to the result
Stop and stop2 are used in the code to mark the positions where digits of different lengths differ after the decimal point is aligned.
/ * part 1-max (frac)... Min (frac) * / while (buf1 > stop) *-- buf0 = *-- buf1;/* part 2-min (frac). Min (intg) * / carry = 0bot while (buf1 > stop2) {ADD (*-- buf0, *-- buf1, *-- buf2, carry);} / * part 3-min (intg). Max (intg) * / buf1 = intg1 > intg2? ((stop3 = from1- > buf) + intg1-intg2): (stop3 = from2- > buf) + intg2-intg1); while (buf1 > stop3) {ADD (*-buf0, *-- buf1, 0, carry);}
Main steps of multiplication
Multiplication introduces a new dec2 to represent a 64 bit number, because the multiplication of two decimal_digit_t (int32) may result in a 64 bit number. In the calculation, we must first convert the type to dec2 (int64), and then calculate, otherwise we will get the error result after overflow. Multiplication is different from addition, multiplication does not need to be aligned, for example, if you calculate 11.11 5.0, then you only need to calculate 111150-55550 and move the decimal point to get the correct result 55.550.
MySQL implements a double loop by multiplying each decimal_digit_t of decimal1 and every decimal_digit_t of decimal2 to get a 64-bit dec2, whose lower 32 bits are the current result and its high 32 bits are carry.
Typedef decimal_digit_t dec1;typedef longlong dec2;for (buf1 + = frac1-1; buf1 > = stop1; buf1--, start0--) {carry = 0; for (buf0 = start0, buf2 = start2; buf2 > = stop2; buf2--, buf0--) {dec1 hi, lo; dec2 p = ((dec2) * buf1) * ((dec2) * buf2); hi = (dec1) (p / DIG_BASE) Lo = (dec1) (p-(dec2) hi) * DIG_BASE); ADD2 (* buf0, * buf0, lo, carry); carry + = hi;} if (carry) {if (buf0)
< to->Buf) return accurate overload; ADD2 (* buf0, * buf0, 0, carry);} for (buf0--; carry; buf0--) {if (buf0)
< to->Buf) return accurate overload; ADD (* buf0, * buf0, 0, carry);}}
Main steps of division
Division uses Knuth's Algorithm D, and its basic idea is similar to manual division.
First of all, the first two decimal_digit_t of the divisor are used to form a trial quotient factor. Here a norm_factor is used to ensure that the number is expanded as much as possible without overflow. This is because the decimal must use shaping to calculate the accuracy. The larger the number, the more accurate the result. D3: guess quotient, that is, divide the first two decimal_digit_t of the divisor by the test quotient factor. If you do not multiply norm_factor, then neither start1 [1] nor start2 [1] will be reflected in the result.
D4: multiply the guess by the divisor, cut out the result from the divisor and make some corrections, move to the next decimal_digit_t, and repeat the process.
To learn more about this algorithm, please refer to https://skanthak.homepage.t-online.de/division.html
Norm2 = (dec1) (norm_factor * start2 [0]); if (likely (len2 > 0)) norm2 + = (dec1) (norm_factor * start2 [1] / DIG_BASE) X = start1 [0] + ((dec2) dcarry) * DIG_BASE;y = start1 [1]; guess = (norm_factor * x + norm_factor * y / DIG_BASE) / norm2;for (carry = 0; buf2 > start2; buf1--) {dec1 hi, lo; x = guess * (*-- buf2); hi = (dec1) (x / DIG_BASE); lo = (dec1) (x-(dec2) hi) * DIG_BASE) SUB2 (* buf1, * buf1, lo, carry); carry + = hi;} carry = dcarry
< carry;3.2 ClickHouse ClickHouse 是列存, 相同列的数据会放在一起, 因此计算时通常也将一列的数据合成 batch 一起计算. 一列的 batch 在 ClickHouse 中使用 PODArray, 例如上图中的 c1 在计算时就会有一个 PODArray, 进行简化后大致可以表示如下: class PODArray { char * c_start = null; char * c_end = null; char * c_end_of_storage = null;} 在计算时会讲 c_start 指向的数组转换成实际的类型, 对于 decimal, ClickHouse 使用足够大的 int 来表示, 根据 decimal 的 precision 选择 int32, int64 或者 int128. 例如一个 decimal(10, 2), 123.45, 使用这样方式可以表示为一个 int32_t, 其内容为 12345, decimal(10, 3) 的 123.450 表示为 123450. ClickHouse 用来表示每个 decimal 的结构如下, 实际上就是足够大的 int: template struct Decimal{ using NativeType = T; // ... T value;};using Int32 = int32_t;using Int64 = int64_t;using Int128 = __int128;using Decimal32 = Decimal;using Decimal64 = Decimal;using Decimal128 = Decimal; 显而易见, 这样的表示方法相较于 MySQL 的方法更轻量, 但是范围更小, 同时也带来了一个问题是没有小数点的位置, 在进行加减法、大小比较等需要小数点对齐的场景下, ClickHouse 会在运算实际发生的时候将 scale 以参数的形式传入, 此时配合上面的数字就可以正确地还原出真实的 decimal 值了. ResultDataType type = decimalResultType(left, right, is_multiply, is_division);int scale_a = type.scaleFactorFor(left, is_multiply);int scale_b = type.scaleFactorFor(right, is_multiply || is_division);OpImpl::vector_vector(col_left->GetData (), col_right- > getData (), vec_res, scale_a, scale_b, check_decimal_overflow)
For example, two decimal: a = 123.45000 (pendant 8, swarm 5) and b = 123.4 (pair4, swarm 1), then the parameters passed in the calculation are col_left- > getData () = 123.45000 10 ^ 5 = 12345000, scale_a = 1, col_right- > getData () = 123.4 10 ^ 1 = 1234, scale_b = 10000, 12345000 1 and 1234 10, 000 decimal points are aligned and can be calculated directly.
Main steps of addition
ClickHouse addition also needs to be aligned first, and the method of alignment is to multiply the numbers with lower scale by a coefficient to make the scale on both sides equal. Then do the addition directly. ClickHouse is also subdivided according to decimal's precision in the calculation. For decimal whose length is not so long, it can be calculated directly with native types such as int32 and int64, which greatly improves the speed.
Bool overflow = false;if constexpr (scale_left) overflow | = common::mulOverflow (a, scale, a); else overflow | = common::mulOverflow (b, scale, b); overflow | = Op::template apply (a, b, res); template inline bool addOverflow (T x, T y, T & res) {return _ builtin_add_overflow (x, y, & res) } template inline bool addOverflow (_ _ int128 x, _ _ int128 y, _ _ int128 & res) {static constexpr _ _ int128 min_int128 = _ _ int128 (0x8000000000000000ll) max_int128-y) | | (y
< 0 && x < min_int128 - y);} 乘法主要步骤 同 MySQL, 乘法不需要对齐, 直接按整数相乘就可以了, 比较短的 decimal 同样可以使用 int32, int64 原生类型. int128 在溢出检测时被转换成 unsigned int128 避免溢出时的未定义行为. template inline bool mulOverflow(T x, T y, T & res){ return __builtin_mul_overflow(x, y, &res);}template inline bool mulOverflow(__int128 x, __int128 y, __int128 & res){ res = static_cast(x) * static_cast(y); /// Avoid signed integer overflow. if (!x || !y) return false; unsigned __int128 a = (x >0)? X:-x; unsigned _ int128 b = (y > 0)? Y:-y; return (a * b) / b! = a;}
Main steps of division
Convert scale first and then do integer division directly. In itself, division and multiplication do not need to align the decimal point, but the difference of division is that it may produce infinite decimal places, so the general database will give the result a fixed number of decimal places. ClickHouse chooses the same number of decimal places as the divisor, so it is necessary to multiply a by scale, and then in the process of division, the scale is naturally subtracted. The resulting number of decimal places can be kept the same as the divisor.
Bool overflow = false;if constexpr (! IsDecimalNumber) overflow | = common::mulOverflow (scale, scale, scale); overflow | = common::mulOverflow (a, scale, a); if (overflow) throw Exception ("Decimal math overflow", ErrorCodes::DECIMAL_OVERFLOW); return Op::template apply (a, b); 3.3Summary
MySQL uses an array of int32 to represent a large number, while ClickHouse uses native types as much as possible. Both GCC and Clang support int128 extensions, which makes it easy to implement ClickHouse.
There is a big difference between MySQL and ClickHouse. Let's take a look at their solutions to the questions we first mentioned.
In the range of precision and scale, MySQL can define up to precision=65 and scale=30, the intermediate result can contain up to 81 digits, ClickHouse can define up to precision=38, scale=37, and the maximum value of int128 can be-2 ^ 127 ~ 2 ^ 127-1.
Where to store scale, MySQL is row storage, using volcanic model to iterate row by row, calculation is also row by row, each decimal has its own scale;ClickHouse column storage, calculation is carried out in batches of columns, and each row is processed according to the same scale processing can improve performance, so scale comes from the type derived from the expression parsing process.
Scale growth, scale growth exceeds the limit, MySQL will dynamically squeeze the decimal space, truncate operations, as far as possible to ensure the completion of the calculation, ClickHouse will directly report overflow errors.
Division scale, MySQL controls the scale of the division result through div_prec_increment, and ClickHouse uses the scale of the divisor.
Performance, MySQL uses a wider decimal representation, while ROUND_UP, decimal crowding, TRUNCATE and other actions, poor performance, ClickHouse uses native data types and computing to maximize performance.
4. The counterintuitive place of MySQL
In this section, we will talk about some of the counterintuitive aspects of MySQL implementation. These behaviors usually occur when the result is close to 81 digit, so these problems can be ignored if the range of the result is guaranteed to be small.
The multiplicative scale is truncated to 31, and the truncation is achieved by truncating the operands, for example: select 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000005555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555555
The buffer used by MySQL contains the capacity of 81 digit, but because the fractional part must be separated from the integer part, it is often impossible to use 81 digit, for example: select 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
In the process of calculation, if it is found that the integer part is too large to dynamically squeeze the decimal part, for example: select 9999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
The intermediate result of division calculation is not limited by scale = 31. The scale of the intermediate result of division must be an integer multiple of 9. The accuracy of division as an intermediate result cannot be inferred according to the final result. For example, select 2.0000 / 3 3 returns 2.00000000, while select 2.00000 / 3 3 returns 1.999999998. It can be seen that the intermediate result of division of the former actually retains more accuracy.
Divided by division, the decimal part of the final result calculated by avg is not rounded if it is exactly a multiple of 9, for example, select 2.00000 / 3 returns 0.666666666, select 2.0000 / 3 returns 0.66666667
Division, when avg calculates, if the decimal part of a number is not a multiple of 9, it will actually store a multiple of 9, so the following differences occur:
Create table T1 (a decimal (20,2), b decimal (20,2), c integer); insert into T1 values (100000.20, 1000000.10, 5); insert into T1 values (200000.20, 2000000.10, 2); insert into T1 values (300000.20,3000000.10,4); insert into T1 values (400000.20,4000000.10,6); insert into T1 values (500000.20,5000000.10, 8); insert into T1 values (600000.20,6000000.109) Insert into T1 values (700000.20, 7000000.10, 8); insert into T1 values (800000.20, 8000000.10, 7); insert into T1 values (900000.20,9000000.10,7); insert into T1 values (1000000.20,10000000.10,2); insert into T1 values (2000000.20,20000000.10,5); insert into T1 values (3000000.20,30000000.10,2); select sum (ahumb), avg (c), sum (ahumb) / avg (c) from T1 +-+ | sum (aqb) | avg (c) | sum (ahumb) / avg (c) | +-+ | 115500003.60 | 5.4167 | 21323077.590317 | +-+ 1 row in set (0.01sec) select 115500003.60 / 5.4167 +-+ | 115500003.60 / 5.4167 | +-- + | 21322946.369561 | +-+ 1 row in set (21322946.369561 sec) are all the contents of the article "how to use Decimal types in mysql Database" Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.