2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1.1 Introduction to data desensitization
According to Baidu Baike, data desensitization is the transformation of sensitive information through desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, the real data is transformed so that it can be used for testing without violating system rules. Personal information such as ID numbers, mobile phone numbers, card numbers and customer numbers, for example, needs to be desensitized. Data desensitization is one of the database security technologies, which mainly include database vulnerability scanning, database encryption, database firewalls, data desensitization and database security audit systems.
With the development of the information age, data security receives ever more attention, for example the desensitization of sensitive data in non-production environments. In finance, telecom operators, government, energy and other sectors, desensitizing data in non-production environments is already a regulatory requirement. Non-production data is mostly used for development, testing, training, and third-party data analysis and mining; if sensitive data is not effectively protected there, it can easily leak. Securing non-production data has therefore become an important issue, and it requires sensitive information to be desensitized so that the data is effectively protected.
1.2 Requirements for data desensitization tools
A data desensitization tool should support a variety of heterogeneous data sources so that one desensitization rule can be applied to all of them. The rule for a "customer name" field, for instance, is essentially the same everywhere, so it can be reused directly on Excel, TXT, Oracle, MS SQL Server, MySQL, Hadoop and other sources. The tool should also be able to distribute desensitized data in every direction (file to file, file to database, database to database and database to file) without installing any client on the production system or locally.
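Reusing one rule across sources comes down to writing the rule against records rather than against a particular storage format. The Python sketch below illustrates the idea outside the aggregator: CSV text and an in-memory SQLite table stand in for a file source and a database source, and the customer_name field and the "***" placeholder are assumptions made for the example.

```python
import csv
import io
import sqlite3


def mask_customer_name(record: dict) -> dict:
    """The single rule: replace the customer name with a fixed placeholder."""
    masked = dict(record)
    masked["customer_name"] = "***"
    return masked


def rows_from_csv(text: str):
    """Yield records from CSV text (standing in for a TXT or Excel source)."""
    yield from csv.DictReader(io.StringIO(text))


def rows_from_sqlite(conn: sqlite3.Connection, table: str):
    """Yield records from a database table as plain dicts."""
    conn.row_factory = sqlite3.Row
    for row in conn.execute(f"SELECT * FROM {table}"):
        yield dict(row)


# Two different sources...
csv_text = "id,customer_name\n1,Zhang San\n"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, customer_name TEXT)")
conn.execute("INSERT INTO customers VALUES (2, 'Li Si')")

# ...one rule, applied unchanged to both.
from_file = [mask_customer_name(r) for r in rows_from_csv(csv_text)]
from_db = [mask_customer_name(r) for r in rows_from_sqlite(conn, "customers")]
print(from_file, from_db)
```

Because the rule only sees a dict, adding another source means writing another reader, not another rule.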
The desensitized data report query introduced in this article uses SPL scripts in the Raqsoft aggregator (esProc) to desensitize sensitive fields (such as name, certificate number, bank account, address, telephone number, enterprise name, business registration number and taxpayer identification number) through predefined desensitization rules, thereby protecting sensitive private data.
The aggregator makes desensitization simple and removes much repetitive work. Desensitized data produced by an aggregator SPL script can be used directly as a report dataset for query and analysis, or as a realistic dataset in development, testing and other non-production or outsourced environments.
1.3 Characteristics of desensitized data
Data desensitization must not only bleach the data and erase its sensitive content, but also preserve the original data characteristics, business rules and data relationships, so that development, testing, training and big data workloads are unaffected and the data remains consistent and valid before and after desensitization:
• Keep the original data characteristics
Data characteristics must be preserved through desensitization. For example, an ID number consists of a 17-digit body code and a 1-digit check code: the area code (6 digits), date of birth (8 digits), sequence code (3 digits) and check digit (1 digit). A desensitization rule for ID numbers must ensure that these characteristics still hold after desensitization.
• Maintain consistency between data
Business data is interrelated, for example age and date of birth. Likewise, after an ID number is desensitized, the date-of-birth field must still agree with the date of birth embedded in the ID number.
• Maintain the relevance of business rules
Preserving business-rule relevance means that data relationships and business semantics are unchanged by desensitization; data relationships include primary/foreign key relationships and the business semantics of associated fields. Highly sensitive account-holder data in particular often runs through all of the holder's relationship and behavior records, so special care is needed to keep all information about the same holder consistent.
• Keep data consistent across repeated desensitization
If the same data is desensitized several times, or in different test systems, the desensitized result must always be the same. Only then do data changes in the business system remain continuously consistent, and the wider business remain consistent.
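The four characteristics can be illustrated together with a format-preserving, deterministic rule for 18-digit ID numbers. The Python sketch below is an illustration under stated assumptions, not the aggregator's method: it keeps the area code and date of birth (so the birth-date field stays consistent), replaces the 3-digit sequence code with a value derived from a keyed hash (HMAC-SHA256, a technique swapped in here) so the same input and key always give the same output in every run and every table, and recomputes the check digit with the ISO 7064 MOD 11-2 weights so the result is still a structurally valid ID number. The key is hypothetical and must be kept secret; since only 999 sequence values exist per area code and birth date, two different IDs can collide.

```python
import hashlib
import hmac

# Weights and check characters of the 18-digit ID number check digit (ISO 7064 MOD 11-2).
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CHARS = "10X98765432"


def check_digit(body17: str) -> str:
    """Return the check character for a 17-digit body code."""
    total = sum(int(d) * w for d, w in zip(body17, WEIGHTS))
    return CHECK_CHARS[total % 11]


def pseudonymize_id(id_number: str, key: bytes) -> str:
    """Deterministically replace the sequence code of an 18-digit ID number.

    Area code (digits 1-6) and date of birth (digits 7-14) are kept, the
    sequence code (digits 15-17) is derived from a keyed hash of the whole
    original number, and the check digit is recomputed.
    """
    area, birth = id_number[:6], id_number[6:14]
    digest = hmac.new(key, id_number.encode("ascii"), hashlib.sha256).digest()
    seq = int.from_bytes(digest, "big") % 999 + 1  # 1..999
    body = f"{area}{birth}{seq:03d}"
    return body + check_digit(body)


KEY = b"hypothetical-secret-key"  # placeholder; load a real key from secure storage
original = "11010519491231002X"  # sample ID number with a valid check digit
masked = pseudonymize_id(original, KEY)
print(masked)
```

Which parts of the number to keep is a policy decision: keeping the birth date preserves consistency with other fields, but it also leaves that part of the ID visible.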
1.4 Application scenarios of data desensitization
A common scenario is to desensitize production data (in a database or in files) according to desensitization rules and load the result into a test database or test data file, as shown below:
With the aggregator's SPL, desensitization rules can be defined to suit the business scenario, for example desensitizing the personnel information above (name, ID number, address, phone number, card number and so on) on the fly, without first landing it in intermediate storage.
The aggregator is a framework-free data computing middleware that can be deployed and developed quickly. It can run a finished SPL desensitization script directly for immediate desensitization, and it supports the common desensitization methods, including data replacement, invalidation, randomization, offset and rounding, masking and flexible coding. The methods introduced in this article can be mixed and combined in practice.
The scenarios in this article all use the data in the following table, stored in the file "data desensitization verification table.txt".
1.4.1 Data replacement
Requirement: replace the real value with a fixed fictitious value, for example replace every mobile phone number with 13800013800.
The aggregator SPL script is as follows:
    A1  =file("data desensitization verification table.txt").import@t()
    B1  /Import the text data
    A2  =A1.run(mobile=13800013800)
    B2  /Replace the phone numbers
A1: import the text data of the data desensitization verification table. The mobile phone numbers before desensitization are displayed as follows:
A2: replace all mobile phone numbers with the same value. The run() function assigns 13800013800 to the mobile field of every record. After replacement, the desensitized mobile phone numbers are displayed as follows:
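For comparison, here is the same replacement rule as a plain Python sketch, assuming records are dicts and phone numbers are strings. Unlike run(), which updates the records of A1 in place, this version returns modified copies so the source data stays untouched.

```python
def replace_mobile(records, fixed="13800013800"):
    """Return copies of the records with every mobile number replaced by a fixed value."""
    return [{**record, "mobile": fixed} for record in records]


rows = [
    {"name": "Zhang San", "mobile": "13912345678"},
    {"name": "Li Si", "mobile": "13687654321"},
]
masked = replace_mobile(rows)
print(masked)
```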
1.4.2 Invalidation
Requirement: make sensitive data unusable by truncating, encrypting or hiding it, for example replacing the real address with "*". The effect of invalidation is similar to that of data replacement.
The aggregator SPL script is as follows:
    A1  =file("data desensitization verification table.txt").import@t()
    B1  /Import the text data
    A2  =A1.run(address="*")
    B2  /Invalidate the addresses by hiding them
    A3  =A1.run(address=left(address,3)+"*")
    B3  /Invalidate the addresses by truncating them
A1: import the text data of the data desensitization verification table. The values before desensitization are displayed as follows:
A2: desensitize the address by hiding it. The run() function overwrites the address field directly. After invalidation, the desensitized addresses are displayed as follows:
A3: desensitize the address by truncation. The left() function keeps the leftmost three characters of the original address and appends "*". A2 and A3 are alternative rules: because run() modifies the records of A1 in place, running A3 after A2 would truncate the already hidden value, so apply only one of them to freshly imported data. After truncation, the desensitized addresses are displayed as follows:
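Both invalidation rules can be sketched in Python as pure functions applied to the original values, which also sidesteps the in-place issue noted for A2 and A3. The sample addresses are made up for illustration.

```python
def hide(value: str) -> str:
    """Hiding: the whole value is replaced by '*' (like address="*")."""
    return "*"


def truncate(value: str, keep: int = 3) -> str:
    """Truncation: keep the first `keep` characters and append '*'
    (like left(address,3)+"*")."""
    return value[:keep] + "*"


addresses = ["北京市海淀区中关村大街1号", "上海市浦东新区世纪大道2号"]
# Each rule is applied to the original values, so neither sees the other's output.
hidden = [hide(a) for a in addresses]
truncated = [truncate(a) for a in addresses]
print(hidden, truncated)
```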
1.4.3 Randomization
Requirement: replace real values with random data, keeping the replacements random so that they look like realistic samples, for example replacing real names with randomly generated surnames and given names.
The aggregator SPL script is as follows:
    A1  =file("last name.txt").import@it()
    B1  =file("first name.txt").import@it()
    C1  /Import the external name dictionaries used to generate random names
    A2  =file("data desensitization verification table.txt").import@t()
    B2  /Import the text data
    A3  =A2.run(name=A1(rand(A1.len())+1)+B1(rand(B1.len())+1))
    B3  /Randomize the names
A1-B1: import the external name dictionaries used to replace real names at random. Because the "last name" and "first name" files each contain a single column, the @i option is added to import(): @i returns single-column data as a sequence, so random members can be taken from it directly in cell A3.
A2: import the text data of the data desensitization verification table. The values before desensitization are displayed as follows:
A3: randomize the names. The run() function replaces each name with a random combination of a surname and a given name, using rand() to pick members of the external dictionaries "last name.txt" and "first name.txt"; rand(n) returns a random integer from 0 to n-1, so adding 1 gives a valid position in the 1-based sequence. The randomized names are displayed as follows:
[Note] This example introduces external dictionary tables for desensitization. In practice, any external dictionary can be introduced whenever the desensitization requirements call for it, and real values can be replaced by random combinations of its data.
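The same idea in Python, as a sketch: random.choice picks a dictionary entry directly, so the rand(n)+1 index arithmetic needed for 1-based SPL sequences disappears. The tiny dictionaries are placeholders for real surname and given-name files, and the generator is seeded only to make the example reproducible.

```python
import random


def random_name(surnames, given_names, rng):
    """Combine a randomly chosen surname with a randomly chosen given name."""
    return rng.choice(surnames) + rng.choice(given_names)


surnames = ["张", "王", "李", "赵"]  # stands in for "last name.txt"
given_names = ["伟", "芳", "娜", "强"]  # stands in for "first name.txt"
rng = random.Random(2018)  # seeded only to make runs reproducible

people = [{"name": "张三", "mobile": "13912345678"}, {"name": "李四", "mobile": "13687654321"}]
randomized = [{**p, "name": random_name(surnames, given_names, rng)} for p in people]
print(randomized)
```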
1.4.4 Offset and rounding
Requirement: alter numeric or date/time data by a random offset or by rounding; for example, the time 2018-01-02 8:12:25 becomes 2018-01-02 8:00:00. Offset and rounding keep the data secure while preserving its approximate range, which is valuable in big data environments.
The aggregator SPL script is as follows:
    A1  =file("data desensitization verification table.txt").import@t()
    B1  /Import the text data
    A2  =A1.run(operatetime=string(operatetime,"yyyy-MM-dd HH:00:00"))
    B2  /Round the operation time down to the hour
A1: import the text data of the data desensitization verification table. The operation times before desensitization are displayed as follows:
A2: desensitize the operation time by rounding. The string() function formats each value with the pattern "yyyy-MM-dd HH:00:00", which keeps the hour and discards the minutes and seconds. The desensitized operation times are displayed as follows:
[Note] The desensitized date and time keep the original data characteristics, which makes the desensitized data convenient for later use.
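The script above implements the rounding half of this method. A Python sketch of both halves follows, assuming the operation time is a datetime: rounding down to the hour matches the "yyyy-MM-dd HH:00:00" pattern, while the random shift within ±max_minutes is an illustrative addition with an arbitrarily chosen window.

```python
import random
from datetime import datetime, timedelta


def round_down_to_hour(ts: datetime) -> datetime:
    """Drop minutes and seconds, like formatting with "yyyy-MM-dd HH:00:00"."""
    return ts.replace(minute=0, second=0, microsecond=0)


def random_offset(ts: datetime, max_minutes: int, rng: random.Random) -> datetime:
    """Shift the value by a random whole number of minutes in [-max_minutes, max_minutes]."""
    return ts + timedelta(minutes=rng.randint(-max_minutes, max_minutes))


t = datetime(2018, 1, 2, 8, 12, 25)
rounded = round_down_to_hour(t)
shifted = random_offset(t, 30, random.Random(0))
print(rounded.strftime("%Y-%m-%d %H:%M:%S"))  # 2018-01-02 08:00:00
```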
1.4.5 Masking
Requirement: masking is a powerful way to desensitize part of account data, such as bank card numbers or ID numbers.
The aggregator SPL script is as follows:
    A1  =file("data desensitization verification table.txt").import@t()
    B1  /Import the text data
    A2  =A1.run(idnumber=left(string(idnumber),6)+"*"+right(string(idnumber),4))
    B2  /Mask the ID numbers
A1: import the text data of the data desensitization verification table. The values before desensitization are displayed as follows:
A2: mask the ID number, hiding the date of birth it contains. left() takes the leftmost 6 characters of the ID number, right() takes the rightmost 4, and a "*" between them replaces everything else; the result replaces the original ID number. The desensitized ID numbers are displayed as follows:
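A Python sketch of the same mask follows. keep_length=False reproduces the script's single "*"; keep_length=True (an optional variant, not in the original script) writes one "*" per hidden character so the masked value keeps its length, in the spirit of section 1.3. Values too short to keep any part visible are fully starred, a choice made for this sketch.

```python
def mask_middle(value: str, left: int = 6, right: int = 4, keep_length: bool = False) -> str:
    """Keep `left` leading and `right` trailing characters and mask the rest.

    keep_length=False mirrors the SPL rule (one '*' for the hidden part);
    keep_length=True writes one '*' per hidden character.
    """
    hidden = len(value) - left - right
    if hidden <= 0:
        return "*" * len(value)  # too short to keep any part visible: mask it all
    stars = "*" * hidden if keep_length else "*"
    return value[:left] + stars + value[len(value) - right:]


print(mask_middle("11010519491231002X"))  # 110105*002X
print(mask_middle("11010519491231002X", keep_length=True))  # 110105********002X
```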
1.4.6 Flexible coding
Requirement: when special desensitization rules are needed, custom coding can implement almost any rule, for example replacing the real contract number with fixed letters followed by a fixed number of digits.
The aggregator SPL script is as follows:
    A1  =file("data desensitization verification table.txt").import@t()
    B1  /Import the text data
    A2  =A1.run(contractno="RAQA"+string(year(now()))+mid(string(contractno),9,4)+string(#,"#000000000"))
    B2  /Custom-code the contract numbers
A1: import the text data of the data desensitization verification table. The contract numbers before desensitization are displayed as follows:
A2: desensitize the contract number with a custom code. The coding rule is: the fixed 4-letter code "RAQA" + the current year + the 4 characters starting at position 9 of the original contract number + the record's sequence number (#) formatted as a 9-digit number. The other functions used have been introduced above. The desensitized contract numbers are displayed as follows:
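Spelled out in Python, the coding rule looks like the sketch below. The year is passed in rather than read from the clock so the function is testable, the 1-based mid(s,9,4) becomes the slice s[8:12], and the contract numbers are made-up samples.

```python
from datetime import date


def recode_contract(contract_no: str, row_number: int, year: int) -> str:
    """Build 'RAQA' + year + 4 characters from position 9 (1-based) + 9-digit row number."""
    return "RAQA" + str(year) + contract_no[8:12] + f"{row_number:09d}"


contracts = ["HT2018030512345678", "HT2018030687654321"]  # made-up samples
this_year = date.today().year
# enumerate(..., start=1) plays the role of the record number #.
recoded = [recode_contract(c, i, this_year) for i, c in enumerate(contracts, start=1)]
print(recoded)
```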
1.4.7 Distribution of desensitized data
Aggregator SPL supports distributing desensitized data from file to file, file to database, database to database and database to file, as described below.
1.4.7.1 Text to text
The SPL script that distributes text data to a text file is as follows:
    A1  =file("last name.txt").import@it()
    B1  =file("first name.txt").import@it()
    C1  /Import the external name dictionaries used to generate random names
    A2  =file("data desensitization verification table.txt").cursor@t()
    B2  /Open a cursor on the large text data
    A3  =A2.run(contractno="RAQA"+string(year(now()))+mid(string(contractno),9,4)+string(#,"#000000000"),
            name=A1(rand(A1.len())+1)+B1(rand(B1.len())+1),
            address=left(address,3)+"*",
            mobile=13800013800,
            idnumber=left(string(idnumber),6)+"*"+right(string(idnumber),4),
            operatetime=string(operatetime,"yyyy-MM-dd HH:00:00"))
    B3  /Apply the desensitization rules
    A4  >file("desensitization data result table.txt").export@at(A3)
    B4  /Export directly to a text file
A1-B1: import the external dictionaries "last name" and "first name", whose entries are combined at random to generate names.
A2: open a cursor to read the large data desensitization verification table in batches.
A3: apply the desensitization rules to the data.
A4: export the desensitized data directly to a text file with export(). The @t option writes the field names as the first line (without it, no header line is written), and @a appends to an existing file (without it, the file is overwritten). The desensitization result distributed to the text file is as follows:
[Note] SPL's file handling also supports importing and exporting xls, xlsx, csv and other file types.
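The point of the cursor in A2 is that the whole file never has to fit in memory. A Python sketch of the same streaming file-to-file pass follows, assuming tab-separated text with a header line (as read with @t); only two of the rules are applied, and io.StringIO stands in for the real files.

```python
import csv
import io


def desensitize_row(row: dict) -> dict:
    """Apply two of the rules above to one record (illustrative subset)."""
    out = dict(row)
    out["mobile"] = "13800013800"
    out["address"] = out["address"][:3] + "*"
    return out


def file_to_file(src, dst, delimiter="\t"):
    """Stream records from src to dst one at a time, writing a header line first."""
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    for row in reader:  # only the current record is held in memory
        writer.writerow(desensitize_row(row))


src = io.StringIO("name\tmobile\taddress\n张三\t13912345678\t北京市海淀区\n")
dst = io.StringIO()
file_to_file(src, dst)
print(dst.getvalue())
```

With real files, open both with newline="" and an explicit encoding, as the csv module documentation recommends.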
1.4.7.2 Text to database
The SPL script that distributes text data to a database (using MySQL as the example) is as follows:
    A1  =file("last name.txt").import@it()
    B1  =file("first name.txt").import@it()
    C1  /Import the external name dictionaries used to generate random names
    A2  =file("data desensitization verification table.txt").cursor@t()
    B2  /Open a cursor on the large text data
    A3  =A2.run(contractno="RAQA"+string(year(now()))+mid(string(contractno),9,4)+string(#,"#000000000"),
            name=A1(rand(A1.len())+1)+B1(rand(B1.len())+1),
            address=left(address,3)+"*",
            mobile=13800013800,
            idnumber=left(string(idnumber),6)+"*"+right(string(idnumber),4),
            operatetime=string(operatetime,"yyyy-MM-dd HH:00:00"))
    B3  /Apply the desensitization rules
    A4  =connect("MySQL")
    B4  /Connect to the MySQL data source
    A5  >A4.update(A3,personinfo,code,contractno,name,address,mobile,idnumber,operatetime;code)
    B5  /Write the desensitized data directly to the database
    A6  >A4.close()
    B6  /Close the database connection
A1-A3: same as in the previous example.
A4: connect to the MySQL data source with connect(). Clicking cell A4 shows the MySQL connection information directly; for configuration details, see the relevant chapters of the database configuration tutorial.
A5: update the "personinfo" table of the MySQL database. update() writes the cursor data of cell A3 into the table, with code as the key field. The result, viewed with a database tool, is as follows:
A6: close the MySQL connection established in A4 with close().
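As a rough Python analogue, with SQLite standing in for MySQL so the sketch is self-contained: INSERT OR REPLACE keyed on code approximates an insert-or-update write. The three-column table is a hypothetical subset of personinfo.

```python
import sqlite3

COLUMNS = ["code", "name", "mobile"]  # hypothetical subset of the personinfo fields


def upsert_rows(conn: sqlite3.Connection, rows, table: str = "personinfo") -> None:
    """Write rows keyed on `code`: an existing code is replaced, a new one inserted."""
    sql = (f"INSERT OR REPLACE INTO {table} ({', '.join(COLUMNS)}) "
           f"VALUES ({', '.join('?' for _ in COLUMNS)})")
    # executemany accepts any iterable of parameter tuples, so rows may be a generator.
    conn.executemany(sql, (tuple(row[c] for c in COLUMNS) for row in rows))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE personinfo (code TEXT PRIMARY KEY, name TEXT, mobile TEXT)")
conn.execute("INSERT INTO personinfo VALUES ('c1', '张三', '13912345678')")

desensitized = [
    {"code": "c1", "name": "王伟", "mobile": "13800013800"},  # replaces c1
    {"code": "c2", "name": "李芳", "mobile": "13800013800"},  # inserts c2
]
upsert_rows(conn, desensitized)
print(conn.execute("SELECT * FROM personinfo ORDER BY code").fetchall())
```

Note that INSERT OR REPLACE deletes and re-inserts a conflicting row, which differs from an in-place UPDATE when other columns or triggers are involved.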
1.4.7.3 Database to database
The SPL script that distributes database data to a database (using MySQL as the example) is as follows:
    A1, B1  (same as above: the "last name" and "first name" dictionaries used to generate random names)
    A2  =connect("MySQL")
    B2  /Connect to the MySQL data source
    A3  =A2.cursor("select * from personinfo_copy")
    B3  /Open a cursor on the personinfo_copy table to be desensitized
    A4  (the same run() expression as cell A3 above, applied to the cursor in A3, i.e. =A3.run(...))
    B4  /Apply the desensitization rules
    A5  >A2.update(A4,personinfo_copy_test,code,contractno,name,address,mobile,idnumber,operatetime;code)
    B5  /Write the desensitized data directly to the personinfo_copy_test table
    A6  >A2.close()
    B6  /Close the database connection
A1-B1: same as above.
A2: connect to the MySQL data source.
A3: open a cursor on the MySQL table "personinfo_copy", which holds the data to be desensitized. The table data is as follows:
A4: same as above.
A5: update the "personinfo_copy_test" table of the MySQL database. update() writes the desensitized cursor data of cell A4 into the table. The result is as follows:
A6: close the MySQL connection established in A2 with close().
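The batch-wise read, transform and write loop behind A3 to A5 can be sketched in Python with sqlite3: fetchmany plays the role of the cursor, reading a bounded batch at a time. The table layout, the transform and the batch size are assumptions, and SQLite again stands in for MySQL.

```python
import sqlite3


def desensitize(row):
    """Illustrative rule: keep the surname, fix the mobile number."""
    code, name, mobile = row
    return (code, name[:1] + "*", "13800013800")


def copy_desensitized(conn, src, dst, batch_size=1000):
    """Read src through a cursor in batches, desensitize each row, write to dst."""
    cur = conn.execute(f"SELECT code, name, mobile FROM {src} ORDER BY code")
    while True:
        batch = cur.fetchmany(batch_size)  # at most batch_size rows in memory
        if not batch:
            break
        conn.executemany(f"INSERT OR REPLACE INTO {dst} VALUES (?, ?, ?)",
                         [desensitize(r) for r in batch])
    conn.commit()


conn = sqlite3.connect(":memory:")
for table in ("personinfo_copy", "personinfo_copy_test"):
    conn.execute(f"CREATE TABLE {table} (code TEXT PRIMARY KEY, name TEXT, mobile TEXT)")
conn.executemany("INSERT INTO personinfo_copy VALUES (?, ?, ?)",
                 [("c1", "张三", "13912345678"), ("c2", "李四", "13687654321")])
copy_desensitized(conn, "personinfo_copy", "personinfo_copy_test", batch_size=1)
print(conn.execute("SELECT * FROM personinfo_copy_test ORDER BY code").fetchall())
```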
1.4.7.4 Database to text
The SPL script that distributes database data to a text file (using MySQL as the example) is as follows:
    A1, B1  (same as above: the "last name" and "first name" dictionaries used to generate random names)
    A2  (same as above: =connect("MySQL"), connect to the MySQL data source)
    A3  (same as above: open a cursor on the personinfo_copy table to be desensitized)
    A4  (same as cell A4 above: apply the desensitization rules)
    A5  >file("desensitization data result table.txt").export@at(A4)
    B5  /Export directly to a text file
    A6  >A2.close()
    B6  /Close the database connection
A1-A4: same as above.
A5: distribute the desensitized database (MySQL) data directly to a text file. The result is the same as above.
A6: close the MySQL connection established in A2 with close().
1.5 Example: a desensitized data report query
Next, building on the desensitization methods described above, we implement a report query whose desensitization can be configured dynamically. The general process is as follows:
1.5.1 Writing the aggregator SPL desensitization script
The desensitized report query uses the text data in "data desensitization verification table.txt" above. The script is as follows:
    A1  =file("last name.txt").import@it()
    B1  =file("first name.txt").import@it()
    C1  /Import the external name dictionaries used to generate random names
    A2  func
    B2  /Desensitize a value with the rule from the configuration file
    B3  =file("data desensitization rule configuration.ini").property(A2(2))
    B4  if type=="type2"
    C4  =eval(B3,"A1","A1","B1","B1")
    D4  /Dynamically replace the "?" values of the special rule
    B5  =eval(B3,A2(1))
    C5  /Dynamically replace the "?" value of general rules
    B6  return ${B3}
    A7  =file("data desensitization verification table.txt").cursor@t()
    A8  if type!=0
    B9  =A7.run(contractno=func(A2,[contractno,"type1"]),
            name=func(A2,[name,"type2"]),
            address=func(A2,[address,"type3"]),
            mobile=func(A2,[mobile,"type4"]),
            idnumber=func(A2,[idnumber,"type5"]),
            operatetime=func(A2,[operatetime,"type6"]))
    C9  /Apply the desensitization rules
    A10 return if(type!=0,B9,A7)
    B10 /The parameter type controls whether the data is desensitized (0: no desensitization)
A1-B1: import the external dictionaries "last name" and "first name", whose entries are combined at random to generate names.
A2: define a subroutine. func defines a general rule-processing subroutine that desensitizes a value with the rules from the configuration file; it can be reused for different fields according to their characteristics and business requirements. For more about subroutines, see Aggregator > Tutorials > Advanced Code > Subroutine.
B3: read the rule from the configuration file. The property() function reads the value of the rule named by the type passed in A2(2) from the property file "data desensitization rule configuration.ini".
B4-B5: parse the rule from the configuration file dynamically so it can be applied to the corresponding field. The eval() function parses and evaluates an expression at run time, replacing the "?" placeholders in the rule from the configuration file (*.ini). The type value decides the replacement: in general rules, "?" is replaced with the value passed to the main cell (A2) of the func subroutine, while the type2 rule, which uses the external dictionaries, is handled separately and its "?" placeholders are replaced with the dictionary cells. The resulting expression is then evaluated to desensitize the field.
B6: use a macro to evaluate the expression dynamically and return the result. return substitutes the rule expression held in B3 through the ${} macro and returns the result to the caller in cell B9.
A7: open a cursor on the source production data, which has not been desensitized.
A8: decide from the grid parameter type whether to desensitize (type=0: no desensitization). If desensitization is required, B9 desensitizes the source production data.
B9: apply the desensitization rules by calling the subroutine func defined in main cell A2.
A10: return the desensitized or the original data, depending on the type value.
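The structure of this script, a reusable rule subroutine selected by type plus a type switch that bypasses it, can be sketched in Python. The rule table, the field mapping and the subset of rules are assumptions; the sketch uses plain functions instead of reproducing the SPL eval() and macro mechanism.

```python
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical rule table: rule type -> function (cf. "type1".."type6" in B9).
RULES: Dict[str, Callable[[str], str]] = {
    "type3": lambda v: v[:3] + "*",  # address truncation
    "type4": lambda v: "13800013800",  # fixed mobile number
    "type5": lambda v: v[:6] + "*" + v[-4:],  # ID number mask
}

# Field -> rule type, like the func(A2,[field,"typeN"]) calls.
FIELD_RULES = {"address": "type3", "mobile": "type4", "idnumber": "type5"}


def desensitize_value(value: str, rule_type: str) -> str:
    """The reusable subroutine: look up the rule by type and apply it."""
    return RULES[rule_type](value)


def query(rows: Iterable[dict], type_param: int) -> Iterator[dict]:
    """type_param == 0 returns the rows unchanged; any other value desensitizes them."""
    for row in rows:
        if type_param == 0:
            yield row
        else:
            yield {
                field: desensitize_value(v, FIELD_RULES[field]) if field in FIELD_RULES else v
                for field, v in row.items()
            }


row = {"name": "张三", "address": "北京市海淀区", "mobile": "13912345678",
       "idnumber": "11010519491231002X"}
plain = list(query([row], 0))
masked = list(query([row], 1))
print(masked)
```

Because query() is a generator, it also keeps the cursor-like, one-record-at-a-time behavior of A7 and B9.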
Next, set a parameter "type" under Program > Grid parameters in the aggregator designer. It receives the value passed from the report and controls access to desensitized data.
At this point the aggregator SPL script is written and set up; the next step is to create the file "data desensitization rule configuration.ini".
1.5.2 Desensitization rule configuration file
The file "data desensitization rule configuration.ini" supplies the field desensitization rules for the aggregator SPL script. This separates the rules from the script, so they can be customized without modifying it. The configuration could also be stored in a database to provide global rule management. The contents of the configuration file are as follows:
Configuration file notes: the desensitization rules are custom-configured and are parsed and substituted dynamically with the eval() function. In general, "?" in a type rule refers to the value passed to the main cell of the func subroutine; the type2 rule is special and needs a separate check to replace its "?".
[Note] The aim here is to offer one way of configuring desensitization rules for maximum reuse and flexible invocation, so that rules do not have to be defined and written again for similar fields. In practice, programmers can customize the configuration to their needs.
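The same separation of rules from code can be sketched in Python with the standard configparser module. Because the actual configuration file contents are not reproduced here, the INI format below is invented for illustration, and the design deliberately differs from the SPL version: rules are small operation specs interpreted by the program rather than expressions passed to eval(), so the configuration file cannot execute arbitrary code.

```python
import configparser

# Hypothetical rule file. Each rule is "operation:arguments" rather than an
# expression, so applying it needs no eval().
RULE_FILE = """
[rules]
type3 = truncate:3
type4 = fixed:13800013800
type5 = mask:6:4
"""


def build_rule(spec: str):
    """Turn one rule specification into a function on strings."""
    op, *args = spec.split(":")
    if op == "fixed":
        return lambda value: args[0]
    if op == "truncate":
        keep = int(args[0])
        return lambda value: value[:keep] + "*"
    if op == "mask":
        left, right = int(args[0]), int(args[1])
        return lambda value: value[:left] + "*" + value[len(value) - right:]
    raise ValueError(f"unknown rule operation: {op!r}")


parser = configparser.ConfigParser(delimiters=("=",))  # ":" is used inside values
parser.read_string(RULE_FILE)
rules = {name: build_rule(spec) for name, spec in parser["rules"].items()}
print(rules["type5"]("11010519491231002X"))  # 110105*002X
```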
1.5.3 Preparing the report template
Develop a report template with Raqsoft Report V2018 (the latest version at the time of writing), and use the parameter "type" (which corresponds to the grid parameter of the aggregator SPL script) to control whether the report is desensitized.
Set the aggregator SPL script as the report's dataset "ds1": select the corresponding dfx script and configure the expression for the type parameter, as follows:
The finished report template "report data desensitization.rpx" is as follows:
[Note] The aggregator dataset called here returns a cursor, so it must be set as a big dataset under Report Properties > General; this feature requires a report license that includes the aggregator.
1.5.4 Publishing the desensitized data report
Start the web service directly in the report designer and browse the report in a browser. With the parameter type set to "0" (no desensitization), the report displays the following data:
With type set to any value other than "0", the report displays the following data:
1.5.5 Summary of the desensitized data report query
This desensitized data report query example has the following four characteristics:
1) Source data is desensitized directly, then queried and displayed on the report's web side.
Instead of distributing desensitized data into a database or file as conventional desensitization does, the aggregator SPL script desensitizes the data directly. Combined with the report's asynchronous loading of big datasets, this makes immediate desensitized queries over big data possible and removes the intermediate steps of desensitizing the source data, storing it in a target, and displaying it from that target.
2) The step of building a new desensitized database is eliminated, reducing the desensitization workload.
This helps with old projects or special cases, for example where data is stored in plaintext but desensitized tables cannot be distributed or newly created: extracting the plaintext data and desensitizing it directly avoids building a new desensitized database and reduces the overall workload.
3) Desensitization rules are custom-configurable.
The rule file can be configured flexibly to meet different rule requirements.
4) Whether desensitization is applied can be controlled dynamically.
Passing the parameter value dynamically, according to each platform user's permission to view data, controls whether the data is desensitized. On the one hand, this prevents data leakage and secures the data at the lowest level; on the other, it still gives highly privileged users a way to view sensitive data.
© 2024 shulou.com SLNews company. All rights reserved.