2025-03-10 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Invention patent technology
Structured big data communication protocol
Inventor: Fan Yongzheng
269779216@qqqq.com
Technical field
The structured big data communication protocol is not only a communication protocol but also a technique for turning data into qualified structured big data. It is comparable to ETL: ETL deals with data problems after existing information systems have produced them, whereas the structured big data communication protocol prevents those problems from the moment an information system is designed. ETL treats the diseases of data; the structured big data communication protocol prevents them. ETL is a minor patch over problems caused by existing technology, while the structured big data communication protocol proposes a new data processing scheme. The protocol is also a software development model: every information system built with it is a big data information system, and as long as the data in each such system is mirrored to the big data center, it joins a qualified structured big data collection. Qualified structured big data is structured data that can be mined efficiently without ETL conversion.
Background technology
With the arrival of the big data era, people have found that although every industry has many information systems, those systems cannot meet the era's needs: information islands are serious, interconnection is difficult, and data is hard to share. Every industry also holds a great deal of data, yet that data is hard to mine efficiently. Relational databases are currently used to attack these problems, but they solve them only locally, not fundamentally. The structured big data communication protocol was created to solve them. It originated from imitating the memory, association, and thinking of the brain, beginning in 1982, when the inventor wanted computers to imitate the brain's associative function.
Content of the invention
The structured big data communication protocol avoids information islands and the difficulties of interconnection and data sharing by optimizing the data and changing the software development model, and it makes data mining easy. The protocol gives data 12 technical characteristics: uniqueness, attribution, identifiability, independence, integrity, standardization, zero coupling with the system, unity of structure, accumulability, portability, timeliness, and authenticity. Only data that meets all 12 technical characteristics at once is qualified structured big data.
Technical problems to be solved by the invention
The technical problems the invention solves are the "Variety" and "Velocity" of big data's 4 Vs. Concretely: every industry has many information systems, but they cannot meet the needs of the big data era; information islands are serious, interconnection is difficult, and data is hard to share. Every industry holds a great deal of data, but it is hard to mine efficiently.
Beneficial effects
Interconnection is realized, data sharing is easy, queries are fast, and data mining is easy.
Specific implementation
The innovation of the structured big data communication protocol is reflected in the following five aspects:
1. The 12 technical characteristics of structured big data are put forward for the first time. Only data that meets all 12 characteristics at once becomes qualified structured big data. To make data meet them, 12 corresponding data optimization methods are established.
2. The basis of communication is that both parties adopt the same protocol. The 12 technical characteristics put forward by the structured big data communication protocol are the "communication protocol" for interconnecting structured data.
3. Data items reflecting the "uniqueness" and "attribution" of data are added to every piece of structured big data. Existing database technology was built for small data and never considered the role of these two data items, so existing data lacks them. They are the key data items that indicate whether a piece of data is qualified structured big data.
4. Special emphasis is placed on the standardization of data, because in a big data environment standardized data can automatically imitate the associative function of the brain, greatly improving the speed and flexibility of queries. A relational database imposes no restrictions on data, leaving everything to the database's designers; the structured big data communication protocol restricts data very strictly and never allows designers to define data arbitrarily. All data must be standardized, which is also an important measure for making big data easy to mine.
5. The 12 technical characteristics of structured big data are used to guarantee big data's authenticity. Small data is used only within one unit, while big data is used among many units, so big data's authenticity, notarization, authority, and non-repudiation are very important.
When optimizing data, the structured big data communication protocol stores it in the "universal data structure table" (Table 1), which can hold all kinds of structured data in a single table.
Table 1: Example of data stored in the universal data structure table

| ID | Thing code name | Thing attribute | Attribute value | Extra-long attribute value | Unit | Attachment | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1099 | 1280 | Data source | Guangzhou First Hospital | | | | 2014.5.3 |
| 1100 | 1280 | Thing classification | Medical record | | | | 2014.5.3 |
| 1101 | 1280 | Thing classification | Medical history of hospitalization | | | | 2014.5.3 |
| 1102 | 1280 | Thing classification | Medical expenses | | | | 2014.5.3 |
| 1103 | 1280 | ID card number | XXXXXXXXXX | | | | 2014.5.3 |
| 1104 | 1280 | Hospitalization number | XXXXXXXXXX | | | | 2014.5.3 |
| 1105 | 1280 | Name | Zhang San | | | | 2014.5.3 |
| 1106 | 1280 | Gender | Male | | | | 2014.5.3 |
| 1107 | 1280 | Traditional Chinese medicine fee | 56 | | Yuan | | 2014.5.3 |
| 1108 | 1280 | Western medicine fee | 72 | | Yuan | | 2014.5.3 |
| 1109 | 1280 | Other expenses | 180 | | Yuan | | 2014.5.3 |
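The universal data structure table is, in conventional database terms, an entity-attribute-value (EAV) layout. A minimal sketch of it in Python with the standard-library sqlite3 module follows; the table name, column names, and sample rows are illustrative choices, not part of the patent:

```python
import sqlite3

# The universal data structure table is one wide EAV table; every row is a
# single (thing, attribute, value) fact, mirroring Table 1 above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE universal (
        id         INTEGER PRIMARY KEY,
        thing_code TEXT,  -- code name of the thing the fact belongs to
        attribute  TEXT,  -- attribute name, in natural language
        value      TEXT,  -- attribute value, in natural language
        unit       TEXT,  -- measurement unit, e.g. 'Yuan'
        time       TEXT
    )
""")

rows = [
    (1099, "1280", "Data source", "Guangzhou First Hospital", None, "2014.5.3"),
    (1100, "1280", "Thing classification", "Medical record", None, "2014.5.3"),
    (1107, "1280", "Traditional Chinese medicine fee", "56", "Yuan", "2014.5.3"),
    (1108, "1280", "Western medicine fee", "72", "Yuan", "2014.5.3"),
]
conn.executemany("INSERT INTO universal VALUES (?, ?, ?, ?, ?, ?)", rows)

# Adding a new kind of attribute needs no schema change, only another row.
fees = conn.execute(
    "SELECT attribute, value, unit FROM universal WHERE unit = 'Yuan' ORDER BY id"
).fetchall()
```

Because every attribute is its own row, new fields never require altering the table, which is what lets one structure hold "all kinds of structured data."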
Description 1: the 12 technical characteristics of qualified structured big data and the 12 data optimization methods
Qualified structured big data has 12 technical characteristics; in other words, only structured data that meets all 12 at once is qualified structured big data. The structured big data communication protocol is a method for making structured data meet them, and it puts forward 12 corresponding data optimization methods.
1. Uniqueness of data
Uniqueness of data: every piece of data about the same thing should remain unique and identifiable throughout its life cycle and across different information systems, and should not become unidentifiable because time or space changes.
The problem uniqueness addresses: today, data about the same thing is expressed differently in different information systems, so it is hard to identify accurately during big data mining. For example, the same commodity has different codes in different dealers' systems; the same patient receives a different hospitalization number in each hospital, so when the patient's medical history is queried in a big data environment, the related data carries no unified identification code and is hard to find.
Data optimization method 1: all data about the same thing must contain a unique, unified big data identification code across time, space, and environments. The big data identification code is to data what an ID card number is to a person or a license plate is to a vehicle. It differs essentially from the ID in a relational database: an ID identifies data only within one table, while a big data identification code identifies data within the whole scope of big data.
Big data scope: different big data involves different scopes. In international trade the scope is global; for national medical big data it is the medical industry; for Guangzhou big data it is Guangzhou.
Big data identification codes come in two kinds. One identifies a specific thing, like a device serial number, yet differs essentially from it: a serial number is assigned by the enterprise itself, while a big data identification code must be coded according to unified international standards. The other identifies a kind of thing. For example, to learn the sales of a certain phone model across dealers, you need that model's big data identification code, because the model is sold by hundreds of thousands of dealers worldwide and the manufacturer must interconnect with all their systems. All data about people should contain an ID card number, so that data about a person remains unique and identifiable on a global scale and at any time. Big data spans many different information systems, while small data lives in a single one, so in a big data environment the uniqueness of data is critical: without a unified, standard identification code, data mining becomes very difficult. The uniqueness of data is the foundation of big data mining and analysis, and the big data identification code must make classification and statistics convenient.
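The patent does not specify a concrete coding scheme. Purely as an illustration of how a hierarchical code can make classification and statistics convenient, one might compose the code from standard segments (region, industry, thing type, serial number); every name and segment below is an assumption, not the international standard the protocol calls for:

```python
def make_big_data_id(region: str, industry: str, kind: str, serial: int) -> str:
    """Compose a hierarchical identification code from standard segments.
    The segment scheme here is illustrative only, not the international
    standard the protocol requires."""
    return f"{region}-{industry}-{kind}-{serial:06d}"

code = make_big_data_id("CN44", "MED", "PATIENT", 1280)
# Shared prefixes make classification and statistics convenient:
# every medical thing in region CN44 starts with 'CN44-MED-'.
```

A code built this way can be grouped and counted by prefix, which is one plausible reading of the requirement that the identification code "make classification and statistics convenient."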
2. Attribution of data
Attribution of data: data should reflect not only the attributes of things but also who owns it (or who collected it, or where it came from).
Data optimization method 2: every thing's data should contain a "data source" data item, which is the structured expression of attribution. In general the unit name can represent the data source.
Big data comes from thousands of units; if the data source is not marked, identification becomes confused during big data mining.
3. Identifiability of data
Identifiability of data: the data can be identified both by information systems and by people. Furthermore, it must be identifiable not only to one's own information system but also to other people's systems, and not only to oneself but also to others.
The problem identifiability addresses: data in a relational database can be identified only by the database's designer and his own information system. Other people and other systems can identify it only after software interprets, annotates, and translates it.
Data optimization method 3: make data identifiable through appropriate redundancy; express it in standard natural language as far as possible and avoid codes. The guiding principle of optimization: "technicians in the relevant field can understand it, and other people's information systems can recognize it, not just the database's designers and not just one's own system."
In a big data environment, identifiability is one of the most important and critical characteristics of data. One strategy of relational databases is to minimize data redundancy, but in reducing redundancy they also increase the difficulty of identifying data. The strategy of the structured big data communication protocol is the opposite: make data identifiable through appropriate redundancy, so that it can be read by others and recognized by other people's information systems.
A relational database binds data tightly to data structures, programs, and the database system. Data in a relational database becomes meaningless once separated from its specific table structure and programs; it is meaningful only inside the specific table.
The universal data structure table, by contrast, is a data structure that is independent of any program. After data in the universal data structure table is separated from its structure, its true meaning stays the same, because the data is expressed in standard natural language: anyone who understands the language can understand the data's true meaning.
On the surface, reducing data redundancy is one of the great advantages of relational databases. It is also one of their biggest shortcomings: in reducing redundancy they distort data, and that distortion produces problems such as difficult information exchange, information islands, and hard data mining. In a relational database, distortion can be repaired only by writing a great many programs. Experience shows that relational databases pay a very high price for low redundancy. When data and programs are inseparable, large amounts of code must be written just to store, read, and query data. When data is independent of programs, one general program suffices, and others can easily store, read, and query data with it, without developing a mass of software for each database.
A principle of the structured big data communication protocol: largely ignore data redundancy, trade space for intelligence and ease of use, and let the data speak for itself rather than letting programs speak for it. Relational data, in contrast, speaks through application programs. Replace programs with data: the protocol would rather add a great deal of redundancy in order to make data independent, complete, and identifiable. When a relational database is used to design an information system, programs always interpret the data in the database, with the serious consequence that handling data requires writing many programs and nothing can be done without programming.
The strategy of the structured big data communication protocol: at all costs, let the data speak for itself, and put an end to using programs as translators.
"Let the data speak for itself" means that a piece of data expresses the same complete meaning independently, no matter where or in what environment it appears. In the big data era the same data appears in different information systems, so it must carry the same meaning in all of them. The protocol gives data independence, integrity, identifiability, uniqueness, and attribution precisely so that the data can speak for itself; in a big data environment this greatly reduces the amount of code that must be written. Data in a relational database is neither independent nor complete; it cannot speak for itself and needs various "relationships" to express a complete meaning. With the structured big data communication protocol the data speaks for itself, while relational data must be fitted out with a whole extended family of "relationships" to express its meaning accurately.
That extended family of "relationships": the data is tightly related to the database system, to the table structure, to the application programs, and to many other tables in the database. Data in a relational database is meaningful only with the support of the database system, data structures, data types, and applications; separated from them, it becomes meaningless. The problems of today's information systems, such as information islands, information exchange, data interfaces, interconnection, and system upgrades, are all caused by relational data's inability to speak for itself.
When an electronic medical record system is designed with a relational database, the "basic condition of the patient" takes the following form:
Table 2: Patient profile table (a table in a relational database)

| ID | HZXM | GZDW | ZB | XB | ZZ | NL | RQ | HF | BXRQ | MZ | CSZ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 26 | Hu Feng | Rubber factory | Worker | 0 | No. 2 Mongolia Road | 32 | 1991-4-3 | Already | 1991-4-3 | Han | Myself |
The form above is the classic data structure of the small data era. In fact the field names carry important information too, and they must be described in standard natural language. After optimization by the structured big data communication protocol, the "basic condition of the patient" is expressed in the universal data structure table as follows:
Table 3: Basic condition of the patient (universal data structure table)

| ID | Thing code name | Thing attribute | Attribute value | Extra-long attribute value | Unit | Attachment | Time |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 100 | 1001 | Data source | Shanghai First Hospital | | | | |
| 101 | 1001 | Thing classification | Medical record | | | | |
| 102 | 1001 | Thing classification | Medical history of hospitalization | | | | |
| 103 | 1001 | Thing classification | Admission medical history | | | | |
| 104 | 1001 | Thing classification | Basic condition of the patient | | | | |
| 105 | 1001 | Patient number | SH10-199103Z21 | | | | |
| 106 | 1001 | Health card number | XXXXXXXXXXXX09 | | | | |
| 107 | 1001 | ID card number | XXXXXXXXXXXXXX | | | | |
| 108 | 1001 | Name | Hu Feng | | | | |
| 109 | 1001 | Work unit | Shanghai Rubber Factory | | | | |
| 110 | 1001 | Job classification | Worker | | | | |
| 111 | 1001 | Gender | Female | | | | |
| 112 | 1001 | Address | No. 20 Mongolia Road | | | | |
| 113 | 1001 | Age | 32 | | | | |
| 114 | 1001 | Date of admission | 1991-4-30 | | | | |
| 115 | 1001 | Marriage or not | Married | | | | |
| 116 | 1001 | Date of taking of medical history | 1991-4-30 | | | | |
| 117 | 1001 | Ethnic group | Han | | | | |
| 118 | 1001 | Disease narrator | Myself | | | | |
Comparing the two tables shows that the information in the universal data structure table is undistorted information expressed entirely in natural language; its meaning is the same wherever it is placed.
On the surface, the universal data structure table occupies roughly twice the storage space, but storing data this way eliminates a great deal of complex extraction and transformation work. The "redundancy" in the universal data structure table exists to let the data speak for itself, so that it does not depend on the database system, data structures, data types, or applications. The protocol's strategy is to trade space for intelligence and ease of use. Hard disk capacity has grown more than 100,000-fold over the past 30 years, so the cost of roughly doubling storage is low enough to ignore. "Letting the data speak for itself" means letting data express its meaning exactly and correctly, like natural language, without annotation or application-level interpretation.
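The optimization from Table 2 to Table 3 can be sketched as a mechanical transformation: each relational column becomes one self-describing row, with the abbreviated field name replaced by its natural-language meaning. The field-name and gender-code mappings below are assumptions inferred from the two tables, not mappings given by the patent:

```python
# Assumed decoding of Table 2's abbreviated field names and gender codes;
# both mappings are inferred from the tables, not specified by the patent.
FIELD_NAMES = {"HZXM": "Name", "GZDW": "Work unit", "XB": "Gender", "NL": "Age"}
GENDER_CODES = {"0": "Female", "1": "Male"}

def to_universal(thing_code, record, start_id):
    """Flatten one relational record into self-describing universal rows."""
    rows = []
    for offset, (field, value) in enumerate(record.items()):
        attribute = FIELD_NAMES.get(field, field)  # spell out the field name
        if field == "XB":                          # store the value, not the code
            value = GENDER_CODES.get(str(value), str(value))
        rows.append((start_id + offset, thing_code, attribute, str(value)))
    return rows

patient = {"HZXM": "Hu Feng", "GZDW": "Rubber factory", "XB": "0", "NL": 32}
rows = to_universal("1001", patient, 108)
```

After this transformation each row carries its own meaning in natural language, which is exactly the "redundancy" the protocol accepts in exchange for data that speaks for itself.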
4. Independence of data
Independence of data: data expresses a definite meaning on its own, without depending on the database system, data structures, annotations, or applications.
The problem independence addresses: data in a relational database is not independent; its meaning must be interpreted with the help of annotations, data structures, and applications. The field names of many relational tables are non-standard letter abbreviations, and when the data is shown to users, the information system must attach headers to the tables to convey its true meaning.
Data optimization method 4: accept a certain amount of redundancy so that the data speaks for itself and "expresses a definite meaning independently of the database system, data structures, annotations, and applications." As Table 3 shows, the universal data structure table achieves data independence.
5. Data integrity
Integrity of data: data expresses a complete meaning without depending on the database system, data structures, annotations, or applications.
The problem integrity addresses: data in a relational database is not complete; its full meaning must be interpreted with the help of annotations, data structures, and applications.
Data optimization method 5: accept a certain amount of redundancy so that the data speaks for itself and expresses a complete meaning independently of the database system, data structures, annotations, and applications. As Table 3 shows, the universal data structure table achieves data integrity.
6. Standardization of data
Standardization of data: data should be standard, normalized, unified, and unambiguous.
The problem standardization addresses: data mining is very difficult today because the data in various information systems is non-standard.
Data optimization method 6: ensure that data is standardized during information system design and data collection.
Standardization must be built on international, national, and industry big data standards, not on the internal standards of a single unit. Only data that meets international, national, and industry big data standards is qualified structured big data. The current problem is that each unit's data norms are formulated by the unit itself and differ from everyone else's, and no international, national, or industry big data standards exist; this is a major obstacle to big data's development. With standards in place, and implementation that follows them, ETL is no longer needed when mining big data.
How the standardization of structured big data is embodied: it must be considered when the information system is designed, and when data is collected or generated it must be entered and produced in strict accordance with international, national, and industry big data standards. Only then is the data the system generates standard data.
Standardizing data across industries is an enormous engineering project, but only by completing it can the standardization of structured big data be guaranteed. Data standardization is the foundation of big data: without it there is no qualified big data. For big data projects, standards come first. From this point of view, because industries at home and abroad have not yet standardized their data, no qualified big data exists yet.
Information system names, database names, table names, field names, and the data itself should use standard, unified natural language and avoid non-standard codes as far as possible. This is the key to data naturally forming "associative relationships" and to realizing universal queries, and it is a very important reason why the structured big data communication protocol advocates standardization. In a big data environment these associative relationships make data mining far more convenient and greatly improve query speed.
Relational database theory places essentially no restrictions on data; everything is defined at the designer's will, which is a fundamental reason why mining data in relational databases is so difficult. The structured big data communication protocol imposes strict requirements and restrictions: data must be standard, normalized, and unified, must meet the 12 technical characteristics, and must comply with international, national, and industry standards. Designers are strictly forbidden to define data arbitrarily. Like general mechanical parts, data must be standardized.
Big data standards involve every industry and every kind of business: data standards, data structure standards, business standards, business process standards, information system standards, and so on.
In the big data era, information systems must adopt unified, standard natural language and avoid codes as far as possible. This is a necessary measure for ensuring data independence, integrity, and identifiability, and for reducing the coupling between data and the system.
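When attributes are stored as standardized natural language, an "associative" query is simply a text match on the attribute or value columns, with no per-table schema knowledge. A sketch with sqlite3, using illustrative names and a few rows modeled on Tables 3 and 4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE universal (id INTEGER, thing_code TEXT, attribute TEXT, value TEXT)"
)
conn.executemany("INSERT INTO universal VALUES (?, ?, ?, ?)", [
    (100, "1001", "Data source", "Shanghai First Hospital"),
    (108, "1001", "Name", "Hu Feng"),
    (10006, "52367", "Name", "Emperor Wu of the Han Dynasty"),
    (10013, "52367", "Administrator", "Zhang San"),
])

def things_with(conn, attribute, value):
    # One query shape serves every kind of thing; nothing here knows
    # whether the thing is a patient or a penguin.
    cur = conn.execute(
        "SELECT DISTINCT thing_code FROM universal "
        "WHERE attribute = ? AND value = ?",
        (attribute, value),
    )
    return [row[0] for row in cur]

matches = things_with(conn, "Name", "Hu Feng")
```

Because the attribute names are standardized text rather than per-table column names, the same two-parameter query associates across all kinds of things at once.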
7. Coupling of data and system
Coupling between data and system: the higher the coupling, the more the data depends on the system; once highly dependent data leaves its original system, it becomes meaningless. If users can read a piece of data without any interpretation by an information system, the coupling between that data and the system is zero.
The problem coupling addresses: the coupling between relational data and its information system is very high. Relational data is tightly bound to the database system, the data structure, and the application programs; once it leaves the original system for a big data environment, it becomes meaningless.
Data optimization method 7: ensure that the coupling between every piece of data and the information system is zero. Use appropriate redundancy to give data independence, integrity, identifiability, standardization, uniqueness, and attribution, and on that basis guarantee that every piece of data has zero coupling with the information system.
Big data comes from the systems of thousands of units, so it must consist of data with zero coupling to those systems; otherwise many applications must be written to interpret it, raising the difficulty and cost of processing. Articles written in natural language can be read directly by the relevant professionals without any interpretation by an information system, so their coupling with the system is zero. Big data runs to hundreds of billions of records; if each record were even slightly coupled to its system, a massive amount of interpretation code would be needed, whereas if every record has zero coupling, no interpretation programs are needed at all.
Relational database designers habitually represent data in codes: some use "0" for female and "1" for male, while others use "W" for female and "M" for male. Across the hundreds of billions of records produced by thousands of information systems, such non-standard codes are a disaster for big data mining.
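The "0"/"1" versus "W"/"M" example above implies a normalization step when legacy data is brought into a big data environment. A small sketch, where both system mappings are hypothetical examples rather than real systems:

```python
# Two hypothetical source systems that encode gender differently,
# as in the "0"/"1" versus "W"/"M" example above.
SYSTEM_A = {"0": "Female", "1": "Male"}
SYSTEM_B = {"W": "Female", "M": "Male"}

def normalize_gender(raw, system_map):
    """Map a system-local code to the standard natural-language value.
    Unknown codes are reported rather than silently guessed."""
    if raw not in system_map:
        raise ValueError(f"unmapped gender code: {raw!r}")
    return system_map[raw]

# Records from both systems converge on one standard value.
standard = normalize_gender("0", SYSTEM_A)
```

Note that this mapping is exactly the kind of per-system translation program the protocol aims to make unnecessary by storing the standard natural-language value in the first place.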
One important reason why information systems built on relational databases suffer so badly from information islands is that relational data is incomplete, not independent, and hard to identify. Relational databases express the relations between things through various "relationships," binding the data tightly to the database system, the table structures, and the applications; once separated, the data becomes meaningless. It is precisely these "relationships" that inevitably produce information islands.
Data in the universal data structure table has nothing to do with the database system, table structure, or applications, and can be completely separated from them. The data in Table 1, optimized by the structured big data communication protocol, expresses its original meaning even apart from the table structure.
A principle of big data: avoid codes as far as possible and use standard natural language.
How to judge whether data is qualified big data: only data with zero coupling to the information system qualifies.
Corollary: because all data in today's relational databases is tightly coupled to its information systems, none of it is qualified big data.
8. The unity of data structure
Unity of data structure: the data structure of qualified structured big data must be unified. At present, only the universal data structure table achieves this unity.
The problem unity addresses: every relational database has a different data structure.
Data optimization method 8: the structured big data communication protocol uses the universal data structure table (Table 4 below) to unify data structure. The protocol does not allow designers to design arbitrary data structures; all structured data must be stored in one table, or several tables, with exactly the same standard, unified structure. Relational database theory cannot standardize data structure.
Table 4: the universal data structure table realizes the unity of data structure

| ID    | Thing code | Attribute of thing           | Attribute value                    | Extra-long attribute value | Unit | Attachment | Time |
|-------|------------|------------------------------|------------------------------------|----------------------------|------|------------|------|
| 100   | 1001       | Data source                  | Shanghai First People's Hospital   |                            |      |            |      |
| 101   | 1001       | Classification of things     | Medical record                     |                            |      |            |      |
| 102   | 1001       | Classification of things     | Medical history of hospitalization |                            |      |            |      |
| 103   | 1001       | Classification of things     | Admission medical history          |                            |      |            |      |
| 104   | 1001       | Classification of things     | Basic condition of patient         |                            |      |            |      |
| 105   | 1001       | Patient number               | SH10-19910430Z21                   |                            |      |            |      |
| 106   | 1001       | Health card number           | XXXXXXXXXXXXX09                    |                            |      |            |      |
| 107   | 1001       | × × × number                 | XXXXXXXXXXXXXXX                    |                            |      |            |      |
| 108   | 1001       | Name                         | Hu Feng                            |                            |      |            |      |
| 109   | 1001       | Work unit                    | Shanghai Rubber Factory            |                            |      |            |      |
| 110   | 1001       | Job classification           | Worker                             |                            |      |            |      |
| 111   | 1001       | Gender                       | Female                             |                            |      |            |      |
| 112   | 1001       | Address                      | No. 20 Mongolian Road, Shanghai    |                            |      |            |      |
| 113   | 1001       | Age                          | 32                                 |                            |      |            |      |
| 114   | 1001       | Date of admission            | 1991-4-30                          |                            |      |            |      |
| 115   | 1001       | Marital status               | Married                            |                            |      |            |      |
| 116   | 1001       | Date medical history taken   | 1991-4-30                          |                            |      |            |      |
| 117   | 1001       | Ethnic group                 | Han                                |                            |      |            |      |
| 118   | 1001       | Disease narrator             | Self                               |                            |      |            |      |
| 10000 | 52367      | Data source                  | Guangzhou Zoo                      |                            |      |            |      |
| 10001 | 52367      | Classification of things     | Animal management system           |                            |      |            |      |
| 10002 | 52367      | Classification of things     | Penguin                            |                            |      |            |      |
| 10003 | 52367      | Classification of things     | Emperor penguin                    |                            |      |            |      |
| 10004 | 52367      | Classification of things     | Animal archives                    |                            |      |            |      |
| 10005 | 52367      | Big data identification code | GZQE0003                           |                            |      |            |      |
| 10006 | 52367      | Name                         | Emperor Wu of the Han Dynasty      |                            |      |            |      |
| 10007 | 52367      | Date of purchase             | 2013-3-21                          |                            |      |            |      |
| 10008 | 52367      | Height                       | 1.2                                |                            | m    |            |      |
| 10009 | 52367      | Body weight                  | 20                                 |                            | kg   |            |      |
| 10010 | 52367      | Date of birth                | 2011-4-2                           |                            |      |            |      |
| 10011 | 52367      | Photo                        |                                    |                            |      | JPG        |      |
| 10012 | 52367      | Cage number                  | 098                                |                            |      |            |      |
| 10013 | 52367      | Administrator                | Zhang San                          |                            |      |            |      |
| 10014 | 52367      | Father                       | GZQE0001                           |                            |      |            |      |
| 10015 | 52367      | Mother                       | GZQE0002                           |                            |      |            |      |
| 10016 | 52367      | Gender                       | Male                               |                            |      |            |      |
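As a sketch of how such a universal table might be modeled in practice (the table and column names here are my own illustrative choices, not prescribed by the protocol), every record is stored in one fixed schema regardless of whether it describes a hospital patient or a zoo animal:

```python
import sqlite3

# One fixed schema for all things, mirroring Table 4's columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE universal (
        id          INTEGER PRIMARY KEY,
        thing_code  TEXT,   -- identifies the thing being described
        attribute   TEXT,   -- attribute name in natural language
        value       TEXT,   -- attribute value
        unit        TEXT,   -- optional unit (e.g. m, kg)
        attachment  TEXT,   -- optional attachment type (e.g. JPG)
        time        TEXT    -- optional timestamp
    )
""")

rows = [
    (100,   "1001",  "Data source", "Shanghai First People's Hospital", None, None, None),
    (108,   "1001",  "Name",        "Hu Feng",                          None, None, None),
    (10006, "52367", "Name",        "Emperor Wu of the Han Dynasty",    None, None, None),
    (10008, "52367", "Height",      "1.2",                              "m",  None, None),
]
conn.executemany("INSERT INTO universal VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# A patient record and an animal record are queried in exactly the same way.
name = conn.execute(
    "SELECT value FROM universal WHERE thing_code = ? AND attribute = 'Name'",
    ("52367",),
).fetchone()[0]
print(name)  # Emperor Wu of the Han Dynasty
```

Because the schema never varies, adding a new kind of thing requires no new tables and no application changes, which is the point the section is making.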
The biggest problem with relational databases is that their data structures are not standardized. Relational database theory places no restrictions on data structure; it is left entirely to the designer. A standardized data structure is the foundation of big data processing, and non-standard structures make data processing very difficult.
9. Accumulation of data
Accumulation of data: the property that data can (like books) be accumulated without any processing.
Problem addressed: current relational database systems have produced a great deal of data, but that data cannot simply be added together into big data.
Data optimization method 9: the accumulation of data is achieved through the uniqueness, attribution, identifiability, independence, integrity, and standardization of data, the zero coupling of data and system, and the unity of data structure. In other words, only data with all of these properties is cumulative.
Traditional information written on paper is cumulative: a library is the sum of many books, and an archive is the sum of many files. If data were cumulative, then once all the data from every department of the Guangzhou municipal government were mirrored to a central cloud platform, Guangzhou's big data would be established. Likewise, mirroring the data of the 978,000 medical institutions across the country to a national medical big data center would amount to establishing the national medical big data. Unfortunately, the data in current information systems is not cumulative.
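Under these assumptions, accumulation reduces to concatenation: because every source shares one schema and each record is self-describing, merging mirrored datasets needs no ETL step. A minimal sketch (the record tuples are illustrative, reusing attributes from Table 4):

```python
# Each system exports its universal-table rows as (thing_code, attribute, value).
hospital_data = [
    ("1001", "Data source", "Shanghai First People's Hospital"),
    ("1001", "Name", "Hu Feng"),
]
zoo_data = [
    ("52367", "Data source", "Guangzhou Zoo"),
    ("52367", "Name", "Emperor Wu of the Han Dynasty"),
]

# "Accumulation": the big data center simply appends the mirrors together.
# No schema mapping, no transformation, no per-source application code.
big_data = hospital_data + zoo_data
sources = [value for _, attr, value in big_data if attr == "Data source"]
print(sources)  # ['Shanghai First People's Hospital', 'Guangzhou Zoo']
```

The contrast with relational systems is that merging two conventional databases would first require reconciling two different table structures.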
10. Portability of data
Portability of data: data is portable only if, no matter what environment it is moved to, its original meaning remains unchanged and it can be recognized by any information system and any user.
Problem addressed: information systems built on relational databases are difficult to interconnect; data in one system cannot be transplanted into another.
Data optimization method 10: the portability of data is achieved through the same properties: uniqueness, attribution, identifiability, independence, integrity, and standardization of data, zero coupling of data and system, and unity of data structure. Only data with all of these properties is portable.
Portability is what makes the interconnection of information systems possible: only portable data can move freely between systems. Portability rests on the same properties as accumulation, but portability describes whether data can be exchanged between systems, while accumulation describes whether many small data sets can be added together into big data.
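Portability in this sense can be illustrated by serializing a self-describing record and reading it back in another program: nothing about the record depends on the originating system's tables or code. A sketch with illustrative field names:

```python
import json

# A self-describing record: its meaning travels with the data itself.
record = {
    "thing_code": "52367",
    "attribute": "Body weight",
    "value": "20",
    "unit": "kg",
}

# "Transplant" the record: export it from one system...
exported = json.dumps(record)

# ...and import it into another. No shared table structure or shared
# application code is needed to recover the original meaning.
imported = json.loads(exported)
print(f"{imported['attribute']}: {imported['value']} {imported['unit']}")  # Body weight: 20 kg
```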
11. Timeliness of data
Timeliness of data: every datum in big data should carry a corresponding time.
Data optimization method 11: add a timestamp to every datum.
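A minimal sketch of stamping each datum at the moment it is recorded (the helper and field names are my own, not part of the protocol):

```python
from datetime import datetime, timezone

def stamp(thing_code, attribute, value):
    """Attach a UTC timestamp to a datum at the moment it is recorded."""
    return {
        "thing_code": thing_code,
        "attribute": attribute,
        "value": value,
        "time": datetime.now(timezone.utc).isoformat(),
    }

datum = stamp("52367", "Body weight", "20")
```

Stamping at write time, rather than retrofitting times later, is what makes the timestamp trustworthy.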
12. Authenticity of the data
Authenticity of data: small data is like the records produced inside a single accounting department, while big data is like the records produced by funds flowing between different organizations, so the authenticity of big data is especially important.
Data optimization method 12: anti-counterfeiting and tamper-proofing must be treated as important work; the authenticity of data can be guaranteed through third-party authentication, third-party notarization, and third-party data filing.
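One common building block for such tamper detection, which a third party could use when filing and later verifying data, is a cryptographic fingerprint. A sketch only; the protocol text does not prescribe a specific mechanism:

```python
import hashlib
import json

def fingerprint(record):
    """Deterministic SHA-256 fingerprint of a record, suitable for
    filing with a third party at the time the record is created."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

record = {"thing_code": "1001", "attribute": "Name", "value": "Hu Feng"}
filed = fingerprint(record)       # filed with a third party at creation time

record["value"] = "Someone Else"  # any later tampering...
assert fingerprint(record) != filed  # ...is detectable by re-hashing
```

Because the fingerprint is held by an independent third party, the data's owner cannot silently alter a record after filing it.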