2025-01-19 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
The data behind big data services not only comes from organizations in many different industries but is also characterized by variety.
Variety means that big data services handle not only structured data such as names and ages, but also unstructured data such as songs and movies. Data such as web pages and e-mail falls between the two and is called semi-structured; it is also an important data source for big data services.
Structured data originates in business requirements: a system analyst extracts and abstracts the static nouns in the requirements and uses them as the basis for the database table design. Suppose, for example, that we are designing a student records management system. Analysis shows that students such as "Zhang San" and "Li Si" share attributes such as name, age, department, selected courses and course scores, so the analyst picks these attributes and designs a "student" class. The resulting "student" table structure acts as a template: the names, ages, departments and other structured data of students such as "Zhang San" and "Li Si" are stored as rows in the table. Because the table is two-dimensional, structured data can be queried and aggregated along multiple dimensions using the SQL language of a relational database.
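The student example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table layout, the sample departments, courses and scores, and the helper name average_score_by are assumptions, not part of the original design.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        name       TEXT    NOT NULL,
        age        INTEGER NOT NULL,
        department TEXT    NOT NULL,
        course     TEXT    NOT NULL,
        score      REAL    NOT NULL
    )
    """
)

# Each row fills in the "template" defined by the table structure.
# The values below are made up for the example.
rows = [
    ("Zhang San", 20, "Computer Science", "Databases", 88.0),
    ("Zhang San", 20, "Computer Science", "Networks", 76.0),
    ("Li Si", 21, "Mathematics", "Databases", 92.0),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", rows)


def average_score_by(column):
    """Return the average score grouped by one whitelisted dimension."""
    if column not in {"name", "department", "course"}:
        raise ValueError(f"unsupported dimension: {column}")
    query = (
        f"SELECT {column}, AVG(score) FROM student "
        f"GROUP BY {column} ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


# The same table answers questions along different dimensions.
by_course = average_score_by("course")
by_department = average_score_by("department")
print(by_course)
print(by_department)
```

The column name is checked against a fixed set before it is placed in the SQL text, because column names cannot be passed as bound parameters.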
The opposite of structured data is unstructured data. As the name implies, unstructured data has no predefined fields or attributes that can be extracted directly; it exists in forms such as images, audio and video. Unstructured data is harder to analyze statistically than structured data, but that does not mean it has no value.
Unstructured data, such as multimedia content and user feedback, conveys information vividly. By "understanding" it, we can collect the valuable information it contains, transform that information into structured data, and so uncover its hidden value.
Semi-structured data sits between structured and unstructured data. Its structure and content are mixed together, as in e-mail and web pages. A lot of valuable data can be extracted from it, such as the sender, recipient and subject of an e-mail. By analyzing e-mail addresses, frequency and subjects, we can build a social network in which e-mail is the communication medium.
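As a rough sketch of that idea, the snippet below uses Python's standard email package to pull the sender, recipients and subject out of raw messages and counts how often each sender writes to each recipient. The sample messages, the addresses and the function names extract_fields and build_edges are all made up for illustration.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical raw messages; addresses and subjects are invented.
RAW_MESSAGES = [
    "From: Zhang San <zhangsan@example.com>\n"
    "To: lisi@example.com\n"
    "Subject: Course selection\n"
    "\n"
    "Please confirm your courses.\n",
    "From: Zhang San <zhangsan@example.com>\n"
    "To: lisi@example.com, wangwu@example.com\n"
    "Subject: Exam schedule\n"
    "\n"
    "The schedule is attached.\n",
    "From: lisi@example.com\n"
    "To: zhangsan@example.com\n"
    "Subject: Re: Course selection\n"
    "\n"
    "Confirmed.\n",
]


def extract_fields(raw_message):
    """Turn the structured part of one e-mail (its headers) into a dict."""
    msg = message_from_string(raw_message)
    senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [addr for _, addr in getaddresses(recipient_headers)]
    return {
        "sender": senders[0] if senders else None,
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
    }


def build_edges(raw_messages):
    """Count messages per (sender, recipient) pair: the weighted edges
    of a simple e-mail social network."""
    edges = Counter()
    for raw in raw_messages:
        fields = extract_fields(raw)
        if fields["sender"] is None:
            continue  # skip messages without a usable From header
        for recipient in fields["recipients"]:
            edges[(fields["sender"], recipient)] += 1
    return edges


edges = build_edges(RAW_MESSAGES)
for (sender, recipient), count in edges.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The message body stays unstructured here; only the headers are turned into fields, which is what makes e-mail semi-structured.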
Enterprises can analyze their needs along dimensions such as application requirements, data scale and data type, and choose a storage architecture accordingly.
For applications with large data volumes, simple data structures and demanding query-efficiency requirements, a distributed storage architecture such as Hadoop/HBase can be adopted. Because the Hadoop/HBase storage architecture uses a key-value storage structure and scales well, query efficiency can be improved by adding infrastructure resources, and overall system performance grows roughly linearly with cluster size.
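To see why key-based storage scales by adding machines, consider the toy model below. It is not HBase and does not use any HBase API: HBase splits tables into regions by row-key range, whereas this sketch simply hashes each key to one of several in-memory dictionaries. The class name ToyKeyValueCluster and the row-key format are assumptions made for the example.

```python
import hashlib


class ToyKeyValueCluster:
    """A toy hash-partitioned key-value store (illustration only)."""

    def __init__(self, node_count):
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each dict stands in for the storage of one node.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # A lookup touches exactly one node, whatever the cluster size.
        return self.nodes[self._node_for(key)].get(key)

    def keys_per_node(self):
        return [len(node) for node in self.nodes]


def load(cluster, row_count):
    """Insert row_count synthetic rows with made-up row keys."""
    for i in range(row_count):
        cluster.put(f"row-{i:06d}", {"value": i})


small = ToyKeyValueCluster(node_count=2)
large = ToyKeyValueCluster(node_count=8)
load(small, 10_000)
load(large, 10_000)

print("2 nodes:", small.keys_per_node())
print("8 nodes:", large.keys_per_node())
```

With the same data, each node in the larger cluster holds and serves a smaller share of the keys. The sketch ignores what a real system has to handle when nodes are added, such as moving existing data, replication and failures.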
For analytical applications that need to relate multiple data models, a relational database can serve as the repository. For unstructured data stored as files, such as mail, documents, audio recordings and videos, a NAS (Network Attached Storage) architecture can be used. Structured data that is accessed frequently, in small amounts per access, has well-defined data types and lengths, so a SAN (Storage Area Network) architecture is worth considering.
NAS suits unstructured data because the unit of access is the file. In practice, the storage architecture is typically a mix of SAN and NAS.
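The selection guidance above can be written down as a small rule table. This is a sketch only: the Workload fields, the function name suggest_storage and the order in which the rules are checked are assumptions, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of an application's data and access pattern."""
    structured: bool
    large_scale: bool = False
    simple_structure: bool = False
    relates_multiple_models: bool = False
    file_access: bool = False
    frequent_small_reads: bool = False


def suggest_storage(workload):
    """Map a workload to one of the architectures named in the text.

    The rule order is an assumption; the text does not rank the rules.
    """
    if workload.large_scale and workload.simple_structure:
        return "distributed storage (Hadoop/HBase)"
    if workload.relates_multiple_models:
        return "relational database"
    if not workload.structured and workload.file_access:
        return "NAS"
    if workload.structured and workload.frequent_small_reads:
        return "SAN"
    return "no rule from the text applies"


video_archive = Workload(structured=False, file_access=True)
order_lookup = Workload(structured=True, frequent_small_reads=True)
click_log = Workload(structured=True, large_scale=True, simple_structure=True)

for name, workload in [
    ("video archive", video_archive),
    ("order lookup", order_lookup),
    ("click log", click_log),
]:
    print(f"{name}: {suggest_storage(workload)}")
```

Since real deployments usually mix SAN and NAS, a function like this would be applied per data set rather than once per enterprise.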
Both SAN and NAS are "host + disk array" architectures. In the big data era, as data volumes keep growing, enterprises increasingly adopt a "single machine + local hard disk" architecture instead. This architecture suits analytical applications that process data in batches. It places modest demands on any individual machine, so it can put older, low-end equipment to good use, and it allows resources to be scaled out horizontally quickly.