
Data Collection in Big Data Service Operations

Updated 2025-02-24. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

The data acquisition process consists of three stages: integration, import, and format standardization.

During data acquisition, data from different sources is integrated first. Data integration should take into account the storage architecture, collection method, interface mode, collection cycle, and so on.

In terms of storage architecture, consider setting up a staging area on the data source side and a temporary storage area on the acquisition platform side. Size these storage areas according to the data volume and the rate at which data accumulates, so that they do not overflow.
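The sizing rule can be illustrated with a rough calculation. The sketch below is only an assumption-laden estimate: it treats the accumulation rate as steady, and the function names and the `safety_factor` headroom parameter are hypothetical, not taken from any particular platform.

```python
import math


def required_staging_bytes(rate_bytes_per_sec: float,
                           cycle_seconds: float,
                           safety_factor: float = 2.0) -> int:
    """Estimate the capacity needed to hold one collection cycle of data.

    Assumes data accumulates at a steady rate; safety_factor is a
    hypothetical headroom multiplier for bursts and delayed collection.
    """
    if rate_bytes_per_sec < 0 or cycle_seconds <= 0 or safety_factor < 1:
        raise ValueError("rate must be >= 0, cycle > 0, safety_factor >= 1")
    return math.ceil(rate_bytes_per_sec * cycle_seconds * safety_factor)


def will_overflow(capacity_bytes: int,
                  rate_bytes_per_sec: float,
                  cycle_seconds: float,
                  safety_factor: float = 2.0) -> bool:
    """Return True if the storage area is too small for the given cycle."""
    needed = required_staging_bytes(rate_bytes_per_sec, cycle_seconds, safety_factor)
    return needed > capacity_bytes
```

For instance, a source producing about 1,000 bytes per second and collected hourly would need roughly 7.2 MB with a headroom factor of 2 under these assumptions.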

In terms of collection method, different methods can be adopted according to the needs of the application. The two options are single collection and batch collection. Applications with a small amount of data and high timeliness requirements can use single collection: as soon as the data is generated, it is synchronized to the data warehouse. For example, operation logs used for auditing can be collected this way, so that each log entry is synchronized to the data warehouse in real time as it is produced. For data with a large number of files and relatively low real-time requirements, collection can wait until the number of files reaches a certain scale or a certain time period has elapsed, and then the files are collected in batch or pushed to the data warehouse.
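The batch trigger described above (push once a count or a time threshold is reached) might be sketched as follows. `BatchCollector`, its parameters, and the `sink` callback are hypothetical names used for illustration; the sink stands in for whatever actually loads data into the warehouse.

```python
import time
from typing import Any, Callable, List, Optional


class BatchCollector:
    """Buffer items and push them to a sink when a count or age limit is hit.

    Note: the age limit is only checked when a new item arrives; a real
    collector would likely also flush from a background timer.
    """

    def __init__(self,
                 sink: Callable[[List[Any]], None],
                 max_items: int = 100,
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._clock = clock
        self._buffer: List[Any] = []
        self._first_at: Optional[float] = None

    def add(self, item: Any) -> None:
        now = self._clock()
        if not self._buffer:
            self._first_at = now  # age is measured from the oldest buffered item
        self._buffer.append(item)
        too_many = len(self._buffer) >= self._max_items
        too_old = now - self._first_at >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Push whatever is buffered, if anything, and reset the buffer."""
        if self._buffer:
            self._sink(list(self._buffer))
            self._buffer.clear()
            self._first_at = None
```

Setting `max_items=1` makes every item flush immediately, which approximates the single collection mode used for audit logs.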

In terms of interface mode, FTP can be considered for batch data collection, and API or Web Services interface can be used for single data collection.

In terms of collection cycle, the shorter the cycle, the closer the data is to real time and the more timely the analysis results. Enterprises can set different collection cycles according to the needs of each application, and should check whether the temporary storage area can hold the data that accumulates within one cycle.

Data import can be divided into three types according to the size of the data.

The first is the scenario where the data volume is large and data definitions, such as indexes and partitions, must be imported along with the data. Here a large-file import method can be considered, to preserve the integrity of the data source.

The second is the scenario where the data source structure is simple, there are many import files, and the overall data volume is large. Here a batch file import method can be used, so that errors in the import process are visible and can be corrected promptly, ensuring the quality of the imported data.

The last is individual files with a small amount of data, such as code tables and configuration files. These can be imported one by one through a data import tool, which is relatively simple and flexible.
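As an illustration of the second import type, the sketch below imports a batch of CSV files and records errors per file instead of aborting, so that bad files can be corrected and re-imported. It is a simplified assumption, not a specific tool's behavior: file contents are passed as in-memory strings to keep the example self-contained, and `import_batch` and `expected_columns` are hypothetical names.

```python
import csv
import io
from typing import Dict, List, Tuple


def import_batch(files: Dict[str, str],
                 expected_columns: List[str]) -> Tuple[List[dict], Dict[str, str]]:
    """Import many CSV files, collecting one error message per rejected file."""
    rows: List[dict] = []
    errors: Dict[str, str] = {}
    for name, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        if reader.fieldnames != expected_columns:
            errors[name] = f"unexpected header: {reader.fieldnames}"
            continue
        file_rows = list(reader)
        # DictReader fills missing fields with None and stores surplus
        # fields under the key None, so both cases are detectable.
        bad = [n for n, row in enumerate(file_rows, start=1)
               if None in row or None in row.values()]
        if bad:
            errors[name] = f"malformed data rows: {bad}"
            continue
        rows.extend(file_rows)
    return rows, errors
```

Rejecting a whole file on the first structural problem is one possible policy; a row-level quarantine would be an equally reasonable design.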

Data standardization in the acquisition phase is very important, because data analysis must be based on a unified standard, and multiple data sources usually differ in the format and content of the same data item. For example, data source A stores dates as "year-month-day", while data source B stores them as "month-day-year", so the date format of the two sources must be unified.
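The date example can be sketched as below. The choice of ISO 8601 ("year-month-day") as the unified target format, the hyphen separator, and the names `SOURCE_DATE_FORMATS` and `normalize_date` are assumptions made for illustration.

```python
from datetime import datetime

# Assumed per-source formats, following the A/B example in the text.
SOURCE_DATE_FORMATS = {
    "A": "%Y-%m-%d",  # year-month-day
    "B": "%m-%d-%Y",  # month-day-year
}


def normalize_date(value: str, source: str) -> str:
    """Parse a date using its source's format and return it as ISO 8601."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(value.strip(), fmt).date().isoformat()
```

The format is looked up by source rather than guessed from the value, because a string such as "03-04-2025" is ambiguous without knowing which source produced it.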

Some fields are stored with different data types. For example, data source A stores the age field as a string, while data source B stores it as an integer, so the two fields need to be unified into one data type. Some fields store different content in different data sources but express the same meaning. For example, "gender" in data source A uses "M" and "F" to represent "male" and "female", while "gender" in data source B uses "1" for "male" and "0" for "female". The "gender" fields of the two sources therefore need to be semantically unified.
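Both kinds of unification can be sketched in a few lines. The unified representations chosen here (an integer for age, the strings "male" and "female" for gender) and the names `GENDER_MAPS`, `normalize_age`, and `normalize_gender` are illustrative assumptions.

```python
# Assumed per-source code tables, following the A/B example in the text.
GENDER_MAPS = {
    "A": {"M": "male", "F": "female"},
    "B": {"1": "male", "0": "female"},
}


def normalize_age(value) -> int:
    """Unify string and integer ages into a single integer type."""
    age = int(str(value).strip())
    if age < 0:
        raise ValueError(f"age cannot be negative: {value!r}")
    return age


def normalize_gender(value, source: str) -> str:
    """Map a source-specific gender code to the shared vocabulary."""
    key = str(value).strip().upper()
    try:
        return GENDER_MAPS[source][key]
    except KeyError:
        raise ValueError(f"unknown gender code {value!r} for source {source!r}") from None
```

Unknown codes raise an error rather than passing through silently, so that unmapped values surface during acquisition instead of during analysis.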

Different data sources differ on the same data because information systems are designed without taking other information systems into account, or because different application providers do not follow common coding specifications.

© 2024 shulou.com SLNews company. All rights reserved.