
Data flow of multi-source data acquisition and processing

2025-01-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Data processing flow chart of data platform

1. Data preparation:

The data comes from several sources: FTP data sources, data pushed by partners, data obtained from Ctrip's open API, hotel management system (PMS) log data, and online travel agency website data.

2. Data access:

Because the data comes from multiple sources, a dedicated access method was developed for each scenario.

a. FTP data sources: implemented with shell scripts that check whether the data is ready, download it, decrypt and unpack it, compress it with lzop, and upload the files to HDFS with put.

b. Data pushed by partners: a simple web service accepts the requests pushed by Ctrip. Nginx handles the request load, records the data carried in each request, and writes it to a file. The log collection system then picks the data up from that file. (In fact, the partner could also push the data directly to Kafka.)

c. Partner API data: a custom program follows the producer-consumer model. The producer writes tasks to a queue; the consumer takes tasks from the queue and uses a thread pool to fetch data from the partner API concurrently.

d. PMS log data: collected mainly with the open-source Flume component.

e. Website data: collected by crawling the websites with crawlers.
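The producer-consumer model described in item c can be sketched roughly as follows. This is a minimal illustration, not the platform's actual program: `fetch_from_partner_api`, the task IDs, the queue size, and the worker count are all hypothetical, and a real version would issue an HTTP request where the stub returns a dictionary.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

SENTINEL = object()  # marks the end of the task stream


def fetch_from_partner_api(task_id):
    """Hypothetical stand-in for a partner API call (assumption, not a real client)."""
    return {"task": task_id, "payload": f"data-for-{task_id}"}


def producer(task_queue, task_ids):
    """Write every task to the queue, then signal that no more tasks follow."""
    for task_id in task_ids:
        task_queue.put(task_id)
    task_queue.put(SENTINEL)


def consumer(task_queue, max_workers=4):
    """Take tasks from the queue and fetch them concurrently in a thread pool."""
    futures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            task_id = task_queue.get()
            if task_id is SENTINEL:
                break
            futures.append(pool.submit(fetch_from_partner_api, task_id))
    # Leaving the with-block waits for all submitted fetches to finish.
    return [future.result() for future in futures]


def run(task_ids):
    """Run one producer thread and one consumer over a bounded queue."""
    task_queue = queue.Queue(maxsize=100)  # bounded, so the producer cannot run far ahead
    producer_thread = threading.Thread(target=producer, args=(task_queue, task_ids))
    producer_thread.start()
    results = consumer(task_queue)
    producer_thread.join()
    return results
```

Calling `run(["hotel-1", "hotel-2", "hotel-3"])` returns one result per task, in submission order. The bounded queue is a design choice of this sketch: it keeps memory use flat if tasks are produced faster than the API can serve them.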

3. Data storage:

Data is stored in two ways: real-time data goes to Kafka, and offline data goes to HDFS.

4. Data processing:

Data processing tasks are developed mainly with MapReduce and Spark.
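For readers unfamiliar with the MapReduce model, the idea can be shown in plain Python without a cluster. This is only a conceptual sketch, not a Hadoop or Spark job: the `(hotel_id, event)` record layout and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (key, 1) pair for every (hotel_id, event) record."""
    for hotel_id, _event in records:
        yield hotel_id, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_events_per_hotel(records):
    """Chain the three phases to count log events per hotel."""
    return reduce_phase(shuffle_phase(map_phase(records)))
```

In a real job the framework distributes the map and reduce phases across machines and performs the shuffle itself; only the map and reduce logic is written by the developer.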

5. Data query:

Hive is used for data query: users of the data platform query the data through Hive.
