What is the real-time architecture and offline architecture of big data recommendation system?

2026-02-14 Update From: SLTechnology News&Howtos

This article describes the real-time architecture and the offline architecture of a big data recommendation system, walking through each stage of the two pipelines.

1. Overview

Recommendation systems are among the most common and easy-to-understand applications of big data. Taobao's "Guess You Like" feature and sites such as JD.com use them to show users personalized content. Recommendation engines are not limited to e-commerce sites suggesting additional products. They are used in other industries and applications as well, such as NetEase Cloud Music's daily song recommendations, and can recommend anything from events and products to dates.

2. Big data recommendation system architecture

Medium-sized websites (more than 100,000 page views, or PV, per day) typically generate more than 1 GB of web log files every day. Large or very large websites may generate 10 GB of data per hour.

Take, for example, an e-commerce site running an online group-buying business. It receives 1 million PV and 50,000 unique IPs per day. Traffic peaks on weekdays from 10:00 to 12:00 and from 15:00 to 18:00. During the day most visits come from PC browsers, while on rest days and at night more visits come from mobile devices. Search accounts for 80% of the site's traffic. Fewer than 1% of PC users make a purchase, compared with 5% of mobile users.

For log data of this size, Hadoop is well suited to log analysis. The goals of the analysis are to increase sales, sell a wider range of products, improve user satisfaction, and better understand what users want. The following sections describe the offline-mode and real-time-mode architectures of the recommendation system. The two architectures often complement each other.

2.1 Offline mode process

(1) data sources

A JavaScript program embedded in the page binds events to the tags to be monitored. Whenever the user clicks or hovers over such a tag, an Ajax request is sent to a back-end servlet, which records the event information with log4j. This produces a steadily growing log file on the web server (nginx, Tomcat, etc.). On mobile devices, the back end records an access log for each call to its access interface.
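The server-side half of this step can be sketched as follows. This is a minimal illustration in Python rather than the servlet and log4j setup the article describes. The JSON field names, the `format_event` and `build_logger` helpers, and the logger name are assumptions made for the example.

```python
import json
import logging
from datetime import datetime, timezone


def format_event(user_id, element, action, ts=None):
    """Serialize one tracked page event as a single JSON log line.

    The field names are hypothetical; a real site would define its own schema.
    """
    ts = ts or datetime.now(timezone.utc)
    record = {
        "ts": ts.isoformat(),
        "user": user_id,
        "element": element,
        "action": action,
    }
    return json.dumps(record, sort_keys=True)


def build_logger(stream):
    """Return a logger that writes one bare message per line to `stream`.

    A production setup would write to a rolling file instead of a stream.
    """
    logger = logging.getLogger("tracking-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger
```

Writing one self-describing record per line keeps the file easy to tail and easy to parse in the later collection and preprocessing steps.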

(2) data acquisition

Logs are collected either by a custom-built collection program or by the open-source framework Flume. Flume is a distributed log collection system that gathers data from each server and sends it to a designated destination, such as HDFS. To put it simply, Flume collects logs.

Flume's capability comes from one design element: the agent. An agent is a Java process that runs on a log collection node, that is, on a server node.

An agent contains three core components, source -> channel -> sink, analogous to a producer, a warehouse, and a consumer.

source: the source component collects data and can handle log data of various types and formats, including avro, thrift, exec, jms, spooling directory, netcat, sequence generator, syslog, http, legacy, and custom.

channel: after the source component collects data, the data is held temporarily in the channel. The channel component buffers the collected data inside the agent and can be backed by memory, jdbc, file, and so on.

sink: the sink component sends data to its destination. Sink types include hdfs, logger, avro, thrift, irc, file, null, hbase, solr, and custom.
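The producer, warehouse, and consumer analogy can be made concrete with a toy pipeline. This is not Flume code and uses none of its APIs. It is a plain Python sketch in which the function names, the in-memory queue, and the batch size are all assumptions for illustration.

```python
import queue


def source(lines, channel):
    """Source (producer): read raw events and put them on the channel."""
    for line in lines:
        channel.put(line)


def sink(channel, destination, batch_size=2):
    """Sink (consumer): drain the channel and deliver events in batches."""
    batch = []
    while True:
        try:
            batch.append(channel.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            destination.append(list(batch))
            batch.clear()
    if batch:  # deliver the final partial batch
        destination.append(list(batch))


def run_agent(lines):
    """Wire source -> channel -> sink together, as an agent does."""
    channel = queue.Queue()  # channel (warehouse): temporary in-memory buffer
    delivered = []           # stands in for a destination such as HDFS
    source(lines, channel)
    sink(channel, delivered)
    return delivered
```

The channel decouples the two ends: the source can keep accepting events while the sink delivers them at its own pace, which is why the choice of channel backing (memory, jdbc, file) trades speed against durability.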

(3) data aggregation

Flume aggregates the raw logs into the HDFS distributed storage system.

(4) data preprocessing

A custom-developed MapReduce program runs on the Hadoop cluster to clean and normalize the raw logs, and the resulting structured data is written to HDFS.
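The shape of such a preprocessing job can be sketched in plain Python. This is a single-process imitation of the map and reduce steps, not a Hadoop job. The tab-separated layout (timestamp, user, item, action) is a hypothetical log format chosen for the example.

```python
from collections import defaultdict


def mapper(line):
    """Map step: parse one raw line and yield ((user, item), 1).

    Malformed lines yield nothing, which is how dirty records get dropped.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return
    _ts, user, item, _action = parts
    if not user or not item:
        return
    yield (user, item), 1


def reducer(pairs):
    """Reduce step: sum the values for each (user, item) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def preprocess(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in mapper(line))
    return reducer(pairs)
```

In a real cluster the framework handles the grouping of keys between the two steps; here the reducer does it with a dictionary.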

(5) data warehouse technology

Using Hive on Hadoop, the structured data is mapped to tables.

(6) ETL

Query the data in Hive with SQL and export the results. Alternatively, use Mahout machine learning algorithms, such as collaborative filtering, to analyze the data and write the output as source data for recommendations.
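To show what collaborative filtering computes, here is a small item-based variant in plain Python. It does not use Mahout. The cosine similarity measure, the `item_similarity` and `recommend` helpers, and the input shape `{user: {item: score}}` are assumptions for this sketch, and scores are assumed to be positive.

```python
from collections import defaultdict
from math import sqrt


def item_similarity(ratings):
    """Cosine similarity between item pairs, from {user: {item: score}}."""
    dot = defaultdict(float)
    norm = defaultdict(float)
    for items in ratings.values():
        for i, ri in items.items():
            norm[i] += ri * ri
            for j, rj in items.items():
                if i != j:
                    dot[(i, j)] += ri * rj
    return {
        (i, j): value / (sqrt(norm[i]) * sqrt(norm[j]))
        for (i, j), value in dot.items()
    }


def recommend(user, ratings, sim, top_n=3):
    """Rank unseen items by similarity to the items the user already has."""
    seen = ratings.get(user, {})
    scores = defaultdict(float)
    for i, ri in seen.items():
        for (a, b), s in sim.items():
            if a == i and b not in seen:
                scores[b] += s * ri
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

For example, if two users interacted with both A and B and a third only with A, then B is scored highly for the third user because A and B co-occur often.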

(7) recommendation engine

The recommendation results are imported into the business database, and the web recommendation engine makes recommendations according to the database.
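A minimal sketch of this hand-off, assuming a relational business database: the offline job writes scored rows, and the web tier reads the top items per user. SQLite stands in for the real database here, and the `recommendations` table and its columns are hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) table that holds precomputed recommendations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations ("
        "user_id TEXT, item_id TEXT, score REAL, "
        "PRIMARY KEY (user_id, item_id))"
    )


def import_results(conn, rows):
    """Load (user_id, item_id, score) rows, overwriting older scores."""
    conn.executemany(
        "INSERT OR REPLACE INTO recommendations (user_id, item_id, score) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def top_items(conn, user_id, limit=5):
    """What the web recommendation engine would query at request time."""
    cur = conn.execute(
        "SELECT item_id FROM recommendations WHERE user_id = ? "
        "ORDER BY score DESC, item_id LIMIT ?",
        (user_id, limit),
    )
    return [row[0] for row in cur]
```

Because the scores are precomputed, the request-time query stays a simple indexed lookup regardless of how expensive the offline analysis was.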

(8) Visualization

According to the recommendation information of the business database, the front end displays the recommendation result.

2.2 Real-time mode process

Trending events and trending styles call for real-time recommendations.

(1) data sources

(2) data acquisition

Logs are collected either by a custom-built collection program or by the open-source framework Flume. Flume is a distributed log collection system that gathers data from each server and sends it to a designated destination, such as HDFS or, in this mode, a Kafka cluster. To put it simply, Flume collects logs.

(3) data aggregation

Flume aggregates the raw logs into a Kafka cluster. One part of the data goes to Storm for real-time processing, and the other part goes to HDFS for offline processing.

(4) Real-time processing

Storm and Spark Streaming read messages from Kafka and process the data in real time, computing statistics on the latest activity and writing them out as source data for recommendations.
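The kind of statistic a streaming job maintains can be sketched as a sliding-window counter of recently popular items. This is plain Python, not Storm or Spark Streaming code. The `SlidingWindowCounter` class is hypothetical, and it assumes events arrive with non-decreasing timestamps in seconds.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count item occurrences over the most recent `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, item) in arrival order
        self.counts = Counter()  # live counts for items inside the window

    def add(self, ts, item):
        """Record one event and drop anything that fell out of the window."""
        self.events.append((ts, item))
        self.counts[item] += 1
        self._expire(ts)

    def _expire(self, now):
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n):
        """Return the n most frequent items, ties broken by item name."""
        return sorted(self.counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Each message read from the stream would call `add`, and the current `top` list is what gets written out for the recommendation engine to pick up.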

(5) recommendation engine

The recommendation results are imported into the business database, and the web recommendation engine makes recommendations according to the database.

(6) Visualization

According to the recommendation information of the business database, the front end displays the recommendation result.

3. Summary

Personalized product recommendation

The recommendation system helps to understand the preferences and intentions of each visitor and to display the relevant recommendation types and products in a timely manner. As the engine learns more about each visitor, the recommendation system is improved.

Website personalization

A recommendation system lets you increase sales and conversion by identifying users and targeting them with personalized messages and reminders in real time.

Timely notice

Such an engine helps brands build trust with users and creates a sense of presence and urgency by displaying timely notifications when customers visit the site.

Personalized customer loyalty programs and services

Research shows that people are more interested in programs that provide personalized services than in one-size-fits-all content, especially where customer loyalty is concerned. Such an engine can customize recommendation content based on real-time interaction with users. The data analysis algorithm draws on different purchasing behaviors and integrates context information to support different product strategies, which also improves the quality of recommendations.