
The System Architecture Evolution of the E-sports Big Data Platform FunData


In the era of e-sports big data, data plays a vital role in both the viewing experience and the professional side of the game. This, in turn, places ever higher demands on the richness and real-time availability of e-sports data.

From the audience's point of view, the richness of e-sports data covers event, team and player data; from the game's point of view, the dimensions include heroes, battles, items and skills; real-time data includes the historical head-to-head records of the two teams before a match, live scores and win-rate predictions during the match, and post-match analysis and hero comparisons.


Therefore, multi-dimensional data support, terabyte to petabyte scale storage and real-time analysis place higher requirements on the design of the underlying system architecture and pose more severe challenges.

This article introduces the design ideas and related technologies in the evolution of the FunData architecture, including the ETL processing scheme, the move from structured to unstructured storage, and the design of the data API service. In its v1.0 beta release, FunData provides data interfaces for the top MOBA game DOTA 2 (from Valve).

1.0 Architecture

In the early stages of the project, following the MVP principle (Minimum Viable Product), we quickly rolled out the first version of the FunData system (architecture shown in Figure 1). The system has two main modules: Master and Slave.

Figure 1 1.0 ETL architecture diagram

The Master module functions as follows:

Periodically call the Steam API to obtain game IDs and basic game information;

Distribute game analysis tasks to Slave nodes via an in-memory message queue;

Record the progress of game analysis and monitor the status of Slave nodes.

The Slave module functions as follows:

Listen for queue messages and pick up tasks, which are mainly video (replay) analysis; for the parsing itself, see the GitHub projects Clarity and Manta;

Write the analyzed data into the database (see the sketch below).
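A minimal Go sketch of this Master/Slave pattern, with a buffered channel standing in for the in-memory queue; the task type, worker count and match IDs are illustrative, not FunData's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// AnalysisTask is a hypothetical task message carrying a match ID to analyze.
type AnalysisTask struct {
	MatchID int64
}

func main() {
	// In 1.0 the queue lives in the Master's memory; a buffered channel models
	// that here. If the Master process dies, everything still in the channel is
	// lost, which is exactly the coupling/durability problem described above.
	queue := make(chan AnalysisTask, 1024)

	var wg sync.WaitGroup
	// Slave nodes listen on the queue and run replay analysis plus DB writes.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for task := range queue {
				fmt.Printf("worker %d analyzing match %d\n", worker, task.MatchID)
				// parse the replay (e.g. with Clarity/Manta) and write results to the DB
			}
		}(i)
	}

	// Master side: periodically pull match IDs from the Steam API and dispatch them.
	for _, id := range []int64{1001, 1002, 1003} {
		queue <- AnalysisTask{MatchID: id}
	}
	close(queue)
	wg.Wait()
}
```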

Initially the system ran fairly stably and data for each dimension could be pulled quickly. However, as the data volume grew, maintainability problems with both the data and the system became increasingly prominent:

Adding new data fields requires rebuilding the DB indexes; with more than 100 million rows in the data tables, index building takes too long and locks the tables for extended periods.

The system is tightly coupled and hard to maintain: when the Master node is updated and restarted, the Slaves' reconnection mechanism requires a full restart, and the in-memory message queue risks losing messages.

Scalability is poor: scaling out Slave nodes requires frequently building virtual machine images, configuration is not centrally managed, and maintenance costs are high.

The DB runs in master-slave mode with limited storage space, so the data API layer needs custom logic to read from different databases and aggregate the results;

Node granularity is coarse: a Slave may carry multiple analysis tasks, so a single failure has a large blast radius.

Before embarking on the 2.0 redesign, we tried a cold-storage approach, migrating old data to relieve system pressure (see Figure 2). Because the tables were so large, running multiple migration tasks concurrently took a long time, and cleaning up the migrated data also triggered index rebuilds, so this stopgap did not fundamentally solve the problem.

Figure 2 Cold storage scheme

2.0 Architecture

Drawing on the lessons of the 1.0 system, in the 2.0 design (architecture diagram shown in Figure 3) we defined the basic characteristics the new data system should have in terms of maintainability, scalability and stability:

Data processing tasks are broken into finer granularity and processed with high concurrency (about 1.2 million DOTA2 matches are played worldwide per day; video analysis is time-consuming, and serial processing would cause severe task backlogs);

Distributed data storage;

A decoupled system in which each node can be gracefully restarted and updated.

Figure 3 2.0 ETL architecture

The 2.0 system is built on Google Cloud Platform, with Cloud Pub/Sub (similar to Kafka) as the message bus: tasks are split into multiple topics, each listened to and processed by different workers. On the one hand, this reduces the coupling between tasks, so an exception in one task does not interrupt the others; on the other hand, because tasks flow over the message bus, each kind of data task scales independently and can be quickly expanded horizontally when throughput is insufficient.
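As a rough illustration of this message bus, here is a minimal Go sketch using the Cloud Pub/Sub client library; the project, topic and subscription IDs are placeholders, and error handling is reduced to the bare minimum.

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/pubsub"
)

func main() {
	ctx := context.Background()

	// The project, topic and subscription IDs below are illustrative placeholders.
	client, err := pubsub.NewClient(ctx, "fundata-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Supervisor side: publish a match ID onto a fine-grained topic so that
	// only the workers subscribed to that topic pick it up.
	topic := client.Topic("basic-match-analysis")
	res := topic.Publish(ctx, &pubsub.Message{Data: []byte("4000012345")})
	if _, err := res.Get(ctx); err != nil {
		log.Fatalf("publish failed: %v", err)
	}

	// Worker side: pull from the matching subscription. Each topic can feed
	// many workers, so adding instances scales that task horizontally.
	sub := client.Subscription("basic-match-analysis-workers")
	err = sub.Receive(ctx, func(ctx context.Context, m *pubsub.Message) {
		log.Printf("processing match %s", string(m.Data))
		m.Ack() // acknowledged messages are not redelivered
	})
	if err != nil {
		log.Fatal(err)
	}
}
```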

Task Granularity Refinement

Looking at task granularity (see Figure 3), data processing is split into basic data processing and higher-order data processing. Basic data, i.e. game details (KDA, damage, last hits) and video analysis data (Roshan kills, damage types, hero position heat maps), is triggered when the Supervisor fetches Steam data and is stored in Google Bigtable after cleaning by the workers; higher-order data, i.e. multi-dimensional statistics (such as hero, item and team-fight data), is triggered after video analysis completes, aggregated through GCP Dataflow and self-built analysis nodes (workers), and finally stored in MongoDB and Google Bigtable.

In the detailed League-ETL architecture (see Figure 4), the original single Slave node is split into four sub-modules: the league data analysis module, the league video analysis module, the analysis/mining data DB agent module and the league analysis monitoring module.

The league data analysis module is responsible for replay file retrieval (acquiring the Salt, Meta and Replay files) and for analyzing basic game data;

The league video analysis module is responsible for analyzing game videos and pushing the parsed data to Pub/Sub;

The analysis/mining DB agent is responsible for receiving video analysis data and writing it into Bigtable in batches;

The league analysis monitoring module is responsible for monitoring the progress of each task and raising real-time alerts.

At the same time, all modules were rewritten in Golang, taking advantage of its native concurrency to improve the data mining and data processing performance of the whole system.

Figure 4 League-ETL architecture

Distributed Storage

As mentioned above, in the 1.0 architecture we used MySQL to store large volumes of match data and video analysis data. Under large data volumes and high concurrency, MySQL makes the overall application increasingly complex to develop: schemas cannot be changed frequently, the timing of sharding (splitting databases and tables) has to be planned carefully, and once a shard reaches a certain size the scalability problem reappears.

We therefore turned to Google's Bigtable. As distributed, scalable big data storage systems, Bigtable and HBase support random, real-time read and write access, which is a better fit for the data volume and complexity of the FunData system.

Figure 5. Hadoop ecosystem

In the data model, Bigtable and HBase locate a piece of data (a cell, as shown in Figure 6) by its RowKey, column name and timestamp.

Figure 6 Data index

For example, in the FunData system the RowKey of match data is constructed as hash_key + match_id. Because DOTA2 match_ids increase sequentially, prepending a hash_key computed by a consistent hashing algorithm to each match_id prevents a single hotspot in the distributed store, improves the load balancing of the whole storage system, achieves better sharding, and keeps the underlying data nodes scalable.

As shown in Figure 7, we preset several key values on the hash ring as RowKey prefixes. When a match_id arrives, the consistent hashing algorithm maps it to a point on the ring to find the corresponding key value, and the RowKey is then built by concatenating that key value with the match_id (a Go sketch follows Figure 7).

Figure 7. Consistent hash construction RowKey
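A minimal Go sketch of this RowKey construction, assuming a CRC32-based hash ring with a handful of preset prefixes; the prefix values and the underscore separator are illustrative choices, not FunData's actual ones.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring maps points on the hash ring to preset RowKey prefixes.
type ring struct {
	points   []uint32
	prefixes map[uint32]string
}

func newRing(prefixes []string) *ring {
	r := &ring{prefixes: make(map[uint32]string)}
	for _, p := range prefixes {
		point := crc32.ChecksumIEEE([]byte(p))
		r.points = append(r.points, point)
		r.prefixes[point] = p
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// rowKey hashes the match_id onto the ring, walks clockwise to the next preset
// point, and concatenates that prefix with the match_id.
func (r *ring) rowKey(matchID int64) string {
	h := crc32.ChecksumIEEE([]byte(strconv.FormatInt(matchID, 10)))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the ring
	}
	return r.prefixes[r.points[i]] + "_" + strconv.FormatInt(matchID, 10)
}

func main() {
	r := newRing([]string{"a", "b", "c", "d"})
	for id := int64(4000012345); id < 4000012350; id++ {
		fmt.Println(r.rowKey(id))
	}
}
```

Because neighboring match_ids hash to different points on the ring, consecutive games end up under different prefixes and therefore on different storage nodes rather than piling onto one hotspot.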

Timestamps make it convenient to write the same RowKey and column repeatedly when aggregating data. HBase/Bigtable support configurable GC policies that clean up expired cell versions, and reads can simply take the data at the latest timestamp.
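A small sketch of that write/read pattern against Bigtable, using the Go client library; the project, instance, table, column family and RowKey below are placeholders.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigtable"
)

func main() {
	ctx := context.Background()
	// Project, instance, table and column-family names are illustrative.
	client, err := bigtable.NewClient(ctx, "fundata-project", "fundata-instance")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	tbl := client.Open("match_data")

	// Re-aggregation simply writes the same RowKey/column again with a newer
	// timestamp; older cell versions remain until the GC policy removes them.
	mut := bigtable.NewMutation()
	mut.Set("stats", "kda", bigtable.Now(), []byte(`{"k":10,"d":2,"a":15}`))
	if err := tbl.Apply(ctx, "a_4000012345", mut); err != nil {
		log.Fatal(err)
	}

	// Readers only care about the newest version, so filter to the latest cell.
	row, err := tbl.ReadRow(ctx, "a_4000012345",
		bigtable.RowFilter(bigtable.LatestNFilter(1)))
	if err != nil {
		log.Fatal(err)
	}
	for _, item := range row["stats"] {
		fmt.Printf("%s = %s\n", item.Column, item.Value)
	}
}
```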

Bigtable and HBase only provide a primary index on RowKey. Once the hash_key prefix is added, it is no longer possible to batch-read with a row range or to query in batches by time, so secondary indexes have to be built at the business level. In practice, whenever a worker processes a match, it writes a timestamp-to-RowKey index entry into MySQL; a time-based query first looks up the list of RowKeys in the index table and then fetches the corresponding data.
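A minimal sketch of such a business-level secondary index, assuming a MySQL table named match_index with match_time and row_key columns; both the table layout and the DSN are illustrative.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// indexRow pairs a match timestamp with the Bigtable/HBase RowKey it points to.
type indexRow struct {
	Timestamp time.Time
	RowKey    string
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/fundata?parseTime=true")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Worker side: after processing a match, record its timestamp -> RowKey mapping.
	_, err = db.Exec(
		"INSERT INTO match_index (match_time, row_key) VALUES (?, ?)",
		time.Now(), "a_4000012345")
	if err != nil {
		log.Fatal(err)
	}

	// Query side: resolve a time range to RowKeys first, then batch-read the
	// actual match data from Bigtable/HBase with those keys.
	rows, err := db.Query(
		"SELECT match_time, row_key FROM match_index WHERE match_time BETWEEN ? AND ?",
		time.Now().Add(-24*time.Hour), time.Now())
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	var keys []indexRow
	for rows.Next() {
		var r indexRow
		if err := rows.Scan(&r.Timestamp, &r.RowKey); err != nil {
			log.Fatal(err)
		}
		keys = append(keys, r)
	}
	log.Printf("found %d row keys in range", len(keys))
}
```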

Bigtable/HBase also differ greatly from MySQL in how data is read and written. MySQL generally relies on the query cache, which is invalidated whenever the schema or data changes; the query cache is also protected by a global lock, so with large amounts of cached data a cache invalidation can effectively lock the table.

As shown in Figure 8, taking HBase as an example, when reading data the client first locates the RegionServer holding the RowKey through ZooKeeper. Once the read request reaches the RegionServer, it organizes Scan operations and merges the results before returning the data. Because a query may span multiple RegionServers and Regions, lookups are performed concurrently, and HBase's LRU BlockCache does not impose a global lock across queries.

Figure 8 HBase architecture

Based on the new storage architecture, our data dimensions expand from single matches to players, heroes, leagues and more, as shown in Figure 9.

Figure 9 Data Dimensions

System Decoupling

As mentioned above, the 1.0 architecture used an in-memory message queue for data transfer. Because the queued data was not persisted and was tightly coupled with the Master module, data was lost whenever the Master node was updated or panicked abnormally, and recovery was slow. In the 2.0 architecture we therefore adopted a third-party message queue as the message bus. This decouples the upstream and downstream nodes so they can be updated iteratively, makes historical messages traceable, and, thanks to the cloud platform, makes message backlog visible (as shown in Figure 10).

Figure 10 Data Monitoring

Data API Layer

The data API layer of the 1.0 system was built for a quick launch, without much architectural design or optimization: load balancing was done via the domain name, and data access went through an ORM layer built on the open-source DreamFactory, using its RESTful interface.

The architecture encountered a number of problems during development and use:

The API layer is deployed on Alibaba Cloud in China, so data access has to cross the ocean;

The APIs provided by the ORM layer return every field of a table, so the data granularity is too coarse;

There is no cache, so high-traffic events (such as the 2017 EPICENTER cup and ESL) often left the service unavailable;

Aggregation across multiple DBs is done in the API layer, whose performance is insufficient;

Service updates are costly to maintain, since each update requires first removing machines from the domain name.

To address these issues, we refactored the 1.0 data API layer in two ways, as shown in Figure 11.

Figure 11. New Data API Architecture

Link Stability

For global links, we use CDN dynamic acceleration to ensure stable access, with CDNs from multiple cloud vendors as backups for disaster recovery, enabling second-level failover.

For scheduling and recovery, we built our own gray-release system that routes data requests of different dimensions to different data APIs, reducing the impact those requests have on one another; it also effectively controls the risk of API service updates and the blast radius of exceptions. Leveraging the cloud platform's availability zones, the gray system makes it easy to run backend API services across zones, providing disaster recovery at the physical level.
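As an illustration of routing requests for different dimensions to different backend API pools, here is a minimal Go reverse-proxy sketch; the path prefixes and backend addresses are assumptions, and a real gray system would also carry weights and zone information in its configuration.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func proxyTo(raw string) *httputil.ReverseProxy {
	target, err := url.Parse(raw)
	if err != nil {
		log.Fatal(err)
	}
	return httputil.NewSingleHostReverseProxy(target)
}

func main() {
	// Backend addresses are placeholders; in practice they would live in the
	// gray system's configuration and could point at different availability zones.
	matchAPI := proxyTo("http://match-api.internal:8080")
	dimensionAPI := proxyTo("http://dimension-api.internal:8080")

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		switch {
		// High-traffic dimensions (matches, league schedules) get their own pool...
		case strings.HasPrefix(r.URL.Path, "/matches"),
			strings.HasPrefix(r.URL.Path, "/leagues"):
			matchAPI.ServeHTTP(w, r)
		// ...so a spike there cannot take down hero/player/item queries.
		default:
			dimensionAPI.ServeHTTP(w, r)
		}
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```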

In addition, to stabilize internal transoceanic links, we set up a data proxy layer in Alibaba Cloud's North American data center and use dedicated overseas lines to improve access speed.

Data High Availability

After moving to the distributed storage system, the external data API layer was also split along the expanded data dimensions, with multiple data APIs serving externally. High-traffic data such as match data and league schedules is separated from hero, player and item data, so that anomalies in the match/event interfaces do not affect the rest.

For hot data, the internal cache layer performs regular cache-warming writes, and data updates also trigger a cache rewrite, ensuring that the data stays available during events (see the sketch below).
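A minimal sketch of such a cache layer, kept in process memory for simplicity (the real layer could just as well be Redis); the key names and refresh interval are illustrative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// hotCache is a minimal in-memory cache for hot event data.
type hotCache struct {
	mu   sync.RWMutex
	data map[string][]byte
	load func(key string) []byte // fetches fresh data from storage / the data API
}

func newHotCache(load func(string) []byte) *hotCache {
	return &hotCache{data: make(map[string][]byte), load: load}
}

// Get always serves from the cache; hot keys are kept warm so readers never
// hit a cold path in the middle of an event.
func (c *hotCache) Get(key string) []byte {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.data[key]
}

// Refresh rewrites one key. It is called both by the periodic warm-up loop and
// by the data-update hook, mirroring "updates also trigger a cache rewrite".
func (c *hotCache) Refresh(key string) {
	v := c.load(key)
	c.mu.Lock()
	c.data[key] = v
	c.mu.Unlock()
}

// warmLoop periodically rewrites all hot keys.
func (c *hotCache) warmLoop(keys []string, every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for range t.C {
		for _, k := range keys {
			c.Refresh(k)
		}
	}
}

func main() {
	c := newHotCache(func(key string) []byte { return []byte("fresh:" + key) })
	keys := []string{"league/schedule", "match/live-score"}
	for _, k := range keys {
		c.Refresh(k) // initial warm-up before the event starts
	}
	go c.warmLoop(keys, 30*time.Second)
	fmt.Println(string(c.Get("league/schedule")))
}
```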

Conclusion

This article has walked through the evolution of the FunData data platform architecture, analyzed the architectural shortcomings of the old system, and described the technologies and architectural approaches used in the new system to address them.

Since FunData launched on April 10, more than 300 developers have applied for an API key, and we are iterating quickly on new data points such as league statistics and real-time match data.
