
What is the full infrastructure of the Internet back-end?


To answer this question, this article walks through the back-end infrastructure of a typical Internet company in detail, in the hope of giving readers a simple and practical overview.

1.1 Back-end Infrastructure

The purpose of using Java back-end technology is to build business applications that provide users with online or offline services. What technologies a business application needs, and which infrastructure it depends on, therefore determine which back-end technologies must be mastered. Taking a comprehensive view of the Internet technology landscape and the situation of typical companies, the author considers the essential or critical back-end infrastructure to be the components discussed in the rest of this section.

Back-end infrastructure here mainly refers to the key components or services that applications rely on to run stably online. Building the infrastructure described below can generally support a business for a long time. Note that a complete architecture also contains many basic system services that applications are not even aware of, such as load balancing, automated deployment, and system security; these are not covered in this chapter.

1.1.1 Unified Request Entry: API Gateway

During mobile app development, the interfaces provided by the back end usually need to support the following features:

Load balancing

API access control

User authentication

In general, Nginx serves as the load balancer, while API access control and user authentication are implemented inside each business application. A slightly better approach is to extract the latter two into shared libraries for every business to call. On the whole, though, all three features are common requirements across businesses, so it is preferable to integrate them into a single service: access control and authentication policies can then be modified dynamically, and each business is spared the cost of integrating these mechanisms itself. This kind of service is the API gateway; you can implement one yourself or use open-source software such as Kong or Netflix Zuul.

One problem with this scheme, however, is that because all API requests pass through the gateway, the gateway can easily become the system's performance bottleneck. An alternative is to drop the API gateway, have business applications connect to the unified authentication center directly, and guarantee at the base-framework level that every API call is first authenticated by that center. Caching authentication results avoids putting excessive request pressure on the authentication center.
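To make the caching idea concrete, here is a minimal Java sketch. It is illustrative only: AuthCenterClient is a hypothetical RPC client for the unified authentication center, and a real implementation would also bound the cache's size and handle token revocation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical RPC client for the unified authentication center.
interface AuthCenterClient {
    boolean validate(String token);
}

// Caches successful token validations for a short TTL so that repeated API
// calls do not hit the unified authentication center on every request.
public class AuthResultCache {
    private static final long TTL_MILLIS = 60_000; // trust a validation for 60s
    private final Map<String, Long> validUntil = new ConcurrentHashMap<>();
    private final AuthCenterClient authCenter;

    public AuthResultCache(AuthCenterClient authCenter) {
        this.authCenter = authCenter;
    }

    public boolean isAuthenticated(String token) {
        Long expiry = validUntil.get(token);
        if (expiry != null && expiry > System.currentTimeMillis()) {
            return true; // validated recently: skip the remote call
        }
        if (authCenter.validate(token)) { // remote call to the auth center
            validUntil.put(token, System.currentTimeMillis() + TTL_MILLIS);
            return true;
        }
        validUntil.remove(token);
        return false;
    }
}
```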

1.1.2 Business Applications and Back-end Infrastructure

Business applications are divided into online business applications and internal business applications.

Online business applications: applications and interfaces that directly face Internet users, typically characterized by a large request volume, high concurrency, and low tolerance for failure.

Internal business applications: applications mainly for the company's internal users, such as internal data management platforms and advertising platforms. Compared with online business applications, they feature higher data confidentiality, lower pressure, smaller concurrency, and more tolerance for failure.

Business applications are developed on top of the back end's basic frameworks. A Java back end typically needs the following:

MVC framework: a Web/back-end framework that unifies the development process, improves development efficiency, and shields key details. Typical examples include Spring MVC, Jersey, the Chinese-developed JFinal, and Alibaba's WebX.

IoC framework: a framework implementing dependency injection / inversion of control. The core of Spring, the most popular framework in Java, is its IoC container.

ORM framework: a database access framework that shields the details of the underlying database and provides a unified data access interface; it may additionally support client-side distributed features such as master-slave separation and database/table sharding. MyBatis is currently the most popular ORM framework, and Spring's JdbcTemplate is also a good option. Sharding and read-write separation usually have to be implemented separately; open-source options include Alibaba's TDDL and Dangdang's sharding-jdbc, which solves sharding and read-write separation at the data-source level, transparently and with zero intrusion into the application. To solve sharding, read-write separation, master/slave switching, caching, and failure recovery uniformly at the service level, many companies have built their own database middleware, such as Alibaba's Cobar, Qihoo 360's Atlas (based on MySQL-Proxy), and NetEase's DDB; open-source options include MyCat (based on Cobar) and Kingshard, the latter already used online at some scale. MySQL officially provides MySQL Proxy, which lets you customize master-slave routing, read-write separation, and partitioning logic with Lua scripts, but its performance is poor and it is now rarely used.

Cache framework: a unified encapsulation of operations on cache software such as Redis and Memcached, supporting client-side distribution, master-slave setups, and so on. Typically you use Spring's RedisTemplate, or wrap Jedis yourself to support client-side sharding and master-slave configurations.

JavaEE application performance monitoring framework: for online JavaEE applications, a unified framework integrated into each business is needed to track the latency and status of every request, method call, JDBC connection, Redis connection, and so on. Jwebap is one usable profiling tool, but since it has not been updated for many years, it is advisable to do secondary development on top of it if you can.

Generally speaking, the frameworks above are enough to assemble a prototype back-end application.
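To make this concrete, here is a minimal sketch of how the MVC and IoC pieces fit together: a Spring MVC controller whose dependency is injected by the Spring container. The UserService bean and the URL are invented for the example.

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical service bean managed by the Spring IoC container.
@Service
class UserService {
    String findName(long id) {
        return "user-" + id; // a real app would go through the ORM layer here
    }
}

// Spring MVC controller: the framework maps HTTP requests to this method,
// and the IoC container injects the UserService dependency.
@RestController
public class UserController {
    @Autowired
    private UserService userService;

    @GetMapping("/users/{id}/name")
    public String userName(@PathVariable long id) {
        return userService.findName(id);
    }
}
```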

1.1.3 Cache, Database, Search Engine, Message Queue

Caches, databases, search engines, and message queues are the basic back-end services that applications depend on, and their performance directly affects overall application performance. Sometimes even the best-written code cannot improve application performance further, because one of these services is the bottleneck.

Caching: caching usually addresses access to hot data and is a powerful weapon for improving query performance. In high-concurrency back-end applications, loading data from the persistence layer into a cache isolates high-concurrency requests from the back-end database and prevents the database from being overwhelmed by a flood of requests. Besides in-process local caches, the common centralized cache servers are Memcached and Redis, of which Redis has become the mainstream choice (a cache-aside sketch follows this item).
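A minimal cache-aside sketch using Jedis, assuming a local Redis; the key scheme is invented, and loadFromDatabase stands in for a real ORM query.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

// Cache-aside sketch: read from Redis first, fall back to the database on a
// miss, then populate the cache with a TTL so hot data stays in memory.
public class ProductCache {
    private final JedisPool pool = new JedisPool("localhost", 6379);

    public String getProduct(String productId) {
        String key = "product:" + productId; // illustrative key scheme
        try (Jedis jedis = pool.getResource()) {
            String cached = jedis.get(key);
            if (cached != null) {
                return cached; // cache hit: the database is not touched
            }
            String fromDb = loadFromDatabase(productId); // hypothetical DB query
            jedis.setex(key, 60, fromDb); // cache for 60 seconds
            return fromDb;
        }
    }

    private String loadFromDatabase(String productId) {
        return "{\"id\":\"" + productId + "\"}"; // stand-in for a real query
    }
}
```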

Database: the database is arguably the most fundamental infrastructure of a back-end application; the vast majority of business data is persisted in one. Mainstream choices include traditional relational databases (MySQL, PostgreSQL) and the NoSQL systems that have become popular in recent years (MongoDB, HBase). Among them, HBase is a columnar database used in the big data domain; limited by its query performance, it is generally not used as a business database.

Search engine: software designed for full-text retrieval and multi-dimensional data queries. The most widely used open-source options are Solr and Elasticsearch, both based on Lucene; they differ mainly in how the term index is stored, how distribution is supported, and so on. Thanks to its good cluster support and high-performance implementation, Elasticsearch has gradually become the mainstream open-source search engine.

Message queue: message queues are one way to move data between systems. Commonly used queues include Kafka, designed for logs, and the transaction-oriented RabbitMQ. In scenarios that are not especially sensitive to message loss and do not need message transactions, Kafka offers higher performance; otherwise RabbitMQ is the better choice. ZeroMQ, by contrast, is a network-programming pattern library for messaging that sits above raw sockets and below full MQ products.
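For illustration, producing a log-like message to Kafka with the standard Java client looks roughly like this; the broker address and topic name are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal Kafka producer sketch: fire-and-forget sends favor throughput,
// which suits log-like data where occasional loss is tolerable.
public class LogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Messages with the same key land in the same partition, preserving order.
            producer.send(new ProducerRecord<>("app-logs", "order-service", "order 42 created"));
        }
    }
}
```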

1.1.4 File Storage

Whether business applications, the back-end services they depend on, or any other service, everything ultimately rests on the underlying file storage. Generally, file storage must be reliable, disaster-tolerant, and stable: stored data must not be lost easily, there must be a way to recover after a failure, and high availability must be guaranteed. At the bottom layer, traditional RAID is one solution; one layer up, Hadoop's HDFS is the most common distributed file storage solution, and shared file systems such as NFS and Samba also provide simple distributed storage.

If file storage does become the application's bottleneck, or file storage performance must improve for overall system performance to improve, the most direct and simple approach is to abandon traditional mechanical hard disks for SSDs. When companies tackle business performance problems, the decisive factor in the end is often the SSD; it is the most direct and effective way to trade money for time and labor. SSDB, mentioned in the database discussion, is a high-performance KV database that wraps LevelDB to exploit the characteristics of SSDs.

As for HDFS, data stored in it is normally accessed through Hadoop; technologies of the "xx on YARN" family are solutions for running non-Hadoop workloads on top of HDFS.

1.1.5 Unified Authentication Center

The unified authentication center mainly provides authentication services for app users, internal users, and the apps themselves, including:

User registration, login authentication, Token authentication

Management and login authentication of internal information system users

Management of apps, including app secret generation and verification of app information (such as verifying interface signatures).

A unified authentication center is needed so that information used by all apps can be managed centrally and all applications receive a uniform authentication service. It becomes especially necessary when many businesses need to share user data. Building single sign-on for mobile apps on top of the unified authentication center is also natural: imitating the Web's cookie mechanism, the authentication token is encrypted and kept in local storage for multiple apps to share.
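As one hedged illustration of how such a center might issue tokens that any application can verify with a shared secret, the sketch below signs a user ID and expiry with HMAC-SHA256. The token format is invented for the example; a real center would also handle revocation and key rotation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustrative token scheme: "userId:expiryMillis:signature", HMAC-SHA256 signed.
public class TokenService {
    private final byte[] secret;

    public TokenService(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    public String issue(String userId, long ttlMillis) throws Exception {
        String payload = userId + ":" + (System.currentTimeMillis() + ttlMillis);
        return payload + ":" + sign(payload);
    }

    public boolean verify(String token) throws Exception {
        int lastColon = token.lastIndexOf(':');
        if (lastColon < 0) return false;
        String payload = token.substring(0, lastColon);
        String signature = token.substring(lastColon + 1);
        long expiry = Long.parseLong(payload.substring(payload.indexOf(':') + 1));
        // Production code should use a constant-time comparison here.
        return sign(payload).equals(signature) && expiry > System.currentTimeMillis();
    }

    private String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return Base64.getUrlEncoder().withoutPadding()
                     .encodeToString(mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }
}
```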

1.1.6 Single Sign-on System

Many large Web sites now have a single sign-on system: a user logs in once and can then access multiple business applications (possibly with different permissions), which is very convenient. In mobile Internet companies, the various internal management and information systems, and even external applications, also need single sign-on.

Currently the most mature and most widely used single sign-on system is probably Yale University's open-source CAS, which can be customized and extended starting from https://github.com/apereo/cas/tree/master/cas-server-webapp.

The principle of single sign-on is roughly this: an unauthenticated user is redirected to the central login service, which after login issues a ticket; each business application then validates that ticket with the center before trusting the session.
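For illustration, in the CAS protocol a business application validates the ticket it received by calling the CAS server's /serviceValidate endpoint. Here is a rough sketch using Java 11's HTTP client; the server URL is a placeholder, and production code should parse the XML response properly rather than string-matching.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch of CAS ticket validation: after login, CAS redirects back to the
// service with a ticket, which the service confirms with the CAS server.
public class CasTicketValidator {
    private static final String CAS_SERVER = "https://cas.example.com"; // placeholder

    public static boolean validate(String ticket, String serviceUrl) throws Exception {
        String url = CAS_SERVER + "/serviceValidate"
                + "?ticket=" + URLEncoder.encode(ticket, StandardCharsets.UTF_8)
                + "&service=" + URLEncoder.encode(serviceUrl, StandardCharsets.UTF_8);
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The CAS server answers with XML; <cas:authenticationSuccess> means valid.
        return response.body().contains("authenticationSuccess");
    }
}
```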

1.1.7 Unified Configuration Center

In Java back-end applications, a common way to manage configuration is to write it in Properties, YAML, HOCON, or similar files; a change then only requires updating the file and redeploying, without touching code. A unified configuration center builds on this approach: it is a single service that manages the configuration files of all businesses and basic back-end services, with the following characteristics:

Configuration can be modified online and take effect dynamically.

Configuration files can distinguish between environments (development, testing, production, etc.)

Relevant configuration can be pulled into Java code through annotations or XML configuration.

Baidu's open-source Disconf and Ctrip's Apollo are usable in production, and you can also develop a configuration center to suit your own needs; ZooKeeper is generally chosen as the configuration store.
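A minimal sketch of a ZooKeeper-backed configuration reader, assuming an invented znode path: the watcher fires when the node changes, so new configuration can take effect without a redeploy.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch of a config client backed by ZooKeeper: the watcher fires when the
// znode changes, so updated configuration takes effect without a redeploy.
public class ConfigWatcher implements Watcher {
    private static final String CONFIG_PATH = "/config/order-service/db-url"; // illustrative
    private final ZooKeeper zk;

    public ConfigWatcher() throws Exception {
        zk = new ZooKeeper("localhost:2181", 15000, this);
    }

    public String read() throws Exception {
        // Passing 'this' re-registers the watch on every read (watches are one-shot).
        byte[] data = zk.getData(CONFIG_PATH, this, null);
        return new String(data, StandardCharsets.UTF_8);
    }

    @Override
    public void process(WatchedEvent event) {
        if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
            System.out.println("Config changed, reloading " + event.getPath());
        }
    }
}
```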

1.1.8 Service Governance Framework

For external API calls, or for clients accessing back-end APIs, HTTP or RESTful interfaces are typical (raw socket calls are also possible). Calls between internal services, however, generally go through an RPC mechanism. The current mainstream RPC protocols and frameworks include:

RMI

Hessian

Thrift

Dubbo

Each of these RPC options has its own advantages and disadvantages; choose according to business needs.

As the number of services in the system grows, RPC call chains become more and more complex, and in many cases documentation must be constantly updated just to track the call relationships. A framework that manages these services can eliminate much of this tedious manual work.

The traditional ESB (Enterprise Service Bus) is essentially a service governance solution, but the ESB sits as a proxy between client and server, and since every request passes through it, the ESB itself easily becomes a performance bottleneck. A better design therefore improves on the traditional ESB as follows:

In this design, the registry (configuration center) acts as the hub, and the actual invocation happens only between the client and the server providing the service, avoiding the traditional ESB's performance bottleneck. For this design, the service governance framework should support the following features:

Registration and management of service providers

Registration and management of service consumers

Service version management, load balancing, flow control, service degradation, resource isolation

Service fault tolerance and circuit breaking

Alibaba's open-source Dubbo implements all of the above well and is the solution many companies use today, while Dangdang's extension project Dubbox adds some new features to it. Dubbo has since been contributed by Alibaba to Apache, where it is incubating. For operations and monitoring, Dubbo itself provides a simple management console, dubbo-admin, and a monitoring center, dubbo-monitor-simple; dubboclub/dubbokeeper on GitHub is a more powerful service management and monitoring system built on top of them.
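For illustration, exposing a service through Dubbo's API configuration looks roughly like the sketch below; the interface and the ZooKeeper address are placeholders, and in practice XML or annotation configuration is more common than the raw API.

```java
import org.apache.dubbo.config.ApplicationConfig;
import org.apache.dubbo.config.RegistryConfig;
import org.apache.dubbo.config.ServiceConfig;

// Hypothetical service interface and implementation for the sketch.
interface GreetingService {
    String greet(String name);
}

class GreetingServiceImpl implements GreetingService {
    public String greet(String name) { return "hello, " + name; }
}

// Dubbo provider sketch: the service registers itself in ZooKeeper, and
// consumers discover it there and call it directly, avoiding the
// central-proxy bottleneck of a traditional ESB.
public class Provider {
    public static void main(String[] args) throws Exception {
        ServiceConfig<GreetingService> service = new ServiceConfig<>();
        service.setApplication(new ApplicationConfig("greeting-provider"));
        service.setRegistry(new RegistryConfig("zookeeper://127.0.0.1:2181"));
        service.setInterface(GreetingService.class);
        service.setRef(new GreetingServiceImpl());
        service.export(); // publish the service to the registry

        Thread.currentThread().join(); // keep the provider alive
    }
}
```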

In addition, Netflix's Eureka also provides service registration and discovery; together with Ribbon it enables client-side soft load balancing and supports a variety of flexible dynamic routing and load-balancing strategies.

1.1.9 Unified Scheduling Center

Timed scheduling is a very common scenario in many businesses, such as periodically fetching data or periodically refreshing order status. The usual approach is for each business to rely on Linux's cron mechanism or Java's Quartz, whereas a unified scheduling center manages all scheduled tasks in one place, making it possible to tune and expand the scheduling cluster and manage tasks centrally. Azkaban and Yahoo's Oozie are workflow management engines for Hadoop and can also serve as a unified scheduling center, and you can of course build your own on top of cron or Quartz. A unified scheduling center generally needs to support:

Schedule tasks according to Cron expressions

Dynamically modify, stop, delete tasks

Support sharded task execution

Support task workflows: for example, a second task executes only after the first completes

Tasks can take the form of scripts, code, URLs, and so on

Record task execution logs; alert on failure

A note on Java's Quartz: Quartz itself should be distinguished from Spring's Quartz support, which is Spring's simple encapsulation of the Quartz framework and is currently the most common way to schedule jobs; however, it does not give you a highly available cluster. Quartz does support clustering, but configuring it is quite complex, so many solutions nowadays use ZooKeeper to implement distributed clusters of Spring Quartz.
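A minimal sketch of plain Quartz usage with a Cron trigger; the job class and Cron expression are invented for the example.

```java
import org.quartz.CronScheduleBuilder;
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.Scheduler;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

// Sketch: refresh order status every five minutes with a Quartz Cron trigger.
public class OrderStatusJob implements Job {
    @Override
    public void execute(JobExecutionContext context) {
        System.out.println("refreshing order status..."); // illustrative task body
    }

    public static void main(String[] args) throws Exception {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(OrderStatusJob.class)
                .withIdentity("orderStatusJob", "orders").build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withIdentity("every5min", "orders")
                .withSchedule(CronScheduleBuilder.cronSchedule("0 0/5 * * * ?"))
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```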

In addition, Dangdang's open-source Elastic-Job builds on basic distributed scheduling and adds more powerful features such as elastic resource utilization.

1.1.10 Unified Logging Service

Logging is an essential part of development; when and how well an engineer logs reflects their coding skill. After all, logs are the most direct information for locating and troubleshooting anomalies in online services.

In general, logs scattered across individual businesses are very inconvenient to manage and troubleshoot. A unified logging service records logs on a dedicated log server, with every business writing to it through a unified logging framework.

The unified logging framework can be built by implementing a Log4j or Logback Appender that ships each log event to the log server over an RPC call.
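A hedged sketch of such a custom Logback appender; RemoteLogClient is a hypothetical stand-in for the RPC call to the log server.

```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

// Custom Logback appender that forwards each log event to a central log
// server; the RemoteLogClient RPC interface is hypothetical.
public class LogServerAppender extends AppenderBase<ILoggingEvent> {
    private final RemoteLogClient logClient = new RemoteLogClient();

    @Override
    protected void append(ILoggingEvent event) {
        // In production this should be asynchronous and buffered so that a
        // slow log server never blocks business threads.
        logClient.send(event.getLoggerName(), event.getLevel().toString(),
                event.getFormattedMessage());
    }
}

// Stand-in for a real RPC client to the unified log server.
class RemoteLogClient {
    void send(String logger, String level, String message) {
        // e.g. batch the events and ship them over an RPC/queue connection
    }
}
```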

1.1.11 Data Infrastructure

Data has been a very hot area in recent years; from "lean analytics" to "growth hacking", everyone emphasizes its extraordinary role, and many companies drive product design, marketing, and R&D with data. Be clear, though, that big-data technology is warranted only when your data genuinely exceeds what a single machine can process; never do big data for big data's sake. In many cases, problems solvable with a standalone program plus MySQL are forced onto Hadoop, wasting both time and manpower.

It is worth adding that for many companies, especially those without heavy offline workloads, big-data cluster resources sit idle much of the time. The family of "xx on YARN" technologies lets non-Hadoop workloads use the cluster's resources and can greatly improve utilization; Docker on YARN is one example.

Data highway

Continuing from the unified logging service above: the emitted logs ultimately become data on the data highway for downstream processing programs to consume. The intermediate steps are log collection and transmission.

Collection: once the unified logging service has written logs to the log server, they need to be gathered by a log collection mechanism. Common log collection schemes include Scribe, Chukwa, Kafka, and Flume.

In addition, Logstash is another log collection option. Unlike the above, it leans toward data preprocessing; its configuration is simple and clear, and it is often used in operations scenarios as part of the ELK stack (Elasticsearch + Logstash + Kibana).

Transmission: data is carried to the data processing service by a message queue; for logs, Kafka is the usual choice.

A further key issue here is synchronizing data between the database and the data warehouse, i.e., the scheme used to move the data to be analyzed from the database into a warehouse such as Hive. Apache Sqoop can perform timestamp-based synchronization; Alibaba's open-source Canal implements incremental synchronization based on the MySQL binlog, which suits general synchronization scenarios better, although considerable business development work is needed on top of Canal.

Offline data analysis

Offline data analysis tolerates delay; it generally targets non-real-time analysis and produces reports at a one-day delay. The most commonly used offline analysis technologies today are Hadoop and Spark. Compared with Hadoop, Spark has a large performance advantage, though it makes higher demands on hardware resources. Hadoop's YARN, as the resource management and scheduling component, can serve not only MapReduce but also Spark (Spark on YARN); Mesos is another resource management and scheduling system.

For Hadoop, traditional MapReduce jobs are complex to write and hard to maintain, so you can use Hive to express them in SQL instead of writing MapReduce by hand; Spark has the analogous Spark SQL.

Another key problem in offline data analysis is data skew: data is distributed unevenly across partitions, leaving some nodes lightly loaded and others overloaded, which drags down overall performance. Handling data skew well is very important in data processing.
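One common mitigation is two-stage aggregation with key salting: spread a hot key over N synthetic keys so its records land on many nodes, aggregate partially, then strip the salt and aggregate again. A plain-Java illustration of the idea (real jobs would do this inside Hive, Spark, or MapReduce):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Key-salting illustration: a hot key is split into SALT_BUCKETS synthetic
// keys so its records spread over many reducers; a second aggregation pass
// then merges the partial results.
public class SaltedCount {
    private static final int SALT_BUCKETS = 8;

    public static Map<String, Long> countWithSalting(List<String> keys) {
        // Stage 1: count per salted key (in a real job, each reducer sees one bucket).
        Map<String, Long> partial = new HashMap<>();
        for (String key : keys) {
            String salted = ThreadLocalRandom.current().nextInt(SALT_BUCKETS) + "#" + key;
            partial.merge(salted, 1L, Long::sum);
        }
        // Stage 2: strip the salt and merge the partial counts.
        Map<String, Long> total = new HashMap<>();
        partial.forEach((salted, count) ->
                total.merge(salted.substring(salted.indexOf('#') + 1), count, Long::sum));
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWithSalting(List.of("hot", "hot", "hot", "cold")));
        // => {hot=3, cold=1}, but stage 1 spread "hot" across up to 8 buckets
    }
}
```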

Real-time data analysis

Compared with offline analysis, real-time (also called online) data analysis targets business scenarios with real-time requirements on data, such as advertising settlement and order settlement. The more mature real-time technologies today are Storm and Spark Streaming. Compared with Storm, Spark Streaming is essentially micro-batch computing, so for latency-sensitive scenarios Storm is still the choice. Besides these two, Flink is a distributed real-time computing framework that has become popular recently: it supports exactly-once semantics, offers high throughput and low latency on large data volumes, and supports state management and window statistics well, though its documentation and API management platform still need improvement.

Real-time processing is generally incremental and therefore less reliable than offline processing: after a failure (such as a cluster crash) or a processing error, it is hard to recover or repair the abnormal data. Combining offline and real-time processing is therefore the most common scheme today; the Lambda architecture is one such combination.

There is also a very common real-time scenario: real-time multi-dimensional analysis, i.e., the ability to slice and display data along any combination of dimensions. Two classes of solutions address this: ROLAP and MOLAP.

ROLAP: use relational databases or extended relational databases to manage data warehouse data, represented by Hive, Spark SQL, and Presto.

MOLAP: a multi-dimensional storage engine based on data cubes that trades space for time, materializing all analyses into physical tables or views. Represented by Druid, Pinot, and Kylin; unlike ROLAP (Hive, Spark SQL), it natively supports multi-dimensional data queries.

As noted above, the ROLAP approach is mostly offline-oriented and usually cannot meet real-time requirements, so MOLAP is the common scheme for real-time multi-dimensional analysis. The three commonly used frameworks compare as follows:

Framework / usage scenario / language / protocol / features:

Druid / real-time processing and analysis / Java / JSON / real-time aggregation

Pinot / real-time processing and analysis / Java / JSON / real-time aggregation

Kylin / OLAP analysis engine / Java / JDBC, OLAP / pre-computation and caching

Among them, Druid is relatively lightweight, has the most users, and is the most mature.

Ad-hoc data analysis

The reports produced by offline and real-time analysis serve data analysts and product managers, but in many cases canned, online reports cannot satisfy them, and they need to query and aggregate against the data warehouse themselves. SQL's ease of use and descriptive power make it probably the most suitable interface for these stakeholders, so providing an ad-hoc SQL query tool can greatly improve their efficiency. Presto, Impala, and Hive are all such tools; to give them a more intuitive UI, you can also set up an internal Hue.

1.1.12 Fault Monitoring

For user-facing online services, failure is a very serious matter, so fault monitoring and alerting for online services are crucial. Fault monitoring can be divided into two levels:

System monitoring: monitoring of the host's hardware resources such as bandwidth, CPU, memory, disk, and I/O. Open-source software such as Nagios and Cacti can be used, and many third-party services on the market also monitor host resources, such as Jiankongbao. For distributed service clusters such as Hadoop, Storm, Kafka, and Flume, Ganglia can be used. Xiaomi's open-source Open-Falcon is also very good: it covers system monitoring, JVM monitoring, and application monitoring, and supports custom monitoring mechanisms.

Business monitoring: monitoring above the host-resource level, such as anomalies in an app's PV/UV data or transaction failures. This requires adding monitoring code in the business, for example a log record at the point where an exception is thrown.

Another key step in monitoring is alerting. There are many alert channels: email, IM, SMS, and so on. Considering fault severity, alert rationality, and ease of locating problems, some suggestions:

The alert log should record the ID of the faulty machine; in clustered services especially, failing to record the machine ID makes subsequent problems very hard to locate.

Aggregate alerts; do not send a separate alert for every fault, which greatly burdens the on-call engineer.

Alerts should be graded; not every alert deserves the same priority.

Using WeChat as an alert channel keeps delivery rates high while saving SMS costs.

After the alert comes the most important part: handling the fault. For startups, 24-hour on-call is a necessary quality; on receiving an alert, you must respond as fast as possible, find the problem, and fix it within a controllable time. Troubleshooting relies essentially on logs: if logging is done sensibly, problems can usually be located quickly. But when the service is distributed and the log volume is huge, locating the right logs becomes a hard problem in itself. A few options:

Build a centralized log analysis platform with ELK (Elasticsearch + Logstash + Kibana) for fast log search and location; with Yelp's open-source Elastalert, alerting can be added on top.

Build a distributed request tracing system (also called full-link monitoring). For distributed systems, and microservice architectures in particular, such a system makes it vastly easier to locate a single abnormal request among massive call volumes, collect its information, and pinpoint the performance bottleneck of a request's call chain. Vipshop's Mercury, Alibaba's EagleEye, Sina's WatchMan, and Twitter's open-source Zipkin are all basically based on Google's Dapper paper, while Dianping's real-time application monitoring platform CAT adds fine-grained call performance statistics on top of distributed request tracing (at the cost of code intrusion). Apache's incubating HTrace is a distributed tracing scheme designed for large distributed systems such as the HDFS file system and the HBase storage engine. If your microservices are built with Spring Cloud, Spring Cloud Sleuth is the natural distributed tracing choice. Worth mentioning too is SkyWalking, incubating at Apache, a complete APM (application performance monitoring) system built on distributed tracing; its biggest feature is that it is based on a Java agent and the instrumentation API, so it does not intrude on business code. Pinpoint is another similar APM system already used in production.
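As a simple illustration of the core idea behind these systems, and not any particular product's API, the servlet filter below attaches a trace ID to every request and exposes it to logging through SLF4J's MDC, so all log lines of one request can be correlated; the header name is invented for the example.

```java
import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

// Propagate (or create) a trace ID per request and put it in the logging MDC
// so every log line can be correlated. Real systems like Zipkin or SkyWalking
// also record span timings and report them to a collector.
public class TraceIdFilter implements Filter {
    private static final String TRACE_HEADER = "X-Trace-Id"; // illustrative header

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String traceId = ((HttpServletRequest) req).getHeader(TRACE_HEADER);
        if (traceId == null) {
            traceId = UUID.randomUUID().toString(); // this request starts a new trace
        }
        MDC.put("traceId", traceId); // log pattern can include %X{traceId}
        try {
            chain.doFilter(req, res);
        } finally {
            MDC.remove("traceId"); // don't leak the ID to the next request on this thread
        }
    }

    @Override public void init(FilterConfig config) {}
    @Override public void destroy() {}
}
```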

That concludes this overview of the full set of Internet back-end infrastructure. Hopefully the content above is of some help; for the questions that remain, deeper reading in each of the areas above is the natural next step.
