Which of these five must-know big data processing frameworks should you use in your project?
Big data is a general term for the non-traditional strategies and technologies needed to collect, organize, and process large data sets and to extract insights from them. Although working with data that exceeds the computing power or storage capacity of a single computer is nothing new, the pervasiveness, scale, and value of this type of computing have expanded dramatically only in recent years. This article introduces one of the most fundamental components of a big data system: the processing framework. Processing frameworks are responsible for computing over the data in the system, such as data read from non-volatile storage or data that has just been ingested. Computing over data means extracting information and insight from a large number of individual data points.
Processing framework
What are big data processing frameworks?
Processing frameworks and processing engines are responsible for computing over the data in a data system. Although there is no authoritative definition distinguishing an "engine" from a "framework", the former can usually be defined as the component actually responsible for operating on data, while the latter can be defined as a set of components designed to do the same. For example, Apache Hadoop can be seen as a processing framework with MapReduce as its default processing engine. Engines and frameworks can often be swapped out or used in tandem. For example, Apache Spark, another framework, can hook into Hadoop and substitute itself for MapReduce. This interoperability between components is one reason big data systems are so flexible.
Although the systems that handle this stage of the data life cycle are usually complex, their goals at a broad level are the same: to improve understanding by performing operations on the data, revealing the patterns it contains and gaining insight into complex interactions.
To simplify the discussion of these components, we will classify them by the state of the data they are designed to handle. Some systems process data in batches, while others process data as it streams into the system. Still others can handle both types of data.
Before delving into the details and conclusions for the different implementations, a brief introduction to the different processing types is in order.
Batch processing system
Batch processing has a long history in the big data world. Batch processing operates on large, static datasets and returns results once the computation completes.
Datasets used in batch processing typically have the following characteristics:
Bounded: a batch dataset represents a finite collection of data
Persistent: the data is usually stored in some kind of permanent storage
Large: batch operations are often the only way to process extremely large datasets
Batch processing is ideal for computations that require access to a complete set of records. For example, when calculating totals and averages, the dataset must be treated as a whole rather than as a collection of individual records. These operations require that the data be retained for the duration of the calculation.
Tasks that need to process large amounts of data are usually best suited to batch processing. Whether the dataset is processed directly from persistent storage or loaded into memory first, batch systems are designed with large quantities of data in mind and have the resources to handle them. Because it excels at handling large volumes of persistent data, batch processing is frequently used for analyzing historical data.
Processing large volumes of data takes considerable time, so batch processing is not appropriate for use cases where processing time matters.
Apache Hadoop
Apache Hadoop is a processing framework dedicated to batch processing. Hadoop was the first big data framework to gain significant traction in the open-source community. Based on Google's published papers and experience with massive data processing, Hadoop reimplemented the relevant algorithms and component stack to make large-scale batch processing more accessible.
Modern versions of Hadoop comprise multiple components, or layers, that work together to process batch data:
HDFS: HDFS is the distributed file system layer that coordinates storage and replication across the cluster nodes. HDFS ensures that data remains available after inevitable node failures, and it can be used as a source of data, to store intermediate processing results, and to persist the final computed results.
YARN: YARN, which stands for Yet Another Resource Negotiator, is the cluster-coordinating component of the Hadoop stack. It is responsible for coordinating and managing the underlying resources and scheduling jobs. By acting as an interface to the cluster's resources, YARN makes it possible to run more types of workloads on a Hadoop cluster than in earlier iterations.
MapReduce: MapReduce is Hadoop's native batch processing engine.
Batch processing mode
Hadoop's processing functionality comes from the MapReduce engine. MapReduce's processing technique follows the map, shuffle, and reduce workflow using key-value pairs. The basic procedure involves the following steps (a code sketch follows the list):
Read datasets from the HDFS file system
Split the dataset into small chunks and assign them to all available nodes
Compute over each subset of data on each node (intermediate results are written back to HDFS)
Redistribute the intermediate results, grouping them by key
"Reduce" the values for each key by summarizing and combining the partial results computed on each node
Write the final result of the calculation back to HDFS
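To make these steps concrete, below is a minimal word-count sketch against the standard Hadoop MapReduce Java API; the HDFS input and output paths are hypothetical placeholders.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: values for the same key arrive grouped after the shuffle.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Hypothetical HDFS paths; adjust to your cluster.
    FileInputFormat.addInputPath(job, new Path("/data/input"));
    FileOutputFormat.setOutputPath(job, new Path("/data/output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```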
Advantages and limitations
Because this approach relies heavily on persistent storage, with each task requiring multiple read and write operations, it is relatively slow. On the other hand, because disk space is usually the most plentiful resource on a server, MapReduce can handle enormous datasets. It also means that, compared with similar technologies, Hadoop's MapReduce can typically run on inexpensive hardware, because it does not need to keep everything in memory. MapReduce has extremely high scaling potential, and production deployments spanning tens of thousands of nodes exist.
MapReduce has a steep learning curve, and although other peripheral technologies in the Hadoop ecosystem can greatly reduce the impact of this problem, it is still something to keep in mind when trying to implement applications quickly on a Hadoop cluster.
A vast ecosystem has formed around Hadoop, and Hadoop clusters themselves are often used as building blocks for other software. Many other processing frameworks and engines can use HDFS and the YARN resource manager through integrations with Hadoop.
Summary
Apache Hadoop and its MapReduce processing engine provide a proven batch model best suited for processing very large datasets where time is not a significant factor. A fully functional Hadoop cluster can be built from very inexpensive components, making this cheap and effective processing technology applicable in many cases. Compatibility and integration with other frameworks and engines also make Hadoop an ideal foundation for processing platforms that combine multiple workloads using different technologies.
Stream processing system
Stream processing systems compute over data as it enters the system. This is a completely different approach from the batch paradigm. Instead of operating on an entire dataset, stream processing performs operations on each data item as it passes through the system.
The dataset in stream processing is "unbounded", which has a few important implications:
The complete dataset can only represent the total amount of data that has entered the system so far.
The working dataset is perhaps more relevant, and at any given moment may consist of only a single item.
Processing is event-based, and there is no "end" until processing is explicitly stopped. Results are available immediately and are continually updated as new data arrives.
Stream processing systems can handle a nearly unlimited amount of data, but they process only one item at a time (true stream processing) or very few items at a time (micro-batch processing), with minimal state maintained between records. Although most systems provide ways to maintain some state, stream processing is highly optimized for functional processing with few side effects.
Functional operations focus on discrete steps with limited state or side effects. Performing the same operation on the same data produces the same output regardless of other factors, and this kind of processing fits streams well, because maintaining state across different items is usually some combination of difficult, limited, and in some cases undesirable. So although some forms of state management are usually possible, these frameworks are simpler and more efficient without them.
This type of processing lends itself well to certain kinds of workloads. Tasks with near-real-time requirements are a natural fit for stream processing. Analytics, server or application error logs, and other time-based metrics are obvious candidates, because reacting to changes in these areas is critical to business functions. Stream processing is well suited to data where you must respond to changes or spikes, and where you care about trends over time.
Apache Storm
Apache Storm is a stream processing framework that focuses on extremely low latency, and it may be the best option for workloads that require near-real-time processing. It can handle very large quantities of data and deliver results with less latency than other solutions.
Stream processing mode
Storm's stream processing works by orchestrating DAGs (Directed Acyclic Graphs) called topologies within the framework. These topologies describe the various transformations or steps performed on each incoming piece of data as it enters the system.
The topology consists of:
Stream: an ordinary data stream; unbounded data that continuously arrives at the system.
Spout: a source of data streams at the edge of a topology, such as an API or queue, from which the data to be processed emerges.
Bolt: a bolt represents a processing step that consumes stream data, applies an operation to it, and outputs the result as a stream. Bolts connect to spouts and then to one another to compose all of the necessary processing. At the end of the topology, the output of the final bolt can serve as input for other connected systems.
The idea behind Storm is to define many small, discrete operations using the components above and then compose them into the desired topology (a minimal topology sketch follows). By default, Storm offers an "at-least-once" processing guarantee: each message is processed at least once, but in failure scenarios it may be processed multiple times. Storm does not guarantee that messages will be processed in any particular order.
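As a sketch of how these components compose, here is a minimal topology wired together with Storm's Java API (package names per Storm 1.0 and later); SentenceSpout, SplitSentenceBolt, and WordCountBolt are hypothetical spout and bolt implementations standing in for real ones.

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    // Spout at the edge of the topology: SentenceSpout is a hypothetical
    // IRichSpout implementation emitting tuples with one "sentence" field.
    builder.setSpout("sentences", new SentenceSpout(), 2);

    // Bolts consume streams, apply an operation, and emit new streams.
    builder.setBolt("split", new SplitSentenceBolt(), 4)
           .shuffleGrouping("sentences");                // random distribution
    builder.setBolt("count", new WordCountBolt(), 4)
           .fieldsGrouping("split", new Fields("word")); // same word -> same task

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", new Config(), builder.createTopology());
    Thread.sleep(30_000); // let the topology run briefly, then stop
    cluster.shutdown();
  }
}
```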
To achieve exactly-once, stateful processing, an abstraction called Trident can be used. Strictly speaking, Storm without Trident is often referred to as Core Storm. Trident significantly changes Storm's processing characteristics: it increases latency, adds state to the processing, and uses a micro-batch model in place of the pure, item-by-item streaming model.
To avoid those costs, Storm users are generally advised to use Core Storm whenever possible. That said, Trident's exactly-once processing guarantee is useful in cases where the system cannot intelligently handle duplicate messages. Trident is also the only choice within Storm when you need to maintain state between items, for example when counting how many users clicked a link within an hour. Although it does not play to the framework's inherent strengths, Trident increases Storm's flexibility.
A Trident topology consists of the following (a sketch follows the list):
Stream batches: micro-batches of stream data, chunked to provide batch processing semantics.
Operations: batch procedures that can be performed on the data.
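As an illustration, here is the shape of a word count expressed with Trident's Java API, a minimal sketch in which sentenceSpout and Split are hypothetical (a spout emitting "sentence" tuples and a BaseFunction tokenizer), while Count and MemoryMapState are stock Trident classes.

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;

public class TridentWordCount {
  // sentenceSpout: any spout emitting tuples with a single "sentence" field.
  public static StormTopology build(IRichSpout sentenceSpout) {
    TridentTopology topology = new TridentTopology();
    topology.newStream("sentences", sentenceSpout)
        // Split is a hypothetical BaseFunction that tokenizes a sentence
        // into individual "word" tuples.
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        // persistentAggregate maintains exactly-once counts in a state store
        // (here an in-memory map intended for testing).
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
    return topology.build();
  }
}
```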
Advantages and limitations
Storm is probably the best solution currently available for near-real-time processing. It can handle data with extremely low latency, making it suitable for workloads where minimal delay is essential. If processing speed directly affects the user experience, for example when processing results are fed directly to a web page a visitor has open, Storm is a good choice.
The combination of Storm and Trident allows users to use microbatches instead of pure streaming. While this gives users more flexibility to build tools that better meet the requirements, it also weakens the technology's greatest advantage over other solutions. Having said that, it is always good to have one more way of handling streams.
Core Storm cannot guarantee the order in which messages are processed; it offers an "at-least-once" guarantee, meaning every message is processed, but duplicates may occur. Trident offers an exactly-once guarantee and can provide ordering between batches, though not within them.
In terms of interoperability, Storm can integrate with Hadoop's YARN resource manager, making it easy to hook into an existing Hadoop deployment. More than most processing frameworks, Storm also supports a wide range of languages, giving users many options for defining their topologies.
Summary
For pure stream processing workloads with very strict latency requirements, Storm is probably the most mature option. It can guarantee that every message gets processed and can be used with many programming languages. Because Storm does not do batch processing, you will need additional software if you require that capability. If you have strong requirements for exactly-once processing guarantees, Trident can accommodate them, although at that point other stream processing frameworks may be more appropriate.
Apache Samza
Apache Samza is a stream processing framework tightly tied to the Apache Kafka messaging system. Although Kafka can be used with many stream processing systems, Samza is designed specifically to take advantage of Kafka's unique architecture and guarantees. It uses Kafka to provide fault tolerance, buffering, and state storage.
Samza can use YARN as its resource manager. This means a Hadoop cluster is required by default (at least HDFS and YARN), but it also means Samza can rely directly on YARN's rich built-in capabilities.
Stream processing mode
Samza relies on Kafka's semantics to define how streams are handled. Kafka involves the following concepts (a short producer sketch follows the list):
Topic: each stream of data entering the Kafka system is called a topic. A topic is essentially a stream of related information that consumers can subscribe to.
Partition: to spread a topic across multiple nodes, Kafka divides incoming messages into partitions. Partitioning is based on a key, which guarantees that every message with the same key lands in the same partition. Ordering is guaranteed within a partition.
Broker: each node in a Kafka cluster is called a broker.
Producer: any component that writes to a Kafka topic is called a producer. The producer supplies the key used to partition a topic.
Consumer: any component that reads from a Kafka topic is called a consumer. Consumers are responsible for keeping track of their own offsets, so that they know which records have already been processed if a failure occurs.
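To ground these concepts, here is a minimal sketch using the standard Kafka Java producer client; the broker address, topic name, key, and value are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PageViewProducer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // The key ("user-42") determines the partition, so all events for the
      // same key land in the same partition and keep their relative order.
      producer.send(new ProducerRecord<>("page-views", "user-42", "clicked /home"));
    }
  }
}
```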
Because Kafka amounts to an immutable log, Samza also works with immutable data streams. This means any new data stream created by a transformation can be consumed by other components without affecting the initial stream.
Advantages and limitations
At first glance, Samza's dependence on a Kafka-like queuing system might seem restrictive, but it gives the system some unique guarantees and features not common in other stream processing systems. For example, Kafka already provides replicated storage of data that can be accessed with low latency, plus a very easy and inexpensive multi-subscriber model for each individual data partition. All output, including intermediate results, is written to Kafka, where downstream steps can consume it independently. This tight reliance on Kafka is in many ways similar to the MapReduce engine's reliance on HDFS. While relying on HDFS between every calculation leads to some serious performance problems in batch processing, it avoids a number of other problems in stream processing.
The close relationship between Samza and Kafka allows the processing steps themselves to be very loosely coupled. You can add any number of subscribers to any step of the output without prior coordination, which is useful for organizations with multiple teams that need to access similar data. Multiple teams can subscribe to all the data topics that enter the system, or arbitrarily subscribe to topics created by other teams after some processing of the data. All of this does not put additional pressure on load-intensive infrastructure such as databases.
Writing directly to Kafka also eliminates the problem of backpressure. Backpressure occurs when load spikes cause data to flow in faster than components can process it in real time, which can lead to processing stalls and potential data loss. Kafka is designed to hold data for long periods, which means components can resume processing at their convenience and can be restarted without consequence.
Samza can store state using a fault-tolerant checkpointing system implemented as a local key-value store. This gives Samza an at-least-once delivery guarantee, but it does not provide exact recovery of aggregated state (such as counts) after a failure, since data may be delivered more than once.
The high-level abstractions Samza offers are in many ways easier to work with than the primitives provided by systems such as Storm. So far, Samza supports only JVM languages, which means it does not have the same language flexibility as Storm.
Summary
Apache Samza is a good choice for streaming workloads in environments where Hadoop and Kafka are already available or easy to set up. Samza itself is a good fit for organizations with multiple teams that consume (but do not necessarily tightly coordinate around) data streams at various stages of processing. Samza can greatly simplify many stream processing tasks and enables low-latency performance. It may be a poor fit if the deployment requirements are incompatible with your current systems, if you need extremely low-latency processing, or if you have strong needs for exactly-once semantics.
Hybrid processing systems: batch and stream processing
Some processing frameworks can handle both batch and stream workloads. These frameworks simplify diverse processing requirements by allowing the same or related components and APIs to be used for both types of data. As you will see, this is achieved in quite different ways by Spark and Flink, the two frameworks described below. How such a capability is implemented comes down largely to how the two processing modes are unified and what assumptions are made about the relationship between fixed and unfixed datasets. Although projects focused on one processing type serve specific use cases closely, the hybrid frameworks aim to offer a general solution for data processing. They not only provide the methods needed to process the data, but also supply their own integrations, libraries, and tools for tasks such as graph analysis, machine learning, and interactive querying.
Apache Spark
Apache Spark is a next-generation batch processing framework with stream processing capabilities. Built on many of the same principles as Hadoop's MapReduce engine, Spark focuses primarily on speeding up batch workloads through full in-memory computation and processing optimizations.
Spark can be deployed as a standalone cluster (paired with a capable storage layer) or can hook into Hadoop as an alternative to the MapReduce engine.
Batch processing mode
Unlike MapReduce, Spark's data processing is done entirely in memory, requiring interaction with the storage layer only when the data is initially read into memory and the final result is persisted. The processing results of all intermediate states are stored in memory.
Although in-memory processing contributes substantially to its speed, Spark is also faster on disk-related tasks, because holistic optimization can be achieved by analyzing the complete set of tasks ahead of time. To achieve this, Spark builds a Directed Acyclic Graph, or DAG, that represents all of the operations to be performed, the data to be operated on, and the relationships between them, enabling the processor to coordinate work more intelligently.
For in-memory batch computation, Spark uses a model called Resilient Distributed Datasets, or RDDs, to work with data. These are immutable structures representing datasets that exist in memory. Operations on an RDD produce a new RDD, and each RDD can trace its lineage back through its parent RDDs, ultimately to the data on disk. Through lineage, Spark achieves fault tolerance without needing to write the result of every operation back to disk. (A minimal sketch appears below.)
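Here is a minimal word-count sketch using Spark's Java RDD API; the HDFS paths are hypothetical placeholders.

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class RddWordCount {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("rdd-word-count");
    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      // Each transformation yields a new immutable RDD; lineage lets Spark
      // recompute lost partitions rather than checkpointing every step to disk.
      JavaRDD<String> lines = sc.textFile("hdfs:///data/input"); // hypothetical path
      JavaPairRDD<String, Integer> counts = lines
          .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
          .mapToPair(word -> new Tuple2<>(word, 1))
          .reduceByKey(Integer::sum);
      counts.saveAsTextFile("hdfs:///data/output");              // hypothetical path
    }
  }
}
```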
Stream processing mode
Stream processing capability is provided by Spark Streaming. Spark itself was designed primarily with batch workloads in mind, and to bridge the gap between the engine's design and the characteristics of streaming workloads, Spark implements a concept called micro-batches. This strategy treats the incoming data stream as a series of very small "batches" that can be handled using the native semantics of the batch engine.
Spark Streaming buffers the stream in sub-second increments, and these increments are then processed as small, fixed datasets in batch fashion. This approach works very well in practice, but it still falls short of true stream processing frameworks in performance terms.
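A minimal sketch of the micro-batch model using the Spark Streaming Java API; the socket source and the one-second batch interval are arbitrary choices for illustration.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MicroBatchDemo {
  public static void main(String[] args) throws InterruptedException {
    // Each one-second slice of input becomes a small RDD that is handed
    // to the ordinary batch engine: the micro-batch model in miniature.
    SparkConf conf = new SparkConf().setAppName("micro-batch-demo");
    JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(1));

    JavaDStream<String> lines = ssc.socketTextStream("localhost", 9999); // hypothetical source
    lines.filter(line -> line.contains("ERROR")).print();

    ssc.start();
    ssc.awaitTermination();
  }
}
```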
Advantages and limitations
The main reason to use Spark rather than Hadoop MapReduce is speed. Thanks to its in-memory computation strategy and advanced DAG scheduling, Spark can process the same datasets much faster. Another major advantage of Spark is versatility: it can be deployed as a standalone cluster or integrated with an existing Hadoop cluster, and it can perform both batch and stream processing, letting a single cluster handle different types of workloads. Beyond the capabilities of the engine itself, an ecosystem of libraries has grown up around Spark, offering strong support for machine learning, interactive queries, and more. Compared with MapReduce, Spark tasks are famously easy to write, which can significantly improve productivity.
Because Spark uses a batch-oriented method for stream processing, data entering the system must be buffered. Buffering lets the system handle a high volume of incoming data and improves overall throughput, but waiting for the buffer to flush also increases latency. This means Spark Streaming may not be appropriate for workloads with strict latency requirements. Since RAM is generally more expensive than disk space, Spark can also cost more to run than disk-based systems. However, faster processing means tasks complete sooner, which often offsets the extra cost in environments where resources are billed by the hour.
Another consequence of Spark's in-memory design is that resource scarcity can become an issue when it is deployed on a shared cluster. Spark consumes more resources than Hadoop MapReduce and may interfere with other tasks that need the cluster at the same time. By its nature, Spark is less suited to coexisting alongside the other components of the Hadoop stack.
Summary
Spark is an excellent choice for diverse processing workloads. Its batch capabilities offer exceptional speed at the cost of higher memory consumption. For workloads that value throughput over latency, Spark Streaming is a suitable stream processing solution.
Apache Flink
Apache Flink is a stream processing framework that can also handle batch tasks. It treats batch data as a data stream with finite boundaries, thereby handling batch processing as a subset of stream processing. This stream-first approach to all processing has a number of interesting side effects.
This stream-first approach has been called the Kappa architecture, in contrast to the more widely known Lambda architecture, in which batch processing is the primary method and streams supplement it with early, unrefined results. In the Kappa architecture, everything is a stream, which simplifies the model and has only recently become feasible as stream processing engines have matured.
Stream processing model
Flink's stream processing model handles incoming data item by item, as a true stream. Flink provides a DataStream API for working with unbounded streams of data. The basic components Flink works with include the following (a minimal example follows the list):
A Stream (stream) is an immutable, unbounded data set that flows through the system.
Operator (operator) is a function that performs operations on data streams to generate other data streams.
Source (source) is the entry point at which data flows into the system.
Sink: the point where a stream leaves the Flink system. A sink might be a database or a connector to another system.
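A minimal sketch using Flink's DataStream Java API; the socket source here is a hypothetical stand-in for a production source such as Kafka.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FlinkStreamDemo {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Source -> operator -> sink: each item is processed as it arrives,
    // not gathered into micro-batches first.
    DataStream<String> lines = env.socketTextStream("localhost", 9999); // hypothetical source
    lines.map(String::toUpperCase) // operator: turns one stream into another
         .print();                 // sink: here, standard output

    env.execute("flink-stream-demo");
  }
}
```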
To recover from problems encountered during computation, stream processing tasks take snapshots of their state at configured points in time. For state storage, Flink can work with a number of state backend systems, offering different levels of complexity and persistence.
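As a small illustration of the snapshot mechanism, checkpointing can be enabled on the execution environment; the ten-second interval below is an arbitrary choice.

```java
// Inside the job's main method, before defining sources and operators:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Snapshot all operator state every 10 seconds; after a failure, Flink
// restores the latest snapshot and replays the stream from that point.
env.enableCheckpointing(10_000);
```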
In addition, Flink's stream processing understands the concept of "event time", meaning the time an event actually occurred, and it can handle sessions as well. This makes it possible to guarantee ordering and grouping in some interesting ways.
Batch processing model
Flink's batch processing model is, to a large extent, just an extension of its stream processing model. Instead of reading from an unbounded, continuous stream, it reads a bounded dataset from persistent storage as a stream. Flink uses exactly the same runtime for both models. (A short example follows.)
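A minimal sketch using Flink's DataSet API, which runs on the same runtime as the streaming example above; the HDFS path is a hypothetical placeholder.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class FlinkBatchDemo {
  public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    // A bounded dataset read from persistent storage, executed by the same
    // runtime that handles unbounded streams.
    DataSet<String> lines = env.readTextFile("hdfs:///data/input"); // hypothetical path
    long errors = lines.filter(line -> line.contains("ERROR")).count();
    System.out.println("error lines: " + errors);
  }
}
```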
Flink offers some optimizations for batch workloads. For example, since batch operations are backed by persistent storage, Flink can skip taking snapshots of batch workloads; the data remains recoverable, and normal processing runs faster.
Another optimization is to decompose batch tasks so that different phases and components can be invoked as needed. In this way, Flink can better coexist with other users of the cluster. Analyzing the task in advance allows Flink to view all the operations that need to be performed, the size of the dataset, and the steps that need to be performed downstream to achieve further optimization.
Advantages and limitations
Flink currently occupies a unique place among processing frameworks. Although Spark also performs batch and stream processing, the micro-batch architecture of its streaming mode makes it unsuitable for many use cases. Flink's stream-first approach offers low latency, high throughput, and genuine item-by-item processing.
Flink manages many things itself. Somewhat unconventionally, it manages its own memory for performance reasons rather than relying on the native Java garbage collection mechanisms. Unlike Spark, Flink does not require manual optimization and tuning when the characteristics of the data it processes change, and it handles data partitioning and caching automatically as well.
Flink analyzes its work and optimizes tasks in several ways. Part of this analysis is similar to what SQL query planners do for relational databases: it determines the most efficient way to implement a given task. Flink also supports multi-stage parallel execution and can gather the data of blocking tasks together. For iterative tasks, Flink tries for performance reasons to run the computation on the nodes where the data is stored. It can also perform "delta iterations", iterating only over the portions of the data that have changed.
In terms of user tooling, Flink offers a web-based scheduling view for easily managing tasks and inspecting system status. Users can also review the optimization plan for a submitted task to see how it will actually be executed on the cluster. For analysis tasks, Flink offers SQL-style querying, graph processing, and machine learning libraries, and supports in-memory computation.
Flink works well with other components. If used in conjunction with the Hadoop stack, this technique fits well into the environment and takes up only the necessary resources at all times. This technology can be easily integrated with YARN, HDFS, and Kafka. With the help of compatibility packs, Flink can also run tasks written for other processing frameworks, such as Hadoop and Storm.
One of Flink's biggest limitations at the moment is that it is still a very young project. Large-scale deployments in the wild are not yet as common as with other processing frameworks, and there has been little deep investigation of Flink's scaling limits. With its rapid development cycle and features such as the compatibility packages, Flink deployments are likely to become more common as more organizations begin experimenting with it.
Summary
Flink offers low-latency stream processing with support for traditional batch tasks. Flink is probably best suited to organizations with heavy stream processing requirements and some batch-oriented tasks. Its compatibility with native Storm and Hadoop programs and its ability to run on a YARN-managed cluster make it easy to evaluate. Its rapid development makes it worth keeping an eye on.
Conclusion
Big data systems can use a variety of processing techniques.
For batch-only workloads that are not time-sensitive, Hadoop is a good choice and is likely cheaper to implement than the alternatives.
For stream-only workloads, Storm supports a broader range of languages and can deliver extremely low-latency processing, but its default configuration may produce duplicates and cannot guarantee ordering. Samza's tight integration with YARN and Kafka provides greater flexibility, easier use across multiple teams, and simpler replication and state management.
For mixed workloads, Spark provides high-speed batch processing along with micro-batch stream processing. It enjoys broad support, with integrated libraries and tooling that enable flexible integrations. Flink provides true stream processing with batch processing support; it is heavily optimized, can run tasks written for other platforms, and provides low-latency processing, but it is still early days for its practical adoption.
Which solution is most appropriate depends heavily on the state of the data to be processed, the time constraints on processing, and the kinds of results you are interested in. There is a trade-off between deploying an all-in-one solution and working with tightly focused projects, and similar considerations apply when evaluating any new and innovative solutions as they mature and gain acceptance.