Zhang Xiaolong talks about "Five Open Source Technologies for Big Data Processing"

2025-01-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Overview

More than 250,000 open source technologies are now available. How do you choose among them? Here are five exciting big data technologies.

Storm and Kafka are likely to be the main approach to stream processing going forward, and they are already in use at large companies including Groupon, Alibaba and The Weather Channel.

Storm, open-sourced by Twitter, is a distributed real-time computing system. Storm is designed for real-time computation, while Hadoop is mainly used for batch processing.

Kafka is a messaging system developed at LinkedIn that serves as a foundational part of the data processing pipeline. Used together, Storm and Kafka let you ingest data in real time and scale linearly as volume grows.

1. Storm and Kafka

Storm and Kafka make stream processing scale linearly while ensuring that every message is handled reliably and in real time. Working in tandem, Storm and Kafka can smoothly process 10,000 messages per second. Stream processing solutions such as Storm and Kafka have caught the attention of many enterprises that want a better ETL (extract, transform, load) data integration solution. Storm and Kafka are also well suited to in-memory analytics and real-time computing support. Enterprises find it difficult, if not impossible, to meet real-time business requirements with batch-oriented Hadoop solutions.

Real-time stream processing is a necessary part of an enterprise big data solution because it gracefully handles the "3 Vs": volume, velocity and variety.
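
The division of labor described above can be sketched in a few lines of Python. This is only an in-process illustration under stated assumptions, not the Kafka or Storm API: `MessageLog`, `RunningCounter` and their methods are hypothetical names. An append-only log stands in for Kafka, and a consumer that updates its counts one message at a time stands in for a Storm processing step, so results are current after every message rather than after a batch job finishes.

```python
from collections import Counter


class MessageLog:
    """Append-only log standing in for a Kafka topic (hypothetical, in-memory)."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)

    def read_from(self, offset):
        """Return all messages at or after the given offset."""
        return self._messages[offset:]


class RunningCounter:
    """Incremental word counter standing in for a Storm-style processing step."""

    def __init__(self, log):
        self._log = log
        self._offset = 0          # position of the next unread message
        self.counts = Counter()   # reflects every message consumed so far

    def poll(self):
        """Consume any new messages and update the counts incrementally."""
        new_messages = self._log.read_from(self._offset)
        for message in new_messages:
            self.counts.update(message.split())
        self._offset += len(new_messages)
        return len(new_messages)


log = MessageLog()
counter = RunningCounter(log)

log.append("storm kafka")
counter.poll()
log.append("kafka hadoop")
counter.poll()

print(counter.counts["kafka"])  # 2
```

Keeping the read offset on the consumer side is what lets each poll pick up only the messages that arrived since the previous one.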

Drill and Dremel enable fast, low-overhead, large-scale, real-time querying of data. They make it possible to scan petabytes of data in seconds to answer ad hoc queries and forecasts, and they provide strong support for visualization.

Drill and Dremel offer powerful capabilities to business analysts, not just data engineers. Everyone on the business side will like them. Drill is the open source counterpart of Dremel, the technology Google built to support queries over big data.

The Hadoop ecosystem has worked to make MapReduce a friendlier tool for ad hoc analysis. From Sawzall to Pig to Hive, many interface layers have been built to make Hadoop more approachable and closer to the business. But these SQL-like abstraction layers ignore an important fact: MapReduce (and therefore Hadoop) exists to run organized, systematic data processing jobs.

Hadoop is built at its core for workflows. By contrast, many business-driven BI and analytics queries are fundamentally ad hoc, interactive, low-latency analyses. Some data scientists have long speculated that Drill and Dremel will be better than Hadoop for this kind of work. At Infochimps we like to use the Elasticsearch full-text indexing engine to search data, but for true big data processing we think Drill will become mainstream.

2. R

R is a powerful open source statistical programming language. Since 1997, more than 2 million statistical analysts have used R. It is a modern version of the S language for statistical computing, which originated at Bell Labs, and it quickly became the new standard statistical language. R makes complex data science cheaper. R is a leading alternative to SAS and SPSS and an essential tool for the best statisticians.

Because it is backed by an extraordinarily strong community, you can find R libraries for virtually any kind of data science task without writing new code. R is exciting because of the people who maintain it and the new work they create every day. The R community is one of the most exciting places in the big data field, and R is a great technology that will not go out of date there.

In recent months, analysts have introduced thousands of new features along with an increasingly open knowledge base. Moreover, R works well with Hadoop and has proven itself as part of big data processing.

3. Julia

Julia is an interesting alternative to R because it avoids R's slow interpreter. The Julia community is not very strong yet, but if you do not need it right away, it is worth waiting for.

Gremlin and Giraph help strengthen graph analysis. They are often used with graph databases such as Neo4j and InfiniteGraph or, in Giraph's case, together with Hadoop.

Golden Orb is another graph processing project worth a look. Graph databases are a fascinating niche. They differ from relational databases in many interesting ways, which is why you may want to take a graph approach rather than a relational one from the very beginning.

Another graph-based approach is Google's Pregel, to which Gremlin and Giraph are open source alternatives. In fact, these are examples of open source implementations modeled on Google technology. Graphs play an important role in modeling computer networks and social networks, and they can link arbitrary data. Another frequent application is mapping and geographic computation, such as calculating the shortest distance from A to B.

Graphs are also widely used in biological and physical computing; for example, they can map unusual molecular structures. Massive graphs, graph databases, and graph analysis languages and frameworks are all part of real-world big data implementations. Graph theory is a killer application. Why? Because any problem involving a large network of nodes comes down to the paths between those nodes. Many creative scientists and engineers know to use the right tool for the problem at hand.
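
The "A to B" shortest-distance case can be illustrated with a short Python sketch. It uses Dijkstra's algorithm on a small in-memory graph rather than Gremlin, Giraph or Pregel, and the `shortest_distance` function and the `roads` network are hypothetical names and data chosen only for illustration.

```python
import heapq


def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm over a dict of {node: {neighbor: distance}}."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None  # goal is unreachable from start


# Hypothetical road network: distances between four places.
roads = {
    "A": {"C": 2, "D": 9},
    "C": {"A": 2, "D": 3, "B": 10},
    "D": {"A": 9, "C": 3, "B": 4},
    "B": {"C": 10, "D": 4},
}

print(shortest_distance(roads, "A", "B"))  # 9
```

The direct-looking routes (A to D to B, or A to C to B) cost 13 and 12, while the path through both C and D costs 9, which is the kind of answer that falls out naturally once the data is modeled as nodes and edges.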

4. SAP HANA

SAP HANA is a fully in-memory analytics platform. It includes an in-memory database and related tools for building analytical workflows and normalizing data input and output into the correct format.

HANA targets problems that other solutions are not fast enough to handle, such as financial modeling and decision support, website personalization and fraud detection. The biggest disadvantage of HANA is that "in-memory" means relying on memory access, which has clear advantages but is far more expensive than disk storage. For organizations that are not worried about operating costs, HANA is a fast, low-latency big data processing tool.

5. D3

D3 is a JavaScript library for document-oriented data visualization. It is powerful and innovative, letting us see information directly and interact with it naturally. It was written by Michael Bostock, a graphics designer at the New York Times. For example, you can use D3 to create an HTML table from an array of numbers, or use the same data to create an interactive bar chart. With D3, programmers can build interfaces that organize and present all kinds of data.

I have been using Hadoop in production for nearly a year. During that time, from Baidu to my current company, BitWare, I have seen different companies use different technologies to solve their problems. In essence, though, the problems are always the same few, and many companies are now starting to try Hadoop. That is understandable given the overall environment.

I have been following Storm and Kafka since 2011, and Storm also has some second-tier applications at Alibaba. Storm, which has just turned one year old, has become more and more stable under the polishing of its creator, nathanmarz, and it now has some production applications. On the whole, I am very optimistic about this technology: Hadoop cannot deliver real-time processing today, although using HBase as the primary database can work around that for the time being.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.