The top ten big data technologies adopted today

Updated 2025-04-06. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Big data is growing explosively, and companies all over the world launch new projects every day.

The good news is that all of the technologies below are open source and available for you to adopt today.

Hadoop

Solid and enterprise-strength, Hadoop is the foundation for everything else. You need YARN, HDFS, and the rest of the Hadoop infrastructure to serve as your primary data store and to run your critical big data servers and applications.

Spark

Easy to use, with support for all the important big data languages (Scala, Python, Java, R), a huge ecosystem, fast growth, and easy micro-batch, batch, and SQL support. This is another obvious choice.
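To make the micro-batch idea concrete, here is a minimal sketch in plain Python. It does not use the Spark API; `micro_batches` and `batch_total` are hypothetical names invented for this illustration, and batching by record count is an assumption (a real engine would typically cut batches by time interval). The point is only that the same batch function can be applied to a whole dataset or to small slices of a stream.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most batch_size records from a stream."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def batch_total(records: List[int]) -> int:
    """The 'batch job': any function that works on a finite list of records."""
    return sum(records)


events = range(1, 11)  # stand-in for an unbounded event stream

# Micro-batch mode: run the same batch job on each small slice.
per_batch = [batch_total(b) for b in micro_batches(events, batch_size=4)]

# Batch mode: run the job once over everything.
whole = batch_total(list(events))
```

Because the aggregation here is a simple sum, adding up the per-batch results gives the same answer as the single batch run, which is what makes reusing one piece of logic for both modes attractive.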

NiFi

Originally developed at the NSA, NiFi makes it easy to ingest, store, and process data from a wide range of sources with minimal coding and a flexible user interface. It supports dozens of sources and sinks: social media, JMS, NoSQL, SQL, REST/JSON feeds, AMQP, SQS, FTP, Flume, Elasticsearch, S3, MongoDB, Splunk, email, HBase, Hive, HDFS, Azure Event Hub, Kafka, and more. If the source or sink you need is missing, writing your own processor is straightforward Java code. It is another great Apache project for your toolbox: the Swiss Army knife of big data tools.

Apache Hive 2.1

Apache Hive has always been the SQL solution on Hadoop. With the performance and feature enhancements in the latest version, Hive is becoming the solution for SQL on big data.

Kafka

The choice for asynchronous, distributed messaging between big data systems. It is integrated into most stacks: from Spark to NiFi to third-party tools, and from Java to Scala, it is good glue between systems. This needs to be in your stack.
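The "glue" role described above can be sketched with a toy in-memory log. This is not the Kafka client API; `TopicLog`, `produce`, and `poll` are hypothetical names for this illustration, and the sketch assumes a single partition with no persistence. It shows the core idea: producers append to a log, and each consumer group reads at its own pace by tracking its own offset.

```python
from collections import defaultdict
from typing import Dict, List


class TopicLog:
    """Toy stand-in for one topic partition: an append-only list of records."""

    def __init__(self) -> None:
        self._records: List[str] = []
        # Next offset to read, tracked separately per consumer group.
        self._offsets: Dict[str, int] = defaultdict(int)

    def produce(self, value: str) -> int:
        """Append a record and return the offset it was stored at."""
        self._records.append(value)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[str]:
        """Return up to max_records unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = TopicLog()
for value in ["a", "b", "c"]:
    log.produce(value)

first = log.poll("spark-job", max_records=2)  # reads the two oldest records
log.produce("d")                              # producer keeps going independently
second = log.poll("spark-job")                # resumes where this group left off
other = log.poll("nifi-flow")                 # a different group starts from offset 0
```

Neither side waits for the other: the producer never blocks on consumers, and a slow or newly added consumer can still read everything that was written.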

HBase / Phoenix

HBase is the open source BigTable. A large number of companies are working on HBase and making it scale to huge sizes. It is a NoSQL store backed by HDFS and integrates well with all the tools. Adding Phoenix on top of HBase is making it the go-to NoSQL choice: Phoenix adds SQL, JDBC, OLTP, and operational analytics to HBase.
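As a rough intuition for how a SQL layer can sit on top of a sorted key-value store, here is a minimal sketch in plain Python. It uses neither HBase nor Phoenix; `SortedKV`, `row_key`, the `metrics` table, and the `|` separator are all assumptions made up for this example. The idea shown is that the primary key columns are concatenated into a row key, so a `WHERE` clause on the leading key column becomes a prefix scan.

```python
from bisect import bisect_left, insort
from typing import Dict, List, Tuple


class SortedKV:
    """Toy sorted key-value store: row key -> {column: value}."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, str]] = {}

    def put(self, key: str, cells: Dict[str, str]) -> None:
        """Insert or update a row, keeping keys in sorted order."""
        if key not in self._rows:
            insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key].update(cells)

    def scan(self, prefix: str) -> List[Tuple[str, Dict[str, str]]]:
        """Return all rows whose key starts with prefix, in key order."""
        i = bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


def row_key(host: str, day: str) -> str:
    """Hypothetical encoding of the composite primary key (host, day)."""
    return f"{host}|{day}"


store = SortedKV()

# Roughly: UPSERT INTO metrics (host, day, cpu) VALUES (...)
store.put(row_key("web1", "2024-01-02"), {"cpu": "0.7"})
store.put(row_key("web1", "2024-01-01"), {"cpu": "0.5"})
store.put(row_key("web2", "2024-01-01"), {"cpu": "0.9"})

# Roughly: SELECT day, cpu FROM metrics WHERE host = 'web1'
web1 = [(key.split("|")[1], cells["cpu"]) for key, cells in store.scan("web1|")]
```

A real SQL layer also handles types, secondary indexes, and query planning; the sketch covers only the key-encoding step.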

Zeppelin

An easy-to-integrate notebook tool for working with Hive, Spark, SQL, Shell, Scala, Python, and a large number of other data exploration and machine learning tools. It is very easy to use and a good way to explore and query data. The tool keeps gaining support and features; it just needs better charting and graphing.

H2O

H2O fills the gap in Spark's machine learning and just works. It can do all the machine learning you need.

Apache Beam

A unified framework for developing data processing pipelines in Java. It lets you target both Spark and Flink with the same code. As other frameworks come online, you won't have to learn each of them separately.
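The "write once, run on several engines" idea can be illustrated with a small sketch in plain Python. It does not use the Beam SDK; the pipeline representation and the two runners, `run_eager` and `run_lazy`, are hypothetical and stand in for different execution engines. The pipeline is described once as data, and each runner decides how to execute it.

```python
from typing import Any, Callable, Iterable, List, Tuple

# A pipeline is just an ordered list of (kind, function) steps.
Step = Tuple[str, Callable[[Any], Any]]

pipeline: List[Step] = [
    ("map", str.lower),                  # normalize case
    ("filter", lambda w: len(w) > 2),    # drop very short words
    ("map", len),                        # word -> length
]


def run_eager(steps: List[Step], data: Iterable[Any]) -> List[Any]:
    """Runner 1: materialize the full result of every step before the next one."""
    items = list(data)
    for kind, fn in steps:
        if kind == "map":
            items = [fn(x) for x in items]
        else:
            items = [x for x in items if fn(x)]
    return items


def run_lazy(steps: List[Step], data: Iterable[Any]) -> List[Any]:
    """Runner 2: chain lazy iterators so each record flows through all steps."""
    stream = iter(data)
    for kind, fn in steps:
        stream = map(fn, stream) if kind == "map" else filter(fn, stream)
    return list(stream)


words = ["Big", "Data", "is", "Growing"]
eager_result = run_eager(pipeline, words)
lazy_result = run_lazy(pipeline, words)
```

Both runners produce the same output from the same pipeline definition, so switching execution strategy does not require rewriting the pipeline itself.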

Stanford CoreNLP

Natural language processing is huge and still growing. Stanford University continues to improve its framework.

Obviously, there are a huge number of big data projects, so your best option is to start with a base distribution that includes and tests compatible versions of the projects and ensures that they work smoothly together with security and management. I recommend using Hortonworks Connected Data Platforms as your foundation. If this were a top 20 list, I would add more projects, especially Storm, Solr, Apache Oozie, and Apache HAWQ. There are also a lot of great technologies under the hood that, in most cases, you won't see or know about, such as Apache Tez (although you need to configure it when you run Hive), Apache Calcite, Apache Slider, Apache ZooKeeper, and Livy. These projects are critical to running big data infrastructure.
