2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article surveys commonly used open source data analysis applications, with a short description of each.
1. Hadoop
No discussion of open source data analysis technology is complete without Hadoop. This Apache Software Foundation project has become almost synonymous with big data; it lets enterprises process extremely large data sets in a distributed fashion at scale. A survey conducted jointly by TDWI and SAS found that nearly 60 percent of companies expected to have Hadoop clusters in production environments by the end of 2016.
It is worth noting, however, that Hadoop does not perform data analysis by itself. It is usually one part of a larger solution for gaining insight from big data.
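To make the distributed processing model more concrete, here is a toy, single-process sketch of the MapReduce pattern that Hadoop popularized, applied to word counting. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run in parallel on many machines, and the framework would handle the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; a real job would read from distributed storage.
    sample = ["big data needs big clusters", "data drives insight"]
    print(word_count(sample))
```

Because each map call looks at only one line and each reduce call at only one key, the work can be split across machines without the steps needing to share state, which is what makes the pattern suitable for very large data sets.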
2. Spark
Spark, another Apache project, promises fast big data processing. It claims to run programs up to 100 times faster than Hadoop MapReduce in memory and 10 times faster on disk. Because of this performance, it is often used to analyze streaming data or in applications that require interactive analysis. Many companies run it alongside Hadoop or Mesos, but it can also run standalone. Its popularity has risen sharply: a 2016 survey by Syncsort found that nearly 70 percent of the big data professionals surveyed were interested in Spark.
3. Talend
Unlike the previous two projects, Talend is managed by a for-profit company rather than a foundation, so paid support is available. Talend offers both free and paid products. Its free open source solution, called Talend Open Studio, has been downloaded more than 2 million times.
Market research firm Gartner recently rated Talend as a "leader" in the field of data integration. The company claims that it helps customers analyze big data five times faster than competing solutions at one fifth of the cost.
4. Jaspersoft
Like Talend, Jaspersoft comes in several editions, some free and some paid. The Community edition is free and open source, while the Reporting, AWS, Professional and Enterprise editions are paid and come with support services.
Jaspersoft is an open source business intelligence tool designed to let enterprise users meet their own needs through self-service. The company claims that its technology supports more than 130,000 applications and provides embedded business intelligence.
5. Pentaho
Pentaho bills itself as a "comprehensive platform for data integration and business intelligence." The company mainly promotes the commercial edition of its software, which is built on the open source community edition. Many companies use it with tools such as Hadoop and Spark to report on and visualize big data. The company claims a large number of well-known customers, including BT, Caterpillar, Nasdaq, the Department of Homeland Security, the National Oceanic and Atmospheric Administration (NOAA), the New York Times, EMC and many other organizations.
6. RapidMiner
RapidMiner claims to be the "number one open source data science platform," and Gartner rates it as a leader in its Magic Quadrant report for advanced analytics. It supports self-service predictive analytics and promises fast performance. Users include BMW, Lufthansa, Domino's Pizza, Sony, Ford, Salesforce, Amnesty International and General Electric. The RapidMiner platform consists of three separate components: RapidMiner Studio, RapidMiner Server and RapidMiner Radoop. All three are available under either an open source license or a commercial license, and the price of the commercial version depends on the number of users.
7. Storm
Apache Storm, which is used by companies like Yahoo, Twitter, Spotify, Yelp, Flipboard and Groupon, is a real-time big data processing engine. Its official website explains that Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. It can be used with any database and any programming language. Its advantages include scalability, fault tolerance, and ease of setup and operation. Users should note, however, that at the time of writing Storm had not yet reached version 1.0.
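The idea of processing an unbounded stream can be sketched with plain Python generators. This is not Storm's API: the names sentence_spout, split_bolt, count_bolt and run_topology are hypothetical, borrowing Storm's terms "spout" (a stream source) and "bolt" (a processing step) only as labels. The sketch assumes a single process and a made-up sentence source, whereas Storm distributes these steps across a cluster.

```python
import itertools
import random
from collections import Counter


def sentence_spout(seed=0):
    """Hypothetical source: yields sentences forever, like an unbounded stream."""
    rng = random.Random(seed)
    sentences = ["storm processes streams", "streams never end", "storm is fast"]
    while True:
        yield rng.choice(sentences)


def split_bolt(sentences):
    """First processing step: turn each sentence into individual words."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words):
    """Second step: emit (word, running count) as each word arrives."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology(limit, seed=0):
    """Wire the steps together and consume only `limit` updates for the demo."""
    stream = count_bolt(split_bolt(sentence_spout(seed)))
    return list(itertools.islice(stream, limit))


if __name__ == "__main__":
    for word, count in run_topology(8):
        print(word, count)
```

Each step handles one item at a time and never waits for the stream to end, which is the key difference from a batch job: results are available continuously while data keeps arriving.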
8. H2O
H2O is used by more than 60,000 data scientists and more than 7,000 organizations and claims to be "the world's leading open source machine learning platform." Its in-memory technology gives it excellent performance. It integrates with many other open source data analysis tools, such as Hadoop and Spark, supports all major databases, and is available with paid support services.
In addition to the standard version of H2O, the company also offers Sparkling Water, which integrates H2O with Spark, and Steam, an end-to-end artificial intelligence application engine.
9. Lumify
Lumify, developed by a company called Altamira Technologies, calls itself an "open source big data analysis and visualization platform." It makes it easy for users to create 2D or 3D graphs, display relationships between entities, or overlay data on a map. For those interested in learning more about how it works, the official website offers several videos showing Lumify in operation, as well as a demonstration site where users can upload their own data and try out the software.
10. Drill
Apache Drill lets users run SQL queries against non-relational data stores. It supports a range of NoSQL and cloud-based data storage systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage and Swift. It also lets a single query span multiple datasets stored in different technologies. In addition, it works with many popular business intelligence tools.
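As a rough illustration of one SQL query spanning data that originates in different formats, the sketch below joins JSON documents with CSV rows using Python's built-in sqlite3 module. This is a conceptual stand-in only, not Drill: Drill queries the data where it lives, while this sketch first copies both sources into an in-memory SQLite database. The sample data and the names ORDERS_JSON, CUSTOMERS_CSV, load_sources and spend_per_customer are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data: orders as JSON documents, customers as CSV text.
ORDERS_JSON = (
    '[{"customer_id": 1, "total": 40.0},'
    ' {"customer_id": 2, "total": 15.5},'
    ' {"customer_id": 1, "total": 9.5}]'
)
CUSTOMERS_CSV = "id,name\n1,Ada\n2,Grace\n"


def load_sources(conn):
    """Copy both sources into SQLite tables (Drill itself needs no such copy)."""
    conn.execute("CREATE TABLE orders (customer_id INTEGER, total REAL)")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (:customer_id, :total)",
        json.loads(ORDERS_JSON),
    )
    conn.executemany(
        "INSERT INTO customers VALUES (:id, :name)",
        csv.DictReader(io.StringIO(CUSTOMERS_CSV)),
    )


def spend_per_customer():
    """Run a single SQL query that joins the JSON-derived and CSV-derived data."""
    conn = sqlite3.connect(":memory:")
    try:
        load_sources(conn)
        query = """
            SELECT c.name, SUM(o.total)
            FROM orders AS o
            JOIN customers AS c ON c.id = o.customer_id
            GROUP BY c.name
            ORDER BY c.name
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, total in spend_per_customer():
        print(name, total)
```

The point of the sketch is the query itself: the analyst writes ordinary SQL with a join and an aggregate, without caring that one input began as documents and the other as delimited text.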
11. MongoDB
As one of the best-known NoSQL databases, MongoDB is an open source non-relational data storage solution. Customers include MetLife, the City of Chicago, Expedia, Google, the Weather Channel, BuzzFeed and Facebook. In addition to the free and open source version, the company also offers a paid enterprise version and a cloud-hosted version, MongoDB Atlas. Forrester Research, a well-known market research firm, rated MongoDB as a "leader" in the big data NoSQL field.
12. SpagoBI
SpagoBI is an open source business intelligence and big data analysis platform. The software is completely free, while user support, maintenance, consulting and training services are available for a fee. It includes tools for reporting, multidimensional analysis (OLAP), charts, location intelligence, data mining, ETL (extract, transform, load), and more. It also integrates with a popular in-memory processing engine to achieve real-time processing.