2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Continuing from the previous section, this section covers:
Data visualization
In the ocean of big data, exploring and visualizing data more intuitively is one of the most noteworthy directions at present. Open source projects in this area include D3, Chart.js, Arbor, DC.js, Sigma.js, Zeppelin, and so on. Engineers who are familiar with front-end technology can use these excellent libraries to present big data directly to people as charts and graphs.
The following are personal comments:
1. Data warehouse products have changed a great deal. The maturing of Kylin and Druid provides better solutions for OLAP, and Spark is becoming increasingly mature for data analysis.
2. MADlib has become an Apache top-level project, and HAWQ has reached release 2.2.0.0. It is expected to become a top-level project soon as well.
3. GemFire is a solid product. It is owned by Pivotal (a joint venture between EMC and VMware/GE). Salvatore Sanfilippo, the creator of Redis, works for Pivotal at the time of writing. GemFire is currently used mainly in banking, social security, 12306, and other transaction systems.
Big data open source framework
ElasticSearch
1.1 advantages of ElasticSearch:
High concurrency. In testing, a single ES instance with 10 GB of memory on one machine sustained about 1,200 write QPS. A machine with 60 GB of memory and a 12-core CPU running three instances is expected to reach about 6,000 QPS.
Writing a single document within the same data center takes 3 ms on average (slower than MySQL; MongoDB was not measured).
Fault tolerance is better than MongoDB's. For example, with one primary and several replicas, if the primary shard fails a replica is promoted automatically.
It meets real-time read and write needs at big data volumes without splitting data across databases (there is no concept of a database).
Easy to scale. Concurrency and capacity can be expanded by adding instances through configuration, and the automatic allocation of writes avoids the multi-master synchronization problems that traditional databases are criticized for.
Supports fairly complex conditional queries; group by and sorting are not a problem.
It offers some relational capability, and large fields are not a concern.
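As a rough illustration of the conditional query, group by, and sorting support mentioned above, the sketch below builds an Elasticsearch query DSL request body as a plain Python dictionary. The field names (`status`, `price`, `city`, `created_at`) and the helper `build_search_body` are hypothetical, and nothing is sent to a cluster; a body like this would be posted to an index's `_search` endpoint.

```python
import json

def build_search_body(status, min_price, group_field, sort_field, size=10):
    """Build a query DSL body: filter, then group (terms aggregation), then sort."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"status": status}},              # exact-match condition
                    {"range": {"price": {"gte": min_price}}},  # range condition
                ]
            }
        },
        # A terms aggregation plays the role of SQL's GROUP BY.
        "aggs": {"by_group": {"terms": {"field": group_field}}},
        "sort": [{sort_field: {"order": "desc"}}],
    }

# Hypothetical field names, for illustration only.
body = build_search_body("paid", 100, "city", "created_at")
print(json.dumps(body, indent=2))
```

The filter clauses, the aggregation, and the sort are independent parts of the body, so each can be changed without touching the others.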
1.2 disadvantages of ElasticSearch:
Transactions are not supported
There is a short delay between writing data and being able to read it back
No permission management
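The delay between writing and reading listed above is commonly attributed to near-real-time search: a newly written document becomes searchable only after the index is refreshed. The toy in-memory inverted index below is a simplified model of that behaviour, not Elasticsearch or Lucene code; the class `ToyIndex` and its methods are invented for illustration.

```python
from collections import defaultdict

class ToyIndex:
    """Toy inverted index: writes are buffered and searchable only after refresh()."""

    def __init__(self):
        self._buffer = []                  # documents written but not yet searchable
        self._postings = defaultdict(set)  # term -> ids of searchable documents

    def write(self, doc_id, text):
        self._buffer.append((doc_id, text))

    def refresh(self):
        # Move buffered documents into the searchable inverted index.
        for doc_id, text in self._buffer:
            for term in text.lower().split():
                self._postings[term].add(doc_id)
        self._buffer.clear()

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))

index = ToyIndex()
index.write(1, "big data search")
before = index.search("search")   # written but not yet refreshed
index.refresh()
after = index.search("search")    # visible after the refresh
print(before, after)
```

Refreshing more often shortens the delay at the cost of more indexing work, which is the trade-off a near-real-time design makes.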
Lucene
Lucene is a Java search library. It is not a complete solution by itself and requires additional development work.
2.1 benefits of Lucene
A mature solution with many successful cases.
An Apache top-level project that continues to progress rapidly.
A large and active development community with many developers.
It is only a library, which leaves plenty of room for customization and optimization: after simple customization it meets most common needs, and after optimization it can support search at the billion-plus scale.
2.2 disadvantages of Lucene
Additional development work is required: scaling, distribution, reliability, and so on all have to be implemented yourself.
Not real-time: there is a delay between indexing and searching, and the scalability of the current near-real-time scheme (Lucene Near Real Time search) still needs improvement.
Redis
3.1 benefits of Redis
Excellent reading and writing performance
Data persistence is supported, through both AOF and RDB.
Master-slave replication is supported: the master automatically synchronizes data to its slaves, which allows reads and writes to be separated.
Rich data structures: in addition to string values, it supports hash, set, sorted set, list, and other data structures.
3.2 disadvantages of Redis
A plain master-slave Redis deployment has no automatic fault tolerance or recovery. If the master or a slave goes down, some front-end read and write requests fail until the machine restarts or the front end is manually switched to another IP.
If the master goes down before some data has been synchronized to the slave, switching IPs introduces data inconsistency, which reduces the availability of the system.
Redis master-slave replication uses full replication. During replication the master forks a child process that takes a snapshot of memory and saves it as a file, which is then sent to the slave. The master must have enough free memory for this. If the snapshot file is large, it has a significant impact on the service capacity of the cluster. Replication runs whenever a slave joins the cluster or reconnects after losing its network connection to the master, so network fluctuations trigger a full data replication between master and slave, which causes a lot of trouble in real operation.
Redis is hard to expand online, and things become very complicated once the cluster reaches its capacity limit. To avoid this, operators must reserve enough space when the system goes live, which wastes a great deal of resources.
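The data inconsistency described above can be pictured with a small simulation of asynchronous master-slave replication. This is a toy model in plain Python, not Redis code; `ToyMaster`, `ToySlave`, and `sync` are invented names, and real Redis replication is considerably more involved.

```python
class ToySlave:
    def __init__(self):
        self.data = {}

class ToyMaster:
    """Acknowledges writes immediately and replicates them later (asynchronously)."""

    def __init__(self, slave):
        self.data = {}
        self.slave = slave
        self.pending = []   # writes acknowledged but not yet sent to the slave

    def set(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Ship the pending writes to the slave.
        for key, value in self.pending:
            self.slave.data[key] = value
        self.pending.clear()

slave = ToySlave()
master = ToyMaster(slave)
master.set("order:1", "paid")
master.sync()                      # the first write reaches the slave
master.set("order:2", "paid")      # the master crashes before this write is synced

# Failover: clients are switched to the slave, which never saw order:2.
lost = sorted(set(master.data) - set(slave.data))
print(lost)
```

The client was told that both writes succeeded, yet only one survives the switch, which is the inconsistency the text warns about.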
HBase
4.1 benefits of HBase
Columns can be dynamically increased, and empty columns do not store data, saving storage space.
HBase splits data automatically, which gives the data store horizontal scalability out of the box.
HBase supports highly concurrent read and write operations.
4.2 disadvantages of HBase
Conditional queries are not supported natively; only queries by row key are supported.
Master server failover is not supported for the time being; when the Master goes down, the entire storage system goes down with it.
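To make the row-key-only access pattern concrete, here is a toy model of a sparse table sorted by row key, written in plain Python. It is not the HBase client API; `ToyTable`, `put`, `get`, and `scan_prefix` are hypothetical names. It shows that absent columns take no space and that lookups are either exact by row key or range scans over sorted keys, which is why query conditions are often encoded into the row key itself.

```python
import bisect

class ToyTable:
    """Sparse table keyed by row key; rows are kept in sorted key order."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {column: value}; missing columns are simply absent

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value   # columns can be added at any time

    def get(self, row_key):
        return self._rows.get(row_key, {})

    def scan_prefix(self, prefix):
        # Range scan: jump to the first key >= prefix, then read while the prefix matches.
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result

table = ToyTable()
table.put("user1#20250101", "info:city", "Beijing")
table.put("user1#20250102", "info:device", "phone")   # a different column, no schema change
table.put("user2#20250101", "info:city", "Shanghai")

user1_rows = table.scan_prefix("user1#")
print(user1_rows)
print(table.get("user1#20250102"))
```

Finding all rows where the city is Beijing, by contrast, would require reading every row, because nothing in this layout is ordered by column value.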
Hadoop
5.1 benefits of Hadoop
Scalability is one of the major characteristics of a Hadoop cluster. Hadoop can scale to thousands of nodes, which suits continuously growing and very large data volumes.
Cost is another advantage. Hadoop is open source, so it saves on software costs, and its hardware requirements are also low. Moving away from IOE (IBM, Oracle, EMC) is currently popular, and low-cost Hadoop is a major driver of that trend.
The Hadoop ecosystem is active and rich in surrounding open source projects, including foundational ones such as HBase, Hive, Impala, and so on.
5.2 disadvantages of Hadoop
Aimed at full-dataset scenarios, with serial execution inside each task
Optimized for throughput; response time is not guaranteed at all
Intermediate results are invisible and cannot be shared
Single input and single output, so chained jobs are very wasteful
Chained MR jobs cannot run in parallel
Fault tolerance is coarse-grained, which can lead to pitfalls
Not friendly to graph computation
Not friendly to iterative computation
Cannot support second-level (low-latency) computation; suitable only for offline data analysis tasks
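The chaining cost listed above can be seen in a toy, single-process imitation of MapReduce in plain Python. This is not Hadoop's API; `run_job` and the two example jobs are invented for illustration. Each job fully materializes its output before the next job can start, which mirrors how chained MapReduce jobs hand over intermediate results instead of sharing them in memory.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """One MapReduce job: map, shuffle (group by key), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    # The full output list is built before anything downstream can read it.
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Job 1: word count.
def split_words(line):
    return [(word, 1) for word in line.split()]

def sum_counts(word, counts):
    return (word, sum(counts))

# Job 2: group words by their count (consumes job 1's output).
def key_by_count(pair):
    word, count = pair
    return [(count, word)]

def collect_words(count, words):
    return (count, sorted(words))

lines = ["big data", "big search", "data data"]
word_counts = run_job(lines, split_words, sum_counts)          # must finish first
by_count = run_job(word_counts, key_by_count, collect_words)   # second job in the chain
print(word_counts)
print(by_count)
```

An iterative algorithm would repeat this hand-off on every pass, which is one way to read the complaint that iterative and graph computation are not friendly.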
These are just some of my opinions. If there is anything wrong, you can point it out at any time.
More articles about big data will follow.