Shulou (Shulou.com), 06/01 report; updated 2025-03-31
1 HDFS
1.1 Concept
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
1.2 Features
- High fault tolerance
- Low hardware requirements
- High-throughput data access
1.3 File system command line
1.3.1 Get help
hadoop fs -help
1.3.2 ls command
hadoop fs -ls /
hadoop fs -ls -R /user
1.3.3 getconf command
hdfs getconf -help
hdfs getconf -namenodes
1.3.4 Version information
hdfs version
Note: these commands work much like their Linux counterparts; see the official links at the end of the article for details.
2 MapReduce
2.1 Introduction to MapReduce
MapReduce is a programming model for parallel computation over large datasets (larger than 1 TB).
2.2 How it works
Suppose a plate holds a mix of black beans, soybeans, mung beans and red beans, and you want to pick out the red beans.
The MapReduce approach is:
Step 1: Assemble a team to do the work (equivalent to a cluster of servers).
Step 2: Distribute the beans evenly among the team members (equivalent to allocating data to the servers in the cluster).
Step 3: Have each team member pick the red beans out of their share (equivalent to the machines in the cluster processing data in parallel).
Step 4: Gather the red beans picked out by all team members (equivalent to the cluster aggregating and outputting the results).
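The four steps can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop code: a thread pool stands in for the cluster, and the function names (`split`, `pick_red`, `gather`, `pick_red_beans`) and the sample plate are assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_workers):
    """Step 2: deal the items out round-robin into n_workers piles."""
    return [items[i::n_workers] for i in range(n_workers)]


def pick_red(pile):
    """Step 3 (map): a worker keeps only the red beans from its own pile."""
    return [bean for bean in pile if bean == "red"]


def gather(partials):
    """Step 4 (reduce): merge every worker's partial result into one list."""
    merged = []
    for part in partials:
        merged.extend(part)
    return merged


def pick_red_beans(plate, n_workers=3):
    piles = split(plate, n_workers)
    # Step 1: the thread pool plays the role of the team (the cluster).
    with ThreadPoolExecutor(max_workers=n_workers) as team:
        partials = list(team.map(pick_red, piles))
    return gather(partials)


# Hypothetical plate of mixed beans.
plate = ["black", "red", "soy", "mung", "red", "black", "red", "soy"]
result = pick_red_beans(plate)
print(len(result), "red beans picked")
```

Each worker only ever sees its own pile, which is what lets the real framework spread the work across many machines.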
3 Hive
3.1 Introduction to Hive
3.1.1 Concept
Hive is a data warehouse platform built on Hadoop.
3.1.2 The role of Hive
Hive makes ETL work easy to carry out.
Hive defines an SQL-like query language, HQL.
Hive converts the HQL written by users into corresponding MapReduce programs and executes them on Hadoop.
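To give a rough feel for what such a conversion means, the Python sketch below hand-writes the map, shuffle and reduce phases that a query like `SELECT word, COUNT(*) FROM words GROUP BY word` could correspond to. The table `words`, its rows and the function names are hypothetical, and Hive's real planner is far more involved than this.

```python
from collections import defaultdict

# Hypothetical rows of a one-column table `words(word string)`.
rows = ["hive", "pig", "hive", "impala", "hive", "pig"]


def map_phase(rows):
    """Emit one (key, 1) pair per row; the key is the GROUP BY column."""
    return [(word, 1) for word in rows]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (here COUNT) to each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}


counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)
```

The GROUP BY column becomes the key that the shuffle groups on, and the aggregate function becomes the body of the reducer.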
3.1.3 History of the Hive project
Hive is a data warehouse framework that Facebook open-sourced in August 2008. Its design goals are similar to those of Pig, but it offers some mechanisms that Pig does not yet support.
Examples include a richer type system, a more SQL-like query language, and persistence of table metadata.
4 Impala
4.1 Introduction to Impala
Impala is a real-time, interactive SQL query tool for big data, developed by Cloudera and inspired by Google's Dremel. Instead of relying on slow Hive + MapReduce batch processing, Impala uses a distributed query engine similar to those in commercial parallel relational databases (consisting of a Query Planner, a Query Coordinator and a Query Exec Engine). It can query data in HDFS or HBase directly with SELECT, JOIN and aggregate functions, which greatly reduces latency.
4.2 Impala shell
4.2.1 Start the shell
impala-shell
4.2.2 Version query
select version();
4.3 Database operations
4.3.1 List databases
show databases;
4.3.2 Create a database
create database testdb;
create database testdb2;
Database storage path:
hdfs dfs -ls /user/hive/warehouse/
4.3.3 Use a database
use testdb;
4.3.4 Display the current database
select current_database();
4.3.5 Delete a database
drop database testdb;
4.4 Table operations
4.4.1 Create a table
create table t1 (x int);
create table t3 (id int, word string);
create table city (id int, name string, countrycode string, district string, population int);
4.4.2 Display tables in the database
show tables;
show tables in testdb;
show tables in testdb like 'tweets';
4.4.3 Describe the table structure
describe city;
4.4.4 Rename a table
alter table t3 rename to t2;
4.4.5 Insert data
insert into t1 values (1), (3), (2), (4);
insert into t2 values (1, "one"), (3, "three"), (5, 'five');
4.4.6 Query data
select min(x), max(x), sum(x), avg(x) from t1;
select word from t1 join t2 on (t1.x = t2.id);
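To check what these two queries should return without a cluster at hand, the same rows can be loaded into an in-memory SQLite database from Python. SQLite is only a stand-in here, not Impala: the `string` column type is written as `text`, and an `order by` is added to the join (an assumption of this sketch) so the row order is fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Same tables and rows as in 4.4.1 and 4.4.5 (t3 already renamed to t2).
cur.execute("create table t1 (x int)")
cur.execute("create table t2 (id int, word text)")  # `string` in Impala
cur.execute("insert into t1 values (1), (3), (2), (4)")
cur.execute("insert into t2 values (1, 'one'), (3, 'three'), (5, 'five')")

# Aggregates over t1.
aggregates = cur.execute(
    "select min(x), max(x), sum(x), avg(x) from t1"
).fetchone()

# Inner join: only x values that also appear as t2.id produce a row.
words = cur.execute(
    "select word from t1 join t2 on (t1.x = t2.id) order by t2.id"
).fetchall()

print(aggregates)  # (1, 4, 10, 2.5)
print(words)       # [('one',), ('three',)]
conn.close()
```

The value 5 in `t2` has no match in `t1`, and the values 2 and 4 in `t1` have no match in `t2`, so the join returns only two rows.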
5 Sentry
5.1 Enable permissions
5.1.1 Enable the Sentry service
Hive/Impala > Configuration > Service-Wide > Sentry Service > Select "sentry"
5.1.2 Specify the authentication server
Hive > Configuration > Service-Wide > Advanced > Server Name for Sentry Authorization (hive.sentry.server) > enter the Sentry server name or IP address
5.1.3 Set up privileged users
Hive > Configuration > Service-Wide > Security > Bypass Sentry Authorization Users (sentry.metastore.service.users) > enter the Linux user names that bypass Sentry (hive, impala, hue, hdfs, etc.)
5.1.4 Configure the proxy user for Hive
HDFS > Configuration > Service-Wide > Proxy > Hive Proxy User Groups (hadoop.proxyuser.hive.groups) > enter the Linux groups whose users Hive may impersonate (hive, impala, hue, hdfs, etc.)
5.1.5 Restart the services
Restart the Hive and Impala services.
5.2 Authorization
5.2.1 Create users and groups
groupadd gp1
useradd user1 -G gp1
useradd user2 -G gp1
5.2.2 Switch the executing user
su - impala
5.2.3 Create a database
Switch to the Hive shell:
hive
Create the database:
create database testdb;
Exit the Hive shell:
quit;
5.2.4 Create a role
Switch to the Impala shell:
impala-shell
Create a role:
create role ro1;
5.2.5 Confirm the created role
show roles;
5.2.6 Associate the user group with the role
grant role ro1 to group gp1;
5.2.7 Grant privileges to the role
grant all on database testdb to role ro1;
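The chain set up in 5.2 (users belong to a group, the group is granted a role, the role is granted privileges) can be modelled in a few lines of Python. This is a simplified sketch of the idea only, not Sentry's implementation or API: the dictionaries, the `is_allowed` function and the extra `user3` are assumptions made up for this example.

```python
# Users belong to groups (5.2.1). user3 is hypothetical and has no group.
user_groups = {"user1": {"gp1"}, "user2": {"gp1"}, "user3": set()}

# Groups are granted roles (5.2.6).
group_roles = {"gp1": {"ro1"}}

# Roles are granted privileges on objects (5.2.7).
role_privileges = {"ro1": {("database", "testdb", "all")}}


def is_allowed(user, object_type, object_name, action):
    """Walk user -> groups -> roles -> privileges and look for a match."""
    for group in user_groups.get(user, set()):
        for role in group_roles.get(group, set()):
            for otype, oname, granted in role_privileges.get(role, set()):
                same_object = otype == object_type and oname == object_name
                if same_object and granted in ("all", action):
                    return True
    return False


print(is_allowed("user1", "database", "testdb", "select"))   # True
print(is_allowed("user3", "database", "testdb", "select"))   # False
print(is_allowed("user1", "database", "testdb2", "select"))  # False
```

Privileges are never attached to a user directly, so adding a user to `gp1` is enough to give that user everything `ro1` allows.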
Reference material:

Docs:
http://hadoop.apache.org/docs/current/
Hadoop Common Guide:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html
File System Shell Guide:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#Overview
MapReduce Common Guide:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html

Hive Docs:
http://hive.apache.org
LanguageManual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual
GettingStarted:
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
User Documentation:
https://cwiki.apache.org/confluence/display/Hive/Home#Home-UserDocumentation

Impala Docs:
Impala SQL:
http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_langref_sql.html#langref_sql
Impala Tutorials:
http://www.cloudera.com/documentation/enterprise/latest/topics/impala_tutorial.html
Impala Explore:
http://www.cloudera.com/documentation/enterprise/latest/topics/impala_tutorial.html#tutorial_explore

Sentry Docs:
Overview of Impala Security:
http://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_security.html#security
Enabling Sentry Authorization for Impala:
http://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_authorization.html#authorization
Impala Grant:
http://www.cloudera.com/documentation/enterprise/5-6-x/topics/impala_grant.html#grant
Hive Grant:
http://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_hive_sql.html#concept_c2q_4qx_p4__col_level_auth_sentry
Disabling Hive CLI:
http://www.cloudera.com/documentation/enterprise/5-6-x/topics/sg_sentry_overview.html

Other references:

The concept of ETL:
http://www.cnblogs.com/elaron/archive/2012/04/09/2438372.html
Introduction to Apache Sentry Architecture:
http://blog.javachen.com/2015/04/29/apache-sentry-architecture.html
Enable Kerberos authentication:
http://www.cloudera.com/documentation/enterprise/latest/topics/cm_sg_intro_kerb.html#xd_583c10bfdbd326ba--6eed2fb8-14349d04bee--76dd
Introduction to the architecture of Impala:
http://www.mutouxiaogui.cn/blog/?p=319