How to Build a Big Data Development Environment Based on Docker

This article explains how to build a big data development environment based on Docker. The content is simple and clear and easy to follow; read along to learn how such an environment can be set up.
Big data development relies heavily on the runtime environment and on data. For example, Spark applications often depend on Hive, but the local development environment has no Hive, so copying code back and forth between the local machine and the server is inefficient. My idea was to use Docker to build a stand-alone big data cluster locally and then copy the code into a container for testing. I explored this myself: an image with Hadoop, Hive, Spark, and other components installed can basically meet the requirements, but it also has problems, such as needing to adjust the configuration to stay consistent with the production environment; that can be done, but it is a lot of work.
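To make the workflow concrete, here is a minimal sketch of the develop-locally, test-in-container loop; the container name, jar path, and main class are illustrative placeholders, not taken from any particular setup:

# Hypothetical example: copy a locally built Spark jar into the container
# and run it there with spark-submit (all names and paths are placeholders).
docker cp target/my-spark-app.jar sandbox-hdp:/tmp/
docker exec -it sandbox-hdp \
  spark-submit --master local[2] --class com.example.Main /tmp/my-spark-app.jar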
In fact, both CDH and HDP provide similar stand-alone images. The component versions in HDP are relatively new and match my company's technology stack, so it is worth exploring; if the experience is good, I will use it for related development in the future.
Obtaining the Sandbox

System requirements
Docker 17.09 or later is installed
On Windows and Mac, Docker needs to be allocated more than 10 GB of memory
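Before downloading anything, both requirements can be checked from the shell; this is just a sanity check I find convenient, not part of the official instructions:

# Check the Docker version (should report 17.09 or later).
docker --version
# Check the memory available to the Docker VM, in bytes (should exceed 10 GB).
docker info --format '{{.MemTotal}}'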
Script download and execution
You can visit the https://www.cloudera.com/downloads/hortonworks-sandbox/hdp.html page in a browser to download the deploy scripts, or download them directly from the command line with wget:
$ wget --no-check-certificate https://archive.cloudera.com/hwx-sandbox/hdp/hdp-3.0.1/HDP_3.0.1_docker-deploy-scripts_18120587fc7fb.zip
Extract and execute the script:
$ unzip HDP_3.0.1_docker-deploy-scripts_18120587fc7fb.zip
Archive:  HDP_3.0.1_docker-deploy-scripts_18120587fc7fb.zip
   creating: assets/
  inflating: assets/generate-proxy-deploy-script.sh
  inflating: assets/nginx.conf
  inflating: docker-deploy-hdp30.sh
$ sh docker-deploy-hdp30.sh
After the script runs, it starts pulling the Docker image. Tens of gigabytes of data need to be downloaded, so wait patiently.
Sandbox verification
After the script finishes, docker ps shows that two containers have been started:
CONTAINER ID   IMAGE                           COMMAND                  CREATED             STATUS             PORTS                        NAMES
daf0f397ff6c   hortonworks/sandbox-proxy:1.0   "nginx -g 'daemon of..."   About an hour ago   Up About an hour   0.0.0.0:1080->1080/tcp, ...  sandbox-proxy
b925f92f368d   hortonworks/sandbox-hdp:3.0.1   "/usr/sbin/init"           About an hour ago   Up About an hour   22/tcp, 4200/tcp, 8080/tcp   sandbox-hdp
sandbox-proxy is just the proxy container and can be ignored; the one to pay attention to is sandbox-hdp, in which all the HDP components run.
UI verification
Because the port mappings are already in place, each UI can be reached at the corresponding localhost port. Start by visiting the splash page at localhost:1080.
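If you are unsure which ports have been mapped, Docker can list them directly; this is a generic check rather than anything the deploy script prints:

# List the host-to-container port mappings of the proxy container.
docker port sandbox-proxy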
This page is a wizard. Clicking Launch Dashboard on the left opens the Ambari login page and the HDP Tutorial page, while clicking Quick Links on the right opens another wizard with jump links for Ambari, Zeppelin, Atlas, Ranger, and other components.
The Ambari login credentials can be found on the https://www.cloudera.com/tutorials/learning-the-ropes-of-the-hdp-sandbox.html page; different users can be chosen for different purposes:
User         Role                        Password
admin        Ambari Admin                initialized with the ambari-admin-password-reset command
maria_dev    Spark and SQL Developer     maria_dev
raj_ops      Hadoop Warehouse Operator   raj_ops
holger_gov   Data Steward                holger_gov
amy_ds       Data Scientist              amy_ds
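The admin password is not preset; it is initialized by running the reset command mentioned above inside the container, roughly like this:

# Set the Ambari admin password interactively (the command ships with the sandbox).
docker exec -it sandbox-hdp ambari-admin-password-reset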
You can verify each Web UI one by one; here, let's move on to verifying the underlying storage and compute.
Functional verification
Enter the container from the command line:
docker exec -it sandbox-hdp bash

HDFS verification
A simple ls:
[root@sandbox-hdp /]# hdfs dfs -ls /
Found 13 items
drwxrwxrwt   - yarn    hadoop   0 2018-11-29 17:56 /app-logs
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 19:01 /apps
drwxr-xr-x   - yarn    hadoop   0 2018-11-29 17:25 /ats
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 17:26 /atsv2
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 17:26 /hdp
drwx------   - livy    hdfs     0 2018-11-29 17:55 /livy2-recovery
drwxr-xr-x   - mapred  hdfs     0 2018-11-29 17:26 /mapred
drwxrwxrwx   - mapred  hadoop   0 2018-11-29 17:26 /mr-history
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 18:54 /ranger
drwxrwxrwx   - spark   hadoop   0 2021-02-06 07:19 /spark2-history
drwxrwxrwx   - hdfs    hdfs     0 2018-11-29 19:01 /tmp
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 19:21 /user
drwxr-xr-x   - hdfs    hdfs     0 2018-11-29 17:51 /warehouse
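Beyond listing directories, a quick write/read round trip confirms that HDFS is actually writable; the file name here is an arbitrary example:

# Inside the container: write a small file into HDFS, read it back, then clean up.
echo "hello hdfs" > /tmp/probe.txt
hdfs dfs -put /tmp/probe.txt /tmp/probe.txt
hdfs dfs -cat /tmp/probe.txt
hdfs dfs -rm /tmp/probe.txt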
Hive verification

The Sandbox already has some test data built in, so we can simply query it.
Start the hive command line first:
[root@sandbox-hdp /]# hive
See which databases are available:
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> show databases;
+----------------------+
| database_name        |
+----------------------+
| default              |
| foodmart             |
| information_schema   |
| sys                  |
+----------------------+
Select foodmart and see which tables are available:
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> use foodmart;
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> show tables;
+-----------+
| tab_name  |
+-----------+
| account   |
| ...       |
+-----------+
You can see that there are many tables, so we choose the account table:
0: jdbc:hive2://sandbox-hdp.hortonworks.com:2> select * from account limit 1;
+---------------------+-------------------------+------------------------------+-----------------------+-------------------------+-------------------------+
| account.account_id  | account.account_parent  | account.account_description  | account.account_type  | account.account_rollup  | account.custom_members  |
+---------------------+-------------------------+------------------------------+-----------------------+-------------------------+-------------------------+
| 1000                | NULL                    | Assets                       | Asset                 | ~                       |                         |
+---------------------+-------------------------+------------------------------+-----------------------+-------------------------+-------------------------+
Works as expected.
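The same check can also be run non-interactively from the host; the HiveServer2 URL below is an assumption based on the default port rather than something taken from the sandbox documentation:

# Hypothetical non-interactive variant of the query above.
docker exec -it sandbox-hdp \
  beeline -u jdbc:hive2://localhost:10000 -n hive \
  -e "select count(*) from foodmart.account;"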
Spark verification
Query the account table after starting spark-sql:
spark-sql> select * from foodmart.account limit 1;
Error in query: Table or view not found: `foodmart`.`account`; line 1 pos 14;
'GlobalLimit 1
+- 'LocalLimit 1
   +- 'Project [*]
      +- 'UnresolvedRelation `foodmart`.`account`
Strange.
spark-sql> show databases;
default
Only the default database is visible.
After some searching, it appears that the way Spark accesses Hive tables changed greatly after HDP 3.0: Hive and Spark now keep separate metastore catalogs, so Spark no longer sees Hive's tables by default. The verification of Spark needs further research.
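For readers who want to experiment, one workaround that is often mentioned is pointing Spark at the Hive catalog; the config key below is an unverified assumption on my part, and managed ACID tables are still supposed to go through the Hive Warehouse Connector:

# Possible workaround (unverified assumption): have spark-sql read the "hive"
# metastore catalog instead of its default "spark" catalog.
spark-sql --conf spark.hadoop.metastore.catalog.default=hive \
  -e "show databases; select * from foodmart.account limit 1;"
# Managed ACID tables should still be accessed via the Hive Warehouse Connector.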
Sandbox management

Stop Sandbox
Use the docker stop command:
docker stop sandbox-hdp
docker stop sandbox-proxy

Restart Sandbox
Use the docker start command:
docker start sandbox-hdp
docker start sandbox-proxy
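Since the two containers always start and stop together, a small wrapper can save typing; this script is my own convenience sketch, not part of the HDP deploy scripts:

#!/usr/bin/env bash
# sandbox.sh - start or stop both sandbox containers (hypothetical helper).
set -euo pipefail
case "${1:-}" in
  start) docker start sandbox-hdp sandbox-proxy ;;
  stop)  docker stop sandbox-proxy sandbox-hdp ;;
  *)     echo "usage: $0 {start|stop}" >&2; exit 1 ;;
esac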
Clean up Sandbox

Stop the containers first, then remove them:
docker stop sandbox-hdp
docker stop sandbox-proxy
docker rm sandbox-hdp
docker rm sandbox-proxy
If you also want to delete the image:
docker rmi hortonworks/sandbox-hdp:3.0.1

That covers how to build a big data development environment based on Docker. I believe you now have a deeper understanding of the approach; the specifics are best verified in practice. Thank you for reading.