Presto Cluster installation & Integration of hive | mysql | jdbc


Presto is a distributed system that runs on a cluster of servers. A full installation includes one coordinator (scheduling node) and multiple workers. A query is submitted by a client, such as the Presto command-line CLI, to the coordinator. The coordinator parses and analyzes the query, builds the execution plan, and then distributes the processing tasks to the workers.

Table of contents:

Prepare the environment before construction

Cluster plan

Connector

Installation steps

Configuration file

Run presto

Integrate hive testing

Integrate mysql testing

Integrate jdbc testing

1. Prepare the environment before construction

CentOS 6.7

Java 8

Python 3.4.4

Hadoop 2.6.4

2. Cluster plan

hd1 (192.168.174.131): scheduling node (coordinator)

hd2 (192.168.174.132): worker node

hd3 (192.168.174.133): worker node

3. Connector

Presto supports reading Hive data from the following Hadoop versions, and supports the following file formats: Text, SequenceFile, RCFile, ORC

Apache Hadoop 1.x (hive-hadoop1)

Apache Hadoop 2.x (hive-hadoop2)

Cloudera CDH 4 (hive-cdh4)

Cloudera CDH 5 (hive-cdh5)

In addition, a remote Hive metastore is required; local or embedded metastore mode is not supported. Presto does not use MapReduce and only needs access to HDFS.

4. Stand-alone installation steps

Download presto-server-0.100 (download address: https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.100/presto-server-0.100.tar.gz), or from the mirror link: http://pan.baidu.com/s/1qYTvTwg (password: 4xz6)

Upload presto-server-0.100.tar.gz to the Linux host (hd1) and extract it; the extracted directory is referred to below as the installation directory. Presto also needs a data directory for storing logs, local metadata, and so on. It is recommended to create the data directory outside the installation directory so that Presto upgrades are easier, for example /presto/data.
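
A minimal sketch of those steps on hd1 (the /root/apps and /presto/data paths are just the ones used in this tutorial; adjust to your environment):

# extract the server package into the apps directory
tar -zxvf presto-server-0.100.tar.gz -C /root/apps/
# create the data directory outside the installation directory
mkdir -p /presto/data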

5. Configuration file

Create an etc directory in the installation directory and place the following configuration files in it (see the layout sketch after this list):

1. config.properties: Presto server configuration

2. node.properties: environment configuration specific to each node

3. jvm.config: command-line options for the Java virtual machine

4. log.properties: log levels for different loggers

5. catalog directory: one configuration file per connector (data source)
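
For reference, once this section and the catalog sections below are done, the etc directory of the installation looks roughly like this:

etc/
  config.properties
  node.properties
  jvm.config
  log.properties
  catalog/
    jmx.properties
    hive.properties
    mysql.properties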

config.properties

Contains the main configuration for the Presto server. Every Presto server can act as both a coordinator and a worker, but in large clusters it is recommended, for performance reasons, to dedicate a single machine to the coordinator role. The coordinator's etc/config.properties should contain at least the following:

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=18080
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://192.168.174.131:18080

1. coordinator: whether this Presto instance runs as a coordinator (accepts queries from clients and manages query execution)

2. node-scheduler.include-coordinator: whether the coordinator is also allowed to schedule work on itself. In a large cluster, having one node act as both coordinator and worker degrades query performance: if the server also runs worker tasks, most of its resources are consumed by those tasks and not enough is left to schedule, manage, and monitor query execution

3. http-server.http.port: the port of the HTTP server. Presto uses HTTP for all internal and external communication

4. task.max-memory=1GB: the maximum memory used by a single task (the part of a query plan executed on a specific node). This parameter limits the number of groups in a GROUP BY, the size of the right-hand table in a JOIN, the number of rows in an ORDER BY, and the number of rows processed by a window function. It should be tuned to the query concurrency and query complexity: set too low, many queries cannot run; set too high, the JVM will run out of memory

5. discovery-server.enabled: Presto discovers all nodes in the cluster through the Discovery service, and each Presto instance registers itself with the Discovery service at startup. To simplify deployment and avoid running an extra service process, Presto can run an embedded Discovery service inside the coordinator, sharing the coordinator's HTTP server and port

6. discovery.uri: the URI of the Discovery server. Because the Discovery service embedded in the Presto coordinator is enabled, this is the URI of the Presto coordinator. Note: this URI must not end with "/"

node.properties

Contains configuration specific to each node. A node is a single Presto instance installed on a machine. The etc/node.properties file contains at least the following:

node.environment=test
node.id=bigdata_node_worker_hd1
node.data-dir=presto/data

node.environment: the cluster name. All Presto nodes in the same cluster must use the same cluster name.

node.id: the unique identifier of each Presto node. The node.id of each node must be unique and must stay the same across Presto restarts and upgrades. If multiple Presto instances are installed on one machine (that is, multiple Presto nodes on the same host), each instance must still have its own unique node.id

node.data-dir: the data directory (a path on the operating system) where Presto stores logs and other data

jvm.config

Contains the command-line options used when launching the JVM, one option per line. These options are not processed by a shell, so even if an option contains spaces or other delimiters it is not split; each line is passed to the java process as a single command-line option. The content is as follows:

-server
-Xmx16G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M

log.properties

This file sets log levels for different loggers. Each logger has a name, usually the fully qualified name of the class that uses it, and logger names use "." to express the hierarchy. For example:

com.facebook.presto=DEBUG

This sets the log level, similar to log4j. Four levels are available: DEBUG, INFO, WARN, ERROR

Catalog Properties

Catalogs are registered by creating a catalog properties file in the etc/catalog directory. For example, creating an etc/catalog/jmx.properties file with the following content mounts the jmx connector as the jmx catalog:

connector.name=jmx

Create a hive.properties file under the etc/catalog directory with the following content:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.174.131:9083
hive.config.resources=/root/apps/hadoop/etc/hadoop/core-site.xml,/root/apps/hadoop/etc/hadoop/hdfs-site.xml
hive.allow-drop-table=true

This completes the stand-alone deployment of Presto.

6. Cluster installation steps

Copy presto-server-0.100 from hd1 to hd2 and hd3:

scp -r /root/apps/presto-server-0.100 root@hd2:/root/apps/
scp -r /root/apps/presto-server-0.100 root@hd3:/root/apps/

Modify the configuration files on hd2:

config.properties

coordinator=false
http-server.http.port=18080
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://192.168.174.131:18080

node.properties

node.environment=test
node.id=bigdata_node_worker_hd2
node.data-dir=presto/data

Modify the configuration files on hd3:

config.properties

coordinator=false
http-server.http.port=18080
task.max-memory=1GB
discovery-server.enabled=true
discovery.uri=http://192.168.174.131:18080

node.properties

node.environment=test
node.id=bigdata_node_worker_hd3
node.data-dir=presto/data

At this point, the presto cluster is configured.

7. Run presto

Start Presto from the presto-server-0.100/bin directory on hd1, hd2, and hd3:

./launcher start

Presto can be started as a background process with:

bin/launcher start

Or run in the foreground, which prints the log output directly:

bin/launcher run

Stop the service process:

bin/launcher stop

Check the service status:

bin/launcher status

View the process with ps aux | grep PrestoServer, or with jps

The cluster can also be viewed through the web interface: http://192.168.174.131:18080
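
As an additional command-line check, assuming this Presto version exposes the standard /v1/info endpoint on the coordinator's HTTP port, you can query the coordinator directly:

# should return a small JSON document with node and version information
curl http://192.168.174.131:18080/v1/info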

8. Integrate hive testing

To query data through the hive connector, Hive's metastore service must be started first.

Startup:

bin/hive --service metastore
# or start in the background:
bin/hive --service metastore 2>&1 >> /var/log.log &
# background startup that survives closing the shell session:
nohup bin/hive --service metastore 2>&1 >> /var/log.log &

If startup fails, check whether hive-site.xml contains the following metastore configuration; if not, add it before starting the metastore.

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.174.131:9083</value>
</property>

Next, download presto-cli-0.100-executable.jar. The Presto CLI provides an interactive terminal for running queries and is a self-contained executable JAR, so it can be used like a normal UNIX command. Download https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.100/presto-cli-0.100-executable.jar, rename it to presto, make it executable with chmod +x, and run it:
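
A sketch of those steps (wget is just one way to fetch the jar):

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.100/presto-cli-0.100-executable.jar
mv presto-cli-0.100-executable.jar presto
chmod +x presto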

The IP and port in the following command are the ones configured in config.properties:

./presto --server 192.168.174.131:18080 --catalog hive --schema default --debug

List the tables in Hive's default database, then query the user table (the original article shows the results as screenshots, which are not reproduced here).

Each of these queries also appears as a corresponding record in the web UI.
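
For reference, the queries behind those screenshots are roughly the following, typed inside the CLI session started above (user is just the example table in this tutorial's Hive default database):

show tables;
select * from user;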

Exit command: quit or exit

9. Integrate mysql testing

Similar to hive: create a new file, mysql.properties, in the etc/catalog directory on hd1 with the following content:

connector.name=mysql
connection-url=jdbc:mysql://192.168.174.131:3306
connection-user=root
connection-password=123456

Then copy mysql.properties to the corresponding etc/catalog directory on hd2 and hd3 and restart the PrestoServer service.
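
A sketch of that copy-and-restart step, run from the installation directory on hd1 (bin/launcher restart is assumed to be available in this version; otherwise use stop followed by start on each node):

scp etc/catalog/mysql.properties root@hd2:/root/apps/presto-server-0.100/etc/catalog/
scp etc/catalog/mysql.properties root@hd3:/root/apps/presto-server-0.100/etc/catalog/
# then on each node:
bin/launcher restart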

Connectivity testing:

./presto --server localhost:18080 --catalog mysql --schema test --debug

Commonly used writing methods:

SHOW SCHEMAS FROM mysql;         -- list the databases
SHOW TABLES FROM mysql.test;     -- list the tables in the specified database
SELECT * FROM mysql.test.user;   -- query data from the specified table

10. Integrate jdbc testing

To test the connection from code, add the dependency in pom.xml:

<dependency>
    <groupId>com.facebook.presto</groupId>
    <artifactId>presto-jdbc</artifactId>
    <version>0.100</version>
</dependency>

The main method tests the connection:

package com.presto.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class TestPrestoJdbc {
    public static void main(String[] args) throws Exception {
        Class.forName("com.facebook.presto.jdbc.PrestoDriver");
        Connection connection = DriverManager.getConnection(
                "jdbc:presto://192.168.174.131:18080/hive/default", "root", null);
        Statement stmt = connection.createStatement();
        ResultSet rs = stmt.executeQuery("show tables");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        connection.close();
    }
}

Running the program prints the list of Hive tables (shown as a screenshot in the original article), matching the output of the command-line CLI.
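
The same JDBC pattern also works against the mysql catalog configured earlier; a minimal hypothetical variant (assuming the test schema has a user table, as in the CLI example above) only changes the JDBC URL and the query:

package com.presto.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical variant of TestPrestoJdbc: same driver, but the URL points at
// the mysql catalog and test schema instead of hive/default.
public class TestPrestoMysqlJdbc {
    public static void main(String[] args) throws Exception {
        Class.forName("com.facebook.presto.jdbc.PrestoDriver");
        Connection connection = DriverManager.getConnection(
                "jdbc:presto://192.168.174.131:18080/mysql/test", "root", null);
        Statement stmt = connection.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM user");
        while (rs.next()) {
            System.out.println(rs.getString(1));   // print the first column of each row
        }
        rs.close();
        stmt.close();
        connection.close();
    }
}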

That covers the installation and use of a Presto cluster. Have you learned it?
