How to Use the Presto Component for Cross-Data-Source Analysis in OLAP

Updated 2025-07-12. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report.

Many people who are new to OLAP do not know how to use the Presto component for cross-data-source analysis. This article summarizes the background of the problem and walks through a solution, from installation and configuration to connecting a client.

1. Overview of Presto

1. Introduction to Presto

Presto is an open source distributed SQL query engine designed for interactive analytical queries. It handles data volumes from gigabytes to petabytes. Although Presto parses and executes SQL, it is not a database in the conventional sense.

Presto queries data where it lives, including Hive, relational databases, and proprietary data stores. A single Presto query can combine data from multiple data sources, which allows analysis across an entire organization. Presto is mainly used in scenarios where the expected response time ranges from under one second to a few minutes.

2. Presto architecture

The Presto query engine is a distributed system that runs on multiple servers in a Master-Slave architecture, made up of one Coordinator node and several Worker nodes. The Coordinator parses SQL statements, generates execution plans, and distributes execution tasks to the Worker nodes. The Worker nodes actually execute the query tasks.

Coordinator node

The Coordinator server parses query statements, analyzes execution plans, manages the Presto Worker nodes, tracks the activity of each Worker, and coordinates the execution of queries. The Coordinator builds a model of each query that contains multiple Stages. Each Stage is then converted into Tasks, which are distributed to different Workers for execution. Coordination and communication are based on a REST API. A Presto installation must have a Coordinator node.

Worker node

A Worker executes query tasks and processes data. It obtains data through a Connector and exchanges intermediate data with other Workers. The Coordinator collects the results from the Workers and returns the final result to the client. When a Worker starts, it announces itself so that the Coordinator can discover it and learn that it is available. Coordination and communication are based on a REST API. A Presto installation usually has multiple Worker nodes.

Data source adaptation

Presto can adapt to many different data sources and can connect to and interact with them. Presto addresses a table by its fully qualified name: the Catalog corresponds to a data source, the Schema to a database, and the Table to a data table.
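As a minimal sketch of this naming scheme, the script below builds fully qualified names of the form catalog.schema.table and uses them in one statement that joins a Hive table with a MySQL table. The helper fqn, the tables orders and users, and their columns are hypothetical. The catalogs hive and mysql and the schema sq_export are the ones used in this article, and default is assumed to be the Hive schema.

```shell
#!/bin/sh
# Build a fully qualified Presto table name: catalog.schema.table
# (fqn is a hypothetical helper, not part of Presto).
fqn() {
  printf '%s.%s.%s' "$1" "$2" "$3"
}

# Hypothetical tables: one in Hive, one in MySQL.
ORDERS=$(fqn hive default orders)
USERS=$(fqn mysql sq_export users)

# A single statement that reads from both data sources.
SQL="SELECT u.name, count(*) AS order_count
FROM $ORDERS o
JOIN $USERS u ON o.user_id = u.id
GROUP BY u.name"

# Print the statement; no server is contacted here.
printf '%s\n' "$SQL"

# With a running server, the statement could be passed to the CLI, for example:
#   java -jar presto-cli.jar --server hop01:8083 --execute "$SQL"
```

Because every table is fully qualified, the statement does not depend on the catalog or schema the client session happens to be using.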

The smallest unit of data that Presto processes is a Page object. A Page contains multiple Block objects. Each Block is a byte array that stores the values of one field for several rows, so taking the same position across all Blocks in a Page yields one actual row of data.

2. Presto installation

1. Installation package management

[root@hop01 presto]# pwd
/opt/presto
[root@hop01 presto]# ll
presto-cli-0.196-executable.jar
presto-server-0.189.tar.gz
[root@hop01 presto]# tar -zxvf presto-server-0.189.tar.gz

2. Configuration management

Create an etc folder in the presto installation directory and add the following configuration information:

/opt/presto/presto-server-0.189/etc

Node attribute

Specific environment configuration for each node: etc/node.properties

[root@hop01 etc]# vim node.properties
node.environment=production
node.id=presto01
node.data-dir=/opt/presto/data

Configuration content: environment name, unique ID, data directory.
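Since the ID has to be unique per node, each machine in a multi-node cluster needs its own node.properties. The following is a rough sketch that generates the file for two nodes in a temporary directory. The helper write_node_properties, the second host hop02, and the ID presto02 are assumptions made for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: write node.properties into directory $1 with node ID $2.
write_node_properties() {
  mkdir -p "$1"
  cat > "$1/node.properties" <<EOF
node.environment=production
node.id=$2
node.data-dir=/opt/presto/data
EOF
}

# Demo only: write into a temporary directory instead of a real etc folder.
DEMO=$(mktemp -d)
write_node_properties "$DEMO/hop01" presto01
write_node_properties "$DEMO/hop02" presto02   # hop02 and presto02 are assumed

cat "$DEMO/hop02/node.properties"
```

The environment name stays the same on both nodes, while the node ID differs on each one.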

JVM configuration

A list of command-line options used to start the Java virtual machine: etc/jvm.config.

[root@hop01 etc]# vim jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

Configuration properties

Presto server configuration: etc/config.properties. Each Presto server can act as both a coordinator and a worker. On a larger cluster, dedicating a separate machine to coordination gives the best performance. In this setup the Presto server is both a coordinator and a worker node.

[root@hop01 etc]# vim config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8083
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://hop01:8083

Here coordinator=true indicates that the current Presto instance acts as a coordinator.
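For a larger cluster with a dedicated coordinator, the configuration is split into two variants. The sketch below writes one config.properties for a coordinator that does not schedule work on itself and one for a worker. It reuses the port, memory limits, and discovery URI from the single-node example above and writes into a temporary directory, so the layout is an assumption for illustration rather than a tested production setup.

```shell
#!/bin/sh
# Demo only: write both variants into a temporary directory.
OUT=$(mktemp -d)
mkdir -p "$OUT/coordinator" "$OUT/worker"

# Dedicated coordinator: runs the discovery service, executes no query tasks.
cat > "$OUT/coordinator/config.properties" <<'EOF'
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8083
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://hop01:8083
EOF

# Worker: points at the coordinator's discovery URI.
cat > "$OUT/worker/config.properties" <<'EOF'
coordinator=false
http-server.http.port=8083
query.max-memory=3GB
query.max-memory-per-node=1GB
discovery.uri=http://hop01:8083
EOF

echo "wrote configs under $OUT"
```

Only the coordinator enables the embedded discovery server. Every worker uses the same discovery.uri so that it can register with the coordinator.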

Log configuration

[root@hop01 etc]# vim log.properties
com.facebook.presto=INFO

Catalog attribute

/opt/presto/presto-server-0.189/etc/catalog

Configure hive adaptation:

[root@hop01 catalog]# vim hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://192.168.37.133:9083

Configure MySQL adaptation:

[root@hop01 catalog]# vim mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://192.168.37.133:3306
connection-user=root
connection-password=123456

3. Run the service

Start command

[root@hop01 /]# /opt/presto/presto-server-0.189/bin/launcher run

Startup log

At this point, Presto has started successfully.

Client installation

1. Jar package management

[root@hop01 presto-cli]# pwd
/opt/presto/presto-cli
[root@hop01 presto-cli]# ll
presto-cli-0.196-executable.jar
[root@hop01 presto-cli]# mv presto-cli-0.196-executable.jar presto-cli.jar

2. Connecting to MySQL

java -jar presto-cli.jar --server ip:9000 --catalog mysql --schema sq_export

Here ip:9000 stands for the host and HTTP port of the Coordinator. With the config.properties shown above, that would be hop01:8083.

After reading the above, have you mastered how to use the Presto component in OLAP to achieve cross-data-source analysis? If you want to learn more skills or want to know more about the topic, you are welcome to follow the industry information channel. Thank you for reading!
