
How to install and use Presto

2025-01-17 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report

This article explains how to install and use Presto, walking through the process with a practical example. The steps are simple and quick, and we hope this article on "how to install and use Presto" helps you solve your problem.

I. What is Presto?

Background knowledge: shortcomings of Hive and background of Presto

Hive uses MapReduce as its underlying computing framework and is designed for batch processing. As data volumes grow, even a simple Hive query can take anywhere from several minutes to several hours, which clearly cannot meet the needs of interactive querying. Presto is a distributed SQL query engine designed for high-speed, real-time data analysis. It supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Two aspects are worth exploring: first, its architecture, and second, how it achieves the low latency needed for interactive use.

What is Presto?

Presto is an open-source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics, and it approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.

What can it do?

Presto queries data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores. A single Presto query can combine data from multiple sources, allowing analytics across an entire organization. Presto targets analysts who expect response times ranging from under a second to a few minutes. It ends the false choice between fast analytics on an expensive commercial solution and a slow "free" solution that requires excessive hardware.

Who is using it?

Facebook uses Presto for interactive queries against several internal data stores, including its 300 PB data warehouse. Every day, more than 1,000 Facebook employees use Presto to run more than 30,000 queries, which together scan over a petabyte of data. Leading Internet companies, including Airbnb and Dropbox, also use Presto.

II. The architecture of Presto

Presto is a distributed system that runs on multiple servers. A full installation includes one coordinator and multiple workers. Clients, such as the Presto command-line interface (CLI), submit queries to the coordinator. The coordinator parses, analyzes, and plans the query execution, and then distributes the processing to the workers.

The Presto query engine uses a master-slave architecture consisting of one Coordinator node, one Discovery Server node, and multiple Worker nodes; the Discovery Server is usually embedded in the Coordinator. The Coordinator parses SQL statements, generates execution plans, and distributes execution tasks to the Worker nodes, which actually execute the query tasks. After a Worker starts, it registers with the Discovery Server, and the Coordinator obtains the list of available Workers from the Discovery Server. If the Hive connector is configured, a Hive Metastore service must also be running to provide Hive metadata to Presto, and the Workers read the data from HDFS.
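The registration flow just described can be illustrated with a toy, in-memory model. This is a hypothetical sketch, not Presto's implementation (the real discovery service is an HTTP service): the class names `ToyDiscoveryServer` and `ToyCoordinator`, the timeout, and the round-robin split assignment are all assumptions chosen for illustration. Workers announce themselves repeatedly, and the coordinator only schedules work on workers whose last announcement is recent enough.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified model of worker registration; not Presto's actual code.
class ToyDiscoveryServer {
    private final long timeoutMillis;
    // worker id -> time (ms) of its last announcement; TreeMap keeps ids sorted
    private final Map<String, Long> lastSeen = new TreeMap<>();

    ToyDiscoveryServer(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by a worker when it starts and then periodically as a heartbeat.
    void announce(String workerId, long nowMillis) {
        lastSeen.put(workerId, nowMillis);
    }

    // Returns the workers whose last announcement falls within the timeout window.
    List<String> activeWorkers(long nowMillis) {
        List<String> active = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() <= timeoutMillis) {
                active.add(e.getKey());
            }
        }
        return Collections.unmodifiableList(active);
    }
}

// Hypothetical coordinator that asks discovery for workers before scheduling.
class ToyCoordinator {
    private final ToyDiscoveryServer discovery;

    ToyCoordinator(ToyDiscoveryServer discovery) {
        this.discovery = discovery;
    }

    // Assigns split numbers round-robin across the currently active workers.
    Map<String, List<Integer>> schedule(int splitCount, long nowMillis) {
        List<String> workers = discovery.activeWorkers(nowMillis);
        if (workers.isEmpty()) {
            throw new IllegalStateException("no active workers");
        }
        Map<String, List<Integer>> plan = new TreeMap<>();
        for (String w : workers) {
            plan.put(w, new ArrayList<>());
        }
        for (int split = 0; split < splitCount; split++) {
            plan.get(workers.get(split % workers.size())).add(split);
        }
        return plan;
    }
}
```

In this toy model a worker that stops announcing simply drops out of the coordinator's view once the timeout elapses, so no work is scheduled on it.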

III. Install Presto Server

Installation media

    presto-cli-0.217-executable.jar
    presto-server-0.217.tar.gz

Install and configure Presto Server

1. Extract the installation package

    tar -zxvf presto-server-0.217.tar.gz -C ~/training/

2. Create an etc directory

    cd ~/training/presto-server-0.217/
    mkdir etc

3. The etc directory must contain the following configuration files:

- Node Properties (node.properties): node-specific configuration
- JVM Config (jvm.config): command-line options for the Java Virtual Machine
- Config Properties (config.properties): configuration of the Presto server
- Catalog Properties: configuration of the data sources (connectors)
- Log Properties (log.properties): log level configuration
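Before starting the server it can help to confirm that these files are in place. The following is a hypothetical preflight helper, not part of Presto: the class name `EtcPreflight` is an assumption, it only checks for the files this article creates, and it treats log.properties as optional.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical preflight helper for this article's layout; Presto does not ship it.
class EtcPreflight {
    // Files this setup places directly under etc/ (log.properties is optional).
    static final List<String> REQUIRED_FILES =
            Arrays.asList("node.properties", "jvm.config", "config.properties");

    // Returns the required entries that are missing from the given etc directory.
    static List<String> missing(Path etcDir) {
        List<String> result = new ArrayList<>();
        for (String name : REQUIRED_FILES) {
            if (!Files.isRegularFile(etcDir.resolve(name))) {
                result.add(name);
            }
        }
        // Catalog properties files (e.g. hive.properties) go in etc/catalog.
        if (!Files.isDirectory(etcDir.resolve("catalog"))) {
            result.add("catalog/");
        }
        return result;
    }

    public static void main(String[] args) {
        Path etc = Paths.get(args.length > 0 ? args[0] : "etc");
        List<String> problems = missing(etc);
        System.out.println(problems.isEmpty() ? "etc looks complete" : "missing: " + problems);
    }
}
```

Run it with the path to ~/training/presto-server-0.217/etc as its argument; it only reports what is absent and does not validate file contents.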

Edit node.properties

    # Environment (cluster) name. All Presto nodes in the same cluster must use the same name.
    node.environment=production
    # Unique identifier of this Presto node. Every node needs its own node.id, and it must
    # stay the same across restarts and upgrades. If several Presto instances run on one
    # machine, each instance needs a unique node.id.
    node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
    # Data directory (a path on the local file system). Presto stores logs and other data here.
    node.data-dir=/root/training/presto-server-0.217/data
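Because node.id must be unique per node and stable over time, one option is to generate it once at install time. The sketch below is a hypothetical helper (the class name `NodePropertiesGenerator` is an assumption) that renders the three properties above, using a random UUID for node.id and rejecting environment names that break Presto's documented rule (lowercase alphanumerics or underscores, starting with a lowercase alphanumeric).

```java
import java.util.UUID;
import java.util.regex.Pattern;

// Hypothetical helper that renders the node.properties shown above.
class NodePropertiesGenerator {
    // Lowercase alphanumerics or underscores, starting with a lowercase alphanumeric.
    private static final Pattern ENVIRONMENT = Pattern.compile("[a-z0-9][a-z0-9_]*");

    static String render(String environment, UUID nodeId, String dataDir) {
        if (!ENVIRONMENT.matcher(environment).matches()) {
            throw new IllegalArgumentException("invalid node.environment: " + environment);
        }
        return "node.environment=" + environment + "\n"
                + "node.id=" + nodeId + "\n"
                + "node.data-dir=" + dataDir + "\n";
    }

    public static void main(String[] args) {
        // Generate the id once per node and keep the file afterwards,
        // because node.id must not change across restarts or upgrades.
        System.out.print(render("production", UUID.randomUUID(),
                "/root/training/presto-server-0.217/data"));
    }
}
```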

Edit jvm.config

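Because an OutOfMemoryError typically leaves the JVM in an inconsistent state, the usual approach is to write a heap dump (for debugging) and then force the process to terminate; the -XX:+HeapDumpOnOutOfMemoryError and -XX:+ExitOnOutOfMemoryError flags below do exactly that. Presto also compiles queries into bytecode at runtime and therefore generates many classes. On older JVMs this required enlarging the permanent generation (PermGen, where classes were stored) and enabling class unloading; Java 8, which Presto 0.217 requires, replaced PermGen with Metaspace, so the configuration below sets no PermGen options.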

    -server
    -Xmx16G
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+UseGCOverheadLimit
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:+ExitOnOutOfMemoryError

Edit config.properties

Configuration of coordinator

    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    discovery-server.enabled=true
    discovery.uri=http://192.168.157.226:8080

Configuration of workers

    coordinator=false
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    discovery.uri=http://192.168.157.226:8080

To test on a single machine, where the same node acts as both coordinator and worker, use the following configuration:

    coordinator=true
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=5GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    discovery-server.enabled=true
    discovery.uri=http://192.168.157.226:8080

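Parameter description:

- coordinator: whether this Presto instance acts as a coordinator (accepts queries from clients and manages query execution).
- node-scheduler.include-coordinator: whether work may be scheduled on the coordinator. On larger clusters, processing work on the coordinator can hurt query performance.
- http-server.http.port: the port of the HTTP server. Presto uses HTTP for all internal and external communication.
- query.max-memory: the maximum amount of distributed memory a query may use across the cluster.
- query.max-memory-per-node: the maximum amount of user memory a query may use on any one machine.
- query.max-total-memory-per-node: the maximum amount of user and system memory a query may use on any one machine.
- discovery-server.enabled: enables the Discovery service embedded in the coordinator, which Presto uses to find all nodes in the cluster.
- discovery.uri: the URI of the Discovery server. Because Discovery is embedded in the coordinator here, this is the coordinator's address, and it must not end with a slash.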
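The per-node memory limits have to fit inside the heap set by -Xmx in jvm.config. The sketch below is a hypothetical sanity check (class and method names are assumptions) based on two rules: the user-memory limit should not exceed the total-memory limit, and query.max-total-memory-per-node plus the heap headroom must fit in the heap. It assumes the default heap headroom (memory.heap-headroom-per-node) of 30% of the JVM heap, and it does not check query.max-memory, which is a cluster-wide limit.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;

// Hypothetical checker for the per-node memory settings in config.properties.
// Assumes the default heap headroom of 30% of the JVM heap.
class MemoryConfigCheck {
    static final double DEFAULT_HEADROOM_FRACTION = 0.3;

    // Parses "5GB", "512MB" or a JVM-style "16G" into bytes, using 1024-based units.
    static long parseBytes(String text) {
        String s = text.trim().toUpperCase(Locale.ROOT);
        if (s.endsWith("B")) {
            s = s.substring(0, s.length() - 1);          // "5GB" -> "5G", "100B" -> "100"
        }
        long multiplier;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default:  multiplier = 1L;                   // plain number of bytes
        }
        String number = multiplier == 1L ? s : s.substring(0, s.length() - 1);
        return Math.round(Double.parseDouble(number) * multiplier);
    }

    private static long required(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing " + key);
        }
        return parseBytes(value);
    }

    // Returns human-readable problems; an empty list means both checks passed.
    static List<String> check(String configProperties, String xmx) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(configProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long heap = parseBytes(xmx);
        long headroom = Math.round(heap * DEFAULT_HEADROOM_FRACTION);
        long userPerNode = required(p, "query.max-memory-per-node");
        long totalPerNode = required(p, "query.max-total-memory-per-node");
        List<String> problems = new ArrayList<>();
        if (userPerNode > totalPerNode) {
            problems.add("query.max-memory-per-node exceeds query.max-total-memory-per-node");
        }
        if (totalPerNode + headroom > heap) {
            problems.add("query.max-total-memory-per-node plus heap headroom exceeds -Xmx");
        }
        return problems;
    }
}
```

With this article's values (-Xmx16G, 1GB user and 2GB total per node) both checks pass with plenty of room; shrinking the heap to 2G would make the second check fail.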

Edit log.properties

Configure the log level.

    com.facebook.presto=INFO

Configure Catalog Properties

Presto accesses data through connectors, which are mounted in catalogs. A connector provides all of the schemas and tables inside its catalog. For example, the Hive connector maps each Hive database to a schema, so if the Hive connector is mounted as a catalog named hive and the Hive database web contains a table named clicks, that table can be accessed in Presto as hive.web.clicks. Catalogs are registered by creating catalog properties files in the etc/catalog directory. For example, to create a connector for a Hive data source, create the file etc/catalog/hive.properties with the following contents, which mounts the Hive connector as the hive catalog.

    # Connector name; hive-hadoop2 matches the Hadoop 2 version used here
    connector.name=hive-hadoop2
    # Address of the Hive metastore (as configured in hive-site.xml)
    hive.metastore.uri=thrift://192.168.157.226:9083
    # Paths to the Hadoop configuration files
    hive.config.resources=/root/training/hadoop-2.7.3/etc/hadoop/core-site.xml,/root/training/hadoop-2.7.3/etc/hadoop/hdfs-site.xml

Note: to access Hive, the Hive Metastore service must be running. Start it with:

    hive --service metastore
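The naming rule described above (the catalog name is the properties file name without ".properties", and tables are addressed as catalog.schema.table) can be sketched as two small helpers. The class `CatalogNames` and its methods are hypothetical names used only for illustration.

```java
import java.nio.file.Path;

// Hypothetical helpers illustrating how catalog files map to SQL names.
class CatalogNames {
    // etc/catalog/hive.properties mounts a catalog named "hive":
    // the catalog name is the file name without the ".properties" suffix.
    static String catalogName(Path catalogFile) {
        String file = catalogFile.getFileName().toString();
        if (!file.endsWith(".properties")) {
            throw new IllegalArgumentException("not a catalog properties file: " + file);
        }
        return file.substring(0, file.length() - ".properties".length());
    }

    // Builds the fully qualified catalog.schema.table name used in Presto SQL.
    static String qualifiedName(String catalog, String schema, String table) {
        return catalog + "." + schema + "." + table;
    }
}
```

For the example in the text, the file hive.properties yields the catalog hive, and the clicks table in the Hive database web becomes hive.web.clicks, so it could be queried with SELECT * FROM hive.web.clicks.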

4. Start Presto Server from the bin directory of the installation:

    cd ~/training/presto-server-0.217/bin
    ./launcher start

5. Run presto-cli

Download: presto-cli-0.217-executable.jar

Copy the jar to a file named presto and make it executable:

    cp presto-cli-0.217-executable.jar presto
    chmod a+x presto

Connect to Presto Server:

    ./presto --server localhost:8080 --catalog hive --schema default

IV. Use Presto

Use Presto to query Hive

Presto also provides a web console on the coordinator's HTTP port (8080 in this setup), for example http://192.168.157.226:8080.

Use JDBC to access Presto

1. Maven dependency to include:

    <dependency>
        <groupId>com.facebook.presto</groupId>
        <artifactId>presto-jdbc</artifactId>
        <version>0.217</version>
    </dependency>

2. JDBC code
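A minimal sketch of querying Presto over JDBC follows. It assumes presto-jdbc 0.217 is on the classpath, the coordinator address used throughout this article, the hive catalog with the default schema, and a user name of root (an assumption; Presto's JDBC driver requires a user). The class name `PrestoJdbcExample` and the `jdbcUrl` helper are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

// Minimal sketch of querying Presto over JDBC; assumes presto-jdbc 0.217 on the
// classpath and a reachable coordinator.
class PrestoJdbcExample {
    // Presto JDBC URLs have the form jdbc:presto://host:port/catalog/schema
    static String jdbcUrl(String host, int port, String catalog, String schema) {
        return "jdbc:presto://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) throws SQLException {
        String url = jdbcUrl("192.168.157.226", 8080, "hive", "default");
        Properties props = new Properties();
        props.setProperty("user", "root");  // assumed user name; Presto requires one
        // try-with-resources closes the result set, statement and connection
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

The driver registers itself with DriverManager when the jar is on the classpath, so no explicit Class.forName call is needed; replace SHOW TABLES with any query against the hive catalog.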



© 2024 shulou.com SLNews company. All rights reserved.
