2025-02-24 Update From: SLTechnology News&Howtos
This article shares a sample analysis of the big data query engine Presto. The editor finds it practical and shares it here as a reference; read on for a look.
Presto is exquisitely designed: it can process large volumes of data while making maximum use of the hardware, performs all computation in memory, and makes good use of high-speed networking for data scheduling. Its performance is roughly ten times that of Hive.
Java applications can be developed directly against Presto using the presto-jdbc driver.
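As a minimal sketch of presto-jdbc development, the snippet below builds a standard Presto JDBC URL and shows the usual connect-and-query pattern. The host name, catalog, and user are hypothetical placeholders; running the actual connection requires a live coordinator and the presto-jdbc jar on the classpath, so it is guarded behind a command-line argument here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PrestoJdbcExample {
    // Presto JDBC URLs take the form jdbc:presto://host:port/catalog/schema
    static String buildPrestoUrl(String host, int port, String catalog, String schema) {
        return String.format("jdbc:presto://%s:%d/%s/%s", host, port, catalog, schema);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical coordinator address; replace with your own cluster.
        String url = buildPrestoUrl("coordinator.example.com", 8080, "hive", "default");
        Properties props = new Properties();
        props.setProperty("user", "analyst"); // Presto requires a user name on every connection
        System.out.println(url);

        // Only attempt a real connection when an argument is passed, since it
        // needs a live Presto coordinator and the presto-jdbc driver jar.
        if (args.length > 0) {
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }
}
```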
Data transmission, node communication, heartbeat sensing, computation monitoring, computation scheduling, and computation distribution within a Presto cluster are all implemented as RESTful services, so RESTful services are the cornerstone of everything Presto does.
The process by which the Presto client (Cli) submits a query:
1. Obtain the SQL statement to execute from a specified file, a command-line argument, or the Cli window.
2. Wrap the SQL statement in a RESTful request, send it to the Coordinator, and process the returned response.
3. The Cli then repeatedly fetches the query results in batches and displays them on screen until all results have been shown.
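The steps above can be sketched with Presto's statement protocol: the client POSTs the SQL text to `/v1/statement` on the Coordinator with an `X-Presto-User` header, and the JSON responses contain a `nextUri` the client follows in a loop to fetch each batch of results. The coordinator address and user below are hypothetical; the live polling loop is shown in comments because it needs a running cluster and JSON parsing.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PrestoRestClient {
    // Build the initial POST /v1/statement request the Cli sends to the Coordinator.
    static HttpRequest statementRequest(URI coordinator, String sql, String user) {
        return HttpRequest.newBuilder(coordinator.resolve("/v1/statement"))
                .header("X-Presto-User", user)   // identifies the submitting user
                .POST(HttpRequest.BodyPublishers.ofString(sql))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest first = statementRequest(
                URI.create("http://coordinator.example.com:8080"), "SELECT 1", "analyst");
        System.out.println(first.method() + " " + first.uri());

        // Against a live cluster, the Cli loop would look roughly like:
        //   HttpClient client = HttpClient.newHttpClient();
        //   HttpResponse<String> resp = client.send(first, HttpResponse.BodyHandlers.ofString());
        //   // parse the JSON body; while it contains "nextUri", issue a GET to that
        //   // URI to fetch and display the next batch of results, until "nextUri"
        //   // disappears, which means the query has finished.
    }
}
```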
A query submitted to the Presto cluster goes through four stages:
1. Submit query: the client submits a SQL statement to the RESTful service provided by the Coordinator.
2. Generate a query execution plan: the Coordinator generates the corresponding query execution plan from the submitted SQL statement.
3. Query scheduling: based on the generated execution plan, the Coordinator schedules Stages and Tasks in turn.
4. Query execution: finally, the Coordinator assigns each Task to the most idle Workers, which carry out the actual computation.
Presto queues are used to control query concurrency and the number of SQL statements that can be accepted, and can be customized by user, submission source, Session, and other criteria.
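A minimal sketch of such queueing rules using Presto's file-based resource-group configuration (`etc/resource-groups.json`); the group name, limits, and selector patterns below are illustrative only:

```json
{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 20,
      "maxQueued": 100
    }
  ],
  "selectors": [
    { "user": "etl_.*", "group": "global" },
    { "source": "presto-cli", "group": "global" }
  ]
}
```

Here `hardConcurrencyLimit` caps how many queries in the group run at once, `maxQueued` caps how many more can wait, and each selector routes queries to a group by matching the submitting user or source.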
Presto supports a variety of data sources through Connectors, the most commonly used being the Hive Connector.
The Hive Connector uses Hive's metadata: the Coordinator node loads metadata through the Hive Metastore, while the Presto compute nodes read the HDFS data backing the Hive tables.
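Hooking up the Hive Connector is done with a catalog properties file; a minimal sketch (the metastore address is a hypothetical placeholder):

```properties
# etc/catalog/hive.properties — registers a catalog named "hive"
connector.name=hive-hadoop2
# Point this at your Hive Metastore Thrift service
hive.metastore.uri=thrift://metastore.example.com:9083
```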
The Kafka Connector supports Apache Kafka 0.8 and above. It exposes each Kafka topic as a table, and each message in a topic is parsed into a row of that table in Presto.
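The Kafka Connector is configured the same way, with a catalog properties file; broker addresses and topic names below are illustrative:

```properties
# etc/catalog/kafka.properties — registers a catalog named "kafka"
connector.name=kafka
# Broker list; each listed topic becomes a queryable table
kafka.nodes=kafka1.example.com:9092,kafka2.example.com:9092
kafka.table-names=orders,clicks
# Set to false to expose Kafka's internal columns (_partition_id, _message, ...)
kafka.hide-internal-columns=false
```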
Thank you for reading! This concludes the sample analysis of the big data query engine Presto. I hope the content above has been helpful; if you found the article worthwhile, feel free to share it with others.