Introduction: Data virtualization has long been a central concern of the Agile Big Data team, and Moonbox was designed on this basis to provide batch computing services. Moonbox has now released version 0.3 beta (for a review of v0.2, see "#Moonbox# Computing Service Platform Introduction": http://college.creditease.cn/detail/154). Read on to learn about Moonbox and what is new in version 0.3.
Project: https://github.com/edp963/moonbox
Release: https://github.com/edp963/moonbox/releases/tag/0.3.0-beta
Documentation: https://edp963.github.io/moonbox/
I. Moonbox positioning
Before looking at the new version, let's recall how Moonbox is positioned.
Moonbox is a DVtaaS (Data Virtualization as a Service) platform. Built on the idea of data virtualization, it is dedicated to providing batch computing services. Moonbox hides the physical and usage details of the underlying data sources and gives users an experience similar to working with a virtual database: with a single, unified SQL dialect, users can transparently compute across and write to heterogeneous data systems. Moonbox also provides basic support for data services, data management, data tools, and data development, enabling more agile and flexible data application architectures and logical data warehouse practices.
II. Moonbox functions
Data virtualization is a core design principle of Moonbox, and Moonbox builds a range of functions on top of it:
Multi-tenant
Moonbox has a complete user system and introduces the concept of an Organization to partition the user space. The system administrator account, ROOT, can create multiple Organizations and appoint one or more administrators (SA) for each Organization. SAs are responsible for creating and managing ordinary users.
Moonbox abstracts the capabilities of an ordinary user into six attributes: whether the user can execute Account management statements, DDL statements, and DCL statements, and whether the user can authorize other users to execute Account, DDL, and DCL statements. Combining these attributes freely produces user models that fit a wide range of roles and requirements, which is how multi-tenancy is realized.
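The six attributes can be pictured as independent flags. The following Python sketch is a hypothetical model, not Moonbox's actual catalog schema: the class, field, and function names are illustrative. It only shows how combining flags yields different roles and how the "authorize others" flags gate granting.

```python
from dataclasses import dataclass, replace

CATEGORIES = ("account", "ddl", "dcl")

# Hypothetical model of the six attributes; names are illustrative only.
@dataclass(frozen=True)
class UserAttributes:
    account: bool = False        # may execute Account management statements
    ddl: bool = False            # may execute DDL statements
    dcl: bool = False            # may execute DCL statements
    grant_account: bool = False  # may authorize others to execute Account statements
    grant_ddl: bool = False      # may authorize others to execute DDL statements
    grant_dcl: bool = False      # may authorize others to execute DCL statements

def _check(category: str) -> None:
    if category not in CATEGORIES:
        raise ValueError(f"unknown statement category: {category}")

def can_execute(user: UserAttributes, category: str) -> bool:
    """Check whether a user may run a statement of the given category."""
    _check(category)
    return getattr(user, category)

def grant(grantor: UserAttributes, grantee: UserAttributes, category: str) -> UserAttributes:
    """Return the grantee with `category` enabled, if the grantor may delegate it."""
    _check(category)
    if not getattr(grantor, "grant_" + category):
        raise PermissionError(f"grantor may not authorize {category} statements")
    return replace(grantee, **{category: True})

# Roles built by combining attributes (hypothetical examples).
ANALYST = UserAttributes()                                  # may only query
DEVELOPER = UserAttributes(ddl=True)                        # may also run DDL
SECURITY_ADMIN = UserAttributes(dcl=True, grant_dcl=True)   # may run and delegate DCL
```

For example, grant(SECURITY_ADMIN, ANALYST, "dcl") returns an analyst who can run DCL statements, while grant(DEVELOPER, ANALYST, "ddl") raises PermissionError because the developer holds the DDL right but not the right to delegate it.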
Extended SQL
Moonbox adopts Spark SQL as its unified query language and uses Spark for computation underneath, and it extends Spark SQL with a set of DDL and DCL statements. These cover creating, deleting, and authorizing users; granting access to tables or columns; mounting and unmounting physical data sources or tables; creating and deleting logical databases; creating and deleting UDFs/UDAFs; and creating and deleting scheduled tasks.
Optimization strategy
Moonbox performs mixed computation on Spark. Spark SQL supports many data sources, but when it pulls data from a source it pushes down only the Project and Filter operators and ignores the computing capabilities of the source itself.
For example, Elasticsearch handles aggregations very well. Pushing an aggregation down to Elasticsearch is much faster than pulling all the data back and aggregating it in Spark.
Likewise, pushing a Limit operator down to the data source can greatly reduce the amount of data returned, saving both transfer and computation time.
Moonbox further optimizes the LogicalPlan produced by the Spark Optimizer: it splits off the subtrees that can be pushed down according to its rules, maps each subtree into the query language of the corresponding data source, and pulls the pushed-down results back into Spark for the remaining computation.
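The splitting step can be illustrated on a toy plan tree. The Python sketch below assumes a hypothetical capability table listing the operators each source can execute. A subtree is treated as pushable when it reads from a single source and every operator in it is supported there, and the sketch collects the largest such subtrees. It does not use Spark's real LogicalPlan classes, and the mapping of each subtree into the source's query language is left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    op: str                                        # operator name, e.g. "Scan", "Filter"
    children: List["Node"] = field(default_factory=list)
    source: str = ""                               # data source name, set on Scan nodes only

# Hypothetical capability table: operators each source can execute itself.
CAPABILITIES = {
    "mysql": {"Scan", "Project", "Filter", "Aggregate", "Limit", "Join"},
    "es": {"Scan", "Project", "Filter", "Aggregate", "Limit"},
    "hdfs": {"Scan"},
}

def source_of(node: Node) -> Optional[str]:
    """Return the single source a subtree reads from, or None if it mixes sources."""
    if node.op == "Scan":
        return node.source
    sources = {source_of(child) for child in node.children}
    return sources.pop() if len(sources) == 1 else None

def pushable(node: Node) -> bool:
    """A subtree is pushable if one source supports every operator in it."""
    src = source_of(node)
    return (src is not None
            and node.op in CAPABILITIES.get(src, set())
            and all(pushable(child) for child in node.children))

def split(node: Node, out: Optional[List[Node]] = None) -> List[Node]:
    """Collect the largest pushable subtrees; operators above them stay in Spark."""
    if out is None:
        out = []
    if pushable(node):
        out.append(node)
    else:
        for child in node.children:
            split(child, out)
    return out

# An aggregation over Elasticsearch joined with a MySQL table.
plan = Node("Join", [
    Node("Aggregate", [Node("Filter", [Node("Scan", source="es")])]),
    Node("Scan", source="mysql"),
])
print([(n.op, source_of(n)) for n in split(plan)])
```

Here the Join spans two sources and stays in Spark, while the whole Filter-plus-Aggregate subtree goes to Elasticsearch and the MySQL scan goes to MySQL.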
In addition, if the entire LogicalPlan can be pushed down, Moonbox skips Spark altogether and runs the query mapped from the LogicalPlan directly through the data source's client. This avoids the overhead of starting a distributed job and saves distributed computing resources.
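The whole-plan case can be sketched in the same spirit. In the following Python example, an in-memory SQLite database stands in for the data source and its client, the plan is a hypothetical linear list of operators, and "spark" is only a label for the fallback path; none of this is Moonbox's actual API.

```python
import sqlite3

# Operators the stand-in data source can execute; assumed for illustration.
SUPPORTED = {"Scan", "Filter", "Aggregate", "Limit"}

def to_sql(plan):
    """Map a fully pushable linear plan to one SQL statement.

    The plan is a hypothetical list of (operator, argument) pairs, assumed to
    appear in Scan, Filter, Aggregate, Limit order.
    """
    table, where, group, select, limit = None, None, None, "*", None
    for op, arg in plan:
        if op == "Scan":
            table = arg
        elif op == "Filter":
            where = arg
        elif op == "Aggregate":
            group, select = arg          # (group-by column, select list)
        elif op == "Limit":
            limit = arg
    sql = f"SELECT {select} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    if group:
        sql += f" GROUP BY {group}"
    if limit is not None:
        sql += f" LIMIT {limit}"
    return sql

def run(plan, conn):
    """Run the plan directly on the source when possible, else defer to Spark."""
    if all(op in SUPPORTED for op, _ in plan):
        # Whole plan pushable: no distributed job needs to be started.
        return "direct", conn.execute(to_sql(plan)).fetchall()
    # Otherwise the plan would be submitted to Spark (not modelled here).
    return "spark", None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("bj", 10), ("bj", 5), ("sh", 7), ("sz", 1)])

plan = [("Scan", "orders"), ("Filter", "amount > 2"),
        ("Aggregate", ("city", "city, SUM(amount)")), ("Limit", 10)]
mode, rows = run(plan, conn)
print(mode, sorted(rows))
```

Appending an operator the source cannot run, such as a hypothetical ("PythonUDF", "f") step, makes run return ("spark", None) instead.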
Column permission control
Moonbox defines DCL statements for column-level permission control. A Moonbox administrator uses DCL statements to grant tables or columns to users, and Moonbox stores these user-table-column permissions in its catalog. When a user submits a SQL query, Moonbox intercepts it and checks whether the parsed LogicalPlan references any table or column the user is not authorized to access; if so, it returns an error to the user.
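Once the referenced columns are known, the check itself reduces to a set lookup. The sketch below assumes the (table, column) pairs have already been extracted from the analyzed plan, which in Moonbox happens on Spark's LogicalPlan; the GRANTS dictionary is a hypothetical stand-in for the catalog.

```python
# Hypothetical stand-in for the catalog: user -> table -> granted columns.
GRANTS = {
    "alice": {"orders": {"id", "city", "amount"}, "users": {"id", "name"}},
}

def check_columns(user, referenced):
    """Raise PermissionError if any (table, column) pair is not granted to the user.

    `referenced` is assumed to come from walking the analyzed plan.
    """
    allowed = GRANTS.get(user, {})
    denied = sorted({f"{table}.{column}" for table, column in referenced
                     if column not in allowed.get(table, set())})
    if denied:
        raise PermissionError(f"{user} is not authorized to access: {', '.join(denied)}")

check_columns("alice", [("orders", "city"), ("users", "name")])  # passes silently
```

Reporting every denied column at once, rather than stopping at the first, lets the user fix a query in one round trip.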
Various forms of UDF/UDAF
Moonbox supports creating UDFs/UDAFs not only from jar packages but also directly from source code, in either Java or Scala, which makes developing and verifying UDFs more convenient.
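Moonbox compiles Java or Scala source. As a language-neutral illustration of the same idea, the Python sketch below registers one function compiled from source text at runtime and another supplied as a ready-made callable, the analogue of loading it from a jar. The registry and function names are hypothetical. Because this executes arbitrary code, a real system would restrict who may create functions.

```python
# Hypothetical function registry: name -> callable.
REGISTRY = {}

def create_function_from_source(name, source, entry_point):
    """Compile source text at runtime and register the named entry point."""
    namespace = {}
    exec(compile(source, f"<udf {name}>", "exec"), namespace)
    REGISTRY[name] = namespace[entry_point]

def create_function_from_callable(name, func):
    """Register an already-built function (the analogue of loading it from a jar)."""
    REGISTRY[name] = func

def call(name, *args):
    """Invoke a registered function by name."""
    return REGISTRY[name](*args)

create_function_from_source("double_it", "def double_it(x):\n    return x * 2\n", "double_it")
create_function_from_callable("to_upper", str.upper)
print(call("double_it", 21), call("to_upper", "moonbox"))
```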
Scheduled task
Moonbox provides scheduled tasks. Users define a scheduled task with a DDL statement and specify its scheduling policy as a crontab expression; in the background, Moonbox uses an embedded Quartz scheduler to run the tasks.
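To show what a crontab expression encodes, the Python sketch below matches a timestamp against the classic five-field format (minute, hour, day of month, month, day of week). It is an illustration only: Quartz, which Moonbox embeds, uses its own cron syntax with an extra seconds field and special characters such as "?" and "L", none of which are handled here, and the function names are hypothetical.

```python
from datetime import datetime

# Allowed values for minute, hour, day of month, month, day of week (0 = Sunday).
RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]

def parse_field(text, lo, hi):
    """Expand a field such as '*', '*/15', '1-5', '10/5' or '0,30' into a set."""
    values = set()
    for part in text.split(","):
        step = 1
        has_step = "/" in part
        if has_step:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            start, end = map(int, part.split("-"))
        else:
            start = int(part)
            end = hi if has_step else start   # '10/5' means every 5 starting at 10
        values.update(range(start, end + 1, step))
    return values

def matches(expr, when):
    """True if a five-field crontab expression fires at the given minute.

    Simplification: day of month and day of week must both match, whereas
    classic cron fires when either matches if both are restricted.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    actual = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(value in parse_field(f, lo, hi)
               for f, (lo, hi), value in zip(fields, RANGES, actual))

# Every 15 minutes from 09:00 to 17:45, Monday to Friday.
print(matches("*/15 9-17 * * 1-5", datetime(2024, 1, 3, 9, 30)))  # a Wednesday
```

A scheduler built on this idea would check each upcoming minute against the expression; Quartz instead computes the next fire time directly.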
Multiple clients
Moonbox can be accessed through command-line tools, JDBC, REST, ODBC, and more.
Multiple data source support
Moonbox supports many data sources, including MySQL, Oracle, SQL Server, ClickHouse, Elasticsearch, MongoDB, Cassandra, HDFS, Hive, and Kudu, and can be extended with custom data sources.
Two task modes
Moonbox supports two task modes, Batch and Interactive. Batch mode runs in Spark YARN cluster mode; Interactive mode runs in Spark local mode or Spark YARN client mode.
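In Spark terms, the modes differ in which master is used and where the driver runs. The Python sketch below expresses that mapping with the standard Spark properties spark.master and spark.submit.deployMode; the helper function is hypothetical and not part of Moonbox's configuration.

```python
def spark_conf(task_mode, on_yarn=True):
    """Map a task mode to standard Spark properties (hypothetical helper)."""
    if task_mode == "batch":
        # YARN cluster mode: the driver runs inside the YARN cluster.
        return {"spark.master": "yarn", "spark.submit.deployMode": "cluster"}
    if task_mode == "interactive":
        if on_yarn:
            # YARN client mode: the driver runs in the launching process,
            # executors run on YARN.
            return {"spark.master": "yarn", "spark.submit.deployMode": "client"}
        # Local mode: everything runs in one local JVM using all available cores.
        return {"spark.master": "local[*]"}
    raise ValueError(f"unknown task mode: {task_mode}")
```

Keeping the driver in the launching process is what makes client and local modes suitable for interactive work, since results come back without a round trip through the cluster.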
Cluster working mode
Moonbox runs as a master-slave cluster and supports failover between masters.
III. Moonbox_v0.3 VS v0.2
Moonbox_v0.3 has made several important changes based on v0.2, including:
Removal of the Redis dependency
In v0.2, query results were written to Redis and the client fetched them from Redis; v0.3 returns results directly to the client.
A new data transfer mechanism
In v0.2, the client fetched result data over REST; v0.3 transfers result data using Netty and Protobuf.
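Protobuf messages carry no built-in delimiter, so a Netty-based transport typically prefixes each message with its length. Netty ships ProtobufVarint32LengthFieldPrepender and ProtobufVarint32FrameDecoder for this purpose, using a base-128 varint length; whether Moonbox uses that exact framing is an assumption here. The Python sketch below reproduces the framing on raw bytes, with the payloads standing in for serialized Protobuf messages.

```python
def encode_varint(n):
    """Encode a non-negative int as a base-128 varint, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def frame(payload: bytes) -> bytes:
    """Prefix a serialized message with its varint-encoded length."""
    return encode_varint(len(payload)) + payload

def deframe(buffer: bytes):
    """Split a byte stream into complete messages; return (messages, leftover)."""
    messages, pos = [], 0
    while pos < len(buffer):
        length, shift, i = 0, 0, pos
        while True:
            if i >= len(buffer):
                return messages, buffer[pos:]   # length prefix incomplete
            b = buffer[i]
            length |= (b & 0x7F) << shift
            shift += 7
            i += 1
            if not b & 0x80:
                break
        if i + length > len(buffer):
            return messages, buffer[pos:]       # body incomplete; wait for more bytes
        messages.append(buffer[i:i + length])
        pos = i + length
    return messages, b""

stream = frame(b"row-1") + frame(b"x" * 300) + frame(b"partial")[:3]
messages, leftover = deframe(stream)
print(len(messages), leftover)
```

The leftover bytes are kept rather than discarded, because on a socket the rest of a message may arrive in a later read.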
Reworked Moonbox Master election
Moonbox Master election has moved from an Akka singleton to ZooKeeper, which now handles both election and the persistence of cluster information.
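The standard leader election recipe from the ZooKeeper documentation has each candidate create an ephemeral sequential znode under an election path. The candidate with the lowest sequence number leads, and when the leader's session ends its znode disappears and the next candidate takes over. The Python sketch below simulates that recipe in memory. It illustrates the general technique rather than Moonbox's code, and the class is hypothetical.

```python
import itertools

class ElectionPath:
    """In-memory stand-in for a ZooKeeper path holding ephemeral sequential znodes."""

    def __init__(self):
        self._counter = itertools.count()
        self.nodes = {}  # znode name -> owning candidate

    def create_ephemeral_sequential(self, owner):
        # ZooKeeper appends a zero-padded 10-digit sequence number to the name.
        name = f"candidate-{next(self._counter):010d}"
        self.nodes[name] = owner
        return name

    def session_expired(self, owner):
        # Ephemeral znodes are deleted when the owner's session ends.
        self.nodes = {n: o for n, o in self.nodes.items() if o != owner}

    def leader(self):
        # The candidate with the lowest sequence number leads.
        return self.nodes[min(self.nodes)] if self.nodes else None

path = ElectionPath()
for master in ("master-a", "master-b", "master-c"):
    path.create_ephemeral_sequential(master)
print(path.leader())              # master-a
path.session_expired("master-a")
print(path.leader())              # master-b
```

In the real recipe, each candidate watches only its immediate predecessor's znode, so a leader failure wakes up one candidate instead of all of them.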
Decoupling Moonbox Worker from Spark
In v0.2, the Spark application driver ran directly inside the Worker; in v0.3 it runs in a separate process. This decouples the Worker from Spark, and a single Worker node can now run multiple Spark application drivers as well as other applications.
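Running each driver in its own operating-system process is what lets one Worker host several drivers and keep running if a driver crashes. The Python sketch below shows the pattern with subprocess. The Worker class is hypothetical, and the child commands are trivial stand-ins for real driver launch commands such as spark-submit.

```python
import subprocess
import sys

class Worker:
    """Hypothetical worker that runs each application driver in its own process."""

    def __init__(self):
        self.apps = {}

    def launch(self, app_id, argv):
        # A separate process isolates the worker from driver crashes and lets
        # one worker host several drivers (or other applications) at once.
        self.apps[app_id] = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)

    def wait_all(self):
        """Wait for every launched process and collect its output."""
        return {app_id: proc.communicate()[0].strip() for app_id, proc in self.apps.items()}

worker = Worker()
for i in range(3):
    # Stand-in command; a real driver would be started with something like spark-submit.
    worker.launch(f"app-{i}", [sys.executable, "-c", f"print('driver {i} finished')"])
results = worker.wait_all()
print(results)
```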
IV. Typical scenarios of Moonbox
Finally, to give you a better picture of Moonbox, here are several typical application scenarios.
Building real-time ETL with DBus, Wormhole, Kudu and Moonbox
DBus writes database changes to Kafka in real time. Wormhole consumes Kafka as a stream, looks up other tables to join the stream into wide tables or applies part of the processing logic, and writes the output to Kudu. Moonbox then queries Kudu so the results can be saved or displayed.
Batch jobs
Batch jobs can be run through the batch job scripts that Moonbox provides, through the asynchronous REST interface, or as scheduled tasks.
Ad hoc query visualization with Davinci and Moonbox
Put the Moonbox JDBC driver into Davinci's lib directory, and Davinci can query Moonbox just like any other database and display the results as charts.
SAS query
SAS users can connect to Moonbox over ODBC to query data, pushing the computation to Moonbox for distributed execution.
Convenient data manipulation toolbox
Because Moonbox connects to a wide variety of data sources and can use Spark for mixed computation across them, it is handy for many everyday data operations. For example, a single SQL statement can import a table from one data source into another, or compare the differences between two tables.
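A table comparison can be expressed with a set operation such as EXCEPT, which Spark SQL supports. The Python sketch below runs that query shape on an in-memory SQLite database, used here only as a stand-in: in Moonbox the two tables could be mounted from different data sources, and the table names are hypothetical.

```python
import sqlite3

# SQLite stands in for two mounted data sources.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE dst (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO dst VALUES (1, 'a'), (2, 'B');
""")

# Rows in src that are missing from or different in dst, and vice versa.
only_in_src = conn.execute("SELECT * FROM src EXCEPT SELECT * FROM dst ORDER BY id").fetchall()
only_in_dst = conn.execute("SELECT * FROM dst EXCEPT SELECT * FROM src ORDER BY id").fetchall()
print(only_in_src, only_in_dst)
```

Running EXCEPT in both directions separates rows that are missing on one side from rows whose values differ, such as id 2 here.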
You are welcome to discover more use cases on your own!
As data virtualization attracts more and more attention, a reliable tool has become a common need for anyone exploring it. Moonbox is such a tool, so give it a try.
Open source address of the project:
DBus: https://github.com/BriData/DBus
Wormhole: https://github.com/edp963/wormhole
Moonbox: https://github.com/edp963/moonbox
Davinci: https://github.com/edp963/davinci
Author: Wang Hao
Source: agile big data
Yixin Institute of Technology