How to realize permission Control at compile time in MLSQL 07/13 Update SLTechnology News&Howtos



Shulou (Shulou.com) 05/31 Report

This article explains how MLSQL achieves compile-time access control.

Access control is a lifeline for MLSQL. MLSQL has to access a wide variety of resources, such as MySQL, Oracle, HDFS, Hive, Kafka, Solr, Elasticsearch, Redis, APIs, and web services. Different users have different permissions on these data sources, as well as on their tables and columns.

In the traditional model, each user is given a proxy user, and that proxy user must then be authorized separately on every data source. This sounds merely tedious, but in practice it is very hard to carry out: different data sources are owned by different teams, so the approval process can take days or even weeks.

If the problems above are discouraging, then for companies that use Hive as their data warehouse, Hive permissions can be even more frustrating. Hive's authorization model follows the Linux user: whichever user launches Spark is the user whose permissions apply. This is completely unworkable for a multi-tenant MLSQL service. For example, Spark may be started by sparkUser while the real users are Zhang San, Li Si, and others; Hive cannot tell who actually issued a query, it only sees sparkUser.

There is another pain point you may have experienced firsthand:

You finally finish a script and run it. An hour later it suddenly fails, and on inspection you find that line 350 accesses a data source you lack permission for. This is truly exasperating.

So here is the question:

Is there a way to know, before a script runs, whether all the resources it touches are authorized?

The answer is yes.

An aside: the title is not strictly accurate. MLSQL is essentially an interpreted language with no compilation step, so a more precise title would be "access control at parse time".

When permission verification is enabled, MLSQL scans the entire script and extracts the relevant information, including the details of every data source it references. This tells you, before anything runs, whether the script accesses any database table you are not authorized for. How does MLSQL do this? Consider the following script:

connect jdbc where
driver="com.mysql.jdbc.Driver"
and url="jdbc:mysql://${ip}:${host}/db1?${MYSQL_URL_PARAMS}"
and user="${user}"
and password="${password}"
as db1_ref;
load jdbc.`db1_ref.people` as people;
save append people as jdbc.`db1_ref.spam`;

Because MLSQL requires every data source to be loaded with a load statement, when it parses the load statement MLSQL knows that the user is accessing a data source over JDBC, and from the url (together with the table name in the load path) it extracts the following information:

Db: db1

Table: people

OperateType: load

SourceType: mysql

TableType: JDBC
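The url-based extraction for the load statement can be sketched in Python. This is a minimal illustration under stated assumptions, not MLSQL's actual parser: `parse_jdbc_url` and `load_info` are hypothetical names, and the sketch assumes the url has the form `jdbc:<source>://<host part>/<db>?<params>`.

```python
import re

# Hypothetical pattern; assumes urls shaped like jdbc:<source>://<host part>/<db>?<params>
JDBC_URL_RE = re.compile(r"^jdbc:(\w+)://[^/]*/([^?]+)")


def parse_jdbc_url(url: str) -> tuple[str, str]:
    """Return (db, source_type) taken from a JDBC url."""
    m = JDBC_URL_RE.match(url)
    if m is None:
        raise ValueError(f"unrecognized JDBC url: {url}")
    source_type, db = m.groups()
    return db, source_type


def load_info(url: str, table: str) -> dict[str, str]:
    """Build the record for loading `table` through a JDBC connection with this url."""
    db, source_type = parse_jdbc_url(url)
    return {
        "Db": db,
        "Table": table,
        "OperateType": "load",
        "SourceType": source_type,
        "TableType": "JDBC",
    }


# The ${...} placeholders are left unexpanded; only the source type and db name matter here
print(load_info("jdbc:mysql://${ip}:${host}/db1?${MYSQL_URL_PARAMS}", "people"))
```

Note that nothing connects to MySQL here: the record is derived purely from the script text.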

The script also writes to the spam table, and that information is extracted as well:

Db: db1

Table: spam

OperateType: save

SourceType: mysql

TableType: JDBC
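Locating these statements in the first place can be sketched as a single pass over the script that never executes anything. This is a hypothetical Python illustration, not MLSQL's actual parser: `scan_tables` and its regexes are names introduced here, the script is naively split on `;`, and only the `load` and `save <mode> <table> as ...` shapes from the example are recognized. Resolving `Ref` (for example `db1_ref`) to a database would use the url from the matching connect statement.

```python
import re

# Hypothetical patterns covering only the statement shapes in the example script
LOAD_RE = re.compile(r"^load\s+(\w+)\.`([^`]+)`\s+as\s+(\w+)$", re.IGNORECASE)
SAVE_RE = re.compile(r"^save\s+(\w+)\s+(\w+)\s+as\s+(\w+)\.`([^`]+)`$", re.IGNORECASE)


def scan_tables(script: str) -> list[dict[str, str]]:
    """Return one record per load/save statement, without executing anything."""
    records = []
    # Naive split on ';': assumes no semicolons inside string literals
    for stmt in (s.strip() for s in script.split(";")):
        if m := LOAD_RE.match(stmt):
            fmt, path, temp_table = m.groups()
            operate_type = "load"
        elif m := SAVE_RE.match(stmt):
            _mode, temp_table, fmt, path = m.groups()
            operate_type = "save"
        else:
            continue  # connect, select, etc. are ignored in this sketch
        ref, _, table = path.rpartition(".")  # "db1_ref.people" -> ("db1_ref", ".", "people")
        records.append({"OperateType": operate_type, "Format": fmt,
                        "Ref": ref, "Table": table, "TempTable": temp_table})
    return records


script = """
connect jdbc where driver="com.mysql.jdbc.Driver" and url="jdbc:mysql://${ip}:${host}/db1?${MYSQL_URL_PARAMS}"
and user="${user}" and password="${password}" as db1_ref;
load jdbc.`db1_ref.people` as people;
save append people as jdbc.`db1_ref.spam`;
"""
for record in scan_tables(script):
    print(record)
```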

Together with the temporary table people, the script yields three table records in total, which are then sent to AuthCenter for a decision. AuthCenter tells MLSQL which tables the current user is not authorized to access, and if any unauthorized table is found, MLSQL throws an exception immediately. Throughout the process no physical plan is executed at all; MLSQL only extracts information from the script.
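The check against AuthCenter can be sketched as follows. AuthCenter's real interface is not described in this article, so `FakeAuthCenter`, its `unauthorized` method, the grant format, and the user name are hypothetical stand-ins. The point of the sketch is that the decision is made on the extracted records alone, and an exception is raised before any statement executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    db: str
    table: str
    operate_type: str  # "load" or "save" in this example
    table_type: str    # "JDBC" for the MySQL tables, "temp" for the script's temporary table


class UnauthorizedAccessError(Exception):
    """Raised before execution when the script touches an unauthorized table."""


class FakeAuthCenter:
    """Hypothetical stand-in for AuthCenter, backed by an in-memory grant set."""

    def __init__(self, grants: set[tuple[str, str, str, str]]):
        self.grants = grants  # (user, db, table, operate_type)

    def unauthorized(self, user: str, tables: list[TableInfo]) -> list[TableInfo]:
        # Assumption for this sketch: temporary tables created by the script are always allowed
        return [t for t in tables
                if t.table_type != "temp"
                and (user, t.db, t.table, t.operate_type) not in self.grants]


def check_script(user: str, tables: list[TableInfo], auth: FakeAuthCenter) -> None:
    """Reject the whole script up front if any extracted table is unauthorized."""
    denied = auth.unauthorized(user, tables)
    if denied:
        names = ", ".join(f"{t.operate_type} {t.db}.{t.table}" for t in denied)
        raise UnauthorizedAccessError(f"user {user} is not authorized for: {names}")


# The three records extracted from the example script
tables = [
    TableInfo("db1", "people", "load", "JDBC"),
    TableInfo("", "people", "load", "temp"),  # temporary table registered by the load
    TableInfo("db1", "spam", "save", "JDBC"),
]
auth = FakeAuthCenter({("zhangsan", "db1", "people", "load")})
try:
    check_script("zhangsan", tables, auth)
except UnauthorizedAccessError as err:
    print(err)  # user zhangsan is not authorized for: save db1.spam
```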

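In MLSQL, a Hive table cannot be accessed directly in a select statement; it can only be loaded through a load statement. This restriction ensures that every data access passes through a load statement, where it can be extracted and checked. For example, the following statement reports an error: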

select * from public.abc as table1;

The select statement cannot access the table public.abc directly. To use it, load it first:

load hive.`public.abc` as abc;
select * from abc as table1;
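The load-only rule can be sketched as a parse-time check: every name in a select's from or join clause must be a temporary table registered by an earlier load or `select ... as` statement. `check_select_sources` and its regexes are hypothetical and much cruder than a real SQL parser; subqueries, quoted names, and comma joins are not handled.

```python
import re

# Hypothetical, regex-level checks; a real implementation would work on the parsed script
LOAD_RE = re.compile(r"^load\s+\w+\.`[^`]+`\s+as\s+(\w+)$", re.IGNORECASE)
SELECT_RE = re.compile(r"^select\b", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)
AS_NAME_RE = re.compile(r"\bas\s+(\w+)$", re.IGNORECASE)


def check_select_sources(script: str) -> None:
    """Raise ValueError if a select reads anything but a temp table registered earlier."""
    temp_tables: set[str] = set()
    # Naive split on ';': assumes no semicolons inside string literals
    for stmt in (s.strip() for s in script.split(";")):
        if m := LOAD_RE.match(stmt):
            temp_tables.add(m.group(1))
        elif SELECT_RE.match(stmt):
            for source in SOURCE_RE.findall(stmt):
                if source not in temp_tables:
                    raise ValueError(f"{source} is not a loaded table; use a load statement first")
            # "select ... as table1" registers table1 for later statements
            if m := AS_NAME_RE.search(stmt):
                temp_tables.add(m.group(1))


check_select_sources("load hive.`public.abc` as abc;\nselect * from abc as table1;")  # passes
try:
    check_select_sources("select * from public.abc as table1;")
except ValueError as err:
    print(err)  # public.abc is not a loaded table; use a load statement first
```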

How to implement column-level control

When parsing a load statement, MLSQL determines which table the current user is accessing and which of its columns the user is authorized to see, and then rewrites the load statement so that it produces a new view containing only the authorized columns.
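One way to picture the rewrite is to load the table under an intermediate name and then project only the authorized columns under the name the user asked for. This is an illustrative assumption, not MLSQL's confirmed mechanism; `rewrite_load`, the `__raw` suffix, and the column list passed in are hypothetical.

```python
import re

LOAD_RE = re.compile(r"^load\s+(\w+\.`[^`]+`)\s+as\s+(\w+)$", re.IGNORECASE)


def rewrite_load(stmt: str, authorized_columns: list[str]) -> str:
    """Rewrite `load ... as t` so that t only exposes the authorized columns."""
    m = LOAD_RE.match(stmt.strip().rstrip(";").strip())
    if m is None:
        raise ValueError(f"not a load statement: {stmt}")
    if not authorized_columns:
        raise PermissionError("no authorized columns for this table")
    source, name = m.groups()
    raw = f"{name}__raw"  # hypothetical name for the unrestricted intermediate view
    cols = ", ".join(authorized_columns)
    return (f"load {source} as {raw};\n"
            f"select {cols} from {raw} as {name};")


print(rewrite_load("load hive.`public.abc` as abc;", ["id", "name"]))
```

In this sketch the intermediate `abc__raw` view would remain visible to later statements, so it only illustrates the projection step, not a complete column-level control.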

