How to Build a Big Data Exploration Platform in Big Data Governance

2025-01-29 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how to build a big data exploration platform as part of big data governance, and why such a platform matters. It is intended as a reference for readers who work on data platforms.

In data governance, the value of a data exploration service is often overlooked in the early stages. As the business grows and the number of analysts increases, however, that value becomes greater and greater.

A successful data management platform should provide not only a variety of data analysis tools and data sources, but also the ability to explore data.

Why is a data discovery service important?

Imagine a data scientist who has just been given the task of building a machine learning model to analyze a business problem. The first instinct of anyone who works with data is to look for meaningful information that can help with the analysis. The following questions usually arise in the process:

What kind of data can / should I use?

Where can I find the data?

Who should I ask for data access?

Can I trust the data we have?

How fresh is the data we have, and what is its quality?

Who else is using the data?

A world without data exploration services

Data scientists spend up to 1/3 of their time on data exploration.

Without a data exploration service, data scientists have to ask colleagues and browse the objects they have access to in order to find data. They then make assumptions and test whether they chose correctly.

This process is very time-consuming because there are no suitable tools to help, and you have to keep looking for reliable data. Moreover, as the amount of data, the number of platform users, and the demand for analysis grow, the volume of metadata grows as well, which makes the search even harder.

The ad hoc methods data scientists use to find relevant data can quickly backfire and become unreliable, leading to frustration and uncertainty and stifling creativity.

The solution to these problems is a data exploration service.

Data exploration service

A data exploration service gives users a tool for understanding the data in the platform and its quality. Let's look at specific implementations.

Amundsen

Lyft is a US-based ride-hailing company that has open-sourced a large number of technical frameworks, including Amundsen. Amundsen is a data discovery service named after the Norwegian explorer Roald Amundsen. It aims to solve the problems described above by making the valuable information held in metadata searchable, and it provides a search interface through which users can explore data.

The Amundsen community is very active, and the project is continually updated and improved.

Apache Atlas

As a leading metadata management project, Atlas is undoubtedly one of the best choices.

Metadata is easy to define: it is information that describes data. The simplest example is a table that stores data; information about the table itself, such as the table name, is metadata. Without metadata, a data exploration service cannot exist.
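
To make the distinction concrete, the following minimal sketch models the metadata of a single table as plain Python dataclasses. The class names, the fields (owner, last_updated, frequent_users) and the sample table are hypothetical illustrations, not a schema taken from Amundsen or Atlas.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ColumnMetadata:
    """Describes one column of a table (hypothetical structure)."""
    name: str
    data_type: str
    description: str = ""


@dataclass
class TableMetadata:
    """Describes a table, not the rows stored in it (hypothetical structure)."""
    database: str
    name: str
    description: str
    owner: str
    last_updated: datetime
    columns: List[ColumnMetadata] = field(default_factory=list)
    frequent_users: List[str] = field(default_factory=list)

    def qualified_name(self) -> str:
        # Combine database and table name into one searchable identifier.
        return f"{self.database}.{self.name}"


# Sample record, invented for illustration only.
orders = TableMetadata(
    database="sales",
    name="orders",
    description="One row per customer order",
    owner="data-platform-team",
    last_updated=datetime(2025, 1, 28, 6, 0),
    columns=[
        ColumnMetadata("order_id", "bigint", "Primary key"),
        ColumnMetadata("amount", "decimal(10,2)", "Order total"),
    ],
    frequent_users=["alice", "bob"],
)

print(orders.qualified_name())
print([column.name for column in orders.columns])
```

A record like this answers several of the questions listed earlier (where the data is, who owns it, how fresh it is, who else uses it) without reading a single row of the table.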

As a big data metadata management platform, Atlas can capture metadata from the various components on the platform. It does so through mechanisms called hooks, which can collect metadata from Kafka, Hive, and HBase, for example. Atlas also provides security features and a rich REST API.

Atlas relies on HBase and Solr as distributed storage, which provide the storage and search functions for metadata. In this way, a comprehensive metadata catalog can be built.
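
As a rough illustration of the REST API, the sketch below builds (but does not send) a request to the Atlas v2 basic search endpoint using only the Python standard library. The host, port, credentials, and search term are assumptions for a local test installation, and the helper name build_basic_search_request is hypothetical; check the endpoint path and parameters against the documentation for your Atlas version before relying on them.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_basic_search_request(base_url, query, type_name, user, password, limit=10):
    """Build a GET request for the Atlas v2 basic search endpoint.

    Hypothetical helper: it only constructs the request object and
    does not contact any server.
    """
    params = urlencode({"query": query, "typeName": type_name, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{params}"
    # HTTP Basic authentication header; credentials here are assumed defaults.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return Request(url, headers=headers)


# Assumed local Atlas instance and credentials, for illustration only.
request = build_basic_search_request(
    "http://localhost:21000", "orders", "hive_table", "admin", "admin"
)
print(request.get_method(), request.full_url)
```

With a running Atlas instance, passing the request to urllib.request.urlopen and decoding the JSON response would return the entities that match the query, for example Hive tables whose metadata was collected by the Hive hook.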

Apache Atlas architecture

In practice, combining the two, Amundsen and Atlas, can fully meet these needs.

In this way, data scientists can find the target data in Amundsen.

Search is only the first step, though. From the search results, you can go to the table details page.

There you can view information such as the description, update time, and frequent users. The metadata is updated in real time.
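
The search-then-details flow described above can be sketched with a small in-memory catalog. This is not Amundsen's implementation: the catalog entries are invented, the function names are hypothetical, and ranking by the number of frequent users is an assumption chosen only to show how usage information can order results.

```python
# Invented catalog entries, for illustration only.
CATALOG = [
    {"name": "sales.orders", "description": "One row per customer order",
     "updated": "2025-01-28", "frequent_users": ["alice", "bob"]},
    {"name": "sales.order_items", "description": "Line items for each order",
     "updated": "2025-01-27", "frequent_users": ["alice"]},
    {"name": "hr.employees", "description": "Employee master data",
     "updated": "2025-01-20", "frequent_users": []},
]


def search(catalog, term):
    """Return tables whose name or description contains the term,
    most frequently used first (an assumed ranking rule)."""
    term = term.lower()
    hits = [
        table for table in catalog
        if term in table["name"].lower() or term in table["description"].lower()
    ]
    return sorted(hits, key=lambda table: len(table["frequent_users"]), reverse=True)


def table_details(catalog, name):
    """Return the full metadata record for one table, or None if unknown."""
    for table in catalog:
        if table["name"] == name:
            return table
    return None


results = search(CATALOG, "order")
print([table["name"] for table in results])

details = table_details(CATALOG, results[0]["name"])
print(details["description"], details["updated"], details["frequent_users"])
```

The first call corresponds to the search page and the second to the table details page, where description, update time, and frequent users are shown together.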

That concludes this look at how to build a big data exploration platform as part of big data governance. I hope the content above is helpful. If you found the article useful, feel free to share it so that more people can see it.
