Shulou (Shulou.com), SLTechnology News&Howtos, 06/01 report, updated 2025-04-06
This article looks at why traditional anonymization falls short in big data science and how synthetic data can solve the problem. The goal is to help readers facing this problem find a simpler, more practical approach.
With GDPR now in force, enterprises must be especially careful about how they protect data. Traditional anonymization often fails to achieve true anonymity: individuals can ultimately be re-identified. One way to add an extra layer of anonymity is to use synthetic data.
Since GDPR, the EU-wide data protection regulation, came into effect in May 2018, many companies operating in the EU have been worried about penalties for non-compliance, which can reach up to 4% of annual global turnover.
Last month, British Airways and Marriott International were hit with proposed fines of a staggering £183 million and £99 million respectively, so companies now have a clear idea of what a data breach can cost. This is particularly daunting for large organizations, such as banks and other financial institutions, that process large amounts of personal data.
We all know the saying that "data is the new oil". Modern enterprises need customer data to understand their customers better and to train artificial intelligence and machine learning algorithms. But to avoid data leaks, many companies now keep tight control over their data, with strict procedures governing who can access it and when. Although this is a positive trend for data privacy, it also limits how flexibly organizations can use their data and how quickly they can innovate.
The problem of traditional anonymization
Smarter companies are now turning to new privacy-enhancing technologies to strike a balance between data utility and security, and many now run data-intensive processes (such as testing and data analysis) on "anonymized" datasets.
There are various anonymization techniques, but one of the most common is generalization: replacing a specific data point (such as a customer's full home address) with a broader one (such as the customer's region or city). The idea is to sacrifice some of the dataset's utility so that the individuals in it remain anonymous and unidentifiable.
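To make generalization concrete, here is a minimal Python sketch. The `Customer` record, the postcode truncation rule and the 10-year age bands are hypothetical choices for illustration, not a standard scheme: the name (a direct identifier) is dropped, while the exact age and the full postcode are coarsened.

```python
"""Minimal sketch of generalization: replacing precise values with coarser ones.

The field names, the postcode truncation rule and the 10-year age bands are
hypothetical choices for illustration, not a standard scheme.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    age: int
    postcode: str  # hypothetical UK-style postcode, e.g. "SW1A 1AA"
    city: str


def age_band(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize(c: Customer) -> dict:
    """Drop the direct identifier and coarsen the remaining attributes."""
    return {
        # the name is a direct identifier, so it is removed entirely
        "age_band": age_band(c.age),
        # keep only the outward part of the postcode (the area/district)
        "postcode_area": c.postcode.split()[0],
        "city": c.city,
    }


if __name__ == "__main__":
    alice = Customer("Alice Smith", 34, "SW1A 1AA", "London")
    print(generalize(alice))
    # {'age_band': '30-39', 'postcode_area': 'SW1A', 'city': 'London'}
```

Wider bands and shorter postcode prefixes make each generalized record match more people, at the cost of less precise analysis.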
One reason anonymization has become so popular is that GDPR does not apply to anonymous data. More worryingly, however, recent research shows that much of the anonymization in use today is ineffective at concealing a person's identity: in the vast majority of cases, machine learning models can re-identify individuals.
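In other words, you don't need someone's direct personal details to identify them: a combination of seemingly harmless attributes, such as postcode, date of birth and gender, is often enough to single out one person. Traditional anonymization techniques therefore fall well short of what is required.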
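The following Python sketch, using invented records and hypothetical field names, illustrates the point. It counts how many records share each combination of quasi-identifiers (attributes that are not identifying on their own); a record whose combination is unique can be singled out by anyone who knows those attributes. The smallest group size is the dataset's k in the sense of k-anonymity, a standard measure that the article itself does not discuss.

```python
"""Sketch: why 'anonymous' records can still single people out.

The records and field names below are invented for illustration.
"""
from collections import Counter

records = [
    {"postcode_area": "SW1A", "birth_year": 1985, "gender": "F", "balance": 1200},
    {"postcode_area": "SW1A", "birth_year": 1985, "gender": "F", "balance": 560},
    {"postcode_area": "SW1A", "birth_year": 1990, "gender": "M", "balance": 9800},
    {"postcode_area": "E1", "birth_year": 1972, "gender": "F", "balance": 300},
]

# Attributes an attacker might already know about a target (hypothetical choice).
QUASI_IDENTIFIERS = ("postcode_area", "birth_year", "gender")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Map each quasi-identifier combination to the number of records sharing it."""
    return Counter(tuple(r[k] for k in keys) for r in rows)


def k_anonymity(rows, keys=QUASI_IDENTIFIERS) -> int:
    """The dataset is k-anonymous for k equal to its smallest group size."""
    return min(group_sizes(rows, keys).values())


def unique_records(rows, keys=QUASI_IDENTIFIERS):
    """Records that are alone in their group, i.e. directly re-identifiable."""
    sizes = group_sizes(rows, keys)
    return [r for r in rows if sizes[tuple(r[k] for k in keys)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(records))  # k = 1
    print(len(unique_records(records)), "of", len(records), "records are unique")
    # 2 of 4 records are unique
```

Here no name appears anywhere, yet half of the records are unique on just three attributes, so anyone who knows a target's postcode area, birth year and gender learns their balance.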
Sophisticated synthetic data
In a synthetic dataset, each data point is an entirely fictitious individual, with their own name, age, address, bank account number, tax records, medical records and any other details required for data analysis. Historically, the main problem with this kind of data has been that it is hard to generate synthetic data of high enough quality to meet the needs of advanced data science.
However, this is changing with advances in artificial intelligence and machine learning. By training algorithms on "real" data, we can now generate synthetic datasets that retain the underlying statistical properties of the original data while containing no personal or identifiable information.
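The approach described here trains a model such as a GAN, which is beyond a short example. As a much simpler stand-in, assuming purely numeric columns, the Python sketch below fits a multivariate normal distribution (mean vector and covariance matrix) to the real data and samples new rows from it. The synthetic rows keep the means and correlations of the original but are fresh random draws rather than copies of real records. The "real" data in the example is itself randomly generated.

```python
"""Minimal sketch: generate synthetic rows that preserve simple statistics.

This is NOT a GAN; it is a much simpler stand-in that fits a multivariate
normal distribution to numeric columns and samples new rows from it.
"""
import numpy as np


def fit_gaussian(real: np.ndarray):
    """Estimate the mean vector and covariance matrix of the real data (rows = people)."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean, cov, n_rows: int, seed: int = 0) -> np.ndarray:
    """Draw new rows with the fitted mean and covariance; none is a real record."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)


if __name__ == "__main__":
    gen = np.random.default_rng(42)
    # Stand-in for real customer data: age and annual income,
    # with income loosely correlated with age.
    age = gen.normal(45, 12, size=5_000)
    income = 20_000 + 800 * age + gen.normal(0, 8_000, size=5_000)
    real = np.column_stack([age, income])

    mean, cov = fit_gaussian(real)
    synthetic = sample_synthetic(mean, cov, n_rows=5_000)

    print("real means:     ", real.mean(axis=0).round(1))
    print("synthetic means:", synthetic.mean(axis=0).round(1))
    print("real corr:      ", np.corrcoef(real, rowvar=False)[0, 1].round(3))
    print("synthetic corr: ", np.corrcoef(synthetic, rowvar=False)[0, 1].round(3))
```

A single Gaussian only captures linear relationships and cannot represent categorical fields such as names or addresses, which is why richer generators such as GANs are used for realistic customer data.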
A simple example is a generative adversarial network (GAN) developed by Nvidia, the technology behind the This Person Does Not Exist website. The site uses a dataset of photos of real human faces to generate photorealistic images of people who don't exist. In essence, this is synthetic data: each person has many attributes that can be analyzed (such as eye color, hair color and skin color), but a breach of this data harms no one, because it does not belong to real people.
Apply this technology to customer data and you get data that can be shared across the data science team and used for all kinds of modeling, with much less governance overhead and privacy risk. Meanwhile, your "real" customer data can stay on a secure server that only a few people need to access.
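Before synthetic data leaves the secure environment, a team might want to confirm that the generator has not simply memorized real records. One common sanity check is the distance from each synthetic row to its closest real row. The Python sketch below assumes numeric data; the function names and the threshold are hypothetical.

```python
"""Sketch of a pre-release check: no synthetic row may (nearly) copy a real row.

Function names and the threshold are hypothetical choices for illustration.
"""
import numpy as np


def min_distance_to_real(synthetic: np.ndarray, real: np.ndarray) -> np.ndarray:
    """For each synthetic row, the Euclidean distance to its closest real row."""
    # Broadcasting builds an (n_synthetic, n_real, n_columns) array of differences.
    diffs = synthetic[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)


def safe_to_share(synthetic: np.ndarray, real: np.ndarray, threshold: float) -> bool:
    """Hypothetical release rule: every synthetic row is at least `threshold` away."""
    return bool((min_distance_to_real(synthetic, real) >= threshold).all())


if __name__ == "__main__":
    real = np.array([[34.0, 52_000.0], [51.0, 61_000.0]])          # age, income
    leaked = np.array([[34.0, 52_000.0], [40.0, 58_000.0]])        # row 0 copies a real record
    fresh = np.array([[38.0, 49_500.0], [47.0, 64_200.0]])
    print(safe_to_share(leaked, real, threshold=100.0))  # False
    print(safe_to_share(fresh, real, threshold=100.0))   # True
```

The brute-force pairwise computation needs memory proportional to the product of the two row counts, so it suits samples rather than full tables. In practice the columns would also usually be standardized first; in this toy example the income column dominates the distance.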
Final thoughts
As more companies look to adopt synthetic data strategies, there will undoubtedly be a knock-on effect across every industry. Equipped with the tools they need to unlock the potential of their data, organizations will be able to make full use of their customer data while avoiding risk and liability.
With data science, advanced machine learning and a range of new technologies, the data economy is about to be reshaped, and a new era of data innovation is coming.
The rise of social media brought great leaps in artificial intelligence, but little attention was paid to data security. Now, with synthetic data, we can continue along the path of data science. This time, however, we must play by the rules and treat data with more care.
© 2024 shulou.com SLNews company. All rights reserved.