I. The five basic aspects of big data analysis
1. Visual analysis
The users of big data analysis include both professional analysts and ordinary users, and the most basic requirement both groups place on it is visual analysis, because visualization presents the characteristics of the data intuitively and is as easy for readers to absorb as looking at a picture.
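As a toy illustration of this point, here is a minimal visualization sketch using matplotlib; the monthly order figures are invented purely for illustration.

```python
# A minimal sketch of visual analysis: a bar chart that makes a trend
# obvious at a glance. The data below is hypothetical.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
orders = [1200, 1350, 1280, 1600, 1750, 2100]  # hypothetical order counts

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(months, orders, color="steelblue")
ax.set_title("Monthly orders (hypothetical)")
ax.set_ylabel("Orders")
plt.tight_layout()
plt.show()
```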
2. Data mining algorithm
The theoretical core of big data analysis is the data mining algorithm. Different algorithms, suited to different data types and formats, reveal the characteristics of the data more scientifically. It is precisely because these statistical methods are recognized by statisticians worldwide that they can go deep into the data and extract value that others will accept. Efficiency matters just as much: data mining algorithms let us process big data quickly, for if an algorithm took several years to reach a conclusion, the value of big data would be out of the question.
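As one concrete example of such an algorithm, here is a minimal k-means clustering sketch using scikit-learn; the 2-D points are synthetic and stand in for any feature data.

```python
# A minimal sketch of a common data mining algorithm (k-means clustering).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic "customer" groups in a 2-D feature space.
points = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("cluster centers:\n", model.cluster_centers_)
print("first 10 labels:", model.labels_[:10])
```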
3. Predictive analysis ability
One of the ultimate applications of big data analysis is predictive analysis: mine characteristics from the data, build a model scientifically, and then feed new data into the model to predict future outcomes.
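The fit-then-predict cycle just described can be sketched in a few lines; the advertising-spend and revenue numbers below are invented.

```python
# A minimal sketch of predictive analysis: fit a model on historical data,
# then bring in new data to predict future values.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical history: advertising spend (x) vs. revenue (y).
x_hist = np.array([[10], [20], [30], [40], [50]], dtype=float)
y_hist = np.array([120, 190, 310, 390, 480], dtype=float)

model = LinearRegression().fit(x_hist, y_hist)

# Feed new data into the fitted model to predict the future.
x_new = np.array([[60], [70]], dtype=float)
print("predicted revenue:", model.predict(x_new))
```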
4. Semantic engine
Big data analysis is widely used in Web data mining. A semantic engine can infer users' needs from their search keywords, tag keywords, or other input, enabling a better user experience and better-matched advertising.
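A real semantic engine is far more sophisticated, but the simplest form of the idea, mapping query keywords to an inferred need, can be sketched as follows; the intent table is invented.

```python
# A toy sketch of keyword-based intent matching, the simplest form of
# inferring a user's need from search keywords.
INTENTS = {
    "buy": ["price", "discount", "cheap", "buy"],
    "support": ["error", "broken", "refund", "help"],
}

def guess_intent(query: str) -> str:
    words = set(query.lower().split())
    # Score each intent by how many of its keywords appear in the query.
    scores = {intent: len(words & set(kw)) for intent, kw in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(guess_intent("cheap laptop price"))  # -> buy
print(guess_intent("screen broken help"))  # -> support
```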
5. Data quality and data management
Big data analysis is inseparable from data quality and data management. High-quality data and effective data management, whether in academic research or in business applications, guarantee that analysis results are genuine and valuable. Big data analysis rests on the five aspects above; of course, deeper analysis of big data calls for many more distinctive, deeper, and more specialized methods.
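What routine data quality management looks like in code can be sketched minimally with pandas; the records and the checking rules below are invented for illustration.

```python
# A minimal sketch of automated data quality checks with pandas.
import pandas as pd

df = pd.DataFrame({
    "user_id": [1, 2, 2, 4],
    "age": [34, -5, 28, None],  # -5 and None are quality problems
})

report = {
    "duplicate_user_ids": int(df["user_id"].duplicated().sum()),
    "missing_ages": int(df["age"].isna().sum()),
    "out_of_range_ages": int((df["age"] < 0).sum()),
}
print(report)
```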
II. How to choose the appropriate data analysis tool
First, understand what data is to be analyzed. Big data analysis mainly targets four types of data:
1. Transaction data (TRANSACTION DATA)
A big data platform can access structured transaction data over a longer time span and in larger volumes, and can therefore analyze a wider range of transaction data types: not only POS or e-commerce shopping data, but also behavioral transaction data, such as the Internet clickstream logs recorded by Web servers.
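Turning raw clickstream lines into structured records is the first step in analyzing them; here is a minimal sketch using a common-log-style line, which is an assumption rather than any specific product's format.

```python
# A minimal sketch of parsing a Web-server clickstream log line into a
# structured record for analysis.
import re

LOG_LINE = '203.0.113.7 - - [02/Feb/2025:10:15:32 +0000] "GET /product/42 HTTP/1.1" 200 512'
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d+)'
)

match = PATTERN.match(LOG_LINE)
if match:
    print(match.groupdict())
    # {'ip': '203.0.113.7', 'ts': '02/Feb/2025:10:15:32 +0000', ...}
```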
2. Human-generated data (HUMAN-GENERATED DATA)
Unstructured data is widespread in email, documents, pictures, audio, video, and the data streams generated by blogs, wikis, and especially social media. This data provides a rich source for analysis with text analytics.
3. Mobile data (MOBILE DATA)
Smartphones and tablets with Internet access are increasingly common. The apps on these mobile devices can track and report numerous events, from transaction events within the app (such as logging a product search) to personal-information or status-report events (such as a location change, reported as a new geocode).
4. Machine and sensor data (MACHINE AND SENSOR DATA)
This includes data created or generated by functional devices such as smart meters, smart thermostats, factory machines, and Internet-connected household appliances. These devices can be configured to communicate with other nodes on the Internet and to transmit data automatically to a central server where it can be analyzed. Machine and sensor data are prime examples of data from the emerging Internet of Things (IoT). IoT data can be used to build analytical models, continuously monitor for predictive behavior (such as identifying when sensor values indicate a problem), and provide prescriptive instructions (such as alerting technicians to inspect equipment before something actually goes wrong).
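The monitor-and-alert pattern just described can be sketched very simply; the temperature threshold and readings below are invented.

```python
# A minimal sketch of the IoT monitoring pattern: watch a stream of
# sensor readings and alert when values indicate a problem.
TEMP_LIMIT_C = 85.0

readings = [72.1, 74.8, 79.3, 86.2, 90.5]  # hypothetical machine temperatures

for minute, temp in enumerate(readings):
    if temp > TEMP_LIMIT_C:
        # In a real system this would notify a technician before failure.
        print(f"ALERT at minute {minute}: {temp:.1f} C exceeds {TEMP_LIMIT_C} C")
```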
What are the requirements and goals of data analysis tools?
The tool should be able to apply advanced analytical algorithms and models, run with a big data platform as its engine (such as Hadoop or another high-performance analytics system), and handle structured and unstructured data from a variety of sources. Its analytical models should scale as the volume of data grows, and it should integrate with data visualization tools and other technologies.
In addition, the tool must include certain necessary capabilities, with integrated algorithms and support for data mining techniques, including but not limited to:
Clustering and segmentation: divide a large entity into small groups that share common characteristics; for example, analyze the collected customers to identify more finely segmented target markets.
Classification: organize data into predetermined categories; for example, decide how to classify customers according to the segmentation model (see the sketch after this list).
Regression: discover the relationship between a dependent variable and one or more independent variables, to help determine how the dependent variable changes as the independent variables change; for example, use geographic data, net income, average summer temperature, and floor space to predict the future value of a property.
Association and item-set mining: look for correlations between variables in a large data set; for example, help call center representatives provide more accurate information based on the caller's customer segment, relationships, and complaint type.
Similarity and linkage: used by indirect clustering algorithms; a similarity scoring algorithm can determine how similar the entities in candidate clusters are.
Neural networks: used for indirect analysis in machine learning.
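As an illustration of the classification technique from the list above, here is a minimal scikit-learn decision tree sketch; the customer features and labels are synthetic.

```python
# A minimal sketch of classification: assign records to predetermined
# categories learned from labeled examples.
from sklearn.tree import DecisionTreeClassifier

# Hypothetical customers: [monthly_spend, visits_per_month]
X = [[500, 12], [20, 1], [450, 9], [35, 2], [600, 15], [10, 1]]
y = ["vip", "casual", "vip", "casual", "vip", "casual"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict([[480, 10], [25, 1]]))  # -> ['vip' 'casual']
```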
What kinds of people use data analysis tools?
Data scientists: they want to use more complex data types to perform more complex analyses, and they are familiar with how to design and apply basic models to assess inherent tendencies or biases.
Business analysts: more like casual users, they want to use data for active data discovery, for visualizing existing information, and for some predictive analysis.
Business managers: they want to understand the models and conclusions.
IT developers: they support all of the above categories of users.
How to choose the most suitable big data analysis software?
The professional knowledge and skills of the analysts. Some tools target novice users, some target professional data analysts, and some are designed for both audiences.
Analytical diversity. Depending on use cases and applications, enterprise users may need support for different types of analysis using specific kinds of modeling (such as regression, clustering, segmentation, behavioral modeling, and decision trees). Some products already support a wide range of high-level, varied forms of analytical modeling, while some vendors have invested decades tuning different versions of their algorithms and adding more advanced functions. It is important to understand which models are most relevant to the problems the enterprise faces, and to evaluate each product by how well it meets the user's business needs.
Data scope. The data to be analyzed can span structured and unstructured information, traditional local databases and data warehouses, cloud-based sources, and data managed on big data platforms such as Hadoop. Different products, however, offer very different levels of support for non-traditional data lakes (in Hadoop, or in other NoSQL data management systems used for scale-out). In choosing a product, the enterprise must consider its specific requirements for acquiring and processing the volume and variety of its data.
Collaboration. The larger the enterprise, the more likely analyses, models, and applications will be shared across departments and among many analysts. An enterprise with many analysts spread across departments may need additional model-sharing and collaboration capabilities for interpreting and distributing results.
License and maintenance budget. Almost all vendors offer several versions of their products, with different purchase costs and total operating costs. License fees scale with the features, the functions, the volume of data analyzed, or the number of nodes the product can use.
Ease of use. Can business analysts without a statistics background easily develop analyses and applications? Determine whether the product provides a visual method that facilitates development and analysis.
Unstructured data handling. Confirm that the product can consume different types of unstructured data (documents, email, images, video, presentations, social media feeds, and so on) and can parse and exploit the information it receives.
Scalability and extensibility. As data volumes keep growing and the data management platform keeps expanding, evaluate how well each analysis product grows with the required processing and storage capacity.
III. How to distinguish between big data's three hot occupations: data scientist, data engineer, and data analyst
As big data grows more popular, careers in big data have become popular too, bringing many opportunities for talent. Data scientist, data engineer, and data analyst have become the hottest positions in the industry. How are they defined? What exactly do they do? What skills do they require? Let's take a look.
How are these three occupations positioned?
How should a data scientist be positioned? A data scientist is an engineer or expert who uses scientific methods and data mining tools to digitally reproduce and interpret complex, voluminous information such as numbers, symbols, text, web sites, audio, or video, and who can find new data insights (as distinct from a statistician or analyst).
How is a data engineer defined? Commonly as "a star software engineer with a deep understanding of statistics." If you are wrestling with a business problem, you need a data engineer: their core value lies in their ability to build data pipelines that deliver clean data. A thorough understanding of file systems, distributed computing, and databases is a necessary skill for a good data engineer. Data engineers also have a good grasp of algorithms, so they should be able to run basic data models. High-end business demands, however, give rise to highly complex computation; in many cases those demands exceed a data engineer's knowledge, and you need to call in a data scientist.
How should a data analyst be understood? Data analysts are professionals in various industries who specialize in collecting, organizing, and analyzing industry data, and who produce industry research, evaluations, and forecasts based on that data. They know how to ask the right questions and excel at data analysis, data visualization, and data presentation.
What are the specific responsibilities of these three occupations?
The job responsibilities of data scientists. Data scientists tend to look at the world around them by exploring data. They turn large amounts of scattered data into structured data that can be analyzed, find rich data sources, integrate other, possibly incomplete, data sources, and clean the resulting datasets. In a new competitive environment where challenges keep changing and new data keeps flowing in, data scientists help decision makers move between kinds of analysis, from ad hoc data analysis to continuous interactive analysis. When they find something, they communicate their findings and suggest new business directions. They present visual information creatively, making the patterns they find clear and persuasive, and they surface the rules contained in the data to management, thereby influencing products, processes, and decisions.

The job responsibilities of data engineers. Analyzing history, predicting the future, and optimizing choices are the three most important tasks for big data engineers when "playing with data". Through these three directions of work, they help enterprises make better business decisions. One very important job of a big data engineer is to identify the characteristics of past events by analyzing data. For example, Tencent's data team is building a data warehouse to organize the large amounts of irregular data from all of the company's network platforms into queryable characteristics that support the data needs of the company's businesses, including advertising, game development, and social networking. The greatest value of identifying the characteristics of past events is that it helps enterprises understand consumers better: by analyzing the trajectory of a user's past behavior, we can understand that person and predict their behavior. By introducing key factors, big data engineers can also predict future consumption trends. On Alimama's marketing platform, engineers are trying to help Taobao sellers do business by bringing in weather data: if this summer is not hot, some of last year's strong sellers, such as air conditioners, electric fans, vests, and swimsuits, are likely to sell poorly, so the engineers establish the relationship between meteorological data and sales data, find the relevant categories, and warn sellers in advance to turn over their inventory. Depending on the nature of the business, big data engineers achieve different goals through data analysis. At Tencent, the simplest and most direct example of a big data engineer's work is A/B testing (AB Test), which helps product managers choose between options A and B. In the past, decision makers could only judge from experience; now big data engineers can run large-scale real-time tests. For a social networking product, for example, half the users see interface A and the other half see interface B; observing and counting click-through and conversion rates over a period of time helps the marketing department make the final choice (a minimal sketch of such a test follows below).
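Here is a minimal sketch of the A/B test just described, comparing click-through rates with a standard two-proportion z-test; the user and click counts are invented.

```python
# A minimal A/B test sketch: half the users see interface A, half see B;
# compare click-through rates with a two-proportion z-test.
from math import sqrt

def ab_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-statistic for click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

z = ab_ztest(clicks_a=120, n_a=5000, clicks_b=165, n_b=5000)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a significant difference at the 5% level
```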
The job responsibilities of data analysts. The Internet itself is digital and interactive by nature, which has brought revolutionary breakthroughs to data collection, organization, and research. In the past, data analysts in the "atomic world" paid a high cost (capital, resources, and time) to obtain the data needed to support research and analysis, and the richness, comprehensiveness, continuity, and timeliness of that data were far worse than in the Internet era. Compared with traditional data analysts, data analysts in the Internet era face not a scarcity of data but a glut of it. They must therefore learn to use technical means to process data efficiently and, more importantly, keep innovating and breaking through in the methodology of data research. Across industries, the analyst's value is similar. In press and publishing, for example, whether media operators can understand their audience and its trends accurately, in detail, and in a timely way has in every era been key to the success or failure of the medium.
Moreover, for content industries such as press and publishing, what matters even more is that data analysts can perform content-consumer data analysis, a key function in helping news and publishing organizations improve customer service.
What skills do you need to master to pursue these three professions?
a. Skills that data scientists need to master
1. Computer science
Generally speaking, most data scientist positions require a background in programming and computer science. Put simply, that means the skills needed to handle big data: Hadoop, Mahout, and other massively parallel processing technologies, along with related machine learning skills.
2. Mathematics, statistics, data mining, etc.
Beyond mathematical and statistical literacy, a data scientist also needs skill with mainstream statistical analysis software such as SPSS and SAS. Among these, R, the open source programming language and environment for statistical analysis, has attracted much attention recently. R's strength is that it not only contains a rich library of statistical analyses but also offers high-quality chart generation for visualizing results, and it can be driven with simple commands. It also has a package repository called CRAN (The Comprehensive R Archive Network), which lets you import extension packages to use functions and datasets not available by default.
3. Data visualization (Visualization)
The quality of information depends to a large extent on how it is expressed. Analyzing the meaning of data that arrives as lists of numbers, developing Web prototypes, and using external APIs to unify charts, maps, dashboards, and other services into visualized analysis results is one of the most important skills for a data scientist.
b. Skills that data engineers need to master
1. Background related to mathematics and statistics
Big data engineer postings typically ask for a master's degree or doctorate with a background in statistics and mathematics. Data workers without this theoretical grounding easily enter a skills "danger zone" (Danger Zone): with different data models and algorithms you can always coax some result out of a pile of numbers, but if you do not know what it means, it is not truly meaningful, and such a result can easily mislead you. Only with solid theoretical knowledge can you understand a model, reuse it, or even invent new models to solve practical problems.
2. Computer coding ability
Practical development ability and large-scale data processing ability are essential for a big data engineer, because much of the data's value emerges only through mining: you have to dig for the gold yourself. For example, many of the records people generate on social networks are unstructured data. Extracting meaningful information from this shapeless text, voice, imagery, and even video is something big data engineers must do themselves. Even on teams where big data engineers are mainly responsible for business analysis, they should still be familiar with how computers process big data.
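A tiny sketch of that kind of mining follows: pulling structured signals out of unstructured social-media text. The posts and patterns are invented for illustration.

```python
# A minimal sketch of extracting structured signals from unstructured text.
import re

posts = [
    "Loving the new phone! battery lasts 2 days #happy",
    "Delivery took 9 days, totally unacceptable",
]

for post in posts:
    hashtags = re.findall(r"#(\w+)", post)          # social tags
    durations = re.findall(r"(\d+)\s*days?", post)  # mentions of durations
    print({"hashtags": hashtags, "durations_days": durations})
```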
3. Knowledge of specific application areas or industries
A big data engineer's role, important as it is, cannot be divorced from the market, because big data generates value only when combined with applications in a specific domain. Experience in one or more vertical industries therefore gives a candidate accumulated industry knowledge that helps greatly in becoming a big data engineer, and it is a persuasive advantage when applying for the position.
c. Skills that data analysts need to master
1. Understand the business. The premise of data analysis work is understanding the business: being familiar with industry knowledge and with the company's business and processes, ideally with one's own distinctive views. Divorced from industry context and the company's business background, the results of analysis are like kites with broken strings, of little practical value.
2. Understand management. On one hand, management knowledge is required to build a data analysis framework: settling on an analysis approach needs guidance from marketing, management, and other theory, and without familiarity with management theory it is hard to build the framework or carry out the subsequent analysis. On the other hand, it is what enables the analyst to turn analytical conclusions into actionable recommendations.
3. Understand analysis. This means mastering the basic principles of data analysis and a set of effective methods, and being able to apply them flexibly in practical work. Basic methods include comparative analysis, group analysis, cross analysis, structure analysis, funnel analysis, comprehensive evaluation, factor analysis, and matrix correlation analysis. Advanced methods include correlation analysis, regression analysis, cluster analysis, discriminant analysis, principal component analysis, factor analysis, correspondence analysis, and time series analysis. (A funnel analysis sketch follows below.)
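To make one of the basic methods above concrete, here is a minimal funnel analysis sketch computing conversion rates between consecutive stages; the stage counts are invented.

```python
# A minimal funnel analysis sketch: conversion rate between each pair of
# consecutive stages.
stages = [("visited", 10000), ("added_to_cart", 2400), ("paid", 600)]

for (prev_name, prev_n), (name, n) in zip(stages, stages[1:]):
    print(f"{prev_name} -> {name}: {n / prev_n:.1%} conversion")
```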
4. Know the tools. This means mastering the common tools of data analysis. Methods are the theory; tools are how the theory is put into practice. Facing ever larger volumes of data, we cannot rely on a calculator; we must rely on powerful data analysis tools to complete the work.
5. Understand design. This means using charts to express the analyst's views effectively, so that the results are clear at a glance. Chart design is a deep subject: choosing the type of graphic, laying out the page, matching colors, and so on all require mastering certain design principles.
IV. A 9-step development plan from rookie to data scientist
First of all, every company defines "data scientist" differently, and there is no unified definition yet. In general, though, a data scientist combines the skills of a software engineer and a statistician, plus a substantial amount of domain knowledge in the industry he or she intends to work in.
About 90% of data scientists have at least a college education, up to and including a doctorate, though their fields of study vary widely. Some recruiters even find that humanities majors bring the needed creativity and can be taught the key technical skills.
So, setting aside a degree program in data science (famous universities worldwide are launching them like bamboo shoots after spring rain), what steps do you need to take to become a data scientist?
1. Review your math and statistical skills.
A good data scientist must be able to understand what the data is telling you, and to do that you need a solid foundation in linear algebra, algorithms, and statistics. Advanced mathematics may be needed on certain occasions, but this is a good place to start.
2. Understand the concept of machine learning.
Machine learning is the next buzzword, and it is inextricably linked with big data. Machine learning uses artificial intelligence algorithms to turn data into value without explicit programming.
3. Learn to code.
Data scientists must know how to tweak code to tell the computer how to analyze data. Start with an open source language such as Python.
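As a sense of what a first exercise might look like, here is a minimal Python snippet computing descriptive statistics with the standard library; the visit counts are arbitrary.

```python
# A tiny first coding exercise: descriptive statistics in plain Python.
from statistics import mean, median, stdev

daily_visits = [132, 151, 98, 176, 149, 163, 140]

print("mean  :", round(mean(daily_visits), 1))
print("median:", median(daily_visits))
print("stdev :", round(stdev(daily_visits), 1))
```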
4. Understand databases, data lakes, and distributed storage.
Data is stored in databases, data lakes, or across a distributed network, and how you build that data repository determines how you can access, use, and analyze it. If you build a data store without an overall architecture or advance planning, the downstream effects on you will be profound.
5. Learn data munging and data cleaning techniques.
Data munging is the conversion of raw data into another format that is easier to access and analyze; data cleaning helps eliminate duplicates and "bad" data. Both are essential tools in the data scientist's toolbox.
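A minimal sketch of both steps with pandas follows: reshape raw records into an analyzable form, drop duplicates, and discard bad rows. The records are invented.

```python
# A minimal munging-and-cleaning sketch with pandas.
import pandas as pd

raw = pd.DataFrame({
    "user": ["a", "a", "b", "c"],
    "amount": ["12.5", "12.5", "oops", "40"],  # strings, one unparseable
})

clean = raw.drop_duplicates().copy()                         # remove duplicates
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean = clean.dropna(subset=["amount"])                      # drop "bad" data
print(clean)
```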
6. Build a good grounding in data visualization and reporting.
You don't have to be a graphic designer, but you do need to know how to create data reports that a layman, such as your manager or CEO, can understand.
7. Add more tools to your toolbox.
Once you've mastered the skills above, it's time to expand your data science toolbox to include Hadoop, R, and Spark. Experience and knowledge with these tools will put you ahead of the mass of data science candidates.
8. Practice.
How do you practice data science before you have a job in this new field? Use open source code to develop a project you like, enter competitions, work as a data scientist for hire on the web, attend a bootcamp, volunteer, or intern. The best data scientists have experience and intuition with data, and have work they can showcase as candidates.
9. Become a member of the community.
Follow thought leaders in the industry, read industry blogs and websites, participate, ask questions, and stay abreast of current events and theory.