2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article explains how to get started with data analysis in Python. It is shared here as a reference, and we hope it leaves you with a working understanding of the topic.
As Python itself has improved and its ecosystem has expanded, it has gained ground in web development, web crawling, data analysis and data mining, artificial intelligence, and other applications. Looking back, Python's main application areas have been:
1. Web development, led by Django and Flask
2. Web crawler
3. Automated operation and maintenance
4. Data analysis and scientific calculation
My background is in databases, and data analysis and mining is my main learning direction, so the rest of this article briefly summarizes that area. NumPy (1.0 released in 2006), SciPy, and pandas (open-sourced in 2009) are the three musketeers of data analysis and scientific computing. In my own study I therefore mostly use the NumPy, pandas, and scikit-learn (sklearn) toolkits for learning tests around data analysis.
I. Installation environment
For self-study, start with Python 3. Anaconda is the recommended way to set up the environment; for background, see the article "Anaconda Complete Getting Started Guide" at https://www.jianshu.com/p/eaee1fadc1e9
Although PyCharm is a powerful development tool, I recommend starting with the Anaconda environment, and in particular its Spyder graphical IDE, which makes it easy for beginners to load packages and inspect variable values.
II. Python syntax basics
Python is dynamically typed: a variable's type does not need to be declared explicitly. The core built-in container types are few, chiefly tuples, lists, and dictionaries. A data type defines both a kind of value and the set of operations allowed on it. A tuple, for example, is written with parentheses and can be created and queried but not modified. A list is written with square brackets and can be modified, much like a one-dimensional array, and a list can itself contain tuples of two or more elements. Beyond data types, the basic syntax includes for/while loops, lambda functions, and similar constructs for branching and flow control.
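The types and constructs described above can be sketched in a few lines. This is only an illustration; the variable names and sample values are made up for the example.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 42
value = "forty-two"

# Tuple: written with parentheses, immutable.
point = (3, 4)
try:
    point[0] = 10          # tuples do not support item assignment
except TypeError:
    tuple_is_immutable = True

# List: written with square brackets, mutable; it may hold tuples.
pairs = [(1, "a"), (2, "b")]
pairs.append((3, "c"))

# Dictionary: a key-value mapping, built here from the list of tuples.
lookup = {number: letter for number, letter in pairs}

# Flow control with for/while, plus a lambda (anonymous function).
square = lambda x: x * x
squares = []
for number, _ in pairs:
    squares.append(square(number))

countdown = []
n = 3
while n > 0:
    countdown.append(n)
    n -= 1
```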
For data analysis and mining, basic Python syntax alone is not enough, which is where the NumPy and SciPy packages come in. They are mainly used to work with arrays (ndarray) and to perform many kinds of numerical computation. Why arrays? In machine learning, matrices are the basis of pattern recognition and similarity analysis, and a matrix is in effect a multidimensional array, so array computation is unavoidable.
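As a minimal sketch of why arrays matter, the following treats a small NumPy ndarray as a matrix and computes a similarity between its rows. The sample numbers are arbitrary, and cosine similarity is just one common choice of similarity measure, not one named in the text.

```python
import numpy as np

# A 2-D ndarray is a matrix: 2 rows, 3 columns.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

shape = a.shape                  # (rows, columns)
doubled = a * 2                  # element-wise arithmetic, no explicit loop
column_means = a.mean(axis=0)    # mean of each column

# Matrix product of a (2x3) with its transpose (3x2) gives a 2x2 matrix.
gram = a @ a.T

# Cosine similarity between the two rows.
row0, row1 = a[0], a[1]
cosine = row0 @ row1 / (np.linalg.norm(row0) * np.linalg.norm(row1))
```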
III. Python data analysis
The real world is rarely that simple, which is where pandas comes in. It is mainly used to process tabular data (in effect, matrices) through its DataFrame and Series structures, and its operations cover most of the initial data cleaning and preprocessing in an analysis. If you are familiar with databases, you can do early-stage data processing (as in a data warehouse) in SQL and skip pandas, but once you learn pandas you will find that some data-processing operations are more efficient and easier to use than their SQL equivalents, so it is still worth learning. A pandas DataFrame corresponds to a table in a database or a sheet in Excel. Pandas lets us merge and sort data sets as needed, and it also supports basic statistics, grouping, distributions, cross-tabulation, correlation analysis, and more.
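For readers coming from SQL, a rough sketch of how a few of these operations look in pandas may help. The two tables and their column names are hypothetical sample data, and the SQL in the comments is only an approximate equivalent.

```python
import pandas as pd

# Hypothetical sample data: two "tables" sharing a customer_id key.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [100.0, 250.0, 50.0, 75.0],
    "channel": ["web", "store", "web", "store"],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["north", "south", "north"],
})

# Merge: roughly  orders LEFT JOIN customers USING (customer_id).
merged = orders.merge(customers, on="customer_id", how="left")

# Sort: roughly  ORDER BY amount DESC.
ranked = merged.sort_values("amount", ascending=False)

# Group and aggregate: roughly  SELECT region, SUM(amount) ... GROUP BY region.
totals = merged.groupby("region")["amount"].sum()

# Basic statistics and a cross-tabulation of region by channel.
summary = merged["amount"].describe()
cross = pd.crosstab(merged["region"], merged["channel"])
```

Each step returns a new DataFrame or Series, so intermediate results can be inspected one at a time, for example in Spyder's variable explorer.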
Of course, none of this analysis is possible without importing and exporting data. In my experience, import, export, and database connections all work smoothly in Python: tests against Excel, CSV, text files, MySQL, and so on generally succeed on the first try. Connecting to an Oracle database is more troublesome because the Oracle client must be installed on the local machine; the process is described in the blog post "Python environment links Oracle database": http://blog.itpub.net/18841027/viewspace-2655148/.
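A small sketch of the import/export round trip follows. It makes two assumptions so that it runs anywhere: the CSV content is an in-memory string rather than a file, and an in-memory SQLite database stands in for MySQL or Oracle, which need a running server and a driver. The table and column names are invented for the example.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content; in practice, pass a file path to pd.read_csv.
csv_text = "name,score\nalice,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# Export back to CSV text (pass a path instead of a buffer to write a file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = buffer.getvalue()

# Database round trip, using an in-memory SQLite database as a stand-in.
conn = sqlite3.connect(":memory:")
df.to_sql("scores", conn, index=False)
high = pd.read_sql("SELECT name FROM scores WHERE score > 88", conn)
conn.close()
```

For Excel files, `pd.read_excel` and `DataFrame.to_excel` follow the same pattern but need an extra engine package such as openpyxl.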
Analysis results also need to be presented as charts, so Python needs a data visualization tool. That tool is Matplotlib, whose most widely used module is matplotlib.pyplot.
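A minimal matplotlib.pyplot sketch is shown here. The monthly figures are hypothetical, and the non-interactive "Agg" backend is selected only so the script also runs on a machine without a display; in Spyder or a notebook that line can be dropped.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no display required
import matplotlib.pyplot as plt

# Hypothetical monthly totals to plot.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 160, 150]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(months, sales, marker="o", label="sales")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units")
ax.legend()

# Render to an in-memory PNG; use fig.savefig("sales.png") to write a file.
png = io.BytesIO()
fig.savefig(png, format="png")
plt.close(fig)
png_bytes = png.getvalue()
```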
IV. Python data mining
Beyond simple analysis, we also need data mining with machine learning algorithms, which is what the common machine learning library scikit-learn (sklearn) is for. Sklearn bundles many machine learning algorithms so that models can be built quickly during data analysis. The basic workflow is to split the data into a training set and a test set, build a model from the training set and its target feature, and then evaluate the model on the test set. Once the evaluation results meet the expected standard, the model is deployed to production, where it makes predictions on incoming production data (data whose target is unknown). As shown in Figure 1 below:
Figure 1. Machine learning process diagram
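The split, train, evaluate, and predict workflow can be sketched as follows. The choice of the bundled iris data set, a decision tree, the 70/30 split, and the single "new" sample are all assumptions made for illustration; the article does not prescribe any of them.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Load labeled data (features X, target y).
X, y = load_iris(return_X_y=True)

# 2. Split into a training set and a held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 3. Fit a model on the training set only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# 4. Evaluate on the test set.
accuracy = accuracy_score(y_test, model.predict(X_test))

# 5. If the score meets the target, predict on new, unlabeled data.
new_sample = [[5.1, 3.5, 1.4, 0.2]]   # hypothetical measurement
predicted_class = model.predict(new_sample)[0]
```

Keeping the test set out of `fit` is what makes the accuracy a fair estimate of how the model will behave on production data.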
Data mining is a specialized subject with a great deal of content, but its implementation in sklearn can be summarized as in Figure 2 below. As the figure suggests, much of machine learning comes down to assigning data to categories or values. Targets are either categorical (discrete) or numerical (continuous): predicting a categorical target is called classification, while predicting a numerical target is called regression. Grouping unlabeled data is called clustering, and dimensionality reduction is used to reduce computational and space complexity.
Figure 2. Sklearn algorithm blueprint
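To make the four task families concrete, here is one small example of each on the same data set. The specific estimators (logistic regression, linear regression, k-means, PCA) are illustrative choices, not necessarily the ones the figure highlights.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = load_iris(return_X_y=True)

# Classification: the target is a discrete label (the species).
classifier = LogisticRegression(max_iter=1000).fit(X, y)
label_predictions = classifier.predict(X)

# Regression: the target is a continuous number
# (here, petal width predicted from the other three measurements).
regressor = LinearRegression().fit(X[:, :3], X[:, 3])
width_predictions = regressor.predict(X[:, :3])

# Clustering: no target at all; group similar rows together.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: compress 4 features down to 2.
X_reduced = PCA(n_components=2).fit_transform(X)
```

All four follow the same fit-then-predict (or fit-then-transform) pattern, which is a large part of what makes sklearn quick to pick up.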
V. Application and practice
Unless we do research, our work centers on project applications and engineering. With the foundation above in place, the next step is to write code, or test existing code, against specific requirements. Once you understand the ready-made code in books and have run it successfully, modify it and revisit the earlier material: the stage of reviewing the old to learn the new. Going back over the syntax at this stage, you will understand each point differently than you did the first time, which makes this a necessary stage.
After these stages, we can start experimenting against real job requirements. The start may not go smoothly. But remember that only problem-driven learning is truly effective. So although this stage is challenging and hard at first, it is also the one that brings a sense of accomplishment and puts what you have learned to use. I am only about to begin this stage myself, so it is too early to draw conclusions. I will share results as they come.
Python is easy to use and has a large number of common libraries that can simply be picked up and used. But it is still a programming language: to master it well enough to turn real-world requirements into computational models and program code, we cannot do without computational thinking. What is computational thinking? Its essence is abstraction and automation, that is, abstraction at different levels and the "mechanization" of those abstractions. Scholars in China hold that computational thinking is the third kind of thinking that people should possess:
1. Experimental thinking: experiment → observation → discovery, inference, and summary (observation and induction).
2. Theoretical thinking: hypothesis/presupposition → definition/property/theorem → proof (reasoning and deduction).
3. Computational thinking: design, construction, and computation (design and construction).
Computational thinking focuses on the parts of human thinking that are feasible, constructible, and evaluable. Today, when data is large-scale, theoretical and experimental methods inevitably rely on computational methods for support.
Just as the computer world is built on 0 and 1, programs, and recursion, learning Python is inseparable from these foundations, especially recursion and loops. For readers without a computer science background, studying these basic concepts makes later code much easier to understand.
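For readers new to these ideas, a short sketch contrasts recursion with a loop and ties both back to 0 and 1. The function names are invented for this example.

```python
def factorial_recursive(n):
    """n! defined in terms of a smaller instance of the same problem."""
    if n <= 1:              # base case stops the recursion
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n):
    """The same result computed with a loop."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def to_binary(n):
    """Binary digits (0s and 1s) of a non-negative integer, built recursively."""
    if n < 2:
        return str(n)
    return to_binary(n // 2) + str(n % 2)
```

The recursive and iterative versions return the same values; the recursive one mirrors the mathematical definition, while the loop avoids Python's recursion depth limit for large inputs.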
That is all for this introduction to getting started with Python data analysis. I hope it is of some help and a starting point for learning more.