2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Python is an object-oriented, interpreted programming language. Because it is simple, easy to learn, free and open source, portable and extensible, Python is also called a glue language. The figure below shows the popularity trend of major programming languages in recent years; Python's popularity has risen sharply.
Because Python has a very rich set of libraries, it is also widely used in data analysis.
I. Why use Python for data analysis?
In my opinion, there are three main reasons.
Breadth: every industry has its own business scenarios, and every industry needs data to support decision-making. Now that everyone is talking about big data, data analysis is a skill you have to know.
Precision: Python is a programming language. You may have relied entirely on Excel's default settings to generate charts without ever asking why a chart looks the way it does. With a programming tool, you have to decide every step yourself, starting from the width and height of the chart, which leads to a better understanding of the data.
Efficiency: traditional data work involves many repetitive manual operations, such as merging daily tables into a weekly table, deleting a field in bulk, or deleting null values in bulk. These tasks are hard to turn into a workflow in point-and-click software, but they can be automated with Python, which saves a lot of time.
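The three chores named under the efficiency point (merging daily tables into a weekly table, deleting a field, deleting null values) might be automated with pandas roughly as follows. This is only a sketch: the table contents and the column names `date`, `store`, `sales` and `note` are invented for illustration, and real daily tables would more likely be read from files.

```python
import pandas as pd

# Hypothetical daily tables; in practice these might come from pd.read_csv.
daily_tables = [
    pd.DataFrame({"date": ["2024-01-01"] * 2, "store": ["A", "B"],
                  "sales": [120.0, 80.0], "note": ["ok", "ok"]}),
    pd.DataFrame({"date": ["2024-01-02"] * 2, "store": ["A", "B"],
                  "sales": [float("nan"), 60.0], "note": ["missing", "ok"]}),
    pd.DataFrame({"date": ["2024-01-03"] * 2, "store": ["A", "B"],
                  "sales": [95.5, 70.0], "note": ["ok", "ok"]}),
]


def build_weekly_table(tables):
    """Stack daily tables, drop an unwanted field, and drop rows with nulls."""
    weekly = pd.concat(tables, ignore_index=True)  # merge the daily tables
    weekly = weekly.drop(columns=["note"])         # delete a field in bulk
    weekly = weekly.dropna()                       # delete null values in bulk
    return weekly.reset_index(drop=True)


weekly_table = build_weekly_table(daily_tables)
print(weekly_table)
```

Once the steps live in a function, the same cleanup can be rerun every week without any manual clicking.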
Summary of basic libraries
Here is a brief summary of the important libraries you will often come into contact with:
NumPy: provides many core functions for scientific computing. Because its internal operations are implemented in C, it is much faster than the same function written in pure Python. It is not the most user-friendly package, though.
SciPy: builds on NumPy and adds more routines, for example for sampling from distributions and calculating test statistics.
Matplotlib: the main plotting framework. Not very likeable, but a must-have package.
Pandas: essentially a lightweight wrapper around NumPy/SciPy that makes them more user-friendly. It is ideal for working with tabular data, which Pandas calls a DataFrame. It also wraps some plotting functions, so you can plot quickly without calling MPL (Matplotlib) directly. I use Pandas rather than other tools to manipulate data.
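To make the relationship between these libraries a little more concrete, here is a minimal sketch. It compares a vectorized NumPy expression with the equivalent pure-Python loop (without measuring speed) and then wraps the arrays in a DataFrame. The variable names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

values = np.arange(1, 6, dtype=float)  # the numbers 1.0 to 5.0

# Vectorized NumPy expression: the loop over elements runs in compiled code.
squares_numpy = values ** 2

# Equivalent pure-Python loop, shown only for comparison.
squares_python = [v ** 2 for v in values.tolist()]

# A DataFrame adds row and column labels on top of NumPy-backed columns.
frame = pd.DataFrame({"x": values, "x_squared": squares_numpy})
column_mean = frame["x_squared"].mean()

print(frame)
print(column_mean)
```

Both computations give the same numbers; the difference is that NumPy handles the whole array in one expression and Pandas lets you refer to the result by column name.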
Machine learning and computer vision
Crab: a flexible and fast recommendation engine
Gensim: a user-friendly topic modeling library
Hebel: a GPU-accelerated deep learning library
NuPIC: the Numenta Platform for Intelligent Computing
Pattern: a web mining module for Python
PyBrain: another Python machine learning library
Pylearn2: a machine learning library based on Theano
Python-recsys: a Python library for implementing recommendation systems
Scikit-learn: a machine learning Python module based on SciPy
PyDeep: a Python deep learning library
vowpal_porpoise: a lightweight Python wrapper for Vowpal Wabbit
Skflow: a simplified interface to TensorFlow (imitating scikit-learn)
Caffe: a deep learning framework with a Python interface
OpenCV: an open source computer vision library
pyocr: a wrapper library for Tesseract and Cuneiform
pytesseract: another wrapper library for Google Tesseract OCR
SimpleCV: an open source framework for creating computer vision applications
The list above covers only some of them, and there are many more. Many of these libraries are not implemented in Python, but they all provide a Python interface, and several even treat Python as a first-class citizen.
I am not saying that Python itself is especially powerful or complex. On the contrary, it is Python's simplicity and inclusiveness that have given it such a position in the field of data mining.
II. Python data analysis process
1. Data acquisition: public data, Python crawler
There are two main ways to obtain external data.
The first is to use public external data sets. Some research institutions, enterprises and governments publish data, which you download from specific websites. These data sets are usually fairly complete and of relatively high quality.
The other way to get external data is with crawlers.
For example, a crawler can collect the job postings for a position on a recruitment website, the rental listings for a city on a rental website, the highest-rated movies on Douban, or the rankings of Zhihu likes and NetEase Cloud Music reviews. Based on data crawled from the Internet, you can analyze particular industries and particular groups of people.
Commonly used e-commerce sites, Q&A sites, second-hand trading sites, dating sites, recruitment sites and so on can all be crawled for very valuable data.
Python is flexible, easy to read and write, and works conveniently with databases and local data. It is also the preferred tool for web crawlers.
Scrapy
A fast, high-level screen scraping and web crawling framework written in Python, used to crawl web sites and extract structured data from pages. Scrapy has a wide range of uses, including data mining, monitoring and automated testing.
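To show what "extracting structured data from pages" means in practice, here is a small sketch. It deliberately uses only the standard library's `html.parser` rather than Scrapy itself, so that it runs without any installation or network access. The sample page, the `job` class name and the `JobLinkParser` class are all invented for illustration. A real crawler would also have to download pages, follow links and respect each site's rules, which is the part a framework like Scrapy takes care of.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real crawler would download this over HTTP.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li class="job"><a href="/jobs/1">Data Analyst</a></li>
    <li class="job"><a href="/jobs/2">Data Engineer</a></li>
  </ul>
  <a href="/about">About</a>
</body></html>
"""


class JobLinkParser(HTMLParser):
    """Collect the link and text of every <a> inside <li class="job">."""

    def __init__(self):
        super().__init__()
        self.in_job_item = False
        self.current_href = None
        self.records = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "job":
            self.in_job_item = True
        elif tag == "a" and self.in_job_item:
            self.current_href = attributes.get("href")

    def handle_data(self, data):
        # Only text inside a job link is recorded.
        if self.current_href is not None and data.strip():
            self.records.append(
                {"title": data.strip(), "url": self.current_href}
            )

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None
        elif tag == "li":
            self.in_job_item = False


parser = JobLinkParser()
parser.feed(SAMPLE_PAGE)
print(parser.records)
```

The result is a list of dictionaries, one per job link, which is the kind of structured record that can then be loaded into a table for analysis.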
2. Data collation
NumPy (Numerical Python)
It provides many advanced numerical programming tools, such as matrix data types, vector processing and sophisticated computation libraries. It is designed for rigorous numerical processing. It is used by many large financial companies and by core scientific computing organizations such as Lawrence Livermore and NASA for tasks that were originally done in C++, Fortran or Matlab.
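As a brief illustration of the matrix and vector tools mentioned here, the following sketch multiplies a small matrix by a vector, solves the corresponding linear system, and applies element-wise arithmetic. The numbers are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# A 2x2 matrix and a vector, both ordinary ndarray objects.
matrix = np.array([[2.0, 1.0],
                   [1.0, 3.0]])
vector = np.array([1.0, 2.0])

product = matrix @ vector                     # matrix-vector product
solution = np.linalg.solve(matrix, product)   # solve matrix @ x = product for x
scaled = vector * 10                          # element-wise (vectorized) arithmetic

print(product)
print(solution)
print(scaled)
```

Solving the system recovers the original vector, up to floating-point rounding.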
Pandas (Python Data Analysis Library)
Pandas is a tool built on NumPy and created for data analysis tasks. It incorporates a number of libraries and some standard data models, and provides the tools needed to manipulate large data sets efficiently. Pandas offers a large number of functions and methods that let us process data quickly and easily. You will soon find that it is one of the main reasons Python is a powerful and efficient data analysis environment.
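A typical collation pass with Pandas might look like the sketch below: remove duplicate rows, convert text columns to proper types, and handle values that cannot be parsed. The records and the column names `order_id`, `date` and `amount` are invented for illustration, and filling unparseable amounts with 0.0 is just one possible choice; dropping those rows would be equally reasonable.

```python
import pandas as pd

# Hypothetical raw records: numbers stored as text, one bad value, one duplicate.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
    "amount": ["19.9", "5", "5", "n/a"],
})

clean = raw.drop_duplicates().copy()            # remove repeated rows
clean["date"] = pd.to_datetime(clean["date"])   # text -> datetime
# Values that cannot be parsed as numbers become NaN instead of raising.
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")
clean["amount"] = clean["amount"].fillna(0.0)   # assumption: treat missing as 0
clean = clean.reset_index(drop=True)

print(clean)
print(clean.dtypes)
```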
3. Modeling and analysis
Every programming language seems to have a field in which it is best known or most widely applied.
In this era when everyone is talking about cloud computing, big data and deep learning, let's take a look at the representatives in these fields.
Loosely speaking, Python has become the de facto standard language in the field of data analysis.
Scikit-learn
It collects implementations of the common algorithms for typical data analysis problems, such as classification, regression, clustering, dimensionality reduction, model selection and feature engineering.
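As one example of such a problem, here is a minimal classification sketch using the Iris data set that ships with scikit-learn. The choice of logistic regression, the 25% test split and the fixed `random_state` are assumptions made for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in data set: 150 flowers, 4 features, 3 classes.
features, labels = load_iris(return_X_y=True)

# Hold out part of the data so the model is scored on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` and `predict` pattern, so swapping in a different algorithm usually means changing only the line that creates the model.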
4. Data visualization
Matplotlib: a Python 2D plotting library
Bokeh: interactive web plotting with Python
ggplot: a Python version of the API that ggplot2 provides for R
Plotly: a web plotting library that works with Python and matplotlib
Pyecharts: a data visualization library based on Baidu Echarts
Pygal: a Python SVG chart creation tool
pygraphviz: a Python interface to Graphviz
PyQtGraph: interactive, real-time 2D/3D/image plotting and science/engineering widgets
SnakeViz: a browser-based viewer for the output of Python's cProfile module
Vincent: a tool for converting Python to Vega syntax
VisPy: a high-performance scientific visualization tool based on OpenGL
When it comes to visualization in Python, Matplotlib is probably the first thing that comes to mind. Seaborn is a similar package aimed at statistical visualization; it lets you produce very complex charts with only a little code. There is also Bokeh, which has many interactive features and supports many different chart types. Plotly is similar to Bokeh: it renders charts in the browser and supports interactive visualization. Although Python's plotting capabilities are not as powerful as R's, I am optimistic about their future.
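For reference, a basic Matplotlib chart can be sketched as follows. The monthly figures are invented for illustration, and the chart is written to an in-memory PNG so the example also runs on a machine without a display. In an interactive session you would more likely call `plt.show()` instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

# Hypothetical monthly figures, invented for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
visits = [120, 150, 90, 180]

fig, ax = plt.subplots(figsize=(6, 3))  # chart width and height in inches
ax.plot(months, visits, marker="o")
ax.set_title("Monthly visits")
ax.set_xlabel("Month")
ax.set_ylabel("Visits")

# Render the figure to PNG bytes held in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes of PNG data")
```

Every element of the chart, from its size to its axis labels, is set explicitly in code, so the same chart can be regenerated whenever the data changes.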
III. Summary
At the beginning you may not think things through very well, and you will keep running into all kinds of problems, such as the following:
1. Environment configuration, tool installation and environment variables are too unfriendly to beginners.
2. Without a reasonable learning path, you study Python, HTML and everything else at once, and it is extremely easy to give up.
3. Python has many packages and frameworks to choose from, and it is hard to know which one is friendlier.
4. You get stuck on a problem you cannot solve, and learning stalls.
5. Information on the Internet is scattered and unfriendly to beginners, and much of it is hard to follow.
6. You know the skills, but cannot think and analyze systematically when facing a specific problem.
But as you accumulate experience, you will gradually find a direction for your analysis and learn the usual dimensions to look at, such as top lists, averages, regional distribution, year-on-year comparison, correlation analysis and trend prediction. With experience you will also develop your own feel for data, which is what we usually call data thinking.
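Several of these analysis dimensions map directly onto short Pandas expressions, as the following sketch suggests. The sales records and the column names are invented for illustration, and trend prediction is left out because it needs a modeling step rather than a one-line summary.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "year": [2022, 2022, 2022, 2023, 2023, 2023],
    "region": ["North", "South", "East", "North", "South", "East"],
    "revenue": [100.0, 80.0, 60.0, 120.0, 88.0, 57.0],
    "ad_spend": [10.0, 8.0, 6.0, 12.0, 9.0, 6.0],
})

top_two = sales.nlargest(2, "revenue")                  # top list
average_revenue = sales["revenue"].mean()               # average
by_region = sales.groupby("region")["revenue"].sum()    # regional distribution
by_year = sales.groupby("year")["revenue"].sum()
year_on_year = by_year.pct_change()                     # year-on-year comparison
correlation = sales["revenue"].corr(sales["ad_spend"])  # correlation analysis

print(top_two)
print(by_region)
print(year_on_year)
print(correlation)
```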
If you really want to work in the data field, or even pursue a career in data science, please have confidence in Python. It is worth your time. If you want to take the machine learning route, Scikit-learn is your best choice: work through examples while reading the documentation, build up the relevant theory alongside, keep at it, and you will get there.
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
© 2024 shulou.com SLNews company. All rights reserved.