Source: Shulou (Shulou.com), SLTechnology News&Howtos. Reported 06/02, updated 2025-04-12.
This article explains how to use Python libraries at each stage of a data science project. Many people run into the same questions when working on real cases, so this article walks through how to handle them. I hope you read it carefully and get something out of it!
The five important steps in data science are:
Get data
Clean up data
Explore data
Build a model
Present data
These five steps are a rule of thumb drawn from experience, not a standard answer. But if you think about it carefully, you will find that they are very reasonable.
1. Get data
Obtaining data is a key step in solving a data science problem. You start with a question you ultimately want to answer, and how well you can answer it depends on how and where you get the data. The easiest ways to get data are to download a ready-made dataset, for example from Kaggle, or to collect it from the Internet.
You can also use appropriate methods and tools to scrape data from the web.
The most important and commonly used libraries for web scraping are:
Beautiful Soup
Requests
Pandas
Beautiful Soup is a Python library for extracting data from HTML and XML files. Readers are encouraged to consult the official Beautiful Soup documentation.
Installation commands are given for every library covered in this article. However, I recommend that readers practice the code in Google Colab, where most of these libraries come preinstalled, so you can simply type "import library_name" without installing anything manually. If Python is already installed on your own machine, enter the following command to install Beautiful Soup:
pip install beautifulsoup4
Import the Beautiful Soup library:
from bs4 import BeautifulSoup
# page_name is the response object returned by requests.get()
soup = BeautifulSoup(page_name.text, 'html.parser')
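To see what the parser does without touching the network, here is a minimal sketch that parses a small inline HTML string standing in for page_name.text, then pulls out the title, a tagged paragraph and the links. The HTML snippet and the extract_links helper are hypothetical, for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for page_name.text from a real request
html = """
<html>
  <head><title>Example datasets</title></head>
  <body>
    <a href="/data/iris.csv">Iris</a>
    <a href="/data/titanic.csv">Titanic</a>
    <p class="note">Updated weekly</p>
  </body>
</html>
"""


def extract_links(markup: str) -> list[tuple[str, str]]:
    """Return (link text, href) pairs for every <a> tag that has an href attribute."""
    parsed = BeautifulSoup(markup, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in parsed.find_all("a", href=True)]


soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)                   # text inside the <title> tag
print(soup.find("p", class_="note").text)  # first <p> whose class is "note"
print(extract_links(html))
```

find_all() returns every matching tag, while find() returns only the first match (or None), which is usually what you want for unique page elements.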
Python's Requests library makes sending HTTP requests more user-friendly. It offers many methods, the most commonly used of which is requests.get(). It returns a Response object whose status code (response.status_code) tells you whether the request succeeded or failed.
Install Requests:
pip install requests
Import the Requests library:
import requests
page_name = requests.get('url_name')  # replace 'url_name' with the URL of the page to fetch
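A slightly fuller sketch wraps requests.get() in a helper that reports the status code and raises an error when the request fails. The helper name fetch_html and the 10-second timeout are assumptions for illustration, not part of the Requests API.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML text, raising an error on HTTP failures."""
    # A timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # status_code holds the HTTP status, e.g. 200 for success or 404 for not found
    print(f"GET {url} -> {response.status_code}")
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text
```

Usage: html = fetch_html("https://example.com"), after which the returned text can be passed straight to BeautifulSoup(html, "html.parser").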
Pandas provides easy-to-use, high-performance data structures and data analysis tools for the Python programming language. Its DataFrame stores tabular data clearly and succinctly.
Install Pandas:
pip install pandas
Import the Pandas library:
import pandas as pd
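As a minimal sketch of how Pandas stores data, the example below loads a small CSV into a DataFrame. The inline text is a hypothetical stand-in for a file downloaded from Kaggle; with a real download you would pass the file path to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a downloaded file such as "data.csv"
csv_text = """name,age,city
Alice,34,London
Bob,28,Paris
Carol,45,Berlin
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.dtypes)            # inferred type of each column
print(df.head())            # first rows of the table
print(df[df["age"] > 30])   # boolean filtering keeps only matching rows
```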
2. Clean up data
Cleaning up data involves many important steps, which often include removing duplicate rows, removing outliers, finding missing and null values, converting object-type values (for example, placeholder strings that stand for missing data) to null values, and plotting the results.
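The cleaning steps above can be sketched with Pandas as follows. The toy DataFrame, the "?" placeholder and the 1.5 * IQR outlier rule are illustrative assumptions; the right outlier rule depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a duplicate row, a "?" placeholder, a missing value and an outlier
raw = pd.DataFrame({
    "id":     [1, 2, 2, 3, 4, 5, 6],
    "height": ["170", "165", "165", "?", "180", "175", "990"],
    "weight": [65.0, 58.0, 58.0, 72.0, np.nan, 70.0, 68.0],
})

# 1. Remove exact duplicate rows
df = raw.drop_duplicates()

# 2. Convert the object (string) column to numbers; "?" cannot be parsed and becomes NaN
df = df.assign(height=pd.to_numeric(df["height"], errors="coerce"))

# 3. Count missing/null values per column, then drop incomplete rows
print(df.isna().sum())
df = df.dropna()

# 4. Remove outliers in "height" using the 1.5 * IQR rule
q1, q3 = df["height"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["height"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
clean = df[mask]
print(clean)
```

Dropping incomplete rows is only one option; filling them with fillna() may be better when data is scarce.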
Common libraries for data cleaning include:
Pandas
NumPy
Pandas is something of a Swiss Army knife for data science; it shows up at almost every step.
NumPy (Numerical Python) is a Python library for scientific computing. Python itself has no built-in multidimensional array or matrix type; NumPy provides one (the ndarray) and supports fast array and matrix operations on it.
Run the following command to install NumPy together with several other scientific Python packages (make sure Python is installed first):
python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
Import the NumPy library:
import numpy as np
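To illustrate the matrix support described above, here is a minimal sketch of common NumPy array operations. The matrices are arbitrary example values.

```python
import numpy as np

# A 2x3 matrix and a 3x2 matrix stored as NumPy arrays
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8],
              [9, 10],
              [11, 12]])

product = a @ b             # matrix multiplication, same as np.matmul(a, b); shape (2, 2)
elementwise = a * 2         # arithmetic applies to every element at once
transposed = a.T            # transpose; shape (3, 2)
col_means = a.mean(axis=0)  # mean of each column

print(product)
print(elementwise)
print(transposed.shape)
print(col_means)
```

Note that a * b would fail here: * is element-wise and needs compatible shapes, while @ performs true matrix multiplication.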
3. Explore data
Exploratory data analysis (EDA) is an approach for improving your understanding of a dataset by systematically summarizing and plotting its main characteristics. EDA helps you explore the data more deeply and clearly, and reveals the distribution and structure of the important information it contains.
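A first pass of EDA usually starts with summary statistics before any plots. A minimal Pandas sketch, using a small made-up table (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical toy dataset for illustration
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b", "c"],
    "length":  [1.2, 1.4, 3.1, 2.9, 3.3, 5.0],
    "width":   [0.2, 0.3, 1.0, 0.9, 1.1, 1.8],
})

df.info()                                      # column types and non-null counts
stats = df.describe()                          # count, mean, std, min, quartiles, max
counts = df["species"].value_counts()          # number of rows per category
means = df.groupby("species")["length"].mean() # average length per category
corr = df[["length", "width"]].corr()          # pairwise correlation of numeric columns

print(stats)
print(counts)
print(means)
print(corr)
```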
Common libraries that run EDA include:
Pandas
Seaborn
Matplotlib.pyplot
Seaborn is a Python data visualization library that provides a high-level interface for drawing attractive statistical graphics. Install the latest version of Seaborn:
pip install seaborn
With Seaborn, you can easily draw bar charts, scatter plots, heatmaps, and so on. Import Seaborn:
import seaborn as sns
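Here is a minimal sketch of two of the chart types mentioned above: a scatter plot colored by category and a correlation heatmap. The toy data and the output file name are assumptions; the non-interactive "Agg" backend is selected so the script also runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files; no window needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data
df = pd.DataFrame({
    "length":  [1.2, 1.4, 3.1, 2.9, 3.3, 5.0],
    "width":   [0.2, 0.3, 1.0, 0.9, 1.1, 1.8],
    "species": ["a", "a", "b", "b", "b", "c"],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot, one color per species
sns.scatterplot(data=df, x="length", y="width", hue="species", ax=axes[0])

# Heatmap of the correlation matrix, with the values written in each cell
sns.heatmap(df[["length", "width"]].corr(), annot=True, cmap="coolwarm", ax=axes[1])

fig.tight_layout()
out = Path("seaborn_example.png")
fig.savefig(out)
plt.close(fig)
```

In a notebook you can drop the matplotlib.use() line and the savefig() call; the figure is displayed inline.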
Matplotlib is a Python 2D plotting library that can produce charts in a variety of environments, and it can be used instead of Seaborn. In fact, Seaborn is built on top of Matplotlib.
Install Matplotlib:
python -m pip install -U matplotlib
Import the matplotlib.pyplot module:
import matplotlib.pyplot as plt
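A minimal sketch of plain Matplotlib usage, drawing a line plot and a bar chart side by side. The category counts and the output file name are hypothetical placeholders, not real data.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to files; no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot of two functions with axis labels and a legend
ax1.plot(x, np.sin(x), label="sin(x)")
ax1.plot(x, np.cos(x), label="cos(x)")
ax1.set_xlabel("x")
ax1.set_ylabel("value")
ax1.legend()

# Bar chart of hypothetical category counts
categories = ["A", "B", "C"]
counts = [12, 7, 4]
ax2.bar(categories, counts)
ax2.set_title("Counts per category")

fig.tight_layout()
out = Path("matplotlib_example.png")
fig.savefig(out)
plt.close(fig)
```

Working with explicit Figure and Axes objects, as here, makes it easy to mix in Seaborn plots, since Seaborn functions accept an ax= argument.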
4. Build a model
Modeling is a key step in data science. It is harder than the other steps because you have to build a machine learning model suited to both the problem you are solving and the data you have obtained. The problem statement is critical here, because it shapes how the problem is defined and what solution is proposed. Most open datasets on the web were collected with a particular problem in mind, so the ability to frame and solve that problem is especially important. Moreover, since no single algorithm is best for every problem, you need to choose among several, first deciding whether the task calls for regression, classification, clustering, or dimensionality reduction.
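To make the four problem types concrete, here is a minimal scikit-learn sketch that applies one representative estimator from each family to the Iris dataset bundled with the library. The choice of estimators is illustrative only, and the scores are measured on the training data, so they are optimistic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = load_iris(return_X_y=True)  # 150 flowers, 4 numeric features, 3 species labels

# Regression: predict a continuous value (petal width, column 3) from petal length (column 2)
reg = LinearRegression().fit(X[:, [2]], X[:, 3])
print("regression R^2 (training data):", reg.score(X[:, [2]], X[:, 3]))

# Classification: predict a discrete label (the species)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("classification accuracy (training data):", clf.score(X, y))

# Clustering: group the rows into 3 clusters without looking at the labels
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_sizes = [int((km.labels_ == k).sum()) for k in range(3)]
print("cluster sizes:", cluster_sizes)

# Dimensionality reduction: compress the 4 features into 2 principal components
pca = PCA(n_components=2).fit(X)
X_2d = pca.transform(X)
print("reduced shape:", X_2d.shape)
print("share of variance kept:", pca.explained_variance_ratio_.sum())
```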
Choosing an algorithm is often a headache. Readers can use scikit-learn's algorithm cheat-sheet ("Choosing the right estimator"), a flowchart that suggests which estimator to try based on the size and type of the data and the task at hand. The following figure shows the scikit-learn cheat-sheet:
As you might guess, the most commonly used libraries for modeling include:
(1) scikit-learn
scikit-learn is an easy-to-use library for building machine learning models in Python. It is built on NumPy, SciPy, and Matplotlib.
Install scikit-learn:
pip install -U scikit-learn
Import scikit-learn:
import sklearn
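Once a problem type is chosen, the usual way to find out which candidate algorithm performs best on your data is to compare them with cross-validation. A minimal sketch, assuming a classification task and using the bundled Iris dataset; the three candidates are arbitrary examples, not a recommendation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate algorithms for the same classification problem.
# Distance-based models get feature scaling inside a pipeline, so scaling is
# learned only from the training folds during cross-validation.
candidates = {
    "k-nearest neighbours": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "support vector machine": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# 5-fold cross-validation: train on 4/5 of the data, score on the held-out 1/5, five times
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name:>24}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")

best = max(results, key=results.get)
print("best on this data:", best)
```

The winner on one dataset is not guaranteed to win on another, which is why the comparison is worth repeating for each new problem.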
5. Present data
This is the last step in data science, and one that many people would rather skip; after all, not everyone enjoys presenting their findings in public. Yet how you present the data matters a great deal, because the results will ultimately be shown to people. Most audiences care about the results rather than the algorithms used, so the presentation needs to be concise and clear.
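Since audiences care about results rather than algorithms, a presentation chart should carry one clear message. A minimal sketch, assuming the Iris dataset bundled with scikit-learn as sample data, that turns a summary table into a single labeled bar chart ready for a slide. The output file name is hypothetical, and bar_label() requires Matplotlib 3.4 or later.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render to a file; no window needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # the four feature columns plus a numeric "target" column
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# One clear message: the average petal length of each species
summary = df.groupby("species")["petal length (cm)"].mean().round(2)
print(summary.to_string())

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(summary.index, summary.values, color="steelblue")
ax.bar_label(bars)  # write each value on top of its bar
ax.set_title("Average petal length by species")
ax.set_ylabel("petal length (cm)")
for side in ("top", "right"):
    ax.spines[side].set_visible(False)  # remove chart clutter
fig.tight_layout()
out = Path("summary_chart.png")
fig.savefig(out, dpi=150)
plt.close(fig)
```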
To add a slideshow display option to Jupyter Notebook, install the RISE extension with the following command:
pip install RISE
© 2024 shulou.com SLNews company. All rights reserved.