2025-02-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article introduces the Python libraries that are most useful in practical data science and what each one does. Many people run into difficulties at these steps in real projects, so the sections below walk through each stage in turn, from obtaining data to presenting it.
1. Obtain data
Getting data is a critical step in solving data science problems. You start with a question to answer, and how well you can answer it depends on how and where you get your data. A good way to get data is to download it from Kaggle, or to scrape it from the web with appropriate methods and tools.
The most important and commonly used libraries for web scraping include:
1. Beautiful Soup
2. Requests
3. Pandas
Beautiful Soup is a Python library that extracts data from HTML and XML files. The official Beautiful Soup documentation is recommended reading.
If Python is already installed, simply type the following command to install Beautiful Soup. Installation commands are given for every library covered in this article, but I recommend that readers use Google Colab to practice the code. Most of these libraries are preinstalled in Google Colab, so there is usually no need to install them manually: just type "import library_name".
pip install beautifulsoup4
Import the Beautiful Soup library:
from bs4 import BeautifulSoup
soup = BeautifulSoup(page_name.text, 'html.parser')
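As a minimal sketch of what parsing looks like in practice, the example below feeds Beautiful Soup a small HTML string instead of a downloaded page. The HTML snippet, the "lib" class name, and the variable names are illustrative assumptions, not part of the original article.

```python
from bs4 import BeautifulSoup

# A small HTML snippet stands in for a downloaded page (illustrative data).
html = """
<html><body>
  <h1>Library list</h1>
  <ul>
    <li class="lib">Beautiful Soup</li>
    <li class="lib">Requests</li>
    <li class="lib">Pandas</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
title = soup.find("h1").get_text(strip=True)
libraries = [li.get_text(strip=True) for li in soup.find_all("li", class_="lib")]

print(title)
print(libraries)
```

With a real page, the string passed to BeautifulSoup would be the text of the HTTP response rather than a literal.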
Python's Requests library makes sending HTTP requests easier. There are many functions in the Requests library, the most common of which is requests.get(). requests.get() returns a Response object whose status code tells you whether the request to the URL succeeded or failed.
Install Requests:
pip install requests
Import the Requests library:
import requests
page_name = requests.get('url_name')
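The following sketch wraps requests.get() in a small helper that checks whether the request succeeded. The function name fetch_page, the 10-second timeout, and the example URL are assumptions made for illustration; the usage lines are commented out because they need network access.

```python
import requests


def fetch_page(url, timeout=10):
    """Download a page and return the Response, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response


# Usage (needs network access; the URL is a placeholder):
# page = fetch_page("https://example.com")
# print(page.status_code)   # numeric HTTP status of the response
# print(page.text[:200])    # first characters of the HTML body
```

Passing a timeout is a design choice worth keeping: without one, a request to an unresponsive server can block indefinitely.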
Pandas provides easy-to-use, high-performance data structures and data analysis tools for the Python programming language. Its DataFrame stores tabular data cleanly and concisely.
Install Pandas:
pip install pandas
Import the Pandas library:
import pandas as pd
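A short sketch of storing collected records in a DataFrame follows. The records and column names are made up for illustration; in a scraping workflow they would come from the parsed page.

```python
import pandas as pd

# Illustrative records, e.g. rows extracted from a web page.
records = [
    {"library": "Beautiful Soup", "purpose": "parsing"},
    {"library": "Requests", "purpose": "downloading"},
    {"library": "Pandas", "purpose": "storing"},
]

df = pd.DataFrame(records)

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first rows of the table
```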
2. Clean data
Cleaning data involves several important steps, which often include removing duplicate rows, removing outliers, finding missing and null values, converting object values that cannot be used into null values, and plotting the data to check the result.
Common libraries for data cleaning include:
1. Pandas
2. NumPy
Pandas is the all-purpose tool of data science and is useful at almost every stage. It is described above and will not be repeated here.
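The cleaning steps listed above can be sketched with Pandas roughly as follows. The sample data is invented, and the 1.5 × IQR rule used for outliers is one common rule of thumb chosen for this example, not a method prescribed by the article.

```python
import pandas as pd

# Illustrative data with one duplicate row, one missing value and one outlier.
df = pd.DataFrame({
    "name": ["a", "b", "b", "c", "d", "e"],
    "value": [10.0, 12.0, 12.0, None, 11.0, 500.0],
})

df = df.drop_duplicates()                # remove repeated rows
missing_per_column = df.isnull().sum()   # count missing values per column
df = df.dropna(subset=["value"])         # drop rows where "value" is missing

# Keep values within 1.5 * IQR of the quartiles (a common rule of thumb).
q1, q3 = df["value"].quantile([0.25, 0.75])
iqr = q3 - q1
cleaned = df[df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(missing_per_column)
print(cleaned)
```

Whether to drop or fill missing values depends on the data; fillna() is the usual alternative to dropna().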
NumPy (Numerical Python) is a Python library that supports scientific computing. Python has no built-in matrix data structure, while NumPy supports creating arrays and matrices and running calculations on them.
Run the following command to install NumPy together with other packages from the scientific Python stack (make sure Python is installed); "pip install numpy" is enough if you only need NumPy:
python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose
Import the NumPy library:
import numpy as np
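As a small illustration of the matrix calculations mentioned above, the sketch below multiplies two 2×2 arrays and computes a transpose and column means. The values are arbitrary example data.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

product = a @ b                  # matrix multiplication
transposed = a.T                 # transpose of a
column_means = a.mean(axis=0)    # mean of each column

print(product)
print(transposed)
print(column_means)
```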
3. Explore data
Exploratory Data Analysis (EDA) is an approach to understanding a data set by summarizing its main characteristics, often by plotting them. EDA helps users explore the data more deeply and see the distribution and overall state of the important variables more clearly.
Common libraries for running EDA include:
1. Pandas
2. Seaborn
3. Matplotlib.pyplot
Pandas: See above.
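For a first look at a data set, a few Pandas calls already cover much of the summarizing side of EDA. The sketch below uses a tiny invented table; the column names are assumptions for illustration only.

```python
import pandas as pd

# Illustrative measurements for two categories.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length": [1.0, 1.2, 3.1, 2.9, 3.0],
    "width": [0.5, 0.6, 1.4, 1.5, 1.6],
})

summary = df.describe()                        # count, mean, std, quartiles per numeric column
counts = df["species"].value_counts()          # number of rows per category
correlation = df[["length", "width"]].corr()   # pairwise Pearson correlation

print(summary)
print(counts)
print(correlation)
```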
Seaborn is a Python data visualization library that provides a high-level interface for drawing statistical graphics. Install the latest version of Seaborn:
pip install seaborn
Seaborn official documentation: seaborn.pydata.org/examples/index.html#example-gallery
Using Seaborn, you can easily draw bar charts, scatter plots, heatmaps, and more. Import Seaborn:
import seaborn as sns
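A minimal scatter plot with Seaborn might look like the sketch below. The data, column names, and output file name are invented for the example, and the "Agg" backend is selected only so the script also runs without a display; in a notebook such as Colab that line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import pandas as pd
import seaborn as sns

# Illustrative measurements for two categories.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "length": [1.0, 1.2, 3.1, 2.9, 3.0],
    "width": [0.5, 0.6, 1.4, 1.5, 1.6],
})

# scatterplot() returns the Matplotlib Axes it drew on.
ax = sns.scatterplot(data=df, x="length", y="width", hue="species")
ax.set_title("Length vs. width")
ax.figure.savefig("scatter.png")
```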
Matplotlib is a Python 2D plotting library that can be used to draw graphs in a variety of environments, and it can be used as an alternative to Seaborn. In fact, Seaborn is built on top of Matplotlib.
Install Matplotlib:
python -m pip install -U matplotlib
Recommended reading: the official Matplotlib documentation at matplotlib.org/users/index.html
Import the Matplotlib.pyplot library:
import matplotlib.pyplot as plt
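The sketch below draws a simple line plot with Matplotlib's pyplot interface. The data and the output file name are arbitrary, and the "Agg" backend is chosen here only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [value ** 2 for value in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("line.png")
```

In an interactive notebook, plt.show() would display the figure instead of saving it to a file.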
4. Build models
Building models is a critical step in data science. Because this step requires building a machine learning model based on the problem to be solved and the data obtained, it is more difficult than the other steps. The problem statement is crucial here because it shapes both the definition of the problem and the proposed solution. Most of the open data sets on the web were collected with a single problem in mind, so the ability to frame and solve problems is especially important. Also, since no single algorithm is best for every problem, you need to choose among several algorithms, considering whether regression, classification, clustering, or dimensionality reduction is appropriate for your data.
Choosing an algorithm is often a headache. Readers can use the scikit-learn algorithm selection flowchart (the "Choosing the right estimator" map in the official documentation) to narrow down which kind of algorithm fits the problem and the data.
[Figure: scikit-learn algorithm selection roadmap]
It is not difficult to guess that the most commonly used library for modeling is:
1. scikit-learn
scikit-learn is an easy-to-use Python library for building machine learning models. It is based on NumPy, SciPy and Matplotlib. The official scikit-learn documentation is at: scikit-learn.org/stable/
Install scikit-learn:
pip install -U scikit-learn
Import scikit-learn:
import sklearn
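To make the modeling step concrete, here is a minimal classification sketch with scikit-learn. The choice of the bundled iris data set, a k-nearest-neighbors classifier, and the split parameters are assumptions made for illustration; the article does not prescribe a particular algorithm.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small built-in data set: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to evaluate the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators share the same fit() and predict() interface, so swapping in a different algorithm usually means changing only the line that creates the model.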
5. Present data
This is the final step in data science, and one that many people are reluctant to do; after all, not everyone wants to make their findings public. How the data is presented is nevertheless extremely important, because the results eventually have to be shown to people. And because the audience cares about the results rather than the algorithm used, the presentation has to be concise.
Also, run the following command to install RISE, which adds slideshow presentation options to Jupyter notebooks:
pip install RISE
That concludes this introduction to what the practical data science Python libraries do. Thank you for reading.