This article introduces "What are the common data science Python libraries?". Many people run into these situations when working on real cases, so let the editor walk you through how to handle them. I hope you read it carefully and come away with something useful!
Pandas
The Pandas library [3] is an essential library for data scientists doing exploratory data analysis. As the name implies, you use pandas to analyze your data, or more specifically, pandas data frames. Combined with the pandas-profiling extension installed below, it can generate an HTML summary report for a data frame.
Here are some of the features you can access and view in the HTML report:
Type inference
Unique values
Missing values
Quantile statistics (for example, the median)
Descriptive statistics
Histograms
Correlations (e.g. Pearson)
Text analysis
How to install it?
Use pip:
pip install -U pandas-profiling[notebook]
jupyter nbextension enable --py widgetsnbextension
This also works for me:
pip install pandas-profiling
import pandas_profiling
Example:
The following is one of the visualizations we can access from the profile report feature: an easy-to-understand, color-coded correlation visualization.
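As a minimal sketch (the CSV file name below is only a placeholder, not from the original article), this is how a profile report is typically generated and saved to HTML with pandas-profiling:
import pandas as pd
from pandas_profiling import ProfileReport

# Load a data frame (the file name is a hypothetical example)
df = pd.read_csv("your_data.csv")

# Build the report and write it out as an HTML file
profile = ProfileReport(df, title="Data Science Profiling Report")
profile.to_file("report.html")
In a Jupyter notebook, recent versions can also render the report inline, for example via profile.to_notebook_iframe().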
Limitations:
If you have a large dataset, generating this summary report can take quite a long time. My workaround is to either use a smaller dataset or profile a sample drawn from the full dataset.
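A quick sketch of that sampling workaround, continuing the example above (the sample size is an arbitrary illustrative choice):
# Profile a random sample instead of the full data frame
sample_report = ProfileReport(df.sample(n=10000, random_state=42))
sample_report.to_file("sample_report.html")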
NLTK
The term most commonly associated with nltk is NLP, or natural language processing, the branch of data science (and other disciplines) concerned with processing text. Once you import nltk, analyzing text becomes much easier.
Here are some features you can access using nltk:
Tokenized text (for example, ["tokenized", "text"])
Part-of-speech tagging
Stemming and lemmatization
How to install:
pip install nltk
import nltk
Example:
import nltk
nltk.download("punkt")  # tokenizer models required by word_tokenize

thing_to_tokenize = "a long sentence with words"
tokens = nltk.word_tokenize(thing_to_tokenize)
tokens
Returns: ['a', 'long', 'sentence', 'with', 'words']
We need to separate each word in order to analyze it. Once the words are separated like this, they can be tagged and counted, and machine learning algorithms can use those counts as inputs to create predictions. Another useful feature of nltk is that the text can be used for sentiment analysis. Sentiment analysis is very important in many businesses, especially those with customer reviews. Now that we are talking about sentiment analysis, let's take a look at another library that helps with quick sentiment analysis.
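As a hedged sketch of the tagging-and-counting step described above (the sentence is only an example, and the model downloads are assumed to be needed on a fresh install):
import nltk
from collections import Counter

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("a long sentence with words and more words")
tagged = nltk.pos_tag(tokens)               # e.g. [('a', 'DT'), ('long', 'JJ'), ...]
counts = Counter(tag for _, tag in tagged)  # frequency of each part-of-speech tag
print(tagged)
print(counts)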
TextBlob
TextBlob [8] shares many of the same strengths as nltk, but its sentiment analysis functionality stands out. Beyond analysis, it also supports classification using naive Bayes and decision tree classifiers.
Here are some features you can access using TextBlob:
Tokenization
Part-of-speech tagging
Classification
Spelling correction
Sentiment analysis
How to install:
pip install textblob
from textblob import TextBlob
Example:
Sentiment analysis:
review = TextBlob("here is a great text blob about wonderful Data Science")
review.sentiment
Returns: Sentiment(polarity=0.80, subjectivity=0.44)
Polarity is a float in the range [-1.0, 1.0], while subjectivity is a float in the range [0.0, 1.0].
Classification:
from textblob.classifiers import NaiveBayesClassifier

training_data = [
    ('sentence example good one', 'pos'),
    ('sentence example great two', 'pos'),
    ('sentence example bad three', 'neg'),
    ('sentence example worse four', 'neg'),
]
testing_data = [
    ('sentence example good', 'pos'),
    ('sentence example great', 'pos'),
]
cl = NaiveBayesClassifier(training_data)
You can use this classifier to classify new text; it will return "pos" or "neg" as the output.
This simple textblob code provides very powerful and useful sentiment analysis and classification.
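For illustration, a minimal sketch of using the trained classifier from above (continuing the same hypothetical training and testing data):
# Classify a new sentence with the classifier built above
cl.classify("sentence example good")   # expected to return 'pos' for positive-looking text

# Evaluate the classifier on the held-out testing_data
cl.accuracy(testing_data)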
PyLDAvis
Another NLP-related tool is pyLDAvis [10], a library of interactive topic model visualization tools. For example, when I run a topic model with LDA (Latent Dirichlet Allocation), I usually see the topic output printed in a cell, which can be hard to read. It is far more useful and easier to digest when presented as a good visual summary, which is exactly what pyLDAvis provides.
Here are some features you can access using pyLDAvis:
Shows the top 30 most salient terms
An interactive slider that lets you adjust the relevance metric
An intertopic distance map with PC1 on the x-axis and PC2 on the y-axis
Topics drawn with sizes proportional to their prevalence
Overall, this is an impressive way to visualize topics that other libraries do not really offer.
How to install:
pip install pyldavis
import pyLDAvis
Example:
To see the best example, here is a Jupyter Notebook [11] reference that shows many unique and useful features of this data science library: https://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
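As a hedged sketch of the typical notebook workflow (assuming a gensim LDA model, corpus, and dictionary already exist; the variable names lda_model, corpus, and dictionary are placeholders, not from the original article):
import pyLDAvis
import pyLDAvis.gensim_models  # older versions use pyLDAvis.gensim instead

pyLDAvis.enable_notebook()  # render the visualization inline in Jupyter

# lda_model, corpus, and dictionary are assumed to come from a prior gensim run
panel = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
panel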
NetworkX
The NetworkX data science package [13] plays to its strengths in the creation and visualization of biological, social, and infrastructure networks.
Here are some features you can access using NetworkX:
Create graphs, nodes, and edges
Examine the elements of a graph
Graph structure
Graph attributes
Multigraphs
Graph generators and operations
How to install:
pip install networkx
import networkx
Example:
Create a graph:
import networkx
graph = networkx.Graph()
You can combine it with other libraries such as matplotlib.pyplot to create visualizations of the graph (in a form data scientists are used to seeing), as sketched below.
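A minimal sketch of that combination, assuming a few illustrative edges (the node names are placeholders):
import matplotlib.pyplot as plt
import networkx

# Build a small graph from a list of edges
graph = networkx.Graph()
graph.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])

# Draw the graph with matplotlib
networkx.draw(graph, with_labels=True, node_color="lightblue")
plt.show()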
That is all for "What are the common data science Python libraries?". Thank you for reading. If you want to learn more about the industry, you can follow this site, where the editor will keep publishing practical, high-quality articles for you!