This article introduces "What are the common data science Python libraries?". Many people run into these situations when working on real cases, so let the editor walk you through how to handle them. I hope you read it carefully and come away with something useful!
Pandas
The Pandas library [3] is an essential library for data scientists doing exploratory data analysis. As the name implies, you use pandas to analyze your data, or more specifically, pandas data frames. Combined with the pandas-profiling extension installed below, it can generate an HTML summary report for a data frame.
Here are some of the features you can access and view in the HTML report:
Type inference
Unique values
Missing values
Quantile statistics (for example, the median)
Descriptive statistics
Histograms
Correlations (e.g. Pearson)
Text analysis
How to install it?
Use pip:
pip install -U pandas-profiling[notebook]
jupyter nbextension enable --py widgetsnbextension
This also works for me:
pip install pandas-profiling
import pandas_profiling
Example:
The following is one of the visualizations we can access from the profile report feature: an easy-to-understand, color-coded correlation visualization.
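As a minimal sketch (the CSV file name below is only a placeholder, not from the original article), this is how a profile report is typically generated and saved to HTML with pandas-profiling:
import pandas as pd
from pandas_profiling import ProfileReport

# Load a data frame (the file name is a hypothetical example)
df = pd.read_csv("your_data.csv")

# Build the report and write it out as an HTML file
profile = ProfileReport(df, title="Data Science Profiling Report")
profile.to_file("report.html")
In a Jupyter notebook, recent versions can also render the report inline, for example via profile.to_notebook_iframe().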
Limitations:
If you have a large dataset, generating this summary report can take quite a long time. My workaround is to either use a smaller dataset or profile a sample drawn from the full dataset.
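A quick sketch of that sampling workaround, continuing the example above (the sample size is an arbitrary illustrative choice):
# Profile a random sample instead of the full data frame
sample_report = ProfileReport(df.sample(n=10000, random_state=42))
sample_report.to_file("sample_report.html")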
NLTK
The term most commonly associated with nltk is NLP, or natural language processing, the branch of data science (and other disciplines) concerned with processing text. Once you import nltk, analyzing text becomes much easier.
Here are some features you can access using nltk:
Tokenized text (for example, ["tokenized", "text"])
Part-of-speech tagging
Stemming and lemmatization
How to install:
pip install nltk
import nltk
Example:
import nltk
nltk.download("punkt")  # tokenizer models required by word_tokenize

thing_to_tokenize = "a long sentence with words"
tokens = nltk.word_tokenize(thing_to_tokenize)
tokens
Returns: ['a', 'long', 'sentence', 'with', 'words']
We need to separate each word in order to analyze it. Once the words are separated like this, they can be tagged and counted, and machine learning algorithms can use those counts as inputs to create predictions. Another useful feature of nltk is that the text can be used for sentiment analysis. Sentiment analysis is very important in many businesses, especially those with customer reviews. Now that we are talking about sentiment analysis, let's take a look at another library that helps with quick sentiment analysis.
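As a hedged sketch of the tagging-and-counting step described above (the sentence is only an example, and the model downloads are assumed to be needed on a fresh install):
import nltk
from collections import Counter

nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("a long sentence with words and more words")
tagged = nltk.pos_tag(tokens)               # e.g. [('a', 'DT'), ('long', 'JJ'), ...]
counts = Counter(tag for _, tag in tagged)  # frequency of each part-of-speech tag
print(tagged)
print(counts)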
TextBlob
TextBlob [8] shares many of the same strengths as nltk, but its sentiment analysis functionality stands out. Beyond analysis, it also supports classification using naive Bayes and decision tree classifiers.
Here are some features you can access using TextBlob:
Tokenization
Part-of-speech tagging
Classification
Spelling correction
Sentiment analysis
How to install:
pip install textblob
from textblob import TextBlob
Example:
Sentiment analysis:
review = TextBlob("here is a great text blob about wonderful Data Science")
review.sentiment
Returns: Sentiment(polarity=0.80, subjectivity=0.44)
Polarity is a float in the range [-1.0, 1.0], while subjectivity is a float in the range [0.0, 1.0].
Classification:
from textblob.classifiers import NaiveBayesClassifier

training_data = [
    ('sentence example good one', 'pos'),
    ('sentence example great two', 'pos'),
    ('sentence example bad three', 'neg'),
    ('sentence example worse four', 'neg'),
]
testing_data = [
    ('sentence example good', 'pos'),
    ('sentence example great', 'pos'),
]
cl = NaiveBayesClassifier(training_data)
You can use this classifier to classify new text; it will return "pos" or "neg" as the output.
This simple textblob code provides very powerful and useful sentiment analysis and classification.
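For illustration, a minimal sketch of using the trained classifier from above (continuing the same hypothetical training and testing data):
# Classify a new sentence with the classifier built above
cl.classify("sentence example good")   # expected to return 'pos' for positive-looking text

# Evaluate the classifier on the held-out testing_data
cl.accuracy(testing_data)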
PyLDAvis
Another NLP-related tool is pyLDAvis [10], a library of interactive topic model visualization tools. For example, when I run a topic model with LDA (Latent Dirichlet Allocation), I usually see the topic output printed in a cell, which can be hard to read. It is far more useful and easier to digest when presented as a good visual summary, which is exactly what pyLDAvis provides.
Here are some features you can access using pyLDAvis:
Shows the top 30 most salient terms
An interactive slider that lets you adjust the relevance metric
An intertopic distance map with PC1 on the x-axis and PC2 on the y-axis
Topics drawn with sizes proportional to their prevalence
Overall, this is an impressive way to visualize topics that other libraries do not really offer.
How to install:
pip install pyldavis
import pyLDAvis
Example:
To see the best example, here is a Jupyter Notebook [11] reference that shows many unique and useful features of this data science library: https://nbviewer.jupyter.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
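As a hedged sketch of the typical notebook workflow (assuming a gensim LDA model, corpus, and dictionary already exist; the variable names lda_model, corpus, and dictionary are placeholders, not from the original article):
import pyLDAvis
import pyLDAvis.gensim_models  # older versions use pyLDAvis.gensim instead

pyLDAvis.enable_notebook()  # render the visualization inline in Jupyter

# lda_model, corpus, and dictionary are assumed to come from a prior gensim run
panel = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary)
panel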
NetworkX
The NetworkX data science package [13] plays to its strengths in the creation and visualization of biological, social, and infrastructure networks.
Here are some features you can access using NetworkX:
Create graphs, nodes, and edges
Examine the elements of a graph
Graph structure
Graph attributes
Multigraphs
Graph generators and operations
How to install:
pip install networkx
import networkx
Example:
Create a graph:
import networkx
graph = networkx.Graph()
You can combine it with other libraries such as matplotlib.pyplot to create visualizations of the graph (in a form data scientists are used to seeing), as sketched below.
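A minimal sketch of that combination, assuming a few illustrative edges (the node names are placeholders):
import matplotlib.pyplot as plt
import networkx

# Build a small graph from a list of edges
graph = networkx.Graph()
graph.add_edges_from([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])

# Draw the graph with matplotlib
networkx.draw(graph, with_labels=True, node_color="lightblue")
plt.show()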
That is all for "What are the common data science Python libraries?". Thank you for reading. If you want to learn more about the industry, you can follow this site, where the editor will keep publishing practical, high-quality articles for you!