How to Use Python for Data Import and Visualization

Updated 2025-04-16. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02 report.

This article explains how to import data and visualize it with Python, using the iris dataset as a worked example. The editor found it practical and shares it here as a reference.

Data import and visualization

Usually, the first step in data analysis consists of obtaining data and importing it into our work environment. We can simply download the data using the following Python code:

Python

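# Python 2 only: urllib2 became urllib.request in Python 3
import urllib2
url = 'http://aima.cs.berkeley.edu/data/iris.csv'
u = urllib2.urlopen(url)
# save the downloaded contents to a local file
localFile = open('iris.csv', 'w')
localFile.write(u.read())
localFile.close()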

In the code snippet above, we used the urllib2 module (part of the Python 2 standard library) to fetch a file from the Berkeley website and saved it to the local disk using the standard file object. The file contains the iris dataset, a multivariate dataset with 50 samples from each of three iris species (Iris setosa, Iris virginica and Iris versicolor). Each sample has four features (or variables): the length and width of the sepal and of the petal, in centimeters.
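Since urllib2 does not exist in Python 3, a rough Python 3 equivalent of the download step might look like the sketch below. It uses urllib.request and shutil from the standard library; the helper name download_file is an assumption made for illustration and does not appear in the original article.

```python
import shutil
import urllib.request


def download_file(url, local_path):
    """Copy the resource at `url` to `local_path` and return the path.

    Hypothetical helper, not part of the original article.
    """
    # `with` closes both the response and the file, even if an error occurs.
    with urllib.request.urlopen(url) as response, open(local_path, 'wb') as out:
        # The response yields bytes, so the file is opened in binary mode.
        shutil.copyfileobj(response, out)
    return local_path
```

Calling download_file('http://aima.cs.berkeley.edu/data/iris.csv', 'iris.csv') should then mirror the snippet above, assuming the URL is still reachable.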

The dataset is stored in CSV (comma-separated values) format. A CSV file is easy to parse, and its contents can be loaded into a suitable data structure. This dataset has five columns: the first four contain the feature values and the last one holds the class of the sample. The file can be parsed with the genfromtxt function of the numpy library:

Python

from numpy import genfromtxt, zeros
# read the first 4 columns
data = genfromtxt('iris.csv', delimiter=',', usecols=(0, 1, 2, 3))
# read the fifth column
target = genfromtxt('iris.csv', delimiter=',', usecols=(4), dtype=str)

In the above example, we create a matrix containing the feature values and a vector containing the class labels. We can confirm the size of the dataset by looking at the shape attribute of the arrays we loaded:

Python

print data.shape
(150, 4)
print target.shape
(150,)
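To make the five-column layout concrete, here is a minimal standard-library sketch that performs the same split as the two genfromtxt calls: four numeric feature columns in one structure, the class label in another. The sample rows are typed in by hand for illustration, and the name split_features_and_labels is hypothetical.

```python
import csv
import io

# A few rows in the same five-column layout as iris.csv (typed in by hand
# for illustration; the real file has 150 rows).
SAMPLE_CSV = """5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""


def split_features_and_labels(csv_text):
    """Return (features, labels) parsed from five-column CSV text."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        features.append([float(value) for value in row[:4]])  # columns 0-3
        labels.append(row[4])                                 # column 4
    return features, labels


features, labels = split_features_and_labels(SAMPLE_CSV)
print(len(features), len(features[0]))  # row count, feature-column count
print(labels)
```

The pair (len(features), len(features[0])) plays the role of data.shape, and len(labels) that of target.shape.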

We can also see how many sample types we have and their names:

Python

print set(target)  # build a collection of unique elements
set(['setosa', 'versicolor', 'virginica'])
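Beyond the set of names, it is often useful to know how many samples each class contains. The sketch below uses collections.Counter from the standard library; the short labels list is a hand-written stand-in for target, and count_classes is a hypothetical helper name.

```python
from collections import Counter


def count_classes(labels):
    """Map each distinct label to the number of times it occurs."""
    return Counter(labels)


# Hand-written stand-in for the `target` vector loaded earlier.
labels = ['setosa', 'setosa', 'versicolor',
          'virginica', 'virginica', 'virginica']
counts = count_classes(labels)
print(sorted(counts))       # distinct class names, sorted
print(counts['virginica'])  # number of samples in one class
```

Applied to the full vector, for example as count_classes(target.tolist()), each of the three classes should come out at 50, matching the dataset description above.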

When we deal with new data, a very important task is to try to understand the information contained in the data and its organizational structure. Visualization can display the data flexibly and vividly, which helps us to understand the data deeply.

Using the plotting functions of pylab (an interface to matplotlib), we can create a two-dimensional scatter plot that lets us examine two features of the dataset at a time:

Python

from pylab import plot, show
plot(data[target == 'setosa', 0], data[target == 'setosa', 2], 'bo')
plot(data[target == 'versicolor', 0], data[target == 'versicolor', 2], 'ro')
plot(data[target == 'virginica', 0], data[target == 'virginica', 2], 'go')
show()

The above code plots the first and third columns (sepal length and petal length). The resulting figure contains 150 points, and different colors represent different classes: blue points are setosa, red points are versicolor, and green points are virginica.
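The expression data[target == 'setosa', 0] used in the plotting calls relies on NumPy boolean indexing: the comparison produces an array of True/False values, and that mask selects the matching rows. A small sketch on hand-written stand-in arrays (three rows instead of 150) shows the mechanics:

```python
import numpy as np

# Hand-written stand-ins for `data` and `target` (three rows instead of 150).
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [7.0, 3.2, 4.7, 1.4],
                 [4.9, 3.0, 1.4, 0.2]])
target = np.array(['setosa', 'versicolor', 'setosa'])

mask = target == 'setosa'     # element-wise comparison gives a boolean array
sepal_length = data[mask, 0]  # rows where mask is True, column 0
petal_length = data[mask, 2]  # same rows, column 2

print(mask)
print(sepal_length, petal_length)
```

Storing the mask in a variable, as here, avoids evaluating the same comparison twice per plot call.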

Another common way to look at data is to draw a histogram of a single feature. Since the data is divided into three classes, we can compare the distribution of that feature across classes. The following code draws histograms of the first feature (sepal length) for each class:

Python

from pylab import figure, subplot, hist, xlim, show
xmin = min(data[:, 0])
xmax = max(data[:, 0])
figure()
subplot(411)  # distribution of the setosa class (1st, on the top)
hist(data[target == 'setosa', 0], color='b', alpha=.7)
xlim(xmin, xmax)
subplot(412)  # distribution of the versicolor class (2nd)
hist(data[target == 'versicolor', 0], color='r', alpha=.7)
xlim(xmin, xmax)
subplot(413)  # distribution of the virginica class (3rd)
hist(data[target == 'virginica', 0], color='g', alpha=.7)
xlim(xmin, xmax)
subplot(414)  # global histogram (4th, on the bottom)
hist(data[:, 0], color='y', alpha=.7)
xlim(xmin, xmax)
show()

The resulting figure stacks four histograms over the same x-axis range: one for each class and one for the whole dataset. Comparing the panels shows how the distribution of sepal length differs from class to class.
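One detail worth noting about the histogram code: xlim only aligns the visible x-axis range, while each hist call still picks its own bin edges from the values passed to it. If directly comparable bins are wanted, one option is to fix the number of bins and the range explicitly. The sketch below illustrates the idea with numpy.histogram on hand-written stand-in values; the dictionary values_by_class and the choice of 5 bins are assumptions made for illustration.

```python
import numpy as np

# Hand-written stand-ins for column 0 of `data`, split by class.
values_by_class = {
    'setosa': np.array([5.1, 4.9, 4.7, 5.0]),
    'versicolor': np.array([7.0, 6.4, 6.9, 5.5]),
}
all_values = np.concatenate(list(values_by_class.values()))
xmin, xmax = all_values.min(), all_values.max()

# Same number of bins over the same range gives identical bin edges per class.
histograms = {
    name: np.histogram(values, bins=5, range=(xmin, xmax))
    for name, values in values_by_class.items()
}
for name, (counts, edges) in histograms.items():
    print(name, counts.tolist())
```

The pylab hist function accepts bins and range arguments as well, so the same idea carries over to the plotting code.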

That concludes this walkthrough of importing and visualizing data with Python. I hope it is of some help; if you found it useful, please share it so that more people can see it.
