Shulou (Shulou.com), SLTechnology News&Howtos, Development. Updated 2025-04-04.
This article shows how to use Python and GNU Octave to accomplish a common data science task: reading a data set, fitting it with a straight line, and plotting the result.
Data science is a field of knowledge that spans programming languages. Some languages are famous for solving problems in this field, while others are little known. This article will help you familiarize yourself with data science work in some popular languages.
Choose Python and GNU Octave to do data science work
I often try to learn a new programming language. Why? Mostly out of a combination of boredom with the old ways and curiosity about the new ones. When I started programming, the only language I knew was C. Programming was hard and dangerous in those years, because I had to allocate memory manually, manage pointers, and remember to free memory.
Then a friend suggested that I try Python, and my programming life became much easier. My programs became much slower, but I no longer had to suffer while writing analysis software. However, I soon realized that each language suits some applications better than others. Later, I learned some other languages, and each one brought me some new insight. Discovering new programming styles lets me port some solutions to other languages, which makes everything much more interesting.
To get a feeling for a new programming language (and its documentation), I always start by writing sample programs that perform a task I am familiar with. To that end, I will explain how to write a program in Python and GNU Octave for a specific task that you could classify as data science. If you are already familiar with one of the languages, start with that one, and then look for similarities and differences in the other. This article is not a detailed comparison of programming languages, but a small showcase.
All programs should be run on the command line instead of using a graphical user interface (GUI). The complete example can be found in the polyglot_fit repository.
Programming task
The programs you will write in this series:
Read data from a CSV file
Fit the data with a straight line (i.e., f(x) = m ⋅ x + q)
Plot the result to an image file
This is a common situation encountered by many data scientists. The sample data is the first set of Anscombe's quartet, as shown in the following table. The quartet is a collection of artificially constructed data sets that give the same results when fitted with a straight line, although their plots are very different. The data file is a text file with tabs as column delimiters and a few header lines at the top. This task will use only the first set (that is, the first two columns).
The Python way
Python is a general-purpose programming language and one of the most popular languages today (according to the TIOBE index, the RedMonk programming language rankings, the Popularity of Programming Language index, GitHub's State of the Octoverse, and other sources). It is an interpreted language; therefore, the source code is read and evaluated by a program that executes the instructions. It has a comprehensive standard library and is generally very useful (I have no proof of this last statement; it is just my humble opinion).
Installation
To develop with Python, you need an interpreter and some libraries. The minimum requirements are:
NumPy for convenient array and matrix manipulation
SciPy for data science
Matplotlib for plotting
It is easy to install them on Fedora:
sudo dnf install python3 python3-numpy python3-scipy python3-matplotlib
Code comments
In Python, comments are achieved by adding a # at the beginning of the line, and the rest of the line is discarded by the interpreter:
# this is a comment ignored by the interpreter.
The fitting_python.py example uses comments to insert license information into the source code, with the first line being a special comment that allows the script to execute on the command line:
#!/usr/bin/env python3
This line informs the command line interpreter that the script needs to be executed by the program python3.
Required libraries
In Python, libraries and modules can be imported as an object (such as the first line in the example) that contains all the functions and members of the library. You can rename them with custom tags by using as:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
You can also decide to import only a submodule (as in the second and third lines). The syntax has two (more or less) equivalent forms: import module.submodule and from module import submodule.
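As a small illustration of the import forms, the sketch below uses the standard library's os.path submodule. It was chosen only because it ships with Python; it is not part of the fitting example, and the alias osp is a made-up name.

```python
# Three spellings that import the same submodule.
import os.path                 # accessed as os.path
from os import path            # accessed as path
import os.path as osp          # accessed through a custom alias (hypothetical name)

# All three names are bound to the same module object.
same_module = (os.path is path) and (path is osp)
print(same_module)

# Any of the names can be used to call the submodule's functions.
base_name = path.basename("data/anscombe.csv")
print(base_name)
```

Whichever form is used, the functions are identical; only the name you type in front of them changes.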
Defining variables
Python variables are declared the first time a value is assigned to them:
input_file_name = "anscombe.csv"
delimiter = "\t"
skip_header = 3
column_x = 0
column_y = 1
The variable type is inferred from the value assigned to the variable. Python does not enforce constant values; by convention, variables that should not be modified are named in uppercase letters.
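The following sketch illustrates both points under simple assumptions: the names scale and MAX_ROWS are hypothetical and do not appear in fitting_python.py.

```python
# The type of each variable follows from the value assigned to it.
input_file_name = "anscombe.csv"   # a string
skip_header = 3                    # an integer
scale = 1.5                        # a float (hypothetical variable)

inferred = [type(value).__name__ for value in (input_file_name, skip_header, scale)]
print(inferred)

# Uppercase signals "do not modify", but it is only a naming convention.
MAX_ROWS = 11
MAX_ROWS = 12   # the interpreter accepts the reassignment without complaint
print(MAX_ROWS)
```

Because nothing stops the second assignment, keeping such values unchanged is up to the programmer.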
Printing output
Running the programs from the command line means that the output is simply printed on the terminal. Python has the print() function, which by default prints its arguments and adds a newline character at the end of the output:
print("# Anscombe's first set with Python #")
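To make the default behavior visible, this minimal sketch captures what print() writes into a string buffer. The buffer and the sample messages are illustrative only.

```python
import contextlib
import io

# Capture the text written by print() so the exact characters can be inspected.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("Slope:", 0.5)           # arguments are joined by a space, newline appended
    print("no newline", end="")    # end="" suppresses the trailing newline

captured = buffer.getvalue()
print(repr(captured))
```

The end keyword argument is the standard way to change or remove the character appended after the output.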
In Python, you can combine the print() function with the formatting capabilities of the string class. Strings have a format() method that can be used to insert formatted text into the string itself. For example, you can add a formatted floating-point number:
print("Slope: {:f}".format(slope))
Read data
Reading the CSV file is very easy with NumPy and its genfromtxt() function, which returns a NumPy array:
data = np.genfromtxt(input_file_name, delimiter = delimiter, skip_header = skip_header)
In Python, a function can have a variable number of parameters, and you can pass only a subset of them by naming the ones you need. Arrays are very powerful matrix-like objects that can easily be sliced into smaller arrays:
x = data[:, column_x]
y = data[:, column_y]
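The idea of naming only the parameters you need can be sketched without NumPy. Here read_table() is a hypothetical stand-in written for this illustration (it is not genfromtxt()), and the inline sample with a single header line is an assumption about the layout, not the real anscombe.csv.

```python
def read_table(text, delimiter=",", skip_header=0):
    """Hypothetical table reader: returns the data rows as lists of floats."""
    lines = text.splitlines()[skip_header:]
    return [
        [float(field) for field in line.split(delimiter)]
        for line in lines
        if line.strip()
    ]

# Small inline sample: one header line, tab-separated columns.
sample = "x\ty\n10.0\t8.04\n8.0\t6.95\n13.0\t7.58\n"

# Only the parameters that differ from their defaults are named in the call.
rows = read_table(sample, delimiter="\t", skip_header=1)

# Plain-list counterpart of selecting one column from every row.
x = [row[0] for row in rows]
y = [row[1] for row in rows]
print(x, y)
```

With a NumPy array the two list comprehensions collapse into the slicing expressions used in the article.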
In an index such as data[:, column_x], the colon selects the entire range along that axis; it can also be used to select a subrange. Because the end of a slice is excluded, selecting the first two rows of the array requires:
first_two_rows = data[0:2, :]
Fitting data
SciPy provides convenient data fitting functions, such as the linregress() function. This function provides some important values related to the fit, such as the slope, the intercept, and the correlation coefficient of the two data sets:
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
Because linregress() provides several pieces of information, the results can be saved to several variables at the same time.
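Returning several values and unpacking them in one assignment works for any Python function. As a rough sketch, the hypothetical linear_fit() below computes an ordinary least-squares line by hand; it is not SciPy's implementation and returns only three of the five values. The data are the published values of the first Anscombe set.

```python
import math

def linear_fit(x, y):
    """Hypothetical least-squares fit; returns (slope, intercept, r_value)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    r_value = s_xy / math.sqrt(s_xx * s_yy)
    return slope, intercept, r_value

# First set of Anscombe's quartet.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# The returned tuple is unpacked into three variables at once.
slope, intercept, r_value = linear_fit(x, y)
print("Slope: {:f}".format(slope))
print("Intercept: {:f}".format(intercept))
print("Correlation coefficient: {:f}".format(r_value))
```

The function returns a tuple, and the assignment on the left-hand side splits it into separate names, which is the same mechanism used with linregress().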
Drawing
The Matplotlib library only draws data points, so you should define the coordinates of the points to be drawn. The x and y arrays have already been defined, so you can draw them directly, but you also need data points that represent the straight line:
fit_x = np.linspace(x.min() - 1, x.max() + 1, 100)
The linspace() function conveniently generates a set of equally spaced values between two values. The ordinates can be easily calculated with the powerful NumPy arrays, which can be used in formulas like ordinary numeric variables:
fit_y = slope * fit_x + intercept
The formula is applied element by element to the array; therefore, the result has the same number of entries as the initial array.
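To show what these two steps compute, here is a sketch with plain Python lists. The linspace() function defined below is a hypothetical stand-in for np.linspace() with the endpoint included, and the slope and intercept are rounded, illustrative values.

```python
def linspace(start, stop, num):
    """Hypothetical list-based stand-in for np.linspace (endpoint included)."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]

slope, intercept = 0.5, 3.0   # rounded values, for illustration only

# 100 equally spaced abscissas from 3 to 15 (one unit beyond x = 4 and x = 14).
fit_x = linspace(3.0, 15.0, 100)

# The formula is evaluated once per element, so the lengths match.
fit_y = [slope * value + intercept for value in fit_x]
print(len(fit_x), len(fit_y))
```

With NumPy arrays the list comprehension is unnecessary, because arithmetic operators already act on every element.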
To draw, first define a figure object that will contain all the graphics:
fig_width = 7 # inch
fig_height = fig_width / 16 * 9 # inch
fig_dpi = 100
fig = plt.figure(figsize = (fig_width, fig_height), dpi = fig_dpi)
A figure can hold several plots; in Matplotlib, these plots are called axes. This example defines a single axes object to draw the data points:
ax = fig.add_subplot(111)
ax.plot(fit_x, fit_y, label = "Fit", linestyle = '-')
ax.plot(x, y, label = "Data", marker = '.', linestyle = '')
ax.legend()
ax.set_xlim(min(x) - 1, max(x) + 1)
ax.set_ylim(min(y) - 1, max(y) + 1)
ax.set_xlabel('x')
ax.set_ylabel('y')
Save the figure to a PNG image file with:
fig.savefig('fit_python.png')
If you want to display (rather than save) the plot, call:
plt.show()
This example references all the objects used in the drawing section: it defines the object fig and the object ax. This is technically unnecessary, because the plt object can be used directly to draw the data sets. The Matplotlib tutorial shows an interface like this:
plt.plot(fit_x, fit_y)
Frankly, I don't like this approach because it hides the important interactions that take place between various objects. Unfortunately, sometimes official examples are a bit confusing because they tend to use different methods. In this simple example, referencing graphical objects is unnecessary, but in more complex examples (such as when embedding graphics in a graphical user interface), referencing graphical objects becomes important.
Result
The output on the command line is:
# Anscombe's first set with Python #
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
This is the image produced by Matplotlib:
Plot and fit of the dataset obtained with Python
The GNU Octave way
The GNU Octave language is mainly intended for numerical computation. It provides a simple syntax for manipulating vectors and matrices, and has some powerful plotting tools. It is an interpreted language, like Python. Because Octave's syntax is mostly compatible with MATLAB, it is often described as a free alternative to MATLAB. Octave does not appear in the rankings of the most popular programming languages, but MATLAB does, so Octave is rather popular in a sense. MATLAB predates NumPy, and I have the feeling that NumPy was inspired by it. When you go through this example, you will see the similarities.
Installation
The fitting_octave.m example requires only the basic Octave package, which is fairly easy to install on Fedora:
sudo dnf install octave
Code comments
In Octave, you can add comments to the code with the percent sign (%), or you can use # if you don't need to be compatible with MATLAB. The option to use # allows you to write the same special comment line as in the Python example to execute the script directly on the command line.
Required libraries
Everything used in this example is included in the base package, so you don't need to load any new libraries. If you need a library, the syntax is pkg load module. This command adds the module's functions to the list of available functions. In this regard, Python has more flexibility.
Defining variables
Variables are defined with basically the same syntax as in Python:
input_file_name = "anscombe.csv";
delimiter = "\t";
skip_header = 3;
column_x = 1;
column_y = 2;
Note that there is a semicolon at the end of each line; this is not required, but it suppresses the output of the result of the line. Without a semicolon, the interpreter prints the result of the expression:
octave:1> input_file_name = "anscombe.csv"
input_file_name = anscombe.csv
octave:2> sqrt(2)
ans = 1.4142
Printing output
The powerful printf() function is used to print on the terminal. Unlike Python's print(), the printf() function does not automatically add a newline at the end of the printed string, so you have to add it yourself. The first argument is a string that can contain format information for the other arguments passed to the function, such as:
printf("Slope: %f\n", slope);
In Python, the formatting is built into the string itself, but in Octave, it is specific to the printf() function.
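For comparison from the Python side, the sketch below places the two approaches next to each other. Python also supports a printf-style % operator on strings, and sys.stdout.write() behaves like printf() in that it adds no newline. The slope value is illustrative.

```python
import sys

slope = 0.500091   # illustrative value

# str.format(): the formatting is a method of the string object.
with_format = "Slope: {:f}".format(slope)

# printf-style "%" operator: the same "%f" conversion that printf() uses.
with_percent = "Slope: %f" % slope

# Like printf(), write() adds no newline, so it is appended explicitly.
sys.stdout.write(with_percent + "\n")
```

Both expressions produce the same text, so the choice between them in Python is mostly a matter of style.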
Read data
The dlmread() function reads text files structured like CSV files:
data = dlmread(input_file_name, delimiter, skip_header, 0);
The result is a matrix object, which is one of the basic data types in Octave. Matrices can be sliced with a syntax similar to Python's:
x = data(:, column_x);
y = data(:, column_y);
The fundamental difference is that the index starts at 1, not 0. Therefore, in this example, column x is the first column.
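The difference in indexing can be sketched from the Python side as follows. The three rows are a small illustrative sample, and the variable names are hypothetical.

```python
rows = [[10.0, 8.04], [8.0, 6.95], [13.0, 7.58]]

octave_column_x = 1                      # first column, counted the Octave way
python_column_x = octave_column_x - 1    # the same column, counted from 0

x = [row[python_column_x] for row in rows]

# Octave ranges include both ends, so data(1:2, :) selects two rows.
# Python slices exclude the end, so the same two rows are rows[0:2].
first_two = rows[0:2]
print(x, first_two)
```

When porting code between the two languages, both the starting index and the end of each range need to be adjusted.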
Fitting data
To fit the data with a straight line, you can use the polyfit() function. It fits the input data with a polynomial, so you only need to use a first-order polynomial:
p = polyfit(x, y, 1);
slope = p(1);
intercept = p(2);
The result is a matrix with the polynomial coefficients; therefore, the code selects its first two elements. To determine the correlation coefficient, use the corr() function:
r_value = corr(x, y);
Finally, print the results using the printf() function:
printf("Slope: %f\n", slope);
printf("Intercept: %f\n", intercept);
printf("Correlation coefficient: %f\n", r_value);
Drawing
As with the Matplotlib example, you first need to create a data set that represents the fitted line:
fit_x = linspace(min(x) - 1, max(x) + 1, 100);
fit_y = slope * fit_x + intercept;
The similarity to NumPy is also evident here, because Octave uses a linspace() function that behaves like its NumPy equivalent.
Again, as with Matplotlib, first create a figure object, and then create an axes object to hold the plots:
fig_width = 7; %inch
fig_height = fig_width / 16 * 9; %inch
fig_dpi = 100;
fig = figure("units", "inches", "position", [1, 1, fig_width, fig_height]);
ax = axes("parent", fig);
set(ax, "fontsize", 14);
set(ax, "linewidth", 2);
To set the properties of the axes object, use the set() function. The interface is rather confusing, though, because the function expects a comma-separated list of property and value pairs. Each pair is just a string with the name of the property followed by a second object with the value for that property. There are other functions that set various properties:
xlim(ax, [min(x) - 1, max(x) + 1]);
ylim(ax, [min(y) - 1, max(y) + 1]);
xlabel(ax, 'x');
ylabel(ax, 'y');
Plotting is done with the plot() function. The default behavior is to reset the axes with each call, so you need to use the hold() function:
hold(ax, "on");
plot(ax, fit_x, fit_y, "marker", "none", "linestyle", "-", "linewidth", 2);
plot(ax, x, y, "marker", ".", "markersize", 20, "linestyle", "none");
hold(ax, "off");
As shown above, you can also pass property and value pairs to the plot() function. The legend must be created separately, and the labels have to be stated manually:
lg = legend(ax, "Fit", "Data");
set(lg, "location", "northwest");
Finally, save the output to a PNG image:
image_size = sprintf("-S%f,%f", fig_width * fig_dpi, fig_height * fig_dpi);
image_resolution = sprintf("-r%d", fig_dpi);
print(fig, 'fit_octave.png', '-dpng', image_size, image_resolution);
Confusingly, in this case the options are passed as a single string containing both the option name and its value. Because Octave strings do not have formatting facilities like Python's, you must use the sprintf() function. It behaves like the printf() function, but its result is returned as a string instead of being printed.
In this example, as in the Python one, the graphical objects are referenced explicitly to keep their interactions evident. If Python's documentation in this area is a little confusing, Octave's documentation is even worse. Most of the examples I found do not bother referencing the objects; instead, they rely on the fact that the plotting commands act on the currently active figure. A global root graphics object keeps track of the existing figures and axes.
Result
The output on the command line is:
# Anscombe's first set with Octave #
Slope: 0.500091
Intercept: 3.000091
Correlation coefficient: 0.816421
This is the resulting image generated with Octave:
Plot and fit of the dataset obtained with Octave
Next
Both Python and GNU Octave can plot the same information, although they are implemented differently. If you want to explore other languages to accomplish similar tasks, I strongly recommend that you take a look at Rosetta Code. This is a great resource to see how to solve the same problem in multiple languages.