How to build Jupyter and data Science Environment on Fedora

2025-01-18 Update From: SLTechnology News&Howtos

This article explains how to set up Jupyter and a data science environment on Fedora.

Jupyter IDE

Most modern data scientists work with Python. An important part of their work is exploratory data analysis (EDA). EDA is a manual, interactive process that includes extracting data, exploring its features, searching for correlations, visualizing the data with plots to understand its distribution, and prototyping predictive models.

Jupyter is a web application well suited to this job. The notebook files Jupyter uses support rich text, including beautifully rendered mathematical formulas (thanks to MathJax), code blocks, and code output, including graphical output.

The suffix of the Notebook file is .ipynb, which means "interactive Python Notebook".

Set up and run Jupyter

First, install the Jupyter core package using sudo:

$ sudo dnf install python3-notebook mathjax sscg

You may need to install some additional optional modules commonly used by data scientists:

$ sudo dnf install python3-seaborn python3-lxml python3-basemap python3-scikit-image python3-scikit-learn python3-sympy python3-dask+dataframe python3-nltk

Set a password for logging in to the notebook web interface, so you don't have to use lengthy tokens. You can run the following commands anywhere in your terminal:

$ mkdir -p $HOME/.jupyter
$ jupyter notebook password

Then enter your password, and the $HOME/.jupyter/jupyter_notebook_config.json file will be created automatically, containing the encrypted version of your password.

Next, generate a self-signed HTTPS certificate for Jupyter's web server using sscg:

$ cd $HOME/.jupyter; sscg

The last step in configuring Jupyter is to edit the $HOME/.jupyter/jupyter_notebook_config.json file, following the template below:

{
  "NotebookApp": {
    "password": "sha1:abf58...87b",
    "ip": "*",
    "allow_origin": "*",
    "allow_remote_access": true,
    "open_browser": false,
    "websocket_compression_options": {},
    "certfile": "/home/aviram/.jupyter/service.pem",
    "keyfile": "/home/aviram/.jupyter/service-key.pem",
    "notebook_dir": "/home/aviram/Notebooks"
  }
}

Replace /home/aviram/ with your own home directory. The sha1:abf58...87b section was generated automatically when you created the password. service.pem and service-key.pem are the certificate and key files generated by sscg.

Next, create a folder to store Notebook files, which should be the same as notebook_dir in the above configuration:

$ mkdir $HOME/Notebooks

You have completed the configuration. You can now start Jupyter Notebook anywhere in the system with the following command:

$ jupyter notebook

Or add the following line of code to the $HOME/.bashrc file to create a shortcut command called jn:

alias jn='jupyter notebook'

After running the jn command, you can open the Jupyter user interface from any browser on the network (replace the domain name with your server's domain name; Jupyter listens on port 8888 by default) and log in with the password you set above. Try typing some Python code and markup text; it looks something like this:

(Figure: Jupyter with a simple notebook)

In addition to the IPython environment, the installation generates a web-based Unix terminal provided by terminado. Some people find this practical, while others consider it insecure. You can disable this feature in the configuration file.

JupyterLab: the next Generation Jupyter

JupyterLab is the next generation of Jupyter, with a better user interface and more flexible workspaces. At the time of writing, JupyterLab does not have an RPM package, but you can easily install it using pip:

$ pip3 install jupyterlab --user
$ jupyter serverextension enable --py jupyterlab

Then run the jupyter notebook command or the jn shortcut, browse to the server address again (replace the domain name with your server's domain name), and you can use JupyterLab.

Tools used by data scientists

In the following section, you will learn about some of the tools used by data scientists and how to install them. Unless otherwise noted, these tools have Fedora packages and were already installed as dependencies of the components above.

Numpy

NumPy is a high-level library, backed by optimized C code, for handling large in-memory datasets. It supports multidimensional arrays and operations on them, and includes mathematical functions such as log(), exp(), the trigonometric functions, and so on.
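As a quick illustration (the array values here are arbitrary), a short NumPy sketch:

```python
# NumPy in a nutshell: multidimensional arrays plus vectorized math functions.
import numpy as np

a = np.arange(6).reshape(2, 3)   # a 2x3 array: [[0, 1, 2], [3, 4, 5]]
col_sums = a.sum(axis=0)         # sum down each column -> [3, 5, 7]

# Mathematical functions apply element-wise to whole arrays.
roundtrip = np.exp(np.log(np.array([1.0, 2.0, 3.0])))  # back to [1., 2., 3.]
print(col_sums, roundtrip)
```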

Pandas

In my opinion, it was Pandas that made Python a platform for data science. Pandas is built on top of NumPy and makes data preparation and presentation much easier. You can think of it as a spreadsheet program without a user interface, but one that can handle much larger datasets. Pandas supports extracting data from SQL databases or CSV-format files, operating by column or row, filtering data, and visualizing data through Matplotlib.
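A brief sketch of those Pandas operations (the data below is made up for illustration):

```python
# Pandas basics: build a DataFrame, filter rows, aggregate by column.
import pandas as pd

df = pd.DataFrame({
    "city": ["Haifa", "Haifa", "Tel Aviv"],
    "temp": [24.0, 26.0, 30.0],
})

hot = df[df["temp"] > 25]                   # row filtering, spreadsheet-style
means = df.groupby("city")["temp"].mean()   # column-wise aggregation per group
print(means)
```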

Matplotlib

Matplotlib is a library for drawing 2D and 3D data images, providing pretty good support for image annotations, labels, and overlay layers.

(Figure: A pair of Matplotlib graphs showing a cost function approaching its optimal value through a gradient descent algorithm)
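A minimal sketch of this kind of plotting (using the off-screen Agg backend so no display is needed; the file name is arbitrary):

```python
# Matplotlib basics: a labelled 2D curve rendered off-screen and saved as PNG.
import matplotlib
matplotlib.use("Agg")            # render without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")   # the curve itself
ax.set_xlabel("x")                      # axis annotations
ax.set_ylabel("y")
ax.legend()                             # overlay legend
fig.savefig("sine.png")
```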

Seaborn

Seaborn is built on Matplotlib, with plotting functions optimized for statistical work on data. For example, it can automatically draw the approximate regression line or normal distribution curve of the plotted data.

(Figure: Linear regression visualised with Seaborn)
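As an illustration of the automatic regression line, a small sketch (assuming seaborn is installed; the data is synthetic):

```python
# Seaborn sketch: regplot draws a scatter plot plus the fitted regression line.
import matplotlib
matplotlib.use("Agg")            # render without a display
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
x = np.arange(50, dtype=float)
y = 2.0 * x + rng.normal(0, 5, size=50)   # a noisy linear relationship

ax = sns.regplot(x=x, y=y)       # scatter + estimated regression line, automatically
ax.figure.savefig("regression.png")
```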

StatsModels

StatsModels provides algorithms for data analysis problems in statistics and econometrics, such as linear regression and logistic regression, as well as the classical ARIMA family of time-series algorithms.

(Figure: Normalized number of passengers over time (blue) and ARIMA-predicted number of passengers (red))

Scikit-learn

As the core component of the machine learning ecosystem, Scikit-learn provides prediction algorithms for different types of problems, including regression (Elastic Net, Gradient Boosting, random forests, and more), classification, and clustering (K-means, DBSCAN, and more), all behind a well-designed API. Scikit-learn also defines specialized Python classes supporting advanced data manipulation techniques, such as splitting datasets into training and test sets, dimensionality reduction algorithms, data preparation pipelines, and so on.
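For instance, a short sketch combining a train/test split with one of those regressors:

```python
# Scikit-learn sketch: split a dataset into training and test sets,
# fit a random-forest regressor, and score it on held-out data.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)   # R^2 on the held-out test set
print(round(r2, 3))
```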

XGBoost

XGBoost is a leading regressor and classifier in use today. It is not part of Scikit-learn, but it follows Scikit-learn's API. XGBoost has no Fedora package, but it can be installed using pip. An Nvidia graphics card can speed up XGBoost, although not through the pip package; to use that feature you can compile it yourself against CUDA (Nvidia's parallel computing platform). Install XGBoost with the following command:

$ pip3 install xgboost --user

Imbalanced-learn

Imbalanced-learn is a tool for solving the problems of undersampled and oversampled data. In anti-fraud problems, for example, fraud examples are very rare compared to normal ones, so the fraud data needs to be augmented so that the predictor can fit the dataset better. Install using pip:

$ pip3 install imblearn --user

NLTK

The Natural Language Toolkit (NLTK) is a tool for processing human-language data. It can be used, for example, to develop a chatbot.
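As a tiny illustration (the TreebankWordTokenizer works without downloading any extra corpora):

```python
# NLTK sketch: split a sentence into word tokens.
from nltk.tokenize import TreebankWordTokenizer

tokens = TreebankWordTokenizer().tokenize(
    "Jupyter makes data science interactive.")
print(tokens)   # word tokens, with punctuation separated out
```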

SHAP

Machine learning algorithms have strong predictive power, but they cannot explain well why they make one prediction or another. SHAP addresses this problem by analyzing a trained model.

(Figure: Where SHAP fits into the data analysis process)

Install using pip:

$ pip3 install shap --user

Keras

Keras is a library for building deep learning and neural network models. Install it using pip:

$ sudo dnf install python3-h5py
$ pip3 install keras --user
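With Keras installed (it is bundled with TensorFlow here), a tiny illustrative model might look like this (the data and architecture are arbitrary):

```python
# Keras sketch: a tiny fully-connected classifier trained briefly on
# synthetic data.
import numpy as np
from tensorflow import keras

X = np.random.rand(100, 4)
y = (X.sum(axis=1) > 2.0).astype(int)        # a simple synthetic target

model = keras.Sequential([
    keras.layers.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, verbose=0)         # just a couple of epochs
pred = model.predict(X[:3], verbose=0)       # three probabilities
```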

TensorFlow

TensorFlow is a very popular neural network modeling tool, installed using pip:

$ pip3 install tensorflow --user
