
What are the problems with Jupyter Notebook?

2025-01-19 Update From: SLTechnology News&Howtos


This article looks at the problems with Jupyter Notebook that many people run into in real projects, and at how to deal with them.

The problem with Jupyter Notebook

If you try to take your Jupyter prototypes to the next level, things may not turn out as you expected. These are some of the situations the author has encountered when using this tool, and they may sound familiar:

Once all objects (functions or classes) are defined and instantiated, maintainability becomes very difficult: even a small change to a function means finding it somewhere in the notebook, fixing it there, and rerunning the code. You don't want this to happen. Isn't it easier to separate logic and processing functions into external scripts?

Because of its interactivity and instant feedback, Jupyter Notebook encourages data scientists to declare variables in the global namespace instead of using functions. This is considered bad practice in Python development, and it limits effective code reuse.

Because the notebook becomes a large state machine that holds all variables, reproducibility suffers as well. In this setup, you must remember which results are cached and which are not, and you must expect other users to follow your cell execution order.

The way notebooks are stored behind the scenes (as JSON objects) makes code versioning difficult. This is why I rarely see data scientists using Git to commit different versions of a notebook or to merge branches for specific features.

As a result, teamwork becomes inefficient and clumsy: team members start exchanging code snippets and notebooks via email or Slack, rolling back to previous versions of code becomes a nightmare, and file organization begins to get messy. This is what I usually see in a project after using Jupyter notebook for two to three weeks without proper version control:

analysis.ipynb
analysis_COPY (1).ipynb
analysis_COPY (2).ipynb
analysis_FINAL.ipynb
analysis_FINAL_2.ipynb

Jupyter notebooks are ideal for exploration and rapid prototyping. They are certainly not designed for reusability or production use. If you have developed a data processing pipeline in a Jupyter notebook, the most you can say is that the code runs on your laptop or VM in a linear, synchronous fashion, following the cell execution order.

This says nothing about how your code behaves in more complex environments, such as larger input datasets, other asynchronous or parallel tasks, or fewer allocated resources. In fact, notebooks are difficult to test because their behavior is sometimes unpredictable.

As someone who spends most of my time in VSCode, I often take advantage of powerful extensions for linting, styling, code structuring, autocompletion, and codebase search, so I can't help feeling powerless when I switch back to Jupyter. Compared with VSCode, Jupyter Notebook lacks extensions that enforce good programming practices.
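To make the hidden-state problem described above concrete, here is a minimal sketch in plain Python. The names `prices`, `DISCOUNT_RATE`, `apply_discount_in_place`, and `apply_discount` are hypothetical and serve only as an illustration: calling the first function twice stands in for re-running a notebook cell that mutates a global variable.

```python
# Hypothetical example: each call to apply_discount_in_place() plays the
# role of re-running a notebook cell that mutates a global variable.
DISCOUNT_RATE = 0.5
prices = [100.0, 200.0]


def apply_discount_in_place():
    """Mutate the global list, like a cell that overwrites its own input."""
    for index, price in enumerate(prices):
        prices[index] = price * DISCOUNT_RATE


def apply_discount(values, rate=DISCOUNT_RATE):
    """Return a new list and leave the input untouched."""
    return [value * rate for value in values]


apply_discount_in_place()
apply_discount_in_place()  # "running the cell" a second time
print(prices)  # the discount has now been applied twice

original = [100.0, 200.0]
print(apply_discount(original))  # same result however often it is called
print(original)  # the input is unchanged
```

The function that takes its input as an argument gives the same answer regardless of execution order, which is what makes it easier to reuse and to test.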

All right, enough complaining. The author really likes Jupyter and thinks it is very good at what it was designed to do. You can certainly use it to bootstrap small projects or quickly prototype ideas, but you still have to follow the principles of software engineering. When data scientists use notebooks, they sometimes ignore these principles. Let's review some of them.

Tips for making the code good again

These tips are compiled from different projects, meetups the author attended, and discussions with software engineers and architects the author has worked with. Note that the following assumes we are writing Python scripts, not notebooks.

1. Clean up the code

The most important dimension of code quality is clarity: clear, readable code is critical to collaboration and maintainability. The following will help you write cleaner code:

Use meaningful, descriptive variable names. For example, if you want to declare a Boolean variable based on an attribute such as age to check whether a person is old, you can call it is_old, which is both descriptive and type-informative. The same applies to how you declare your data: make the name self-explanatory.

    # not good...
    import pandas as pd
    df = pd.read_csv(path)

    # better
    boxes = pd.read_csv(path)

Avoid using abbreviations that only you can understand and long variable names that no one can stand.

Do not hard-code "magic numbers" directly in the code. Define them as named variables so that everyone can understand what they mean.

    # not good...
    optimizer = SGD(0.0045, momentum=True)

    # better!
    learning_rate = 0.0045
    optimizer = SGD(learning_rate, momentum=True)

Follow the PEP 8 conventions when naming objects: for example, function and method names are lowercase with words separated by underscores, class names follow the UpperCaseCamelCase convention, constants are written in uppercase, and so on.

Use indentation and whitespace to make the code easier to read. There are some standard conventions, such as "use 4 spaces per indentation level", "separate sections with extra blank lines", and so on.
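The naming and layout conventions above can be sketched in a few lines. Everything here is hypothetical and chosen only for illustration: the `Person` class, the `count_old_people` function, and the threshold of 65 are assumptions, not something prescribed by PEP 8 or by the original text.

```python
# Hypothetical names chosen only to illustrate PEP 8 naming conventions.
RETIREMENT_AGE = 65  # constant: upper case with underscores (assumed value)


class Person:  # class name: CapWords
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def is_old(self):  # method name: lower case with underscores
        """Return True if the person has reached RETIREMENT_AGE."""
        return self.age >= RETIREMENT_AGE


def count_old_people(people):  # function name: lower case with underscores
    """Count how many people in the iterable are old."""
    return sum(person.is_old() for person in people)


people = [Person("Ada", 70), Person("Linus", 30)]
print(count_old_people(people))
```

Note the 4-space indentation, the two blank lines between top-level definitions, and the named constant in place of a magic number.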

2. Modularize the code

When you start to build something that can be reused in the same or other projects, you must organize your code into logical functions and modules, which helps build better organization and maintainability.

For example, suppose you are working on an NLP project and have several processing functions for text data (tokenizing, stripping URLs, lemmatizing, etc.). You can put all of them into a Python module called text_processing.py and import them from it, which keeps the main program lighter.

Here are some tips for writing modular code:

Don't repeat yourself. Generalize or merge your code as much as possible.

A function should do one thing. If a function performs multiple operations, it is difficult to generalize.

Abstract logic into functions, but don't over-engineer, or you may end up with too many modules. Use your judgment. If you are inexperienced, look at popular GitHub repositories such as scikit-learn and learn from their coding style.
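As a rough sketch of what such a text_processing.py module might look like, the example below uses only the standard library. The function names `strip_urls`, `tokenize`, and `preprocess` and both regular expressions are assumptions made for illustration; a real project would likely rely on a dedicated NLP library, and lemmatizing is left out because it needs one.

```python
# text_processing.py (hypothetical contents, standard library only)
import re

URL_PATTERN = re.compile(r"https?://\S+")
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def strip_urls(text):
    """Remove http(s) URLs from the text."""
    return URL_PATTERN.sub("", text)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def preprocess(text):
    """Compose the single-purpose helpers into one pipeline step."""
    return tokenize(strip_urls(text))


if __name__ == "__main__":
    print(preprocess("Read the docs at https://example.com before you start!"))
```

Each function does one thing, and the main program only needs `from text_processing import preprocess`.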

3. Refactoring code

Refactoring means reorganizing the internal structure of the code without changing its functionality, usually starting from a working (but not yet well-organized) version of the code. It helps eliminate duplicated functions, reorganize the file structure, and add more abstraction.
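A small before-and-after sketch may help here. The function names and the min-max scaling logic are hypothetical, and the code assumes each input list contains at least two distinct values (otherwise the division would fail).

```python
# Before refactoring: the same scaling logic is written out twice.
def scale_datasets_before(train_values, test_values):
    train_min, train_max = min(train_values), max(train_values)
    train_scaled = [(v - train_min) / (train_max - train_min) for v in train_values]
    test_min, test_max = min(test_values), max(test_values)
    test_scaled = [(v - test_min) / (test_max - test_min) for v in test_values]
    return train_scaled, test_scaled


# After refactoring: the duplicated logic lives in one helper.
def min_max_scale(values):
    """Scale values linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def scale_datasets_after(train_values, test_values):
    return min_max_scale(train_values), min_max_scale(test_values)


# The behaviour is unchanged, which is the point of refactoring.
train, test = [1, 2, 3], [10, 20, 40]
assert scale_datasets_before(train, test) == scale_datasets_after(train, test)
```

Checking that the old and new versions return the same result is a cheap way to confirm that a refactoring did not alter functionality.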

4. Improve code efficiency

Writing efficient code that executes quickly and consumes less memory and storage is another important skill in software development. It takes years of experience to write efficient code, but here are a few tips to help you determine whether your code is slow and how to improve it:

Before running anything, check the complexity of your algorithm to estimate its execution time.

Look for possible bottlenecks in your script by measuring the run time of each operation.

Avoid for loops and vectorize operations whenever possible, especially if libraries such as NumPy or pandas are used.

Take advantage of the computer's CPU cores by using multiprocessing.
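The first two tips (checking complexity and measuring run time) can be illustrated with the standard-library `timeit` module. The duplicate-detection task and both function names are hypothetical and were picked only because the two versions have clearly different complexity.

```python
import timeit


def has_duplicates_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    """Build a set first, so the expected cost is O(n) for hashable items."""
    return len(set(items)) != len(items)


if __name__ == "__main__":
    data = list(range(1000))  # worst case: no duplicates at all
    for func in (has_duplicates_quadratic, has_duplicates_linear):
        seconds = timeit.timeit(lambda: func(data), number=5)
        print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

Exact timings depend on the machine, so none are quoted here, but the gap between the two versions should grow quickly as the input gets larger.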

5. Use Git or any other version control system

Using Git + GitHub helped me improve my coding skills and organize my projects better. Because I use it when working with friends and colleagues, I stick to standards I did not follow in the past.

Whether in data science or software development, there are many advantages to using version control systems.

Track your changes

Roll back to any previous version of the code

Effective collaboration among team members through merge and pull requests

Improve code quality

Code review

Assign tasks to team members and provide "continuous integration" and "continuous delivery" hooks to build and deploy projects automatically.


6. Test code

If you are building a data pipeline that performs a series of operations and want to make sure it works as designed, one way is to write tests that check the expected behavior. Tests can be as simple as checking the output shape or the expected value returned by a function.

Writing tests for functions and modules has many benefits:

It improves the stability of the code and makes errors easier to find.

It prevents unexpected outputs.

It helps detect edge cases.

It prevents broken code from being pushed to production.
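A minimal sketch of such tests, using the standard-library `unittest` module, is shown below. The function under test, `add_is_old_flag`, and its default threshold of 65 are hypothetical and exist only to give the tests something to check.

```python
import unittest


def add_is_old_flag(records, age_threshold=65):
    """Return new records with a boolean 'is_old' field added."""
    return [
        {**record, "is_old": record["age"] >= age_threshold}
        for record in records
    ]


class AddIsOldFlagTest(unittest.TestCase):
    def test_output_has_same_length_as_input(self):
        records = [{"age": 30}, {"age": 70}]
        self.assertEqual(len(add_is_old_flag(records)), len(records))

    def test_expected_values(self):
        result = add_is_old_flag([{"age": 30}, {"age": 70}])
        self.assertEqual([row["is_old"] for row in result], [False, True])

    def test_edge_case_empty_input(self):
        self.assertEqual(add_is_old_flag([]), [])
```

If this code were saved in a file, the tests could be run with `python -m unittest` from the same directory. The three tests mirror the points above: output shape, expected values, and an edge case.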

7. Use logging

Once the first version of the code is running, you need to monitor each step to understand what happened, track progress, or find errors. Logging lets you do this. Here are some techniques for using logging effectively:

Use different levels (debug, info, warning) depending on the nature of the message to be logged.

Provide useful information in the log to help resolve related problems.

    import logging

    logging.basicConfig(filename='example.log', level=logging.DEBUG)
    logging.debug('This message should go to the log file')
    logging.info('So should this')
    logging.warning('And this, too')
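As a slightly fuller sketch of the logging tips above, the example below attaches a formatter so that every message carries a timestamp, a level, and the logger name. The helper names `build_logger` and `load_rows`, the logger name "pipeline", and the row format are all hypothetical.

```python
import logging


def build_logger(name, stream=None):
    """Create a logger whose messages include time, level and logger name.

    Call this once per name: every call adds another handler.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


def load_rows(rows, logger):
    """Keep rows that have an 'age' field and log what was skipped."""
    valid = [row for row in rows if "age" in row]
    logger.debug("received %d rows", len(rows))
    if len(valid) < len(rows):
        logger.warning(
            "skipped %d rows without an 'age' field", len(rows) - len(valid)
        )
    logger.info("kept %d rows", len(valid))
    return valid


if __name__ == "__main__":
    pipeline_logger = build_logger("pipeline")
    load_rows([{"age": 30}, {"name": "no age"}], pipeline_logger)
```

The warning states how many rows were dropped and why, which is the kind of detail that helps when tracking down a problem later.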
