2025-03-13 Update From: SLTechnology News & Howtos
Today I would like to talk about 10 practical tips for deploying machine learning models with Python, which many people may not know well. I have summarized the following for you, and I hope you get something out of this article.
Sometimes, as data scientists, we forget what the company pays us to do. We are first developers, then researchers, and then maybe mathematicians. Our primary responsibility is to quickly develop bug-free solutions.
Just because we can make models doesn't mean we are gods. It doesn't give us the freedom to write junk code.
I have made plenty of mistakes myself, and I would like to share with you the skills I have seen most often missing in ML projects. In my opinion, these are also the skills most lacking in the industry right now.
I call them "software illiterate" because many of them are engineers who did not come through a computer science curriculum but learned on platforms like Coursera. I used to be one myself.
If I had to choose between hiring a great data scientist and a great ML engineer, I would pick the latter. Let's get started.
1. Learn to write abstract classes
Once you start writing abstract classes, you will see how much clarity they bring to your code base. They enforce the same method names and signatures across implementations. When many people work on the same project without them, everyone starts inventing a different approach, which leads to confusing, inefficient code.
```python
import os
from abc import ABCMeta, abstractmethod


class DataProcessor(metaclass=ABCMeta):
    """Base processor to be used for all preparation."""

    def __init__(self, input_directory, output_directory):
        self.input_directory = input_directory
        self.output_directory = output_directory

    @abstractmethod
    def read(self):
        """Read raw data."""

    @abstractmethod
    def process(self):
        """Processes raw data. This step should create the raw dataframe with
        all the required features. Shouldn't implement statistical or text
        cleaning."""

    @abstractmethod
    def save(self):
        """Saves processed data."""


class Trainer(metaclass=ABCMeta):
    """Base trainer to be used for all models."""

    def __init__(self, directory):
        self.directory = directory
        self.model_directory = os.path.join(directory, 'models')

    @abstractmethod
    def preprocess(self):
        """This takes the preprocessed data and returns clean data.
        This is more about statistical or text cleaning."""

    @abstractmethod
    def set_model(self):
        """Define model here."""

    @abstractmethod
    def fit_model(self):
        """This takes the vectorised data and returns a trained model."""

    @abstractmethod
    def generate_metrics(self):
        """Generates metric with trained model and test data."""

    @abstractmethod
    def save_model(self, model_name):
        """This method saves the model in our required format."""


class Predict(metaclass=ABCMeta):
    """Base predictor to be used for all models."""

    def __init__(self, directory):
        self.directory = directory
        self.model_directory = os.path.join(directory, 'models')

    @abstractmethod
    def load_model(self):
        """Load model here."""

    @abstractmethod
    def preprocess(self):
        """This takes the raw data and returns clean data for prediction."""

    @abstractmethod
    def predict(self):
        """This is used for prediction."""


class BaseDB(metaclass=ABCMeta):
    """Base database class to be used for all DB connectors."""

    @abstractmethod
    def get_connection(self):
        """This creates a new DB connection."""

    @abstractmethod
    def close_connection(self):
        """This closes the DB connection."""
```
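As a quick sketch of the payoff, here is a hypothetical `CsvProcessor` (the class and file names are made up) implementing the `DataProcessor` contract; the base class is condensed here so the snippet runs on its own:

```python
import os
from abc import ABCMeta, abstractmethod


# Condensed version of the article's DataProcessor base class.
class DataProcessor(metaclass=ABCMeta):
    def __init__(self, input_directory, output_directory):
        self.input_directory = input_directory
        self.output_directory = output_directory

    @abstractmethod
    def read(self):
        """Read raw data."""

    @abstractmethod
    def process(self):
        """Create the raw dataframe with all required features."""

    @abstractmethod
    def save(self):
        """Save processed data."""


# Hypothetical concrete processor: every subclass is forced to use the
# same method names, so pipelines stay uniform across the team.
class CsvProcessor(DataProcessor):
    def read(self):
        return os.path.join(self.input_directory, 'raw.csv')

    def process(self):
        return 'processed'

    def save(self):
        return os.path.join(self.output_directory, 'clean.csv')


processor = CsvProcessor('in', 'out')
print(processor.process())  # prints "processed"


# Forgetting a required method is caught at instantiation, not in prod:
class BadProcessor(DataProcessor):
    def read(self):
        pass


try:
    BadProcessor('in', 'out')
except TypeError:
    print('refused: abstract methods not implemented')
```

Note that the missing `process` and `save` methods make `BadProcessor` fail immediately with a `TypeError`, which is exactly the early feedback you want on a shared code base.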
2. Set your random seeds up front
Reproducibility of experiments is very important, and randomness is the enemy. Pin it down, or it will lead to different train/test splits and different weight initializations for the neural network, producing inconsistent results.
```python
import random

import numpy as np
import torch


def set_seed(args):
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)
    if args.n_gpu > 0:
        torch.cuda.manual_seed_all(args.seed)
```
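A minimal demonstration of why this matters, using only NumPy and the standard library (the torch lines behave the same way, but torch may not be installed everywhere): identical seeds give identical draws, so splits and initializations repeat exactly.

```python
import random

import numpy as np


def set_seed(seed):
    # NumPy + stdlib subset of the article's set_seed, for illustration.
    random.seed(seed)
    np.random.seed(seed)


set_seed(42)
first = np.random.rand(3)

set_seed(42)
second = np.random.rand(3)

# Same seed, same "random" numbers: the experiment is reproducible.
assert (first == second).all()
print(first)
```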
3. Start with a few lines of data
If your data is large and your current task is elsewhere in the code, such as cleaning or modeling, you can use nrows to avoid loading the full dataset every time. Use this when you only want to test the code rather than actually run the whole thing.
This is very useful when your local PC cannot hold all the data but you still want to develop locally.
```python
df_train = pd.read_csv('train.csv', nrows=1000)
```
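A self-contained sketch of the same idea, using an in-memory CSV as a stand-in for a huge `train.csv` (the column names and sizes are hypothetical):

```python
import io

import pandas as pd

# Fake 10,000-row CSV standing in for a large train.csv on disk.
csv_data = io.StringIO('a,b\n' + '\n'.join(f'{i},{i * 2}' for i in range(10_000)))

# While iterating on cleaning/modeling code, load only the first 1000 rows.
df_train = pd.read_csv(csv_data, nrows=1000)
print(len(df_train))  # 1000
```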
4. Anticipate failure (a sign of a mature developer)
Always check for NAs in the data, because they will cause you problems later. Even if your current data has none, that doesn't mean they won't appear in a future retraining cycle. So keep the check in anyway.
```python
print(len(df))
df.isna().sum()
df = df.dropna()  # dropna returns a copy; reassign it
print(len(df))
```
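A runnable version on a small hypothetical frame (column names are made up), showing the count-then-drop pattern; note that `dropna` returns a new frame rather than modifying in place:

```python
import numpy as np
import pandas as pd

# Hypothetical frame where a missing value sneaks in at retraining time.
df = pd.DataFrame({'feature': [1.0, np.nan, 3.0], 'target': [0, 1, 0]})

print(int(df.isna().sum()['feature']))  # 1: one NA in 'feature'
df = df.dropna()                        # drop (or impute) before training
print(len(df))                          # 2 rows survive
```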
5. Show processing progress
When you are working with big data, knowing how long it will take and where you are in the overall process definitely feels good.
Option 1-tqdm
```python
import time

from tqdm import tqdm

tqdm.pandas()
df['col'] = df['col'].progress_apply(lambda x: x ** 2)

text = ""
for char in tqdm(["a", "b", "c", "d"]):
    time.sleep(0.25)
    text = text + char
```
Option 2-fastprogress
```python
from time import sleep

from fastprogress.fastprogress import master_bar, progress_bar

mb = master_bar(range(10))
for i in mb:
    for j in progress_bar(range(100), parent=mb):
        sleep(0.01)
        mb.child.comment = 'second bar stat'
    mb.first_bar.comment = 'first bar stat'
    mb.write(f'Finished loop {i}.')
```
6. Pandas is slow
If you've ever used pandas, you know how slow it can be at times, especially groupby. Instead of racking your brain for a "great" solution to speed it up, you can often switch to modin by changing a single line of code.
```python
import modin.pandas as pd
```
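Because modin mirrors the pandas API, the rest of your code stays unchanged. A hedged sketch (modin must be installed separately, e.g. `pip install "modin[ray]"`; this snippet falls back to plain pandas so it runs either way):

```python
# Drop-in swap: same API, parallel execution when modin is available.
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd  # fallback so this sketch runs without modin

df = pd.DataFrame({'group': ['a', 'a', 'b'], 'value': [1, 2, 3]})

# The groupby code is identical under both imports.
print(df.groupby('group')['value'].sum().to_dict())  # {'a': 3, 'b': 3}
```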
7. Time your functions
Not all functions are created equal.
Even if the whole thing works, that doesn't mean the code you wrote is great. Some soft bugs can actually slow your code down, so it's worth finding them. Use this decorator to log how long your functions take.
```python
import time
from functools import wraps


def timing(f):
    """Decorator for timing functions.
    Usage:
    @timing
    def function(a):
        pass
    """

    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        end = time.time()
        print('function:%r took: %2.2f sec' % (f.__name__, end - start))
        return result

    return wrapper
```
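A usage sketch with a made-up `slow_sum` function standing in for a slow pandas operation; the decorator is condensed inline so the snippet is self-contained:

```python
import time
from functools import wraps


def timing(f):
    """Condensed version of the timing decorator above."""

    @wraps(f)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = f(*args, **kwargs)
        print('function:%r took: %2.2f sec' % (f.__name__, time.time() - start))
        return result

    return wrapper


@timing
def slow_sum(n):
    time.sleep(0.1)  # stand-in for a slow data operation
    return sum(range(n))


total = slow_sum(1000)  # prints something like: function:'slow_sum' took: 0.10 sec
print(total)            # 499500
```

The `@wraps(f)` line matters: it preserves the wrapped function's name and docstring, so the printed report and your debugger both show `slow_sum` instead of `wrapper`.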
8. Don't burn money on the cloud.
No one likes engineers who waste cloud resources.
Some of our experiments can run for hours. It is hard to keep track of them and shut down the cloud instance the moment they finish. I have made this mistake myself, and I have also seen people leave instances running for days.
It happens: you kick something off on a Friday, leave, and only realise on Monday.
Just call this function at the end of the execution and you will never get burned again!
But wrap the main code in try and put this call in except (and ideally finally), so that if an error occurs the server does not keep running. Yes, I've dealt with those situations too.
Let's be more responsible and not produce needless carbon dioxide.
```python
import os


def run_command(cmd):
    return os.system(cmd)


def shutdown(seconds=0, os='linux'):
    """Shutdown system after seconds given. Useful for shutting EC2 to save costs."""
    if os == 'linux':
        run_command('sudo shutdown -h -t sec %s' % seconds)
    elif os == 'windows':
        run_command('shutdown -s -t %s' % seconds)
```
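The try/except wrapping mentioned above can be sketched like this. To keep the snippet safe to run, `run_command` records the command instead of executing `os.system`; `main` is a hypothetical training job that fails on purpose:

```python
commands = []


def run_command(cmd):
    # In production this would be os.system(cmd); recorded here so the
    # sketch is safe to execute anywhere.
    commands.append(cmd)
    return 0


def shutdown(seconds=0):
    run_command('sudo shutdown -h -t sec %s' % seconds)


def main():
    # Hypothetical training job that blows up on Friday evening.
    raise RuntimeError('training crashed')


try:
    main()
except Exception as e:
    print('job failed:', e)
finally:
    # finally runs on success *and* on failure, so the instance never
    # keeps burning money over the weekend.
    shutdown(seconds=10)

print(commands)  # ['sudo shutdown -h -t sec 10']
```

Using `finally` rather than only `except` means the shutdown is requested whether the job succeeds or crashes, which is usually what you want for a batch training instance.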
9. Create and save reports
After a certain point in modeling, all the great insights come from error and metric analysis. Be sure to create and save well-formatted reports for yourself and your management.
Management likes to report, right?
```python
import json
import os

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score, fbeta_score)


def get_metrics(y, y_pred, beta=2, average_method='macro', y_encoder=None):
    if y_encoder:
        y = y_encoder.inverse_transform(y)
        y_pred = y_encoder.inverse_transform(y_pred)
    return {
        'accuracy': round(accuracy_score(y, y_pred), 4),
        'f1_score_macro': round(f1_score(y, y_pred, average=average_method), 4),
        'fbeta_score_macro': round(
            fbeta_score(y, y_pred, beta, average=average_method), 4),
        'report': classification_report(y, y_pred, output_dict=True),
        'report_csv': classification_report(y, y_pred, output_dict=False),
    }


def save_metrics(metrics: dict, model_directory, file_name):
    path = os.path.join(model_directory, file_name + '_report.txt')
    # classification_report_to_csv is a project helper defined elsewhere.
    classification_report_to_csv(metrics['report_csv'], path)
    metrics.pop('report_csv')
    path = os.path.join(model_directory, file_name + '_metrics.json')
    json.dump(metrics, open(path, 'w'), indent=4)
```
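A runnable usage sketch on tiny hypothetical labels (the numbers below are just for illustration), producing the same metrics dictionary without the file-saving part:

```python
import json

from sklearn.metrics import (accuracy_score, classification_report, f1_score,
                             fbeta_score)

# Hypothetical binary labels: 4 of 5 predictions are correct.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

metrics = {
    'accuracy': round(accuracy_score(y_true, y_pred), 4),
    'f1_score_macro': round(f1_score(y_true, y_pred, average='macro'), 4),
    'fbeta_score_macro': round(
        fbeta_score(y_true, y_pred, beta=2, average='macro'), 4),
    'report': classification_report(y_true, y_pred, output_dict=True),
}

# The scalar metrics serialize cleanly to JSON for the saved report.
print(json.dumps({k: v for k, v in metrics.items() if k != 'report'}, indent=2))
```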
10. Write good APIs
A bad result is a bad result.
You can do a great job of data cleaning and modeling and still create a lot of chaos at the end. My experience with people tells me that many don't know how to write good APIs, documentation, and server setups. I will write another article about this soon, but let me get you started.
Here is a good approach for deploying classic ML and DL models under low load (such as 1000 requests/min).
FastAPI + uvicorn
Speed: write the API in FastAPI, because it is fast.
Documentation: write the API in FastAPI, so we don't have to worry about documentation; it is generated automatically.
Workers: deploy the API using uvicorn workers.
Run these commands to deploy with four workers. Optimize the number of workers through load testing.

```shell
pip install fastapi uvicorn
uvicorn main:app --workers 4 --host 0.0.0.0 --port 8000
```

That covers the 10 practical tips for deploying a machine learning model with Python. I hope you got something out of this article.