Updated 2025-01-16. Source: SLTechnology News&Howtos, Shulou (Shulou.com)
This article introduces Python techniques for reading, creating, and running multiple files. These are situations many people run into in real projects, so let's walk through how to handle each of them.
Motivation
When putting code into production, you will probably need to deal with how your code and data files are organized. Reading, creating, and running many files by hand is time-consuming. This article will show you how to automatically:
Iterate through the files in a directory
Create nested directories if they don't exist
Use a bash for loop to run a file with different inputs
These skills have saved me a lot of time in data science projects. I hope you will find them useful, too!
Iterate through the files in the directory
Suppose we want to read and process multiple data files laid out like this:
.
├── data
│   ├── data1.csv
│   ├── data2.csv
│   └── data3.csv
└── main.py
We can try to read one file at a time manually.
import pandas as pd

def process_data(df):
    pass

df = pd.read_csv('data/data1.csv')
process_data(df)

df2 = pd.read_csv('data/data2.csv')
process_data(df2)

df3 = pd.read_csv('data/data3.csv')
process_data(df3)

This works, but it does not scale once we have more than three data files. If the only thing that changes in the script above is the file name, why not use a for loop to access each file?
The following script lets us traverse the files in a specified directory:

import os
import pandas as pd

def loop_directory(directory: str):
    '''Loop over the files in a directory'''
    # os.listdir returns entries in arbitrary order
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            file_directory = os.path.join(directory, filename)
            print(file_directory)
            pd.read_csv(file_directory)
        else:
            continue

if __name__ == '__main__':
    loop_directory('data/')

Output:
data/data3.csv
data/data2.csv
data/data1.csv
The following is an explanation of the above script:
for filename in os.listdir(directory): iterates through the entries in the given directory
if filename.endswith(".csv"): selects only the files ending with ".csv"
file_directory = os.path.join(directory, filename): joins the parent directory ('data') and the file name into a single path
Now we can access all the files in the "data" directory!
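Because os.listdir returns names in no particular order, a variation that gives a predictable order can be handy. The sketch below uses pathlib from the standard library instead of os.listdir; the helper name list_csv_files is hypothetical and not part of the original script. Each returned path could then be passed to pd.read_csv.

```python
from pathlib import Path


def list_csv_files(directory):
    """Return the CSV files directly inside `directory`, sorted by path."""
    # glob('*.csv') matches only this directory level, not subdirectories
    return sorted(Path(directory).glob('*.csv'))


if __name__ == '__main__':
    # Assumes a 'data' directory next to this script, as in the layout above
    for csv_path in list_csv_files('data'):
        print(csv_path)
```

Sorting costs little here and makes repeated runs process the files in the same order.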
Create nested directories if they don't exist
Sometimes we may want to create nested directories to organize code or models, which makes them easier to find later. For example, we can use "model1" to denote a specific feature engineering approach.
Within model1, we may want to train our data with different types of machine learning models ("model1/XGBoost").
For each machine learning model, we may even want to save different versions, because each version uses different hyperparameters.
As a result, our model directory can look as complex as the following:
model
├── model1
│   ├── NaiveBayes
│   └── XGBoost
│       ├── version_1
│       └── version_2
└── model2
    ├── NaiveBayes
    └── XGBoost
        ├── version_1
        └── version_2
Creating a nested directory by hand for every model we train can take a lot of time. Is there a way to automate this? Yes, with os.makedirs(datapath).

import os

def create_path_if_not_exists(datapath):
    '''Create the directory (and any missing parents) if it does not exist'''
    if not os.path.exists(datapath):
        os.makedirs(datapath)

if __name__ == '__main__':
    create_path_if_not_exists('model/model1/XGBoost/version_1')
Run the above file and you should see that the nested directory 'model/model1/XGBoost/version_1' is created automatically!
Now you can save the model or data to a new directory!
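import os
import joblib

def create_path_if_not_exists(datapath):
    '''Create the directory if it does not exist'''
    if not os.path.exists(datapath):
        os.makedirs(datapath)

if __name__ == '__main__':
    # model = ... (a trained model, defined elsewhere)
    # Create the directory
    model_path = 'model/model2/XGBoost/version_2'
    create_path_if_not_exists(model_path)
    # Save the model to a file inside the directory, since joblib.dump
    # needs a file path. 'model.pkl' is an illustrative file name.
    joblib.dump(model, os.path.join(model_path, 'model.pkl'))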
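To build a whole tree like the one shown earlier in one go, the same idea can be put in a loop. This is a minimal sketch, not the original author's code: the function name create_model_tree and its parameters are hypothetical, and it uses the exist_ok=True argument of os.makedirs in place of the explicit os.path.exists check. Unlike the tree above, it creates version directories under every model type.

```python
import os
from itertools import product


def create_model_tree(root, feature_sets, model_types, versions):
    """Create root/<feature_set>/<model_type>/version_<n> for every combination."""
    created = []
    for feature_set, model_type, version in product(feature_sets, model_types, versions):
        path = os.path.join(root, feature_set, model_type, f'version_{version}')
        # exist_ok=True makes the call a no-op when the directory already exists
        os.makedirs(path, exist_ok=True)
        created.append(path)
    return created
```

For example, create_model_tree('model', ['model1', 'model2'], ['NaiveBayes', 'XGBoost'], [1, 2]) would create eight leaf directories, and running it a second time leaves them untouched.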
Bash for Loop: run a file with different parameters
What if we want to run a file with different parameters? For example, we might want to use the same script and use different models to predict data.
import os
import joblib

# df = ...
model_path = 'model/model1/XGBoost/version_1'
# Assumes the model was saved as 'model.pkl' (an illustrative file name)
# inside model_path
model = joblib.load(os.path.join(model_path, 'model.pkl'))
model.predict(df)

If a script takes a long time to run and we have multiple models to run, waiting for each run to finish before starting the next one is time-consuming. Is there a way to tell the computer to run versions 1, 2, 3, 10 with a single command, and then go do something else?
Yes, we can use a bash for loop. First, we use sys.argv so the script can read command-line arguments. If you want to override values from a configuration file on the command line, you can also use a tool such as hydra.
import os
import sys
import joblib

# df = ...
model_type = sys.argv[1]
model_version = sys.argv[2]
model_path = f'model/model1/{model_type}/version_{model_version}'
print('Loading model from', model_path, 'for training')
# Assumes the model was saved as 'model.pkl' (an illustrative file name)
# inside model_path
model = joblib.load(os.path.join(model_path, 'model.pkl'))
model.predict(df)

$ python train.py XGBoost 1
Loading model from model/model1/XGBoost/version_1 for training

Great! From the command line, we just told our script to use the XGBoost model, version 1, to predict the data. Now we can use a bash loop to go through the other versions of the model.
Just as you can run a for loop in Python, you can run one in the terminal, like this:
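Reading sys.argv by position works, but it gives no help text and no type checking. As one possible alternative, here is a sketch using argparse from the standard library. The function name parse_model_path is hypothetical, and treating the version as an integer is an assumption; the path layout is the one used in the script above.

```python
import argparse


def parse_model_path(argv=None):
    """Build the model path from two positional command-line arguments."""
    parser = argparse.ArgumentParser(description='Pick a saved model to load.')
    parser.add_argument('model_type', help='model directory name, e.g. XGBoost')
    parser.add_argument('model_version', type=int, help='version number, e.g. 1')
    # With argv=None, argparse reads the arguments from sys.argv[1:]
    args = parser.parse_args(argv)
    return f'model/model1/{args.model_type}/version_{args.model_version}'
```

Inside train.py, calling parse_model_path() with no argument would read the same two values that sys.argv[1] and sys.argv[2] provide, and a missing or non-numeric version would produce a usage message instead of a traceback.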
$ for version in 2 3 4
> do
> python train.py XGBoost $version
> done

Press Enter at the end of each line.
Output:
Loading model from model/model1/XGBoost/version_2 for training
Loading model from model/model1/XGBoost/version_3 for training
Loading model from model/model1/XGBoost/version_4 for training
Now you can run scripts using different models and perform other actions at the same time! How convenient it is!
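If you would rather stay in Python than use the shell, the same loop can be sketched with subprocess from the standard library. This is an illustration under assumptions, not part of the original workflow: the helper name run_for_versions is hypothetical, and the example command in the usage note assumes a train.py like the one above.

```python
import subprocess
import sys


def run_for_versions(command, versions):
    """Run `command` once per version, passing the version as the last argument."""
    outputs = []
    for version in versions:
        result = subprocess.run(
            [*command, str(version)],
            capture_output=True,  # collect stdout and stderr instead of printing them
            text=True,            # decode the output as str
            check=True,           # raise CalledProcessError on a non-zero exit code
        )
        outputs.append(result.stdout.strip())
    return outputs
```

Calling run_for_versions([sys.executable, 'train.py', 'XGBoost'], [2, 3, 4]) would run the script three times in sequence, much like the bash loop, and return what each run printed.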