How to Use More Than 99% of Python File Operations

2025-03-31 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02

This article covers how to handle more than 99% of Python file operations. The methods introduced here are simple, fast and practical, so let's walk through them.

I. Opening and closing files

When you want to read from or write to a file, the first step is to open it. Python's built-in open() function opens the file and returns a file object. The type of the file object depends on the mode in which the file was opened: it can be a text file object, a raw binary file object, or a buffered binary file object. Every file object has methods such as read() and write().

There is a problem in this code block. Can you spot it? We will discuss it later.

    file = open("test_file.txt", "w+")
    file.read()
    file.write("a new line")

The Python documentation lists all possible file modes. The most common ones are listed in the table below. An important rule is that any w-related mode first truncates the file if it exists, or creates a new file if it does not. If you do not want to overwrite the file, use these modes carefully and prefer append mode (a) whenever possible.

Mode   Meaning
r      open for reading (default)
r+     open for reading and writing (file pointer at the beginning of the file)
w      open for writing (truncates the file if it exists)
w+     open for reading and writing (truncates the file if it exists)
a      open for writing (appends to the end of the file if it exists; file pointer at the end of the file)
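To make the difference between the w and a modes concrete, here is a minimal sketch. The temporary directory and the file name modes_demo.txt are assumptions chosen for illustration so that no real file is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is overwritten.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

with open(path, "w") as f:   # "w" creates the file
    f.write("first\n")

with open(path, "a") as f:   # "a" keeps existing content and appends
    f.write("second\n")

with open(path, "r") as f:
    after_append = f.read()

with open(path, "w") as f:   # "w" truncates the existing file before writing
    f.write("third\n")

with open(path, "r") as f:
    after_truncate = f.read()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'third\n'
```

The second open with "w" silently discards both earlier lines, which is why append mode is the safer default when existing data matters.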

The problem with the previous code block is that we opened the file but never closed it. It is important to always close a file when you are done with it. Leaving a file object open can lead to unpredictable behavior, such as resource leaks. There are two ways to make sure a file is closed correctly.

1. Use close()

The first way is to call close() explicitly. A good practice is to put it in a finally block so that the file is closed in any case. This makes the code explicit, but on the other hand the developer has to take responsibility and remember to close the file.

    import logging

    # Open before the try block: if open() itself fails, there is nothing to close.
    file = open("test_file.txt", "w+")
    try:
        file.write("a new line")
    except Exception as e:
        logging.exception(e)
    finally:
        file.close()

2. Use the context manager: with open(...) as f

The second way is to use a context manager. If you are not familiar with context managers, check out Dan Bader's "Context Managers and the 'with' Statement in Python". The with open() as f statement relies on the file object's __enter__ and __exit__ methods to open and close the file. It also wraps the try/finally logic inside the context manager, which means we never forget to close the file.

    with open("test_file", "w+") as file:
        file.write("a new line")
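To show what __enter__ and __exit__ do, here is a rough sketch of a hand-written context manager. The class name ManagedFile and the file name are hypothetical; this is not how the built-in file object is implemented, only an illustration of the same protocol.

```python
import os
import tempfile


class ManagedFile:
    """Hypothetical wrapper that mimics how a file object supports `with`."""

    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.file = None

    def __enter__(self):
        # Runs when the with block is entered; its return value is bound by `as`.
        self.file = open(self.path, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block is left, even if an exception was raised.
        self.file.close()
        return False  # do not swallow the exception


demo_path = os.path.join(tempfile.mkdtemp(), "managed_demo.txt")

with ManagedFile(demo_path, "w") as handle:
    handle.write("a new line")
print(handle.closed)  # True

try:
    with ManagedFile(demo_path, "a") as failing_handle:
        raise ValueError("simulated failure")
except ValueError:
    pass
print(failing_handle.closed)  # True: closed despite the exception
```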

Is the context manager always better than close()? It depends on where you use it. The following example implements three different ways to write 50,000 records to a file. As the output shows, use_context_manager_2() is far slower than the other functions. This is because its with statement sits in a separate function that is called once per record, so the file is opened and closed for every record. Such expensive I/O can greatly affect performance.

    def _write_to_file(file, line):
        with open(file, "a") as f:
            f.write(line)


    def _valid_records():
        for i in range(100000):
            if i % 2 == 0:
                yield i


    def use_context_manager_2(file):
        # Opens and closes the file once per record.
        for line in _valid_records():
            _write_to_file(file, str(line))


    def use_context_manager_1(file):
        # Opens the file once and writes every record inside the with block.
        with open(file, "a") as f:
            for line in _valid_records():
                f.write(str(line))


    def use_close_method(file):
        f = open(file, "a")
        for line in _valid_records():
            f.write(str(line))
        f.close()


    use_close_method("test.txt")
    use_context_manager_1("test.txt")
    use_context_manager_2("test.txt")

    # Finished 'use_close_method' in 0.0253 secs
    # Finished 'use_context_manager_1' in 0.0231 secs
    # Finished 'use_context_manager_2' in 4.6302 secs
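The "Finished ... in ... secs" lines suggest that the functions were wrapped in a timing helper that the article does not show. One possible sketch of such a helper follows. The decorator name timer and the function count_even are assumptions made for illustration, not the author's code.

```python
import functools
import time


def timer(func):
    """Hypothetical timing decorator; the original helper is not shown."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__!r} in {elapsed:.4f} secs")
        return result

    return wrapper


@timer
def count_even(limit):
    # Same filter as _valid_records(): keep only the even numbers.
    return sum(1 for i in range(limit) if i % 2 == 0)


even_count = count_even(100000)
print(even_count)  # 50000
```

Applying @timer to each of the three writer functions would produce one line of timing output per call. The actual numbers depend on the machine and the disk.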

II. Reading and writing files

After opening a file, you will read from it or write to it. The file object provides three methods for reading: read(), readline() and readlines().

By default, read(size=-1) returns the full contents of the file. If the file is larger than memory, the optional size parameter lets you limit the number of characters (text mode) or bytes (binary mode) returned.

readline(size=-1) returns an entire line, including the trailing \n. If size is greater than 0, it returns at most size characters from that line.

readlines(hint=-1) returns all lines of the file as a list. The optional hint parameter limits how much is read: once the total number of characters returned exceeds hint, no further lines are returned.
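The size and hint parameters are easier to follow on a small file. The sketch below assumes a hypothetical three-line file created in a temporary directory.

```python
import os
import tempfile

sample_path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")  # hypothetical file
with open(sample_path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(sample_path) as f:
    first_three = f.read(3)      # at most 3 characters
    rest_of_line = f.readline()  # the remainder of the current line
    partial = f.readline(2)      # at most 2 characters of the next line
    remaining = f.readlines()    # everything that is left, as a list

with open(sample_path) as f:
    hinted = f.readlines(1)      # stops once the hint is exceeded

print(repr(first_three))   # 'alp'
print(repr(rest_of_line))  # 'ha\n'
print(repr(partial))       # 'be'
print(remaining)           # ['ta\n', 'gamma\n']
print(hinted)              # ['alpha\n']
```

Each call continues from where the previous one stopped, because all of them share the same file pointer.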

Of the three methods, read() and readlines() are the least memory efficient because, by default, they return the complete file as a string or a list. A more memory-efficient approach is to call readline() repeatedly until it returns an empty string. The empty string "" indicates that the pointer has reached the end of the file.

    with open("test.txt", "r") as reader:
        line = reader.readline()
        while line != "":
            # Print before reading the next line so the first line is not skipped.
            print(line, end="")
            line = reader.readline()
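A file object is also an iterator over its lines, so the same line-by-line reading can be written as a for loop. This is a minimal sketch that assumes a small hypothetical file in a temporary directory.

```python
import os
import tempfile

iter_path = os.path.join(tempfile.mkdtemp(), "iter_demo.txt")  # hypothetical file
with open(iter_path, "w") as f:
    f.writelines(["this is a line\n", "this is another line\n"])

collected = []
with open(iter_path, "r") as reader:
    # Iterating the file object yields one line at a time, newline included.
    for line in reader:
        collected.append(line.rstrip("\n"))

print(collected)  # ['this is a line', 'this is another line']
```

Like the readline() loop, this keeps only one line in memory at a time.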

For writing, there are two methods: write() and writelines(). As the names imply, write() writes a single string, while writelines() writes a list of strings. Neither adds line breaks, so it is the developer's responsibility to add \n at the end of each line.

    with open("test.txt", "w+") as f:
        f.write("hi\n")
        f.writelines(["this is a line\n", "this is another line\n"])

    # >> cat test.txt
    # hi
    # this is a line
    # this is another line

If you write to a specific file format, such as JSON or CSV, you should use the built-in json or csv module on top of the file object.

    import csv
    import json

    # newline="" lets the csv module control line endings itself.
    with open("cities.csv", "w+", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=["city", "country"])
        writer.writeheader()
        writer.writerow({"city": "Amsterdam", "country": "Netherlands"})
        writer.writerows(
            [
                {"city": "Berlin", "country": "Germany"},
                {"city": "Shanghai", "country": "China"},
            ]
        )
    # >> cat cities.csv
    # city,country
    # Amsterdam,Netherlands
    # Berlin,Germany
    # Shanghai,China

    with open("cities.json", "w+") as file:
        json.dump({"city": "Amsterdam", "country": "Netherlands"}, file)
    # >> cat cities.json
    # {"city": "Amsterdam", "country": "Netherlands"}
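Reading such files back works the same way: the csv and json modules sit on top of an open file object. The sketch below writes to a temporary directory (an assumption, to stay self-contained) and then reads the data back with csv.DictReader and json.load.

```python
import csv
import json
import os
import tempfile

work_dir = tempfile.mkdtemp()
csv_path = os.path.join(work_dir, "cities.csv")
json_path = os.path.join(work_dir, "cities.json")

with open(csv_path, "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["city", "country"])
    writer.writeheader()
    writer.writerow({"city": "Amsterdam", "country": "Netherlands"})

with open(csv_path, newline="") as file:
    # DictReader uses the header row as the keys of each record.
    rows = list(csv.DictReader(file))

with open(json_path, "w") as file:
    json.dump({"city": "Amsterdam", "country": "Netherlands"}, file)

with open(json_path) as file:
    city = json.load(file)

print(rows[0]["city"])  # Amsterdam
print(city["country"])  # Netherlands
```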

1. Move the pointer within the file

When we open a file, we get a file handle that points to a specific position. In r and w modes, the handle points to the beginning of the file. In a mode, it points to the end of the file.

(1) tell() and seek()

When we read from a file, the pointer advances to the position where the next read will begin, unless we move it explicitly. Two methods work with the pointer: tell() and seek().

tell() returns the current position of the pointer as the number of bytes/characters from the beginning of the file. seek(offset, whence=0) moves the pointer to the position offset characters away from whence. whence can be:

0: from the beginning of the file

1: start from the current location

2: start at the end of the file

In text mode, whence should only be 0 and offset should be ≥ 0. The one exception is seek(0, 2), which jumps to the very end of the file.

    with open("text.txt", "w+") as f:
        f.write("0123456789abcdef")
        f.seek(9)
        print(f.tell())  # 9 (pointer moves to 9, next read starts from 9)
        print(f.read())  # 9abcdef
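The other two whence values are mainly useful in binary mode, where relative seeks are allowed. A small sketch, assuming a hypothetical binary file with the same 16 bytes as above:

```python
import os
import tempfile

seek_path = os.path.join(tempfile.mkdtemp(), "seek_demo.bin")  # hypothetical file
with open(seek_path, "wb") as f:
    f.write(b"0123456789abcdef")

with open(seek_path, "rb") as f:
    f.seek(-4, os.SEEK_END)  # whence=2: four bytes before the end
    tail = f.read()

    f.seek(2, os.SEEK_SET)   # whence=0: absolute position 2
    f.seek(3, os.SEEK_CUR)   # whence=1: three bytes forward from position 2
    position = f.tell()
    middle = f.read(3)

print(tail)      # b'cdef'
print(position)  # 5
print(middle)    # b'567'
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END are simply the named forms of 0, 1 and 2.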

2. Understand file status

The file system can tell you a lot of useful information about a file, for example its size and the times it was created and modified. To get this information in Python, you can use the os or pathlib module. The two have a lot in common; pathlib is the more object-oriented of the two.

(1) os

One way to get the full status is os.stat("text.txt"). It returns a result object with many statistics, such as st_size (file size in bytes), st_atime (timestamp of the most recent access), st_mtime (timestamp of the most recent modification), and so on.

    import os

    print(os.stat("text.txt"))
    # os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597527409, st_mtime=1597527409, st_ctime=1597527409)

You can also use os.path to get individual statistics. Each function takes the path of the file.

    os.path.getatime("text.txt")
    os.path.getctime("text.txt")
    os.path.getmtime("text.txt")
    os.path.getsize("text.txt")
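The timestamps returned by these functions are seconds since the epoch, so they usually need converting before they are shown to a person. A minimal sketch, assuming a hypothetical file created in a temporary directory:

```python
import datetime
import os
import tempfile

stat_path = os.path.join(tempfile.mkdtemp(), "text.txt")  # hypothetical file
with open(stat_path, "w") as f:
    f.write("0123456789abcdef")

size = os.path.getsize(stat_path)            # same value as os.stat().st_size
modified_ts = os.path.getmtime(stat_path)    # float, seconds since the epoch
modified = datetime.datetime.fromtimestamp(modified_ts)  # local time

print(size)  # 16
print(modified.isoformat())
```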

(2) pathlib

Another way to get the full status is pathlib.Path("text.txt").stat(). It returns the same kind of object as os.stat().

    import pathlib

    print(pathlib.Path("text.txt").stat())
    # os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597528703, st_mtime=1597528703, st_ctime=1597528703)

In the following sections, we will compare more aspects of os and pathlib.

III. Copying, moving and deleting files

Python has several built-in modules for moving files. Before you trust the first answer Google returns, be aware that the choice of module can affect performance. Some approaches block the thread until the file operation is complete, while others can run asynchronously.

1. shutil

shutil is the best-known module for moving, copying and deleting files and folders. It provides four functions for copying a single file: copy(), copy2(), copyfile() and copyfileobj(). The first three are compared here.

copy() vs. copy2(): copy2() is very similar to copy(). The difference is that copy2() also copies the file's metadata, such as the most recent access time and the most recent modification time. According to the Python documentation, however, even copy2() cannot copy all metadata because of operating system limitations.

    import pathlib
    import shutil

    shutil.copy("1.csv", "copy.csv")
    shutil.copy2("1.csv", "copy2.csv")

    print(pathlib.Path("1.csv").stat())
    print(pathlib.Path("copy.csv").stat())
    print(pathlib.Path("copy2.csv").stat())

    # 1.csv
    # os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)
    # copy.csv
    # os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)
    # copy2.csv
    # os.stat_result(st_mode=33152, st_ino=8618983989, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570395)

copy() vs. copyfile(): copy() sets the permissions of the new file to match the original, whereas copyfile() does not copy the permission mode. In addition, the destination of copy() can be a directory. If a file with the same name exists there, it is overwritten; otherwise a new file is created. The destination of copyfile(), however, must be the target file name.

    import pathlib
    import shutil

    shutil.copy("1.csv", "copy.csv")
    shutil.copyfile("1.csv", "copyfile.csv")

    print(pathlib.Path("1.csv").stat())
    print(pathlib.Path("copy.csv").stat())
    print(pathlib.Path("copyfile.csv").stat())

    # 1.csv
    # os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)
    # copy.csv
    # os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)
    # copyfile.csv
    # permission (st_mode) is changed
    # os.stat_result(st_mode=33188, st_ino=8618984694, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)

    shutil.copyfile("1.csv", "./source")
    # IsADirectoryError: [Errno 21] Is a directory: './source'
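The directory behavior can be checked without touching real files. This sketch builds a hypothetical 1.csv and a source directory inside a temporary folder. It catches OSError because the exact exception subclass raised by copyfile() for a directory target differs between platforms.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "1.csv")          # hypothetical source file
target_dir = os.path.join(base, "source")  # hypothetical target directory
os.mkdir(target_dir)
with open(src, "w") as f:
    f.write("a,b\n1,2\n")

# copy() accepts a directory and keeps the original file name inside it.
copied_path = shutil.copy(src, target_dir)

# copyfile() needs a full target file name, so a directory is rejected.
try:
    shutil.copyfile(src, target_dir)
    copyfile_failed = False
except OSError:
    copyfile_failed = True

print(os.path.basename(copied_path))  # 1.csv
print(copyfile_failed)                # True
```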

3. os

The os module has a system() function that executes a command in a subshell. You pass the command as a string argument to system(), and the effect is the same as running that command on the operating system. To move and delete files, you can also use the dedicated functions in the os module.

    import os

    # copy (cp, mv and rm are Unix shell commands)
    os.system("cp 1.csv copy.csv")

    # rename/move: use one of the two, not both
    os.system("mv 1.csv move.csv")
    # os.rename("1.csv", "move.csv")

    # delete
    os.system("rm move.csv")
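Because cp, mv and rm only exist on Unix-like systems, a version that uses library functions alone may be preferable. The following is a sketch under that assumption, using shutil.copy(), shutil.move() and os.remove() on hypothetical files in a temporary directory.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
original = os.path.join(base, "1.csv")  # hypothetical file
with open(original, "w") as f:
    f.write("x\n")

# copy
copy_path = shutil.copy(original, os.path.join(base, "copy.csv"))

# rename/move: shutil.move() returns the destination path
moved_path = shutil.move(original, os.path.join(base, "move.csv"))

# delete
os.remove(moved_path)

remaining = sorted(os.listdir(base))
print(remaining)  # ['copy.csv']
```

This avoids spawning a shell for each operation and reports failures as Python exceptions rather than exit codes.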

4. Copy / move files asynchronously

So far, every solution has been synchronous, which means the program may block if the file is large and takes a long time to move. To make the program asynchronous, you can use the threading, multiprocessing or subprocess module to run file operations in a separate thread or a separate process.

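    import multiprocessing
    import shutil
    import subprocess
    import threading

    src = "1.csv"

    # Run the copy in a separate thread.
    dst = "dst_thread.csv"
    thread = threading.Thread(target=shutil.copy, args=[src, dst])
    thread.start()
    thread.join()

    # Run the copy in a separate process.
    # On platforms that spawn processes, guard this with: if __name__ == "__main__":
    dst = "dst_multiprocessing.csv"
    process = multiprocessing.Process(target=shutil.copy, args=[src, dst])
    process.start()

    # Run the copy as a shell command.
    cmd = "cp 1.csv dst_subprocess.csv"
    status = subprocess.call(cmd, shell=True)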
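Another option from the standard library is concurrent.futures, which the article does not cover. The sketch below is offered only as one possible variation. It submits shutil.copy() to a thread pool and collects the result later, using hypothetical file names in a temporary directory.

```python
import concurrent.futures
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
src = os.path.join(base, "1.csv")         # hypothetical source file
dst = os.path.join(base, "dst_pool.csv")  # hypothetical destination
with open(src, "w") as f:
    f.write("city,country\n")

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    # submit() returns immediately; the copy runs in a worker thread.
    future = pool.submit(shutil.copy, src, dst)
    # Other work could be done here while the copy is in progress.
    copied_to = future.result()  # blocks only when the result is needed

print(os.path.exists(copied_to))  # True
```

The future also re-raises any exception from the copy when result() is called, which a bare threading.Thread does not do.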

IV. Searching for files

After copying and moving files, you may need to search for file names that match a specific pattern. Python provides several built-in modules to choose from.

1. glob

The glob module finds all path names that match a specified pattern according to the rules used by the Unix shell. It supports the wildcards *, ? and [].

glob.glob("*.csv") finds all files in the current directory with the csv extension. With the glob module, you can also search for files in subdirectories.

    >>> import glob
    >>> glob.glob("*.csv")
    ['1.csv', '2.csv']
    >>> glob.glob("**/*.csv", recursive=True)
    ['1.csv', '2.csv', 'source/3.csv']

2. os

The os module is powerful enough to handle almost any file operation. We can simply use os.listdir() to list all the files in a directory and use file.endswith() and file.startswith() to match the pattern. If you want to traverse the directory tree, use os.walk().

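    import os

    # Files directly inside the current directory.
    for file in os.listdir("."):
        if file.endswith(".csv"):
            print(file)

    # Files in the current directory and all of its subdirectories.
    for root, dirs, files in os.walk("."):
        for file in files:
            if file.endswith(".csv"):
                print(file)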

3. pathlib

pathlib offers functionality similar to the glob module, and it can also search for file names recursively. Compared with the os-based solution above, pathlib needs less code and is more object-oriented.
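A short sketch of the pathlib approach, using Path.glob() for a single directory and Path.rglob() for a recursive search. The directory layout is hypothetical and is created in a temporary folder so the result is predictable.

```python
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "source").mkdir()
for name in ("1.csv", "2.csv", "notes.txt", "source/3.csv"):
    (root / name).write_text("")  # create empty sample files

# Non-recursive: only files directly inside root.
top_level = sorted(p.name for p in root.glob("*.csv"))

# Recursive: root and every subdirectory.
recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

print(top_level)  # ['1.csv', '2.csv']
print(recursive)  # ['1.csv', '2.csv', 'source/3.csv']
```

Both methods return Path objects, so follow-up calls such as stat() or read_text() are available directly on each match.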

V. Working with file paths

Working with file paths is another common task. We can get the relative and absolute paths of a file, join multiple paths, find the parent directory, and so on.

1. Relative path and absolute path

Both os and pathlib provide the ability to get the relative and absolute paths of a file or directory.

    import os
    import pathlib

    print(os.path.abspath("1.txt"))  # absolute
    print(os.path.relpath("1.txt"))  # relative

    print(pathlib.Path("1.txt").absolute())  # absolute
    print(pathlib.Path("1.txt"))  # relative

2. Join paths

This is how we can join paths in os and pathlib in a platform-independent way. pathlib uses the slash operator to create child paths.

    import os
    import pathlib

    print(os.path.join("/home", "file.txt"))
    print(pathlib.Path("/home") / "file.txt")

3. Get parent directory

dirname() is the function for getting the parent directory in os, while in pathlib you can simply use Path().parent to get the parent folder.

    import os
    import pathlib

    # relative path
    print(os.path.dirname("source/2.csv"))
    # source
    print(pathlib.Path("source/2.csv").parent)
    # source

    # absolute path
    print(pathlib.Path("source/2.csv").resolve().parent)
    # /Users//project/source
    print(os.path.dirname(os.path.abspath("source/2.csv")))
    # /Users//project/source

4. os vs. pathlib

Last but not least, a brief comparison of os and pathlib. As the Python documentation states, pathlib is a more object-oriented solution than os. It represents each file path as a proper object rather than as a string. This brings many benefits to developers: it is easier to join multiple paths, behavior is more consistent across operating systems, and methods can be accessed directly from the object.
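To illustrate what "a proper object rather than a string" means in practice, the sketch below inspects a path through its attributes. The path itself is hypothetical, and PurePosixPath is used so that the result is the same on every operating system.

```python
import pathlib

# Hypothetical path; PurePosixPath never touches the file system.
path = pathlib.PurePosixPath("/home/user/project/source/2.csv")

print(path.name)                  # 2.csv
print(path.stem)                  # 2
print(path.suffix)                # .csv
print(path.parent.name)           # source
print(path.with_suffix(".json"))  # /home/user/project/source/2.json
print(path.parent / "3.csv")      # /home/user/project/source/3.csv
```

With os.path, each of these steps would be a separate function call on a string, such as os.path.basename() or os.path.splitext().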

At this point, you should have a deeper understanding of how to use more than 99% of Python file operations. The best way to consolidate it is to try the examples in practice.
