What are the most common file manipulation skills in Python 07/01 Update SLTechnology News&Howtos


2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article introduces the most common file manipulation techniques in Python. Many people run into these situations in real projects, so this article walks through how to handle them. I hope you read it carefully and find it useful!

Open and close files

Before reading from or writing to a file, you first need to open it. Python's built-in function open() opens a file and returns a file object. The type of the file object depends on the mode in which the file is opened: it can be a text file, a raw binary file, or a buffered binary file object. Every file object has methods such as read() and write().

Can you see the problem in the following code block? We'll find out later.

file = open("test_file.txt", "w+")
file.read()
file.write("a new line")

The Python documentation lists all possible file modes; the most common ones are shown in the table below. Be aware of one important rule: any mode containing w truncates the file if it already exists (and creates it if it does not). If you do not want to overwrite an existing file, use the w modes with care, or use the append mode a instead.
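The difference between w and a is easiest to see side by side. The following is a minimal sketch (the helper name demo_modes is hypothetical) that writes to a temporary file with w, a and x, where x means exclusive creation and fails if the file already exists:

```python
import os
import tempfile


def demo_modes():
    """Hypothetical helper: show how w, a and x behave on the same file."""
    with tempfile.TemporaryDirectory() as tmp:  # throwaway directory
        path = os.path.join(tmp, "modes_demo.txt")

        with open(path, "w") as f:  # "w" creates the file
            f.write("first\n")
        with open(path, "a") as f:  # "a" appends after the existing content
            f.write("second\n")
        with open(path) as f:
            after_append = f.read()

        with open(path, "w") as f:  # "w" again: the old content is truncated
            f.write("third\n")
        with open(path) as f:
            after_truncate = f.read()

        try:
            with open(path, "x"):  # "x" refuses to open a file that exists
                pass
            x_refused = False
        except FileExistsError:
            x_refused = True

    return after_append, after_truncate, x_refused


print(demo_modes())  # ('first\nsecond\n', 'third\n', True)
```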

The problem in the code block that opens test_file.txt is that the file is never closed after it is opened. It is important to close a file once you are done with it, because open file objects can cause problems such as resource leaks. There are two ways to make sure a file is closed correctly.

1. Use close()

The first method is to call close() explicitly. It is best to put the call in a finally block, because that guarantees the file is closed whatever happens and keeps the code clear. Even so, the developer is still responsible for remembering to close the file.

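import logging

file = open("test_file.txt", "w+")  # open outside try, so file always exists in finally
try:
    file.write("a new line")
except Exception as e:
    logging.exception(e)
finally:
    file.close()  # runs whether or not an exception was raised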

2. Use the context manager: with open(...) as f

The second method is to use a context manager. If you are not familiar with this, see Dan Bader's article on context managers and the "with" statement in Python. with open() as f relies on the file object's __enter__ and __exit__ methods to open and close the file. It effectively wraps a try/finally statement inside the context manager, so we cannot forget to close the file.

with open("test_file", "w+") as file:
    file.write("a new line")
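File objects returned by open() already implement __enter__ and __exit__. To make the mechanism concrete, here is a minimal sketch of a hypothetical ManagedFile class that does the same job by hand; it is a simplified illustration, not how open() is actually implemented:

```python
class ManagedFile:
    """Hypothetical, simplified stand-in for what a file object does in a with block."""

    def __init__(self, path, mode="r"):
        self.path = path
        self.mode = mode
        self.file = None

    def __enter__(self):
        # Called when the with block starts; the return value is bound to "as f".
        self.file = open(self.path, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_value, traceback):
        # Called when the block ends, even if an exception was raised inside it.
        self.file.close()
        return False  # do not suppress the exception


with ManagedFile("test_file.txt", "w") as f:
    f.write("a new line")

print(f.closed)  # True: the file was closed when the block ended
```

Because __exit__ runs even when the body raises, this gives the same guarantee as the try/finally version above.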

Which of the two methods is better? It depends on the scenario. The following example implements three different ways of writing 50,000 records to a file. As the output shows, use_context_manager_2() performs far worse than the other two functions. This is because its with statement lives in a separate function that is called once per record, so the file is opened and closed for every single record, and this repeated I/O greatly hurts performance.

def _write_to_file(file, line):
    # Opens and closes the file on every call
    with open(file, "a") as f:
        f.write(line)

def _valid_records():
    # Yields the 50,000 even numbers below 100,000
    for i in range(100000):
        if i % 2 == 0:
            yield i

def use_context_manager_2(file):
    for line in _valid_records():
        _write_to_file(file, str(line))

def use_context_manager_1(file):
    with open(file, "a") as f:
        for line in _valid_records():
            f.write(str(line))

def use_close_method(file):
    f = open(file, "a")
    for line in _valid_records():
        f.write(str(line))
    f.close()

use_close_method("test.txt")
use_context_manager_1("test.txt")
use_context_manager_2("test.txt")
# Finished use_close_method in 0.0253 secs
# Finished use_context_manager_1 in 0.0231 secs
# Finished use_context_manager_2 in 4.6302 secs

Comparison of close() and with statements
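The "Finished ... in ... secs" lines above come from a timing wrapper that the article does not show. A minimal sketch, assuming a hypothetical decorator named timer built on time.perf_counter(), that prints messages in the same format:

```python
import functools
import time


def timer(func):
    """Hypothetical decorator: print how long each call to func takes."""
    @functools.wraps(func)  # keep the wrapped function's name for the message
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__} in {elapsed:.4f} secs")
        return result
    return wrapper


@timer
def count_lines(path):
    # Stand-in workload; the three writer functions above could be decorated the same way.
    with open(path) as f:
        return sum(1 for _ in f)
```

Applying @timer to use_close_method, use_context_manager_1 and use_context_manager_2 would print one such line per call; the numbers you see will depend on your machine and file system.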

Read and write files

Once the file is open, you can start reading from or writing to it. The file object provides three methods for reading: read(), readline(), and readlines().

By default, read(size=-1) returns the entire contents of the file. If the file is larger than the available memory, the optional size parameter limits how many characters (text mode) or bytes (binary mode) are returned.

readline(size=-1) returns an entire line, including the trailing newline character \n. If size is greater than 0, it returns at most size characters from that line.

readlines(hint=-1) returns all the lines of the file as a list. If the optional hint parameter is given, no more lines are read once the total size of the lines read so far exceeds hint.
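The size and hint parameters are easiest to understand on a small input. This sketch uses io.StringIO as an in-memory stand-in for a text file opened for reading, so nothing is written to disk:

```python
import io

# io.StringIO behaves like a text file opened for reading.
text = "first line\nsecond line\nthird line\n"

f = io.StringIO(text)
print(repr(f.read(5)))      # 'first': at most 5 characters
print(repr(f.readline()))   # ' line\n': the rest of the current line, newline included

f = io.StringIO(text)
print(repr(f.readline(3)))  # 'fir': at most 3 characters of the line

f = io.StringIO(text)
print(f.readlines(5))       # ['first line\n']: whole lines only, stops once 5 is exceeded
```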

Of these three methods, read() and readlines() are less memory-efficient, because by default they return the whole file as a single string or list. A more memory-efficient approach is to call readline() repeatedly and stop when it returns an empty string: the empty string "" means the pointer has reached the end of the file.

with open("test.txt", "r") as reader:
    line = reader.readline()
    while line != "":  # an empty string means the end of the file
        print(line, end="")  # the line already ends with \n
        line = reader.readline()

Read files in a memory-saving way
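A file object is also iterable and yields one line per iteration, which gives the same memory behaviour as the readline() loop with less code. A minimal sketch; the helper count_matching_lines is hypothetical:

```python
def count_matching_lines(path, needle):
    """Hypothetical helper: count lines containing needle without loading the whole file."""
    count = 0
    with open(path, "r") as reader:
        for line in reader:  # the file object yields one line at a time
            if needle in line:
                count += 1
    return count
```

For example, count_matching_lines("test.txt", "error") counts the lines of test.txt that contain "error", holding only one line in memory at a time.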

There are two methods for writing: write() and writelines(). As the names suggest, write() writes a single string, while writelines() writes a list of strings. Neither adds line breaks, so the developer must add \n at the end of each line.

with open("test.txt", "w+") as f:
    f.write("hi\n")
    f.writelines(["this is a line\n", "this is another line\n"])
# $ cat test.txt
# hi
# this is a line
# this is another line

Write lines in a file
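When the lines come from a list without trailing newlines, a generator expression can add them on the fly, and print() with the file argument appends the newline for you. A small sketch writing to a hypothetical lines_demo.txt:

```python
lines = ["this is a line", "this is another line"]

with open("lines_demo.txt", "w") as f:
    # writelines() writes the strings back to back, so add the newline yourself.
    f.writelines(line + "\n" for line in lines)
    # print() with file= appends "\n" automatically.
    print("a third line", file=f)

with open("lines_demo.txt") as f:
    content = f.read()

print(content.splitlines())  # ['this is a line', 'this is another line', 'a third line']
```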

To write data in a particular file format, such as JSON or CSV, use Python's built-in json or csv module on top of the file object.

import csv
import json

# newline="" is what the csv documentation recommends for files used with csv writers
with open("cities.csv", "w+", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["city", "country"])
    writer.writeheader()
    writer.writerow({"city": "Amsterdam", "country": "Netherlands"})
    writer.writerows([
        {"city": "Berlin", "country": "Germany"},
        {"city": "Shanghai", "country": "China"},
    ])
# $ cat cities.csv
# city,country
# Amsterdam,Netherlands
# Berlin,Germany
# Shanghai,China

with open("cities.json", "w+") as file:
    json.dump({"city": "Amsterdam", "country": "Netherlands"}, file)
# $ cat cities.json
# {"city": "Amsterdam", "country": "Netherlands"}
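Reading the data back is symmetric: csv.DictReader turns each row into a dictionary keyed by the header row, and json.load parses the file into Python objects. A self-contained sketch that first writes the two files (with a shortened city list) and then reads them back:

```python
import csv
import json

cities = [
    {"city": "Amsterdam", "country": "Netherlands"},
    {"city": "Berlin", "country": "Germany"},
]

# Write both files so this snippet runs on its own.
with open("cities.csv", "w", newline="") as file:
    writer = csv.DictWriter(file, fieldnames=["city", "country"])
    writer.writeheader()
    writer.writerows(cities)
with open("cities.json", "w") as file:
    json.dump(cities, file)

# Read them back: DictReader uses the header row as dictionary keys.
with open("cities.csv", newline="") as file:
    csv_rows = list(csv.DictReader(file))
with open("cities.json") as file:
    json_rows = json.load(file)

print(csv_rows == json_rows)  # True
```

Note that csv returns every value as a string, so numeric columns would need converting, whereas json keeps numbers as numbers.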

Move the pointer within the file

When you open a file, you get a file handle whose pointer sits at a specific position. In r and w modes, the pointer starts at the beginning of the file; in a mode, it starts at the end.

tell() and seek()

As you read a file, the pointer advances automatically to the position where the next read will start. You can also inspect and move the pointer yourself with two methods: tell() and seek().

tell() returns the current position of the pointer, counted from the beginning of the file (in bytes in binary mode; in text mode the value is an opaque number meant to be passed back to seek()). seek(offset, whence=0) moves the pointer to offset relative to whence. whence can be:

0: start from the beginning of the file

1: start from the current location

2: start at the end of the file

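In text mode, whence should normally be 0, and offset should be either 0 or a value previously returned by tell(). The exception is seek(0, 2), which jumps to the end of the file.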

with open("text.txt", "w+") as f:
    f.write("0123456789abcdef")
    f.seek(9)
    print(f.tell())  # 9 (the pointer moves to 9; the next read starts there)
    print(f.read())  # 9abcdef

tell() and seek()
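Relative seeks (whence 1 and 2) and negative offsets are allowed in binary mode, where offsets are plain byte counts. This sketch writes to a temporary binary file and uses the named constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END, which equal 0, 1 and 2:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seek_demo.bin")
    with open(path, "w+b") as f:     # binary mode: offsets are byte counts
        f.write(b"0123456789abcdef")
        f.seek(-3, os.SEEK_END)      # 3 bytes before the end (whence=2)
        tail = f.read()
        f.seek(4, os.SEEK_SET)       # absolute position 4 (whence=0)
        f.seek(2, os.SEEK_CUR)       # 2 bytes forward from there (whence=1)
        position = f.tell()
        chunk = f.read(2)

print(tail, position, chunk)  # b'def' 6 b'67'
```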

Understand file status

The operating system's file system keeps a lot of useful information about each file, such as its size and when it was created and modified. To get this information in Python, you can use the os or pathlib module. The two have a lot in common, but pathlib is more object-oriented.

Use os.stat("test.txt") to get the full status of a file. It returns a result object with many fields, such as st_size (file size in bytes), st_atime (time of the most recent access), st_mtime (time of the most recent modification), and so on.

import os

print(os.stat("text.txt"))
# os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597527409, st_mtime=1597527409, st_ctime=1597527409)

Individual values can also be obtained with the os.path functions alone:

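os.path.getatime(path)  # time of last access
os.path.getctime(path)  # metadata change time on Unix, creation time on Windows
os.path.getmtime(path)  # time of last modification
os.path.getsize(path)   # size in bytes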
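Each os.path getter returns a single field of the same stat result, so os.path.getsize(path) equals os.stat(path).st_size. A short sketch using a hypothetical file stat_demo.txt created just for the example:

```python
import os
import time

path = "stat_demo.txt"  # hypothetical file created just for this example
with open(path, "w") as f:
    f.write("0123456789abcdef")  # 16 ASCII characters, so 16 bytes

size = os.path.getsize(path)        # same value as os.stat(path).st_size
modified = os.path.getmtime(path)   # seconds since the epoch, as a float

print(size)                           # 16
print(size == os.stat(path).st_size)  # True
print(time.ctime(modified))           # the modification time in human-readable form
```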

pathlib

The full status of a file can also be obtained with pathlib.Path("text.txt").stat(). It returns the same kind of object as os.stat().

import pathlib

print(pathlib.Path("text.txt").stat())
# os.stat_result(st_mode=33188, st_ino=8618932538, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=16, st_atime=1597528703, st_mtime=1597528703, st_ctime=1597528703)
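The timestamps in the stat result are seconds since the epoch. A minimal sketch, again using a hypothetical stat_demo.txt, that converts st_mtime into a datetime with datetime.datetime.fromtimestamp():

```python
import datetime
import pathlib

p = pathlib.Path("stat_demo.txt")  # hypothetical file created just for this example
p.write_text("0123456789abcdef")

info = p.stat()  # an os.stat_result, the same type os.stat() returns
modified = datetime.datetime.fromtimestamp(info.st_mtime)  # local time

print(info.st_size)          # 16
print(modified.isoformat())  # last modification time as an ISO 8601 string
```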

The following compares the similarities and differences between os and pathlib in many ways.

Copy, move, and delete files

Python has several built-in modules for moving files. Before you trust the first answer Google returns, be aware that different modules perform differently: some block the current thread until the file operation is complete, while others can run asynchronously.

shutil

shutil is the best-known module for moving, copying, and deleting files and folders. It has three functions for copying single files: copy(), copy2(), and copyfile().

copy() vs. copy2(): copy2() is very similar to copy(); the difference is that copy2() also copies the file's metadata, such as the most recent access and modification times. However, as the Python documentation notes, even copy2() cannot copy all metadata because of operating-system limitations.

import pathlib
import shutil

shutil.copy("1.csv", "copy.csv")
shutil.copy2("1.csv", "copy2.csv")
print(pathlib.Path("1.csv").stat())
print(pathlib.Path("copy.csv").stat())
print(pathlib.Path("copy2.csv").stat())
# 1.csv
# os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)
# copy.csv
# os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)
# copy2.csv
# os.stat_result(st_mode=33152, st_ino=8618983989, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395)

copy() vs. copy2()

copy() vs. copyfile(): copy() gives the new file the same permissions as the original, whereas copyfile() does not copy the permission mode. Second, the destination of copy() can be a directory, in which case the file is copied into it under its original name, overwriting any existing file with that name. The destination of copyfile(), however, must be a complete file name.

import pathlib
import shutil

shutil.copy("1.csv", "copy.csv")
shutil.copyfile("1.csv", "copyfile.csv")
print(pathlib.Path("1.csv").stat())
print(pathlib.Path("copy.csv").stat())
print(pathlib.Path("copyfile.csv").stat())
# 1.csv
# os.stat_result(st_mode=33152, st_ino=8618884732, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570395, st_mtime=1597259421, st_ctime=1597570360)
# copy.csv
# os.stat_result(st_mode=33152, st_ino=8618983930, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)
# copyfile.csv
# permission (st_mode) is changed
# os.stat_result(st_mode=33188, st_ino=8618984694, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=11, st_atime=1597570387, st_mtime=1597570395, st_ctime=1597570395)

shutil.copyfile("1.csv", "./source")
# IsADirectoryError: [Errno 21] Is a directory: './source'

copy() vs. copyfile()
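The three copy functions work on single files. For whole directories, shutil also offers copytree(), move() and rmtree(). A sketch inside a temporary directory, with a hypothetical source folder containing 1.csv:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "source")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "1.csv"), "w") as f:
        f.write("city,country\n")

    # copytree() copies a whole directory and returns the destination path;
    # by default the destination must not exist yet.
    backup_dir = shutil.copytree(src_dir, os.path.join(tmp, "backup"))

    # move() moves or renames a file or directory and returns the new path.
    moved = shutil.move(os.path.join(backup_dir, "1.csv"), os.path.join(tmp, "moved.csv"))

    # rmtree() deletes a directory and everything inside it.
    shutil.rmtree(src_dir)

    remaining = sorted(os.listdir(tmp))

print(remaining)  # ['backup', 'moved.csv']
```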

The os module contains the system() function, which executes a command in a subshell. You pass the command as a string to system(), and the effect is the same as running that command in the operating system's shell. For moving and deleting files, the os module also provides dedicated functions such as os.rename() and os.remove().

import os

# copy
os.system("cp 1.csv copy.csv")

# rename/move (shell command, or the os function as an alternative)
os.system("mv 1.csv move.csv")
# os.rename("1.csv", "move.csv")

# delete
os.system("rm move.csv")
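os.system() depends on the shell: cp, mv and rm are not available in the Windows command prompt, and commands built as strings are easy to get wrong when file names contain spaces. A sketch of the same three operations using only Python functions, with a hypothetical 1.csv created first. Note that os.rename() may fail on some Unix systems when source and destination are on different file systems; shutil.move() handles that case.

```python
import os
import shutil

# Create a hypothetical 1.csv so the example runs on its own.
with open("1.csv", "w") as f:
    f.write("city,country\n")

shutil.copy("1.csv", "copy.csv")  # copy without spawning a shell
os.rename("1.csv", "move.csv")    # rename/move within the same file system
os.remove("move.csv")             # delete a single file
os.remove("copy.csv")

print(os.path.exists("1.csv"), os.path.exists("move.csv"))  # False False
```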

Copy / move files asynchronously

So far, all of these solutions run synchronously, which means that if a file is very large and takes a long time to move, the program blocks until the operation finishes. To run the operation asynchronously, you can use the threading, multiprocessing, or subprocess module, which let file operations run in a separate thread or process.

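import multiprocessing
import shutil
import subprocess
import threading

src = "1.csv"

if __name__ == "__main__":  # required by multiprocessing on platforms that spawn processes
    # join() waits for completion; other work can run between start() and join()
    dst = "dst_thread.csv"
    thread = threading.Thread(target=shutil.copy, args=[src, dst])
    thread.start()
    thread.join()

    dst = "dst_multiprocessing.csv"
    process = multiprocessing.Process(target=shutil.copy, args=[src, dst])
    process.start()
    process.join()

    # subprocess.call() waits for the command; subprocess.Popen() returns immediately
    cmd = "cp 1.csv dst_subprocess.csv"
    status = subprocess.call(cmd, shell=True)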

Perform file operations asynchronously
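When several files need copying, concurrent.futures.ThreadPoolExecutor manages the worker threads for you, which is a swapped-in alternative to creating threading.Thread objects by hand. A sketch that copies three hypothetical CSV files in a temporary directory:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

with tempfile.TemporaryDirectory() as tmp:
    sources, targets = [], []
    for i in range(3):
        src = os.path.join(tmp, f"{i}.csv")  # hypothetical input files
        with open(src, "w") as f:
            f.write(f"row {i}\n")
        sources.append(src)
        targets.append(os.path.join(tmp, f"{i}_copy.csv"))

    # Each shutil.copy(src, dst) call runs in a worker thread;
    # map() yields the return values (the destination paths) in input order.
    with ThreadPoolExecutor(max_workers=3) as pool:
        copied = list(pool.map(shutil.copy, sources, targets))

    copied_names = [os.path.basename(path) for path in copied]

print(copied_names)  # ['0_copy.csv', '1_copy.csv', '2_copy.csv']
```

Leaving the with block waits for all copies to finish, so the results are complete by the time they are printed.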

Search for files

After copying and moving files, you may need to search for file names that match a particular pattern, and Python provides many built-in functions to choose from.

glob

The glob module supports wildcards and finds all pathnames that match a specified pattern, following the rules used by the Unix shell.

glob.glob("*.csv") finds all files with the csv extension in the current directory. The glob module can also search subdirectories by combining the ** pattern with recursive=True.

>>> import glob
>>> glob.glob("*.csv")
['1.csv', '2.csv']
>>> glob.glob("**/*.csv", recursive=True)
['1.csv', '2.csv', 'source/3.csv']
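The session above depends on the files in the current directory. This self-contained sketch rebuilds the same layout in a temporary directory and also uses glob.iglob(), which returns an iterator instead of building the whole list at once:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "source"))
    for name in ["1.csv", "2.csv", "notes.txt", os.path.join("source", "3.csv")]:
        open(os.path.join(tmp, name), "w").close()  # create empty files

    top = sorted(os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*.csv")))

    # "**" with recursive=True matches zero or more directory levels.
    matches = glob.iglob(os.path.join(tmp, "**", "*.csv"), recursive=True)
    found = sorted(os.path.relpath(p, tmp) for p in matches)

print(top)    # ['1.csv', '2.csv']
print(found)  # ['1.csv', '2.csv', 'source/3.csv'] on POSIX systems
```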

The os module is powerful enough to perform almost every file operation. We can use os.listdir() to list all the files in a directory, the string methods str.endswith() and str.startswith() to match a pattern, and os.walk() to traverse a directory tree.

import os

for file in os.listdir("."):
    if file.endswith(".csv"):
        print(file)

for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".csv"):
            print(file)

Search for file names: os
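os.scandir() is the lower-level building block behind os.walk(): it yields DirEntry objects that often know whether they are files or directories without an extra stat call. A minimal sketch of a recursive search; the helper find_files is hypothetical:

```python
import os


def find_files(root, suffix):
    """Hypothetical helper: recursively yield paths ending with suffix using os.scandir()."""
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                yield from find_files(entry.path, suffix)  # descend into the subdirectory
            elif entry.is_file() and entry.name.endswith(suffix):
                yield entry.path
```

For example, for path in find_files(".", ".csv"): print(path) prints every CSV file under the current directory.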

pathlib

pathlib works much like the glob module and can also search for file names recursively. Compared with the os-based solutions above, pathlib needs less code and offers a more object-oriented approach.

from pathlib import Path

p = Path(".")
for name in p.glob("**/*.csv"):  # recursive: "**" matches any number of subdirectories
    print(name)

Search for file names: pathlib
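Path.rglob(pattern) is shorthand for glob() with "**/" prepended to the pattern. A self-contained sketch in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "source").mkdir()
    for name in ["1.csv", "2.csv", "source/3.csv"]:
        (root / name).touch()  # create empty files

    # rglob("*.csv") is equivalent to glob("**/*.csv").
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))

print(found)  # ['1.csv', '2.csv', 'source/3.csv']
```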

Manage file paths

Managing file paths is another common task: getting the relative and absolute path of a file, joining multiple paths, finding the parent directory, and so on.

Relative path and absolute path

Both os and pathlib can get the relative and absolute paths of a file or directory.

import os
import pathlib

print(os.path.abspath("1.txt"))          # absolute
print(os.path.relpath("1.txt"))          # relative
print(pathlib.Path("1.txt").absolute())  # absolute
print(pathlib.Path("1.txt"))             # relative

Relative and absolute paths to files
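Both libraries can also compute a path relative to a directory other than the current one. In this sketch the /home/user/... paths are hypothetical; PurePosixPath only manipulates strings, so nothing needs to exist on disk:

```python
import os
from pathlib import PurePosixPath

target = PurePosixPath("/home/user/project/data/1.txt")
print(target.relative_to("/home/user/project"))  # data/1.txt

# relative_to() raises ValueError when the path is not under the given base...
try:
    target.relative_to("/srv")
except ValueError as exc:
    print("ValueError:", exc)

# ...whereas os.path.relpath() walks up with "..", measured from the start argument.
print(os.path.relpath("/home/user/project/data/1.txt", "/home/user/other"))
# ../project/data/1.txt on POSIX systems
```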

Join paths

This is how to join paths in a platform-independent way with os and pathlib. pathlib uses the / operator to create subpaths.

import os
import pathlib

print(os.path.join("/home", "file.txt"))
print(pathlib.Path("/home") / "file.txt")

Join file paths
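The separator that os.path.join() and Path use depends on the platform: os.path is posixpath on Linux and macOS and ntpath on Windows. Both modules, and the pure pathlib classes, can be imported on any system, which makes the difference easy to see:

```python
import ntpath
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

print(posixpath.join("/home", "file.txt"))       # /home/file.txt
print(ntpath.join("C:\\Users", "file.txt"))      # C:\Users\file.txt

print(PurePosixPath("/home") / "file.txt")       # /home/file.txt
print(PureWindowsPath("C:/Users") / "file.txt")  # C:\Users\file.txt

# Pitfall: an absolute second argument discards everything before it.
print(posixpath.join("/home", "/etc/passwd"))    # /etc/passwd
```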

Get parent directory

os.path.dirname() returns the parent directory in os, while in pathlib you can get the parent folder simply with the Path().parent property.

import os
import pathlib

# relative path
print(os.path.dirname("source/2.csv"))      # source
print(pathlib.Path("source/2.csv").parent)  # source

# absolute path
print(pathlib.Path("source/2.csv").resolve().parent)      # /Users//project/source
print(os.path.dirname(os.path.abspath("source/2.csv")))  # /Users//project/source

Get parent folder
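To go up more than one level, pathlib offers the parents sequence, while os.path needs nested dirname() calls. A sketch on a pure path, where the user name alice is made up for the example:

```python
import posixpath
from pathlib import PurePosixPath

p = PurePosixPath("/Users/alice/project/source/2.csv")  # "alice" is a hypothetical user

print(p.parent)                  # /Users/alice/project/source
print(p.parents[1])              # /Users/alice/project (two levels up)
print(p.name, p.stem, p.suffix)  # 2.csv 2 .csv

# The os.path equivalent of parents[1]: apply dirname() twice.
print(posixpath.dirname(posixpath.dirname(str(p))))  # /Users/alice/project
```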

This concludes the overview of the most common file manipulation techniques in Python. Thank you for reading.
