How to use pathlib in Python 3


2025-07-15 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to use pathlib in Python 3 and then walks through other Python 3 features that make everyday code shorter and safer.

Use pathlib for better path handling

pathlib is a standard-library module in Python 3 that saves you from writing long chains of os.path.join calls.

    from pathlib import Path

    dataset = 'wiki_images'
    datasets_root = Path('/path/to/datasets/')

    # Navigating inside a directory tree, use: /
    train_path = datasets_root / dataset / 'train'
    test_path = datasets_root / dataset / 'test'

    for image_path in train_path.iterdir():
        with image_path.open() as f:  # note, open is a method of Path object
            # do something with an image
            pass

Do not build paths by concatenating strings: the result can break depending on the operating system. Joining path components with the / operator on pathlib objects is safe, convenient, and readable.

pathlib also provides many other attributes and methods. For the full list, see the official pathlib documentation. Here are a few:

    from pathlib import Path

    a = Path("/data")
    b = "test"
    c = a / b
    print(c)
    print(c.exists())                  # whether the path exists
    print(c.is_dir())                  # whether it is a directory
    print(c.parts)                     # split the path into its components
    print(c.with_name('sibling.png'))  # replace only the final name; nothing changes on disk
    print(c.with_suffix('.jpg'))       # replace only the extension; nothing changes on disk
    c.chmod(0o777)                     # change permissions (octal mode, not decimal 777)
    c.rmdir()                          # delete the directory (it must be empty)
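As a rough, self-contained sketch of the calls above, the following snippet works inside a temporary directory so that nothing else on disk is touched. The directory and file names are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)                      # hypothetical dataset root
    train = root / 'wiki_images' / 'train'
    train.mkdir(parents=True)             # also creates the intermediate directory

    image = train / 'cat.png'             # hypothetical file name
    image.write_bytes(b'not really an image')

    names = sorted(p.name for p in train.iterdir())
    renamed = image.with_suffix('.jpg')   # builds a new path object only
    sibling = image.with_name('dog.png')  # same directory, different file name

    existed = image.exists()
    renamed_exists = renamed.exists()     # the file on disk was never renamed
    last_parts = image.parts[-3:]

print(names, renamed.name, sibling.name, existed, renamed_exists, last_parts)
```

Note that with_suffix and with_name only compute new path objects; an actual rename on disk would need a separate call such as Path.rename.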

Type hints are now part of the language

An example of type hints in PyCharm (screenshot not reproduced here).

Type hints were introduced to help manage growing program complexity: an IDE can recognize the types of parameters and give the user hints.
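A minimal sketch of what annotated code looks like is shown here. The function count_words is hypothetical; the annotations do not change run-time behavior, they only give tools such as an IDE or mypy something to check against.

```python
from typing import Dict, List

def count_words(lines: List[str]) -> Dict[str, int]:
    """Count how often each whitespace-separated word occurs."""
    counts: Dict[str, int] = {}          # variable annotation (Python 3.6+)
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

word_counts = count_words(['a b a', 'b c'])
hints = count_words.__annotations__      # the hints are stored on the function
print(word_counts, hints)
```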

For details on typing, see: "The Ultimate Guide to Python Type Checking: Using typing".

Checking type hints at run time

In addition to static type checking with mypy, mentioned in the previous article, you can also check types at run time with the enforce module, which can be installed with pip. For example:

    import enforce

    @enforce.runtime_validation
    def foo(text: str) -> None:
        print(text)

    foo('Hi')  # ok
    foo(5)     # fails

Output

    Hi
    Traceback (most recent call last):
      File "/Users/chennan/pythonproject/dataanalysis/e.py", line 10, in <module>
        foo(5)  # fails
      File "/Users/chennan/Desktop/2019/env/lib/python3.6/site-packages/enforce/decorators.py", line 104, in universal
        _args, _kwargs, _ = enforcer.validate_inputs(parameters)
      File "/Users/chennan/Desktop/2019/env/lib/python3.6/site-packages/enforce/enforcers.py", line 86, in validate_inputs
        raise RuntimeTypeError(exception_text)
    enforce.exceptions.RuntimeTypeError:
      The following runtime type errors were encountered:
           Argument 'text' was not of type <class 'str'>. Actual type was int.
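If a third-party dependency is not an option, a very small run-time check can be sketched with the standard library alone. The decorator name check_types is made up for this example, and it only handles plain classes such as str or int (generics like List[int] are skipped), so treat it as an illustration of the idea rather than a replacement for enforce.

```python
import functools
import inspect
import typing

def check_types(func):
    """Hypothetical helper: validate arguments against plain-class annotations."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; generics such as List[int] are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"Argument {name!r} was not of type {expected.__name__}. "
                    f"Actual type was {type(value).__name__}."
                )
        return func(*args, **kwargs)
    return wrapper

@check_types
def foo(text: str) -> None:
    print(text)

foo('Hi')  # ok
try:
    foo(5)  # fails
except TypeError as error:
    message = str(error)
    print(message)
```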

Use @ for matrix multiplication

Let's implement one of the simplest ML models: L2-regularized linear regression (also known as ridge regression).

    import numpy as np

    # l2-regularized linear regression: ||AX - y||^2 + alpha * ||X||^2 -> min

    # Python 2
    X = np.linalg.inv(np.dot(A.T, A) + alpha * np.eye(A.shape[1])).dot(A.T.dot(y))
    # Python 3
    X = np.linalg.inv(A.T @ A + alpha * np.eye(A.shape[1])) @ (A.T @ y)

Using the @ symbol, the entire code becomes more readable and portable to other scientific computing-related libraries, such as numpy, cupy, pytorch, tensorflow, etc.
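The portability comes from the fact that @ simply dispatches to the __matmul__ special method, so any library can support it for its own array type. The toy Matrix class below is hypothetical and does no shape checking; it only illustrates the mechanism.

```python
class Matrix:
    """Toy matrix type (hypothetical, for illustration only)."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Plain row-by-column product; no shape checking in this sketch.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )

A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])
product = (A @ B).rows      # A @ B calls A.__matmul__(B)
print(product)
```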

Using the ** wildcard

In Python 2, finding files recursively is not easy, even with the glob library. Starting with Python 3.5, the ** wildcard does it directly.

    import glob

    # Python 2
    found_images = (
        glob.glob('/path/*.jpg')
        + glob.glob('/path/*/*.jpg')
        + glob.glob('/path/*/*/*.jpg'))

    # Python 3
    found_images = glob.glob('/path/**/*.jpg', recursive=True)

A better option is the pathlib module mentioned above, which lets us rewrite the code as follows.

    # Python 3
    import pathlib

    found_images = pathlib.Path('/path/').glob('**/*.jpg')
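To see that the recursive patterns really pick up files at every depth, here is a small sketch that builds a throwaway directory tree first. The file layout is invented for the example.

```python
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: one image at the top level, two in nested folders.
    for relative in ('a.jpg', 'x/b.jpg', 'x/y/c.jpg', 'x/notes.txt'):
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()

    via_glob = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(tmp, '**', '*.jpg'), recursive=True)
    )
    via_pathlib = sorted(p.name for p in root.glob('**/*.jpg'))
    via_rglob = sorted(p.name for p in root.rglob('*.jpg'))  # shorthand for '**/' + pattern

print(via_glob, via_pathlib, via_rglob)
```

Unlike glob.glob, Path.glob returns a lazy generator, so wrap it in list() or sorted() if the results are needed more than once.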

Print function

Although print in Python 3 requires a pair of parentheses, it brings several advantages.

Simple syntax for writing to a file object

    print >>sys.stderr, "critical error"      # Python 2
    print("critical error", file=sys.stderr)  # Python 3

Joining values without str.join

    # Python 3
    print(*array, sep='')
    print(batch, epoch, loss, accuracy, time, sep='')
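The keyword arguments of print can be tried out without touching real files by writing into an in-memory buffer. In this sketch io.StringIO stands in for a log file or sys.stderr, and the tab and '=' separators are arbitrary choices for the example.

```python
import io

buffer = io.StringIO()           # stands in for a log file or sys.stderr
row = [1, 2.5, 'ok']

print(*row, sep='\t', file=buffer)                   # values joined by tabs
print('epoch', 3, sep='=', end=';\n', file=buffer)   # custom separator and line ending

captured = buffer.getvalue()
print(repr(captured))
```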

Redefining the behavior of print

Since print is a function in Python 3, we can override it.

    # Python 3
    _print = print  # store the original print function

    def print(*args, **kargs):
        pass  # do something useful, e.g. store output to some file

Note: in Jupyter, it is useful to log each output to a separate file (to track what happened after you were disconnected), and print can be overridden for that purpose.

    import contextlib

    @contextlib.contextmanager
    def replace_print():
        import builtins
        _print = print  # saving old print function
        # or use some other function here
        builtins.print = lambda *args, **kwargs: _print('new printing', *args, **kwargs)
        try:
            yield
        finally:
            builtins.print = _print  # restore the original even if the body raises

    with replace_print():
        ...  # code here will invoke the replacement print

Although the above code can also achieve the purpose of rewriting the print function, it is not recommended.

print can be used in list comprehensions and other language constructs

    # Python 3
    result = process(x) if is_valid(x) else print('invalid item:', x)

Underscores in numeric literals (thousands separators)

PEP 515 introduced underscores in numeric literals. In Python 3.6+, underscores can be used in integer, floating-point, and complex literals to group digits visually.

    # grouping decimal numbers by thousands
    one_million = 1_000_000

    # grouping hexadecimal addresses by words
    addr = 0xCAFE_F00D

    # grouping bits into nibbles in a binary literal
    flags = 0b_0011_1111_0100_1110

    # same, for string conversions
    flags = int('0b_1111_0000', 2)

In other words, 10000 can be written as 10_000.
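A quick way to convince yourself that the underscores are purely visual is to compare the grouped and ungrouped forms. The sketch below also shows the related '_' format option, which goes the other way and inserts the separators on output.

```python
one_million = 1_000_000
addr = 0xCAFE_F00D
flags = int('0b_1111_0000', 2)      # int() accepts underscores in strings too
ratio = float('1_000.5')            # ...and so does float()
formatted = f'{one_million:_}'      # '_' as a thousands separator when formatting

print(one_million == 1000000, hex(addr), flags, ratio, formatted)
```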

Simple and readable string formatting with f-strings

The string formatting system in Python 2 is verbose and tedious. A typical piece of code for printing log information looks like this:

    # Python 2
    print '{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {acc_mean:0.4f}±{acc_std:0.4f} time: {avg_time:3.2f}'.format(
        batch=batch, epoch=epoch, total_epochs=total_epochs,
        acc_mean=numpy.mean(accuracies), acc_std=numpy.std(accuracies),
        avg_time=time / len(data_batch)
    )

    # Python 2 (too error-prone during fast modifications, please avoid):
    print '{:3} {:3} / {:3}  accuracy: {:0.4f}±{:0.4f} time: {:3.2f}'.format(
        batch, epoch, total_epochs, numpy.mean(accuracies), numpy.std(accuracies),
        time / len(data_batch)
    )

The output is

    120  12 / 300  accuracy: 0.8180±0.4649 time: 56.60

f-strings (formatted string literals) were introduced in Python 3.6:

    print(f'{batch:3} {epoch:3} / {total_epochs:3}  accuracy: {numpy.mean(accuracies):0.4f}±{numpy.std(accuracies):0.4f} time: {time / len(data_batch):3.2f}')

For more on f-strings, see my video on bilibili [https://www.bilibili.com/video/av31608754]
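The same log line can be reproduced as a runnable sketch. To keep it free of third-party packages, the statistics module stands in for numpy here (that substitution is an assumption of this example), and all the numbers are made up.

```python
import statistics

batch, epoch, total_epochs = 120, 12, 300
accuracies = [0.8, 0.9, 0.7]        # made-up numbers
elapsed, n_items = 113.2, 2         # made-up numbers

# Expressions are evaluated in place; the part after ':' is the format spec.
line = (
    f'{batch:3} {epoch:3} / {total_epochs:3}  '
    f'accuracy: {statistics.mean(accuracies):0.4f}±{statistics.pstdev(accuracies):0.4f} '
    f'time: {elapsed / n_items:3.2f}'
)
print(line)
```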

An explicit difference between '/' and '//' in arithmetic

This is undoubtedly a convenient change for data science.

    data = pandas.read_csv('timing.csv')
    velocity = data['distance'] / data['time']

In Python 2, the result depends on whether time and distance (for example, in seconds and meters) are stored as integers. In Python 3, the result is correct in both cases, because division always produces a floating-point number.

Another example is floor division, which is now an explicit operation

    n_gifts = money // gift_price  # correct for int and float arguments

In a nutshell

    >>> from operator import truediv, floordiv
    >>> truediv.__doc__, floordiv.__doc__
    ('truediv(a, b) -- Same as a / b.', 'floordiv(a, b) -- Same as a // b.')
    >>> (3 / 2), (3 // 2), (3.0 // 2.0)
    (1.5, 1, 1.0)

Note that this rule applies both to built-in types and to custom types provided by packages (such as numpy or pandas).
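A few extra cases, sketched with plain numbers, show how the two operators behave, including the often-forgotten fact that floor division rounds toward negative infinity rather than toward zero.

```python
true_div = 7 / 2            # always true division in Python 3, even for ints
floor_int = 7 // 2          # floor division keeps ints as ints
floor_float = 7.0 // 2      # ...and floats as floats
floor_negative = -7 // 2    # floors toward negative infinity, not toward zero
quotient, remainder = divmod(7, 2)

print(true_div, floor_int, floor_float, floor_negative, quotient, remainder)
```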

Strict ordering

The following comparisons are all illegal in Python 3 (each raises TypeError).

    3 < '3'
    2 < None
    (3, 4) < (3, None)
    (4, 5) < [4, 5]

The following is legal but evaluates to False in both Python 2 and Python 3

    (4, 5) == [4, 5]

The same applies when sorting values of different types

    sorted([2, '1', 3])

In Python 2 this returns [2, 3, '1']; in Python 3 it raises TypeError.
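When mixed values really do need to be sorted, one option is to state the comparison rule explicitly with a key function. This is a sketch of that idea, not something the original article prescribes.

```python
mixed = [2, '1', 3]

try:
    sorted(mixed)
    sort_failed = False
except TypeError:
    sort_failed = True   # int and str cannot be ordered against each other

# An explicit key states how the mixed values should be compared.
by_number = sorted(mixed, key=int)

print(sort_failed, by_number)
```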

The proper way to check for None

    if a is not None:
        pass

    if a:  # WRONG check for None
        pass

Unicode for NLP

    s = '你好'
    print(len(s))
    print(s[:2])

Output

Python 2: 6, followed by garbled bytes (the slice cuts into a multi-byte character). Python 3: 2, followed by 你好.

The same goes for the following operations.

    x = u'со'    # Cyrillic letters
    x += 'co'    # ok (Latin letters)
    x += 'со'    # fail (Cyrillic letters)

Python 2 fails and Python 3 works as expected (because the strings contain Russian letters).

In Python 3, all strings are Unicode, which makes it much more convenient to work with non-English text.
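The distinction between characters and bytes can be checked directly in Python 3. In this sketch the word is written with escape sequences only to keep the example independent of the file's encoding.

```python
word = 'M\u00f6belst\u00fcck'      # 'Möbelstück', written with escapes
encoded = word.encode('utf-8')     # str -> bytes is always an explicit step

char_count = len(word)             # counts characters
byte_count = len(encoded)          # counts bytes; the two umlauts take two bytes each
first_two = word[:2]               # slicing never cuts a character in half
round_trip = encoded.decode('utf-8')

print(char_count, byte_count, first_two, round_trip == word)
```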

Some other behaviors

    'a' < type < u'a'  # Python 2: True
    'a' < u'a'         # Python 2: False

And for example,

    from collections import Counter
    Counter('Möbelstück')

In Python2

    Counter({'\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'M': 1, 'l': 1, 's': 1, 't': 1, '\xb6': 1, '\xbc': 1})

In Python3

    Counter({'M': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})

All of this can be handled correctly in Python 2, but Python 3 is much friendlier.

Preserved order of dictionaries and **kwargs

In CPython 3.6+, dict behaves like OrderedDict by default and preserves insertion order (this is guaranteed by the language in Python 3.7+). Order is also preserved in dict comprehensions and in other operations, such as JSON serialization and deserialization.

    import json

    x = {str(i): i for i in range(5)}
    json.loads(json.dumps(x))
    # Python 2
    {u'1': 1, u'0': 0, u'3': 3, u'2': 2, u'4': 4}
    # Python 3
    {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}

The same applies to **kwargs (in Python 3.6+): they are kept in the same order as they appear in the call. Order is crucial in data pipelines, and previously it had to be written in a cumbersome way:

    from collections import OrderedDict
    from torch import nn

    # Python 2
    model = nn.Sequential(OrderedDict([
        ('conv1', nn.Conv2d(1, 20, 5)),
        ('relu1', nn.ReLU()),
        ('conv2', nn.Conv2d(20, 64, 5)),
        ('relu2', nn.ReLU()),
    ]))

From Python 3.6 on, it could be written like this:

    # Python 3.6+, how it *can* be done, not supported right now in pytorch
    model = nn.Sequential(
        conv1=nn.Conv2d(1, 20, 5),
        relu1=nn.ReLU(),
        conv2=nn.Conv2d(20, 64, 5),
        relu2=nn.ReLU())
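Both guarantees are easy to observe without any deep-learning library. In the sketch below, pipeline is a hypothetical function that just reports the order in which keyword arguments arrived.

```python
def pipeline(**steps):
    """Return the step names in the order they were passed."""
    return list(steps)

order = pipeline(conv1='c1', relu1='r1', conv2='c2', relu2='r2')

d = {}
d['b'] = 1
d['a'] = 2
d['c'] = 3
insertion_order = list(d)     # insertion order, not sorted order

print(order, insertion_order)
```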

Iterable unpacking

This extends the familiar unpacking of tuples and lists. Take a look at the following examples.

    # handy when amount of additional stored info may vary between experiments,
    # but the same code can be used in all cases
    model_parameters, optimizer_parameters, *other_params = load(checkpoint_name)

    # picking two last values from a sequence
    *prev, next_to_last, last = values_history

    # This also works with any iterables, so if you have a function that yields e.g. qualities,
    # below is a simple way to take only last two values from a list
    *prev, next_to_last, last = iter_train(args)
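Here is a runnable variant of the same idea. The generator iter_quality is a made-up stand-in for a training loop that yields one quality value per step.

```python
def iter_quality():
    """Hypothetical generator standing in for a training loop."""
    yield from [0.1, 0.4, 0.6, 0.7]

first, *middle, last = [1, 2, 3, 4, 5]
*prev, next_to_last, final = iter_quality()   # works with any iterable
head, *rest = 'ab'                            # the starred name always gets a list

print(first, middle, last, prev, next_to_last, final, head, rest)
```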

The default pickle is faster and more compact

Python2

    import cPickle as pickle
    import numpy

    print len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
    # result: 23691675

Python3

    import pickle
    import numpy

    len(pickle.dumps(numpy.random.normal(size=[1000, 1000])))
    # result: 8000162

That is about three times less space, and it is much faster. In fact, similar compression (but not speed) is achievable in Python 2 with the protocol=2 parameter, but developers often ignore this option (or are simply unaware of it).

Note: pickle is not safe (and not quite transferable), so never unpickle data received from an untrusted or unauthenticated source.

Safer list comprehensions

    labels = initial_value  # placeholder for whatever labels held before
    predictions = [model.predict(data) for data, labels in dataset]

    # labels are overwritten in Python 2
    # labels are not affected by comprehension in Python 3
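The scoping rule can be verified with a tiny, self-contained sketch. The dataset below is invented, and str.upper stands in for the model's prediction call.

```python
labels = 'original'
dataset = [('x1', 'cat'), ('x2', 'dog')]   # made-up (data, label) pairs

upper = [data.upper() for data, labels in dataset]

# In Python 3 the loop variables live in the comprehension's own scope,
# so the outer name is untouched.
labels_after = labels
print(upper, labels_after)
```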

Simpler super ()

In Python 2, super(...) calls were a frequent source of mistakes.

    # Python 2
    class MySubClass(MySuperClass):
        def __init__(self, name, **options):
            super(MySubClass, self).__init__(name='subclass', **options)

    # Python 3
    class MySubClass(MySuperClass):
        def __init__(self, name, **options):
            super().__init__(name='subclass', **options)

Python 3 simplifies this considerably: the new super() no longer needs any arguments.

At the same time, the order of calls is also different.
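A runnable version of the pattern, with hypothetical Base and Child classes, also shows where the call order can be inspected: the __mro__ attribute lists the classes in the order super() will walk them.

```python
class Base:
    def __init__(self, name, **options):
        self.name = name
        self.options = options

class Child(Base):
    def __init__(self, **options):
        # No need to repeat the class name or pass self.
        super().__init__(name='subclass', **options)

child = Child(color='red')
mro_names = [cls.__name__ for cls in Child.__mro__]

print(child.name, child.options, mro_names)
```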

Better IDE suggestions with variable annotations

One of the nicest things about programming in languages such as Java and C# is that the IDE can make very good suggestions, because the type of each identifier is known before the program runs.

This is hard to achieve in Python, but annotations help.

The original shows a PyCharm suggestion based on variable annotations (screenshot not reproduced here). This works even when the function you are using is not annotated (for example, for backward-compatibility reasons).

Multiple unpacking

How to merge two dictionaries

    x = dict(a=1, b=2)
    y = dict(b=3, d=4)
    # Python 3.5+
    z = {**x, **y}
    # z = {'a': 1, 'b': 3, 'd': 4}, note that value for `b` is taken from the latter dict.

I also posted a related video on bilibili [https://www.bilibili.com/video/av50376841]

The same approach works for lists, tuples, and sets (a, b, c are any iterables)

    [*a, *b, *c]  # list, concatenating
    (*a, *b, *c)  # tuple, concatenating
    {*a, *b, *c}  # set, union

Function calls also support multiple unpacking of *args and **kwargs.

    # Python 3.5+
    do_something(**{**default_settings, **custom_settings})

    # Also possible, this code also checks there is no intersection between keys of dictionaries
    do_something(**first_args, **second_args)
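The difference between the two call styles is easiest to see when the dictionaries share a key. In this sketch train is a hypothetical function and the settings are made up.

```python
defaults = {'lr': 0.1, 'epochs': 10}
custom = {'epochs': 50}

def train(**settings):
    """Hypothetical function that just returns the settings it received."""
    return settings

# Merging first: the later dictionary wins on duplicate keys.
merged = train(**{**defaults, **custom})

# Unpacking both directly: a duplicate key is rejected with TypeError.
try:
    train(**defaults, **custom)
    duplicate_rejected = False
except TypeError:
    duplicate_rejected = True

combined = [*range(2), *'ab', *(True,)]   # any iterables can be mixed

print(merged, duplicate_rejected, combined)
```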

Data classes

Python 3.7 introduces data classes (the dataclasses module), which are well suited to storing data objects. What is a data object? Some of its characteristics, though not an exhaustive list, are given below:

They store data and represent a certain data type, such as a number. For those familiar with ORMs, a data model instance is a data object. It represents a particular entity, and its attributes define or represent that entity.

They can be compared with other objects of the same type, for example: greater than, less than, or equal to.

Of course, there are many more features, and the following example is a good alternative to namedtuple.

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str
        age: int

    @dataclass
    class Coder(Person):
        preferred_language: str = 'Python 3'

The dataclass decorator generates several magic methods for you (__init__, __repr__, __eq__; ordering methods such as __le__ are generated only when order=True is passed).

Data classes have several notable features:

Data classes can be mutable or immutable

Default values for fields are supported

They can be inherited by other classes

Data classes can define new methods and override existing ones

Post-initialization processing (such as verifying consistency)

For more information, please refer to the official documentation.
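Several of the features listed above can be combined in one small sketch. The Version class is hypothetical; it uses frozen=True for immutability, order=True for comparisons, a default value, a field excluded from comparison, and __post_init__ for validation.

```python
from dataclasses import dataclass, field

@dataclass(order=True, frozen=True)
class Version:
    major: int
    minor: int = 0                                   # default value
    tags: tuple = field(default=(), compare=False)   # ignored by ==, <, ...

    def __post_init__(self):
        # Runs right after the generated __init__.
        if self.major < 0 or self.minor < 0:
            raise ValueError('version parts must be non-negative')

older, newer = Version(1, 2), Version(1, 3)
is_older = older < newer
same_ignoring_tags = Version(1, tags=('beta',)) == Version(1)
text = repr(Version(1))

try:
    older.major = 5
    frozen = False
except AttributeError:   # dataclasses.FrozenInstanceError subclasses AttributeError
    frozen = True

try:
    Version(-1)
    validated = False
except ValueError:
    validated = True

print(is_older, same_ignoring_tags, text, frozen, validated)
```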

Customizing access to module attributes

In Python, you can use __getattr__ and __dir__ to control attribute access and hinting for any object. Since Python 3.7, you can do the same for modules.

A natural example is implementing a random submodule for a tensor library, which is typically a shortcut that skips creating and passing around random state objects. An implementation for numpy looks like this:

    # nprandom.py
    import numpy

    __random_state = numpy.random.RandomState()

    def __getattr__(name):
        return getattr(__random_state, name)

    def __dir__():
        return dir(__random_state)

    def seed(seed):
        global __random_state  # rebind the module-level state, not a local variable
        __random_state = numpy.random.RandomState(seed=seed)

You can also mix the functionality of different objects or submodules this way. Compare this with the tricks used in pytorch and cupy.

In addition, you can do the following:

Use it to load submodules lazily. For example, import tensorflow imports all submodules (and dependencies), which takes about 150 MB of memory.

Use it for deprecations in the API

Introduce runtime routing between sub-modules
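As an illustration of the deprecation use case, the sketch below attaches a module-level __getattr__ to a throwaway in-memory module. The module name mylib and the attribute names are invented; in real code the function would simply be defined at the top level of a .py file.

```python
import types
import warnings

# Build a throwaway module in memory so the example is self-contained.
mylib = types.ModuleType('mylib')          # hypothetical module name
mylib.new_name = lambda: 'result'

_renamed = {'old_name': 'new_name'}        # hypothetical deprecated alias

def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name in _renamed:
        warnings.warn(f'{name} is deprecated, use {_renamed[name]}',
                      DeprecationWarning, stacklevel=2)
        return getattr(mylib, _renamed[name])
    raise AttributeError(f'module {mylib.__name__!r} has no attribute {name!r}')

mylib.__getattr__ = _module_getattr        # module-level __getattr__ (Python 3.7+)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    value = mylib.old_name()               # still works, but warns

has_missing = hasattr(mylib, 'missing')    # unknown names still fail normally
print(value, len(caught), has_missing)
```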

Built-in breakpoint

In Python 3.7+, you can call breakpoint() directly to drop into the debugger.

    # Python 3.7+, not all IDEs support this at the moment
    foo()
    breakpoint()
    bar()

Before Python 3.7, the same effect was achieved with import pdb; pdb.set_trace().

For remote debugging, try using breakpoint () with web-pdb.

Constants in the math module

    import math

    # Python 3
    math.inf  # infinite float
    math.nan  # not a number

    max_quality = -math.inf  # no more magic initial values!

    for model in trained_models:
        max_quality = max(max_quality, compute_quality(model, data))
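A runnable variant with made-up quality scores is sketched below. It also shows one property of math.nan that tends to surprise people: NaN never compares equal to anything, including itself, so math.isnan is the reliable test.

```python
import math

qualities = [0.62, 0.71, 0.55]      # made-up scores

best = -math.inf                    # any real score beats this
for q in qualities:
    best = max(best, q)

nan_equal = math.nan == math.nan    # NaN never compares equal, even to itself
nan_detected = math.isnan(math.nan)
empty_best = max([], default=-math.inf)   # same idea for a possibly empty input

print(best, nan_equal, nan_detected, empty_best)
```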

int is the only integer type

Python 2 provides two basic integer types, int (a 64-bit signed integer) and long, which is confusing. Python 3 provides only int.

    isinstance(x, numbers.Integral)  # Python 2, the canonical way
    isinstance(x, (long, int))       # Python 2
    isinstance(x, int)               # Python 3, easier to remember

Note that the first check also works for other integral types, such as numpy.int32 and numpy.int64, while the other two do not, so the checks are not equivalent.
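The behavior of the two Python 3 checks can be compared using only the standard library. Fraction is used here as an example of a numeric type that is not integral, even when its value happens to be a whole number.

```python
import numbers
from fractions import Fraction

big = 2 ** 100                        # no overflow and no separate long type
is_int = isinstance(big, int)
is_integral = isinstance(big, numbers.Integral)
bool_is_int = isinstance(True, int)   # bool is a subclass of int
fraction_is_integral = isinstance(Fraction(4, 2), numbers.Integral)
bits = big.bit_length()

print(is_int, is_integral, bool_is_int, fraction_is_integral, bits)
```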

Conclusion

Although Python 2 and Python 3 have coexisted for nearly 10 years, we should turn to Python 3.

With Python3, the code becomes shorter, easier to read, and more secure, both for research and production.

