2025-04-12 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how to count the lines of a text file quickly with Python.
We usually use wc -l to count the lines in a file, but the same count is easy to produce in Python.
Counting the lines of a text file quickly comes down to counting its newline characters. To make this as fast as possible, read the file in large chunks and process each chunk in a single call: the built-in count method of bytes counts the newline characters in a chunk.
The code is as follows:
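from __future__ import print_function
import time

if __name__ == '__main__':
    import sys
    start = time.time()
    with open(sys.argv[1], 'rb') as f:
        count = 0
        # Start with a newline so that an empty file is not counted as a partial line.
        last_data = b'\n'
        while True:
            # Read 4 MiB at a time to keep the number of read() calls small.
            data = f.read(0x400000)
            if not data:
                break
            count += data.count(b'\n')
            last_data = data
        # The file does not end with a newline: count the trailing part as a line.
        if last_data[-1:] != b'\n':
            count += 1  # Remove this if a wc-like count is needed
    end = time.time()
    print(count)
    # Elapsed time in milliseconds
    print((end - start) * 1000)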
The code above counts an unterminated final part of the file (any text after the last newline character) as a line, which differs slightly from wc -l. To match wc -l, delete the line marked with the comment.
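The difference between the two counting conventions can be sketched as a small function. This is only an illustration under stated assumptions: the function name count_lines and the parameter count_partial_last_line are hypothetical and do not appear in the original script.

```python
import io


def count_lines(stream, chunk_size=0x400000, count_partial_last_line=True):
    """Count lines in a binary stream by counting newline bytes chunk by chunk.

    count_lines and count_partial_last_line are hypothetical names used
    only for this sketch.
    """
    count = 0
    last_chunk = b"\n"  # an empty stream must not add a partial line
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        count += chunk.count(b"\n")
        last_chunk = chunk
    # wc -l reports newline characters only; the flag adds the unterminated tail.
    if count_partial_last_line and not last_chunk.endswith(b"\n"):
        count += 1
    return count


if __name__ == "__main__":
    sample = b"first\nsecond\nthird"  # no newline after "third"
    print(count_lines(io.BytesIO(sample)))
    print(count_lines(io.BytesIO(sample), count_partial_last_line=False))
```

With the flag left at its default, the unterminated "third" is counted as a line; with the flag set to False, only the newline characters are counted, as wc -l does.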
The code does not handle universal newlines, skip blank lines, or the like. Supporting such features would make the program somewhat more complicated.
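One possible way to add these features is sketched below. Note that it swaps in a different technique: it iterates over the file in text mode rather than calling bytes.count, so it has to decode the text and will likely be slower. The function name count_text_lines, the skip_blank parameter, and the UTF-8 default are assumptions made for this sketch.

```python
import os
import tempfile


def count_text_lines(path, encoding="utf-8", skip_blank=False):
    """Count lines in text mode with universal newlines (hypothetical helper).

    newline=None makes open() treat "\n", "\r\n" and "\r" all as line endings.
    A final line without a newline is still counted as a line.
    """
    with open(path, "r", encoding=encoding, newline=None) as f:
        if skip_blank:
            # A line counts as blank when it holds nothing but whitespace.
            return sum(1 for line in f if line.strip())
        return sum(1 for _ in f)


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(demo_path, "wb") as out:
            out.write(b"one\r\n\r\ntwo\rthree\n")  # mixed line endings, one blank line
        print(count_text_lines(demo_path))
        print(count_text_lines(demo_path, skip_blank=True))
    finally:
        os.remove(demo_path)
```

The text-mode approach keeps the logic short at the cost of speed. A byte-level version would have to deal with "\r\n" pairs that are split across chunk boundaries.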
The test uses three text files of 10 million, 160 million, and 640 million lines. For each file, wc -l is run twice, followed by the Python script wc.py.
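The article does not say how the test files were produced or what they contain. For readers who want to repeat the test, a file with a known number of lines could be generated roughly as follows. The helper write_test_file, the line content, and the batch size are all assumptions for this sketch.

```python
import os
import tempfile


def write_test_file(path, line_count, line=b"hello world\n", batch=100000):
    """Write line_count copies of line to path (hypothetical helper).

    Lines are written in batches so that few write() calls are needed.
    """
    full_batches, remainder = divmod(line_count, batch)
    block = line * batch
    with open(path, "wb") as f:
        for _ in range(full_batches):
            f.write(block)
        f.write(line * remainder)


if __name__ == "__main__":
    target = os.path.join(tempfile.gettempdir(), "text_sample.txt")
    write_test_file(target, 1000000)  # use 10000000 or more to approach the sizes above
    print(os.path.getsize(target))
    os.remove(target)
```

The line length affects the timings, so numbers obtained with a file generated this way are not directly comparable with the measurements reported in this article.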
Results:
[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l text.txt && time wc -l text.txt && time python3 wc.py text.txt"
10000000 text.txt

real 0m0.086s
user 0m0.072s
sys 0m0.013s
10000000 text.txt

real 0m0.080s
user 0m0.060s
sys 0m0.019s
10000000
64.38159942626953

real 0m0.150s
user 0m0.100s
sys 0m0.033s

[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l text3.txt && time wc -l text3.txt && time python3 wc.py text3.txt"
160000000 text3.txt

real 0m1.322s
user 0m0.991s
sys 0m0.318s
160000000 text3.txt

real 0m1.313s
user 0m0.966s
sys 0m0.341s
160000000
838.7012481689453

real 0m0.908s
user 0m0.595s
sys 0m0.297s

[root@yz- test]# docker run -it --rm -v `pwd`:/opt/workspace python:3 bash -c "cd /opt/workspace && time wc -l text4.txt && time wc -l text4.txt && time python3 wc.py text4.txt"
640000000 text4.txt

real 0m5.805s
user 0m4.349s
sys 0m1.455s
640000000 text4.txt

real 0m5.787s
user 0m4.342s
sys 0m1.445s
640000000
3323.5926628112793

real 0m3.399s
user 0m2.255s
sys 0m1.108s
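The results show that Python is actually faster than wc -l on the two larger files (about 0.9 s versus 1.3 s, and 3.4 s versus 5.8 s). On the 10-million-line file the total real time of the Python run is higher (0.150 s versus about 0.08 s), although the counting itself, as measured inside the script, takes only about 64 ms. Python does well mainly because the script executes very few steps in pure Python and spends most of its time in C implementations such as read() and count(). A likely reason that wc is slower is a smaller default buffer, which requires more read() calls.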
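The buffer-size guess can be explored from the Python side by counting with different chunk sizes and recording how many read() calls each one needs. The sketch below is an assumption-laden illustration: count_with_chunk_size is a hypothetical helper, the chunk sizes are arbitrary, and the sketch says nothing about the buffer size that wc actually uses.

```python
import os
import tempfile
import time


def count_with_chunk_size(path, chunk_size):
    """Return (newline_count, read_calls) for one chunk size (hypothetical helper)."""
    count = 0
    reads = 0
    # buffering=0 opens a raw file object, so each read() is a single OS-level read.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            reads += 1
            count += chunk.count(b"\n")
    return count, reads


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(demo_path, "wb") as out:
            out.write(b"some line of text\n" * 500000)
        for size in (0x1000, 0x10000, 0x400000):
            start = time.perf_counter()
            count, reads = count_with_chunk_size(demo_path, size)
            elapsed_ms = (time.perf_counter() - start) * 1000
            print(size, count, reads, round(elapsed_ms, 3))
    finally:
        os.remove(demo_path)
```

Smaller chunks need more read() calls for the same file. How much that costs in elapsed time depends on the system and on whether the file is already in the page cache.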
© 2024 shulou.com SLNews company. All rights reserved.