Reading a file line by line in Python
Doing it the usual way
The standard idiom consists of an ‘endless’ while loop, in which we repeatedly call the file’s readline method. Here’s an example:
    # File: readline-example-1.py

    file = open("sample.txt")

    while 1:
        line = file.readline()
        if not line:
            break
        pass # do something

This snippet reads the file line by line. If readline reaches the end of the file, it returns an empty string. Otherwise, it returns the line of text, including the trailing newline character.
On my test machine, using a 10 megabyte sample text file, this script reads about 32,000 lines per second.
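The article doesn’t show how the lines-per-second figure was measured. One way to get a comparable number on your own machine is a small timing harness. The sketch below is an assumption, not the original benchmark: the helper names (count_with_readline, lines_per_second, make_sample) are made up for illustration, and it is written for Python 3.

```python
import os
import tempfile
import time


def count_with_readline(path):
    """Read the file with the readline loop and return the number of lines."""
    count = 0
    with open(path) as f:
        while True:
            line = f.readline()
            if not line:  # an empty string means end of file
                break
            count += 1
    return count


def lines_per_second(reader, path):
    """Time one call of reader(path); return (line count, lines per second)."""
    start = time.perf_counter()
    count = reader(path)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from a very coarse timer.
    return count, count / max(elapsed, 1e-9)


def make_sample(n_lines):
    """Write a hypothetical sample file with n_lines lines; return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write("line %d of the sample text\n" % i)
    return path
```

Calling lines_per_second(count_with_readline, make_sample(200000)) gives a rough rate for the loop above. Absolute numbers depend heavily on the machine and the Python version, so only the ratios between approaches are worth comparing.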
Using the fileinput module
If you think the while loop is ugly, you can hide the readline call in a wrapper class. The standard fileinput module provides an input function, built on a wrapper class of this kind, which does exactly that.
    # File: readline-example-2.py

    import fileinput

    for line in fileinput.input("sample.txt"):
        pass

However, adding more layers of Python code doesn’t exactly help. For the same test setup, performance drops to 13,000 lines per second. That’s nearly two and a half times slower!
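What the extra layer buys you is bookkeeping: fileinput can chain several files and track where each line came from. Here is a minimal Python 3 sketch of that trade-off, assuming you care about the file name and per-file line number. The tag_lines helper is hypothetical; filename() and filelineno() are methods of the object that fileinput.input returns.

```python
import fileinput
import os


def tag_lines(paths):
    """Return (file name, line number within that file, text) for every line."""
    tagged = []
    # fileinput.input works as a context manager in Python 3.2 and later.
    with fileinput.input(files=paths) as stream:
        for line in stream:
            name = os.path.basename(stream.filename())
            tagged.append((name, stream.filelineno(), line.rstrip("\n")))
    return tagged
```

If you only need the lines of a single file, this bookkeeping is pure overhead, which is consistent with the slowdown measured above.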
Speeding up line reading
To speed things up, we obviously need to make sure we spend as little time as possible in Python code (running under the interpreter).
One way to do this is to tell the file object to read larger chunks of data. For example, if you have enough memory, you can slurp the entire file into memory, using the readlines method. Or you could even use the read method to read the entire file into a single memory block, and then use string.split to chop it up into individual lines.
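The two whole-file approaches might look like this in Python 3. This is a sketch with hypothetical function names, and it uses the str.split method where the text mentions string.split. Note that the results differ slightly: readlines keeps the trailing newlines, while split drops them and leaves an empty string at the end if the file ends with a newline.

```python
def lines_via_readlines(path):
    """Slurp the file as a list of lines, newline characters included."""
    with open(path) as f:
        return f.readlines()


def lines_via_read(path):
    """Read the file as one string, then chop it up at each newline."""
    with open(path) as f:
        data = f.read()
    # "a\nb\n".split("\n") gives ["a", "b", ""]: newlines are removed and
    # a final empty string marks the newline that ended the file.
    return data.split("\n")
```

Both versions hold the entire file in memory at once, so they only make sense when the file comfortably fits.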
However, if you’re processing really large files, it would be nice if you could limit the chunk size to something reasonable. For example, if you read a few thousand lines at a time, you probably won’t use up more than 100 kilobytes or so.
The following script uses a nested loop. The outer loop uses readlines to read about 100,000 bytes of text, and the inner loop processes those lines using a simple for-in loop:
    # File: readline-example-3.py

    file = open("sample.txt")

    while 1:
        lines = file.readlines(100000)
        if not lines:
            break
        for line in lines:
            pass # do something

Can this really be faster? You bet. With the same test data, we can now process 96,900 lines of text per second!
Or to put it another way, this solution is three times as fast as the standard solution, and over seven times faster than the fileinput version.
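If the nested loop clutters the calling code, it can be tucked into a generator so that callers still write a plain for-in loop. The following is a sketch of that idea for Python 3, not something from the original benchmark. The name iter_lines and its sizehint parameter are made up here; the chunk size is passed straight to readlines.

```python
def iter_lines(path, sizehint=100000):
    """Yield the lines of a file, reading roughly sizehint bytes per chunk."""
    with open(path) as f:
        while True:
            # readlines(hint) returns whole lines totalling about hint bytes,
            # and an empty list once the end of the file is reached.
            lines = f.readlines(sizehint)
            if not lines:
                break
            for line in lines:
                yield line
```

A caller then writes for line in iter_lines("sample.txt"): and never sees the chunking. The generator adds some Python-level overhead of its own, so it is worth timing before assuming it keeps all of the speedup.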
In Python 2.2 and later, you can loop over the file object itself. This works pretty much like readlines(N) under the covers, but looks much better:
    # File: readline-example-5.py

    file = open("sample.txt")

    for line in file:
        pass # do something

In Python 2.1, you have to use the xreadlines iterator factory instead:
    # File: readline-example-4.py

    file = open("sample.txt")

    for line in file.xreadlines():
        pass # do something
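In Python 3 the xreadlines method no longer exists, and looping over the file object is the way to go. A minimal sketch, using a with block so the file is closed when the loop ends. The count_nonblank_lines helper is a hypothetical stand-in for the "do something" step.

```python
def count_nonblank_lines(path):
    """Count the lines that contain something other than whitespace."""
    count = 0
    with open(path) as f:  # closed automatically when the block exits
        for line in f:  # the file object yields one line at a time
            if line.strip():
                count += 1
    return count
```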