How to Read One File Line by Line to a List in Python
- readlines to Read the File Line by Line in Python
- Iterate Over the File Method to Read a File Line by Line in Python
- file.read Method to Read the File Line by Line in Python
- Comparison of Different Methods in Reading a File Line by Line in Python
Suppose we have a file with the content below:
Line One: 1
Line Two: 2
Line Three: 3
Line Four: 4
Line Five: 5
We need to read the file content line by line into a list: ["Line One: 1", "Line Two: 2", "Line Three: 3", "Line Four: 4", "Line Five: 5"].
We will introduce different methods to read a file line by line to a list below.
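To follow along, the sample file can be created first. The sketch below is only an illustration: the file name sample.txt, the temporary directory, and the variable names file_path and lines are assumptions, not part of the original example.

```python
import os
import tempfile

# Hypothetical location for the sample file; any writable path works.
file_path = os.path.join(tempfile.mkdtemp(), "sample.txt")

lines = [
    "Line One: 1",
    "Line Two: 2",
    "Line Three: 3",
    "Line Four: 4",
    "Line Five: 5",
]

# Join with "\n" so the last line has no trailing newline,
# which matches the outputs shown in this article.
with open(file_path, "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(lines))
```

The resulting path can then be used wherever filePath appears in the examples.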
readlines to Read the File Line by Line in Python
readlines returns a list of lines from the stream.
>>> filePath = r"/your/file/path"
>>> with open(filePath, 'r', encoding='utf-8') as f:
    f.readlines()
['Line One: 1\n', 'Line Two: 2\n', 'Line Three: 3\n', 'Line Four: 4\n', 'Line Five: 5']
The ending character \n is also included in each string that has one. It can be removed with str.rstrip('\n').
>>> with open(filePath, 'r', encoding='utf-8') as f:
    [_.rstrip('\n') for _ in f.readlines()]
['Line One: 1', 'Line Two: 2', 'Line Three: 3', 'Line Four: 4', 'Line Five: 5']
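The same idea can be wrapped in a small function. This is a minimal sketch: the helper name read_lines and the demo file are hypothetical and exist only for this example. It also shows that passing '\n' to rstrip removes only the newline, so trailing spaces are kept.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file as a list, without trailing newlines."""
    with open(path, "r", encoding="utf-8") as f:
        # rstrip("\n") removes only the newline; other trailing
        # whitespace (spaces, tabs) is preserved.
        return [line.rstrip("\n") for line in f.readlines()]


# Hypothetical demo file, created only for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8", newline="\n") as f:
    f.write("Line One: 1  \nLine Two: 2\n")

result = read_lines(demo_path)
print(result)  # ['Line One: 1  ', 'Line Two: 2']
```

Calling rstrip() with no argument would also strip the two trailing spaces on the first line, which may or may not be wanted.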
Iterate Over the File Method to Read a File Line by Line in Python
We could iterate over the file object to read it line by line, rather than calling readlines.
>>> with open(filePath, 'r', encoding='utf-8') as f:
    [_.rstrip('\n') for _ in f]
['Line One: 1', 'Line Two: 2', 'Line Three: 3', 'Line Four: 4', 'Line Five: 5']
This method uses less memory than readlines when each line is processed as it is read. The readlines method holds all the lines of the file in memory at once, whereas iteration loads only one line at a time and processes it. It is preferred for very large files, where readlines could raise MemoryError. Note that collecting every line into a list, as in the example above, still stores the whole file content in memory; the saving applies when lines are handled one by one.
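The memory benefit of iteration shows up when lines are processed one at a time instead of being collected. The following is a minimal sketch under that assumption; the function name count_matching_lines and the demo file are hypothetical.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count lines containing keyword while holding one line at a time."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields one line per iteration
            if keyword in line:
                count += 1
    return count


# Hypothetical demo file, created only for this example.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("Line One: 1\nLine Two: 2\nLine Three: 3\n")

matches = count_matching_lines(demo_path, "T")
print(matches)  # 2
```

Because no list is built, memory use stays roughly constant regardless of how many lines the file has.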
file.read Method to Read the File Line by Line in Python
file.read(size=-1, /) reads from the file until EOF if size is not set. We could split the result into lines with the str.splitlines method.
>>> with open(filePath, 'r') as f:
    f.read().splitlines()
['Line One: 1', 'Line Two: 2', 'Line Three: 3', 'Line Four: 4', 'Line Five: 5']
By default, the result of str.splitlines doesn't include the ending character \n. You can keep it by setting the keepends parameter to True.
>>> with open(filePath, 'r') as f:
    f.read().splitlines(keepends=True)
['Line One: 1\n', 'Line Two: 2\n', 'Line Three: 3\n', 'Line Four: 4\n', 'Line Five: 5']
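It may help to see how str.splitlines differs from splitting on '\n' by hand. The sketch below uses hypothetical in-memory strings rather than a file, so it runs without any setup.

```python
text = "Line One: 1\nLine Two: 2\n"

# split("\n") leaves an empty string after the final newline.
by_split = text.split("\n")

# splitlines() does not produce that trailing empty entry.
by_splitlines = text.splitlines()

print(by_split)       # ['Line One: 1', 'Line Two: 2', '']
print(by_splitlines)  # ['Line One: 1', 'Line Two: 2']

# splitlines() also treats "\r\n" as a single line boundary.
windows_text = "Line One: 1\r\nLine Two: 2"
print(windows_text.splitlines())  # ['Line One: 1', 'Line Two: 2']
```

This is one reason splitlines is usually the more convenient choice for text that ends with a newline or uses Windows line endings.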
Comparison of Different Methods in Reading a File Line by Line in Python
We will compare the performance of the methods introduced in this article. We increase the number of lines in the test file to 8000 to make the performance difference easier to measure.
>>> import timeit
>>> timeit.timeit('''with open(filePath, 'r', encoding='utf-8') as f:
    f.readlines()''',
                  setup=r'filePath=r"C:\Test\Test.txt"',
                  number=10000)
16.36330720000001
>>> timeit.timeit('''with open(filePath, 'r', encoding='utf-8') as f:
    [_ for _ in f]''',
                  setup=r'filePath=r"C:\Test\Test.txt"',
                  number=10000)
18.37279060000003
>>> timeit.timeit('''with open(filePath, 'r', encoding='utf-8') as f:
    f.read().splitlines()''',
                  setup=r'filePath=r"C:\Test\Test.txt"',
                  number=10000)
12.122660100000019
In this test, the readlines() method is slightly faster than the file iteration method, and file.read().splitlines() is the most efficient, running more than 25% faster than the other two methods.
However, in big data applications where memory is the constraint, the file iteration method is the best choice, as explained above.
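The comparison can be reproduced without a fixed Windows path. The sketch below is one possible setup, not the exact benchmark used above: the temporary file, the function names, and the lower repetition count of 100 are all assumptions, and the timings will vary by machine.

```python
import os
import tempfile
import timeit

# Hypothetical test file with 8000 lines, matching the size used above.
path = os.path.join(tempfile.mkdtemp(), "bench.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\n".join(f"Line {i}" for i in range(8000)))


def with_readlines():
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


def with_iteration():
    with open(path, "r", encoding="utf-8") as f:
        return [line for line in f]


def with_read_splitlines():
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


for func in (with_readlines, with_iteration, with_read_splitlines):
    # number=100 keeps the run short; raise it for steadier numbers.
    seconds = timeit.timeit(func, number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Passing callables to timeit.timeit avoids embedding the file path in a setup string.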
Founder of DelftStack.com. Jinku has worked in the robotics and automotive industries for over 8 years. He sharpened his coding skills when he needed to do the automatic testing, data collection from remote servers and report creation from the endurance test. He is from an electrical/electronics engineering background but has expanded his interest to embedded electronics, embedded programming and front-/back-end programming.