Skip first couple of lines while reading lines in Python file
I just want the good stuff. What I’m doing is a lot more complicated, but this is the part I’m having trouble with.
    with open('yourfile.txt') as f:
        lines_after_17 = f.readlines()[17:]
If the file is too big to load in memory:
    with open('yourfile.txt') as f:
        for _ in range(17):
            next(f)  # discard the first 17 lines
        for line in f:
            pass  # do stuff
I use the second solution to read ten lines at the end of a file with 8 million (8e6) lines and it takes ~22 seconds. Is this still the preferred (=fastest) way for such long files (~250 MB)?
@wim: I guess tail doesn’t work on Windows. Furthermore, I don’t always want to read the last 10 lines. I want to be able to read some lines in the middle (e.g. if I read 10 lines after ~4e6 lines in the same file, it still takes half of that time, ~11 seconds).
The thing is, you need to read the entire content before line number ~4e6 in order to know where the line separator bytes are located, otherwise you don’t know how many lines you’ve passed. There’s no way to magically jump to a line number. ~250 MB should be OK to read entire file to memory though, that’s not particularly big data.
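If the same file is read repeatedly, one way to pay that scanning cost only once is to record the byte offset of every line in a single pass and then `seek()` to the wanted line. The following is a minimal sketch under that assumption; the helper names `build_line_index` and `read_lines_at` and the temporary demo file are hypothetical, not part of the original answers.

```python
import os
import tempfile


def build_line_index(path):
    """Return the byte offset at which each line of the file starts."""
    offsets = []
    position = 0
    with open(path, "rb") as f:  # binary mode so lengths are byte counts
        for raw_line in f:
            offsets.append(position)
            position += len(raw_line)
    return offsets


def read_lines_at(path, offsets, start, count):
    """Read `count` lines beginning at zero-based line number `start`."""
    with open(path, "rb") as f:
        f.seek(offsets[start])  # jump straight to the wanted line
        chunk = [f.readline() for _ in range(count)]
    return [raw.decode("utf-8").rstrip("\r\n") for raw in chunk if raw]


# Small demonstration file: "row 0", "row 1", ... one per line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"row {i}\n" for i in range(1000))
    demo_path = tmp.name

line_offsets = build_line_index(demo_path)  # one full pass, done once
middle_rows = read_lines_at(demo_path, line_offsets, 500, 3)
print(middle_rows)  # ['row 500', 'row 501', 'row 502']
os.remove(demo_path)
```

Building the index still reads the whole file, as the comment above explains; the gain only appears on later lookups.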
Use itertools.islice, starting at index 17. It will automatically skip the first 17 lines.
    import itertools

    with open('file.txt') as f:
        for line in itertools.islice(f, 17, None):  # start=17, stop=None
            pass  # process lines
Is this feasible for large text files that may not fit in the memory? That is, does itertools.islice load the entire file into the memory? I couldn’t find this in the documentation.
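`itertools.islice` returns an iterator and pulls items from its source one at a time, so the skipped lines are read and discarded rather than collected in memory. A small sketch that makes this visible, using a hypothetical generator `numbered_lines` in place of a real file:

```python
import itertools


def numbered_lines(total, pulled):
    """Yield fake lines and record each index actually pulled from the source."""
    for i in range(total):
        pulled.append(i)
        yield f"line {i}\n"


pulled = []
selected = list(itertools.islice(numbered_lines(1_000_000, pulled), 17, 20))

print(selected)     # ['line 17\n', 'line 18\n', 'line 19\n']
print(len(pulled))  # 20
```

Only 20 of the million available lines are ever produced: 17 skipped plus 3 returned. A file object iterated the same way is read incrementally.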
    from itertools import dropwhile

    def isBadLine(line):
        # Lines read from a file keep their trailing newline, so strip it before comparing.
        return line.rstrip('\n') == '0'

    with open(...) as f:
        for line in dropwhile(isBadLine, f):
            pass  # process as you see fit
Advantages: This is easily extensible to cases where your prefix lines are more complicated than "0" (but not interdependent).
Here are the timeit results for the top 2 answers. Note that "file.txt" is a text file containing 100,000+ lines of random strings with a file size of 1MB+.
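As an illustration of a more complicated prefix, here is a sketch that treats leading blank lines and `#` comments as the header to skip. The predicate `is_prefix_line` and the sample text are assumptions made for the example; `io.StringIO` stands in for an open file.

```python
import io
from itertools import dropwhile


def is_prefix_line(line):
    """Hypothetical rule: blank lines and '#' comments count as prefix."""
    stripped = line.strip()
    return stripped == "" or stripped.startswith("#")


# io.StringIO stands in for an open text file.
sample = io.StringIO("# generated by tool\n# version 2\n\nalpha\n# kept\nbeta\n")

body = [line.rstrip("\n") for line in dropwhile(is_prefix_line, sample)]
print(body)  # ['alpha', '# kept', 'beta']
```

Note that `dropwhile` stops testing after the first line that fails the predicate, so the later `# kept` line is passed through unchanged.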
    from timeit import timeit

    # setup= makes itertools available inside the timed statement
    timeit("""with open("file.txt", "r") as fo:
        for line in itertools.islice(fo, 90000, None):
            line.strip()""", setup="import itertools", number=100)
    # Result: 1.604976346003241
    from timeit import timeit

    timeit("""with open("file.txt", "r") as fo:
        for i in range(90000):
            next(fo)
        for j in fo:
            j.strip()""", number=100)
    # Result: 2.427317383000627
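In this test the itertools.islice version was about 1.5 times faster (roughly 1.60 s versus 2.43 s for 100 runs), so it looks like the more efficient option when many lines have to be skipped.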
Python skip empty lines
Last updated: Feb 24, 2023
# Remove the empty lines from a String in Python
To remove the empty lines from a string:
- Use the str.splitlines() method to split the string on newline characters.
- Use a list comprehension to iterate over the list.
- Exclude the empty lines from the result.
- Use the str.join() method to join the list with os.linesep as the separator.
    import os

    multiline_string = """\
    First line

    Second line

    Third line


    """

    without_empty_lines = os.linesep.join(
        [
            line for line in multiline_string.splitlines()
            if line
        ]
    )

    # First line
    # Second line
    # Third line
    print(without_empty_lines)
The str.splitlines method splits the string on newline characters and returns a list containing the lines in the string.
    multiline_string = """\
    First line

    Second line

    Third line


    """

    # 👇️ ['First line', '', 'Second line', '', 'Third line', '', '']
    print(multiline_string.splitlines())
We used a list comprehension to iterate over the list of lines.
List comprehensions are used to perform some operation for every element or select a subset of elements that meet a condition.
On each iteration, we check if the current line is truthy to exclude empty strings from the result.
The last step is to use the str.join() method to join the filtered list.
The str.join method takes an iterable as an argument and returns a string which is the concatenation of the strings in the iterable.
The string the method is called on is used as the separator between the elements.
We used os.linesep as the separator.
The os.linesep attribute holds the string used to separate lines on the current platform, for example \n on Unix and \r\n on Windows.
I’m on Linux, so this is the output of os.linesep for me.
    import os

    print(repr(os.linesep))  # 👉️ '\n'
If you want to handle the scenario where the empty lines might contain only whitespace, use the str.strip() method.
# Remove the empty lines with or without whitespace from a String
If you need to remove the empty lines that may or may not contain whitespace:
- Use the str.splitlines() method to split the string on newline characters.
- Use a list comprehension to iterate over the list.
- Use the str.strip() method to filter out empty lines that may contain whitespace.
- Use the str.join() method to join the list with os.linesep as the separator.
    import os

    # The "empty" lines in this string each contain a single space
    multiline_string = "First line\n \nSecond line\n \nThird line\n \n\n"

    without_empty_lines = os.linesep.join([
        line for line in multiline_string.splitlines()
        if line.strip() != ''
    ])

    # First line
    # Second line
    # Third line
    print(without_empty_lines)
If the empty lines in the multiline string contain only whitespace characters, we can use the str.strip() method to remove the whitespace and compare the result to an empty string.
Here is an example of calling str.splitlines() on a multiline string where some of the empty lines contain only whitespace characters.
    # The "empty" lines in this string each contain a single space
    multiline_string = "First line\n \nSecond line\n \nThird line\n \n\n"

    # 👇️ ['First line', ' ', 'Second line', ' ', 'Third line', ' ', '']
    print(multiline_string.splitlines())
The str.strip method returns a copy of the string with the leading and trailing whitespace removed.
If the line is equal to an empty string once the leading and trailing whitespace is removed, we consider it to be an empty line.
Alternatively, you can use the str.join() method with a newline character to avoid an extra import.
# Remove the empty lines from a String using str.join() with \n
This is a four-step process:
- Use the str.splitlines() method to split the string on newline characters.
- Use a list comprehension to iterate over the list.
- Exclude the empty lines from the result.
- Use the str.join() method to join the filtered list with \n as the separator.
    multiline_string = """\
    First line

    Second line

    Third line


    """

    without_empty_lines = '\n'.join([
        line for line in multiline_string.splitlines()
        if line
    ])

    # First line
    # Second line
    # Third line
    print(without_empty_lines)
We used the \n (newline) character as the separator in the example to not have to import the os module.
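Note that the separator passed to str.join() only controls the output. str.splitlines() already recognizes \n, \r\n and \r, so input with Windows line endings (\r\n) is split correctly, and the result is always joined with \n.
Use os.linesep instead only if the output must use the current platform's native separator, for example when writing bytes in binary mode. A file opened in text mode already translates \n when writing.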
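A quick way to see how `str.splitlines()` and the join separator interact with Windows-style input is to try it on a small string. This is only an illustrative sketch; the sample text and variable names are made up for the example.

```python
# Input that uses Windows line endings (\r\n), including one empty line.
windows_text = "First line\r\n\r\nSecond line\r\n"

# splitlines() treats \r\n as a single line boundary and drops it.
lines = windows_text.splitlines()

# The join separator alone decides which line ending the result uses.
cleaned = "\n".join(line for line in lines if line)

print(lines)          # ['First line', '', 'Second line']
print(repr(cleaned))  # 'First line\nSecond line'
```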
Skipping blank lines
Hey all, I have been reading some older posts on line/list manipulation, but don’t quite understand exactly how to implement all of the tools that you guys are using (line.strip() for example) to get what I need. Essentially, I have a working program that iterates through data and outputs it into a file. If one of the lines in the file is blank, the program crashes. I want to know what simple snippet of code I can add to ensure that the program continues iterating. Even more awesome would be a way for the program to let me know at which points in the data blank lines appear. But I would be happy just having the program skip the blank lines. Any suggestions?
Latest post: 14 years ago, by tdeck
Perhaps I can clarify my concern. Take this code, which claims to skip blank lines (and works) which I found online:
    with open("text.txt", "r") as infile:
        for line in infile:
            if not line.strip():
                continue
            else:
                print(line)
What is this code saying? Take a line in the infile, and then I don’t understand what "if not line.strip()" is really doing. How would you guys put that in layman’s terms? "if not line.strip()"
I would translate it to: if empty line: I guess. Here’s an example; basically this code is relying on the fact that Python considers an empty string, or '', to be False, while a non-empty string is True. Let me demonstrate:
    >>> empty_line = '\n'
    >>> text_line = 'Lipsum\n'
    >>>
    >>> empty_line.strip()
    ''
    >>> text_line.strip()
    'Lipsum'
    >>> bool(empty_line.strip())
    False
    >>> bool(text_line.strip())
    True
    >>> if '':
    ...     print('Empty!')
    ...
    >>> if not '':
    ...     print('Empty!')
    ...
    Empty!
    >>>
So as you can see, strip removes the new line character and any bounding white space (tabs, newline, spaces — on the far left and right of the string). Since Python considers an empty line to be False, this code is waiting for a line that does not evaluate to False, before acting on it.
    >>> lines = ['Not empty\n', '\n', ' Foobar\n']
    >>> for line in lines:
    ...     print(bool(line.strip()), '=', line.strip())
    ...
    True = Not empty
    False =
    True = Foobar
    >>> for line in lines:
    ...     if line.strip():
    ...         print('Found a line: %s' % line.strip())
    ...     else:
    ...         print('Empty line!')
    ...
    Found a line: Not empty
    Empty line!
    Found a line: Foobar
    >>> # Now repeat with 'not' to negate the True/False
    >>> for line in lines:
    ...     if not line.strip():
    ...         print('Empty line!')
    ...     else:
    ...         print('Found a line: %s' % line.strip())
    ...
    Found a line: Not empty
    Empty line!
    Found a line: Foobar
    >>>
Now the above example demonstrates the boolean evaluation of the string that is returned by strip() . Hopefully that clears it up for you
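The original question also asked for a way to find out where the blank lines occur. One possible sketch, assuming the data is read line by line, uses `enumerate` to record the line numbers that get skipped. The function name `process_skipping_blanks` and the sample data are hypothetical, and `io.StringIO` stands in for the real input file.

```python
import io


def process_skipping_blanks(lines):
    """Return the non-blank lines plus the 1-based numbers of the blank ones."""
    kept = []
    blank_line_numbers = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():  # empty or whitespace-only line
            blank_line_numbers.append(number)
            continue
        kept.append(line.strip())
    return kept, blank_line_numbers


# io.StringIO stands in for an open text file.
sample_file = io.StringIO("first\n\nsecond\n   \nthird\n")
kept, blanks = process_skipping_blanks(sample_file)

print(kept)    # ['first', 'second', 'third']
print(blanks)  # [2, 4]
```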