Python iterate through directory

Python 3: List the Contents of a Directory, Including Recursively

This article shows how to list the files and directories inside a directory using Python 3. Throughout this article, we’ll refer to the following example directory structure:

We’ll assume the code examples will be saved in script.py above, and will be run from inside the mydir directory so that the relative path ‘.’ always refers to mydir .

Using pathlib (Python 3.4 and up)

Non-Recursive

iterdir

To list the contents of a directory using Python 3.4 or higher, we can use the built-in pathlib library’s iterdir() to iterate through the contents. In our example directory, we can write in script.py :

from pathlib import Path

for p in Path('.').iterdir():
    print(p)

When we run from inside mydir , we should see output like:

Because iterdir is non-recursive, it only lists the immediate contents of mydir and not the contents of subdirectories (like a1.html ).

Note that each item returned by iterdir is also a pathlib.Path , so we can call any pathlib.Path method on the object. For example, to resolve each item as an absolute path, we can write in script.py :

from pathlib import Path

for p in Path('.').iterdir():
    print(p.resolve())

This will list the resolved absolute path of each item instead of just the filenames.


Because iterdir returns a generator object (meant to be used in loops), if we want to store the results in a list variable, we can write:

from pathlib import Path

files = list(Path('.').iterdir())
print(files)
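Since every item from iterdir is a pathlib.Path, the same loop can also separate files from directories. The following is only a minimal sketch: the helper name split_entries and the throwaway tree it builds (a subdirectory a and a file b.html) are assumptions made for illustration, not part of the example directory above.

```python
import tempfile
from pathlib import Path


def split_entries(directory):
    """Return (files, dirs) as sorted lists of names found directly in directory."""
    files, dirs = [], []
    for p in Path(directory).iterdir():
        # is_dir() follows symlinks; anything that is not a directory is treated as a file here.
        (dirs if p.is_dir() else files).append(p.name)
    return sorted(files), sorted(dirs)


# Build a small temporary tree so the sketch runs anywhere (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "a" / "a1.html").write_text("")
    (root / "b.html").write_text("")
    (root / "script.py").write_text("")
    files, dirs = split_entries(root)
    print(files)  # ['b.html', 'script.py']
    print(dirs)   # ['a']
```

Note that a1.html does not appear in either list, because iterdir does not descend into a.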

glob

We can also use pathlib.Path.glob to list all files (the equivalent of iterdir ):

from pathlib import Path

for p in Path('.').glob('*'):
    print(p)

Filename Pattern Matching with glob

If we want to filter our results using Unix glob command-style pattern matching, glob can handle that too. For example, if we only want to list .html files, we would write in script.py :

from pathlib import Path

for p in Path('.').glob('*.html'):
    print(p)

As with iterdir , glob returns a generator object, so we’ll have to use list() if we want to convert it to a list:

from pathlib import Path

files = list(Path('.').glob('*.html'))
print(files)

Recursive

To recursively list the entire directory tree rooted at a particular directory (including the contents of subdirectories), we can use rglob . In script.py , we can write:

from pathlib import Path

for p in Path('.').rglob('*'):
    print(p)

This time, when we run script.py from inside mydir , we should see output like:

rglob is the equivalent of calling glob with **/ at the beginning of the path, so the following code is equivalent to the rglob code we just saw:

from pathlib import Path

for p in Path('.').glob('**/*'):
    print(p)
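One way to convince ourselves of this equivalence is to run both calls against the same tree and compare the results. This is a sketch only; the temporary tree and its file names (deep, d1.txt, top.html) are made up for the check.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical tree: one nested directory two levels deep.
    (root / "a" / "deep").mkdir(parents=True)
    (root / "a" / "a1.html").write_text("")
    (root / "a" / "deep" / "d1.txt").write_text("")
    (root / "top.html").write_text("")

    # Compare paths relative to the root, in a platform-neutral form.
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*"))

    same = via_rglob == via_glob
    print(same)       # True
    print(via_rglob)  # ['a', 'a/a1.html', 'a/deep', 'a/deep/d1.txt', 'top.html']
```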

Filename Pattern Matching with rglob

Just as with glob , rglob also allows glob-style pattern matching, but automatically does so recursively. In our example, to list all *.html files in the directory tree rooted at mydir , we can write in script.py :

from pathlib import Path

for p in Path('.').rglob('*.html'):
    print(p)

This should display all and only .html files, including those inside subdirectories:

Since rglob is the same as calling glob with **/ , we could also just use glob to achieve the same result:

from pathlib import Path

for p in Path('.').glob('**/*.html'):
    print(p)
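A single glob pattern matches one shape of filename, so when several extensions are wanted at once, one possible approach is to walk everything with rglob('*') and filter on each path's suffix. The helper find_by_suffix and the file names below are hypothetical, chosen only to illustrate the idea.

```python
import tempfile
from pathlib import Path


def find_by_suffix(directory, suffixes):
    """Recursively collect files whose extension is in suffixes, e.g. {'.html', '.css'}."""
    wanted = {s.lower() for s in suffixes}
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    for rel in ("a/a1.html", "a/notes.txt", "index.html", "style.css"):
        (root / rel).write_text("")

    names = [p.name for p in find_by_suffix(root, {".html", ".css"})]
    print(names)  # ['a1.html', 'index.html', 'style.css']
```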

Not Using pathlib

Non-Recursive

os.listdir

On any version of Python 3, we can use the built-in os library to list directory contents. In script.py , we can write:

import os

for filename in os.listdir('.'):
    print(filename)

Unlike with pathlib , os.listdir simply returns filenames as strings, so we can’t call methods like .resolve() on the result items. To get full paths, we have to build them manually:

import os

root = '.'
for filename in os.listdir(root):
    relative_path = os.path.join(root, filename)
    absolute_path = os.path.abspath(relative_path)
    print(absolute_path)

Another difference from pathlib is that os.listdir returns a list of strings, so we don’t need to call list() on the result to convert it to a list:

import os

files = os.listdir('.')  # files is a list
print(files)
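Because os.listdir only gives us names, telling files from directories takes an extra check per name, and the list comes back in arbitrary order. A minimal sketch of both points follows; list_files_only and the temporary directory contents are assumptions for illustration.

```python
import os
import tempfile


def list_files_only(directory):
    """Return the sorted names of regular files directly inside directory."""
    return sorted(
        name for name in os.listdir(directory)
        # listdir returns bare names, so join with the directory before testing.
        if os.path.isfile(os.path.join(directory, name))
    )


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("b.txt", "a.txt"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("")

    only_files = list_files_only(tmp)
    print(only_files)  # ['a.txt', 'b.txt']
```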

glob

Also available on all versions of Python 3 is the built-in glob library, which provides Unix glob command-style filename pattern matching.

To list the items in a directory (roughly equivalent to os.listdir , except that glob skips names starting with a dot by default), we can write in script.py :

import glob

for filename in glob.glob('./*'):
    print(filename)

This will produce output like:

Note that the root directory ( ‘.’ in our example) is simply included in the path pattern passed into glob.glob() .

Filename Pattern Matching with glob

To list only .html files, we can write in script.py :

import glob

for filename in glob.glob('./*.html'):
    print(filename)

Recursive

Since Python versions lower than 3.5 do not have a recursive glob option, and Python versions 3.5 and up have pathlib.Path.rglob , we’ll skip recursive examples of glob.glob here.

os.walk

On any version of Python 3, we can use os.walk to list all the contents of a directory recursively.

os.walk() returns a generator object that can be used with a for loop. Each iteration yields a 3-tuple that represents a directory in the directory tree:
- current_dir : the path of the directory that the current iteration represents;
- subdirs : list of names (strings) of immediate subdirectories of current_dir ; and
- files : list of names (strings) of files inside current_dir .
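Since subdirs and files hold bare names, a full path has to be built by joining each name with current_dir. The sketch below shows one way to do that; the function all_file_paths and the temporary tree are hypothetical and only serve to make the example runnable.

```python
import os
import tempfile


def all_file_paths(directory):
    """Return sorted paths of every file under directory, relative to it."""
    found = []
    for current_dir, subdirs, files in os.walk(directory):
        for filename in files:
            full = os.path.join(current_dir, filename)
            # Normalise separators so the result looks the same on every platform.
            found.append(os.path.relpath(full, directory).replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "a"))
    for rel in (os.path.join("a", "a1.html"), "b.html"):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("")

    paths = all_file_paths(tmp)
    print(paths)  # ['a/a1.html', 'b.html']
```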

In our example, we can write in script.py :

import os

for current_dir, subdirs, files in os.walk('.'):
    # Current Iteration Directory
    print(current_dir)

    # Directories
    for dirname in subdirs:
        print('\t' + dirname)

    # Files
    for filename in files:
        print('\t' + filename)

This produces the following output:


Python loop through files in directory

Files live in directories and subdirectories, and iterating over the files in a particular directory is a very common task in Python. In this tutorial, I show you how to use Python to loop through files in a directory, recursively or non-recursively.

Python loop through files

Python provides several built-in functions and modules for iterating over files, after which you can perform different operations on them. The ones covered in this tutorial are os.scandir(), os.listdir(), os.walk(), glob.iglob() and pathlib.

Let’s understand these methods one by one with examples.

Python loop through files in a directory using os.scandir() method

If you are using Python 3.5 or later, scandir() is generally the fastest of these options. It returns an iterator of “DirEntry” objects, each of which holds the file name as a string along with its path and file-type information. It can be called in two ways –

With Parameter – list files from given folder/directory.

No Parameter – list files from current folder/directory.

And the output of scandir() method looks like below –

Scandir() Example –

import os

x = os.scandir()
for i in x:
    print(i)

If you don't pass a directory path, scandir() reads the current working directory by default. The loop prints every file and subdirectory it finds to the console. If you want only files and need to ignore directories, add a file type check to your script like below –

import os

directory = r'C:\testfolder'
for strfile in os.scandir(directory):
    if (strfile.path.endswith(".xlsx") or strfile.path.endswith(".docx")) and strfile.is_file():
        print(strfile.path)

I only needed the .xlsx and .docx files from the directory, so I added the extension check alongside the is_file() check.
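A variation on the same check is sketched below. It assumes Python 3.6 or later (where scandir() can be used as a context manager), passes a tuple of extensions to endswith(), and reads each file's size from the DirEntry. The function name file_sizes and the sample file names are hypothetical.

```python
import os
import tempfile


def file_sizes(directory, extensions):
    """Map file name -> size in bytes for files in directory ending with any of extensions."""
    sizes = {}
    with os.scandir(directory) as entries:  # context manager form needs Python 3.6+
        for entry in entries:
            # str.endswith accepts a tuple, so one call covers all extensions.
            if entry.is_file() and entry.name.endswith(extensions):
                sizes[entry.name] = entry.stat().st_size
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    for name, content in (("report.xlsx", "abc"), ("notes.docx", ""), ("image.png", "x")):
        with open(os.path.join(tmp, name), "w") as f:
            f.write(content)

    sizes = file_sizes(tmp, (".xlsx", ".docx"))
    print(sorted(sizes.items()))  # [('notes.docx', 0), ('report.xlsx', 3)]
```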

Note – The scandir() method is not recursive. Use the walk() method, which I show below, if you need to iterate over nested folders.

Iterate file over directory using os.listdir() method –

The listdir() method is available in both Python 3 and Python 2 (an old but once popular version), and you can use it to iterate over the files in any particular directory. Calling it without a path, as below, requires Python 3.2 or later –

import os

myfiles = os.listdir()
print(myfiles)

It returns all files and folders from the current directory because no path is passed to listdir(). Let's pass a folder path and iterate over the files in the given folder.

import os

directory = r'C:\testfolder'
# Keep only the .jpg entries, then print each one with its full path
myfiles = [x for x in os.listdir(directory) if x.endswith(".jpg")]
for name in myfiles:
    print(os.path.join(directory, name))

It returns all the .jpg files from “testfolder” directory.

Iterate file from given directory using os.walk() method –

The os.scandir() and os.listdir() methods have one limitation: they only iterate over the files and folders in the immediate directory, which means they are not recursive. If you need to iterate through nested directories or folders, use the os.walk() method –

import os

directory = r'C:\testfolder'
for subdir, dirs, files in os.walk(directory):
    for filename in files:
        filepath = os.path.join(subdir, filename)
        # check file extension ends with .png
        if filepath.endswith(".png"):
            print(filepath)

It returns the “.png” files from all the folders in a given path.
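When walking top-down (the default), os.walk() also lets you skip whole folders by editing the list of subdirectory names in place. The sketch below illustrates this under stated assumptions: the function name png_files, the skip_dirs default and the sample tree are all hypothetical.

```python
import os
import tempfile


def png_files(directory, skip_dirs=(".git", "__pycache__")):
    """Return sorted paths of .png files under directory, ignoring folders named in skip_dirs."""
    found = []
    for subdir, dirs, files in os.walk(directory):
        # Editing dirs in place tells os.walk (top-down mode) not to descend into them.
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for filename in files:
            if filename.lower().endswith(".png"):
                found.append(os.path.join(subdir, filename))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    os.makedirs(os.path.join(tmp, ".git"))
    for rel in ("a.png", os.path.join("sub", "b.PNG"),
                os.path.join("sub", "d.txt"), os.path.join(".git", "c.png")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("")

    png_names = [os.path.basename(p) for p in png_files(tmp)]
    print(png_names)  # ['a.png', 'b.PNG']
```

Lower-casing the name before the check is a design choice here so that “.PNG” is matched as well; drop it if the match should be case-sensitive.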

Iterate file over directory using glob.iglob() method

The glob.iglob() and glob.glob() functions retrieve the paths that match a pattern, optionally searching recursively inside folders/directories. glob.glob() returns a list, while glob.iglob() returns an iterator.

glob.glob(pathname, *, recursive=False)
glob.iglob(pathname, *, recursive=False)

By default recursive is False, which means it doesn't retrieve files from nested folders and only fetches from the immediate directory. If you set recursive to True, the pattern “**” matches any files and zero or more nested folders, so a pattern that contains “**” lists files from nested folders recursively.
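The effect of “**” is easiest to see by running the same search with and without it. This is a small sketch on a temporary folder; the file names top.txt and inner.txt are made up for the comparison.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    for rel in ("top.txt", os.path.join("nested", "inner.txt")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("")

    base = glob.escape(tmp)  # guard against special characters in the temp path

    # recursive=True alone changes nothing when the pattern has no "**".
    flat = glob.glob(os.path.join(base, "*.txt"), recursive=True)
    # With "**" in the pattern, nested folders are searched as well.
    deep = glob.glob(os.path.join(base, "**", "*.txt"), recursive=True)

    flat_names = sorted(os.path.basename(p) for p in flat)
    deep_names = sorted(os.path.basename(p) for p in deep)
    print(flat_names)  # ['top.txt']
    print(deep_names)  # ['inner.txt', 'top.txt']
```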

Let's say we have a directory with the path below –

Now, to retrieve all the text files from the above path, use the script below –

import glob

# fetch all txt files from given path
for filepath in glob.iglob(r'C:\testfolder\files\*.txt'):
    print(filepath)

The above code only retrieves the .txt files directly inside the given folder. If you want to fetch them recursively from nested folders, use the code below –

Python loop through files in directory recursively

import glob

# Recursively fetch all txt files from given path;
# the "**" segment is what lets recursive=True descend into subfolders
for filepath in glob.iglob(r'C:\testfolder\files\**\*.txt', recursive=True):
    print(filepath)

Iterate file over directory using the pathlib module

The glob() method in pathlib works much like iglob(), and the pathlib module provides various classes for handling file paths. Below is an example that iterates over the files in a particular directory using pathlib.

from pathlib import Path

# raw string, so that \t and \f are not read as escape sequences
directory = r'C:\testfolder\files'
paths = Path(directory).glob('**/*.txt')
for path in paths:
    # convert path object into string
    pathstr = str(path)
    # print .txt file path
    print(pathstr)

Conclusion

I hope you now have a basic understanding of how to iterate over files. The methods above can be used in Python to loop through the files in a directory, recursively or non-recursively.
