How to List All Files in Directory and Subdirectories in Python
- Use the os.listdir() Function to List All Files in the Directory and Subdirectories in Python
- Use the os.scandir() Function to List All Files in the Directory and Subdirectories in Python
- Use the os.walk() Function to List All Files in the Directory and Subdirectories in Python
- Use the glob.glob() Function to List All Files in the Directory and Subdirectories in Python
- Use the pathlib.Path() Class to List All Files in the Directory and Subdirectories in Python
- Conclusion
In programming, managing and manipulating files is a fundamental task. Whether we’re organizing our files, analyzing data, or building applications that require file processing, efficiently listing all files in a directory and its subdirectories is a crucial skill.
With its simplicity and versatility, Python provides an array of tools and libraries to tackle this challenge.
The problem is traversing through a directory structure and gathering a comprehensive list of all its files, including those in its subdirectories. This may seem like a straightforward task, but as the depth and complexity of the directory structure increase, it becomes increasingly cumbersome to locate and enumerate each file manually.
Imagine we are data scientists working on a project that involves analyzing a massive dataset spread across multiple folders and subfolders. Instead of manually searching for all the relevant files and writing custom code for each folder level, we can utilize Python’s file listing capabilities to efficiently traverse through the entire directory structure, compiling a complete list of files for further analysis or processing.
In this tutorial, we will explore various approaches using Python to list all files in a directory and its subdirectories.
Use the os.listdir() Function to List All Files in the Directory and Subdirectories in Python
The os module in Python provides functions for interacting with the operating system, including many that work with the file system.
The os.listdir() function returns a list of the names of all entries (files and directories) in the specified directory. It does not descend into subdirectories on its own, so listing a whole directory tree requires calling it again for each subdirectory.
Syntax:
for file in os.listdir(directory_path):
    pass  # Code to process files
When using os.listdir(), we pass the directory_path parameter, representing the directory path from which we want to list files. It can be either a relative or an absolute path.
The function returns a list of entries within the specified directory. However, this list includes files and directories, so we must distinguish between them during processing.
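To illustrate that os.listdir() covers a single directory level and mixes files with directories, here is a minimal sketch. The helper name split_entries and the throwaway directory built with tempfile are assumptions made for illustration; they are not part of the tutorial's MyFolder example.

```python
import os
import tempfile


def split_entries(directory_path):
    """Return (files, directories) found directly inside directory_path."""
    files, directories = [], []
    for name in sorted(os.listdir(directory_path)):
        full_path = os.path.join(directory_path, name)
        if os.path.isdir(full_path):
            directories.append(name)
        else:
            files.append(name)
    return files, directories


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub"))
    for relative in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(demo_root, relative), "w") as handle:
            handle.write("demo")

    top_files, top_dirs = split_entries(demo_root)
    print(top_files)  # ['a.txt'] (b.txt inside sub/ is not listed)
    print(top_dirs)   # ['sub']
```

Only a.txt appears in the file list because b.txt lives one level down, which is why a recursive step is needed to cover subdirectories.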
Our Directory Structure
The following image shows our current directory structure used throughout this tutorial.
The following code uses the os.listdir() function to list all files in a directory and its subdirectories in Python.
import os


def list_files(directory_path):
    """Return the paths of all files in directory_path and its subdirectories."""
    files = []
    for file_name in os.listdir(directory_path):
        file_path = os.path.join(directory_path, file_name)
        if os.path.isfile(file_path):
            files.append(file_path)
        elif os.path.isdir(file_path):
            # Recurse into the subdirectory and collect its files too.
            files.extend(list_files(file_path))
    return files


# Usage
directory_path = "MyFolder"
all_files = list_files(directory_path)
for file_path in all_files:
    print(file_path)
In the above code, the function list_files is defined to list all files inside a directory and its subdirectories using os.listdir(). Within the function, it initializes an empty list, files, to store the file paths. It iterates over each file_name obtained from os.listdir(directory_path).
For each file_name, it constructs the file_path by joining it with the directory_path using os.path.join().
If file_path represents a file (os.path.isfile(file_path)), the file path is appended to the files list. If file_path represents a directory (os.path.isdir(file_path)), the function is called recursively to get the files within that subdirectory, and the resulting files are added to the files list.
After defining the function, we can specify the directory_path and call the list_files function. It will return a list of all file paths within the directory and its subdirectories.
As shown in the example, we can then iterate over this list to perform any desired operations, such as printing the file paths.
Use the os.scandir() Function to List All Files in the Directory and Subdirectories in Python
The os.scandir() function in Python provides an efficient way to iterate over the entries (files and directories) within a directory. It covers a single directory level, so subdirectories are reached by calling it again on each one. It returns an iterator that yields DirEntry objects representing each entry.
Syntax:
for entry in os.scandir(directory_path):
    pass  # Code to process files
When using os.scandir(), we provide the directory_path parameter, which is the path of the directory we want to list files from.
The function returns an iterator that allows us to iterate over each entry. Each DirEntry object has methods like is_file() and is_dir() to determine if it is a file or a directory.
Compared with the previously discussed os.listdir() method, os.scandir() provides more functionality by returning DirEntry objects. These objects have additional methods and attributes, such as name, path, and stat(), that can retrieve metadata about the entries. This makes it more versatile when dealing with complex file operations.
In the following code, we list the files in MyFolder and its subdirectories using the os.scandir() function.
import os


def list_files(directory_path):
    """Return the paths of all files in directory_path and its subdirectories."""
    files = []
    for entry in os.scandir(directory_path):
        if entry.is_file():
            files.append(entry.path)
        elif entry.is_dir():
            # Recurse into the subdirectory and collect its files too.
            files.extend(list_files(entry.path))
    return files


# Usage
directory_path = "MyFolder"
all_files = list_files(directory_path)
print("\n".join(all_files))
The above code defines a function list_files that takes a directory_path as input and returns a list of all files inside the directory and its subdirectories. Within the function, it initializes an empty list, files, to store the file paths.
It uses os.scandir() to iterate over the entries (files and directories) in the specified directory. For each entry, if it is a file, the path is appended to the files list.
If it is a directory, the function is called recursively to get the files within that subdirectory, and the resulting files are added to the files list.
After defining the function, we can specify the directory_path and call the list_files function. It will return a list of all file paths within the directory and its subdirectories.
As shown in the example, we can then process this list as needed, for instance by joining the paths with newlines and printing them.
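The metadata available on DirEntry objects can be put to use during the same traversal. The following sketch yields each file together with its size; the function name iter_files_with_size and the temporary demo tree are illustrative assumptions, and symbolic links are deliberately not followed here to avoid loops.

```python
import os
import tempfile


def iter_files_with_size(directory_path):
    """Yield (path, size_in_bytes) for every regular file below directory_path."""
    with os.scandir(directory_path) as entries:
        for entry in entries:
            # follow_symlinks=False avoids looping through symlinked directories.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files_with_size(entry.path)
            elif entry.is_file(follow_symlinks=False):
                yield entry.path, entry.stat(follow_symlinks=False).st_size


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub"))
    demo_files = (("a.txt", "abc"), (os.path.join("sub", "b.txt"), "hello"))
    for relative, text in demo_files:
        with open(os.path.join(demo_root, relative), "w") as handle:
            handle.write(text)

    sizes = {
        os.path.relpath(path, demo_root): size
        for path, size in iter_files_with_size(demo_root)
    }
    print(sizes)
```

Writing the function as a generator means paths are produced one at a time, which can help on large trees where building the full list first is unnecessary.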
Use the os.walk() Function to List All Files in the Directory and Subdirectories in Python
The os module also lets us fetch, create, remove, and change directories. Its os.walk() function generates the file names in a directory tree by walking the tree either top-down or bottom-up.
Syntax:
for root, dirs, files in os.walk(directory_path):
    pass  # Code to process files
When using os.walk(), we provide the directory_path parameter, representing the path of the directory we want to traverse. This can be either a relative or absolute path.
The function returns a generator that yields a tuple at each iteration. This tuple contains three values: root, dirs, and files.
- The root represents the current directory being traversed. It can be useful for constructing the full path of the files or performing specific operations based on the directory.
- The dirs is a list of directories in the current directory. It allows us to access and manipulate the subdirectories if needed.
- The files is a list of files in the current directory. This is where we can access and process each file individually.
The os.walk() method provides a generator-based approach to traversing directories, giving access to the root, directories, and files at each level. In contrast, os.listdir() returns a list of entries directly, requiring explicit handling of files and directories within the processing code, and os.scandir() returns DirEntry objects with additional functionality.
In the following code, we list the files in MyFolder and its subdirectories using the os.walk() function.
import os

root = "MyFolder"
for path, subdirs, files in os.walk(root):
    for name in files:
        # path is the directory currently being visited; join it with each file name.
        print(os.path.join(path, name))
The provided code uses the os.walk() function to traverse the directory tree starting from the MyFolder directory. It iterates over the root, subdirectories, and files at each level.
It prints the full path for each file encountered by joining the current path with the file name using os.path.join().
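Because os.walk() hands us the list of subdirectories at each level, a top-down walk can also skip parts of the tree by editing that list in place. The sketch below assumes a hypothetical helper list_files_skipping and a demo tree with a folder named cache; neither comes from the tutorial's MyFolder example.

```python
import os
import tempfile


def list_files_skipping(directory_path, skipped_names):
    """List files below directory_path, ignoring directories named in skipped_names."""
    found = []
    for current_dir, subdirs, files in os.walk(directory_path, topdown=True):
        # Editing subdirs in place (slice assignment) tells os.walk not to descend.
        subdirs[:] = [name for name in subdirs if name not in skipped_names]
        for name in files:
            found.append(os.path.join(current_dir, name))
    return sorted(found)


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    for folder in ("keep", "cache"):
        os.mkdir(os.path.join(demo_root, folder))
        with open(os.path.join(demo_root, folder, "data.txt"), "w") as handle:
            handle.write("demo")

    kept = [
        os.path.relpath(path, demo_root)
        for path in list_files_skipping(demo_root, {"cache"})
    ]
    print(kept)
```

The pruning only works with topdown=True (the default); in a bottom-up walk the subdirectories have already been visited by the time their parent is reported.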
Use the glob.glob() Function to List All Files in the Directory and Subdirectories in Python
The glob module is part of Python's standard library; its name is short for global. It finds all paths that match a specified pattern, following the wildcard rules used by the Unix shell.
Syntax:
import glob
file_paths = glob.glob(directory_path + "/**/*", recursive=True)
When using the glob.glob() function, we provide the directory_path parameter along with the pattern '/**/*'. The recursive=True argument ensures that subdirectories are included in the search.
The function returns a list of paths that match the specified pattern within the directory and its subdirectories. With this pattern, the matches include directories as well as files.
Compared with the previous methods, the glob.glob() function provides a more concise way to list files using pattern-matching rules. By utilizing glob.glob(), we can efficiently list all files in a directory and its subdirectories in Python.
The pattern ** will match any files and zero or more folders and subdirectories if recursive is set to True. In the following code, we list the files in MyFolder and its subdirectories using the glob.glob() function.
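import glob
import os

# "**" matches MyFolder itself and every directory below it when recursive=True.
pattern = os.path.join("MyFolder", "**", "*")
for path in glob.glob(pattern, recursive=True):
    if os.path.isfile(path):  # the pattern also matches directories, so skip them
        print(path)
The above code utilizes the glob.glob() function to list all files in a directory and its subdirectories in Python based on a specified pattern.
The pattern variable holds the pattern used for matching. It is built with os.path.join() so that the correct path separator is used on every platform. The pattern starts at MyFolder and uses the ** wildcard to cover all subdirectories. The final * wildcard matches any name, with or without an extension; a *.* wildcard would skip files whose names contain no dot.
The code uses a for loop to iterate over the paths returned by glob.glob(). Because the pattern also matches directories, it prints a path only when os.path.isfile() confirms that it is a file.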
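Pattern matching becomes more useful when only some files are wanted. As a rough sketch, the helper below (find_by_extension, a name chosen here for illustration) collects files with a given extension from a temporary demo tree. It uses glob.escape() so that special characters in the directory name are treated literally, and glob.iglob() to receive matches lazily.

```python
import glob
import os
import tempfile


def find_by_extension(directory_path, extension):
    """Return files below directory_path whose names end with the given extension."""
    pattern = os.path.join(glob.escape(directory_path), "**", "*" + extension)
    # iglob yields matches lazily instead of building the whole list first.
    return sorted(
        path for path in glob.iglob(pattern, recursive=True) if os.path.isfile(path)
    )


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "sub"))
    for relative in ("a.txt", "notes.md", os.path.join("sub", "b.txt")):
        with open(os.path.join(demo_root, relative), "w") as handle:
            handle.write("demo")

    text_files = [
        os.path.relpath(path, demo_root)
        for path in find_by_extension(demo_root, ".txt")
    ]
    print(text_files)
```

One point to keep in mind: by default the glob module does not match names that begin with a dot unless the pattern itself starts with a dot, so hidden files are left out of such results.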
Use the pathlib.Path() Class to List All Files in the Directory and Subdirectories in Python
The pathlib.Path class, part of Python's pathlib module, provides an object-oriented approach for working with file paths. It can list all files in a directory and its subdirectories.
Syntax:
import pathlib

path = pathlib.Path(directory_path)
for file in path.glob("**/*"):
    pass  # Code to process files
When using pathlib.Path(), we pass the directory_path parameter to create a Path object representing the specified directory.
We then utilize the glob() method on the Path object with the pattern '**/*' to traverse the directory and subdirectories recursively. This method returns a generator that yields the matching paths, which can be files or directories.
Compared with the previously discussed methods, pathlib.Path provides a more object-oriented and intuitive approach to file path manipulation.
By using pathlib.Path() and its glob() method, we can efficiently list all files in a directory and its subdirectories in Python while benefiting from the object-oriented features of the pathlib module.
In the following code, we list the files in MyFolder and its subdirectories using pathlib.Path() in Python.
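import pathlib


def list_files(directory_path):
    """Print the path of every file in directory_path and its subdirectories."""
    path = pathlib.Path(directory_path)
    for file in path.glob("**/*"):
        if file.is_file():  # glob() also yields directories, so skip them
            print(file)


# Usage
directory_path = "MyFolder"
list_files(directory_path)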
In the above code, we define the list_files function that takes a directory_path as input. We create a pathlib.Path object using the provided directory path.
We then use the glob method with the pattern '**/*' to traverse the directory and its subdirectories recursively. For each item returned by the glob method, we check if it is a file using the is_file() method.
If it is a file, we print its path. We can set the directory_path variable to the desired directory, and the code will print all the files in that directory and its subdirectories.
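A variant that returns the paths instead of printing them is often easier to reuse. The sketch below relies on Path.rglob("*"), which behaves like glob("**/*"), and shows how the resulting Path objects expose details such as the suffix. The temporary demo tree and its file names are assumptions for illustration only.

```python
import pathlib
import tempfile


def list_files(directory_path):
    """Return a sorted list of Path objects for every file below directory_path."""
    root = pathlib.Path(directory_path)
    # rglob("*") behaves like glob("**/*"): it searches root and all subdirectories.
    return sorted(path for path in root.rglob("*") if path.is_file())


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    root = pathlib.Path(demo_root)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("demo")
    (root / "sub" / "b.csv").write_text("demo")

    found = list_files(root)
    relative_paths = [path.relative_to(root).as_posix() for path in found]
    suffixes = [path.suffix for path in found]
    print(relative_paths)  # ['a.txt', 'sub/b.csv']
    print(suffixes)        # ['.txt', '.csv']
```

Because each result is a Path object rather than a string, follow-up steps such as reading the file or checking its extension need no extra os.path calls.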
Conclusion
We have explored various methods for listing all files in a directory and its subdirectories in Python.
- os.listdir() provides a basic approach but lacks recursive functionality.
- os.scandir() offers enhanced functionality with DirEntry objects but requires additional handling.
- os.walk() is a generator-based approach that yields tuples, providing comprehensive directory traversal capabilities.
- glob.glob() allows pattern matching for a more specific file selection.
- pathlib.Path() offers an object-oriented approach with a versatile glob() method and intuitive file path manipulation.
Each method has advantages and disadvantages, providing different levels of functionality and convenience. Choosing the most suitable method depends on the specific requirements of the task at hand.
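As a quick sanity check, the five approaches can be run side by side on the same tree to confirm that they find the same files. This is only a sketch: the function names (with_listdir and so on) and the temporary demo tree are made up for the comparison, and the tree contains no hidden files or symbolic links, where the approaches can behave differently.

```python
import glob
import os
import pathlib
import tempfile


def with_listdir(root):
    found = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            found.extend(with_listdir(path))
        else:
            found.append(path)
    return found


def with_scandir(root):
    found = []
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir():
                found.extend(with_scandir(entry.path))
            else:
                found.append(entry.path)
    return found


def with_walk(root):
    return [os.path.join(d, n) for d, _, names in os.walk(root) for n in names]


def with_glob(root):
    pattern = os.path.join(glob.escape(root), "**", "*")
    return [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]


def with_pathlib(root):
    return [str(p) for p in pathlib.Path(root).rglob("*") if p.is_file()]


# Build a small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "sub", "deeper"))
    demo_files = (
        "a.txt",
        os.path.join("sub", "b.txt"),
        os.path.join("sub", "deeper", "c.txt"),
    )
    for relative in demo_files:
        with open(os.path.join(demo_root, relative), "w") as handle:
            handle.write("demo")

    # Normalize and sort each result so the lists can be compared directly.
    results = {
        func.__name__: sorted(os.path.normpath(p) for p in func(demo_root))
        for func in (with_listdir, with_scandir, with_walk, with_glob, with_pathlib)
    }
    all_agree = len({tuple(paths) for paths in results.values()}) == 1
    print(all_agree)
```

On real directories the results may diverge slightly; for example, glob.glob() skips names that start with a dot unless the pattern does too.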
I am Fariba Laiq from Pakistan, an Android app developer, technical content writer, and coding instructor. Writing has always been one of my passions. I love to learn, implement, and convey my knowledge to others.