How to Get File Extension in Python
- Use the os.path Module to Extract Extension From File in Python
- Use the pathlib Module to Extract Extension From File in Python
This tutorial shows how to get the file extension from a filename in Python.
Use the os.path Module to Extract Extension From File in Python
Python has a module, os.path, with ready-made utility functions for manipulating OS file paths. These include splitting and joining path components, normalizing paths, and getting information about a path, such as whether it exists.
We will use this module to get the file extension in Python.
os.path has a function splitext() to split the root and the extension of the given file path. The function returns a tuple containing the root string and the extension string.
Let’s provide an example file path with a docx extension.
/Users/user/Documents/sampledoc.docx
The expected output is the extension .docx.
Declare two separate variables, root and extension, to catch the result of splitext().
import os
path = "/Users/user/Documents/sampledoc.docx"
root, extension = os.path.splitext(path)
print("Root:", root)
print("Extension:", extension)
Output:
Root: /Users/user/Documents/sampledoc
Extension: .docx
The extension has now been successfully separated from the root of the file path.
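A few edge cases are worth knowing before relying on splitext(). The short sketch below uses made-up example paths (they are not from any real system) to show that only the last dot in the final path component is treated as the start of the extension, and that names without an extension, or with only a leading dot, yield an empty extension string.

```python
import os

# Hypothetical example paths, chosen only to illustrate edge cases.
archive = os.path.splitext("/Users/user/Documents/app_sample.tar.gz")
no_ext = os.path.splitext("/Users/user/Documents/README")
dotfile = os.path.splitext("/Users/user/.bashrc")
dotted_dir = os.path.splitext("/Users/user.name/notes")

# Only the part after the last dot counts as the extension.
print(archive)     # ('/Users/user/Documents/app_sample.tar', '.gz')

# No dot in the file name: the extension is an empty string.
print(no_ext)      # ('/Users/user/Documents/README', '')

# A leading dot marks a hidden file, not an extension.
print(dotfile)     # ('/Users/user/.bashrc', '')

# Dots in directory names are ignored.
print(dotted_dir)  # ('/Users/user.name/notes', '')
```

Because the extension can be an empty string, it may be worth checking for that case before using the result.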
Use the pathlib Module to Extract Extension From File in Python
pathlib is a Python module that contains classes representing file paths and implements utility functions and constants for these classes.
pathlib.Path() accepts a path string as an argument and returns a new Path object.
A pathlib.Path object has the attribute suffix, which returns the file extension.
import pathlib
path = pathlib.Path("/Users/user/Documents/sampledoc.docx")
print("Parent:", path.parent)
print("Filename:", path.name)
print("Extension:", path.suffix)
Besides the extension, we can also get the parent directory and the file name of the given path through the parent and name attributes of the Path object.
Output:
Parent: /Users/user/Documents
Filename: sampledoc.docx
Extension: .docx
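In practice the extension is often compared against a known value, so the case and the leading dot can get in the way. The sketch below is one possible way to normalize it; normalized_extension is a hypothetical helper name, not part of pathlib, and the paths are illustrative. It uses PurePath, which handles path strings without touching the file system, and also shows the stem attribute, which is the file name without its last suffix.

```python
import pathlib


def normalized_extension(file_path):
    """Return the last extension in lower case, without the leading dot.

    Hypothetical helper for illustration; returns '' if there is no extension.
    """
    return pathlib.PurePath(file_path).suffix.lower().lstrip(".")


upper = normalized_extension("/Users/user/Documents/SampleDoc.DOCX")
missing = normalized_extension("/Users/user/Documents/README")
stem = pathlib.PurePath("/Users/user/Documents/sampledoc.docx").stem

print(upper)          # docx
print(repr(missing))  # ''
print(stem)           # sampledoc
```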
What if we have a file extension like .tar.gz or .tar.bz2?
pathlib also provides an attribute for files with multiple suffixes as extensions. The suffixes attribute of the Path object is a list containing all of the suffixes of the given file. If we use the example above and print out the suffixes attribute:
import pathlib
path = pathlib.Path("/Users/user/Documents/sampledoc.docx")
print("Suffix(es):", path.suffixes)
Output:
Suffix(es): ['.docx']
So even if there is only one suffix, the output will result in a singleton list.
Now try an example with a .tar.gz extension. To convert the list into a single string, the join() method can be called on an empty string with the suffixes attribute as its argument.
import pathlib
path = pathlib.Path("/Users/user/Documents/app_sample.tar.gz")
print("Parent:", path.parent)
print("Filename:", path.name)
print("Extension:", "".join(path.suffixes))
Output:
Parent: /Users/user/Documents
Filename: app_sample.tar.gz
Extension: .tar.gz
Now the actual extension is displayed instead of a list.
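One caveat: suffixes treats every dot after the first part of the file name as the start of a suffix, so joining the list can return too much when the name itself contains dots. The sketch below shows the effect and one possible workaround, assuming you know in advance which compound extensions you care about. KNOWN_COMPOUND and full_extension are hypothetical names introduced for this illustration.

```python
import pathlib

# Hypothetical allow-list of compound extensions; extend it as needed.
KNOWN_COMPOUND = (".tar.gz", ".tar.bz2", ".tar.xz")


def full_extension(file_path):
    """Return a known compound extension if the name ends with one,
    otherwise fall back to the last suffix."""
    path = pathlib.PurePath(file_path)
    lowered = path.name.lower()
    for compound in KNOWN_COMPOUND:
        if lowered.endswith(compound):
            # Slice the original name so the original casing is preserved.
            return path.name[-len(compound):]
    return path.suffix


# Joining all suffixes picks up the dotted parts of the name as well.
naive = "".join(pathlib.PurePath("report.v2.final.tar.gz").suffixes)
print(naive)                                     # .v2.final.tar.gz

print(full_extension("report.v2.final.tar.gz"))  # .tar.gz
print(full_extension("sampledoc.docx"))          # .docx
```

The allow-list keeps the behavior predictable, at the cost of having to maintain the list of compound extensions yourself.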
In summary, the two modules os.path and pathlib provide convenient ways to get the file extension from a file path in Python.
The os.path module has the function splitext() to split a file path into the root and the extension. pathlib creates a Path object that exposes the last extension through the suffix attribute and all extensions through the suffixes attribute.
If you’re anticipating more than one extension in a file name, it would be best to use pathlib, as it provides easy support for multiple extensions through the suffixes attribute.