How do I list all files of a directory?

Asked 2023-09-20 20:16:39 View 255,864

How can I list all files of a directory in Python and add them to a list?

Answers

os.listdir() returns everything inside a directory -- including both files and directories.

os.path's isfile() can be used to only list files:

from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

Alternatively, os.walk() yields two lists for each directory it visits -- one for files and one for dirs. If you only want the top directory you can break the first time it yields:

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

or, shorter:

from os import walk

filenames = next(walk(mypath), (None, None, []))[2]  # [] if no file

Answered   2023-09-20 20:16:39

  • To get all the full paths of the files in the sub-folders, recursively: [os.path.join(dirpath,f) for (dirpath, dirnames, filenames) in os.walk(mypath) for f in filenames] - anyone
  • Why the need of the break? - anyone
  • @João Víctor Melo os.walk() recursively visits all subdirectories, their subdirectories, and so on. By breaking, we only visit the first directory mypath. - anyone

I prefer using the glob module, as it does pattern matching and expansion.

import glob
print(glob.glob("/home/adam/*"))

It does pattern matching intuitively

import glob
# All files and directories ending with .txt and that don't begin with a dot:
print(glob.glob("/home/adam/*.txt")) 
# All files and directories ending with .txt with depth of 2 folders, ignoring names beginning with a dot:
print(glob.glob("/home/adam/*/*.txt")) 

It will return a list with the queried files and directories:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

Note that glob ignores files and directories that begin with a dot ., as those are considered hidden files and directories, unless the pattern is something like .*.

Use glob.escape to escape strings that are not meant to be patterns:

print(glob.glob(glob.escape(directory_name) + "/*.txt"))

Answered   2023-09-20 20:16:39

  • from glob import glob replaces glob.glob() with glob(). - anyone
  • from glob import glob as g replaces glob() by g(). - anyone

list in the current directory

With listdir in os module you get the files and the folders in the current dir

import os

arr = os.listdir()

Looking in a directory

arr = os.listdir('c:\\files')

with glob you can specify a type of file to list like this

import glob

txtfiles = []
for file in glob.glob("*.txt"):
    txtfiles.append(file)

or

mylist = [f for f in glob.glob("*.txt")]

get the full path of only files in the current directory

import os
from os import listdir
from os.path import isfile, join

cwd = os.getcwd()
onlyfiles = [os.path.join(cwd, f) for f in os.listdir(cwd) if 
os.path.isfile(os.path.join(cwd, f))]
print(onlyfiles) 

['G:\\getfilesname\\getfilesname.py', 'G:\\getfilesname\\example.txt']

Getting the full path name with os.path.abspath

You get the full path in return

 import os
 files_path = [os.path.abspath(x) for x in os.listdir()]
 print(files_path)
 
 ['F:\\documenti\applications.txt', 'F:\\documenti\collections.txt']

Walk: going through sub directories

os.walk returns the root, the directories list and the files list, that is why I unpacked them in r, d, f in the for loop; it, then, looks for other files and directories in the subfolders of the root and so on until there are no subfolders.

import os

# Getting the current work directory (cwd)
thisdir = os.getcwd()

# r=root, d=directories, f = files
for r, d, f in os.walk(thisdir):
    for file in f:
        if file.endswith(".docx"):
            print(os.path.join(r, file))

To go up in the directory tree

# Method 1
x = os.listdir('..')

# Method 2
x= os.listdir('/')

Get files of a particular subdirectory with os.listdir()

import os

x = os.listdir("./content")

os.walk('.') - current directory

 import os
 arr = next(os.walk('.'))[2]
 print(arr)
 
 >>> ['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

next(os.walk('.')) and os.path.join('dir', 'file')

 import os
 arr = []
 for d,r,f in next(os.walk("F:\\_python")):
     for file in f:
         arr.append(os.path.join(r,file))

 for f in arr:
     print(files)

>>> F:\\_python\\dict_class.py
>>> F:\\_python\\programmi.txt

next... walk

 [os.path.join(r,file) for r,d,f in next(os.walk("F:\\_python")) for file in f]
 
 >>> ['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk

x = [os.path.join(r,file) for r,d,f in os.walk("F:\\_python") for file in f]
print(x)

>>> ['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files

 arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
 

Using glob to get the full path of the files

from path import path
from glob import glob

x = [path(f).abspath() for f in glob("F:\\*.txt")]

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]

Using pathlib from Python 3.4

import pathlib

flist = []
for p in pathlib.Path('.').iterdir():
    if p.is_file():
        print(p)
        flist.append(p)

With list comprehension:

flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Use glob method in pathlib.Path()

import pathlib

py = pathlib.Path().glob("*.py")

Get all and only files with os.walk: checks only in the third element returned, i.e. the list of the files

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)

Get only files with next in a directory: returns only the file in the root folder

 import os
 x = next(os.walk('F://python'))[2]

Get only directories with next and walk in a directory, because in the [1] element there are the folders only

 import os
 next(os.walk('F://python'))[1] # for the current dir use ('.')
 
 >>> ['python3','others']

Get all the subdir names with walk

for r,d,f in os.walk("F:\\_python"):
    for dirs in d:
        print(dirs)

os.scandir() from Python 3.5 and greater

import os
x = [f.name for f in os.scandir() if f.is_file()]

# Another example with `scandir` (a little variation from docs.python.org)
# This one is more efficient than `os.listdir`.
# In this case, it shows the files only in the current directory
# where the script is executed.

import os
with os.scandir() as i:
    for entry in i:
        if entry.is_file():
            print(entry.name)

Answered   2023-09-20 20:16:39

import os
os.listdir("somedirectory")

will return a list of all files and directories in "somedirectory".

Answered   2023-09-20 20:16:39

A one-line solution to get only list of files (no subdirectories):

filenames = next(os.walk(path))[2]

or absolute pathnames:

paths = [os.path.join(path, fn) for fn in next(os.walk(path))[2]]

Answered   2023-09-20 20:16:39

Getting Full File Paths From a Directory and All Its Subdirectories

import os

def get_filepaths(directory):
    """
    This function will generate the file names in a directory 
    tree by walking the tree either top-down or bottom-up. For each 
    directory in the tree rooted at directory top (including top itself), 
    it yields a 3-tuple (dirpath, dirnames, filenames).
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files— two of them in the root directory, and another in a subfolder called "SUBFOLDER." You can now do things like:
  • print full_file_paths which will print the list:

    • ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you'd like, you can open and read the contents, or focus only on files with the extension ".dat" like in the code below:

for f in full_file_paths:
  if f.endswith(".dat"):
    print f

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat

Answered   2023-09-20 20:16:39

Since version 3.4 there are builtin iterators for this which are a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.

os.scandir(): New in version 3.5.

>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5, and its speed got increased by 2-20 times according to PEP 471.

Let me also recommend reading ShadowRanger's comment below.

Answered   2023-09-20 20:16:39

  • Note: The os.scandir solution is going to be more efficient than os.listdir with an os.path.is_file check or the like, even if you need a list (so you don't benefit from lazy iteration), because os.scandir uses OS provided APIs that give you the is_file information for free as it iterates, no per-file round trip to the disk to stat them at all (on Windows, the DirEntrys get you complete stat info for free, on *NIX systems it needs to stat for info beyond is_file, is_dir, etc., but DirEntry caches on first stat for convenience). - anyone

Preliminary notes

  • Although there's a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files

  • The statement: "all files of a directory" can be interpreted in two ways:

    1. All direct (or level 1) descendants only

    2. All descendants in the whole directory tree (including the ones in sub-directories)

  • When the question was asked, I imagine that Python 2, was the LTS version, however the code samples will be run by Python 3(.5) (I'll keep them as Python 2 compliant as possible; also, any code belonging to Python that I'm going to post, is from v3.5.4 - unless otherwise specified).
    That has consequences related to another keyword in the question: "add them into a list":

    >>> import sys
    >>>
    >>> sys.version
    '2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda function
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    

    >>> import sys
    >>>
    >>> sys.version
    '3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)]'
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Build a list from the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Build a list from the same generator
    >>> lm1, type(lm1)  # Empty list now - generator already exhausted
    ([], <class 'list'>)
    
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I'm using the same tree on Nix as well). Note that I'll be reusing the console:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q003207219]> sopr.bat
    ### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###
    
    [prompt]> 
    [prompt]> tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    ¦   file0
    ¦   file1
    ¦
    +---dir0
    ¦   +---dir00
    ¦   ¦   ¦   file000
    ¦   ¦   ¦
    ¦   ¦   +---dir000
    ¦   ¦           file0000
    ¦   ¦
    ¦   +---dir01
    ¦   ¦       file010
    ¦   ¦       file011
    ¦   ¦
    ¦   +---dir02
    ¦       +---dir020
    ¦           +---dir0200
    +---dir1
    ¦       file10
    ¦       file11
    ¦       file12
    ¦
    +---dir2
    ¦   ¦   file20
    ¦   ¦
    ¦   +---dir20
    ¦           file200
    ¦
    +---dir3
    

Solutions

Programmatic approaches

1. [Python.Docs]: os.listdir(path='.')

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...


>>> import os
>>>
>>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
>>>
>>> os.listdir(root_dir)  # List all the items in root_dir
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter items and only keep files (strip out directories)
['file0', 'file1']

A more elaborate example (code_os_listdir.py):

#!/usr/bin/env python

import os
import sys
from pprint import pformat as pf


def _get_dir_content(path, include_folders, recursive):
    entries = os.listdir(path)
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                    yield sub_entry
        else:
            yield entry_with_path


def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    for item in _get_dir_content(path, include_folders, recursive):
        yield item if prepend_folder_name else item[path_len:]


def _get_dir_content_old(path, include_folders, recursive):
    entries = os.listdir(path)
    ret = list()
    for entry in entries:
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                ret.append(entry_with_path)
            if recursive:
                ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
        else:
            ret.append(entry_with_path)
    return ret


def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
    path_len = len(path) + len(os.path.sep)
    return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]


def main(*argv):
    root_dir = "root_dir"
    ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
    lret0 = list(ret0)
    print("{:} {:d}\n{:s}".format(ret0, len(lret0), pf(lret0)))
    ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
    print("\n{:d}\n{:s}".format(len(ret1), pf(ret1)))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Notes:

  • There are two implementations:

    1. One that uses generators (of course here it seems useless, since I immediately convert the result to a list)

    2. The classic one (function names ending in _old)

  • Recursion is used (to get into subdirectories)

  • For each implementation there are two functions:

    • One that starts with an underscore (_): "private" (should not be called directly) - that does all the work

    • The public one (wrapper over previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come with at this point

  • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and also I am iterating inside the function over inner generators - don't know how performance friendly is that

  • Play with the arguments to get different results

Output:

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.05.04_test0\Scripts\python.exe" ".\code_os_listdir.py"
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] 064bit on win32

<generator object get_dir_content at 0x000002C080418F68> 22
['root_dir\\dir0',
 'root_dir\\dir0\\dir00',
 'root_dir\\dir0\\dir00\\dir000',
 'root_dir\\dir0\\dir00\\dir000\\file0000',
 'root_dir\\dir0\\dir00\\file000',
 'root_dir\\dir0\\dir01',
 'root_dir\\dir0\\dir01\\file010',
 'root_dir\\dir0\\dir01\\file011',
 'root_dir\\dir0\\dir02',
 'root_dir\\dir0\\dir02\\dir020',
 'root_dir\\dir0\\dir02\\dir020\\dir0200',
 'root_dir\\dir1',
 'root_dir\\dir1\\file10',
 'root_dir\\dir1\\file11',
 'root_dir\\dir1\\file12',
 'root_dir\\dir2',
 'root_dir\\dir2\\dir20',
 'root_dir\\dir2\\dir20\\file200',
 'root_dir\\dir2\\file20',
 'root_dir\\dir3',
 'root_dir\\file0',
 'root_dir\\file1']

11
['dir0\\dir00\\dir000\\file0000',
 'dir0\\dir00\\file000',
 'dir0\\dir01\\file010',
 'dir0\\dir01\\file011',
 'dir1\\file10',
 'dir1\\file11',
 'dir1\\file12',
 'dir2\\dir20\\file200',
 'dir2\\file20',
 'file0',
 'file1']

Done.

2. [Python.Docs]: os.scandir(path='.')

In Python 3.5+ only, backport: [PyPI]: scandir:

Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


>>> import os
>>>
>>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
>>> root_dir
'.\\root_dir'
>>>
>>> scandir_iterator = os.scandir(root_dir)
>>> scandir_iterator
<nt.ScandirIterator object at 0x00000268CF4BC140>
>>> [item.path for item in scandir_iterator]
['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
>>>
>>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
[]
>>>
>>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
>>> for item in scandir_iterator :
...     if os.path.isfile(item.path):
...             print(item.name)
...
file0
file1

Notes:

  • Similar to os.listdir

  • But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)

3. [Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False)

Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


>>> import os
>>>
>>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
>>> root_dir
'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
>>>
>>> walk_generator = os.walk(root_dir)
>>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (passed as an argument)
>>> root_dir_entry
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
>>>
>>> root_dir_entry[1] + root_dir_entry[2]  # Display dirs and files (direct descendants) in a single list
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
>>>
>>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
...     print(entry)
...
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])

Notes:

  • Under the scenes, it uses os.scandir (os.listdir on older (Python) versions)

  • It does the heavy lifting by recurring in subfolders

4. [Python.Docs]: glob.glob(pathname, *, root_dir=None, dir_fd=None, recursive=False, include_hidden=False)

Or glob.iglob:

Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
...
Changed in version 3.5: Support for recursive globs using “**”.


>>> import glob, os
>>>
>>> wildcard_pattern = "*"
>>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
>>> root_dir
'root_dir\\*'
>>>
>>> glob_list = glob.glob(root_dir)
>>> glob_list
['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
>>>
>>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from begining
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> for entry in glob.iglob(root_dir + "*", recursive=True):
...     print(entry)
...
root_dir\
root_dir\dir0
root_dir\dir0\dir00
root_dir\dir0\dir00\dir000
root_dir\dir0\dir00\dir000\file0000
root_dir\dir0\dir00\file000
root_dir\dir0\dir01
root_dir\dir0\dir01\file010
root_dir\dir0\dir01\file011
root_dir\dir0\dir02
root_dir\dir0\dir02\dir020
root_dir\dir0\dir02\dir020\dir0200
root_dir\dir1
root_dir\dir1\file10
root_dir\dir1\file11
root_dir\dir1\file12
root_dir\dir2
root_dir\dir2\dir20
root_dir\dir2\dir20\file200
root_dir\dir2\file20
root_dir\dir3
root_dir\file0
root_dir\file1

Notes:

  • Uses os.listdir

  • For large trees (especially if recursive is on), iglob is preferred

  • Allows advanced filtering based on name (due to the wildcard)

5. [Python.Docs]: class pathlib.Path(*pathsegments)

Python 3.4+, backport: [PyPI]: pathlib2.

>>> import pathlib
>>>
>>> root_dir = "root_dir"
>>> root_dir_instance = pathlib.Path(root_dir)
>>> root_dir_instance
WindowsPath('root_dir')
>>> root_dir_instance.name
'root_dir'
>>> root_dir_instance.is_dir()
True
>>>
>>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
['root_dir\\file0', 'root_dir\\file1']

Notes:

  • This is one way of achieving our goal

  • It's the OOP style of handling paths

  • Offers lots of functionalities

6. [Python 2.Docs]: dircache.listdir(path)


def listdir(path):
    """List directory contents, using cache."""
    try:
        cached_mtime, list = cache[path]
        del cache[path]
    except KeyError:
        cached_mtime, list = -1, []
    mtime = os.stat(path).st_mtime
    if mtime != cached_mtime:
        list = os.listdir(path)
        list.sort()
    cache[path] = mtime, list
    return list

7. Native OS APIs

POSIX specific:

Available via [Python.Docs]: ctypes - A foreign function library for Python:

ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

Not directly related, but check [SO]: C function called from Python via ctypes returns incorrect value (@CristiFati's answer) out before working with CTypes.

code_ctypes.py:

#!/usr/bin/env python3

import ctypes as cts
import sys


DT_DIR = 4
DT_REG = 8


class NixDirent64(cts.Structure):
    _fields_ = (
        ("d_ino", cts.c_ulonglong),
        ("d_off", cts.c_longlong),
        ("d_reclen", cts.c_ushort),
        ("d_type", cts.c_ubyte),
        ("d_name", cts.c_char * 256),
    )

NixDirent64Ptr = cts.POINTER(NixDirent64)


libc = this_process = cts.CDLL(None, use_errno=True)

opendir = libc.opendir
opendir.argtypes = (cts.c_char_p,)
opendir.restype = cts.c_void_p
readdir = libc.readdir
readdir.argtypes = (cts.c_void_p,)
readdir.restype = NixDirent64Ptr
closedir = libc.closedir
closedir.argtypes = (cts.c_void_p,)


def get_dir_content(path):
    ret = [path, [], []]
    pdir = opendir(cts.create_string_buffer(path.encode()))
    if not pdir:
        print("opendir returned NULL (errno: {:d})".format(cts.get_errno()))
        return ret
    cts.set_errno(0)
    while True:
        pdirent = readdir(pdir)
        if not pdirent:
            break
        dirent = pdirent.contents
        name = dirent.d_name.decode()
        if dirent.d_type & DT_DIR:
            if name not in (".", ".."):
                ret[1].append(name)
        elif dirent.d_type & DT_REG:
            ret[2].append(name)
    if cts.get_errno():
        print("readdir returned NULL (errno: {:d})".format(cts.get_errno()))
    closedir(pdir)
    return ret


def main(*argv):
    root_dir = "root_dir"
    entries = get_dir_content(root_dir)
    print("Entries:\n{:}".format(entries))


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Notes:

  • It loads the three functions from LibC (libc.so - loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists without exceptions? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python / C edge

  • NixDirent64 is the CTypes representation of struct dirent64 from [Man7]: dirent.h(0P) (so are the DT_ constants) from my Ubuntu OS. On other flavors / versions, the structure definition might differ, and if so, the CTypes alias should be updated, otherwise it will yield Undefined Behavior

  • It returns data in the os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task

  • Everything is doable on Win as well, the data (libraries, functions, structs, constants, ...) differ

Output:

[cfati@cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q003207219]> python3.5 ./code_ctypes.py
Python 3.5.10 (default, Jan 15 2022, 19:53:00) [GCC 9.3.0] 064bit on linux

Entries:
['root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1']]

Done.

8. [TimGolden]: win32file.FindFilesW

Win specific:

Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.


>>> import os, win32file as wfile, win32con as wcon
>>>
>>> root_dir = "root_dir"
>>> root_dir_wildcard = os.path.join(root_dir, "*")
>>> entry_list = wfile.FindFilesW(root_dir_wildcard)
>>> len(entry_list)  # Don't display the whole content as it's too long
8
>>> [entry[-2] for entry in entry_list]  # Only display the entry names
['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
>>>
>>> [entry[-2] for entry in entry_list if entry[0] & wcon.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
['dir0', 'dir1', 'dir2', 'dir3']
>>>
>>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (wcon.FILE_ATTRIBUTE_NORMAL | wcon.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
['root_dir\\file0', 'root_dir\\file1']

Notes:

9. Use some (other) 3rd-party package that does the trick

Most likely, will rely on one (or more) of the above (maybe with slight customizations).

Notes:

  • Code is meant to be portable (except places that target a specific area - which are marked) or cross:

    • OS (Nix, Win, )

    • Python version (2, 3, )

  • Multiple path styles (absolute, relatives) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction

  • os.listdir and os.scandir use opendir / readdir / closedir ([MS.Learn]: FindFirstFileW function (fileapi.h) / [MS.Learn]: FindNextFileW function (fileapi.h) / [MS.Learn]: FindClose function (fileapi.h)) (via [GitHub]: python/cpython - (main) cpython/Modules/posixmodule.c)

  • win32file.FindFilesW uses those (Win specific) functions as well (via [GitHub]: mhammond/pywin32 - (main) pywin32/win32/src/win32file.i)

  • _get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)

    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func) which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything) and inside _get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped), but the more complex the code becomes, the longer it will take to execute
  • Nota Bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 pc064), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit - 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
    I must also mention that I didn't try to increase recursionlimit, but in theory there will always be the possibility for failure, if the dir depth is larger than the highest possible recursionlimit (on that machine).
    Check [SO]: _csv.Error: field larger than field limit (131072) (@CristiFati's answer) for more details on the topic

  • Code samples are for demonstrative purposes only. That means that I didn't take into account error handling (I don't think there's any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well

Other approaches:

1. Use Python only as a wrapper

  • Everything is done using another technology

  • That technology is invoked from Python

  • The most famous flavor that I know is what I call the SysAdmin approach:

    • Use Python (or any programming language for that matter) in order to execute Shell commands (and parse their outputs)

    • Some consider this a neat hack

    • I consider it more like a lame workaround (gainarie), as the action per se is performed from Shell (Cmd in this case), and thus doesn't have anything to do with Python

    • Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of [Python.Docs]: subprocess - Subprocess management routines (run, check_output, ...)

[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.05.04_test0\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
dir0
dir1
dir2
dir3
file0
file1

[cfati@cfati-5510-0:/mnt/e/Work/Dev/StackOverflow/q003207219]> python3.5 -c "import os;os.system(\"ls root_dir\")"
dir0  dir1  dir2  dir3  file0  file1

In general, this approach is to be avoided, since if some command output format slightly differs between OS versions / flavors, the parsing code should be adapted as well - not to mention differences between locales.

Answered   2023-09-20 20:16:39

I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.

But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.

As examples:

from glob import glob

# Return everything under C:\Users\admin that contains a folder called wlp.
glob('C:\Users\admin\*\wlp')

The above is terrible - the path has been hardcoded and will only ever work on Windows between the drive name and the \s being hardcoded into the path.

from glob    import glob
from os.path import join

# Return everything under Users, admin, that contains a folder called wlp.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users which is often found on Windows and not so often found on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return everything under the user directory that contains a folder called wlp.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.

Another great example that works perfectly across platforms and does something a bit different:

from glob    import glob
from os      import getcwd
from os.path import join

# Return everything under the current directory that contains a folder called wlp.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.

Answered   2023-09-20 20:16:39

def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 

Answered   2023-09-20 20:16:39

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:

from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print found_file

So I made a PyPI package out of it and there is also a GitHub repository. I hope that someone finds it potentially useful for this code.

Answered   2023-09-20 20:16:39

For greater results, you can use listdir() method of the os module along with a generator (a generator is a powerful iterator that keeps its state, remember?). The following code works fine with both versions: Python 2 and Python 3.

Here's a code:

import os

def files(path):  
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

for file in files("."):  
    print (file)

The listdir() method returns the list of entries for the given directory. The method os.path.isfile() returns True if the given entry is a file. And the yield operator quits the func but keeps its current state, and it returns only the name of the entry detected as a file. All the above allows us to loop over the generator function.

Answered   2023-09-20 20:16:39

Returning a list of absolute filepaths, does not recurse into subdirectories

L = [os.path.join(os.getcwd(),f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(),f))]

Answered   2023-09-20 20:16:39

  • Note: os.path.abspath(f) would be a somewhat cheaper substitute for os.path.join(os.getcwd(),f). - anyone
  • I'd be more efficient still if you started with cwd = os.path.abspath('.'), then used cwd instead of '.' and os.getcwd() throughout to avoid loads of redundant system calls. - anyone

A wise teacher told me once that:

When there are several established ways to do something, none of them is good for all cases.

I will thus add a solution for a subset of the problem: quite often, we only want to check whether a file matches a start string and an end string, without going into subdirectories. We would thus like a function that returns a list of filenames, like:

filenames = dir_filter('foo/baz', radical='radical', extension='.txt')

If you care to first declare two functions, this can be done:

def file_filter(filename, radical='', extension=''):
    "Check if a filename matches a radical and extension"
    if not filename:
        return False
    filename = filename.strip()
    return(filename.startswith(radical) and filename.endswith(extension))

def dir_filter(dirname='', radical='', extension=''):
    "Filter filenames in directory according to radical and extension"
    if not dirname:
        dirname = '.'
    return [filename for filename in os.listdir(dirname)
                if file_filter(filename, radical, extension)]

This solution could be easily generalized with regular expressions (and you might want to add a pattern argument, if you do not want your patterns to always stick to the start or end of the filename).

Answered   2023-09-20 20:16:39

import os
import os.path


def get_files(target_dir):
    item_list = os.listdir(target_dir)

    file_list = list()
    for item in item_list:
        item_dir = os.path.join(target_dir,item)
        if os.path.isdir(item_dir):
            file_list += get_files(item_dir)
        else:
            file_list.append(item_dir)
    return file_list

Here I use a recursive structure.

Answered   2023-09-20 20:16:39

  • The same can be achieved just in one line with pathlib: filter(Path.is_file, Path().rglob('*')) - anyone

Using generators

import os
def get_files(search_path):
     for (dirpath, _, filenames) in os.walk(search_path):
         for filename in filenames:
             yield os.path.join(dirpath, filename)
list_files = get_files('.')
for filename in list_files:
    print(filename)

Answered   2023-09-20 20:16:39

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make more specific, e.g. only look for Python source files which are not symbolic links, also in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

Answered   2023-09-20 20:16:39

For Python 2:

pip install rglob

Then do

import rglob
file_list = rglob.rglob("/home/base/dir/", "*")
print file_list

Answered   2023-09-20 20:16:39

  • When an external dep can be avoided, do it. What is the added value of using an external dependency when all you need is already in the language? - anyone

Here's my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

Answered   2023-09-20 20:16:39

I will provide a sample one liner where sourcepath and file type can be provided as input. The code returns a list of filenames with csv extension. Use . in case all files needs to be returned. This will also recursively scans the subdirectories.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.

Answered   2023-09-20 20:16:39

  • If you are going to use glob, then just use glob('**/*.csv', recursive=True). No need to combine this with os.walk() to recurse (recursive and ** are supported since Python 3.5). - anyone

dircache is "Deprecated since version 2.6: The dircache module has been removed in Python 3.0."

import dircache
list = dircache.listdir(pathname)
i = 0
check = len(list[0])
temp = []
count = len(list)
while count != 0:
  if len(list[i]) != check:
     temp.append(list[i-1])
     check = len(list[i])
  else:
    i = i + 1
    count = count - 1

print temp

Answered   2023-09-20 20:16:39