Below is the code:

```python
import os
from glob import glob

PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"

all_csv_files = [file for path, subdir, files in os.walk(PATH) for file in glob(os.path.join(path, EXT))]
print(all_csv_files)
```
That’s it!
While the above code searches for CSV files recursively in a directory and its subdirectories, it can be used to search for any file type. You just need to change the `EXT` constant.

So if you want to find all the `.css` files, all you have to do is change the `EXT` constant to:

```python
EXT = "*.css"
```
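To avoid editing a constant every time, the same logic can be wrapped in a function that takes the root directory and the pattern as parameters. This is a minimal sketch; the function name `find_files` and its parameters are hypothetical and not part of the original code.

```python
import os
from glob import glob


def find_files(root, pattern):
    """Return paths under root (recursively) whose file names match pattern, e.g. "*.css"."""
    # Same technique as above: walk every directory, then glob inside each one.
    return [
        file
        for path, subdir, files in os.walk(root)
        for file in glob(os.path.join(path, pattern))
    ]


if __name__ == "__main__":
    # Hypothetical usage: list all stylesheets under the project directory.
    print(find_files("/home/someuser/projects/someproject", "*.css"))
```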
Now let’s go line by line. I’ll try to explain every line of code.
First, we import the `os` module and the `glob` function from the `glob` module:

```python
import os
from glob import glob
```
The `os` module provides operating-system-dependent functionality. We use its `walk` and `path.join` functions; these are explained in detail shortly.

The `glob` module is used for finding pathnames that match a specific pattern. So we can tell it to find all JS files by specifying a pattern like `*.js`.
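As a small illustration of pattern matching, the sketch below creates a few empty files in a temporary directory (the file names are made up for the example) and asks `glob` for the `*.js` ones. `glob` does not guarantee any particular order, so the result is sorted.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as tmp:
    # Create some hypothetical files to match against.
    for name in ["app.js", "util.js", "style.css", "README.md"]:
        open(os.path.join(tmp, name), "w").close()

    # "*.js" matches every name ending in ".js" directly inside tmp (not in subdirectories).
    matches = sorted(os.path.basename(p) for p in glob(os.path.join(tmp, "*.js")))
    print(matches)  # ['app.js', 'util.js']
```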
Next, we set two constants:

```python
PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"
```

The `PATH` constant is the path of the directory we want to search.
The `EXT` constant is the pattern for the file extension we are searching for.
Then we have:

```python
all_csv_files = [file for path, subdir, files in os.walk(PATH) for file in glob(os.path.join(path, EXT))]
```
The line above is a typical Python list comprehension. It can be rewritten with explicit loops as follows:

```python
all_csv_files = []
for path, subdir, files in os.walk(PATH):
    for file in glob(os.path.join(path, EXT)):
        all_csv_files.append(file)
```
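`os.walk("path/to/some/directory")` lists all the file names in a directory tree: the directory itself, every subdirectory inside it, every subdirectory inside those, and so on. For each directory it visits, it yields three values: the directory path, a list of its subdirectory names, and a list of its file names. For example, take this directory: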
```
|-blog
  |-index.php
  |-controllers
  |  |-HomeController.php
  |-models
     |-BlogModel.php
```
For the directory above, the following code:

```python
import os

for path, subdir, files in os.walk("path/to/some/directory/blog"):
    print("---------------")
    print("Path", path)
    print("Subdir", subdir)
    print("Files", files)
```

produces output like this (the order in which subdirectories are listed can vary between filesystems):

```
---------------
Path path/to/some/directory/blog
Subdir ['controllers', 'models']
Files ['index.php']
---------------
Path path/to/some/directory/blog/controllers
Subdir []
Files ['HomeController.php']
---------------
Path path/to/some/directory/blog/models
Subdir []
Files ['BlogModel.php']
```
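To try this without an existing project, the sketch below recreates the `blog` tree from the example inside a temporary directory (the files are empty placeholders) and walks it. Because the order of entries depends on the filesystem, the subdirectory list is sorted in place, which with the default top-down walk also makes `os.walk` visit subdirectories in that order; paths are printed relative to the temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    blog = os.path.join(tmp, "blog")
    # Recreate the example tree with empty placeholder files.
    os.makedirs(os.path.join(blog, "controllers"))
    os.makedirs(os.path.join(blog, "models"))
    for rel in ["index.php",
                os.path.join("controllers", "HomeController.php"),
                os.path.join("models", "BlogModel.php")]:
        open(os.path.join(blog, rel), "w").close()

    visited = []
    for path, subdir, files in os.walk(blog):
        subdir.sort()  # in-place sort also controls the order os.walk descends into them
        rel_path = os.path.relpath(path, tmp)
        visited.append((rel_path, list(subdir), sorted(files)))
        print("---------------")
        print("Path", rel_path)
        print("Subdir", subdir)
        print("Files", sorted(files))
```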
`glob(os.path.join(path, EXT))` is called inside the loop, so it is executed once for every directory that `os.walk` visits. For the `blog` directory above, it runs three times: for `blog`, `blog/controllers` and `blog/models`.
`os.path.join(path, EXT)` joins the two path components, so `os.path.join('path/to/some/directory/blog', '*.csv')` returns `'path/to/some/directory/blog/*.csv'`.
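A quick check of what `os.path.join` produces. It inserts the platform's separator (`os.sep`, which is `/` on Linux and macOS and `\` on Windows) between the parts, so the assertion below is written in terms of `os.sep`.

```python
import os

pattern = os.path.join("path/to/some/directory/blog", "*.csv")
print(pattern)  # path/to/some/directory/blog/*.csv on Linux and macOS

# Holds on every platform: the separator is whatever os.sep is.
assert pattern == "path/to/some/directory/blog" + os.sep + "*.csv"
```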
`glob('path/to/some/directory/blog/*.csv')` returns all the CSV files in the `blog` directory (only the `blog` directory itself, not its subdirectories). But since this is repeated for every directory `os.walk` visits, we end up calling `glob` on every directory in the tree.
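Since Python 3.5, `glob` can also do the recursion itself: with `recursive=True`, the `**` pattern matches zero or more directories, so a single call can replace the `os.walk` loop. The sketch below builds a throwaway tree (the file names are made up) and checks that both approaches find the same files. One caveat worth checking for your own case: `**` does not descend into hidden directories (names starting with a dot), while `os.walk` does.

```python
import os
import tempfile
from glob import glob

EXT = "*.csv"

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/a.csv, root/sub/b.csv, root/sub/deeper/c.csv, root/sub/notes.txt
    os.makedirs(os.path.join(root, "sub", "deeper"))
    for rel in ["a.csv", os.path.join("sub", "b.csv"),
                os.path.join("sub", "deeper", "c.csv"), os.path.join("sub", "notes.txt")]:
        open(os.path.join(root, rel), "w").close()

    # Approach from this post: glob inside every directory yielded by os.walk.
    walked = [file for path, subdir, files in os.walk(root)
              for file in glob(os.path.join(path, EXT))]

    # Alternative: let glob recurse; "**" also matches zero directories, so root/a.csv is found.
    recursive = glob(os.path.join(root, "**", EXT), recursive=True)

    print(sorted(walked) == sorted(recursive))  # True
```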
I hope this explains the code.