Below is the code:
import os
from glob import glob
PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
print(all_csv_files)
That's it!
While the code above searches for CSV files recursively in a directory and its subdirectories, it can be used to search for any file type. You just need to change EXT.
So, say you want to find all the .css files: all you have to do is change EXT to *.css
EXT = "*.css"
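Since only the pattern changes between searches, the snippet can be wrapped in a small function. The sketch below is one way to do that; the name find_files and its parameters are hypothetical and not part of the original code.

```python
import os
from glob import glob


def find_files(root, pattern):
    """Return every path under root whose file name matches pattern."""
    # Same technique as the snippet above: walk the tree, glob each directory.
    return [file
            for path, subdirs, files in os.walk(root)
            for file in glob(os.path.join(path, pattern))]
```

With this in place, find_files(PATH, "*.css") and find_files(PATH, "*.csv") reuse the same logic without editing a constant.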
Now let’s go line by line. I’ll try to explain every line of code.
First we import the glob and os modules:
import os
from glob import glob
The os module provides operating-system-dependent functionality. We use its walk and path.join functions, which are explained in detail shortly.
The glob module finds pathnames matching a specific pattern, so we can tell it to find all JavaScript files by specifying something like *.js.
Next we set two constants.
PATH = "/home/someuser/projects/someproject"
EXT = "*.csv"
The PATH constant is the path of the directory inside which we have to search.
The EXT constant is the pattern for the extension we intend to search for.
Then we have
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]
The statement above is a list comprehension, written in typical Python style. It can be rewritten as the nested loops below.
all_csv_files = []
for path, subdir, files in os.walk(PATH):
    for file in glob(os.path.join(path, EXT)):
        all_csv_files.append(file)
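os.walk("path/to/some/directory") walks a directory tree (the directory, any subdirectories inside it, the subdirectories inside those, and so on). For each directory it visits, it yields the directory's path, the names of its subdirectories, and the names of its files. Take this directory as an example: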
blog
|-index.php
|-controllers
| |-HomeController.php
|-models
| |-BlogModel.php
For the directory above, the code below
for path, subdir, files in os.walk("path/to/some/directory/blog"):
    print("---------------")
    print("Path", path)
    print("Subdir", subdir)
    print("Files", files)
gives the output below on Python 3 (the order in which directories are listed may vary by platform):
---------------
Path path/to/some/directory/blog
Subdir ['controllers', 'models']
Files ['index.php']
---------------
Path path/to/some/directory/blog/controllers
Subdir []
Files ['HomeController.php']
---------------
Path path/to/some/directory/blog/models
Subdir []
Files ['BlogModel.php']
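To try this without an existing project, the sketch below recreates the blog layout in a temporary directory and walks it. The helpers build_blog_tree and walk_summary are hypothetical names introduced for this illustration, and the sorting is added only because os.walk does not guarantee any particular order.

```python
import os
import tempfile


def build_blog_tree(root):
    """Create the example blog layout under root (illustrative helper)."""
    blog = os.path.join(root, "blog")
    for sub in ("controllers", "models"):
        os.makedirs(os.path.join(blog, sub))
    for rel in ("index.php",
                os.path.join("controllers", "HomeController.php"),
                os.path.join("models", "BlogModel.php")):
        open(os.path.join(blog, rel), "w").close()
    return blog


def walk_summary(top):
    """Return (relative path, subdirectories, files) for each directory visited."""
    rows = []
    for path, subdirs, files in os.walk(top):
        subdirs.sort()  # sorting in place also fixes the order os.walk descends in
        rows.append((os.path.relpath(path, top), list(subdirs), sorted(files)))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    for row in walk_summary(build_blog_tree(tmp)):
        print(row)
```

The loop prints three rows, one per directory: the top level (shown as "."), then controllers, then models.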
glob(os.path.join(path, EXT)) is called inside the loop, so it is executed once for every directory in the tree. For the blog directory above, it is executed for
path/to/some/directory/blog
path/to/some/directory/blog/controllers
path/to/some/directory/blog/models
os.path.join(path, EXT) joins the two path components, so os.path.join('path/to/some/directory/blog', '*.csv') returns path/to/some/directory/blog/*.csv.
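A quick way to confirm this is to print the joined pattern. The snippet below is only an illustration, and the variable names are made up for it; the separator inserted is "/" on a POSIX system, while Windows would insert a backslash.

```python
import os

# Join a directory and a glob pattern using the platform's separator.
pattern = os.path.join("path/to/some/directory/blog", "*.csv")
print(pattern)

# A trailing separator on the first argument is not duplicated.
with_trailing = os.path.join("path/to/some/directory/blog/", "*.csv")
print(with_trailing)
```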
glob('path/to/some/directory/blog/*.csv') returns all the .csv files in the blog directory (just the blog directory and not its subdirectories). But since this is repeated for every directory that os.walk visits, we end up calling glob on every directory in the tree.
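The sketch below illustrates that a plain pattern stays within one directory. For comparison it also shows a different technique that the code in this post does not use: since Python 3.5, glob accepts a "**" component together with recursive=True, which matches nested directories without os.walk. The directory layout and file names are assumptions made up for the demonstration.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one CSV at the top level, one in a subdirectory.
    os.makedirs(os.path.join(root, "reports"))
    for rel in ("top.csv", os.path.join("reports", "nested.csv")):
        open(os.path.join(root, rel), "w").close()

    # A plain pattern only looks at the directory it names.
    flat = glob(os.path.join(root, "*.csv"))

    # "**" with recursive=True also matches files in nested directories.
    deep = glob(os.path.join(root, "**", "*.csv"), recursive=True)

    flat_names = sorted(os.path.relpath(p, root) for p in flat)
    deep_names = sorted(os.path.relpath(p, root) for p in deep)

print(flat_names)  # only the top-level file
print(deep_names)  # the top-level file and the nested one
```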
Hope this explains the code.
Can EXT be a list of extensions?
This is what I did, and it's not working:
ext = ["*.csv", "*.VSI", "*.ETH", "*.POLICER"]
all_csv_files = [file
                 for input_path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(input_path, (x for x in ext)))]
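The snippet above fails because os.path.join receives a generator object instead of a string, which raises a TypeError. One possible fix, sketched below, is to add a third loop over the patterns so that glob is called once per pattern in each directory. The function name find_files_multi is an assumption made for illustration.

```python
import os
from glob import glob


def find_files_multi(root, patterns):
    """Return paths under root that match any of the given glob patterns."""
    return [file
            for path, subdirs, files in os.walk(root)
            for pattern in patterns          # try each pattern in turn
            for file in glob(os.path.join(path, pattern))]
```

It could be called as find_files_multi(PATH, ["*.csv", "*.VSI", "*.ETH", "*.POLICER"]). A file that matches more than one pattern would appear once per matching pattern, and on case-sensitive file systems "*.VSI" would not match a file ending in ".vsi".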