Getting all CSV files in a directory and its subdirectories using Python

Below is the code
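The sketch below is consistent with the line-by-line explanation that follows: it imports os and glob, sets the PATH and EXT constants, and collects matches in one os.walk/glob line. It assumes glob is imported as a function (from glob import glob), and the PATH value is a placeholder you should replace with your own directory.

```python
import os
from glob import glob

# Directory to search (placeholder: replace with a real directory on your machine)
PATH = "path/to/some/directory"
# Shell-style pattern for the files to collect
EXT = "*.csv"

# For every directory os.walk visits under PATH, glob that directory for EXT
all_csv_files = [file
                 for path, subdir, files in os.walk(PATH)
                 for file in glob(os.path.join(path, EXT))]

print(all_csv_files)
```

If PATH does not exist, os.walk yields nothing and the list is simply empty.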

That’s it!

While the code above is written to search for CSV files recursively in a directory and its subdirectories, it can be used to search for any file type. You just need to change EXT.
So, say you want to find all the .css files: all you have to do is change EXT to *.css.
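Since only the pattern changes from one file type to another, one option is to pass it in as a parameter. In this sketch, find_files is a hypothetical helper name and the directory passed to it is a placeholder:

```python
import os
from glob import glob


def find_files(root, pattern):
    """Return paths under root (searched recursively) whose names match pattern.

    find_files is a hypothetical helper, not part of any library.
    """
    return [file
            for path, _subdirs, _files in os.walk(root)
            for file in glob(os.path.join(path, pattern))]


# Same walk, different pattern; note the leading "*" in "*.css"
css_files = find_files("path/to/some/directory", "*.css")
```

Note the pattern is "*.css", not ".css": a bare ".css" has no wildcard, so glob would only match a file literally named .css.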

Code explained

Now let’s go line by line. I’ll try to explain every line of code.

First, we import the glob and os modules.

  • The os module provides operating-system-dependent functionality.
    In our case we’ll be using its walk and path.join functions. These are explained in detail shortly.
  • The glob module finds pathnames matching a specified pattern, using Unix shell-style wildcards. So we can tell it to find all .js files by specifying a pattern like *.js.
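On its own, glob only looks in the directory named in the pattern; it does not descend into subdirectories. The sketch below shows this on a throwaway directory; the file and directory names are hypothetical.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one .js file at the top level, one inside lib/
    os.makedirs(os.path.join(root, "lib"))
    for name in ["app.js", os.path.join("lib", "util.js"), "style.css"]:
        open(os.path.join(root, name), "w").close()

    # The pattern is matched only against the entries of root itself
    top_level_js = [os.path.basename(p) for p in glob(os.path.join(root, "*.js"))]

print(top_level_js)  # ['app.js']; lib/util.js is not found
```

This is why the code pairs glob with os.walk: os.walk supplies every directory, and glob searches each one.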

Next, we set two constants.

  • PATH is the path of the directory we want to search in.
  • EXT is the pattern for the file extension we want to find (for example, *.csv).
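Because EXT is a shell-style wildcard pattern rather than a bare extension, it helps to check which names it accepts. glob matches names using the rules of the standard fnmatch module, so the sketch below uses fnmatch.fnmatchcase (the always case-sensitive variant) as an illustration; this check is our addition, not part of the original code. One difference worth knowing: unlike fnmatch, glob does not let * match names that start with a dot, so hidden files such as .backup.csv are skipped.

```python
from fnmatch import fnmatchcase

EXT = "*.csv"

# "*" stands for any run of characters, so any name ending in ".csv" matches
print(fnmatchcase("sales.csv", EXT))      # True
# The match is anchored at the end, so a trailing ".bak" breaks it
print(fnmatchcase("sales.csv.bak", EXT))  # False
# fnmatchcase is case-sensitive, so an upper-case extension is missed
print(fnmatchcase("SALES.CSV", EXT))      # False
```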

Then we have

The line above, written in typical Python style, can be rewritten as below.
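Assuming the one-liner is a nested list comprehension over os.walk and glob, its explicit-loop form looks like the sketch below. The for clauses keep the same order, outer loop first. PATH is a placeholder.

```python
import os
from glob import glob

PATH = "path/to/some/directory"  # placeholder directory to search
EXT = "*.csv"                    # shell-style pattern to match

all_csv_files = []
# Outer loop: one (path, subdir, files) tuple per directory in the tree
for path, subdir, files in os.walk(PATH):
    # Inner loop: every entry in this directory whose name matches EXT
    for file in glob(os.path.join(path, EXT)):
        all_csv_files.append(file)

print(all_csv_files)
```

The comprehension is more compact; the loops make it easier to add extra steps, such as logging each directory as it is visited.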

  • os.walk("path/to/some/directory") generates the file names in a directory tree: the directory itself, any subdirectories inside it, the subdirectories inside those, and so on.
    It is a generator that yields one tuple of 3 elements per directory: the path of the directory, a list of the subdirectories directly inside it, and a list of the files directly inside it.
    So suppose you have the following directory structure.

    For the directory above, the following code

    will give the following output
  • On the next line, glob(os.path.join(path, EXT)) is called inside the loop, so it is executed once for every directory os.walk visits, including the top-level one. For the blog directory above, it will be executed for
    • path/to/some/directory/blog
    • path/to/some/directory/blog/controllers
    • path/to/some/directory/blog/models
  • os.path.join(path, EXT) joins the two path components with the operating system's separator, so on Linux or macOS os.path.join('path/to/some/directory/blog', '*.csv') returns path/to/some/directory/blog/*.csv.
  • glob('path/to/some/directory/blog/*.csv') returns all the .csv files directly inside the blog directory (just the blog directory, not its subdirectories). But since this call is repeated for every directory os.walk yields, together the calls cover the whole tree.
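To see os.walk and the per-directory glob working together, the sketch below builds a throwaway copy of the blog tree in a temporary directory. Only the directory names blog, controllers and models come from the example; the file names inside them are hypothetical. It prints each tuple os.walk yields, then the collected .csv paths.

```python
import os
import tempfile
from glob import glob

EXT = "*.csv"

found = []
with tempfile.TemporaryDirectory() as root:
    blog = os.path.join(root, "blog")
    # Hypothetical files, used only to give the blog tree some content
    for rel in ["posts.csv", "README.txt",
                os.path.join("controllers", "routes.py"),
                os.path.join("models", "users.csv")]:
        full = os.path.join(blog, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    for path, subdirs, files in os.walk(blog):
        subdirs.sort()  # in-place sort: os.walk then visits subdirectories alphabetically
        # (directory, its immediate subdirectories, its immediate files)
        print(os.path.relpath(path, root), subdirs, sorted(files))
        # glob only looks inside `path`; os.walk supplies every `path` in turn
        found.extend(glob(os.path.join(path, EXT)))

    print(sorted(os.path.relpath(f, root) for f in found))
```

Sorting subdirs in place is optional; it only makes the visiting order predictable, since os.walk otherwise follows whatever order the file system returns.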

I hope this explains the code.
