Using the Linux Command Line to Find and Copy a Large Number of Files from a Large Archive, Preserving Metadata


One of my recent challenges was to go through an archive on a NAS, find all of the .xlsx files, and copy them to a specified folder, preserving as much of the file metadata (date created, folder tree, etc.) as possible.  After this copy, another script will go through the files and rename them using the metadata, and they will then be processed by an application that relies on the file name in its processing.

The part I want to share here is finding the files and copying them to a folder with the metadata preserved.  This is where the power of the find utility comes in handy.

Since this is a huge archive, I want to first produce a list of the files, which lets me break the job into two steps.  The following command produces the list and writes it to a text file.  I am running find against a volume called data, which I have mounted in my Volumes folder.

find /Volumes/data/archive/2012 -name '*.xlsx' > ~/archive/2012_files.txt
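
Depending on the archive, a couple of refinements to the find command may help.  If extensions vary in case (.xlsx vs. .XLSX), -iname matches case-insensitively, and Excel leaves behind lock files with a ~$ prefix that are usually worth excluding.  Both options are supported by GNU and BSD find; this is just a variation on the command above, not what I originally ran.

find /Volumes/data/archive/2012 -iname '*.xlsx' -not -name '~$*' > ~/archive/2012_files.txt

A quick wc -l on the list also gives a file count to sanity-check the copy against later:

wc -l ~/archive/2012_files.txt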

Now that the list is saved in a text file, I want to copy the files in the list to my archive folder, preserving the file metadata and path information.  The cpio utility in pass-through mode (-p) reads the paths of the files to copy from stdin and copies them to the destination directory; -d creates leading directories as needed, -m preserves modification times, and -v prints each file name as it is copied.  (Note that -m keeps the modification time; creation time generally cannot be carried over by cpio, which is why "as much as possible" is the goal.)

cat ~/archive/2012_files.txt | cpio -pvdm ~/archive
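
One caveat: a newline-delimited list breaks if any file name contains a newline, and names with unusual whitespace are safer to sidestep entirely.  Both GNU cpio and bsdcpio accept NUL-delimited input with -0 (--null), and find can produce it with -print0, so a whitespace-proof version of the two steps would look like this (same paths as above, offered as a variant rather than what I ran):

find /Volumes/data/archive/2012 -name '*.xlsx' -print0 > ~/archive/2012_files.txt
cpio -p0dmv ~/archive < ~/archive/2012_files.txt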
