Using the Linux Command Line to Find and Copy a Large Number of Files from a Large Archive, Preserving Metadata

One of my recent challenges is to go through an archive on a NAS, find all of the .xlsx files, and copy them to a specified folder, preserving as much of the file metadata (date created, folder tree, etc.) as possible. After the copy, another script will go through the files and rename them using that metadata; they will then be processed by an application that relies on the file name in its process.

The part I want to share here is finding the files and copying them to a folder with the metadata preserved. This is where the power of the find utility comes in handy.

Since this is a huge archive, I want to first produce a list of the files so that I can break the job into two steps. The following command produces the list and writes it to a text file, running find against the volume named data that is mounted in my Volumes folder.

find /Volumes/data/archive/2012 -name '*.xlsx' > ~/archive/2012_files.txt
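
As a side note, if the archive's extensions vary in case (.XLSX, .Xlsx), a case-insensitive match may catch files that the plain -name pattern would miss; -iname is supported by both GNU and BSD find. Counting the lines in the resulting list is also a cheap sanity check before committing to the copy. A minimal sketch:

find /Volumes/data/archive/2012 -iname '*.xlsx' > ~/archive/2012_files.txt
wc -l ~/archive/2012_files.txt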

Now that the list is saved to a text file, I want to copy the files in the list to my archive folder, preserving the file metadata and path information. The cpio utility, in pass-through mode, accepts the paths of the files to copy from stdin and copies them into my archive folder. The flags tell cpio to pass files through to the destination (-p), print each file as it is copied (-v), create leading directories as needed (-d), and preserve file modification times (-m).

cat ~/archive/2012_files.txt | cpio -pvdm ~/archive
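
One caveat, which only matters if any path in the archive contains a newline: a newline-delimited list cannot represent such names safely. If that is a possibility, the two steps can be collapsed into a single null-delimited pipeline; find's -print0 and the matching --null (-0) option are documented in both GNU cpio and the BSD cpio that ships with macOS, though it is worth confirming on your build. A sketch of that variant:

find /Volumes/data/archive/2012 -name '*.xlsx' -print0 | cpio -0 -pvdm ~/archive

Either way, because the list holds absolute paths, cpio recreates the full /Volumes/data/archive/2012 tree under ~/archive, which is how the folder structure survives the copy.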