LINUX: Removing Files Older Than x Days

It can often be useful to remove files that are unnecessary, such as log files, backup files, etc., when it is not already done automatically. Fortunately there is a very simple command to do just that.

Using the find command, it is possible to find the files in the folder you want to clean out and remove them. The following command scans the folder /home/myuser/myfolder/ for files older than 30 days and then executes rm to remove those files.

find /home/myuser/myfolder/ -type f -mtime +30 -exec rm {} \;

If you want to be cautious, you can use the following commands to test it out first:

To see what find pulls up, you can run this.

find /home/myuser/myfolder/ -type f -mtime +30

If you want to make certain the exec command is given the right parameters, you can run it through ls.

find /home/myuser/myfolder/ -type f -mtime +30 -exec ls -l {} \;
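As a side note, if you are on a system with GNU find, there is also a built-in -delete action, which avoids spawning rm once per file; a minimal equivalent, assuming GNU find:

find /home/myuser/myfolder/ -type f -mtime +30 -delete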

Automatically Check RSYNC and Restart if Stopped

I occasionally use RSYNC to synchronize large directories of files between servers. This is especially useful if you're moving a client from one server to another and they have a lot of static files that are always changing. You can copy the files and sync them up, all with RSYNC, and if your connection gets cut off, it will start where it left off. It will also grab changes to files that have already been RSYNC'd.

I ran into an issue with RSYNC recently, wherein the RSYNC process was running in the background but kept terminating with errors similar to the following. These disconnections were probably related to the slow and unstable connection to the remote server.

rsync: writefd_unbuffered failed to write 998 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (888092 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]

Given that I was transferring files over a relatively bad internet connection and received this error a half dozen times over a couple of days, I decided the best way to handle it would be to write a cron script. This cron script should check for the RSYNC process and start it if it isn't running.

rsync_check.sh

Customize this script for your own purpose, to check for your RSYNC process and start it if it isn't running.

#!/bin/bash
echo "checking for active rsync process"
# count running rsync processes, excluding the grep itself and this script
COUNT=$(ps ax | grep rsync | grep -v grep | grep -v rsync_check.sh | wc -l)
echo "there are $COUNT rsync related processes running"
if [ "$COUNT" -eq 0 ]
then
	echo "no rsync processes running, restarting process"
	# clean up any stray rsync processes, in case by some unforeseen reason they are piling up
	killall rsync
	rsync -avz -e "ssh" user@host.com:/mnt/syncdirectory/ /home/ccase/syncdirectory/
fi
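As an aside, on systems that ship pgrep, the process-counting line can be written more compactly. A sketch of an equivalent check, assuming pgrep is available (-x matches the process name "rsync" exactly, so the script itself is not counted):

COUNT=$(pgrep -cx rsync)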

Crontab Entry

Save the script somewhere appropriate, or add it to the cron.d directory, and put in a crontab entry to run it at the desired interval. The following entry, in system crontab format (note the user field), will have it run every 10 minutes.

*/10 * * * * ccase /etc/cron.d/rsync_check.sh
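Also make sure the script is executable, or cron will not be able to run it:

chmod +x /etc/cron.d/rsync_check.sh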

No More Worries

Now you can move on to other things, with the knowledge that your RSYNC will not just fail and leave the work undone. It probably wouldn't hurt to check on it at first and from time to time, but there's a lot less to worry about!

Getting the Last Modification Timestamp of a File with Stat

Sometimes we want to get just the date modified for a file, in a format of our choosing. This can be done with a utility called stat.

The syntax is as follows:

stat -f <format> -t "<timestamp format>" <path to file>

In this example, we are printing just the date modified, in the format YYYYMMDD_HHMMSS.

stat -f "%Sm" -t "%Y%m%d_%H%M%S" filename.txt

We are using the -f "%Sm" flag to specify that we want to print out only the date modified. The -t "%Y%m%d_%H%M%S" flag sets the date format.

In my example, the output was:

20121130_180221

This translates to November 30, 2012 at 18:02:21.
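Note that the -f and -t flags above belong to the BSD version of stat, which ships with macOS. On Linux, one simple equivalent is GNU date with the -r flag, which prints a file's last modification time in a format of your choosing:

date -r filename.txt +%Y%m%d_%H%M%S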

Using the Linux Command Line to Find and Copy A Large Number of Files from a Large Archive, Preserving Metadata

One of my recent challenges is to go through an archive on a NAS and find all of the .xlsx files, then copy them to a specified folder, preserving as much of the file metadata (date created, folder tree, etc.) as possible. After this copy, they will be gone through with another script which renames the files using the metadata; they will then be processed by an application which utilizes the name of the file in its process.

The part I want to share here is finding the files and copying them to a folder, with metadata preserved. This is where the power of the find utility comes in handy.

Since this is a huge archive, I want to first produce a list of the files, so that I can break the job up into two steps. I am first going to run a find command on the volume I have mounted, called data, in my Volumes folder. This will produce a list of matches and write it into a text file.

find /Volumes/data/archive/2012 -name '*.xlsx' > ~/archive/2012_files.txt

Now that the list is saved into a text file, I want to copy the files in the list, preserving the file metadata and path information, to my archive folder. The cpio utility accepts the paths of the files to copy from stdin, then copies them to my archive folder.

cat ~/archive/2012_files.txt | cpio -pvdm ~/archive
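One caveat: the list is newline-delimited, so file names containing newlines or unusual whitespace can misbehave. If GNU find and GNU cpio are available, their null-delimited options handle this safely; a sketch that skips the intermediate file:

find /Volumes/data/archive/2012 -name '*.xlsx' -print0 | cpio -0pvdm ~/archive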

Explicitly Setting log4j Configuration File Location

I ran into an issue recently where an existing log4j.xml configuration file was built into a jar file I was referencing, and I was unable to get Java to recognize another file that I wanted it to use instead. Fortunately, the solution to this problem is fairly straightforward and simple.

I was running a standalone application on Linux, via a bash shell script, but this technique can be used in other ways too. You simply add a parameter to the JVM call like the example below.

So the syntax is basically:

java -Dlog4j.configuration="file:<full path to file>" -cp <classpath settings> <package name where my main function is located>

Let's say I have a file named log4j.xml in /opt/tools/myapp/ which I want to use when my application runs, instead of any existing log4j.xml files. This can be done by passing the JVM flag -Dlog4j.configuration to Java.

Here is an example:

java -Dlog4j.configuration="file:/opt/tools/myapp/log4j.xml" -cp $CLASSPATH my.standalone.mainClass

With that change, as long as your log4j file is set up properly, your problems should be behind you.
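As an aside, if you are on log4j 2 rather than 1.x, the property name changed to log4j.configurationFile; the equivalent call would look something like this:

java -Dlog4j.configurationFile=/opt/tools/myapp/log4j2.xml -cp $CLASSPATH my.standalone.mainClass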

Appending to a Remote File via SSH

Most LINUX users know how to copy and overwrite a file from one server to another, but it can also be useful to directly append to a file, without having to log in to the remote server and make the changes manually. This does not appear to be possible with the commonly used SCP utility; however, there is a way to do this with SSH, and it's actually quite simple.
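The gist of the technique is to pipe the local file into an appending cat on the remote side. A minimal sketch, with hypothetical file names:

cat localfile.txt | ssh user@remotehost.com "cat >> /remote/path/somefile.txt"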

RSYNC a File Through a Remote Firewall

One of my recent tasks was to set up a new automatic backup script: the remote host dumps out its MySQL database at a regular time, and at a later time the dump is RSYNC'd from the backup server, through a remote firewall. I must say that I was a little surprised to discover that the finished script, and the configuration that goes along with it, was actually quite simple and easily repeatable. I was able to replicate the process for three sites very quickly and will easily be able to scale it to many more when necessary.

SSH Tunneling and SSH Keys

In order to perform a process on a remote firewalled host, you need to first set up keys to allow the trusted backup server to gain access to the intermediate host. You must also set up a key which allows the intermediate host to gain access to the firewalled host.

First, let's generate a public key on the backup server, if we don't already have one. Be sure to use an empty passphrase, since this is an unattended script.

[backup@lexx log]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/backup/.ssh/id_dsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /backup/.ssh/id_dsa.
Your public key has been saved in /backup/.ssh/id_dsa.pub.
The key fingerprint is:
3d:48:9c:0f:46:dc:da:c3:a6:19:82:63:b1:18:91:62 backup@lexx

The public key will, by default, be located in ~/.ssh/id_dsa.pub. Copy the contents of this file to the clipboard; you will need it to get the remote server to trust the backup server.

Log on to the remote external server via ssh. On this server, we will configure it to trust the backup server.

[backup@lexx ~]# ssh user@remotehost.com
user@remotehost.com's password: 
Last login: Thu Jul 14 22:57:58 2011 from 69.73.94.214
[user@remotehost ~]# ls -al .ssh
total 28
drwx------  2 user user 4096 2011-07-14 22:05 .
drwxr-x--- 12 user user 4096 2011-07-14 21:54 ..
-rw-------  1 user user 3024 2011-07-14 21:57 authorized_keys2
-rw-------  1 user user  668 2010-10-27 23:52 id_dsa
-rw-r--r--  1 user user  605 2010-10-27 23:52 id_dsa.pub
-rw-r--r--  1 user user 5169 2010-10-21 13:01 known_hosts

If the authorized_keys2 (or similarly named) file does not yet exist, create it. Open the file in your text editor of choice, then paste in the key you copied from the id_dsa.pub file on the backup server.
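Rather than pasting by hand, you can also append the key remotely in one step, using the same append-over-SSH trick described earlier; a sketch, assuming the same file names (many systems also ship ssh-copy-id, which automates exactly this):

cat ~/.ssh/id_dsa.pub | ssh user@remotehost.com "cat >> ~/.ssh/authorized_keys2"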

To make the remote server recognize the newly added key, run the following:

[user@remotehost ~]# ssh-agent sh -c 'ssh-add < /dev/null && bash'

Now we can make sure that the key works as intended by running the following command, which will ssh into the server and execute the uptime command:

[backup@lexx ~]$ ssh user@remotehost.com uptime
 23:57:17 up 47 days,  4:11,  1 user,  load average: 0.54, 0.14, 0.04

Since we got the output of the uptime command without a password prompt, it means the key was set up successfully.

Now we repeat the ssh key process, this time between the remotehost server and the firewalled server.

[user@remotehost ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/user/.ssh/id_dsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /user/.ssh/id_dsa.
Your public key has been saved in /user/.ssh/id_dsa.pub.
The key fingerprint is:
3d:48:9c:0f:46:dd:df:c3:a6:19:82:63:b1:18:91:62 user@remotehost

Copy the information from the .ssh/id_dsa.pub of the remote external server to the firewalled server, add it to the authorized_keys file, and run:

[user@firewalledserver ~]# ssh-agent sh -c 'ssh-add < /dev/null && bash'

Now you should be able to pass the rsync command all the way through the remote firewall, to the firewalled server from the backup server.

This can be tested by the following command, which tunnels through the firewall and executes the uptime command on the internal server:

[backup@lexx ~]$ ssh user@remotehost.com ssh user@firewalledserver uptime
 23:52:17 up 41 days,  4:12,  1 user,  load average: 0.50, 0.13, 0.03

RSYNC the Data from the Backup Server, Through the Firewall

Now that we've got all of our keys set up, most of the work has been done. I'm assuming you have a cron job on the internal server which dumps the MySQL database at a specific time. You should schedule your rsync command late enough that the cron job has had enough time to dump the database.

Here is the rsync command which puts you through the firewall to download the remote MySQL database dump. The -z flag allows you to do this with compression, which can significantly speed up the process.

[backup@lexx ~]$ rsync -avz -e "ssh user@remotehost.com ssh" user@firewalledserver:/home/user/rsync-backup/mysqldump.sql /home/backup/

This will create a hidden file, named something like .mysqldump.sql.NvD8D, which stores the data until the sync is complete. After the sync is complete, you will see a file named mysqldump.sql in the /home/backup/ folder.
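As an aside, newer versions of OpenSSH (7.3 and up) have a -J jump-host flag which can stand in for the nested ssh trick; assuming a new enough OpenSSH on the backup server, the same transfer would look like:

rsync -avz -e "ssh -J user@remotehost.com" user@firewalledserver:/home/user/rsync-backup/mysqldump.sql /home/backup/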

Just set up the necessary cron scripts to make sure everything happens at the right time, possibly put some logging in there so you can see what has happened, and you're done!

Here's an example of what I did on the backup server to call the backup script. It appends both STDOUT and STDERR to the /var/log/remote_backuplog file each time it is run. It also runs the script as the backup user, so the files it generates have the correct permissions for the backup user to access.

01 6 * * * backup /home/backup/run_backups.sh >> /var/log/remote_backuplog 2>&1

Here is what my rsync script run_backups.sh looks like.

#!/bin/bash

echo "running backups"
# print the date into the logfile
date

# backup server 1
echo "backing up server1"
# verify the dump file exists on the internal server before syncing
ssh user@externalserver1 ssh user@internalserver1 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver1 ssh" user@internalserver1:/home/user/rsync-backup/mysqldump.sql /home/backup/server1/

# backup server 2
echo "backing up server2"
ssh user@externalserver2 ssh user@internalserver2 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver2 ssh" user@internalserver2:/home/user/rsync-backup/mysqldump.sql /home/backup/server2/

# backup server 3
echo "backing up server3"
ssh user@externalserver3 ssh user@internalserver3 ls -l /home/user/rsync-backup/mysqldump.sql
/usr/bin/rsync -avz -e "ssh user@externalserver3 ssh" user@internalserver3:/home/user/rsync-backup/mysqldump.sql /home/backup/server3/

Quick and Easy Regular Expression Command/Script to Run on Files in the Bash Shell

I often find it necessary to run regular expressions on not just one file, but a range of files. There are perhaps dozens of ways this can be done, requiring varying levels of understanding.

The simplest way I have encountered utilizes the following syntax:

perl -pi -e "s/<find string>/<replace with string>/g" <files to replace in>

Here is an example, where I replace the IP address in a range of report templates with a different IP address:

perl -pi -e "s/mysql:\/\/192.168.2.110/mysql:\/\/192.168.2.111/g" $reportTemplateLocation/*.rpt*

Basically, I am looking for lines which contain mysql://192.168.2.110 and replacing that with mysql://192.168.2.111. (Strictly speaking, the unescaped dots in the IP address will match any character; escape them as \. if you need the match to be exact.)

Here is an example of a bash script I call changeReportTemplateDatabase.sh, which I wrap around that command to accomplish the same task with more elegance:

#!/bin/bash
#
# @(#)$Id$
#
# Point the report templates to a different database IP address.
reportTemplateLocation="/home/apphome/jboss-4.0.2/server/default/appResources/reportTemplates";

arg0=$(basename "$0")

error()
{
    echo "$arg0: $*" 1>&2
    exit 1
}
usage()
{
    echo "Usage $0 -o <old-ip-address> -n <new-ip-address>";
}

vflag=0
oldip=
newip=
while getopts hvVo:n: flag
do
    case "$flag" in
    (h) usage; exit 0;;
    (V) echo "$arg0: version 0.1 8/28/2010"; exit 0;;
    (v) vflag=1;;
    (o) oldip="$OPTARG";;
    (n) newip="$OPTARG";;
    (*) usage;;
    esac
done
shift $(expr $OPTIND - 1)

if [ "$oldip" = "" ]; then
    usage;
    exit 1;
fi
if [ "$newip" = "" ]; then
    usage;
    exit 1;
fi

echo "$0: Changing report templates to use the database at $newip from $oldip";
perl -pi -e "s/mysql:\/\/$oldip/mysql:\/\/$newip/g" $reportTemplateLocation/*.rpt*

Usage of the script is as simple as the command below. It will change every database reference in the report templates in the directory referenced by the variable reportTemplateLocation to the new value.

./changeReportTemplateDatabase.sh  -o 192.168.2.110 -n 192.168.2.111

A further improvement, which may be useful to some, would be to make the directory a flag which can be set at the command line, as sketched below.
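Here is a minimal sketch of that change: the getopts loop from the script above, with a hypothetical -d option added to override the default directory.

while getopts hvVd:o:n: flag    # d: added so -d takes a directory argument
do
    case "$flag" in
    (d) reportTemplateLocation="$OPTARG";;  # override the default directory
    (h) usage; exit 0;;
    (V) echo "$arg0: version 0.1 8/28/2010"; exit 0;;
    (v) vflag=1;;
    (o) oldip="$OPTARG";;
    (n) newip="$OPTARG";;
    (*) usage;;
    esac
done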

Sending Mail in Shell Scripts via an External Server with Nail

If you've ever tried sending email via the command line using the mail utility, you may find that the method can be unreliable in some cases. The messages are often caught by spam filters, blocked by security programs, etc. A more elegant and stable alternative is to use your existing email server to send the message. The program nail makes this an easy task to do via the command line.

The following example shows you how to send a simple message with an attachment. Here is the syntax for sending a message with nail:

echo "<message body>" | nail -s "<subject>" -a <attachment> <recipient> ...

In order for nail to function, you must have a configuration file (typically ~/.mailrc) in your home directory. Here is a sample configuration file to get you started quickly.

set smtp=smtp://yourhost.com
set from="yourname@yourhost.com (Display Name)"
set smtp-auth=login
set smtp-auth-user=your_username
set smtp-auth-password=your_password
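For example, a hypothetical notification with a log file attached might look like this:

echo "Nightly backup completed" | nail -s "Backup report for $(hostname)" -a /var/log/remote_backuplog admin@yourhost.com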

Copying Yesterday’s Exceptions with Stack Traces from Logs, Then Emailing To Administrators

When you have a Java application server which generates a great deal of logs, it can be tricky to find the most important information, especially if you have detailed logging. Fortunately, grep is capable of doing this very well.

The following command will gather all WARN, ERROR, and FATAL lines, along with Exception stack traces. This command can be very useful for Java log monitoring scripts.

cat /jboss-4.0.2/server/default/log/server.log | grep "WARN\|ERROR\|FATAL\|Exception\|at.*\.java\:.*"

Understanding this expression

In this expression, '\|' is used as an OR operator to combine the different patterns. The 'WARN', 'ERROR', and 'FATAL' patterns match the first line of a log event at the WARN, ERROR, and FATAL logging levels, which may contain an exception. We then filter the first line of the stack trace with 'Exception', as the first line of a stack trace usually has the exception name followed by the exception message, e.g. 'java.lang.NullPointerException'.

After this come the stack trace elements, which start with 'at' and end with the pattern '(FileName.java:lineNo)', e.g. at java.lang.Thread.run(Thread.java:595). These stack trace elements are matched with 'at.*\.java\:.*'. All of these patterns, OR'ed together, can pull the complete stack trace out of a log at the WARN, ERROR, and FATAL log levels. Some false positives may also be picked up, if a log message happens to contain words like WARN, ERROR, FATAL, or Exception.

source: computer technology roller

Filtering by Date: Yesterday’s Logs

If you want to filter the logfiles after a certain date, the following command is very useful. It gets the date for yesterday, using the date format yyyy-mm-dd, then uses sed to print all of the lines after the first occurrence of that date. (The sed range is inclusive, so the first matching line is dropped as well.) This is a good command to run after midnight, to retrieve the previous day's logs.

cat /jboss-4.0.2/server/default/log/server.log | sed "1,/$(date --date='yesterday' '+%Y-%m-%d')/d"

Putting it All Together

Daily Log Monitor Script to Email Error Stack Traces to the Administrator

Here is a complete monitoring script I wrote, which emails me all of the previous day's error stack traces. I have it running in cron.daily, in order to regularly send me the JBoss error stack traces.

#!/bin/bash

# email address to send the message to
email="address@host.com"

# pull everything after the first occurrence of yesterday's date,
# then filter for errors and stack traces
errors=$(cat /jboss-4.0.2/server/default/log/server.log | sed "1,/$(date --date='yesterday' '+%Y-%m-%d')/d" | grep "ERROR\|FATAL\|Exception\|at.*\.java\:.*")

subject="JBOSS DAILY ERROR DIGEST FOR: $(hostname)"

echo "$errors" | /bin/mail -s "$subject" "$email"