Joining Collections in MongoDB Queries using $lookup

Note: This only works in MongoDB 3.2 or later, so be sure to update if you need this functionality!

In situations where you have an ObjectId reference in a collection and you want it resolved by MongoDB during your query, you can accomplish this with aggregate and $lookup.

Let’s say we had two collections: insuranceClaims and insurers.

Here’s a stripped-down example of insuranceClaims:

{
    _id: ObjectId("5849ca6b7a5385ddcb2ea422"),
    insurer: ObjectId("584998f690b755f6a9fc3750"),
    filed: false
}

Here’s a stripped-down example of insurers:

{
    _id: ObjectId("584998f690b755f6a9fc3750"),
    name: 'foo'
}

If we want to have MongoDB resolve the insurer when we query the claims, we can do so with the following query:

db.getCollection('insuranceClaims')
    .aggregate(
      [
        {
          "$lookup": {
            "from": "insurers", 
            "localField": "insurer", 
            "foreignField": "_id", 
            "as": "insurer_loaded"
          }
        }
      ]
    );

This would result in the following output:

{
    _id: ObjectId("5849ca6b7a5385ddcb2ea422"),
    insurer: ObjectId("584998f690b755f6a9fc3750"),
    filed: false,
    insurer_loaded: [{
        _id: ObjectId("584998f690b755f6a9fc3750"),
        name: 'foo'
    }]
}

Now we’ve effectively had MongoDB resolve the insurer for us! Note that $lookup always writes its matches into an array, even when there is only a single match.
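
If you expect exactly one match and would rather have a plain subdocument than a one-element array, you can flatten the joined field with an additional $unwind stage. Here’s a minimal sketch:

db.getCollection('insuranceClaims')
    .aggregate(
      [
        {
          "$lookup": {
            "from": "insurers", 
            "localField": "insurer", 
            "foreignField": "_id", 
            "as": "insurer_loaded"
          }
        },
        // flatten the one-element insurer_loaded array into a subdocument
        {"$unwind": "$insurer_loaded"}
      ]
    );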

If you have insurers in an array instead, there’s another step necessary.

{
    _id: ObjectId("5849ca6b7a5385ddcb2ea422"),
    filed: false,
    insurers: [ObjectId("584998f690b755f6a9fc3750")]
}

To accomplish the same in this situation, we’ll use $unwind on the array.

db.getCollection('insuranceClaims')
  .aggregate(
    [
      {"$unwind": "$insurers"},
      {
        "$lookup": {
          "from": "insurers", 
          "localField": "insurers", 
          "foreignField": "_id", 
          "as": "insurer_loaded"
        }
      }
    ]
  );

This would produce the following output:

{
    _id: ObjectId("5849ca6b7a5385ddcb2ea422"),
    insurers: ObjectId("584998f690b755f6a9fc3750"),
    filed: false,
    insurer_loaded: [{
        _id: ObjectId("584998f690b755f6a9fc3750"),
        name: 'foo'
    }]
}

Now that you’ve joined up the collections, you probably want to add in some filters, to narrow the list down to exactly what you want. To add a query into the mix, simply put the query into a $match stage, as follows. This query will load only the claims where the field filed is false.

db.getCollection('insuranceClaims')
  .aggregate(
    [
      {"$match": {"filed": false}},
      {"$unwind": "$insurers"},
      {
        "$lookup": {
          "from": "insurers", 
          "localField": "insurers", 
          "foreignField": "_id", 
          "as": "insurer_loaded"
        }
      }
    ]
  );
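
As a side note, keeping $match as the first stage lets MongoDB use an index on filed to narrow down the claims before the join happens. And if you only need a few fields back, a $project stage at the end will trim the output; here’s a sketch using the same fields as above:

db.getCollection('insuranceClaims')
  .aggregate(
    [
      {"$match": {"filed": false}},
      {"$unwind": "$insurers"},
      {
        "$lookup": {
          "from": "insurers", 
          "localField": "insurers", 
          "foreignField": "_id", 
          "as": "insurer_loaded"
        }
      },
      // keep only the claim status and the resolved insurer names
      {"$project": {"filed": 1, "insurer_loaded.name": 1}}
    ]
  );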

Calculating a Modulo 256 Checksum in NodeJS for an Access 2 Laboratory Instrument

I’ve recently had to figure out how to send a checksum with my messages to an Access 2 instrument. I did not find very many good resources for doing so, so I’m going to share what I’ve got here. I got my first clues from the examples in the Access 2 LIS documentation. Unfortunately, it appears that many of the checksums in that documentation are incorrect, so to be sure, I got ahold of some raw messages from the device, along with their checksums, to check my work.

Several of the examples from the documentation are included at the top, with their expected checksums noted in the comments. Unfortunately, I’ve found that not all of these were correct in the documentation.

var msgs = [];
 
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161600' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '1'}); // should result in ascii [c1]=2 and [c2]=1
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161524' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '6'}); // should result in ascii [c1]=2 and [c2]=6
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161412' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '2'}); // should result in ascii [c1]=2 and [c2]=2
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161300' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'E'}); // should result in ascii [c1]=1 and [c2]=E
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161224' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '3'}); // should result in ascii [c1]=2 and [c2]=3
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161148' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '8'}); // should result in ascii [c1]=2 and [c2]=8
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161112' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'F'}); // should result in ascii [c1]=1 and [c2]=F
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161036' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '4'}); // should result in ascii [c1]=2 and [c2]=4
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028140331' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'F'}); // should result in ascii [c1]=1 and [c2]=F
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028140255' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '4'}); // should result in ascii [c1]=2 and [c2]=4
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028131152' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '0'}); // should result in ascii [c1]=2 and [c2]=0
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028131116' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '0'}); // should result in ascii [c1]=2 and [c2]=0
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028131040' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'C'}); // should result in ascii [c1]=1 and [c2]=C
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028131004' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'C'}); // should result in ascii [c1]=1 and [c2]=C
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028130928' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: 'A'}); // should result in ascii [c1]=2 and [c2]=A
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161024120959' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '9'}); // should result in ascii [c1]=2 and [c2]=9
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161024120923' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '0'}); // should result in ascii [c1]=2 and [c2]=0
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161024120847' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '5'}); // should result in ascii [c1]=2 and [c2]=5
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161024120811' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '1', c2: 'C'}); // should result in ascii [c1]=1 and [c2]=C
msgs.push({str: '1H|\\^&|||ACCESS^511550|||||LIS||P|1|20161024120735' + String.fromCharCode(13)+ String.fromCharCode(3), c1: '2', c2: '1'}); // should result in ascii [c1]=2 and [c2]=1
 
for(var i = 0; i < msgs.length; i++) {
    var msg = msgs[i];
    console.log('');
    console.log(msg.str);
    var checksum = calcChecksum(msg.str);
    console.log('checksum: ' + checksum + '  expected checksum: ' + msg.c1 + msg.c2);
}
 
function calcChecksum(string) {
    var buf = Buffer.from(string); // Buffer.from() replaces the deprecated new Buffer()
    // Calculate the modulo 256 checksum
    var sum = 0;
    for (var i = 0, l = buf.length; i < l; i++) {
        sum = (sum + buf[i]) % 256;
    }

    console.log('checksum (binary): ' + sum.toString(2));
    // Convert to a two character uppercase hex value
    var chars = sum.toString(16).toUpperCase();
    if (chars.length == 1) chars = "0" + chars;
    return chars;
}
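
For context, the strings above are frame contents in the ASTM-style (E1381) framing these instruments use: the checksum covers everything after the STX, up to and including the ETX. Here’s a minimal sketch of assembling a complete frame with this checksum; the framing details are my reading of the ASTM conventions, so verify them against your instrument’s documentation:

function buildFrame(frameNumber, text) {
    var STX = String.fromCharCode(2);
    var ETX = String.fromCharCode(3);
    var CR = String.fromCharCode(13);
    var LF = String.fromCharCode(10);
    // assumption: the checksum covers the frame number through the ETX, but not the STX
    var body = frameNumber + text + CR + ETX;
    return STX + body + calcChecksum(body) + CR + LF;
}

console.log(buildFrame('1', 'H|\\^&|||ACCESS^511550|||||LIS||P|1|20161028161600'));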

LINUX: Removing Files Older Than x Days

It can often be useful to remove files that are unnecessary, such as log files, backup files, etc., when it is not already done automatically. Fortunately, there is a very simple command to do just that.

Using the find command, it is possible to find the files in the folder you want to clean out and remove them. The following command scans the folder /home/myuser/myfolder/ for files older than 30 days and then executes rm to remove those files.

find /home/myuser/myfolder/* -mtime +30 -exec rm {} \;

If you want to be cautious, you can use the following commands to test it out:

To see what find pulls up, you can run this:

find /home/myuser/myfolder/* -mtime +30

If you want to make certain the exec command is given the right parameters, you can run it through ls:

find /home/myuser/myfolder/* -mtime +30 -exec ls -l {} \;
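
Two refinements worth considering: adding -type f restricts the match to regular files, so directories are left alone, and GNU find’s built-in -delete action avoids spawning rm once per file. A sketch, assuming GNU find:

find /home/myuser/myfolder/ -type f -mtime +30 -delete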

Deleting Rows From a Table Referenced in the Query

If you do a fair amount of SQL, then every now and then you’ll likely run into a situation where you need to delete something from a table, but also need to reference that table in the query.

The trouble is, MySQL won’t let you do this in many circumstances. For example, take the following query:

delete from tableA 
 where id in (
   select b.id 
   from tableB b 
   join tableA a 
     on a.tableb_id = b.id
   );

The above query would throw an error similar to the following:

ERROR 1093 (HY000): You can't specify target table 'tableA' for update in FROM clause

Fortunately, there is a simple workaround: we can skirt this limitation by nesting the subselect inside of another select statement. Then it will work just fine.

delete from tableA where id in (
  select bId from (
    select b.id as bId from tableB b join tableA a 
    on a.tableb_id = b.id 
  ) as apt
);

You’ll get output indicating that the query was successful.

Query OK, 183 rows affected (0.53 sec)

This saved me a bunch of time and kept me from having to rework my query completely, so I am sharing it with you!
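
As an aside, MySQL’s multi-table DELETE syntax avoids ERROR 1093 entirely, because it joins rather than subselecting from the target table. Here’s a sketch against the same hypothetical tables; note that it deletes whichever tableA rows the join matches, so adjust the join conditions to target exactly the rows you mean to remove:

delete a
  from tableA a
  join tableB b
    on a.tableb_id = b.id;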

Automatically Check RSYNC and Restart if Stopped

I occasionally use RSYNC to synchronize large directories of files between servers. This is especially useful if you’re moving a client from one server to another and they have a lot of static files that are always changing. You can copy the files and sync them up, all with RSYNC, and if your connection gets cut off, it will pick up where it left off. It will also grab changes to files that have already been RSYNCd.

I ran into an issue with RSYNC recently, wherein the RSYNC process was running in the background but was terminating due to errors similar to the following. These disconnections were probably caused by the slow and unstable connection to the remote server.

rsync: writefd_unbuffered failed to write 998 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (888092 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) [sender=3.0.6]

Given that I was transferring files over a relatively bad internet connection and received this error a half dozen times over a couple of days, I decided the best way to handle it would be to write a cron script. This cron script should check for the RSYNC process and start it if it isn’t running.

rsync_check.sh

Customize this script for your own purposes, to check for your RSYNC process and start it if it isn’t running.

#!/bin/bash
echo "checking for active rsync process"
COUNT=`ps ax | grep rsync | grep -v grep | grep -v rsync_check.sh | wc -l` # see how many are running
echo "there are $COUNT rsync related processes running";
if [ $COUNT -eq 0 ] 
then
	echo "no rsync processes running, restarting process"
	killall rsync  # prevent RSYNCs from piling up, if by some unforeseen reason there are already processes running
	rsync -avz -e "ssh" user@host.com:/mnt/syncdirectory/ /home/ccase/syncdirectory/ 
fi

Crontab Entry

Save the script somewhere like /usr/local/bin/ and make it executable, then put a crontab entry in, either in /etc/crontab or in a file under /etc/cron.d/, to run it at the desired interval. (Files in /etc/cron.d hold crontab entries in the system format, with a user field, not scripts themselves.) This entry will have the check run every 10 minutes.

*/10 * * * * ccase /usr/local/bin/rsync_check.sh
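
Alternatively, flock (from util-linux) can guarantee a single running instance without any process counting; its -n flag makes the command exit immediately if another run still holds the lock. A sketch of a crontab entry using it, with the same hypothetical paths as above:

*/10 * * * * ccase flock -n /tmp/rsync_check.lock rsync -avz -e "ssh" user@host.com:/mnt/syncdirectory/ /home/ccase/syncdirectory/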

No More Worries

Now you can move on to other things, with the knowledge that your RSYNC will not just fail and leave the work undone. It probably wouldn’t hurt to check on it at first and from time to time, but there’s a lot less to worry about!

Mounting CIFS Shares At the LINUX Command Line or in /etc/fstab

Linux makes it relatively easy to mount shared drives, either manually at the command line, or automatically, by configuring an entry in /etc/fstab. Here is the basic syntax of our mount command.

[ccase@midas ~]$ sudo mount -t cifs  -o username=<share username>,password=<share password>,<additional options> //<name or ip of server>/<share name> <folder to mount to>

Here is an example of mounting our CIFS share to a folder named myshare. We are using the option ro to mount the share read-only.

[ccase@midas ~]$ sudo mount -t cifs  -o username=admin,password=secret,ro //192.168.1.200/myshare myshare

If we want to make this automatic, it can easily be configured in /etc/fstab to mount after the network comes up. Here is the basic syntax you would use in /etc/fstab:

//<name or ip of server>/<share name> <folder to mount to> cifs  username=<share username>,password=<share password>,_netdev,<additional options>   0 0

Here is an example of mounting our CIFS share automatically to /mnt/myshare/. We are using the option _netdev, to tell it to attempt the mount only after the network has come up, and ro, to mount the share read-only.

//192.168.1.200/myshare /mnt/myshare cifs  username=admin,password=secret,_netdev,ro   0 0
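
One caution: /etc/fstab is typically world-readable, so it is better to move the username and password into a root-only credentials file, using the credentials= option that mount.cifs supports. A sketch, with a hypothetical file location:

# /root/.smbcreds -- protect it with: chmod 600 /root/.smbcreds
username=admin
password=secret

# /etc/fstab entry referencing the credentials file
//192.168.1.200/myshare /mnt/myshare cifs  credentials=/root/.smbcreds,_netdev,ro   0 0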

Getting the Last Modification Timestamp of a File with Stat

Sometimes we want to get just the date modified for a file, in a format of our choosing. This can be done with a utility called stat.

The syntax is as follows:

stat -f <format> -t "<timestamp format>" <path to file>

In this example, we are printing just the date modified in the format YYYYMMDD_HHMMSS.

stat -f "%Sm" -t "%Y%m%d_%H%M%S" filename.txt

We are using the -f "%Sm" flag to specify that we want to print out only the date modified. The -t "%Y%m%d_%H%M%S" sets the date format.

In my example, the output was:

20121130_180221

This translates to November 30, 2012 at 18:02:21.
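
Note that the -f and -t flags above are the BSD/macOS flavor of stat; GNU stat on Linux takes different options. The same timestamp is easy to get with GNU date, which reads a file’s modification time with -r:

date -r filename.txt +"%Y%m%d_%H%M%S"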

Using the Linux Command Line to Find and Copy A Large Number of Files from a Large Archive, Preserving Metadata

One of my recent challenges is to go through an archive on a NAS and find all of the .xlsx files, then copy them, preserving as much of the file metadata (date created, folder tree, etc.) as possible, to a specified folder. After this copy, another script will go through and rename the files using the metadata, and they will then be processed by an application which utilizes the name of the file in its process.

The part I want to share here is finding the files and copying them to a folder, with metadata preserved. This is where the power of the find utility comes in handy.

Since this is a huge archive, I want to first produce a list of the files; that way I will be able to break this up into two steps. I am going to run a find command on the volume I have mounted, called data, in my /Volumes folder. This will produce a list and write it into a text file.

find /Volumes/data/archive/2012 -name '*.xlsx' > ~/archive/2012_files.txt

Now that the list is saved into a text file, I want to copy the files in the list, preserving the file metadata and path information, to my archive folder. The cpio utility accepts the paths of the files to copy from stdin, then copies them to my archive folder.

cat ~/archive/2012_files.txt | cpio -pvdm ~/archive
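
One caveat: a plain text list breaks if any file name contains a newline. If your find and cpio support null-delimited lists (GNU find’s -print0 paired with cpio’s -0/--null flag, which both GNU cpio and bsdcpio provide), that variant is safer; a sketch:

find /Volumes/data/archive/2012 -name '*.xlsx' -print0 > ~/archive/2012_files.txt
cpio -0pvdm ~/archive < ~/archive/2012_files.txt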

Explicitly Setting log4j Configuration File Location

I ran into an issue recently, where an existing log4j.xml configuration file was built into a jar file I was referencing, and I was unable to get Java to recognize another file that I wanted it to use instead. Fortunately, the solution to this problem is fairly straightforward and simple.

I was running a standalone application in Linux, via a bash shell script, but this technique can be used in other ways too. You simply add a parameter to the JVM call, like the example below.

So the syntax is basically:

java -Dlog4j.configuration="file:<full path to file>" -cp <classpath settings> <package name where my main function is located>

Let’s say I have a file named log4j.xml in /opt/tools/myapp/ which I want to use when my application runs, instead of any existing log4j.xml files. This can be done by passing the JVM flag -Dlog4j.configuration to Java.

Here is an example:

java -Dlog4j.configuration="file:/opt/tools/myapp/log4j.xml" -cp $CLASSPATH  my.standalone.mainClass;

With that change, as long as your log4j file is set up properly, your problems should be behind you.
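
One caveat: the log4j.configuration property applies to log4j 1.x. If you are on Log4j 2, the equivalent property is log4j.configurationFile, used the same way (the path here is hypothetical, as above):

java -Dlog4j.configurationFile=/opt/tools/myapp/log4j2.xml -cp $CLASSPATH my.standalone.mainClass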

Linux Mint 13: Enabling the SD Card Reader on the Toshiba Satellite P870

I started using SD cards recently and had a heck of a time using them on my laptop at first. I tried using my 32 GB SDHC card in the USB adapter, to no avail; then I found the SD slot, and it still did not work either. It turned out that the driver was not loading by default. This is a common problem in Linux, as the devices that are less commonly used are not always going to “just work”. You often have to get the driver yourself and install it.

Getting it working was not trivial; I had to figure out which driver to get, which took some guesswork. It turns out that this laptop uses a Realtek RTS5229 for its SD card interface. I found this information with lspci.