Uploading and processing data with inotify-tools
Contents
I'm not a muscleman. In fact, I'm kind of a wimp. But given all this, my weight is 15 kilograms greater than I should have with my height. Time to lose some.
I've been logging my weight for quite a time, but I realized that logging alone doesn't help. I need other people to nudge me, so I would be more ashamed of my condition. To achieve this aim, starting today, I will publish a graph of my weight. It is rendered by gnuplot, and that page contains the script that renders it.
And, of course, since I'm a geek, the process of displaying the graph should be made as automated as possible.
Nearly every morning I weigh and log the result in the text file. My aim is to automatically upload it to the server right after I edit it; of course, manual uploading is out of the question. The update happens at different times, and I don't want to poll the file every hour, since it is just inefficient, and induces an unnecessary latency. And here's the neat solution.
What is inotify
Inotify is an interface to Linux kernel; its purpose is to watch and report when certain filesystem events happen.
Just like Berkeley sockets, inotify calls create file descriptors, which can then be watched by usual select() calls.
A list of events that may be watched includes accessing file, modifying file, moving files and modifying metadata. Directories can also be recursively watched.
More info you can get in man 7 inotify and on Wikipedia.
Tracking filesystem events with inotify-tools
Starting from 2.6.13, Linux kernel has an interface that watches for filesystem events without need to wake up on timer to check them. This subsystem is referred to as "inotify". We can utilize it for our aims.
The generic idea is to run daemon that waits for the weight log file to be changed. After I click "Save" in the text editor, the daemon notices it and sends file to server via scp (copy over ssh). On the web server another notification daemon watches for changes in the destination file, it renders an image with a plot and copies it into the webserver place.
Of course, nowadays one doesn't need to write C programs to access inotify directly via syscalls. Instead, I installed inotify-tools package (sources), which allows to use inotify from shell; a command we will use is inotifywait.
Inotify, as noted in Kernel docs, is interfaces via establishing watches that write information about the events via file descriptors. Waiting for a filesystem event is a mere blocking read from a file descriptor, the intrinsics being handled by the Kernel. The inotifywait command just wraps this in a convenient way.
One of the ways inotifywait can be used is to "monitor" events: the inotifywait process just prints a line to STDOUT when an event we watch for occurs. So we can just pipe the output of inotifywait invocation to a simple shell script that reads the lines and does work on each read. Here's how the daemons look like:
Local PC:
Remote server:
These daemons are currently deployed on my local machine and on server.
Version without monitoring
Due to misunderstanding of how stuff works, I encountered strange denials when I used the -m option to inotifywait. That's why previously I used the following daemons:
Local PC:
Remote server:
The calls to inotifywait here contain no -m option, which means that they exit after the event watched for occurs.
There are several wait operators and subshell spawns that may seem excessive at the first glance. However, they're there to avoid losing data during the race condition.
Assume there's no waits and no subshells, just inotifywait and copy/plotting commands. Then the following scenario is possible:
- I enter new value and save file
- The file is copied to remote server
- Remote server notices that file's changed and starts plotting it
- I notice a mistake I made in data, fix it and save the log again
- The file is copied to remote server
- Remote server finished plotting previous log with mistake
- Remote server starts waiting for file modification
- Changes made in p.4 are ignored!
If the file is copied to remote server while it was plotting it (and plotting/converting takes noticeable time), the subsequent inotifywait call won't notice any changes. Because when the file was modified, inotifywait was not running!
We solve it by doing conversion work asynchronously. Then, step 7 doesn't necessarily follow step 6. Instead, it's executed (hopefully) right after step 3.
If we spawn subprocesses instead of running the whole conversion, the time when we miss modification events is dramatically decreased, and doesn't depend of length of processing.
On the other hand, we can miss modifications when we're waiting for child processes to terminate. But even if we miss modification events, when we finish waiting, we will handle the modified file anyway. So that's acceptable. In general, it's also of no harm to do a, say, sleep 10 after inotifywait: another data save is likely to happen after the first one.
Conclusion
Finally, we've created a synchronized system of two daemons for the local and remote machines, based on inotifywait program of "inotify-tools" package. And now I have a public tracker of my own weight deployed within an hour, but I still wonder, wouldn't it be better if spent this time jogging?..