Everybody already knows what great file-handling tools *NIX operating systems have. I’m constantly amazed by how much you can get done on a single command line. For example, imagine a process that does some data processing on a remote machine and logs all the communication between the client and the server in one file. Now imagine this process running for 30+ hours and producing a 220+ MB log file.
After the process is done, your boss wants some kind of report: how many entries were processed, how many were processed successfully, how many errors there were, and which errors they were. Not much of a problem when working on a UNIX machine:
A: grep -c 'COMMAND SUCCESS' out.txt
B: grep -c 'COMMAND FAILED' out.txt
… and just to make sure, let’s check that A + B = C:
C: wc -l < in.txt
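If you would rather have the shell do the A + B = C arithmetic for you, here is a minimal sketch. The function name check_counts is made up for this example, and it assumes, as above, one input entry per line in in.txt and exactly one SUCCESS or FAILED line per entry in the log.

```shell
# check_counts LOG INPUT  (hypothetical helper, not part of the original commands)
# Prints the three counts and returns non-zero when A + B != C.
# The body runs in a subshell, so the variables stay local.
check_counts() (
    ok=$(grep -c 'COMMAND SUCCESS' "$1" || true)      # A
    failed=$(grep -c 'COMMAND FAILED' "$1" || true)   # B
    total=$(wc -l < "$2")                             # C
    # $((total)) also strips the padding some wc implementations add
    echo "success=$ok failed=$failed total=$((total))"
    [ $((ok + failed)) -eq $((total)) ]
)
```

Calling check_counts out.txt in.txt prints the counts, and the exit status tells you whether they add up.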
Now, let’s report on errors:
grep 'ERROR_CODE:' out.txt | sort -u
returns a list of errors:
ERROR_CODE: 10065
ERROR_CODE: 11245
ERROR_CODE: 19543
Now let’s find out how many of each we got:
grep -c 'ERROR_CODE: 10065' out.txt
grep -c 'ERROR_CODE: 11245' out.txt
grep -c 'ERROR_CODE: 19543' out.txt
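With more than a handful of error codes, one grep per code gets tedious. A possible shortcut is to let uniq -c do the counting in a single pass. This is only a sketch: count_errors is a name invented here, and it assumes every ERROR_CODE line for a given code is textually identical, which is what the sort | uniq output above suggests.

```shell
# count_errors LOG  (hypothetical helper)
# Prints each distinct ERROR_CODE line prefixed by the number of times it occurs.
count_errors() {
    grep 'ERROR_CODE:' "$1" | sort | uniq -c
}
```

Running count_errors out.txt should give one line per error code, with the count in the first column.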
Email to the boss, and we’re done. All good? Great. But then another email comes in: “Could you please send me a list of all the IDs that caused error #11245?” Sure, no problem:
grep -B 7 'ERROR_CODE: 11245' out.txt | grep 'REQUEST_ID' | awk '{ print $2 }' | sed 's/REQUEST_ID:\([0-9]*\)/\1/' > ids.txt
Let’s explain this one a bit:
- the initial request that was sent to the system was logged 7 lines before the ERROR_CODE (therefore the “-B 7”)
- the line with the request had the following format:
START REQUEST_ID:XXXXX SOME_OTHER_STUFF
(therefore the awk part, which prints the second column)
- with sed we just extracted the number from the REQUEST_ID column
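Since the boss will probably ask about the other error codes too, the same pipeline can be wrapped in a small function that takes the error code as a parameter. Treat this as a sketch under the assumptions described above (request line exactly 7 lines before the error, ID in the second column). The name extract_ids is hypothetical, and -B is a GNU/BSD grep option rather than a POSIX one.

```shell
# extract_ids CODE LOG  (hypothetical helper)
# Prints the request ID of every request that ended with ERROR_CODE: CODE.
extract_ids() {
    grep -B 7 "ERROR_CODE: $1" "$2" |          # error line plus the 7 lines before it
        grep 'REQUEST_ID' |                    # keep only the request lines
        awk '{ print $2 }' |                   # second column: REQUEST_ID:XXXXX
        sed 's/REQUEST_ID:\([0-9]*\)/\1/'      # strip the prefix, keep the digits
}
```

For example, extract_ids 11245 out.txt > ids.txt reproduces the one-liner above.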
Can it get any more powerful than this?