Everybody already knows what great file-handling tools *NIX operating systems have. I’m constantly amazed by how much you can get done in a single command line. For example, imagine a process that does some data processing on a remote machine and logs all the communication between the client and the server machine in one file. Now, imagine this process running for 30+ hours, producing a 220+ MB log file.
After the process is done, your boss wants some kind of report: how many entries were processed, how many were processed successfully, how many errors there were, and which errors they were. Not much of a problem when working on a UNIX machine:
A: cat out.txt | grep 'COMMAND SUCCESS' | wc -l
B: cat out.txt | grep 'COMMAND FAILED' | wc -l
… and just to make sure, let’s check that A + B = C:
C: cat in.txt | wc -l
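The A + B = C check can also be scripted instead of compared by eye. This is only a minimal sketch: the sample files below are made up for illustration and live in a temporary directory, and it assumes every input entry produces exactly one 'COMMAND SUCCESS' or 'COMMAND FAILED' line.

```shell
# Hypothetical sample data standing in for the real in.txt and out.txt.
workdir=$(mktemp -d)
printf '%s\n' 'id1' 'id2' 'id3' > "$workdir/in.txt"
printf '%s\n' 'COMMAND SUCCESS' 'COMMAND FAILED' 'COMMAND SUCCESS' > "$workdir/out.txt"

# grep -c counts matching lines directly, so cat and wc are not needed here.
a=$(grep -c 'COMMAND SUCCESS' "$workdir/out.txt")
b=$(grep -c 'COMMAND FAILED' "$workdir/out.txt")
# Some wc implementations pad the number with spaces; tr strips them.
c=$(wc -l < "$workdir/in.txt" | tr -d ' ')

if [ "$((a + b))" -eq "$c" ]; then
    check="OK"
else
    check="MISMATCH"
fi
echo "$check: $a succeeded + $b failed, $c entries in total"
```

On the real files you would drop the sample-data lines and point the three commands at out.txt and in.txt.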
Now, let’s report on the errors:
cat out.txt | grep 'ERROR_CODE:' | sort | uniq
returns a list of errors:
ERROR_CODE: 10065
ERROR_CODE: 11245
ERROR_CODE: 19543
and now let’s find out how many of each we got:
cat out.txt | grep 'ERROR_CODE: 10065' | wc -l
cat out.txt | grep 'ERROR_CODE: 11245' | wc -l
cat out.txt | grep 'ERROR_CODE: 19543' | wc -l
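If the list of error codes is long, one grep per code gets tedious. A possible shortcut, sketched here against a made-up sample log, is to let uniq do the counting with its -c option:

```shell
# Hypothetical sample log, created in a temporary directory for illustration.
workdir=$(mktemp -d)
printf '%s\n' \
    'ERROR_CODE: 11245' \
    'COMMAND SUCCESS' \
    'ERROR_CODE: 10065' \
    'ERROR_CODE: 11245' \
    'ERROR_CODE: 19543' \
    'ERROR_CODE: 11245' > "$workdir/out.txt"

# sort puts identical lines next to each other, uniq -c prefixes each
# distinct line with its number of occurrences, and sort -rn lists the
# most frequent error first.
counts=$(grep 'ERROR_CODE:' "$workdir/out.txt" | sort | uniq -c | sort -rn)
echo "$counts"
```

This gives the list of distinct errors and the count of each in a single pass over the log.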
Email to the boss, and we’re done. All good? Great. But, another email comes in: “Could you please send me a list of all the IDs that caused an error #11245”. Sure, no problem:
cat out.txt | grep -B 7 'ERROR_CODE: 11245' | grep 'REQUEST_ID' | awk '{ print $2; }' | sed 's/REQUEST_ID:\([0-9]*\)/\1/' > ids.txt
Let’s explain this one a bit:
- the initial request that was sent to the system was logged 7 lines before the ERROR_CODE (therefore the “-B 7”)
- the line with the request had the following format:
START REQUEST_ID:XXXXX SOME_OTHER_STUFF
(therefore the awk part, which prints the second column)
- with sed we just extracted the number from the request_id column
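To see the steps above in action without a 220 MB log, here is a small sketch that runs the same pipeline on a made-up log. The layout (a START line followed by six detail lines and then the ERROR_CODE line) is an assumption chosen to match the "7 lines before" description; the detail lines and the IDs are invented.

```shell
workdir=$(mktemp -d)

# Hypothetical log: three requests, two of which fail with error 11245.
{
    echo 'START REQUEST_ID:1001 SOME_OTHER_STUFF'
    for i in 1 2 3 4 5 6; do echo "detail line $i"; done
    echo 'ERROR_CODE: 11245'
    echo 'START REQUEST_ID:1002 SOME_OTHER_STUFF'
    for i in 1 2 3 4 5 6; do echo "detail line $i"; done
    echo 'ERROR_CODE: 10065'
    echo 'START REQUEST_ID:1003 SOME_OTHER_STUFF'
    for i in 1 2 3 4 5 6; do echo "detail line $i"; done
    echo 'ERROR_CODE: 11245'
} > "$workdir/out.txt"

# 1. grep -B 7 keeps each matching line plus the 7 lines before it.
# 2. The second grep keeps only the request lines.
# 3. awk prints the second whitespace-separated column (REQUEST_ID:NNNN).
# 4. sed strips the REQUEST_ID: prefix, leaving just the number.
grep -B 7 'ERROR_CODE: 11245' "$workdir/out.txt" \
    | grep 'REQUEST_ID' \
    | awk '{ print $2 }' \
    | sed 's/REQUEST_ID:\([0-9]*\)/\1/' > "$workdir/ids.txt"

cat "$workdir/ids.txt"
```

With this sample, ids.txt ends up containing 1001 and 1003, one per line; request 1002 is skipped because it failed with a different error code.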
Can it get any more powerful than this?
I never thought the day would come when I could say that I understand stuff like what is written above. Lo and behold, the day has come.
Love sed, awk, grep and Unix in a Nutshell, the book that made them accessible to me :)
> All good? Great. But, another email comes in: “Could you please
> send me a list of all the IDs that caused an error #11245”. Sure, no problem:
Alternatively, awk could be avoided with a tiny modification to the regex:
… | grep REQUEST_ID | sed -E 's/^.*REQUEST_ID:([0-9]+).*$/\1/'
or even:
… | perl -ne 'print "$1\n" if /REQUEST_ID:(\d+)/'
;-)
Love thy unix, but also love thy ports…
http://unxutils.sourceforge.net/