I’m a Systems/Software Engineer in the San Francisco Bay Area. I moved from Columbus, Ohio in 2007 after getting a B.S. in Physics from the Ohio State University. I'm married, and we have dogs.

On my GitHub account (https://github.com/addumb), I open-sourced python-aliyun in 2014, I keep an outdated Python 2 project starter template at python-example, and I have a pretty handy “sshec2” command and some others in tools.

Linux Tools

October 10, 2010

http://github.com/addumb/tools

I’ve gotten tired of re-writing small command-line tools that are missing from most Linux distributions. Too often I’ve found myself wanting to pipe output through a command that calculates the average or standard deviation of a column in a text file. I re-write the same awk lines over and over to get the distribution of certain columns of log files (like nginx HTTP status codes or response body sizes).

I know these tools have been written over and over by many, many people. The ones I’ve found have been either subject-specific (e.g. generate a sideways ASCII histogram from a log file, which is really super sweet and useful, but not always what I’m looking for) or too all-in-one to satisfy my need for a more tool-chainy way to do things. This is 90% for myself, so I don’t have to re-write these over and over. I put them up on GitHub so I can just clone the tools wherever I want to use them. It’d be great if other people found these tools useful, and maybe even wrote some more :)

DISCLAIMER: So far these are basically wrappers around awk. That’s mostly because awk is the awesomest tool in the whole world. Also, it’s because I’ve been dealing with a lot of text processing lately at work. I have dreams of many other tools, but none concrete enough to start writing them.

Cross-post from http://github.com/addumb/tools/blob/master/README.md:

avg

Spit out the average of a column in a text file
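
As a rough illustration of how small such a wrapper can be, here is a minimal sketch of averaging one whitespace-separated column with awk. The function name `avg_col` and its argument order are hypothetical; this is not the actual avg script from the repository, and it may not match its options.

```shell
# Hypothetical helper, not the real avg script.
# Usage: avg_col COLUMN < file
avg_col() {
    awk -v col="$1" '
        { sum += $col; n++ }               # accumulate the chosen field on every line
        END { if (n > 0) print sum / n }   # print nothing for empty input
    '
}

printf 'a 2\nb 4\nc 9\n' | avg_col 2   # prints 5
```

Non-numeric fields count as 0 in awk arithmetic, so header lines would skew the result unless they are filtered out first.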

stddev

Spit out the standard deviation of a column in a text file
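
A standard deviation can be computed in a single pass in much the same way. The sketch below assumes the population standard deviation (dividing by n rather than n - 1); which one the real stddev tool reports is not stated here. The name `stddev_col` is hypothetical, and the running update is Welford's method, chosen to limit rounding error on large inputs.

```shell
# Hypothetical helper, not the real stddev script.
# Assumption: population standard deviation (divide by n).
# Usage: stddev_col COLUMN < file
stddev_col() {
    awk -v col="$1" '
        {
            n++
            d = $col - mean          # distance from the previous running mean
            mean += d / n            # update the running mean
            m2 += d * ($col - mean)  # running sum of squared deviations
        }
        END { if (n > 0) print sqrt(m2 / n) }
    '
}

printf '2\n4\n4\n4\n5\n5\n7\n9\n' | stddev_col 1   # prints 2
```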

column-bin

Bin the values of a column in a text file, counting how many times each distinct value appears (VERY useful for quick summaries of apache HTTP status codes and the like). Usage:

$ cat test.csv
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
$ column-bin -c 4 < test.csv
409	1
400	3
200	3
500	1
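
The counting itself is the kind of awk one-liner that keeps getting re-written. Here is a minimal sketch of the idea, with a hypothetical function name `bin_col`; it is not the actual column-bin script. It keeps one counter per distinct value in an awk associative array.

```shell
# Hypothetical sketch, not the real column-bin script.
# Usage: bin_col COLUMN < file
bin_col() {
    awk -v col="$1" '
        { count[$col]++ }    # one counter per distinct value in the column
        END { for (v in count) printf "%s\t%d\n", v, count[v] }
    '
}

# Same sample lines as test.csv above.
sample_lines() {
    cat <<'EOF'
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
EOF
}

# The order of a for-in loop in awk is unspecified, so sort for stable output.
sample_lines | bin_col 4 | sort -n
```

The unsorted order seen in the usage output above is typical of awk's for-in loop; piping through sort is the tool-chainy way to fix it.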

column-histogram

Similar to column-bin, but for continuous variables. Bin the column's values into bins of a fixed size (-b) OR into logarithmically sized bins (-L) to make either a histogram or a semi-log histogram. Usage:

$ cat test.csv
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
$ column-histogram -c 6 -b 1000 < test.csv
10000	1
0	5
1000	2
$ column-histogram -c 6 -L 10 < test.csv
10000	1
0	1
10	4
1000	2
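
Both binning modes can be sketched in a few lines of awk. The function names `hist_fixed` and `hist_log` are hypothetical and this is not the actual column-histogram script. The sketch assumes non-negative values, labels each bin by its lower edge, and sends values below 1 to bin 0 in log mode, which is consistent with the output shown above.

```shell
# Hypothetical sketches, not the real column-histogram script.
# Both assume non-negative numeric values in the chosen column.

# Fixed-width bins: label each value with the lower edge of its bin.
# WIDTH must be positive.
# Usage: hist_fixed COLUMN WIDTH < file
hist_fixed() {
    awk -v col="$1" -v width="$2" '
        { count[int($col / width) * width]++ }
        END { for (b in count) printf "%d\t%d\n", b, count[b] }
    '
}

# Log bins: label each value with the largest power of BASE that is <= the value.
# Values below 1 all land in bin 0. BASE must be greater than 1.
# Usage: hist_log COLUMN BASE < file
hist_log() {
    awk -v col="$1" -v base="$2" '
        BEGIN { if (base <= 1) exit 1 }   # otherwise the loop below never ends
        {
            b = 0
            if ($col >= 1) {
                # Repeated multiplication avoids rounding surprises from log().
                for (b = 1; b * base <= $col; b *= base) ;
            }
            count[b]++
        }
        END { for (b in count) printf "%d\t%d\n", b, count[b] }
    '
}

# Same sample lines as test.csv above.
sample_lines() {
    cat <<'EOF'
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
EOF
}

sample_lines | hist_fixed 6 1000 | sort -n
sample_lines | hist_log 6 10 | sort -n
```

The log mode multiplies up to the bin edge instead of computing log(value) / log(base), because floating-point rounding can put exact powers such as 1000 just under the bin boundary.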

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 United States License. :wq