I’m a Systems/Software Engineer in the San Francisco Bay Area. I moved from Columbus, Ohio in 2007 after getting a B.S. in Physics from the Ohio State University. I'm married, and we have dogs.

On my GitHub account (https://github.com/addumb): I open-sourced python-aliyun in 2014, I keep an outdated Python 2 project starter template at python-example, and I have a pretty handy “sshec2” command, plus a few others, in tools.

The Cost of 4K 60fps video storage, a dad's perspective (2021-2-20)
Mobile App Operability (2020-7-15)
I don't write (2019-10-30)
I moved addumb.com into GitHub pages (2016-3-7)
Quick Debian Backporting (2014-3-10)
Considering different data systems? (2013-11-15)
I Moved Addumb.com into AWS (2013-3-18)
Truth In Distributed Systems (2012-2-24)
Updated: MySQL 5.0 and 5.1 Side-By-Side (2011-3-2)
MySQL Duplicate Key Error - InnoDB or MyISAM? (2011-1-20)
Linux Tip: awesome and synergy for less mouse/keyboard switching (2011-1-12)
vim and bash (2010-12-7)
Linux Tools (2010-10-10)
Linux tip - stderr skips pipes (2010-7-19)
ndislocate - A distributed service locator, written on top of Node.js (2010-6-17)
I want a tattoo (2010-5-17)
Red Hat Enterprise Linux 5.5 released (2010-3-30)
HP ProLiant Linux repositories (2010-3-24)
Linux tip 4 - bash history timestamps (2010-3-8)
Devops (2009-11-25)
While I wait for the locksmith... (2009-10-24)
Linux tip 3 - rsync gotchas (2009-10-22)
Linux tip 2 - read (2009-10-22)
Linux tip - du versus df (2009-10-12)
MySQL Slave Initialization Do's and Don'ts (2009-7-5)
What am I doing? (2009-5-25)

Linux Tools

October 10, 2010

http://github.com/addumb/tools

I’ve gotten tired of re-writing small command-line tools that are missing from most Linux distributions. Too often I’ve found myself trying to pipe through a command that calculates the average or standard deviation of a column in a text file. I re-write the same awk lines over and over to get a distribution of certain columns of log files (like nginx HTTP status codes or the size of the document sent).

I know these tools have been written over and over by many, many people. The ones I’ve found have been either subject-specific (e.g. generate a sideways ASCII histogram from a log file, which is really super sweet and useful, but not always what I’m looking for) or too all-in-one to satisfy my need for a more tool-chainy way to do things. This is 90% for myself, so I don’t have to re-write these over and over. I put them up on GitHub so I can just clone the tools wherever I want to use them. It’d be great if other people found these tools useful, and maybe even wrote some more :)

DISCLAIMER: So far these are basically wrappers around awk. That’s mostly because awk is the awesomest tool in the whole world. Also, it’s because I’ve been dealing with a lot of text processing lately at work. I have dreams of many other tools, but none concrete enough to start writing them.

Cross-posted from http://github.com/addumb/tools/blob/master/README.md:

avg

Spit out the average of a column in a text file
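
For a rough idea of what a wrapper like this boils down to, here is a minimal sketch. It is not the actual script from the repo: the function name `avg_column` and its single positional argument are made up for illustration, and it assumes whitespace-separated columns read from stdin.

```shell
# Hypothetical sketch, not the real avg script.
# Usage: avg_column COLUMN_NUMBER < file
avg_column() {
    awk -v col="$1" '
        # Only count lines that actually have the requested column.
        NF >= col { sum += $col; n++ }
        # Print nothing at all when no line matched.
        END { if (n > 0) print sum / n }
    '
}
```

Run against the test.csv used in the column-bin example further down, `avg_column 6 < test.csv` prints 1511, the mean body_size.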

stddev

Spit out the standard deviation of a column in a text file
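
The same idea extends to standard deviation. The sketch below is an assumption-heavy illustration rather than the repo's script: `stddev_column` is a hypothetical name, and it computes the population standard deviation (dividing by n), which may or may not be what the real tool does.

```shell
# Hypothetical sketch, not the real stddev script.
# Computes the POPULATION standard deviation (divide by n), an assumption.
# Usage: stddev_column COLUMN_NUMBER < file
stddev_column() {
    awk -v col="$1" '
        # Accumulate the sum and the sum of squares in a single pass.
        NF >= col { sum += $col; sumsq += $col * $col; n++ }
        END {
            if (n > 0) {
                mean = sum / n
                variance = sumsq / n - mean * mean
                # Rounding can push a tiny variance slightly below zero.
                if (variance < 0) variance = 0
                print sqrt(variance)
            }
        }
    '
}
```

The one-pass sum-of-squares formula is short but can lose precision when the values are large and tightly clustered, so treat it as a quick-summary tool rather than a numerically careful one.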

column-bin

Bin the values of a column in a text file and count each one (VERY useful for quick summaries of Apache HTTP status codes and the like). Usage:

$ cat test.csv
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
$ column-bin -c 4 < test.csv
409	1
400	3
200	3
500	1
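
The counting itself is a one-line awk associative array. Here is a minimal sketch under the assumption of whitespace-separated input on stdin; `bin_column` is a hypothetical name and this is not the real column-bin script.

```shell
# Hypothetical sketch, not the real column-bin script.
# Usage: bin_column COLUMN_NUMBER < file
bin_column() {
    awk -v col="$1" '
        # Tally how many times each distinct value appears in the column.
        NF >= col { count[$col]++ }
        # awk walks the array in an unspecified order.
        END { for (value in count) printf "%s\t%d\n", value, count[value] }
    '
}
```

Because awk's `for (value in count)` order is unspecified, the rows come out in no particular order, much like the output above. Piping through `sort -n` gives a stable listing.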

column-histogram

Similar to column-bin, but for continuous variables. Bin the column into bins of a fixed size (-b) OR into log-based sizes (-L) to make either a histogram or a semi-log histogram. Usage:

$ cat test.csv
log HTTP status: 200 body_size: 1024
log HTTP status: 200 body_size: 1024
log HTTP status: 500 body_size: 0
log HTTP status: 200 body_size: 10000
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 400 body_size: 10
log HTTP status: 409 body_size: 10
$ column-histogram -c 6 -b 1000 < test.csv
10000	1
0	5
1000	2
$ column-histogram -c 6 -L 10 < test.csv
10000	1
0	1
10	4
1000	2
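
The two binning modes can be sketched as follows. This is a guess at the logic that reproduces the output above, not the real column-histogram script: the function name `histogram_column` and its `linear`/`log` mode argument are hypothetical, values are assumed to be non-negative, and anything at or below zero is assumed to land in a bin labelled 0 in log mode.

```shell
# Hypothetical sketch, not the real column-histogram script.
# Usage: histogram_column COLUMN linear BIN_WIDTH < file
#        histogram_column COLUMN log BASE < file
histogram_column() {
    awk -v col="$1" -v mode="$2" -v size="$3" '
        BEGIN {
            # Reject sizes that would divide by zero or loop forever.
            if (size <= 0 || (mode == "log" && size <= 1)) {
                print "histogram_column: bad bin size" > "/dev/stderr"
                exit 1
            }
        }
        # Lower edge of the fixed-width bin holding v (assumes v >= 0).
        function linear_bin(v) { return int(v / size) * size }
        # Largest power of the base that is <= v. Repeated multiplication
        # avoids log() rounding on exact powers such as 1000.
        function log_bin(v,    edge) {
            if (v <= 0) return 0
            edge = 1
            while (edge * size <= v) edge *= size
            while (edge > v) edge /= size
            return edge
        }
        NF >= col {
            bin = (mode == "log") ? log_bin($col) : linear_bin($col)
            count[bin]++
        }
        END { for (bin in count) printf "%s\t%d\n", bin, count[bin] }
    '
}
```

With the test.csv above, `histogram_column 6 linear 1000 < test.csv` and `histogram_column 6 log 10 < test.csv` give the same counts as the two column-histogram runs, though the row order may differ.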