Tip 8: Data Manipulation in Unix

From Vlsiwiki
Revision as of 23:35, 6 April 2010 by Mrg (Talk | contribs)

In most projects, you ultimately have some data in rows and columns in text files. How do you get it there? I assume that you are using "grep" to extract it from a file. How do you manipulate it? That is what I will show now.

There are a couple of very useful Unix commands for this: paste, cut, and sort.

paste

paste allows you to append files horizontally, line-by-line. Suppose you have file1:

1
2
3
4
5

and file2:

2
4
6
8
10
12
14

and you run

paste file1 file2

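It will output the following (the columns are separated by a tab, and once the shorter file runs out, its column is left empty):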

1	2
2	4
3	6
4	8
5	10 
	12
	14
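
As a small sketch of the same behaviour, the following uses two made-up files, a.dat and b.dat, created in a scratch directory (the names and values are only for illustration). It also shows the -d option of paste, which replaces the default tab with another delimiter.

```shell
# Create two small example files in a scratch directory (hypothetical names).
tmpdir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$tmpdir/a.dat"
printf '%s\n' 10 20 30 40 > "$tmpdir/b.dat"

# Default: the columns are joined with a tab.
paste "$tmpdir/a.dat" "$tmpdir/b.dat"

# -d picks a different delimiter, here a comma.
# The last line starts with the delimiter because a.dat has run out.
paste -d ',' "$tmpdir/a.dat" "$tmpdir/b.dat"
```

The comma form is handy when the result is going into a spreadsheet or a plotting tool that expects CSV.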

If one of the files is specified as "-", paste reads stdin in its place. This means you can gather data like this:

myprog | grep somekeyword > file.dat
myprog -option2 | grep somekeyword | paste - file.dat > file2.dat

Note, however, that you must redirect the output to a separate file (file2.dat cannot be the same as file.dat). The shell empties the output file before paste gets a chance to read it, so the original contents of file.dat would be lost!
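
If you do want the result to end up back in file.dat, one common approach is to write to a temporary file and then rename it. The sketch below assumes a stand-in shell function for myprog (the real program and its output are not specified here), and the name file.dat.tmp is just an illustration.

```shell
# Hypothetical stand-in for myprog: prints a few lines, some with the keyword.
myprog() {
    if [ "$1" = "-option2" ]; then
        printf 'somekeyword 10\nnoise\nsomekeyword 20\n'
    else
        printf 'somekeyword 1\nnoise\nsomekeyword 2\n'
    fi
}

workdir=$(mktemp -d)

# First run: keep only the matching lines.
myprog | grep somekeyword > "$workdir/file.dat"

# Second run: paste the new column in front of the old data.
# Write to a temporary file first, then rename it over the original.
myprog -option2 | grep somekeyword | paste - "$workdir/file.dat" > "$workdir/file.dat.tmp"
mv "$workdir/file.dat.tmp" "$workdir/file.dat"

cat "$workdir/file.dat"
```

The rename only happens after paste has finished reading, so file.dat is never emptied while it is still needed.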

cut

cut is the opposite of paste. It allows you to extract columns of data based on delimiters. For example, if you have a file like this:

x = 1
y = 2
z = 3

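and you run cut on it (here the file is assumed to be called vars.txt):

cut -d '=' -f 2 vars.txt

it will extract column 2 and print it out. Each value keeps the space that follows the "=":

 1
 2
 3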

You can also select by character position (-c) or byte position (-b) instead of by field. With -f, fields are separated by tabs unless you give a different delimiter with -d.
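
The following sketch tries these options on the same three-line data, written to a scratch file (the name vars.txt is an assumption for illustration). It also shows that splitting on a space instead of "=" gives the bare values.

```shell
# Recreate the example data in a scratch directory (hypothetical file name).
scratch=$(mktemp -d)
printf 'x = 1\ny = 2\nz = 3\n' > "$scratch/vars.txt"

# Split on spaces instead of "=": field 3 is the value with no leading space.
cut -d ' ' -f 3 "$scratch/vars.txt"

# -c selects by character position: here the first character of each line.
cut -c 1 "$scratch/vars.txt"

# With -f and no -d, fields are separated by tabs.
printf 'a\tb\tc\n' | cut -f 2
```

Note that -d takes exactly one character, so data padded with a varying number of spaces needs to be squeezed first (for example with tr -s ' ').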

sort

Sort will, you guessed it, sort data line by line. The only trick is that it uses alphabetical sorting by default, so "10" comes before "9". If you want numeric sorting, you must specify "-n". You can also specify "-r" for a reverse sort.
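
A quick way to see the difference is to feed the same made-up numbers through sort with each option:

```shell
# Default: lines are compared as text, so 10 and 100 come before 9.
printf '%s\n' 10 9 100 1 | sort

# -n: lines are compared as numbers, smallest first.
printf '%s\n' 10 9 100 1 | sort -n

# -n -r: numeric order, largest first.
printf '%s\n' 10 9 100 1 | sort -n -r
```

The options combine freely, so the output of grep or cut can be piped straight into sort -n to rank results.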