Tip 8: Data Manipulation in Unix

Revision as of 21:01, 16 July 2014

In most projects, you ultimately have some data in rows and columns in text files. How do you get it there? I assume that you are using "grep" to extract it from a file. How do you manipulate it? That is what I will show now.

There are three VERY useful Unix commands for this: paste, cut, and sort.

paste

paste allows you to append files horizontally, line-by-line. Suppose you have file1:

1
2
3
4

and file2:

2
4
6
8
10
12
14

and you run

paste file1 file2

It will output the following. Once file1 runs out of lines, its column is simply left empty:

1	2
2	4
3	6
4	8
	10
	12
	14
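
To try this yourself, here is a small sketch that rebuilds file1 and file2 in a scratch directory. The tmpdir variable and the pasted and pasted.csv names are only for illustration. It also shows the "-d" option of paste, which replaces the default tab separator with another character:

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)
printf '1\n2\n3\n4\n' > "$tmpdir/file1"
printf '2\n4\n6\n8\n10\n12\n14\n' > "$tmpdir/file2"

# Join the two files line by line; the columns are separated by a tab.
paste "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/pasted"
cat "$tmpdir/pasted"

# Same thing, but with a comma between the columns instead of a tab.
paste -d ',' "$tmpdir/file1" "$tmpdir/file2" > "$tmpdir/pasted.csv"
cat "$tmpdir/pasted.csv"
```

The output always has as many lines as the longest input file, so here it has seven.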

If one of the files is specified as "-", it will use stdin. This means you can gather data like this:

myprog | grep somekeyword > file.dat
myprog -option2 | grep somekeyword | paste - file.dat > file2.dat

Note, however, that you must redirect the output to a separate file (file2.dat cannot be the same as file.dat). The shell empties the output file before paste gets a chance to read it, so the data already in file.dat would be lost.
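
If you want the result to end up under the original name anyway, one common pattern is to write to a temporary file and rename it afterwards. This is only a sketch: the printf commands stand in for "myprog | grep somekeyword", and the tmpdir variable and the file.dat.new name are made up for the example.

```shell
tmpdir=$(mktemp -d)

# Stand-in for the first run: one value per line.
printf '10\n20\n30\n' > "$tmpdir/file.dat"

# Stand-in for the second run: paste its output ("-" means stdin) in front
# of the existing column, writing to a temporary file first.
printf '11\n21\n31\n' | paste - "$tmpdir/file.dat" > "$tmpdir/file.dat.new"

# Replace the original only after the new file has been written completely.
mv "$tmpdir/file.dat.new" "$tmpdir/file.dat"
cat "$tmpdir/file.dat"
```

Each further run can repeat the last two steps to add one more column.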

cut

cut is the opposite of paste. It allows you to extract columns of data based on delimiters. For example, if you have a file like this:

x = 1
y = 2
z = 3

and you run this with the file as its input:

cut -d '=' -f 2

it will extract column 2 and print it out. Note that each value keeps the space that followed the "=":

 1
 2
 3

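You can also select by character position (-c) or byte position (-b) instead of by field. When you use fields (-f), they are separated by tabs unless you give a different delimiter with -d.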
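
Here is a short sketch of these options on the "x = 1" file from above. The file name vars.txt, the tmpdir variable, and the output file names are assumptions made for the example.

```shell
tmpdir=$(mktemp -d)
printf 'x = 1\ny = 2\nz = 3\n' > "$tmpdir/vars.txt"

# Field 2 with "=" as the delimiter keeps the space that followed the "=".
cut -d '=' -f 2 "$tmpdir/vars.txt" > "$tmpdir/values_raw"

# Splitting on the space instead makes the value field 3, with no stray space.
cut -d ' ' -f 3 "$tmpdir/vars.txt" > "$tmpdir/values"

# -c selects by character position: character 1 is the variable name here.
cut -c 1 "$tmpdir/vars.txt" > "$tmpdir/names"

cat "$tmpdir/values" "$tmpdir/names"
```

Splitting on a space only works this neatly because the columns are separated by exactly one space. cut treats every single delimiter character as a field boundary, so runs of spaces produce empty fields.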

sort

sort will, you guessed it, sort data. The only trick is that it uses alphabetical sorting by default, so "10" comes before "9". If you want numeric sorting, you must specify "-n". You can also specify "-r" for a reverse sort. The "-k" option lets you specify the key, that is, which column to sort on.
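
The sketch below shows the three options together on a small two-column file. The file name data.txt, its contents, and the output file names are made up for the example. The key is written as "-k 2,2" so that only column 2 is compared, rather than everything from column 2 to the end of the line.

```shell
tmpdir=$(mktemp -d)
printf 'a 10\nb 9\nc 100\n' > "$tmpdir/data.txt"

# Default (alphabetical) sort on column 2: "10" and "100" come before "9".
sort -k 2,2 "$tmpdir/data.txt" > "$tmpdir/alpha"

# Numeric sort on column 2: 9, 10, 100.
sort -n -k 2,2 "$tmpdir/data.txt" > "$tmpdir/numeric"

# Numeric sort in reverse: 100, 10, 9.
sort -n -r -k 2,2 "$tmpdir/data.txt" > "$tmpdir/reverse"

cat "$tmpdir/alpha" "$tmpdir/numeric" "$tmpdir/reverse"
```

By default sort splits columns on whitespace. For data with another separator, the "-t" option sets the field separator, much like "-d" does for cut.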