Tip 8: Data Manipulation in Unix

From Vlsiwiki

Revision as of 23:35, 6 April 2010

In most projects, you ultimately have some data in rows and columns in text files. How do you get it there? I assume that you are using "grep" to extract it from a file. How do you manipulate it? That is what I will show now.

There are a few VERY useful Unix commands for this: paste, cut, and sort.

paste

paste allows you to append files horizontally, line-by-line. Suppose you have file1:

1
2
3
4
5

and file2:

2
4
6
8
10
12
14

and you run

paste file1 file2

It will output:

1	2
2	4
3	6
4	8
5	10 
	12
	14

If one of the files is specified as "-", it will use stdin. This means you can gather data like this:

myprog | grep somekeyword > file.dat
myprog -option2 | grep somekeyword | paste - file.dat > file2.dat

Note, however, that you must redirect the output to a separate file (file2.dat cannot be the same as file.dat). The shell empties the output file before paste gets a chance to read it, so the existing data would be lost!
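
As a minimal sketch of the pipeline above, the following uses printf as a stand-in for "myprog | grep somekeyword". The sample values and the scratch directory are assumptions for illustration only.

```shell
# Scratch directory so no real data is touched (illustrative only).
tmpdir=$(mktemp -d)

# First run: printf stands in for "myprog | grep somekeyword".
printf '1\n2\n3\n' > "$tmpdir/file.dat"

# Second run: "-" makes paste read the new column from stdin.
# The result goes to a NEW file, not back into file.dat.
printf '10\n20\n30\n' | paste - "$tmpdir/file.dat" > "$tmpdir/file2.dat"

# file2.dat now holds the new column followed by the old one, tab-separated.
cat "$tmpdir/file2.dat"
```

The new column ends up on the left because "-" comes first on the paste command line; swapping the two arguments puts it on the right.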

cut

cut is the opposite of paste. It allows you to extract columns of data based on delimiters. For example, if you have a file like this:

x = 1
y = 2
z = 3

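and you run the following on it (cut reads stdin unless you add a file name at the end):

cut -d '=' -f 2

it will extract column 2 and print it out (each value still carries the space that followed the "=", which is not visible here):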

1
2
3

You can also select by character positions (-c) or byte positions (-b) instead of fields. Fields (-f) are separated by tabs by default; -d is what changes the delimiter.
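
The selection modes can be tried side by side with a short sketch like this one. The sample data is made up for illustration, and the tr step is just one possible way to drop the space left over after the "=".

```shell
# Sample data (made up): two lines with three tab-separated columns.
data=$(printf 'a\t1\tred\nb\t2\tblue\n')

# -f selects fields; TAB is the default delimiter, so no -d is needed.
second_field=$(printf '%s\n' "$data" | cut -f 2)

# -c selects character positions; here, the first character of each line.
first_char=$(printf '%s\n' "$data" | cut -c 1)

# -d changes the delimiter; tr -d ' ' then strips the space after '='.
values=$(printf 'x = 1\ny = 2\nz = 3\n' | cut -d '=' -f 2 | tr -d ' ')

printf '%s\n' "$second_field" "$first_char" "$values"
```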

sort

sort will, you guessed it, sort data. The only trick is that it sorts alphabetically by default, so "10" comes before "9". If you want numeric sorting, you must specify "-n". You can also specify "-r" for a reverse sort.
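
A small sketch of the difference, using made-up values. LC_ALL=C is an addition here, used only so that the default ordering does not depend on the locale settings of the machine.

```shell
# Sample values (made up for illustration).
nums=$(printf '10\n9\n100\n2\n')

# Default: character-by-character comparison, so "10" and "100" sort before "2".
alpha=$(printf '%s\n' "$nums" | LC_ALL=C sort)

# -n compares by numeric value.
numeric=$(printf '%s\n' "$nums" | sort -n)

# -n and -r together: numeric order, largest first.
reverse=$(printf '%s\n' "$nums" | sort -n -r)

printf '%s\n---\n%s\n---\n%s\n' "$alpha" "$numeric" "$reverse"
```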