Tip 8: Data Manipulation in Unix
Revision as of 23:34, 6 April 2010
In most projects, you ultimately have some data in rows and columns in text files. How do you get it there? I assume that you are using "grep" to extract it from a file. How do you manipulate it? That is what I will show now.
There are a couple of VERY useful Unix commands for this: paste and cut.
paste allows you to append files horizontally, line-by-line. Suppose you have file1:
1
2
3
4
5
and file2:
2
4
6
8
10
12
14
and you run
paste file1 file2
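It will output the following, with the columns separated by tabs. Once the shorter file runs out, its column is simply left empty:
1    2
2    4
3    6
4    8
5    10
     12
     14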
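The example above can be reproduced with a short script. This is only a sketch: it assumes file1 holds the values 1 through 5 (which is what the output implies) and works in a scratch directory created with mktemp. The variable names and the extra comma-separated run with -d are added here for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two example files, one value per line.
# Assumption: file1 holds 1 through 5.
printf '%s\n' 1 2 3 4 5 > file1
printf '%s\n' 2 4 6 8 10 12 14 > file2

# Join the files line by line; the columns are separated by a tab.
pasted=$(paste file1 file2)
printf '%s\n' "$pasted"

# The -d option picks a different separator, here a comma.
pasted_csv=$(paste -d ',' file1 file2)
printf '%s\n' "$pasted_csv"
```

The last two rows begin with the separator because file1 has no more lines to contribute at that point.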
If one of the files is specified as "-", it will use stdin. This means you can gather data like this:
myprog | grep somekeyword > file.dat
myprog -option2 | grep somekeyword | paste - file.dat > file2.dat
Note, however, that you must redirect the output to a separate file (file2.dat cannot be the same as file.dat). The shell truncates the output file before paste starts reading, so the existing contents of file.dat would be lost!
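If you would rather keep accumulating columns under a single file name, one possible approach is to write to a temporary file and then rename it over the original. The sketch below assumes a stand-in shell function named myprog (the real program is whatever you are running), the keyword "energy", and the temporary name file.tmp; all three are made up for illustration.

```shell
# Hypothetical stand-in for myprog: prints a few labelled lines.
myprog() {
    if [ "$1" = "-option2" ]; then
        printf 'energy 20\nnoise 0\nenergy 40\n'
    else
        printf 'energy 10\nnoise 0\nenergy 30\n'
    fi
}

# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# First run: keep only the matching lines.
myprog | grep energy > file.dat

# Second run: "-" makes paste read the new lines from stdin.
# Write to a temporary file first, then rename it over file.dat,
# so the shell never truncates file.dat while paste is reading it.
myprog -option2 | grep energy | paste - file.dat > file.tmp
mv file.tmp file.dat

cat file.dat
```

Each further run adds one more column on the left, because the stdin column ("-") is listed before file.dat.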
cut is the opposite of paste. It allows you to extract columns of data based on delimiters. For example, if you have a file like this:
x = 1
y = 2
z = 3
and you run (with file standing in for its name):
cut -d '=' -f 2 file
it will extract column 2 and print it out (each value keeps the space that followed the "="):
 1
 2
 3
You can also select by character positions (-c) or byte positions (-b) instead of fields. With -f, fields are assumed to be separated by tabs unless you give a different delimiter with -d.
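The cut options can be tried out on the same three-line file. This is a small sketch: the file name vars.txt and the variable names are assumptions made for the example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the example file (vars.txt is a made-up name).
printf 'x = 1\ny = 2\nz = 3\n' > vars.txt

# Field 2 with "=" as the delimiter; each value keeps its leading space.
values=$(cut -d '=' -f 2 vars.txt)
printf '%s\n' "$values"

# With a space as the delimiter instead, the bare value is field 3.
clean=$(cut -d ' ' -f 3 vars.txt)
printf '%s\n' "$clean"

# Character positions: the first character of each line is the name.
names=$(cut -c 1 vars.txt)
printf '%s\n' "$names"
```

Splitting on a space rather than "=" is one way to avoid the leading space, at the cost of depending on the exact spacing in the file.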