The stream editors sed and gawk

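The sed editor is a stream editor. An interactive editor such as vim waits for keyboard commands. sed instead applies a predefined set of commands to the data stream in a single pass, which makes it faster at processing data.
Note: by default, sed does not modify the data in the text file. It only sends the modified data to STDOUT. (GNU sed's -i option edits a file in place.)
The sed command format is: sed options script file
sed command options:
Option      Description
-e script   Add the commands specified in script to the commands run while processing the input
-f file     Add the commands specified in file to the commands run while processing the input
-n          Don't produce automatic output; use the print (p) command to produce output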
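1. Defining an editor command on the command line
$ echo "This is a test" | sed 's/test/big test/'
This is a big test
The s command substitutes the second text string between the slashes for the first text string pattern. In this example, big test replaces test.
$ sed 's/dog/cat/' data1.txt    -- replaces the first "dog" on each line of data1.txt with "cat"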
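A minimal sketch of the note above: sed writes its result to STDOUT and leaves the input file alone. The script creates its own throwaway copy of data1.txt in a temporary directory, so the file name and location are only for illustration.

```shell
#!/bin/sh
# Create a throwaway copy of data1.txt (hypothetical scratch location).
dir=$(mktemp -d)
printf 'The quick brown fox jumps over the lazy dog.\n' > "$dir/data1.txt"

# s/dog/cat/ replaces the first "dog" on each line of the stream.
edited=$(sed 's/dog/cat/' "$dir/data1.txt")

# Reading the file again shows that sed did not modify it.
original=$(cat "$dir/data1.txt")

printf 'sed output: %s\n' "$edited"
printf 'file      : %s\n' "$original"
rm -r "$dir"
```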
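2. Using multiple editor commands on the command line (-e option)
Separate the commands with a semicolon:
$ sed -e 's/brown/green/; s/dog/cat/' data1.txt
The quick green fox jumps over the lazy cat.
You can also open the single quote and enter one command per line at the secondary prompt (>). No semicolons are needed:
$ sed -e '
> s/brown/green/
> s/fox/elephant/
> s/dog/cat/' data1.txt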
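The sketch below runs the same three edits in three ways: one semicolon-separated -e script, several -e options, and a script file read with -f. All three should produce the same line. The temporary script file is created only for the demo.

```shell
#!/bin/sh
line='The quick brown fox jumps over the lazy dog.'

# 1) One -e script, commands separated by semicolons.
one=$(printf '%s\n' "$line" | sed -e 's/brown/green/; s/fox/elephant/; s/dog/cat/')

# 2) Several -e options; sed runs them in the order given.
two=$(printf '%s\n' "$line" | sed -e 's/brown/green/' -e 's/fox/elephant/' -e 's/dog/cat/')

# 3) A script file read with -f, one command per line (no semicolons).
script=$(mktemp)
printf '%s\n' 's/brown/green/' 's/fox/elephant/' 's/dog/cat/' > "$script"
three=$(printf '%s\n' "$line" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$one" "$two" "$three"
```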
3. Reading editor commands from a file (-f option)
$ cat script1.sed
s/brown/green/
s/fox/elephant/
s/dog/cat/
$
$ sed -f script1.sed data1.txt
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
$

The gawk program
The gawk command format is: gawk options program file
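gawk options:
Option        Description
-F fs         Specify the field separator that delimits the data fields in a line
-f file       Read the program from the specified file
-v var=value  Define a variable and its default value for the gawk program
-mf N         Specify the maximum number of fields to process in the data file
-mr N         Specify the maximum record size in the data file
-W keyword    Specify the compatibility mode or warning level for gawk
(gawk accepts -mf and -mr for compatibility with older awk versions but ignores them.)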
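1. Reading the program script from the command line
Script commands must be placed inside curly braces {}. The gawk command line assumes the script is a single text string, so you must also put the script in single quotes.
$ gawk '{print "Hello World!"}'    -- with no file given, gawk reads STDIN: each line you enter prints Hello World! (press Ctrl+D to end input)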
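2. Using data field variables
$0 represents the entire line of text;
$1 represents the first data field in the line;
$2 represents the second data field in the line;
$n represents the nth data field in the line.
By default, gawk's field separator is any whitespace character (such as a space or a tab):
$ cat data2.txt
One line of test text.
Two lines of test text.
Three lines of test text.
$
$ gawk '{print $1}' data2.txt
One
Two
Three
$
To read a file that uses a different field separator, specify it with the -F option. -F: makes the colon the separator:
$ gawk -F: '{print $1}' /etc/passwd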
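A short sketch of the field variables with the default whitespace separator and with -F. It uses only POSIX awk features, so it is written with awk and runs the same under gawk. The passwd-style record is made up for the demo.

```shell
#!/bin/sh
# Default separator: any run of spaces or tabs.
first_words=$(printf '%s\n' 'One line of test text.' 'Two lines of test text.' |
  awk '{print $1}')

# -F: makes the colon the separator, as in /etc/passwd.
# The passwd-style record below is made up for the demo.
entry='rich:x:500:500:Rich Blum:/home/rich:/bin/bash'
user=$(printf '%s\n' "$entry" | awk -F: '{print $1}')
home=$(printf '%s\n' "$entry" | awk -F: '{print $6}')

printf '%s\n' "$first_words" "$user" "$home"
```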
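3. Using multiple commands in the program script
Separate the commands with a semicolon:
$ echo "My name is Rich" | gawk '{$4="Christine"; print $0}'
My name is Christine
$
You can also enter the commands one per line at the secondary prompt. gawk then reads STDIN. In the output below, the first line is typed at the keyboard and the second line is gawk's output. Press Ctrl+D to finish.
$ gawk '{
> $4="Christine"
> print $0}'
My name is Rich
My name is Christine
$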
4. Reading the program from a file (-f option)
$ cat script2.gawk
{print $1 "'s home directory is " $6}
$
$ gawk -F: -f script2.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
daemon's home directory is /sbin
adm's home directory is /var/adm
lp's home directory is /var/spool/lpd
[...]
Christine's home directory is /home/Christine
Samantha's home directory is /home/Samantha
Timothy's home directory is /home/Timothy
$
A program file can contain multiple commands. Put one command per line; no semicolons are needed. Note that a gawk program does not use a dollar sign when it references a variable's value, unlike a shell script:
$ cat script3.gawk
{
text = "'s home directory is "
print $1 text $6
}
$
$ gawk -F: -f script3.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
daemon's home directory is /sbin
adm's home directory is /var/adm
lp's home directory is /var/spool/lpd
[...]
Christine's home directory is /home/Christine
Samantha's home directory is /home/Samantha
Timothy's home directory is /home/Timothy
$
5. Running scripts before processing data (BEGIN)
$ cat data3.txt
Line 1
Line 2
Line 3
$
$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
$
6. Running scripts after processing data (END)
$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}
> END {print "End of File"}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
End of File
$
$ cat script4.gawk
BEGIN {
print "The latest list of users and shells"
print " UserID \t Shell"
print "-------- \t -------"
FS=":"
}

{
print $1 " \t " $7
}

END {
print "This concludes the listing"
}
$
$ gawk -f script4.gawk /etc/passwd
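A minimal sketch of the three parts of an awk program: BEGIN, the main block and END. The count variable is added for illustration to show that a value built up in the main block is still available in END. Only POSIX awk features are used.

```shell
#!/bin/sh
report=$(printf 'Line 1\nLine 2\nLine 3\n' | awk '
BEGIN { print "The data3 File Contents:"; count = 0 }   # runs once, before any input
      { print $0; count++ }                             # runs once per input line
END   { print "End of File (" count " lines)" }         # runs once, after the last line
')
printf '%s\n' "$report"
```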
#sed editor basics
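I. Substitution options (s)
1. Substitution flags
Format: s/pattern/replacement/flags
Four kinds of substitution flags are available:
a number, which indicates which occurrence of the pattern in the line the new text replaces;
g, which replaces every occurrence of the pattern with the new text;
p, which prints the line if a substitution was made in it;
w file, which writes the result of the substitution (the changed lines) to a file.
Test:
$ cat data4.txt
This is a test of the test script.
This is the second test of the test script.
$
By default, only the first occurrence of test in each line is replaced:
$ sed 's/test/trial/' data4.txt
This is a trial of the test script.
This is the second trial of the test script.
$
$ sed 's/test/trial/2' data4.txt    -- replaces the second test on each line
This is a test of the trial script.
This is the second test of the trial script.
$
$ sed 's/test/trial/g' data4.txt    -- replaces every test
This is a trial of the trial script.
This is the second trial of the trial script.
$
The -n option suppresses sed's automatic output, and the p flag prints each modified line. Used together, they print only the lines that the substitute command changed:
$ cat data5.txt
This is a test line.
This is a different line.
$
$ sed -n 's/test/trial/p' data5.txt
This is a trial line.
$
The w flag writes the changed lines to the file test.txt. The normal output still goes to STDOUT:
$ sed 's/test/trial/w test.txt' data5.txt
This is a trial line.
This is a different line.
$
$ cat test.txt
This is a trial line.
$
2. Replacement characters (delimiters)
The forward slash is normally the string delimiter. If it appears in the pattern text, you must escape it with a backslash, which gets tedious:
$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd
Instead, you can use another character as the delimiter. Here the exclamation point is the string delimiter:
$ sed 's!/bin/bash!/bin/csh!' /etc/passwd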
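The substitution flags can be tried on in-memory text instead of data4.txt and data5.txt. This is a sketch that reuses the same lines as the examples above. Each result is captured so that it can be compared.

```shell
#!/bin/sh
# Same two lines as data4.txt above.
input='This is a test of the test script.
This is the second test of the test script.'

first=$(printf '%s\n' "$input" | sed 's/test/trial/')     # first match on each line
second=$(printf '%s\n' "$input" | sed 's/test/trial/2')   # only the 2nd match
all=$(printf '%s\n' "$input" | sed 's/test/trial/g')      # every match

# -n with the p flag prints only the lines that were changed.
changed=$(printf 'This is a test line.\nThis is a different line.\n' |
  sed -n 's/test/trial/p')

# "!" as the delimiter avoids escaping the slashes in a path.
path=$(echo '/bin/bash' | sed 's!/bin/bash!/bin/csh!')

printf '%s\n' "$first" "$second" "$all" "$changed" "$path"
```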
II. Using addresses
1. Numeric line addressing
$ sed '2s/dog/cat/' data1.txt      -- changes only line 2
$ sed '2,3s/dog/cat/' data1.txt    -- changes lines 2 through 3
$ sed '2,$s/dog/cat/' data1.txt    -- changes line 2 through the last line
2. Using text pattern filters
The pattern must be enclosed in forward slashes:
$ grep Samantha /etc/passwd
Samantha:x:502:502::/home/Samantha:/bin/bash
$
$ sed '/Samantha/s/bash/csh/' /etc/passwd
[...]
Samantha:x:502:502::/home/Samantha:/bin/csh
[...]
$
3. Grouping commands
To run several commands on a single address, group them in curly braces:
$ sed '2{
> s/fox/elephant/
> s/dog/cat/
> }' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown elephant jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
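III. Deleting lines (d)
$ cat data1.txt
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
The quick brown fox jumps over the lazy dog
$
$ sed 'd' data1.txt    -- with no address, every line is deleted
$
$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ sed '3d' data6.txt    -- deletes line 3
This is line number 1.
This is line number 2.
This is line number 4.
$
$ sed '2,3d' data6.txt    -- deletes lines 2 through 3
This is line number 1.
This is line number 4.
$
$ sed '/number 1/d' data6.txt    -- deletes the lines containing "number 1"
This is line number 2.
This is line number 3.
This is line number 4.
$
$ sed '/number 1/,/number 3/d' data7.txt
A line containing "number 1" turns deletion on. The next line containing "number 3" turns it off, and that line is deleted as well. If "number 1" appears again later, deletion starts again. If no "number 3" line follows it, every line to the end of the file is deleted.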
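A sketch of the d command with a line number, a numeric range and a pattern range. It runs on four sample lines in the style of data6.txt and keeps each result in a variable.

```shell
#!/bin/sh
# Four sample lines in the style of data6.txt.
lines='This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.'

no3=$(printf '%s\n' "$lines" | sed '3d')                     # delete line 3
ends=$(printf '%s\n' "$lines" | sed '2,3d')                  # delete lines 2-3
# Deletion turns on at "number 1" and off after the "number 3" line.
tail_only=$(printf '%s\n' "$lines" | sed '/number 1/,/number 3/d')

printf '%s\n---\n' "$no3" "$ends" "$tail_only"
```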
IV. Inserting and appending text
The insert command (i) adds a new line before the specified line. The append command (a) adds a new line after the specified line.
With the insert command, the text appears before the data stream text:
$ echo "Test Line 2" | sed 'i\Test Line 1'
Test Line 1
Test Line 2
$
With the append command, the text appears after the data stream text:
$ echo "Test Line 2" | sed 'a\Test Line 1'
Test Line 2
Test Line 1
$
Insert a new line before line 3 of the data stream:
$ sed '3i\
> This is an inserted line.' data6.txt
This is line number 1.
This is line number 2.
This is an inserted line.
This is line number 3.
This is line number 4.
$
Append a new line after line 3 of the data stream:
$ sed '3a\
> This is an appended line.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an appended line.
This is line number 4.
$
Append a new line to the end of the data stream:
$ sed '$a\
> This is a new line of text.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is a new line of text.
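V. Changing lines
The change command (c) replaces the contents of an entire line in the data stream. It works the same way as the insert and append commands.
Change line 3:
$ sed '3c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.
$
Change the line that matches a pattern:
$ sed '/number 3/c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.
$
With an address range, the result may not be what you expect. sed does not change each line in the range. It replaces both lines with a single line of text:
$ sed '2,3c\
> This is a new line of text.' data6.txt
This is line number 1.
This is a new line of text.
This is line number 4.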
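VI. The transform command
The transform command (y) is the only sed editor command that operates on individual characters. Format: [address]y/inchars/outchars/
The transform command maps inchars to outchars one to one, so both must be the same length. The first character in inchars is converted to the first character in outchars, the second to the second, and so on. Every occurrence of each character in the line is converted:
$ echo "This 1 is a test of 1 try." | sed 'y/123/456/'
This 4 is a test of 4 try.    -- every 1 became 4
$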
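Because y maps characters one to one, it can express a ROT13 filter: each letter is replaced by the letter 13 places later in the alphabet. rot13 is a hypothetical helper name used only in this sketch. Both character lists contain exactly 52 letters, as y requires.

```shell
#!/bin/sh
# rot13 is a hypothetical helper name; y maps each letter to the one 13 places later.
rot13() {
  sed 'y/abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ/nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM/'
}

encoded=$(echo 'Hello, sed' | rot13)    # letters shifted, other characters kept
decoded=$(echo "$encoded" | rot13)      # applying it twice restores the text
printf '%s\n%s\n' "$encoded" "$decoded"
```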
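VII. Printing
The p command prints a text line. The equal sign (=) command prints line numbers. The l (lowercase L) command lists a line.
1. Printing lines
Without -n, each line is printed twice: once by p and once by sed's automatic output:
$ echo "this is a test" | sed 'p'
this is a test
this is a test
$
$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
The print command is most often used to print the lines that contain a matching text pattern:
$ sed -n '/number 3/p' data6.txt
This is line number 3.
$
$ sed -n '2,3p' data6.txt    -- prints lines 2 through 3
This is line number 2.
This is line number 3.
$
The following script finds the line containing the number 3 and runs two commands on it. First, p prints the original line. Then the s command replaces the text, and its p flag prints the result:
$ sed -n '/3/{
> p
> s/line/test/p
> }' data6.txt
This is line number 3.
This is test number 3.
$
2. Printing line numbers
The equal sign command prints the current line number in the data stream:
$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
$ sed '=' data1.txt
1
The quick brown fox jumps over the lazy dog.
2
The quick brown fox jumps over the lazy dog.
3
The quick brown fox jumps over the lazy dog.
4
The quick brown fox jumps over the lazy dog.
$
With the -n option, sed shows only the line number and text of the lines that match the pattern:
$ sed -n '/number 4/{
> =
> p
> }' data6.txt
4
This is line number 4.
$
3. Listing lines
The list command (l) prints the text in the data stream together with any non-printable ASCII characters:
$ cat data9.txt
This line contains tabs.
$
$ sed -n 'l' data9.txt
This\tline\tcontains\ttabs.$
$
Each tab is shown as \t, and the dollar sign at the end of the line marks the newline character.
$ cat data10.txt
This line contains an escape character.
$
$ sed -n 'l' data10.txt
This line contains an escape character. \a$
$
data10.txt contains an escape control code that produces a bell sound. When cat displays the file, you can't see the control code. You only hear the sound, if your speakers are on. The list command shows which control code is used.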
VIII. Working with files in sed
1. Writing to a file (write, w)
Format: [address]w filename
filename gives the absolute or relative path of the output file. address selects which lines are written: a line number, a range or a text pattern.
Using a line range:
$ sed '1,2w test.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ cat test.txt
This is line number 1.
This is line number 2.
$
If you don't want the lines displayed on STDOUT, use sed's -n option.
Using a text pattern:
$ cat data11.txt
Blum, R Browncoat
McGuiness, A Alliance
Bresnahan, C Browncoat
Harken, C Alliance
$
$ sed -n '/Browncoat/w Browncoats.txt' data11.txt
$
$ cat Browncoats.txt
Blum, R Browncoat
Bresnahan, C Browncoat
$
2. Reading data from a file (read, r)
Format: [address]r filename
$ cat data12.txt
This is an added line.
This is the second added line.
$
$ sed '3r data12.txt' data6.txt    -- inserts the file's contents after line 3
This is line number 1.
This is line number 2.
This is line number 3.
This is an added line.
This is the second added line.
This is line number 4.
$
$ sed '/number 2/r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is an added line.
This is the second added line.
This is line number 3.
This is line number 4.
$
$ sed '$r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is an added line.
This is the second added line.
$
$ cat notice.std
Would the following people:
LIST
please report to the ship's captain.
$
To replace the LIST placeholder with the contents of a file, read the file in after the LIST line and then delete the LIST line:
$ sed '/LIST/{
> r data11.txt
> d
> }' notice.std
Would the following people:
Blum, R Browncoat
McGuiness, A Alliance
Bresnahan, C Browncoat
Harken, C Alliance
please report to the ship's captain.
$

sed in depth
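#Multiline commands
Sometimes you need to act on data that spans several lines, such as a phrase split across two lines. sed provides three special commands for processing multiline text:
N: adds the next line in the data stream to create a multiline group for processing.
D: deletes a single line in a multiline group.
P: prints a single line in a multiline group.
The next command
1. The single-line next command: the n command moves sed to the next line of text.
$ cat data1.txt
This is the header line.

This is a data line.

This is the last line.
$
When sed finds the header line, n moves to the following line (the blank line), and d deletes it:
$ sed '/header/{n ; d}' data1.txt
This is the header line.
This is a data line.

This is the last line.
$
2. Combining lines: the multiline version of the next command (uppercase N) adds the next line of text to the text already in the pattern space. The two lines are processed as one, separated by a newline character.
$ cat data4.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
$
The single-line substitution is placed before N to handle the last line. The last line has no next line to combine with, and when N finds no next line, sed stops without running the commands that follow N:
$ sed '
> s/System Administrator/Desktop User/
> N
> s/System\nAdministrator/Desktop\nUser/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All Desktop Users should attend.
$
The multiline delete command
sed provides the multiline delete command D, which deletes only the first line in the pattern space. It removes every character up to and including the first newline.
$ sed 'N ; /System\nAdministrator/D' data4.txt
Administrator's group meeting will be held.
All System Administrators should attend.
$
$ cat data5.txt

This is the header line.
This is a data line.

This is the last line.
$
Delete a blank line that appears before the first line of the data stream:
$ sed '/^$/{N ; /header/D}' data5.txt
This is the header line.
This is a data line.

This is the last line.
$
The multiline print command
The multiline print command (P) follows the same approach. It prints only the first line in a multiline pattern space.
$ cat data3.txt
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
Thank you for your attendance.
$
$ sed -n 'N ; /System\nAdministrator/P' data3.txt
On Tuesday, the Linux System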
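Two sketches of the multiline commands follow, assuming GNU sed. GNU sed accepts "\n" in the replacement, and it prints the pattern space when N runs out of input. The first sketch repeats the data4.txt join. The second is the well-known N/P/D idiom that keeps a two-line window and prints its first line only when it differs from the second, so adjacent duplicate lines are dropped.

```shell
#!/bin/sh
# Assumes GNU sed: "\n" in the replacement, and N printing the last line at EOF.

# Join "System" / "Administrator" across a line break, as in the data4.txt example.
fixed=$(sed '
s/System Administrator/Desktop User/
N
s/System\nAdministrator/Desktop\nUser/
' <<'EOF'
On Tuesday, the Linux System
Administrator's group meeting will be held.
All System Administrators should attend.
EOF
)

# N/P/D window: P prints the first line unless it equals the second,
# D drops the first line and restarts the cycle with what is left.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

printf '%s\n---\n%s\n' "$fixed" "$deduped"
```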
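#The hold space
The pattern space is an active buffer that holds the text sed is currently processing. sed also has a second buffer area, called the hold space.
sed hold space commands:
Command  Description
h        Copy the pattern space to the hold space
H        Append the pattern space to the hold space
g        Copy the hold space to the pattern space
G        Append the hold space to the pattern space
x        Exchange the contents of the pattern space and the hold space
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
On the line containing first, h saves the line and p prints it. Then n moves to the next line, and p prints that line. Finally, g copies the saved line back into the pattern space, and p prints it again:
$ sed -n '/first/ {h ; p ; n ; p ; g ; p }' data2.txt
This is the first data line.
This is the second data line.
This is the first data line.
$
The same commands, used carefully, print the two lines in reverse order:
$ sed -n '/first/ {h ; n ; p ; g ; p }' data2.txt
This is the second data line.
This is the first data line.
$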
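#Negating a command
The exclamation mark command (!) negates a command, so the command runs on every line except the addressed ones.
$ sed -n '/header/!p' data2.txt    -- prints every line of the file except the one containing header
This is the first data line.
This is the second data line.
This is the last line.
$
$!N means "run N on every line except the last one":
$ sed '$!N;
> s/System\nAdministrator/Desktop\nUser/
> s/System Administrator/Desktop User/
> ' data4.txt
On Tuesday, the Linux Desktop
User's group meeting will be held.
All Desktop Users should attend.
$
Reversing the order of the lines in a file:
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
$ sed -n '{1!G ; h ; $p }' data2.txt
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
$
Here is how it works. On every line except the first (1!G), G appends the hold space to the current line. The hold space contains the lines read so far, in reverse order. h then copies the result back into the hold space. On the last line ($p), the pattern space holds the whole file reversed, and p prints it. The tac command produces the same result:
$ tac data2.txt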
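A sketch that checks the hold-space reversal explained above against tac. It assumes GNU coreutils tac is available, as on most Linux systems. The temporary file stands in for data2.txt.

```shell
#!/bin/sh
input=$(mktemp)
printf '%s\n' 'This is the header line.' 'This is the first data line.' \
  'This is the second data line.' 'This is the last line.' > "$input"

# 1!G : from line 2 on, append the saved lines (hold space) below the current line
# h   : copy the growing reversed block back into the hold space
# $p  : on the last line, print the fully reversed block
reversed=$(sed -n '1!G; h; $p' "$input")
expected=$(tac "$input")   # GNU coreutils tac, for comparison
rm -f "$input"

printf '%s\n' "$reversed"
```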
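#Changing the flow
Normally, sed executes the script from the top to the end. sed also provides ways to change the flow of execution through the command script.
Branching
The branch command b has the format: [address]b [label]
The address parameter determines which lines of data trigger the branch command. The label parameter defines the location to jump to. Without a label, the branch command jumps to the end of the script.
$ cat data2.txt
This is the header line.
This is the first data line.
This is the second data line.
This is the last line.
$
On lines 2 and 3 of the data stream, the branch command skips both substitution commands:
$ sed '{2,3b ; s/This is/Is this/ ; s/line./test?/}' data2.txt
Is this the header test?
This is the first data line.
This is the second data line.
Is this the last test?
$
If you don't want to jump straight to the end of the script, define a label for the branch command to jump to. A label starts with a colon. Keep labels short: labels of up to 7 characters are portable.
A custom label lets you skip the commands at the matched address while still running the commands that follow the label:
$ sed '{/first/b jump1 ; s/This is the/No jump on/
> :jump1
> s/This is the/Jump here on/}' data2.txt
No jump on header line.
Jump here on first data line.
No jump on second data line.
No jump on last line.
$
You can also branch to a label earlier in the script, which creates a loop. The /,/ address makes the loop stop once no comma is left:
$ echo "This, is, a, test, to, remove, commas." | sed -n '{
> :start
> s/,//1p
> /,/b start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
$
Testing
Like the branch command, the test command (t) can change the flow of a sed script. It branches only if a substitution has succeeded since the last input line was read or the last t branch was taken.
Format: [address]t [label]
Without a label, t jumps to the end of the script when a substitution succeeds. In the following example, once first has been replaced on a line, the remaining substitution is skipped for that line:
$ sed '{
> s/first/matched/
> t
> s/This is the/No match on/
> }' data2.txt
No match on header line.
This is the matched data line.
No match on second data line.
No match on last line.
$
When there is nothing left to replace, the test command doesn't branch. sed continues with the rest of the script, and the loop ends:
$ echo "This, is, a, test, to, remove, commas. " | sed -n '{
> :start
> s/,//1p
> t start
> }'
This is, a, test, to, remove, commas.
This is a, test, to, remove, commas.
This is a test, to, remove, commas.
This is a test to, remove, commas.
This is a test to remove, commas.
This is a test to remove commas.
$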
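The two comma-removal loops above can be compared directly. This sketch gives each script piece its own -e option, so the labels don't depend on how a particular sed treats semicolons after a label. Both loops should leave the same line.

```shell
#!/bin/sh
s='This, is, a, test, to, remove, commas.'

# b loop: jump back to :start as long as a comma remains on the line.
with_b=$(echo "$s" | sed -e ':start' -e 's/,//' -e '/,/b start')

# t loop: jump back only if the preceding s command made a substitution.
with_t=$(echo "$s" | sed -e ':start' -e 's/,//' -e 't start')

printf '%s\n%s\n' "$with_b" "$with_t"
```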
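#Pattern replacement
The & symbol
Whatever text the pattern matches, you can refer to it in the replacement with the & symbol:
$ echo "The cat sleeps in his hat." | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat".
Replacing individual words
sed uses escaped parentheses, \( and \), to define subpatterns within the pattern. In the replacement, a backslash followed by a number refers to a subpattern: \1 is the first subpattern, \2 is the second, and so on.
\1 retrieves the first matched subpattern:
$ echo "The System Administrator manual" | sed '
> s/\(System\) Administrator/\1 User/'
The System User manual
Inserting commas into a large number:
$ echo "1234567" | sed '{
> :start
> s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
> t start
> }'
1,234,567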
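The same two techniques can be wrapped in a small reusable helper. add_commas is a hypothetical name used only in this sketch. Each pass splits three digits off the end of the longest run of digits, and t repeats the pass until no run of four or more digits is left.

```shell
#!/bin/sh
# & stands for the whole matched text.
quoted=$(echo "The cat sleeps in his hat." | sed 's/.at/"&"/g')

# add_commas is a hypothetical helper: \1 is everything up to a digit that is
# followed by three more digits, \2 is those three digits; t repeats until no
# such run is left.
add_commas() {
  sed -e ':start' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 't start'
}

printf '%s\n' "$quoted"
echo 1234567 | add_commas
```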
#Using sed in scripts
Using wrapper scripts
$ cat sw_cs.sh
sed -n '{ 1!G ; h ; $p }' "$1"
$
$ sh sw_cs.sh data2.txt
This is the last line.
This is the second data line.
This is the first data line.
This is the header line.
Redirecting sed output
$ cat fact.sh
factorial=1
counter=1
number=$1
#
while [ "$counter" -le "$number" ]
do
# $(( )) is POSIX arithmetic expansion, so the script also runs when sh is not bash
factorial=$(( factorial * counter ))
counter=$(( counter + 1 ))
done
#
result=$(echo $factorial | sed '{
:start
s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
t start
}')
#
echo "The result is $result"
#
$
$ sh fact.sh 20
The result is 2,432,902,008,176,640,000
$
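#Creating sed utilities
1. Double-spacing lines
The hold space starts out empty, so G appends an empty line after each line:
$ sed 'G' data2.txt
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.

$
$!G skips the last line, so no blank line is added after it:
$ sed '$!G' data2.txt
This is the header line.

This is the first data line.

This is the second data line.

This is the last line.
$
2. Double-spacing files that may already contain blank lines
First delete the existing blank lines, then double-space the rest:
$ cat data6.txt
This is line one.
This is line two.

This is line three.
This is line four.
$
$ sed '/^$/d ; $!G' data6.txt
This is line one.

This is line two.

This is line three.

This is line four.
$
3. Numbering lines in a file
= prints each line number on a line of its own. The second sed joins each number with the line that follows it:
$ sed '=' data2.txt | sed 'N; s/\n/ /'
1 This is the header line.
2 This is the first data line.
3 This is the second data line.
4 This is the last line.
4. Printing the last lines (the last 10 lines)
$ cat data7.txt
This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
This is line 6.
This is line 7.
This is line 8.
This is line 9.
This is line 10.
This is line 11.
This is line 12.
This is line 13.
This is line 14.
This is line 15.
$
$ sed '{
> :start
> $q ; N ; 11,$D
> b start
> }' data7.txt
This is line 6.
This is line 7.
This is line 8.
This is line 9.
This is line 10.
This is line 11.
This is line 12.
This is line 13.
This is line 14.
This is line 15.
$
The loop keeps adding lines to the pattern space with N. From line 11 on, D removes the oldest line, so at most 10 lines remain. On the last line, q quits and prints them.
5. Deleting lines
Delete consecutive blank lines (at most one blank line is left between text lines):
$ sed '/./,/^$/!d' data8.txt
Delete leading blank lines:
$ sed '/./,$!d' data9.txt
Delete trailing blank lines:
$ sed '{
:start
/^\n*$/{$d; N; b start }
}'
6. Removing HTML tags
$ sed 's/<[^>]*>//g ; /^$/d' data11.txt

gawk in depth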
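#Using variables
I. Built-in variables
1. Field and record separator variables
gawk data field and record variables:
Variable      Description
FIELDWIDTHS   A space-separated list of numbers defining the exact width of each data field
FS            Input field separator
RS            Input record separator
OFS           Output field separator
ORS           Output record separator
Using FS and OFS
$ cat data1
data11,data12,data13,data14,data15
data21,data22,data23,data24,data25
data31,data32,data33,data34,data35
$ gawk 'BEGIN{FS=","} {print $1,$2,$3}' data1    -- by default, gawk sets OFS to a single space
data11 data12 data13
data21 data22 data23
data31 data32 data33
$ gawk 'BEGIN{FS=","; OFS="-"} {print $1,$2,$3}' data1    -- changes the output field separator
data11-data12-data13
data21-data22-data23
data31-data32-data33
The FIELDWIDTHS variable below defines four fields. gawk splits each record by these fixed field widths instead of by a separator character:
$ cat data1b
1005.3247596.37
115-2.349194.00
05810.1298100.1
$ gawk 'BEGIN{FIELDWIDTHS="3 5 2 5"}{print $1,$2,$3,$4}' data1b
100 5.324 75 96.37
115 -2.34 91 94.00
058 10.12 98 100.1
Using RS and ORS: by default, gawk sets both RS and ORS to the newline character.
$ cat data2
Riley Mullen
123 Main Street
Chicago, IL 60601
(312)555-1234

Frank Williams
456 Oak Street
Indianapolis, IN 46201
(317)555-9876

Haley Snell
4231 Elm Street
Detroit, MI 48201
(313)555-4938
$ gawk 'BEGIN{FS="\n"; RS=""} {print $1,$4}' data2
Riley Mullen (312)555-1234
Frank Williams (317)555-9876
Haley Snell (313)555-4938
With FS="\n", gawk treats each line of the file as a field. With RS="", it treats blank lines as record separators.
2. Data variables
More gawk built-in variables:
Variable     Description
ARGC         The number of command-line parameters present
ARGIND       The index in ARGV of the current file being processed
ARGV         An array of the command-line parameters
CONVFMT      The conversion format for numbers (see the printf statement); the default is %.6g
ENVIRON      An associative array of the current shell environment variables and their values
ERRNO        A description of the system error when an error occurs reading or closing input files
FILENAME     The name of the data file used as input to gawk
FNR          The current record number in the current data file
IGNORECASE   If set to a non-zero value, gawk ignores the case of characters in strings used in the gawk command
NF           The number of data fields in the current record
NR           The number of input records processed so far
OFMT         The output format for displaying numbers; the default is %.6g
RLENGTH      The length of the substring matched by the match function
RSTART       The start index of the substring matched by the match function
ARGC and ARGV
ARGC shows that there are two parameters on the command line. The ARGV array starts at index 0, and ARGV[0] holds the program name (gawk):
$ gawk 'BEGIN{print ARGC,ARGV[1]}' data1
2 data1
The ENVIRON variable uses an associative array to retrieve shell environment variables:
$ gawk '
> BEGIN{
> print ENVIRON["HOME"]
> print ENVIRON["PATH"]
> }'
/home/rich
/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
The NF variable lets you refer to the last data field in a record without knowing its position:
$ gawk 'BEGIN{FS=":"; OFS=":"} {print $1,$NF}' /etc/passwd
rich:/bin/bash
testy:/bin/csh
mark:/bin/bash
dan:/bin/bash
mike:/bin/bash
test:/bin/bash
The FNR and NR variables are similar but slightly different:
$ gawk '
> BEGIN {FS=","}
> {print $1,"FNR="FNR,"NR="NR}
> END{print "There were",NR,"records processed"}' data1 data1
data11 FNR=1 NR=1
data21 FNR=2 NR=2
data31 FNR=3 NR=3
data11 FNR=1 NR=4
data21 FNR=2 NR=5
data31 FNR=3 NR=6
There were 6 records processed
When gawk starts processing the second data file, it resets FNR. NR keeps counting across both files.
II. User-defined variables
A gawk user-defined variable name can contain any number of letters, digits and underscores, but it can't begin with a digit. Note: gawk variable names are case-sensitive!
1. Assigning variables in scripts (the value can be a number or text, and an assignment can include a mathematical expression)
$ gawk '
> BEGIN{
> testing="This is a test"
> print testing
> testing=45
> print testing
> }'
This is a test
45
$ gawk 'BEGIN{x=4; x= x * 2 + 3; print x}'
11
2. Assigning variables on the command line
$ cat script1
BEGIN{FS=","}
{print $n}
$ gawk -f script1 n=2 data1    -- displays the second data field of the file
data12
data22
data32
gawk sets a variable assigned this way only when it reaches that argument, so the value isn't available in the BEGIN section. The -v command-line option sets the variable before the BEGIN code runs. -v must come before the script code on the command line:
$ gawk -v n=3 -f script2 data1
The starting value is 3
data13
data23
data33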
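A sketch of FNR, NR and NF over two small files. The two CSV files are created in a temporary directory for the demo. FNR restarts for the second file, NR does not, and $NF picks the last field even though the files have different numbers of fields. Only POSIX awk features are used, so it runs the same under gawk.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'data11,data12,data13\ndata21,data22,data23\n' > "$dir/a.csv"
printf 'data31,data32\n' > "$dir/b.csv"

# FNR restarts at 1 for each input file, NR keeps counting across files,
# NF is the number of fields in the current record, and $NF is the last field.
out=$(awk 'BEGIN { FS = ","; OFS = "-" }
           { print FNR, NR, NF, $NF }' "$dir/a.csv" "$dir/b.csv")
rm -r "$dir"
printf '%s\n' "$out"
```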
#Working with arrays
Defining array variables
Format: var[index] = element, where var is the variable name, index is the associative array index value, and element is the data element value.
$ gawk 'BEGIN{
> capital["Illinois"] = "Springfield"
> print capital["Illinois"]
> }'
Springfield
$
$ gawk 'BEGIN{
> var[1] = 34
> var[2] = 3
> total = var[1] + var[2]
> print total
> }'
37
Iterating through array variables
gawk iteration format: for (var in array) { statements }
The variable stores the index value, not the data element value. The indexes come back in no particular order:
$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> var["m"] = 3
> var["u"] = 4
> for (test in var)
> {
> print "Index:",test," - Value:",var[test]
> }
> }'
Index: u - Value: 4
Index: m - Value: 3
Index: a - Value: 1
Index: g - Value: 2
Deleting array variables
Format: delete array[index]
The delete command removes the associative index value and its data element value from the array:
$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> for (test in var)
> {
> print "Index:",test," - Value:",var[test]
> }
> delete var["g"]
> print "---"
> for (test in var)
> print "Index:",test," - Value:",var[test]
> }'
Index: a - Value: 1
Index: g - Value: 2
---
Index: a - Value: 1
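A typical use of an associative array is counting words. In this sketch, seen is an array indexed by word. Because for (w in seen) visits the indexes in no fixed order, the output is piped through sort. delete is shown on one index. Only POSIX awk features are used.

```shell
#!/bin/sh
# seen[] is indexed by word; for (w in seen) visits indexes in no fixed order,
# so the result is piped through sort.
counts=$(printf 'the cat\nthe hat\nthe cat sat\n' | awk '
{ for (i = 1; i <= NF; i++) seen[$i]++ }
END {
  delete seen["the"]                  # drop one index and its value
  for (w in seen) print w, seen[w]
}' | sort)
printf '%s\n' "$counts"
```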
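#Using patterns
Regular expressions
The regular expression must appear before the left curly brace of the program script it controls:
$ gawk 'BEGIN{FS=","} /11/{print $1}' data1
data11
The matching operator
The matching operator is the tilde (~). It lets you restrict a regular expression to a specific data field in the record. !~ negates the match.
This gawk program prints the user ID and login shell of every entry in /etc/passwd whose user ID doesn't match rich:
$ gawk -F: '$1 !~ /rich/{print $1,$NF}' /etc/passwd
root /bin/bash
daemon /bin/sh
bin /bin/sh
sys /bin/sh
$ gawk -F: '$1 ~ /rich/{print $1,$NF}' /etc/passwd
rich /bin/bash
$ gawk 'BEGIN{FS=","} $2 ~ /^data2/{print $0}' data1
data21,data22,data23,data24,data25
Mathematical expressions
You can use any of the common mathematical comparison expressions:
x == y: x is equal to y.
x <= y: x is less than or equal to y.
x < y: x is less than y.
x >= y: x is greater than or equal to y.
x > y: x is greater than y.
Display all system users who belong to the root group (group ID 0):
$ gawk -F: '$4 == 0{print $1}' /etc/passwd
root
sync
shutdown
halt
operator
You can also use expressions on text data. Unlike regular expressions, an expression must match exactly:
$ gawk -F, '$1 == "data"{print $1}' data1
$
$ gawk -F, '$1 == "data11"{print $1}' data1
data11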
#Structured commands
The if statement
The gawk programming language supports the standard if-then-else format of the if statement. You must define a condition for the if statement to evaluate and enclose it in parentheses.
Format: if (condition) statement1
$ cat data4
10
5
13
50
34
$ gawk '{if ($1 > 20) print $1}' data4
50
34
To run several statements in the if statement, enclose them in curly braces:
$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> }
> }' data4
100
68
The gawk if statement also supports an else clause, which runs one or more statements when the if condition is false:
$ gawk '{
> if ($1 > 20)
> {
> x = $1 * 2
> print x
> } else
> {
> x = $1 / 2
> print x
> }}' data4
5
2.5
6.5
100
68
The while statement
The while loop lets you iterate over a set of data, checking a condition that ends the iteration.
Format: while (condition) { statements }
$ cat data5
130 120 135
160 113 140
145 170 215
$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> i++
> }
> avg = total / 3
> print "Average:",avg
> }' data5
Average: 128.333
Average: 137.667
Average: 176.667
gawk supports the break and continue statements in while loops. break jumps out of the loop, and continue skips to the next iteration:
$ gawk '{
> total = 0
> i = 1
> while (i < 4)
> {
> total += $i
> if (i == 2)
> break
> i++
> }
> avg = total / 2
> print "The average of the first two data elements is:",avg
> }' data5
The average of the first two data elements is: 125
The average of the first two data elements is: 136.5
The average of the first two data elements is: 157.5
The do-while statement
The do-while statement is similar to the while statement, but it runs the statements before checking the condition, so they always run at least once.
Format: do { statements } while (condition)
$ gawk '{
> total = 0
> i = 1
> do
> {
> total += $i
> i++
> } while (total < 150)
> print total }' data5
250
160
315
The for statement
The gawk programming language supports C-style for loops.
Format: for (variable assignment; condition; iteration process)
$ gawk '{
> total = 0
> for (i = 1; i < 4; i++)
> {
> total += $i
> }
> avg = total / 3
> print "Average:",avg
> }' data5
Average: 128.333
Average: 137.667
Average: 176.667
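A variation on the for loop above: this sketch loops up to NF instead of a fixed 3, so rows can have different numbers of fields. It also formats the average with printf. It uses the same numbers as data5, and only POSIX awk features.

```shell
#!/bin/sh
# Same numbers as data5 above; looping to NF instead of a fixed 3
# lets the rows have different lengths.
avgs=$(printf '130 120 135\n160 113 140\n145 170 215\n' | awk '
{
  total = 0
  for (i = 1; i <= NF; i++)
    total += $i
  if (NF > 0)
    printf "Average: %.3f\n", total / NF
}')
printf '%s\n' "$avgs"
```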
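#Formatted printing
Format: printf "format string", var1, var2 . . .
The format string is the key to formatted output. It uses text elements and format specifiers to define exactly how the output is presented. A format specifier is a special code that indicates what type of variable is displayed and how to display it. gawk treats each format specifier as a placeholder for a value listed in the command. The first format specifier matches the first variable listed, the second matches the second variable, and so on.
A format specifier has the format %[modifier]control-letter, where control-letter is a one-character code that indicates what type of data value is displayed.
Format specifier control letters:
Control letter  Description
c               Displays a number as an ASCII character
d               Displays an integer value
i               Displays an integer value (same as d)
e               Displays a number in scientific notation
f               Displays a floating-point value
g               Displays either scientific notation or floating point, whichever is shorter
o               Displays an octal value
s               Displays a text string
x               Displays a hexadecimal value
X               Displays a hexadecimal value, using capital letters for A through F
Displaying a number in scientific notation:
$ gawk 'BEGIN{
> x = 10 * 100
> printf "The answer is: %e\n", x
> }'
The answer is: 1.000000e+03
Besides the control letters, three modifiers give you further control over the output:
width: a numeric value that specifies the minimum width of the output field. If the output is shorter, printf right-justifies the text and pads it with spaces. If the output is longer than the specified width, it is printed at its full length.
prec: a numeric value that specifies the number of digits to the right of the decimal point in a floating-point number, or the maximum number of characters displayed in a text string.
- (minus sign): places the data in the formatted space left-justified instead of right-justified.
printf gives you complete control over the output style. A width modifier of 16 forces the first string to be output 16 characters wide. By default, printf right-justifies data in the formatted space. To left-justify it, add a minus sign to the modifier:
$ gawk 'BEGIN{FS="\n"; RS=""} {printf "%-16s %s\n", $1, $4}' data2
Riley Mullen     (312)555-1234
Frank Williams   (317)555-9876
Haley Snell      (313)555-4938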
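The three modifiers are easiest to see with brackets around each field. This sketch uses only standard printf conversions, so the output should be the same under gawk or another POSIX awk.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
  printf "[%-16s]\n", "Riley Mullen"   # left-justified in a 16-character field
  printf "[%16s]\n", "Riley Mullen"    # right-justified (the default)
  printf "[%.3s]\n", "Riley Mullen"    # prec on a string: at most 3 characters
  printf "[%8.2f]\n", 3.14159          # width 8, 2 digits after the point
  printf "[%e]\n", 1000                # scientific notation
}')
printf '%s\n' "$out"
```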
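#Built-in functions
Mathematical functions
gawk mathematical functions:
Function     Description
atan2(x, y)  The arctangent of x/y, with the result in radians
cos(x)       The cosine of x, with x specified in radians
exp(x)       The exponential of x
int(x)       The integer part of x, truncated toward 0
log(x)       The natural logarithm of x
rand()       A random floating-point value greater than or equal to 0 and less than 1
sin(x)       The sine of x, with x specified in radians
sqrt(x)      The square root of x
srand(x)     Specifies a seed value for calculating random numbers
gawk also supports several functions that manipulate data bit by bit:
and(v1, v2): performs a bitwise AND of v1 and v2.
compl(val): performs the bitwise complement of val.
lshift(val, count): shifts val left by count bits.
or(v1, v2): performs a bitwise OR of v1 and v2.
rshift(val, count): shifts val right by count bits.
xor(v1, v2): performs a bitwise XOR of v1 and v2.
String functions
gawk string functions:
Function                    Description
asort(s [,d])               Sorts array s by its data element values. The index values are replaced with sequential numbers indicating the new sort order. If d is specified, the sorted array is stored in array d instead.
asorti(s [,d])              Sorts array s by its index values. The resulting array contains the index values as data element values, with sequential number indexes indicating the sort order. If d is specified, the sorted array is stored in array d instead.
gensub(r, s, h [, t])       Searches $0, or the target string t if supplied, for matches of the regular expression r. If h is a string beginning with g or G, replaces every match with s. If h is a number, it indicates which occurrence of r to replace.
gsub(r, s [,t])             Searches $0, or the target string t if supplied, for matches of the regular expression r. If found, replaces all of them with the string s.
index(s, t)                 Returns the index of string t in string s, or 0 if it isn't found.
length([s])                 Returns the length of string s or, if none is specified, the length of $0.
match(s, r [,a])            Returns the index in string s where the regular expression r occurs. If array a is specified, it stores the portion of s that matches the regular expression.
split(s, a [,r])            Splits s into array a using the FS character, or the regular expression r if supplied. Returns the number of fields.
sprintf(format, variables)  Returns a string similar to the output of printf, using the format and variables supplied.
sub(r, s [,t])              Searches $0, or the target string t, for matches of the regular expression r. If found, replaces the first match with the string s.
substr(s, i [,n])           Returns the substring of s that starts at index i and is n characters long. If n isn't supplied, returns the rest of s.
tolower(s)                  Converts all characters in s to lowercase.
toupper(s)                  Converts all characters in s to uppercase.
Converting to uppercase and returning the length:
$ gawk 'BEGIN{x = "testing"; print toupper(x); print length(x) }'
TESTING
7
Sorting (the for-in loop visits the sorted array's indexes in no particular order):
$ gawk 'BEGIN{
> var["a"] = 1
> var["g"] = 2
> var["m"] = 3
> var["u"] = 4
> asort(var, test)
> for (i in test)
> print "Index:",i," - value:",test[i]
> }'
Index: 4 - value: 4
Index: 1 - value: 1
Index: 2 - value: 2
Index: 3 - value: 3
Time functions
gawk time functions:
Function                       Description
mktime(datespec)               Converts a date specified in the format YYYY MM DD HH MM SS [DST] into a timestamp value
strftime(format [,timestamp])  Formats the current timestamp, or timestamp if supplied, as a date string, using the same format specifiers as the shell's date command
systime()                      Returns the timestamp for the current time
For example:
$ gawk 'BEGIN{
> date = systime()
> day = strftime("%A, %B %d, %Y", date)
> print day
> }'
Friday, December 26, 2014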
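#User-defined functions
Defining a function
Format: function name([variables]) { statements }
Using a user-defined function
A function definition can appear anywhere at the top level of a gawk program. Placing it before all code blocks, including the BEGIN block, keeps the function code separate from the rest of the gawk program:
$ gawk '
> function myprint()
> {
> printf "%-16s - %s\n", $1, $4
> }
> BEGIN{FS="\n"; RS=""}
> {
> myprint()
> }' data2
Riley Mullen     - (312)555-1234
Frank Williams   - (317)555-9876
Haley Snell      - (313)555-4938
Creating a function library
$ cat funclib
function myprint()
{
printf "%-16s - %s\n", $1, $4
}
function myrand(limit)
{
return int(limit * rand())
}
function printthird()
{
print $3
}
$ cat script4
BEGIN{ FS="\n"; RS=""}
{
myprint()
}
The -f option can be given more than once. gawk combines the files into a single program:
$ gawk -f funclib -f script4 data2
Riley Mullen     - (312)555-1234
Frank Williams   - (317)555-9876
Haley Snell      - (313)555-4938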
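This sketch shows two properties of awk functions. A function can be defined after the rule that calls it. Parameters that the caller does not pass, by convention listed after extra spaces, act as local variables. join_fields is a hypothetical helper, and only POSIX awk features are used.

```shell
#!/bin/sh
result=$(printf 'data11 data12 data13\n' | awk '
{ print join_fields("-") }

# join_fields is a hypothetical helper defined after the rule that calls it.
# Parameters listed after the extra spaces (i, out) act as local variables.
function join_fields(sep,    i, out) {
  out = $1
  for (i = 2; i <= NF; i++)
    out = out sep $i
  return out
}')
printf '%s\n' "$result"
```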
#Example
Calculating bowling tournament scores
$ cat scores.txt
Rich Blum,team1,100,115,95
Barbara Blum,team1,110,115,100
Christine Bresnahan,team2,120,115,118
Tim Bresnahan,team2,125,112,116
$ cat bowling.sh
for team in $(gawk -F, '{print $2}' scores.txt | uniq)
do
gawk -v team="$team" 'BEGIN{FS=","; total=0}
{
if ($2==team)
{
total += $3 + $4 + $5;
}
}
END {
avg = total / 6;
print "Total for", team, "is", total ", the average is", avg
}' scores.txt
done
$ sh bowling.sh
Total for team1 is 635, the average is 105.833
Total for team2 is 706, the average is 117.667
uniq removes only adjacent duplicates, so this relies on the lines being grouped by team. Dividing by 6 assumes that each team has two bowlers with three games each.
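The same report can be produced in a single awk pass, instead of rereading scores.txt once per team. This is a sketch, not the author's script. It keeps arrays indexed by team name, assumes three game scores per line, and counts games from the data rather than hard-coding 6. The scores file is recreated in a temporary directory, and the output is sorted because for-in order is unspecified.

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/scores.txt" <<'EOF'
Rich Blum,team1,100,115,95
Barbara Blum,team1,110,115,100
Christine Bresnahan,team2,120,115,118
Tim Bresnahan,team2,125,112,116
EOF

# total[] and games[] are indexed by team name; each line is assumed to hold
# exactly three game scores in fields 3-5.
report=$(awk -F, '
{ total[$2] += $3 + $4 + $5; games[$2] += 3 }
END {
  for (team in total)
    printf "Total for %s is %d, the average is %.3f\n", team, total[team], total[team] / games[team]
}' "$dir/scores.txt" | sort)
rm -r "$dir"
printf '%s\n' "$report"
```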
Reposted from: https://www.cnblogs.com/TianMu/p/11199371.html