{"id":867,"date":"2021-02-09T13:04:39","date_gmt":"2021-02-09T21:04:39","guid":{"rendered":"http:\/\/gantovnik.com\/bio-tips\/?p=867"},"modified":"2021-02-09T19:55:29","modified_gmt":"2021-02-10T03:55:29","slug":"156-filtering-content-from-files-using-awk","status":"publish","type":"post","link":"https:\/\/gantovnik.com\/bio-tips\/2021\/02\/156-filtering-content-from-files-using-awk\/","title":{"rendered":"#156 Filtering content from files using awk"},"content":{"rendered":"<p>#156 Displaying and filtering the content of files with awk<\/p>\n<p>Using the following command, we can print all lines from the file:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' { print } ' \/etc\/passwd\r\n<\/pre>\n<p>This is equivalent to using the $0 variable. The $0 variable refers to the complete line.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' { print $0 } ' \/etc\/passwd\r\n<\/pre>\n<p>If we want to print only the first field of each line, we can use the $1 variable. However, we need to tell awk that the field separator in this file is a colon.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk -F&quot;:&quot; '{ print $1 }' \/etc\/passwd\r\n<\/pre>\n<p>We can also set the field separator in a BEGIN block:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' BEGIN {FS=&quot;:&quot;} { print $1 } ' \/etc\/passwd\r\n<\/pre>\n<p>The code in the BEGIN and END blocks is processed just once, whereas the main block is processed for each line. <\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' BEGIN {FS=&quot;:&quot;} { print $1 } END {print NR} ' \/etc\/passwd\r\n<\/pre>\n<p>The awk built-in variable NR holds the number of lines processed so far. 
<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' BEGIN {FS=&quot;:&quot;} { print $1 } END {print &quot;Total number of lines:&quot;,NR} ' \/etc\/passwd\r\n<\/pre>\n<p>We can easily display the running line count with each line:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' BEGIN {FS=&quot;:&quot;} { print NR,$1 } END {print &quot;Total number of lines:&quot;,NR} ' \/etc\/passwd\r\n<\/pre>\n<p>If we want to print only the first five lines, we use the following code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' NR &lt; 6 ' \/etc\/passwd\r\n<\/pre>\n<p>If we want to print lines 8 through 12, we use the following code:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' NR==8,NR==12 ' \/etc\/passwd\r\n<\/pre>\n<p>We can use regular expressions to match the text in the lines. Let&#8217;s look for lines that end with the word &#8220;bash&#8221;:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nawk ' \/bash$\/ ' \/etc\/passwd\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>#156 Displaying and filtering the content of files with awk Using the following command, we can print all lines from the file: awk &#8216; { print } &#8216; \/etc\/passwd This is equivalent to using the $0 variable. The $0 variable refers to the complete line. awk &#8216; { print $0 } &#8216; \/etc\/passwd If we [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_et_pb_use_builder":"off","_et_pb_old_content":"#156 Displaying and filtering the content of files with awk\r\n\r\nUsing the following command, we can print all lines from the file:\r\n[code language=\"python\"]\r\nawk ' { print } ' \/etc\/passwd\r\n[\/code]\r\n\r\nThis is equivalent to using the $0 variable. 
The $0 variable refers to the complete line.\r\n[code language=\"python\"]\r\nawk ' { print $0 } ' \/etc\/passwd\r\n[\/code]\r\n\r\nIf we want to print only the first field of each line, we can use the $1 variable. However, we need to tell awk that the field separator in this file is a colon.\r\n[code language=\"python\"]\r\nawk -F\":\" '{ print $1 }' \/etc\/passwd\r\n[\/code]\r\n\r\nWe can also set the field separator in a BEGIN block:\r\n[code language=\"python\"]\r\nawk ' BEGIN {FS=\":\"} { print $1 } ' \/etc\/passwd\r\n[\/code]\r\n\r\nThe code in the BEGIN and END blocks is processed just once, whereas the main block is processed for each line. \r\n[code language=\"python\"]\r\nawk ' BEGIN {FS=\":\"} { print $1 } END {print NR} ' \/etc\/passwd\r\n[\/code]\r\nThe awk built-in variable NR holds the number of lines processed so far. \r\n[code language=\"python\"]\r\nawk ' BEGIN {FS=\":\"} { print $1 } END {print \"Total number of lines:\",NR} ' \/etc\/passwd\r\n[\/code]\r\n\r\nWe can easily display the running line count with each line:\r\n[code language=\"python\"]\r\nawk ' BEGIN {FS=\":\"} { print NR,$1 } END {print \"Total number of lines:\",NR} ' \/etc\/passwd\r\n[\/code]\r\n\r\nIf we want to print only the first five lines, we use the following code:\r\n[code language=\"python\"]\r\nawk ' NR < 6 ' \/etc\/passwd\r\n[\/code]\r\n\r\nIf we want to print lines 8 through 12, we use the following code:\r\n[code language=\"python\"]\r\nawk ' NR==8,NR==12 ' \/etc\/passwd\r\n[\/code]\r\n\r\nWe can use regular expressions to match the text in the lines. 
Let's look for lines that end with the word \"bash\":\r\n[code language=\"python\"]\r\nawk ' \/bash$\/ ' \/etc\/passwd\r\n[\/code]","_et_gb_content_width":"","_lmt_disableupdate":"yes","_lmt_disable":"","jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[18,8,20],"tags":[],"class_list":["post-867","post","type-post","status-publish","format-standard","hentry","category-awk","category-bash","category-linux"],"modified_by":"gantovnik","jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8bH0k-dZ","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":890,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/02\/158-filtering-content-from-files-based-on-field-value-using-awk\/","url_meta":{"origin":867,"position":0},"title":"#158 Filtering content from files based on field value using awk","author":"gantovnik","date":"2021-02-09","format":false,"excerpt":"#158 Filtering content from files based on field value using awk Request to print lines which has value in field #3 larger than 999: [code language=\"python\"] awk -F\":\" '$3 > 999 ' \/etc\/passwd [\/code] Request to print lines which has value in field #3 smaller than 101: [code language=\"python\"] awk\u2026","rel":"","context":"In &quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":886,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/02\/157-formatting-output-in-awk\/","url_meta":{"origin":867,"position":1},"title":"#157 Formatting output in awk","author":"gantovnik","date":"2021-02-09","format":false,"excerpt":"#157 Formatting output in awk Without formatting, the command look like this [code language=\"python\"] 
awk ' BEGIN {FS=\":\"} { print $1,$3,$7 } ' \/etc\/passwd [\/code] The same command with formatting [code language=\"python\"] awk ' BEGIN {FS=\":\"} { printf \"%10s %4d %17s\\n\",$1,$3,$7 } ' \/etc\/passwd [\/code] With the header information added\u2026","rel":"","context":"In &quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":892,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/02\/159-awk-control-file\/","url_meta":{"origin":867,"position":2},"title":"#159 Awk control file","author":"gantovnik","date":"2021-02-09","format":false,"excerpt":"#159 Awk control file Create awk control file func1.awk [code language=\"python\"] function green(s) { printf \"\\033[1;32m\" s \"\\033[0m\\n\" } BEGIN { FS=\":\" green(\" Name: UID: Shell:\") } { printf \"%10s %4d %17s\\n\",$1,$3,$7 } [\/code] Command to run awk control file is [code language=\"python\"] awk -f func1.awk \/etc\/passwd [\/code]","rel":"","context":"In &quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":613,"url":"https:\/\/gantovnik.com\/bio-tips\/2020\/09\/93-summing-values-of-a-column-using-awk-command\/","url_meta":{"origin":867,"position":3},"title":"#93 Summing values of a column using awk","author":"gantovnik","date":"2020-09-09","format":false,"excerpt":"#93 Summing values of a column using awk command Assume we have \"test1.txt\" file with data in columns: [code language=\"python\"] a,a,aa,1 a,a,aa,2 d,d,dd,7 d,d,dd,9 d,dd,d,0 d,d,dd,23 d,d,dd,152 d,d,dd,7 d,d,dd,5 f2,f2,f2,5.5 [\/code] Save the following awk script in the file \"ex93.txt\" [code language=\"python\"] #The -F',' tells awk that the field separator\u2026","rel":"","context":"In 
&quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":842,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/01\/148-print-lines-containing-words-bar-gap-quad-and-count-number-of-lines-in-each-group-using-awk\/","url_meta":{"origin":867,"position":4},"title":"#148 Print lines containing words &#8220;BAR&#8221;, &#8220;GAP&#8221;, &#8220;QUAD&#8221; and count number of lines in each group using awk","author":"gantovnik","date":"2021-01-15","format":false,"excerpt":"#148 Print lines containing words \"BAR\", \"GAP\", \"QUAD\" and count number of lines in each group using awk Save the following awk script in the file \"ex148.awk\" [code language=\"python\"] #!\/bin\/awk -f \/GAP\/{print;n_gap++} \/BAR\/{print;n_bar++} \/QUAD\/{print;n_quad++} END { printf \"n_gap=%i\\n\",n_gap; printf \"n_bar=%i\\n\",n_bar;printf \"n_quad=%i\\n\",n_quad } [\/code] Run file \"ex148run.txt\" [code language=\"python\"] awk -f\u2026","rel":"","context":"In &quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":616,"url":"https:\/\/gantovnik.com\/bio-tips\/2020\/09\/94-average-of-values-of-a-column-using-awk\/","url_meta":{"origin":867,"position":5},"title":"#94 Average of  values of a column using awk","author":"gantovnik","date":"2020-09-09","format":false,"excerpt":"#94 Average of values of a column using awk command Assume we have \"test1.txt\" file with data in columns: [code language=\"python\"] a,a,aa,1 a,a,aa,2 d,d,dd,7 d,d,dd,9 d,dd,d,0 d,d,dd,23 d,d,dd,152 d,d,dd,7 d,d,dd,5 f2,f2,f2,5.5 [\/code] Save the following awk script in the file \"ex94.awk\" [code language=\"python\"] #!\/bin\/awk -f { sum += $4 }\u2026","rel":"","context":"In 
&quot;awk&quot;","block_context":{"text":"awk","link":"https:\/\/gantovnik.com\/bio-tips\/category\/awk\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts\/867","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/comments?post=867"}],"version-history":[{"count":0,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts\/867\/revisions"}],"wp:attachment":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/media?parent=867"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/categories?post=867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/tags?post=867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}