Shell programming II
Pipelines
The core unix bash programming uses streams
to manipulate text data. Since in
unix everything is a file. your data are files. Your directories are files.
Your hard drive is a file. Linux choose this mechanism to allows data to
move from one file to another. The notion of stream is literally what is
sounds like: a little river of bits pouring from one file into another.
The standard streams
Within the Unix world is a general convention that each file is connected to at least three streams.
-
Standard in (stdin): The standard stream for input into a file.
-
Standard out(stdout): The standard stream for output of a file.
-
Standard error (stderr): The standard stream for error output file.
We can use those streams by the following operators:
- process > data file: redirect the output of a process to the data.
#!/usr/local/bash
echo Hello
./process > out
- process » data file: redirect the output at the end(append) of a process to the data.
#!/usr/local/bash
echo Hello
./process >> out
./process >> out # now out has (Hello\n Hello)
- process < data file read the contents from the data file.
more file.txt # show the content of the file
'Hello'
cat < file.txt # Will show the text hello
Try to search how to redirect the to standard error?
Pipelines
Pipelines, often called pipes, is a way to chain commands and connect output
from one command to the input of the next. A pipeline is represented by the
pipe character |
. It’s particularly handy when a complex or long input is
required for a command.
command1 | command2
By default pipelines redirects only the standard output. if you want to include
the standard error you need to use the form |&
which short hand for 2>&1 |
.
Example
Imagine you quickly want to know the number of entries in a directory, you can
use a pipe to redirect the output of the ls
command to the wc
command with
option -l
.
ls | wc -l
Or least imagine, you want to see only the first 15 results.
ls | head -n 15
Exercise
In this exercise, you need to print the number of processors based on the
information in the cpuinof file (/proc/cpuinfo/
).
Hint you can chain more than two commands.
grep
A regular expression, often referred to as regex, is a description of a pattern of text.
-
There are countless applications of regular expressions.
- Search of text in a given file.
- Search/Replace text in a file.
-
Virtually all programming languages implement regular expressin.
- For example in
cpp
, the library regex contains all the tools for defining and searching a regular expression.
- For example in
Simple regex
Let’s imagine that we have the file candies.txt
with the following contents.
candies.txt
Twix
Sweet Tarts
Chocolate
Almond Joy
Jolly Ranchers
Kit Kat.
Dark chocolate
Lets search for the name Chocolate in this file.
grep -E "Chocoloate" candies.txt
- grep is case_sensitive.
- Remark that the command print the full line.
How about if we want to make the search case insensitive.
grep -Ei "Chocoloate" candies.txt
Let’s imagine that we search for line that contain
- Any character followed by an a.
grep -E ".a" candies.txt
The character . matches anything.
Now we want a candy that start by K.
grep -E "^K" candies.txt
The character ^ matches the beginning of a line.
Imagine that we want to match a T but the in the start of a word.
grep -E "\<K" candies.txt
A word is considered as a string of characters consisting of letter, numbers, and underscores.
And for the end of a word.
grep -E "t\>" candies.txt
Here is a simple table for special character in regular expression.
- . : any character.
- ^ : start of line
- $: end of line
- <: start of a word.
- >: end of a word.
- **: Escape character.
We could search for multiples match using the or operator.
grep -E "Twix|Tarts" candies.txt
Be careful of leading spaces
What will be the result of this regular expression:
grep -E "(e|a)t" candies.txt
Quantifier.
If we want to match an instance that we don’t know the number of occurrences, we could use quantifiers to simplify our search.
grep -E "e*t" candies.txt
The operator * means zero or more of the previous pattern.
grep -E "e+t" candies.txt
The operator + means one or more of the previous pattern.
grep -E "r?t" candies.txt
The operator ? means zero or one or more of the previous pattern.
What will be the result of the command:
grep -E "(es)+" candies.txt
Groups
If we want to specify a group or of characters. we use the [].
grep -E "[abc]" candies.txt
the [abc] = (a b c)
We can use the - to specify ranges.
grep -E "[a-z]" candies.txt
[a-z]
is all lowercase letters.[A-Z]
is all uppercase letters.[0-9]
is all digits.[a-zA-Z]
is all letters.
How about if we want to match anything but a set of given pattern. for that we use the ^ character.
grep -E "[^ao]" candies.txt
[^ao] matches anything that doesnt contain the ao.
We also can use the {} to specify the number of repetition of a pattern.
grep -E "[0-9]{2}" passwords.txt
- {x}: matches exactly x times.
- {x,}: matches x or more.
- {,x}: matches x or fewer.
- {x,y} : matches a number between x and y.
Backreference
Imagine that we wants to search for any succession of any two characters.
grep -E "(..)\1" passwords.txt
The \1 special character is a back reference for the previous match.