dreamsys software

UNIX & Linux Shell Scripting Tutorial

Searching & Sorting

grep

The unix grep command is a simple but powerful tool to use in your scripts. You can use it to search files for specific words, or for text that matches a regular expression. Before we get into using these tools, let's define a file that we can manipulate. Create a file on your filesystem with the following contents:

root 192.168.1.1 10/11/2005 /usr/local/bin/one_app
root 192.168.1.1 10/12/2005 /usr/local/bin/two_app
root 192.168.1.1 10/12/2005 /var/logs/system.log
root 192.168.1.2 10/13/2005 /var/logs/approot.log
user1 192.168.1.3 10/13/2005 /usr/local/bin/one_app
user1 192.168.1.3 10/13/2005 /usr/local/src/file.c
user1 192.168.1.3 10/14/2005 /var/logs/system.log
user2 192.168.1.4 10/14/2005 /var/logs/approot.log
user2 192.168.1.5 10/15/2005 /usr/local/bin/two_app
user2 192.168.1.5 10/15/2005 /usr/local/bin/two_app

Save this file as "testfile". The file is an access log for the file system. It has four fields separated by spaces: user, IP address, date and filename. Files such as this can get very large, and it becomes hard to find things in them using just a text editor. Suppose that we wanted to find all rows for the file "/usr/local/bin/one_app". To do this, we would use the following command and get the following results:

$ grep "/usr/local/bin/one_app" testfile
root 192.168.1.1 10/11/2005 /usr/local/bin/one_app
user1 192.168.1.3 10/13/2005 /usr/local/bin/one_app
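
The search string above is a literal path rather than a pattern, so it can help to tell grep exactly that. The following is a minimal sketch, not part of the original tutorial: the function name find_file_rows is made up for illustration, and it assumes the sample data was saved as "testfile" in the current directory. It uses the standard grep options -F (treat the search text as a fixed string) and -n (show line numbers).

```shell
#!/bin/sh
# find_file_rows is a hypothetical helper name used only for this illustration.
# Usage: find_file_rows PATH LOGFILE
find_file_rows() {
	# -F: treat PATH as a literal string, so a "." in a path is not a wildcard.
	# -n: prefix each matching row with its line number in LOGFILE.
	grep -F -n -- "$1" "$2"
}

# Demo against the tutorial's sample file, if it is in the current directory.
if [ -f testfile ]; then
	find_file_rows "/usr/local/bin/one_app" testfile
fi
```

With the sample file, the two matching rows are printed with the prefixes "1:" and "5:", which are their line numbers in testfile.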

This makes the file much easier to search through. You can also redirect the output of a command to a file by adding > filename after the command. For instance, let's say that we want to find all rows for the user name "root" and save them to a file. If we simply did a grep for "root", we would also pick up rows for any other user that accessed the file "/var/logs/approot.log". What we really want is to find every line that starts with "root". To do this, we will use the regular expression character ^, which means "starts with". We will run this command with its output redirected to the file "output.txt", then use the unix cat command to display the output.txt file.

$ grep "^root" testfile > output.txt
$ cat output.txt
root 192.168.1.1 10/11/2005 /usr/local/bin/one_app
root 192.168.1.1 10/12/2005 /usr/local/bin/two_app
root 192.168.1.1 10/12/2005 /var/logs/system.log
root 192.168.1.2 10/13/2005 /var/logs/approot.log
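
To see what the ^ anchor changes, it can help to count the matches for each form of the pattern. This is only a rough sketch: count_matches is a hypothetical helper name, and the sample data is assumed to be saved as "testfile" in the current directory. The third pattern adds a trailing space, which is one way to make sure "root" is the whole first field and not just the start of a longer user name.

```shell
#!/bin/sh
# count_matches is a hypothetical helper name used only for this illustration.
# Usage: count_matches PATTERN LOGFILE
count_matches() {
	# -c prints the number of matching lines instead of the lines themselves.
	grep -c -- "$1" "$2"
}

# Demo against the tutorial's sample file, if it is in the current directory.
if [ -f testfile ]; then
	count_matches "root" testfile    # "root" anywhere on the line
	count_matches "^root" testfile   # "root" at the start of the line
	count_matches "^root " testfile  # "root" as the complete first field
fi
```

With the sample file this prints 5, 4 and 4. The extra match in the first count is the user2 row for "/var/logs/approot.log".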

awk

If you only want to view certain fields in the file, you will want to use the unix awk command. On linux, the awk command is often provided by gawk (GNU awk) or mawk, but for this tutorial I will use the unix name for the command. Let's say that we want to see all files that were accessed whose names end with the text "_app". We don't want to see the whole rows, only column number 4 in the file (the filename). In order to do this, we will use both grep and awk, and pipe the output from one command to the other. To find a line that ends with a certain text, we put the regular expression character $ at the end of the text. See the following example:

$ grep "_app$" testfile | awk '{print $4}'
/usr/local/bin/one_app
/usr/local/bin/two_app
/usr/local/bin/one_app
/usr/local/bin/two_app
/usr/local/bin/two_app

We use the | character to pipe the output of the grep command to the input of the awk command. To print the fourth column of the input data, we give awk the script contents '{print $4}'. If we want to display the user name as well, we can also print column $1. The comma between $1 and $4 tells awk to put a space between the two columns; without it, the two values would be printed run together.

$ grep "_app$" testfile | awk '{print $1, $4}'
root /usr/local/bin/one_app
root /usr/local/bin/two_app
user1 /usr/local/bin/one_app
user2 /usr/local/bin/two_app
user2 /usr/local/bin/two_app
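
awk can also do the matching by itself, which saves the separate grep step. The sketch below is an alternative to the pipeline above, not a replacement for it: app_rows is a hypothetical helper name, and "testfile" is assumed to be in the current directory. The pattern in front of the braces is tested against column 4 only, and the action in braces runs only for rows where it matches.

```shell
#!/bin/sh
# app_rows is a hypothetical helper name used only for this illustration.
# Usage: app_rows LOGFILE
app_rows() {
	# $4 ~ /_app$/ : true when column 4 ends with "_app".
	# { print $1, $4 } : for those rows, print the user and the file name.
	awk '$4 ~ /_app$/ { print $1, $4 }' "$1"
}

# Demo against the tutorial's sample file, if it is in the current directory.
if [ -f testfile ]; then
	app_rows testfile
fi
```

With the sample file this prints the same five rows as the grep and awk pipeline above.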

sort

Another common need in scripts is the ability to sort input. Luckily unix has a sort command. All you need to do is pipe your output to the sort command and it will be sorted. If you want to see all files (column 4) in the file and you want them sorted, use the following command:

$ awk '{print $4}' testfile | sort
/usr/local/bin/one_app
/usr/local/bin/one_app
/usr/local/bin/two_app
/usr/local/bin/two_app
/usr/local/bin/two_app
/usr/local/src/file.c
/var/logs/approot.log
/var/logs/approot.log
/var/logs/system.log
/var/logs/system.log

Another common need related to sorting is to get only unique items. The unix sort command has a flag -u that tells it to only display unique items. Let's use this command to only see unique file names.

$ awk '{print $4}' testfile | sort -u
/usr/local/bin/one_app
/usr/local/bin/two_app
/usr/local/src/file.c
/var/logs/approot.log
/var/logs/system.log
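
A closely related need is counting how often each item appears. One common way to do this, shown here as a sketch only, is to pipe sorted output through the unix uniq command with its -c flag, which collapses adjacent duplicate lines and prefixes each with a count. The function name count_files is made up for this example, and "testfile" is assumed to be in the current directory.

```shell
#!/bin/sh
# count_files is a hypothetical helper name used only for this illustration.
# Usage: count_files LOGFILE
count_files() {
	# 1. Print column 4 (the file name) of every row.
	# 2. Sort so that identical names end up on adjacent lines.
	# 3. uniq -c replaces each run of identical lines with "count name".
	# 4. Sort by the count, largest first, then by name for equal counts.
	awk '{ print $4 }' "$1" | sort | uniq -c | sort -k1,1nr -k2,2
}

# Demo against the tutorial's sample file, if it is in the current directory.
if [ -f testfile ]; then
	count_files testfile
fi
```

Each output line is a count followed by a file name, starting with "/usr/local/bin/two_app" (3 accesses). The amount of padding in front of the count differs between uniq implementations.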

Using these commands in a script

All of these commands can be used inside your scripts, and together they make a very powerful toolset for developing programs in unix. As an example, let's write a script that uses our current data file. The script will get all users that are in the file and then display how many files each user accessed. We will also have the script get all files in the file and display how many times each file was accessed. There are much more efficient ways of doing these specific tasks, but this version better shows how you can use these commands in a script.

#!/bin/sh

# First let's get the list of unique users:
USERS=`awk '{print $1}' testfile | sort -u`

echo "Users:"

# Now loop through each user.
for USER in $USERS
do
	# Get the number of lines that start with the user name. The space
	# after the name keeps "user1" from also matching a name like "user10".
	NUM=`grep -c "^$USER " testfile`

	echo " - $USER: $NUM files accessed."
done

# Now let's get the list of unique files:
FILES=`awk '{print $4}' testfile | sort -u`

echo ""
echo "Files:"

# And loop through each file.
for FILE in $FILES
do
	# Get the number of lines that end with the file name.
	NUM=`grep -c "$FILE$" testfile`

	echo " - $FILE: $NUM accesses."
done

Notice that we use the command line parameter -c for grep. This returns a count of matching rows instead of the rows themselves. Another thing to notice is that we are reading the whole file 2 + num_users + num_files times: once for each of the two awk commands, once per user and once per file. In our case, that's 2 + 3 + 5 = 10 times. A smarter program would read the file once and collect everything it needs in that single pass, because I/O calls on a system (such as reading a file) are generally much slower than reading from memory (variables).
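
One way to read the file only once is to let awk keep the counts in its own arrays. The following is a sketch under some assumptions, not the tutorial's script: summarize_log is a hypothetical helper name, the output format ("user" or "file", then the name, then the count) is made up for this example, and "testfile" is assumed to be in the current directory. awk arrays can be indexed by strings, so each user name and file name gets its own counter.

```shell
#!/bin/sh
# summarize_log is a hypothetical helper name used only for this illustration.
# Usage: summarize_log LOGFILE
summarize_log() {
	awk '
		# Runs once for every row: bump the counter for this user and this file.
		{ users[$1]++; files[$4]++ }

		# Runs once after the last row: print every counter that was collected.
		END {
			for (u in users) print "user", u, users[u]
			for (f in files) print "file", f, files[f]
		}
	' "$1" | sort   # awk does not promise any order for "for (x in array)"
}

# Demo against the tutorial's sample file, if it is in the current directory.
if [ -f testfile ]; then
	summarize_log testfile
fi
```

The log is read a single time by awk; the final sort only sees the handful of summary lines, not the log itself.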

Now let's save the script as test.sh, make it executable with chmod +x test.sh if it is not already, and run it. We get the following output:

$ ./test.sh
Users:
 - root: 4 files accessed.
 - user1: 3 files accessed.
 - user2: 3 files accessed.

Files:
 - /usr/local/bin/one_app: 2 accesses.
 - /usr/local/bin/two_app: 3 accesses.
 - /usr/local/src/file.c: 1 accesses.
 - /var/logs/approot.log: 2 accesses.
 - /var/logs/system.log: 2 accesses.
