We have talked about quite a few things in our CnC articles so far. However, there is one that even I struggle with sometimes, mainly because I don’t get to use it often. So, as a refresher, let’s talk about grep. As an Ubuntu newbie, it will not be high on your priority list; however, it is something you will use if you ever work in a Linux-based environment. Even if you never do, it is a rather weird thing to master. When you use it, to untrained eyes, it will seem like magic.

Grep is rather basic, but if you looked at the man page, you may have seen: “grep, egrep, fgrep, rgrep - print lines that match patterns”. The command egrep is the same as grep -E as far as I know, and the ‘e’ does not mean ‘extended grep’, as I heard one of my co-workers name it. Rather, it stands for extended regular expressions, but in essence he is also right, as it is an extension of the regular grep we looked at in the last issue. (My proofreader gave me carrots about this, but I left it as is, as I’d like your opinions on the matter; you know where to post ‘em: misc@fullcirclemagazine.org. I’m not starting a flame war, I’m just gathering points of view!) The same is true for the other names you see in the man page: each one is just an easier way to type an option. It is easier to type fgrep than grep -F, and rgrep than grep -r. As a newbie, I want you to think of egrep as “understanding more” when you fire off your query. So far, we did very basic queries, and we will work our way up the chain, don’t worry!

However, I want us to step back here, just for a moment. If you don’t know what regular expressions are, grep is going to stay very basic. I mean, the name itself comes from the old ed editor command g/re/p: (g)lobally search for a (r)egular (e)xpression and (p)rint the matching lines. Once you get that, grep really shines when you take advantage of these regular expressions. Special characters let you match wider patterns than we did before. For example, you can find lines that contain a digit "[0-9]", or start with a capital letter "^[A-Z]", or end with a full stop "\.$".
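To see those three patterns at work, here is a minimal sketch you can paste into a terminal. The file name cnc-sample1.txt and its four lines are made up purely for illustration, and setting LC_ALL=C is my own assumption to keep the [A-Z] range predictable on every system.

```shell
# Keep ranges such as [A-Z] strictly ASCII, so the results are predictable
export LC_ALL=C

# Hypothetical sample file: the name and the four lines are invented for this sketch
printf '%s\n' 'Issue 200 is out' 'the end.' 'no digits here' 'Ubuntu rocks.' > cnc-sample1.txt

# Lines that contain a digit
grep '[0-9]' cnc-sample1.txt     # prints: Issue 200 is out

# Lines that start with a capital letter
grep '^[A-Z]' cnc-sample1.txt    # prints: Issue 200 is out, then Ubuntu rocks.

# Lines that end with a full stop (the backslash makes the dot literal)
grep '\.$' cnc-sample1.txt       # prints: the end., then Ubuntu rocks.
```

The patterns sit in single quotes so that the shell hands them to grep exactly as typed.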
The last of those patterns, "\.$", needed the “\” as an escape character because the dot has a special meaning in a regular expression, and the backslash tells grep to treat it as a plain full stop. The shell has special characters of its own, too. If I needed an exclamation mark in my query, I’d need to put an escape character before it, as the “!” has a special meaning in the shell (in bash, it triggers history expansion). However, instead of having like ten escape characters in a line, it's a lot easier to enclose the entire pattern within single or double quotes, depending on your purpose: single quotes pass the pattern to grep untouched, while double quotes still let the shell expand things like $variables. Then we can take it up another notch with Perl-compatible regular expressions (grep -P), but that is beyond the scope of a newbie introduction. I want you to understand where things come from so you don’t just copy/paste queries you find on the internet, but make your own.

So let’s quickly talk about these “special” characters. The full stop or dot (.) matches any single character. For example, to match lines that contain the text "f.m": grep "f.m" This will find “fcm” in our imaginary file, but it will also find “fhm” (wrong magazine!!) if it was present. The one exception to the rule is a newline: grep works one line at a time, so the dot is never going to find one. However, if I wanted to find the literal text “f.c.m”, I’d need to add the escape character, as now I want the dot specifically: grep "f\.c\.m" So “\.”, not just “.”. Is the difference clear?

The other one you will see everywhere is the asterisk (*). Careful here: in a regular expression, the asterisk (star) is not the “any characters” wildcard you know from file names in the shell. It means “zero or more of whatever comes right before it”. So my query: grep "jav*" will find “java” as well as “javascript”, but only because both contain “ja”; it would find “jam” just as happily. To say “jav followed by anything”, pair the star with the dot: grep "jav.*" You need to use ".*" wisely, as it can really slow down simple searches and should be avoided in queries that contain random letter combinations, like, say, 10000000 UUIDs or millions of transaction numbers.

In my ramblings above, you saw me use “[0-9]”; yep, we will quickly touch on the square brackets. Square brackets match any one character from the set inside them. The 0-9 is what’s known as a “range”, but you can be specific and search for [321], for instance, which matches a 3, a 2, or a 1. As you saw, this does not apply only to numbers; [a-z] is just as valid! We can even lump them all in: [a-zA-Z0-9], and now we are searching for alphanumerics.
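Here is a small sketch of the dot, the escaped dot, the star, and the square brackets, so you can watch each one behave. The file cnc-sample2.txt and its contents are hypothetical, invented only for this example. Keep an eye on the star: in a regular expression it repeats whatever sits right before it, which is why "jav*" also catches “jam”.

```shell
# Hypothetical sample file: the name and contents are invented for this sketch
printf '%s\n' 'fcm' 'fhm' 'f.c.m' 'java' 'javascript' 'jam' 'room 3' 'room 7' > cnc-sample2.txt

# The dot matches any single character
grep 'f.m' cnc-sample2.txt        # prints: fcm, then fhm

# An escaped dot matches only a real full stop
grep 'f\.c\.m' cnc-sample2.txt    # prints: f.c.m

# The star means "zero or more of the previous character" (here, the v)
grep 'jav*' cnc-sample2.txt       # prints: java, then javascript, then jam

# Dot plus star means "anything at all" after jav
grep 'jav.*' cnc-sample2.txt      # prints: java, then javascript

# Square brackets match one character out of the set
grep '[321]' cnc-sample2.txt      # prints: room 3
```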
There is a rather odd one that everyone I know forgets about, and that is the plus sign (+). It matches one or more occurrences of the previous character. If I were to query: egrep "is+" looking for “issue” (random, I know, but it is just to illustrate a point), this will match "is", "iss", "isssss", and so forth. This differs from the question mark (?), as here you want at least one match. The question mark matches zero or one occurrence of the previous character, meaning it may be skipped. If I were looking for “fcm” and I mistakenly used the question mark in my query, like so: egrep "f?m", the result “fm” is valid. Is the difference clear?

Now, one of the things you learn early on in Linux is to “pipe” (|) one command’s output to another, but it is also the symbol used for “or” in some languages. Some double up, and you may see “||” and “&&” for “or” and “and”. The “or” principle is also valid in a query: egrep "bob|alice", and now we are searching for bob or alice. (Yes, don’t ask why the cryptography lecture is now incorporated here, but it is what it is.) So if we were to repeat our mistake above, egrep "f?m|F?M", what are some of the possible outcomes? Answers in an e-mail to misc@fullcirclemagazine.org. So remember that, inside a pattern, the pipe symbol is also known as the OR operator. It matches the entire expression before or after the pipe, not just the character before it. Keep the pattern in quotes, though, or the shell will treat it as a real pipe.

Well, that’s my time again in the magazine, so we will continue this in the next issue. Keep practising!
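If you want something to practise on, here is a minimal sketch of the plus sign, the question mark, and the OR operator. The file cnc-sample3.txt and the words in it are made up for illustration, and I use grep -E here, which should behave the same as egrep. It deliberately leaves the "f?m|F?M" question for you to work out.

```shell
# Hypothetical sample file: the name and contents are invented for this sketch
printf '%s\n' 'i' 'is' 'issue' 'bob' 'alice' 'carol' 'colour' 'color' > cnc-sample3.txt

# Plus: an i followed by at least one s
grep -E 'is+' cnc-sample3.txt         # prints: is, then issue

# Question mark: the s is optional, so any line with an i matches
grep -E 'is?' cnc-sample3.txt         # prints: i, then is, then issue, then alice

# Question mark again: the u may be there or not
grep -E 'colou?r' cnc-sample3.txt     # prints: colour, then color

# Pipe inside the quotes: match either whole expression
grep -E 'bob|alice' cnc-sample3.txt   # prints: bob, then alice
```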