Full Circle Magazine FR


Last month I received an email from John, a reader of C&C. He had turned to me for advice on using Sed to insert semi-colons into the text file created by Task Warrior. He wanted to do this so that the conkytext script could format the To-Do list nicely for his Conky. Included in the email was the file as created by Task Warrior. We then spent a couple of days putting together a functioning Sed script (and going through a few format changes), and the end result was an excellent basis for an article. Hopefully, by the end of this article, the reader will have an idea of how to approach Sed expressions in order to tackle tasks that may at first seem complex.

The Task

We want to add a semi-colon after the contents of every column (in the text shown top right, ignoring the white space). As you can imagine, the fact that the number of spaces varies makes this a difficult task. Also, the last line (tasks) is supposed to be preceded by three semi-colons (“;;;10 tasks”). After our first attempt, John came back to me and told me he'd decided to leave the first column semi-colon-less (shown above).

My Script

Because the script is rather long (it offers extra functionality such as supporting some arguments and outputting to a file), I've put it up on pastebin: http://pastebin.com/SHTVjDTM.

The Thought Process

There are a few things worth noting before we begin:
• The typical format of a sed command is: sed s/<search>/<replace>/g Sed calls replace “substitute”, hence the s at the beginning. The left hand side (LHS) is the search section – here you declare what it is you're trying to match. The right hand side (RHS) is the replace section – here you tell Sed what the matched text should look like afterwards. The “g” at the end tells Sed to replace all instances on each line (it would otherwise stop after the first match on the line).
• Putting anything in \(\) will allow you to refer back to it on the RHS of the expression. There are certain special characters that can be used in sed. We mainly need the “\s” expression, which stands for any whitespace character (a GNU sed extension).
• Declaring a set number of repetitions can be done with: \{3\} for 3 repetitions, \{3,\} for three or more, and \{3,6\} for three to six.
• The expressions below escape the semi-colon (\;). GNU sed also accepts a plain semi-colon inside an s command, so the backslash is a precaution rather than a strict requirement.
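The notes above can be tried out one at a time. The following is a minimal sketch, assuming GNU sed (for \s); the sample strings and variable names are invented for illustration and do not come from John's file.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; all sample strings are invented.

# Without the trailing g, only the first match on each line is replaced.
first_only=$(printf 'a b a b\n' | sed 's/a/X/')

# With g, every match on the line is replaced.
all_matches=$(printf 'a b a b\n' | sed 's/a/X/g')

# \(...\) captures a group; \1 and \2 refer back to the groups on the RHS.
swapped=$(printf 'hello world\n' | sed 's/\(hello\) \(world\)/\2 \1/')

# \s\{3,\} matches a run of three or more whitespace characters.
squeezed=$(printf 'ID   Project\n' | sed 's/\s\{3,\}/ /')

printf '%s\n' "$first_only" "$all_matches" "$swapped" "$squeezed"
```

Single quotes are used around each expression so that the shell passes the backslashes to sed untouched.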

Some tips as to how I decide on each expression:
• Figure out where you need to insert the character, as that defines where you group (in our case before the spaces, hence the second group almost always starts at the space character).
• Work bit-by-bit. Start with a simple sed command like: sed -e "s/^[0-9]*/FC/g" (FC for first column). This replaces any digits at the very start of a line with “FC” (and, because * also matches zero digits, puts “FC” at the start of every other line), so you can visually check what is being matched. Doing so let me realize that all single-digit IDs started with a space, and helped form an expression for it. It's not included in the actual file, since our end formatting has changed since then. Once you have a working command for the task you outlined, you can move on to a second expression.
• If you have issues with the previous tip because you can't get the regular expressions working, try using grep and the same regular expression. If grep matches what you expect, the expression itself is not wrong, and the problem is a quirk of Sed's you haven't accounted for yet.
• If you want the same formatting at the end, the RHS of the expression should almost always be the same. If it isn't, it's an indicator that you're either making things too complicated, or the chunk you're working on is too big, so try to break it down some more.
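The bit-by-bit approach could look something like the sketch below. It assumes GNU sed and grep; the two sample lines and the follow-up pattern ^ [0-9] are invented for illustration and are not taken from the final script.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed and grep; the two sample lines are invented.
sample=$(printf '%s\n' '12 Home' ' 3 Work')

# Step 1: mark what the first attempt matches. The single-digit ID keeps its
# digit, which reveals that the line actually starts with a space.
marked=$(printf '%s\n' "$sample" | sed -e 's/^[0-9]*/FC/g')

# Step 2: check a new pattern with grep before using it in sed.
pattern='^ [0-9]'
hits=$(printf '%s\n' "$sample" | grep -c "$pattern")

# Step 3: reuse the same pattern in a sed substitution.
remarked=$(printf '%s\n' "$sample" | sed -e "s/$pattern/FC/")

printf '%s\n' "$marked" "$hits" "$remarked"
```

Here grep -c reports how many lines the pattern matches, which is a quick way to confirm the expression before worrying about the replacement side.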

The Expressions

first_expression="s/\([a-zA-Z0-9]\)\(\s\{2,15\}\)/\1\;\2/g"

second_expression="s/\([0-9]\{3\}\)\(\s[a-zA-Z0-9]\)/\1\;\2/g"

third_expression="s/\([a-zA-Z]\)\(\s[0-9]\{1,2\}\/\)/\1\;\2/g"

fourth_expression="s/\(^[0-9]*\stasks\)/\;\;\;\1/g"

fifth_expression="s/\(^[A-Z]*\)\(\s*[a-zA-Z]\)/\1\;\2/g" # Check for any number of capital letters at the start of a line, followed by a space and more text, and insert a semicolon.
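One way to apply all five expressions in order is to pass each to sed with its own -e option. This is only a sketch, assuming GNU sed; the three-line sample is a made-up stand-in for Task Warrior output (the real file is not reproduced here), and the pastebin script may well be organised differently.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. The input below is an invented stand-in
# for Task Warrior output, not John's actual file.
first_expression='s/\([a-zA-Z0-9]\)\(\s\{2,15\}\)/\1\;\2/g'
second_expression='s/\([0-9]\{3\}\)\(\s[a-zA-Z0-9]\)/\1\;\2/g'
third_expression='s/\([a-zA-Z]\)\(\s[0-9]\{1,2\}\/\)/\1\;\2/g'
fourth_expression='s/\(^[0-9]*\stasks\)/\;\;\;\1/g'
fifth_expression='s/\(^[A-Z]*\)\(\s*[a-zA-Z]\)/\1\;\2/g'

# Each -e expression is applied to every line, in the order given.
result=$(sed -e "$first_expression" -e "$second_expression" \
             -e "$third_expression" -e "$fourth_expression" \
             -e "$fifth_expression" <<'EOF'
ID Project  Due  Description
 1 Home  12/3/2011 Buy milk
2 tasks
EOF
)

printf '%s\n' "$result"
```

With this particular sample the third expression never fires, because the first expression has already placed a semi-colon after the project name.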

The explanations

The first expression tells Sed “Look for any character (a-z, A-Z, or 0-9), and see if it's followed by 2 or more spaces, then add a semi-colon before the spaces”. The trick to this is knowing that Sed can group matches to the regular expressions. This is why we have escaped parentheses around the expressions. “\([a-zA-Z0-9]\)” then becomes match “\1” in the replacement section of Sed. We are essentially forming two groups – the character that precedes the spaces, and the spaces themselves. Then, in the replacement step, we're inserting a semi-colon between the two groups. This corresponds to column 2 and column 4 in our file, as well as all the headers except ID. The reason why ID isn't included is that we state 2 or more spaces, and changing that to one or more would cause issues in all the descriptions. Note: The semi-colon is escaped (has a backslash in front of it) throughout; GNU sed also accepts it unescaped inside an s command. Also, if you want to match more than 15 spaces, simply leave that side of the comma empty - \{2,\}.
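To see why “2 or more” matters, the first expression can be compared against a variant that accepts a single space. A minimal sketch, assuming GNU sed; the sample line is invented.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; the sample line is invented.
line='3 Home  Pay the bills   H'

# As in the article: a semi-colon only where two or more spaces follow.
two_or_more=$(printf '%s\n' "$line" | sed 's/\([a-zA-Z0-9]\)\(\s\{2,15\}\)/\1\;\2/g')

# Variant with one or more spaces: every word in the description is hit too.
one_or_more=$(printf '%s\n' "$line" | sed 's/\([a-zA-Z0-9]\)\(\s\{1,15\}\)/\1\;\2/g')

printf '%s\n' "$two_or_more" "$one_or_more"
```

The second result shows the problem described above: the description “Pay the bills” is broken up by semi-colons.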

The second expression tells Sed “Look for any 3 consecutive digits that are followed by a space and a letter or number, then insert a semi-colon”. What this matches is the date – the format of the date is always going to be so long that only one space is inserted between columns. Naturally, you could check for any number of spaces, but that could cause issues if you use numbers in your Projects. This will apply to any format of date where the year is at the end. This handles column 3 in our file.
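The behaviour of the second expression can be checked on a single line. A sketch, assuming GNU sed; both sample lines are invented, and the two-digit-year case is included only to show what the “3 consecutive digits” requirement implies.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; the sample lines are invented.
second_expression='s/\([0-9]\{3\}\)\(\s[a-zA-Z0-9]\)/\1\;\2/g'

# A four-digit year ends in three digits followed by a space and a letter.
long_year=$(printf 'Home 12/3/2011 Pay bills\n' | sed -e "$second_expression")

# A two-digit year never contains three consecutive digits, so nothing changes.
short_year=$(printf 'Home 12/3/11 Pay bills\n' | sed -e "$second_expression")

printf '%s\n' "$long_year" "$short_year"
```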

The third expression can be translated as “Find all letters followed by a space and a 1 or 2 digit number, followed by a slash, and insert the semi-colon after the letter.” The only column that contains a slash is our formatted date column – this applies therefore to the column before it (Project). The reason why I didn't include numbers in this case is that the second expression could handle this if you tell Sed to accept any number of spaces after the 3 digits. This handles column 2 in our file.
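A quick check of the third expression, again as a sketch assuming GNU sed with invented sample lines. The project name “Project2” is hypothetical and is there to show the effect of leaving digits out of the first group.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; the sample lines are invented.
third_expression='s/\([a-zA-Z]\)\(\s[0-9]\{1,2\}\/\)/\1\;\2/g'

# A project name ending in a letter, one space, then the date.
letter_end=$(printf 'Home 12/3/2011 Pay bills\n' | sed -e "$third_expression")

# A project name ending in a digit is not matched by [a-zA-Z].
digit_end=$(printf 'Project2 12/3/2011 Pay bills\n' | sed -e "$third_expression")

printf '%s\n' "$letter_end" "$digit_end"
```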

The fourth expression handles the last line of the file, inserting the 3 semi-colons before tasks. It essentially groups the entire line (10 tasks) and then inserts three semi-colons before that group. If you're adding semi-colons before any lines starting with numbers, then you should move this expression to the start of the list of expressions, so that it runs before the other expression changes the line and stops it from matching.
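The ordering point can be illustrated with a second rule that prefixes lines starting with a digit. That rule (digit_prefix) is hypothetical and not part of the article's script; the sketch assumes GNU sed.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed.
fourth_expression='s/\(^[0-9]*\stasks\)/\;\;\;\1/g'
# Hypothetical extra rule, not from the article: put a semi-colon in front
# of any line that starts with a digit.
digit_prefix='s/^\([0-9]\)/\;\1/'

# The fourth expression on its own.
footer_only=$(printf '10 tasks\n' | sed -e "$fourth_expression")

# Fourth expression first: it matches, and the extra rule no longer applies.
fourth_first=$(printf '10 tasks\n' | sed -e "$fourth_expression" -e "$digit_prefix")

# Extra rule first: the line now starts with ";", so ^[0-9]* no longer matches.
fourth_last=$(printf '10 tasks\n' | sed -e "$digit_prefix" -e "$fourth_expression")

printf '%s\n' "$footer_only" "$fourth_first" "$fourth_last"
```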

The fifth expression simply states “Find the line that starts with any number of capital letters, and insert a semi-colon afterwards”. I go a little more specific, and state “followed by any number of spaces and a letter”. However, it's not necessary in our example, and is simply there to be a bit more robust.
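A short check of the fifth expression on a header-like line and on a task row, as a sketch assuming GNU sed; both sample lines are invented.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed; the sample lines are invented.
fifth_expression='s/\(^[A-Z]*\)\(\s*[a-zA-Z]\)/\1\;\2/g'

# Header: capitals at the start of the line, then a space and a letter.
header=$(printf 'ID Project\n' | sed -e "$fifth_expression")

# Task row: starts with a space and a digit, so the expression does not match.
row=$(printf ' 1 Home\n' | sed -e "$fifth_expression")

printf '%s\n' "$header" "$row"
```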

That about covers the steps I undertook in this scenario. I realize that this is a relatively specific occasion, and not everyone will want to have this exact formatting. My hope is that following my process will help you understand how to approach these sorts of problems. If there is interest, I can spend an article focusing on short formatting problems, and working through them step by step. If anyone is interested in that sort of article, please let me know via email. As always, any questions/concerns or requests can be directed to me at lswest34+fcm@gmail.com.