
QOTW: Regex pain

April 29th, 2007 in Quotes

“Regex is Latin for ‘saw off thine limbs’” (Delvan, via bash.org)

String matching engines compared

January 24th, 2007 in Links

A comparison of two string matching engines: Perl 5's regex engine and a Thompson NFA. Note that the NFA timings are measured in microseconds (and the Perl timings in milliseconds).
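
To make the difference concrete, here is a minimal sketch of the lockstep idea behind a Thompson-style NFA matcher, written in Python. It supports only a tiny regex subset: literal characters, ".", and the postfix quantifiers "?" and "*". There are no groups or alternation, and it does whole-string matching. The function names (parse, closure, nfa_match) are hypothetical. Instead of backtracking through one path at a time, it keeps the set of every state the input could be in and advances them all together on each character.

```python
# A minimal sketch of lockstep NFA simulation in the spirit of Thompson's
# approach. Assumption: a tiny regex subset (literal characters, '.', and the
# postfix quantifiers '?' and '*'; no groups or alternation), matched against
# the whole input.


def parse(pattern):
    """Split the pattern into (atom, quantifier) pairs; quantifier is '', '?' or '*'."""
    tokens = []
    for ch in pattern:
        if ch in "?*":
            if not tokens or tokens[-1][1]:
                raise ValueError("quantifier with nothing to repeat in " + repr(pattern))
            tokens[-1] = (tokens[-1][0], ch)
        else:
            tokens.append((ch, ""))
    return tokens


def closure(states, tokens):
    """Add every state reachable without consuming input: an atom quantified
    with '?' or '*' may be skipped entirely."""
    result = set(states)
    stack = list(states)
    while stack:
        i = stack.pop()
        if i < len(tokens) and tokens[i][1] in ("?", "*") and i + 1 not in result:
            result.add(i + 1)
            stack.append(i + 1)
    return result


def nfa_match(pattern, text):
    """Return True if pattern matches all of text.

    State i means "tokens[:i] have been matched"; state len(tokens) accepts.
    Every live state advances together on each input character, so the work
    is O(len(pattern) * len(text)) and nothing is ever backtracked."""
    tokens = parse(pattern)
    current = closure({0}, tokens)
    for ch in text:
        step = set()
        for i in current:
            if i == len(tokens):
                continue  # the accepting state has no outgoing transitions
            atom, quant = tokens[i]
            if atom == "." or atom == ch:
                # '*' may consume again, so stay on the same atom; otherwise move on
                step.add(i if quant == "*" else i + 1)
        current = closure(step, tokens)
        if not current:
            return False  # no live states left, so no match is possible
    return len(tokens) in current
```

The classic stress case for backtracking engines is "a?" repeated n times followed by "a" repeated n times, matched against n a's. A backtracker may try exponentially many ways to satisfy the optional a's. Here, with n = 30, nfa_match("a?" * n + "a" * n, "a" * n) takes just n character steps, each over at most 2n + 1 states.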

Google’s code repo

October 5th, 2006 in Links

Google Code Search, a full-text (with regex) search of their new public project repository.

Use it or lose it

April 20th, 2006 in Programming. Weblog

I’m forgetting my Perl already.

I’ve been buried in and land for a few months now, and today one of our younger developers asked me a simple Perl/regex question. I was stumped. I had misplaced my memories of the differences between regex in Perl, PHP, grep, and others (they were somehow all jumbly in my head). I happened upon an answer; it stumbled out of my mind, and only by chance was it correct. I realized that I was forgetting Perl, one of my favorite languages.

Luckily, forgetting something you know isn’t at all like not knowing something you should, and with some practice my skill popped back to the surface.

Tonight I whipped up a few scripts to extract several dozen constants from some C++ sources, to generate some SQL statements. At first my Perl memories trickled back into view (I confused a few small things like C++’s continue for Perl’s next), but eventually it all came back and I was able to hack several dozen lines of useful script.
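
The post doesn't show those scripts or the C++ they read, so the following is only a sketch of the same kind of job, in Python rather than Perl. It assumes the constants look like #define NAME 123 or const int NAME = 123; (optionally static). It writes INSERT statements into a hypothetical constants table with name and value columns, and the function name constants_to_sql is likewise made up.

```python
import re

# Assumed input formats (the post doesn't show its C++ sources):
#   #define NAME 123
#   const int NAME = 123;        (optionally preceded by "static")
DEFINE = re.compile(r"^\s*#define\s+(\w+)\s+(-?\d+)\b")
CONST = re.compile(r"^\s*(?:static\s+)?const\s+int\s+(\w+)\s*=\s*(-?\d+)\s*;")


def constants_to_sql(source, table="constants"):
    """Return one INSERT statement per integer constant found in source.

    The table name and its (name, value) columns are hypothetical."""
    statements = []
    for line in source.splitlines():
        m = DEFINE.match(line) or CONST.match(line)
        if not m:
            continue  # Python's continue, like Perl's next: skip non-constant lines
        name, value = m.groups()
        # name matched \w+ and value is an int, so plain formatting is safe here
        statements.append(
            f"INSERT INTO {table} (name, value) VALUES ('{name}', {int(value)});"
        )
    return statements
```

For example, constants_to_sql("#define MAX_USERS 64") returns a single statement, INSERT INTO constants (name, value) VALUES ('MAX_USERS', 64);. Function-like macros and non-const variables are skipped.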

I have to remember to take time to practice with the tools of the trade; otherwise they may start to disappear from my limited human consciousness.

Regex grapher

March 1st, 2006 in Links

A very cool regular expression automata visualizer that graphs both the non-deterministic and deterministic finite-state automata for a given regex. Not only does the tool graph the regex, it animates the execution of the regex as you type a string for it to match.

Regex of the day: Optional HTML tags

November 11th, 2005 in Perl. Weblog

It’s one of those regex-laden days, and I’m really starting to write more complicated expressions:

^(?:\s+|)<(\w+)(?:\s+|)((?:.*?|))>(.*?)(?:<\/(.*?)>|)(?:\s+|)$

This expression parses a line containing HTML tags, expecting that:

  1. There will be a start tag near the beginning of the line, possibly padded on the left with spaces that are ignored
  2. The opening tag may contain some HTML attributes
  3. There may be a closing tag on the line
  4. There may be spaces on the right of the closing tag that will be ignored

The expression will parse the following example into 4 parts:

<h1 id="test">This is a test</h1>
  1. h1
  2. id="test"
  3. This is a test
  4. h1

Learning regex to the point of being able to write complex expressions has taken a couple of years, but it has been well worth the effort. Defining the same parsing logic in C or C++ (using standard mechanisms) would take 20-30 minutes and occupy a page of code. You just have to remember that a regex is a small script, and that it should be tested (and documented) like one.
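
In that spirit, here is one way to document and test the expression, sketched in Python. Its re module accepts the expression exactly as posted. The annotated copy uses re.VERBOSE comments and swaps (?:\s+|) for the equivalent \s* and ((?:.*?|)) for (.*?). The checks confirm both versions split lines the same way. The helper parse_tag_line and the extra sample lines are hypothetical.

```python
import re

# The expression exactly as posted (Perl syntax, which Python's re accepts here).
ORIGINAL = re.compile(r"^(?:\s+|)<(\w+)(?:\s+|)((?:.*?|))>(.*?)(?:<\/(.*?)>|)(?:\s+|)$")

# The same logic in re.VERBOSE form so every piece carries its own comment.
# (?:\s+|) is written as the equivalent \s*, and ((?:.*?|)) as (.*?).
DOCUMENTED = re.compile(r"""
    ^\s*            # leading spaces before the start tag are ignored
    <(\w+)          # group 1: the tag name, e.g. h1
    \s*(.*?)>       # group 2: optional attributes, e.g. id="test"
    (.*?)           # group 3: the text after the start tag
    (?:</(.*?)>)?   # group 4: the closing tag's name, if there is one
    \s*$            # trailing spaces are ignored
""", re.VERBOSE)


def parse_tag_line(line, pattern=DOCUMENTED):
    """Return the four groups as a tuple, or None if the line doesn't match.
    Group 4 is None when there is no closing tag (Perl would give undef)."""
    m = pattern.match(line)
    return m.groups() if m else None


# A few checks, run like any other script's tests.
EXAMPLES = {
    '<h1 id="test">This is a test</h1>': ("h1", 'id="test"', "This is a test", "h1"),
    "   <p>no closing tag   ": ("p", "", "no closing tag", None),
    "plain text": None,
}
for line, expected in EXAMPLES.items():
    assert parse_tag_line(line, ORIGINAL) == expected, line
    assert parse_tag_line(line, DOCUMENTED) == expected, line
```

Keeping the original pattern next to the annotated one means any later "simplification" is checked against the behavior that was actually in use.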

Regex is one of many little languages (like SQL, bash, and m4). It’s terribly useful, succinct, and worth having in your toolkit. It’s not something to hide in layers of abstraction, either; rather, it deserves use alongside your ‘real’ tools. I find that developers are in the habit of hiding (or hiding from) little languages, which results in the too-many-elbows syndrome: insulating yourself from the real power of your tools and making things more complicated in the process.

Simple, in the end, is in the knowledge of the beholder. If you understand regex, code that contains it can be simpler.

Mining For Expressions

November 11th, 2005 in Perl. Weblog

I always forget a few things about regex. Regexes are a large specification; I only ever use a portion of it at any one time, and I’m forgetful (there’s just too much to remember all at once).

Luckily, tools allow me to easily find nuggets of previous knowledge. Today’s example:

find -name "*pl" -exec grep '=~' {} \;
  1. find -name "*pl": look through all my old Perl scripts
  2. -exec grep: search in the scripts
  3. '=~': look for lines with regular expressions
  4. {}: in the files we found

In the five seconds it takes me to type the find command, I have a full list of thousands of regular expressions. I can forget most of what I know, as long as what I’ve done is stored in text on a system with good text tools.
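
The same mining trick can be sketched in Python for a system without find and grep at hand. This is an illustration, not the command above. The hypothetical helper mine_expressions works like find -name "*pl": it checks every file under a directory whose name ends in pl. Like grep =~, it keeps each line that contains =~. It also records the file name and line number, which the one-liner as written doesn't print.

```python
import os


def mine_expressions(root, suffix="pl", marker="=~"):
    """Walk root (like find) and return (path, line_no, line) for every line
    containing marker (like grep) in files whose names end with suffix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue  # mirrors -name "*pl": only names ending in "pl"
            path = os.path.join(dirpath, name)
            # errors="replace" keeps old scripts in odd encodings from aborting the walk
            with open(path, encoding="utf-8", errors="replace") as fh:
                for line_no, line in enumerate(fh, start=1):
                    if marker in line:
                        hits.append((path, line_no, line.rstrip("\n")))
    return hits
```

For example, mine_expressions(".") lists every =~ line in the Perl scripts below the current directory, ready to be printed or searched further.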