Regex of the day: Optional HTML tags

November 11th, 2005 in Perl. Weblog

It’s one of those regex-laden days, and I’m really starting to write more complicated expressions:

^(?:\s+|)<(\w+)(?:\s+|)((?:.*?|))>(.*?)(?:<\/(.*?)>|)(?:\s+|)$

This expression parses a line that contains tags based on the following logic, expecting that:

  1. There will be a start tag near the beginning of the line, possibly padded on the left with spaces that are ignored
  2. The opening tag may contain some HTML parameters
  3. There may be a closing tag on the line
  4. There may be spaces on the right of the closing tag that will be ignored

The expression will parse the following example into 4 parts:

<h1 id="test">This is a test</h1>
  1. h1
  2. id="test"
  3. This is a test
  4. h1

Learning regex to the point of being able to write complex expressions has taken a couple of years, but has been well worth the effort. To define the same parsing logic in C or C++ (using standard mechanisms) would take 20-30 minutes, and would occupy a page of code. You just have to remember that a regex is a small script, and that it should be tested (and documented) like one.
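
One way to treat the expression like a small script, sketched here in Perl: the /x modifier lets the same pattern be spread over several lines with a comment on each piece, and a small helper makes it easy to test. The names $tag_line and parse_tag_line are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The expression from the post, only split across lines with /x
# so that each piece can carry a comment.
my $tag_line = qr{
    ^(?:\s+|)          # optional leading whitespace
    <(\w+)             # 1: tag name
    (?:\s+|)           # optional whitespace after the name
    ((?:.*?|))         # 2: parameters, possibly empty
    >
    (.*?)              # 3: content
    (?:<\/(.*?)>|)     # 4: name in the optional closing tag
    (?:\s+|)           # optional trailing whitespace
    $
}x;

# Return the four captures for a line, or an empty list if it does not match.
sub parse_tag_line {
    my ($line) = @_;
    return $line =~ $tag_line ? ($1, $2, $3, $4) : ();
}

my @parts = parse_tag_line('<h1 id="test">This is a test</h1>');
print join("|", map { defined $_ ? $_ : "(none)" } @parts), "\n";
# prints: h1|id="test"|This is a test|h1
```

A line with no closing tag still matches; the fourth capture is simply undefined in that case.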

Regex is like a lot of little languages too (like SQL, bash, m4). It’s terribly useful, succinct, and worth having in your toolkit. It’s not something to hide in layers of abstraction either, rather it’s something that deserves use alongside your ‘real’ tools. I find that developers are in the habit of hiding (or hiding from) little languages, something that results in the too-many-elbows syndrome: insulating yourself from the real power of your tools, making things more complicated in the process.

Simple, in the end, is in the knowledge of the beholder. If you understand regex, code that contains it can be simpler.

Mining For Expressions

November 11th, 2005 in Perl. Weblog

I always forget a few things about regex. The regex specification is large, I only ever use a portion of it at any one time, and I’m forgetful (there’s just too much to remember all at once).

Luckily, tools allow me to easily find nuggets of previous knowledge. Today’s example:

find -name "*pl" -exec grep =~ {} \;
          1              2  3  4
  1. Look through all my old Perl scripts
  2. Search in the scripts
  3. Look for lines with regular expressions
  4. In the files we found

In the five seconds it takes me to type the find command, I have a full list of thousands of regular expressions. I can forget most of what I know, as long as what I’ve done is stored in text on a system with good text tools.
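
For systems without find and grep to hand, roughly the same search can be sketched in Perl itself with the core File::Find module. The helper name regex_lines is hypothetical, and unlike -name "*pl" this version only looks at regular files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Collect every line containing "=~" from files whose names end in "pl"
# anywhere under $dir.
sub regex_lines {
    my ($dir) = @_;
    my @hits;
    find({
        no_chdir => 1,                      # keep $_ as the full path
        wanted   => sub {
            return unless -f $_ && /pl$/;   # regular files named like *pl
            open(my $fh, '<', $_) or return;
            while (my $line = <$fh>) {
                push @hits, $line if index($line, '=~') >= 0;
            }
            close $fh;
        },
    }, $dir);
    return @hits;
}

# Usage: perl mine.pl DIR [DIR ...]
print regex_lines($_) for @ARGV;
```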

Replace in files

December 1st, 2004 in Perl. Weblog

A great one-liner:

perl -pi -e 's@foo@bar@g' *txt

This replaces instances of foo with bar in text files found in the current directory.

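The -p and -i options together emulate the behaviour of tools like sed. The -p option wraps your code in a while loop that reads each line of the files named on the command line and prints it after your code has run; the -i option writes the result back to each file in place. For more details, see the perlrun man page.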
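
Written out as a script, the one-liner looks roughly like the sketch below. The helper name replace_in_files is made up for illustration, and unlike the one-liner it treats the search text as a literal string (via \Q...\E) rather than as a regex.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every literal $from with $to in the named files, in place.
sub replace_in_files {
    my ($from, $to, @files) = @_;
    return unless @files;                # with no files, <> would read STDIN
    local ($^I, @ARGV) = ('', @files);   # $^I is what -i sets; '' means no backup copy
    while (<>) {                         # the loop that -p supplies
        s/\Q$from\E/$to/g;
        print;                           # -p prints each line after the code runs
    }
    return;
}

# Usage: perl replace.pl FILE [FILE ...]
replace_in_files('foo', 'bar', @ARGV);
```

Setting $^I to something like '.bak' instead of the empty string keeps a backup of each original file, which is the same as running perl with -pi.bak.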

Calling Perl from Php

May 20th, 2004 in Howto. Perl

I’m working on extending Textpattern with some scripts I developed for Blosxom. Textpattern is a Php-based weblogging tool, and my scripts are all Perl-based plugins and command-line utilities. I don’t really want to port the scripts to Php, so I decided to find a way to call Perl from Php.

In a few minutes of searching, I only found one Perl binding for Php. It isn’t considered stable, and it isn’t available from my web host (Dreamhost). I did find an answer in the Php manual, but a googling on the specific topic of calling Perl from Php came up dry. So I decided to make it a bit more obvious.

Pipe dreams

All of my scripts are simple text processors, so the inputs and outputs can be passed using the stdin/stdout pipes. Using both the input and output pipes of a process is a bit more than the standard Php exec functions can handle, so we’ll be using the proc function family. Nearly every language has a set of these functions (and that’s a good thing).

You can download the demo script here. The script will need to be executable, it needs the correct path to the Perl interpreter, and the log file folder needs to be writable by the web-server process.

The Perl script reads text from stdin, and replaces spaces with underscores. From Php, we can call the Perl script and manage the standard in, out, and error pipes. The example is slightly modified from the Php manual.

The basic process:

  1. Define what to do with the process’s pipes. Notice that the error pipe is mapped to a log file (which is appended on each call).
  2. Define some text to test with the Perl script.
  3. Open the process, which is our Perl script. This will fail if the script can’t be found, or if it isn’t executable.
  4. Write the test text to the input pipe of the Perl script.
  5. Read the output of the Perl script.
  6. Close the process. This is best done after all of the pipes are closed (otherwise it causes deadlock).

The PHP script:

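<?php
// Describe what to do with each of the child process's pipes.
$handles = array(
   0 => array("pipe", "r"),                   // stdin: the child reads from this pipe
   1 => array("pipe", "w"),                   // stdout: the child writes to this pipe
   2 => array("file", "test-errors.txt", "a") // stderr: appended to a log file
);
$test_text = "This is a test";
$output = "";

$process = proc_open("./test.pl", $handles, $pipes);
if (is_resource($process)) {
    // Send the test text, then close stdin so the Perl script sees end-of-file.
    fwrite($pipes[0], $test_text);
    fclose($pipes[0]);
    // Read everything the Perl script prints.
    while (($line = fgets($pipes[1], 1024)) !== false) {
        $output .= $line;
    }
    fclose($pipes[1]);
    // Close the process only after its pipes are closed, to avoid deadlock.
    $r = proc_close($process);
    echo "Before: $test_text<br />";
    echo "After : $output\n<br />";
}
?>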

And the Perl script:

#!/usr/local/bin/perl

# replace spaces with _s in stdin
while(<>) {
    s/ /_/g;
    print;
}
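
The same write, close, read, close sequence is available from Perl as well, through the core IPC::Open2 module. The sketch below is only an illustration: the helper name filter_through is hypothetical, and the child is the running Perl interpreter ($^X) with a -pe one-liner standing in for test.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

# Send $text to the stdin of @cmd and return whatever it writes to stdout.
sub filter_through {
    my ($text, @cmd) = @_;
    my $pid = open2(my $from_child, my $to_child, @cmd);
    print {$to_child} $text;
    close $to_child;                               # the child now sees end-of-file
    my $output = do { local $/; <$from_child> };   # read all of its output
    close $from_child;
    waitpid($pid, 0);                              # reap the child once the pipes are closed
    return defined $output ? $output : '';
}

print filter_through("This is a test\n", $^X, '-pe', 's/ /_/g');
# prints: This_is_a_test
```

Writing all of the input before reading any output is fine for short texts like this one. With large inputs the pipes can fill up and both sides end up waiting on each other, which is the deadlock mentioned in the steps above.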