A new Javascript HTML parser

May 6th, 2008 in Links

A pure Javascript HTML parser, by the author of jQuery.

Yet another compiler compiler

March 25th, 2008 in Links

The ACCENT compiler compiler has fewer grammar expression restrictions than standard generators, and supports EBNF.

Another BNF converter

March 17th, 2008 in Links

Here’s another feature-rich BNF converter, able to generate multiple lexers, parsers, documentation, and even pretty-printers.

Parsing Perl impossible?

January 28th, 2008 in Links

Perl Cannot Be Parsed: A Formal Proof, a proof of the impossible complexity of Perl’s syntax. It’s a deep, flexible grammar that can be extended by its delegates.

A Python Interactive Fiction engine

November 7th, 2007 in Links

BUS is an interactive fiction engine written in Python. It includes a basic world model and a natural-language (NL) parser.

Perl6, recursive

February 14th, 2007 in Links

A Perl6 parser written in Perl6 (coded by Mr. Larry Wall himself).

The Lemon Parser Generator

February 4th, 2006 in Links

A Lemon parser generator tutorial, which is the parser generator used by . See the Lemon parser home to learn more.

Regex of the day: Optional HTML tags

November 11th, 2005 in Perl. Weblog

It’s one of those regex-laden days, and I’m really starting to write more complicated expressions:

^(?:\s+|)<(\w+)(?:\s+|)((?:.*?|))>(.*?)(?:<\/(.*?)>|)(?:\s+|)$

This expression parses a line that contains tags based on the following logic, expecting that:

  1. There will be a start tag near the beginning of the line, possibly padded on the left with spaces that are ignored
  2. The opening tag may contain some HTML parameters
  3. There may be a closing tag on the line
  4. There may be spaces on the right of the closing tag that will be ignored

The expression will parse the following example into 4 parts:

<h1 id="test">This is a test</h1>
  1. h1
  2. id="test"
  3. This is a test
  4. h1
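
As a quick check, the expression can be run against the example line. This is a minimal sketch in Python rather than Perl (the pattern only uses syntax that Python's `re` module also accepts); the names `TAG_LINE` and `parse_tag_line` are made up for illustration.

```python
import re

# The expression from the post, unchanged.
TAG_LINE = re.compile(r'^(?:\s+|)<(\w+)(?:\s+|)((?:.*?|))>(.*?)(?:<\/(.*?)>|)(?:\s+|)$')


def parse_tag_line(line):
    """Return (tag, parameters, content, closing_tag), or None if the line does not match."""
    match = TAG_LINE.match(line)
    return match.groups() if match else None


if __name__ == "__main__":
    # Prints the four parts listed above.
    print(parse_tag_line('<h1 id="test">This is a test</h1>'))
```

When a line has no closing tag, Python reports the fourth part as `None` rather than an empty string, so calling code should allow for that.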

Learning regex to the point of being able to write complex expressions has taken a couple of years, but has been well worth the effort. To define the same parsing logic in C or C++ (using standard mechanisms) would take 20-30 minutes, and would occupy a page of code. You just have to remember that a regex is a small script, and that it should be tested (and documented) like one.
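
Treating the expression as a small script might look something like the sketch below, which uses Python's `re.VERBOSE` flag (Perl's `/x` modifier plays the same role) to comment each piece, and a short table of test lines. The test lines other than the `h1` example, and the names `TAG_LINE`, `CASES`, and `check_cases`, are assumptions added for illustration.

```python
import re

# Same expression as above, split across lines and commented.
TAG_LINE = re.compile(r"""
    ^(?:\s+|)          # optional padding on the left
    <(\w+)             # start tag name (group 1)
    (?:\s+|)           # optional space after the name
    ((?:.*?|))         # optional HTML parameters (group 2)
    >
    (.*?)              # content of the line (group 3)
    (?:<\/(.*?)>|)     # optional closing tag (group 4)
    (?:\s+|)$          # optional padding on the right
""", re.VERBOSE)

# Each case pairs an input line with the four parts it should produce.
CASES = [
    ('<h1 id="test">This is a test</h1>', ('h1', 'id="test"', 'This is a test', 'h1')),
    ('   <p>padded</p>   ', ('p', '', 'padded', 'p')),
    ('<li>no closing tag', ('li', '', 'no closing tag', None)),
]


def check_cases():
    """Assert that every case parses as expected; return the number of cases checked."""
    for line, expected in CASES:
        match = TAG_LINE.match(line)
        assert match is not None, line
        assert match.groups() == expected, (line, match.groups())
    return len(CASES)


if __name__ == "__main__":
    print(check_cases(), "cases passed")
```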

Regex is like a lot of little languages too (like SQL, bash, m4). It’s terribly useful, succinct, and worth having in your toolkit. It’s not something to hide in layers of abstraction either, rather it’s something that deserves use alongside your ‘real’ tools. I find that developers are in the habit of hiding (or hiding from) little languages, something that results in the too-many-elbows syndrome: insulating yourself from the real power of your tools, making things more complicated in the process.

Simple, in the end, is in the knowledge of the beholder. If you understand regex, code that contains it can be simpler.

SimpleLink 0.4 released

May 1st, 2005 in Projects. WP-Plugins. Weblog

A few more improvements for my SimpleLink plugin:

  • Fixed _ parsing bug ([Markdown] was munging underscores, now plays nicer with Markdown)
  • Added open links in new window for pkzip.

You can download wp-simplelink-0.4.tar.gz, or visit the SimpleLink page for more details.