In Praise of Perl

Gene Wilburn
May 30, 2023

Programming languages rise and fall in popularity. For a while everyone is keen on C, then C++, followed by languages such as Go, Rust, Haskell, Python, or [fill in the blank with your favourite language]. Their popularity is frequently driven by need: the desire to find the best tools for highly complex programming projects.

Some languages fall by the wayside. Few programmers still use Algol, Fortran, COBOL, Basic, or Pascal. Though they were widely used in their day, they now appear mainly in legacy programs.

And some languages, while no longer as widely used, still persist among those who appreciate their special qualities. These include Common Lisp and, my favourite, Perl.

What is Perl?

Perl is an interpreted, general-purpose language similar in some ways to Python. It began life as a scripting language, a role it still excels at. It doesn’t force any particular programming paradigm on the developer and is well known for its motto: “There’s more than one way to do it.”

Created by Larry Wall in 1987, Perl has, from the start, been used for system scripting and text handling. With a C-like syntax, it combined elements of sed, awk, and shell scripting. It has become known for its extensive and elegant support of regular expressions, also called “regex” and “re”.
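To give a flavour of that regex support, here is a minimal sketch of matching, substituting, and collecting matches. The sample sentence and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "Perl was created by Larry Wall in 1987.";

# Match: capture a four-digit number into $year
my ($year) = $text =~ /\b(\d{4})\b/;

# Substitute: with the /r flag (Perl 5.14+), s/// returns a modified copy
my $shout = $text =~ s/\bPerl\b/PERL/r;

# A global match in list context returns every capitalized word
my @caps = $text =~ /\b([A-Z][a-z]+)\b/g;

print "$year\n";
print "$shout\n";
print "@caps\n";
```

The `=~` operator binds a string to a pattern, which is why regexes feel like part of the language rather than a library bolted on.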

Its many built-in data structures, such as arrays and hashes (“associative arrays”, similar to Python “dictionaries”), are augmented by a very large selection of libraries, or modules, for specialized applications such as reading and traversing hierarchical data in XML or JSON files. There are thousands of specialized modules available at CPAN, the Comprehensive Perl Archive Network.
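As a rough illustration, the sketch below builds an array and a hash, then walks a small JSON document with JSON::PP, a pure-Perl module that has shipped with Perl since version 5.14. The JSON sample and the variable names are invented for the example.

```perl
use strict;
use warnings;
use JSON::PP;   # exports decode_json and encode_json by default

# An array and a hash (Perl's name for an associative array)
my @langs = ('C', 'Perl', 'Python');
my %born  = (Perl => 1987, Python => 1991);

push @langs, 'Lisp';    # append to the array
$born{Lisp} = 1958;     # add a key/value pair to the hash

# Hypothetical JSON sample: nested objects become hash references,
# JSON arrays become array references
my $json = '{"journal": {"year": 2023, "months": ["Jan", "Feb", "Mar"]}}';
my $data = decode_json($json);

my $year   = $data->{journal}{year};
my @months = @{ $data->{journal}{months} };

print "Languages: @langs\n";
print "$_ first appeared in $born{$_}\n" for sort keys %born;
print "Journal $year lists ", scalar(@months), " months: @months\n";
```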

Through its DBI module, Perl works with widely used database products such as MySQL and PostgreSQL, and in the early days of web development, many interactive websites were built as Perl CGI scripts.

Practical Programming

Not all programs are big, though. Little programs, too, have their place. Perl is ideal for crafting utilities that you can use at the command line, often as filter programs that take “standard in” (stdin) data, do something with the content, then pipe it out to “standard out” (stdout).

Example of setting up a filter program

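#!/usr/bin/env perl
use strict;
use warnings;

while (my $line = <>) {
    # Do something to the line from stdin if warranted,
    # then send the line back out to stdout. The line still
    # carries its own newline, so none is added here.
    print $line;
}
exit;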

There. That’s the skeleton for a Perl filter program. All you need to do is fill in the task(s) you want done on the line. (It can get more elaborate, certainly, but that’s true of all languages.)
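As one hypothetical way of filling in the skeleton, the sketch below strips trailing spaces and tabs from each line. The helper name `tidy_line` and the in-memory sample text are assumptions for illustration; a real filter would read from `<>` as in the skeleton.

```perl
use strict;
use warnings;

# Hypothetical task: strip trailing spaces and tabs from one line
sub tidy_line {
    my ($line) = @_;
    $line =~ s/[ \t]+$//;   # $ matches just before the final newline
    return $line;
}

# An in-memory filehandle stands in for stdin so the sketch runs on its own
my $sample = "first line   \nsecond line\t\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";

my $output = '';
while (my $line = <$in>) {
    $output .= tidy_line($line);
}
close $in;

print $output;
```

Keeping the per-line work in its own subroutine makes it easy to try on sample strings before pointing the filter at real files.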

Suppose you frequently write reports, articles, blog entries, or daily entries in a journal and, being text-oriented, you use Markdown notation to mark how text should be formatted for display. Along the way you may have used two hyphens to indicate an em-dash (—).

Then you learn that Pandoc, which is widely used to convert Markdown to other formats, follows the TeX typesetting convention of using three hyphens to indicate an em-dash and two hyphens to indicate an en-dash. The en-dash is longer than a hyphen but shorter than an em-dash, and it separates number ranges in academic and formal typesetting, e.g., Pages 2–5. How do you easily fix your documents so that Pandoc assigns em-dashes where you intended them?

You could open each document and do a find-and-replace operation. That’s okay for a few documents, but suppose you need to check this in many documents on a regular basis? The answer is to add a few commands to the Perl filter script, using Perl’s regular expressions. You can, at the same time, remove any blank spaces before and after the hyphens, putting the resulting dashes tight against the words, which is also standard in traditional typesetting. (Why do this? It makes the document look better, especially when the lines of output are justified, as in a report, printed book, or PDF.)

Example of using regexes in a filter program

All we need to do in Perl is add some if logic and a few regular expressions inside the while loop, before the print statement:

# Remove a single space after or before "--"
$line =~ s/\-\- /\-\-/g;
$line =~ s/ \-\-/\-\-/g;

# If the line contains "--" but no "---" yet, expand "--" to "---"
if ($line =~ m/\-\-/ and $line !~ m/\-\-\-/)
{
    $line =~ s/\-\-/\-\-\-/g;
}

Note: Perl does not require the backslashes before the hyphens. I had to insert them in this snippet because Medium mangles the formatting otherwise.
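One limitation of the if test is that a line containing both "--" and "---" is skipped entirely, so a stray "--" on that line stays as it is. A possible alternative, sketched here with a hypothetical helper named `fix_dashes`, uses lookarounds to expand only runs of exactly two hyphens. Treat it as a variation under those assumptions rather than the script described in this article.

```perl
use strict;
use warnings;

# Hypothetical helper: expand "--" to "---" and close up surrounding spaces
sub fix_dashes {
    my ($line) = @_;
    # (?<!-) and (?!-) make sure the two hyphens are not part of a longer
    # run, so an existing "---" (or a "----" rule) is left untouched.
    $line =~ s/ *(?<!-)--(?!-) */---/g;
    return $line;
}

print fix_dashes("Perl -- my favourite -- still persists.\n");
print fix_dashes("Already fixed---and this one--too.\n");
```

Unlike the snippet above, this version leaves any spaces around an existing "---" alone.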

Then you can use your filter program (let’s call it em-dash.pl, made executable and placed somewhere on your PATH) at the command line using pipes and redirection:

$ cat <filename> | em-dash.pl > <newfilename>

For a routine job you could then call this filter program in a shell script. Here’s an actual example I use to annually groom my monthly journal Markdown text files into Pandoc-ready versions, before turning my journals into an annual ebook:

#!/bin/bash

# Wrapper for em-dash.pl

for i in *.md
do
    bname=$(basename "$i" .md)  # Strip only the .md suffix
    echo "$bname"  # Display which file is being checked
    em-dash.pl < "$i" > "./$bname.mmd"
done

Converting the entire year’s journal entries with this script takes maybe 2 seconds. You’ll notice that I convert all my files that have the suffix .md and write out files with the same basename but with an .mmd suffix. This preserves the original file in case something goes wrong. After a positive review, I then delete all the .md files with:

$ rm *.md
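The shell wrapper and the filter could also be folded into a single Perl program. The sketch below is one way that might look; the names `fix_line`, `convert_dir`, and `groom.pl` are hypothetical, and the substitutions are the same ones shown earlier without the extra backslashes.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Same substitutions as in the filter snippet, gathered into one helper
sub fix_line {
    my ($line) = @_;
    $line =~ s/-- /--/g;
    $line =~ s/ --/--/g;
    if ($line =~ /--/ and $line !~ /---/) {
        $line =~ s/--/---/g;
    }
    return $line;
}

# Convert every .md file in $dir to a .mmd file beside it.
# glob splits its pattern on whitespace, so this assumes $dir has no spaces.
sub convert_dir {
    my ($dir) = @_;
    my @written;
    for my $file (glob("$dir/*.md")) {
        my $bname    = basename($file, '.md');   # strip only the .md suffix
        my $out_name = "$dir/$bname.mmd";
        open my $in,  '<', $file     or die "Cannot read $file: $!";
        open my $out, '>', $out_name or die "Cannot write $out_name: $!";
        while (my $line = <$in>) {
            print {$out} fix_line($line);
        }
        close $in;
        close $out or die "Cannot close $out_name: $!";
        push @written, $out_name;
    }
    return @written;
}

# Usage: perl groom.pl <directory>
if (@ARGV) {
    print "$_\n" for convert_dir($ARGV[0]);
}
```

As with the shell version, the original .md files are left in place until you have reviewed the results.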

These examples were meant simply to give you a slight introduction to Perl, in case it catches your fancy. Perl is open source and free. It comes preinstalled on, or is readily available for, most *nix-based operating systems, including Linux, FreeBSD, and macOS.

For an in-depth look at the language, visit Perl.org. The most popular book on Perl is the “Camel Book”, Programming Perl, an O’Reilly book by Larry Wall and contributors.

Happy Perling!

Gene Wilburn is a retired Canadian IT specialist, writer, photographer, and occasional folksinger. He is the author of Markdown for Writers, 2nd Ed., Rev.
