Dawn of the Data: May 2014

One of the recommendations in The Pragmatic Programmer is to learn at least one new programming language every year. According to the book:

Different languages solve the same problems in different ways. By learning several different approaches, you can help broaden your thinking and avoid getting stuck in a rut.

Over the years, I've taken that advice to heart. I guess you could say that I'm an expert language learner at this point. There are many languages that I'm proficient in (C, C++, Lua, Common Lisp, Python), and many more that I've at least written some non-trivial code in (Perl, JavaScript, LabVIEW, Java, Prolog, Haskell, Erlang, Ruby, Scheme, and more...).

A programming language is more than a tool for writing programs. It forms a mental framework for solving problems. When faced with a problem, a dyed-in-the-wool C++ programmer will start mentally formulating an object-oriented solution with virtual functions, overloading, templates, etc. A "functional" solution to the same problem may never occur to them.

The hardest thing about learning a new programming language is that there are usually at least one or two novel features that break the mold of our current mental framework. If not, the language may not be worth learning in the first place.

So, given that learning new languages can be challenging, I would like to share a tip that has served me well over the years.

One of the best ways to really understand a new or novel language feature is to think of ways to twist and abuse it.

I know it sounds odd, but I like doing this for several reasons.

It gives me an interesting mini-project to work on which usually leaves me with a better understanding of how the feature works.
I'm likely to run into the limitations and corner cases of the language (which is something I enjoy exploring)
Thinking of ways to abuse a feature is a creative exercise that allows my brain to leave the confines of its normal working space. This puts me in a good frame-of-mind for breaking my existing mental molds.

So, enough with justifications, let's jump right into today's victim, the Common Lisp condition system.

Intro to Conditions

At a very high level, Common Lisp's conditions are analogous to exceptions in other languages. They can be used to interrupt the control flow of an application and pass error information up the call stack.

Here's a simple example comparing C++ exceptions and Lisp conditions so you can compare the basic usage.

// C++
try {
   LogData d = parseLog(log);
   doSomethingWithData(d);
}
catch (const LogParseError& e) {
   fprintf(stderr, "Failed to parse data out of log");
}

;; Common Lisp
(handler-case
    (let ((d (parse-log log)))
      (do-something-with-data d))
  (log-parse-error (e) 
    (format *error-output* "Failed to parse data out of log")))

As you can see, these examples are very similar. However, Lisp conditions are more than just exceptions by a different name.

Enter the Restart and the Condition Handler

One of the biggest differences between Lisp conditions and traditional exceptions are features of the condition system called restarts and condition handlers.

These features, used in conjunction, can be used to invoke custom logic when a particular exceptional situation is encountered. If a given condition type has a handler "bound" to it, when this condition is signaled, the corresponding condition handler is run. This handler has the opportunity to examine the condition signaled and invoke a restart (if provided by the lower level code). A key thing to note is that the condition handler is run before unwinding the stack, and by invoking a restart can handle the error right where it occurred.

That last part is an important distinction. Condition handlers and restarts allow you to handle exceptional situations right where they happen, as opposed to higher up the call stack.

Separating Mechanism from Policy

Before getting into the specifics of why someone would want to use a restart, let's digress for a minute and talk about something more philosophical.

In general, well-designed software should clearly separate the high-level rules and requirements from the low-level mechanics of what is being done. A common way to refer to this is separating mechanism from policy.

By allowing for high-level code to dictate how lower-level code recovers from errors, condition handlers and restarts can provide a better separation between the mechanism and policy of your code.

Anyway, enough talk about what condition handlers and restarts are, let's see some abuse!

Abuse: C-Style Hex Literals in Common Lisp

Given the age and lineage of Lisp, there are some things that seem odd when viewed through the lense of newer languages. One such thing is how hexadecimal literals are specified.

Rather than using something like 0x100, which you could use in just about any other language. In Common Lisp, you have to type #x100. Once you get used to it, it's not terrible, but it would be nicer if I could specify hex literals using the more common c-style notation.

In this first example, I will use restarts to allow for c-style hex literals in Lisp code.

(Note: All of my examples are using GNU Clisp. The default restarts provided by each Common Lisp implementation will unfortunately be different so this may not work exactly as-is in different implementations.)

In GNU Clisp, when the evaluator encounters a variable it doesn't recognize, it signals the unbound-variable condition. When running code in the interpreter, Clisp provides a restart called use-value. This restart allows you to provide a value to substitute for the unknown variable. We can take advantage of this mechanism in this case because when the evaluator encounters something like 0x100, it will try to interpret it as a variable name.

First, let's create a helper function that will convert a c-style hex string into a number.

(defun hex-val (str)
  (multiple-value-bind (match regs)
      (cl-ppcre:scan-to-strings "^0[xX]([0-9a-fA-F]+)$" str)
    (if match
      (parse-integer (aref regs 0) :radix 16)
      nil)))

Next, we need a condition handler function that can take an "unbound-variable" condition and invoke the "use-value" restart with the appropriate value if this looks like a hex literal.

(defun c-like-hex-literal-handler (condition)
  (let* ((var-name (symbol-name (cell-error-name condition)))
         (val (hex-val var-name)))
    (when val
      (invoke-restart 'use-value val))))

With these 2 functions in place, we can now do things like this:

(handler-bind ((unbound-variable #'c-like-hex-literal-handler))
  (format t "Answer is ~A~%" (+ 0x01 0x100 10)))
;; --> "Answer is 267"

This being lisp though, we can also create a macro that does this.

(defmacro with-hex-literals (&body body)
  `(handler-bind ((unbound-variable #'c-like-hex-literal-handler))
     ,@body))

Now we can just do:

(with-hex-literals
  (format t "Answer is ~A~%" (+ 0x01 0x100 10)))
;; --> "Answer is 267"

Note that this would work in more than just the trivial cases described above. To use some Lisp terminology, the condition handler binding exists in the dynamic environment (as opposed to the lexical one). This means that the c-style hex literals can be used at any point between when the handler binding is made and when the handler-bind form completes.

Abuse: Limited Infix Notation

As you've surely grasped by now, Lisp uses what is called prefix notation. This just means that rather than expressing things like "1 + 1", Lisp puts the function first, like "+ 1 1".

In a similar vein to our c-style hex literal code, let's add support for simple infix literals, just for fun. This code will take advantage of the infix library by Mark Kantrowitz.

First, let's write a function that will evaluate a string as infix notation (or return nil if it doesn't look like an infix expression).

(defun infix-val (str)
  (let ((infix-form (handler-case (infix:string->prefix str)
                      (condition () nil))))
    (when infix-form
      (eval infix-form))))

Note that we're using handler-case here to convert any conditions signaled by the prefix conversion to nil.

Next, we'll write a condition handler function that takes an unbound-variable condition and invokes the use-value restart with the appropriate value (if this is infix). Notice that this looks very similar to the c-like-hex-literal-handler.

(defun infix-literal-handler (condition)
  (let* ((var-name (symbol-name (cell-error-name condition)))
         (val (infix-val var-name)))
    (when val
      (invoke-restart 'use-value val))))

Finally, we'll create a macro that establishes an environment in which we can use infix notation.

(defmacro with-infix-literals (&body body)
  `(handler-bind ((unbound-variable #'infix-literal-handler))
     ,@body))

Now we can do things like this:

(with-infix-literals
  (format t "Answer is ~A~%" (+ 1 10*10)))
;; --> "Answer is 101"

Note that this particular use has many limitations.

There cannot be any spaces in the infix notation since the evaluator must treat the whole thing like a variable name.
We cannot use parenthesis for grouping since the lisp reader breaks up tokens at parens.

As an interesting aside, we can actually use lisp's pipe ('|') notation to work around both of these limitations...but that somewhat defeats the purpose of having a simple in-line notation without special syntax.

(with-infix-literals
  (format t "Answer is ~A~%" (+ 1 |10 * 10 * (1 + 1)|)))
;; --> "Answer is 201"

Supreme Abuse: Putting it all together

Let's create one macro that combines these things to show how these 2 examples can actually work together.

(defmacro with-crazy-stuff (&body body)
  `(with-hex-literals
     (with-infix-literals
       ,@body)))

This lets us use c-style hex literals in infix expressions

(with-crazy-stuff
  (format t "Answer is ~A~%" (+ 1 10+0x3*0xa)))
;; --> "Answer is 41"

Let's go over what's going on behind the scenes here.

During evaluation, the unbound-variable condition is signaled for the 10+0x3*0xa "variable".
The infix-literal-handler transforms this into
```
(+ 10 (* 0x3 0xa))
```
by invoking the use-value restart
This triggers another unbound-variable condition to be signaled for the 0x3 and 0xa "variables" (each in turn).
The outer c-like-hex-literal-handler transforms these into the numeric versions of these hex values (again by invoking the use-value restart)

Conclusion

I hope that this leads some people to a little better of an understanding of conditions, condition handlers, and restarts. However, please don't do this in production code :). This was purely intended as a vehicle for learning and nothing more.

That being said, if you'd like to try this code for yourself, I've packaged it as a GitHub Gist. Are there other language features you would like me to abuse? Let me know in the comments.

Friday, May 16, 2014

(Ab)Using Language Features: The Common Lisp Condition System