Author:
- Name: Tony Finch
Location: GB - United Kingdom of Great Britain and Northern Ireland (United Kingdom)
To build:
make all
NOTE: this may take a while. Some systems may have problems building this entry because of the system resources it requires.
To use:
./fanf
Try:
Enter an expression on standard input. To try some we have selected:
./try.sh
Judges’ remarks:
This program translates lambda expressions into combinator expressions. But you do not need to know Lambda Calculus to be impressed by this program!
See the file fanf.lambda for more examples of lambda calculus source.
Notice how large the code grows from the fanf.c into the final
fanftmp2.c
C program. Take a look at that final C program again,
can you begin to understand what it is doing?
Look at the first stage of the C pre-processing which was done like:
cc fanf.c -E > fanftmp1.c
Skip to the bottom of fanftmp1.c
(after all of the C pre-processor header
stuff … look for a line of the form: # 38 "fanf.c"
near the bottom)
and look at the resulting code. This code, when C pre-processed
will expand into over 342 times the original code (ignoring #include
headers) to produce fanftmp2.c
. That program in turn, when compiled
translates a single lambda expression on standard input into
combinator expressions on standard output.
Extra credit question:
What do you have to do to make this program work with an old non-ANSI C preprocessor?
Author’s remarks:
My program is a demonstration of some practical applications of theoretical computer science. The application it implements is a program for translating lambda expressions into combinator expressions.
What the program does
The syntax of lambda expressions recognised by the program is as
follows: The basic atoms are variables which are single characters other
than backslash or parentheses, e.g. a
or b
or @
. Variables are
combined by function application which is written as e.g. (f a)
which is
the function f
applied to the argument a
. Application groups to the
left, so (f a b c)
is equivalent to (((f a) b) c)
. Functions are
created from lambda terms which are written in the form \a(expr)
which
is the function taking argument a
and having value expr
. Lambda is
spelt \
because ASCII isn’t good enough. Lambda binds tightly, so
\a\bcd
is equivalent to ((\a(\b(c))) d)
. Whitespace is ignored.
For example, the identity function, called I
, is written \x(x)
. A
simple expression is (I a)
which is equivalent to (\x(x) a)
, which
has the value a
. Another function called K
is defined as \x\y(x)
and it evaluates as follows:
K p q
-> \x\y(x) p q
-> \y(p) q
-> p
Another function of interest is S
, defined as \f\g\x((f x) (g x))
.
My program demonstrates the fact that any lambda expression can be
translated into an equivalent “combinator expression” which involves
only S
, K
, I
(the combinators), and application, with no
variables or lambda abstractions. In fact, even I
is unnecessary,
since it is equivalent to S K K
:
I z
-> \a(a) z
-> z
S K K z
-> \f\g\x((f x) (g x)) K K z
-> (K z) (K z)
-> \x\y(x) z (K z)
-> z
The algorithm for translating lambda expressions into combinator expressions works as follows. There are three forms of lambda expression to consider: variables, applications, and abstractions.
trans v -> v (variable)
trans a b -> (trans a) (trans b ) (application)
trans \ab -> abs a (trans b) (abstraction)
There are a further three cases to consider for the body of lambda expressions, where we need to do the magic that transforms away the variables.
abs a f x -> S (abs a f) (abs a x)
abs a b -> K b (b != a)
abs a a -> I
E.g. suppose we had combinator expressions for +
and 3
and we wanted
to see what the combinator expression for doubling 3 looked like:
trans \x(+ x x) 3
-> (trans \x(+ x x)) (trans 3)
-> (abs x (trans + x x)) 3
-> (abs x + x x) 3
-> S (abs x + x) (abs x x) 3
-> S (S (abs x +) (abs x x)) I 3
-> S (S (K +) I) I 3
-> S (S (K +) I) I 3
We can then check that this evaluates to the expected result:
S (S (K +) I) I 3
-> S (K +) I 3 (I 3)
-> K + 3 (I 3) (I 3)
-> + (I 3) (I 3)
-> + 3 3
It is possible to perform a slightly more compact translation with a
couple of simple optimisations. For example, note that K (a b)
is the
same as S (K a) (K b)
, because
K (a b) x
-> a b
and
S (K a) (K b) x
-> K a x (K b x)
-> a b
Also, S (K a) I
is the same as just a
because
S (K a) I x
-> K a x (I x)
-> a x
The program simply reads a lambda expression from the standard input, translates it into a combinator expression using the transformation and optimisations described above, and prints out the result.
Why this is a practical application
Combinator expressions are easier to evaluate than lambda expressions
because there are no problems with variables. For example, if I naively
evaluated the expression \x\y(x y) y
I might get \y(y y)
which is
not the same as the correct result because the first y
in the body
should refer to a global y
rather than the local one. David Turner was
the first person to realise that because of this fact, SK combinators
provide a simple way to implement non-strict functional programming
languages, either in software or directly in hardware.
My program demonstrates this by being based on a combinator evaluation
engine which is used to implement a language called OFL
(pronounced
like “offal” and short for “Obfuscated Functional Language”). I then
used OFL
to implement all the high-level functionality described above.
This contributes a lot to the obfuscation because even small amounts of
OFL
are quite unpalatable, and large expressions are completely
incomprehensible.
It is fairly well known that the lambda calculus (and hence SK
combinators) can compute anything, but mere computation is no use if you
cannot communicate with the world. This is why OFL
includes a few
concessions to reality: a combinator E
for representing characters and
testing them for equality, two I/O combinators, G
and P
, for reading
and writing characters respectively, and a combinator J
for
representing false
. I also implemented the Y
combinator directly
rather than in terms of the primitive combinators, since it only
required five characters to do so (viz. Y W q Q g
). However this code
is not enabled because the definition of Y
in terms of other combinators
is more cute.
The E
combinator comes in a large number of forms, one for each
character and one for end of file. It is written E(c)
for the variant
of E
corresponding to the character c
. An expression of the form
E(c) x
is evaluated by first evaluating x
. If x
doesn’t evaluate
to some form of E
, then evaluation of the expression terminates. If
x
evaluates to E(c)
, then the expression evaluates to K
. If x
evaluates to E(d)
(where d
is a different character to c
), then
the expression evaluates to J
.
K
and J
are used to represent true
and false
respectively; they
correspond exactly to the standard lambda calculus representations of
true
and false
, viz. \t\f(t)
and \t\f(f)
respectively. In this way a
conditional expression can just be written cond then else
, which is
even more terse than C cond ? then : else
. An expression that
compares a character with a
, say, can be written in the rather
obfuscated form E('a') char then else
.
The P
combinator, like the E
combinator, starts off by evaluating
its first argument and checking that it is some form of E
. It then
writes that character on stdout
, and finally it calls its second
argument as a function with the argument I
, i.e.
P x f -> f I
with a side-effect.
The G
combinator just reads a character from stdin
then calls its
argument as a function with the appropriate E
combinator as an
argument, i.e.
G f -> f E(c)
with a side-effect.
One of the problems with non-strict languages is that it is hard to predict in advance in which order side-effects will occur. In my program I used a monadic I/O structure like that used by the programming language Haskell; this allows one to write programs that manipulate external state in a manner remarkably similar to imperative programming languages like C. An examination of my program should provide an illustration of how well this technique works in practice. (In particular, I used the CPS form of the I/O monad from page 6 of “Imperative Functional Programming”.)
The Y
combinator is used for implementing recursive functions. It has
the evaluation rule
Y f -> f (Y f)
so that it can be used to call a function with itself as an argument. For example, if I had the appropriate arithmetic operators defined, I could write a factorial function in the lambda calculus like this:
Y \f\n((= n 0) 1 (* n (f (- n 1))))
OFL
provides a facility for naming expressions so that they can be used
more than once without writing them in full. A definition must be written
on one line in the form l name b expression
where l
is short for
“let” and b
is short for “be”. Names must be more than one character
long. A complete OFL
program starts off by defining the primitives it
will use with likes of the form p (S)
etc., then goes on to define the
names it will use, and finishes off with the main expression followed by
a single e
short for “end”. OFL
expressions must use the minimum
number of brackets needed to be unambiguous, except that the whole
expression must be surrounded by brackets.
OFL
has a standard library which adds another five combinators to the
basic eight. These combinators provide greater scope for optimisation
when translating an expression from the lambda calculus to combinators,
and therefore more compact programs. Their evaluation rules are:
B f g x -> f (g x)
C f g x -> (f x) g
BB c f g x -> c (f (g x))
CC c f g x -> c (f x) g
SS c f g x -> c (f x) (g x)
I used them to help fit my program into the required space; unfortunately I couldn’t fit the version that fully optimises the lambda expressions it reads into the required space. The complete optimisation rules including the ones it doesn’t use are:
S (K p) I -> p
S (K p) (K r) -> K ( p r )
S (K p) (B r s) -> BB p r s
S (K p) r -> B p r
S (B p q) (K r) -> CC p q r
S p (K r) -> C p r
S (B p q) r -> SS p q r
S p r -> S p r
The implementation of OFL
Although OFL
has completely different syntax and semantics to C, all you
need to compile OFL
programs is a C compiler and a header file (which in
my entry to the competition has been included in-line). The header file
implements a two-pass optimising compiler which produces a representation
of the program that is then interpreted.
The first pass of the compiler is implemented via the C preprocessor. It implements one optimisation, namely function in-lining. Unfortunately, this optimisation always increases object code size and execution time. The opposite optimisation, common sub-expression elimination (which would offer improvements in code size and execution time), has not been implemented. The output of this pass is a C program that may be compiled by a normal C compiler.
The second pass is performed by the initialisation phase of the C program. It builds a data structure representing the abstract syntax tree of the program.
The program is then executed by an interpreter. This operates by performing manipulations on the AST that correspond to the evaluation rules of the combinators that I described above.
The first 54 lines of the program are mostly the interpreter; the next 8
lines are pass one of the compiler; the next nine declare the OFL
primitives; the next five define the OFL
standard library; and the rest
implement the application described in the first part of this document.
This part of the program is transformed into the second pass of the
compiler by the first pass of the compiler.
The first pass of the compiler relies on some magic macros that expand in a mutually-overlapping and almost recursive way in order to perpetrate a complete change of syntax. Interestingly enough, if I hadn’t used the same identifier for more than one thing in various places in the program this transformation would work with old-style cpp as well and ANSI cpp. It expands the program from about 2.2KB to about 740KB. The resulting code does some tricky stuff to build a data structure of the same form as the code itself.
The interpreter is written in fairly straightforward C, although it is highly compressed. The C itself is written in a fairly functional style, in that every function consists of a single expression. In the tradition of functional programming languages it assumes that memory is infinite and so never frees anything. It also assumes that the C compiler implements the tail recursion optimisation, which is probably not true.
References
I used these references in the course of writing this program and they explain some of the ideas behind it in more detail. Arthur Norman (one of the authors of the first paper) did a lot to enthuse me about this topic.
Clarke, T. J. W., et al., 1980. SKIM - the S, K, I Reduction Machine. Proceedings of the 1980 ACM LISP Conference, pp. 128-135.
Curry, H. B. and Feys, R., 1958. Combinatory Logic, Vol. 1. North-Holland.
Gordon, M. J. C., 1988. Programming Language Theory and its Implementation. Prentice-Hall.
Peterson, J, et al., 1997. Report on the Programming Language Haskell, version 1.4.
Peyton Jones, S. L., 1987. The implementation of Functional Programming Languages, Chapters 2, 11, 12, 16. Prentice-Hall.
Peyton Jones, S. L., and Wadler, P., 1992. Imperative Functional Programming. Proceedings of the ACM Symposium on the Principles of Programming Languages 1993, pp. 71-84.
Turner, D. A., 1979. A new implementation technique for applicative languages. Software- Practice and Experience. Vol. 9, pp. 31-49.
Turner, D. A., 1979. Another algorithm for bracket abstraction. Journal of Symbolic Logic. Vol. 44, no. 2, pp. 67-270.
Wadler, P., 1990. Comprehending Monads. Proceedings of the ACM conference on Lisp and Functional Programming, 1990.
Wadler, P., 1995. How to Declare an Imperative. Proceedings of the ACM International Logic Programming Symposium, 1995.
Inventory for 1998/fanf
Primary files
- fanf.c - entry source code
- Makefile - entry Makefile
- fanf.orig.c - original source code
- try.sh - script to try entry
- fanf.lambda - lambda expressions
Secondary files
- 1998_fanf.tar.bz2 - download entry tarball
- README.md - markdown source for this web page
- .entry.json - entry summary and manifest in JSON
- .gitignore - list of files that should not be committed under git
- .path - directory path from top level directory
- index.html - this web page