Most Obfuscated Translator:

    Tony Finch
    Demon Internet
    322 Regent's Park Road
    Finchley
    London
    N3 2QQ
    United Kingdom

    http://www.inch.demon.co.uk/

Judges' comments:

To build:

    make fanf

Some systems may have problems building this entry because of the
system resources it can require.

To use:

    Enter an expression on standard input.

Here are some sample expressions that you can use:

    ((\a(\b(\c(d)))) e)
    (\f\g\x ( (g x) (f x) ) K K z )
    (\f(\f\g\x( (f((\a(g(b))) e)) (g x) ) K K z))
    (Y\f\n ((= n 0) 1 (* n (f (- n 1))) ) )

This program translates lambda expressions into combinator
expressions. But you do not need to know Lambda Calculus to be
impressed by this program!

See the file fanf.lambda for more examples of lambda calculus source.

Notice how large the code grows from fanf.c into the final fanftmp2.c
C program. Take a look at that final C program again: can you begin to
understand what it is doing?

Look at the first stage of the C pre-processing:

    gcc -ansi fanf.c -E > fanftmp1.c

Skip to the bottom of fanftmp1.c (after all of the #include header
stuff ... look for a line of the form: # 2 "fanf.c" near the bottom)
and look at the resulting code. This code, when C pre-processed, will
expand into over 342 times the original code (ignoring #include
headers) to produce fanftmp2.c. That program in turn, when compiled,
translates a single lambda expression on standard input into
combinator expressions on standard output.

Extra credit question: What do you have to do to make this program
work with an old non-ANSI C preprocessor?

Selected Author's comments:

My program is a demonstration of some practical applications of
theoretical computer science. The application it implements is a
program for translating lambda expressions into combinator
expressions.

What the program does
=====================

The syntax of lambda expressions recognised by the program is as
follows: The basic atoms are variables, which are single characters
other than backslash or parentheses, e.g.
`a' or `b' or `@'. Variables are combined by function application,
which is written e.g. `(f a)', which is the function `f' applied to
the argument `a'. Application groups to the left, so `(f a b c)' is
equivalent to `(((f a) b) c)'. Functions are created from lambda terms
which are written in the form `\a(expr)', which is the function taking
argument `a' and having value `expr'. Lambda is spelt `\' because
ASCII isn't good enough. Lambda binds tightly, so `\a\bcd' is
equivalent to `((\a(\b(c))) d)'. Whitespace is ignored.

For example, the identity function, called `I', is written `\x(x)'. A
simple expression is `(I a)', which is equivalent to `(\x(x) a)',
which has the value `a'. Another function called `K' is defined as
`\x\y(x)' and it evaluates as follows:

    K p q -> \x\y(x) p q
          -> \y(p) q
          -> p

Another function of interest is `S', defined as
`\f\g\x((f x) (g x))'.

My program demonstrates the fact that any lambda expression can be
translated into an equivalent "combinator expression" which involves
only `S', `K', `I' (the combinators), and application, with no
variables or lambda abstractions. In fact, even `I' is unnecessary,
since it is equivalent to `S K K':

    I z -> \a(a) z
        -> z

    S K K z -> \f\g\x((f x) (g x)) K K z
            -> (K z) (K z)
            -> \x\y(x) z (K z)
            -> z

The algorithm for translating lambda expressions into combinator
expressions works as follows. There are three forms of lambda
expression to consider: variables, applications, and abstractions.

    trans `v'    ->  `v'                       (variable)
    trans `a b'  ->  (trans `a') (trans `b')   (application)
    trans `\ab'  ->  abs a (trans `b')         (abstraction)

There are a further three cases to consider for the body of lambda
expressions, where we need to do the magic that transforms away the
variables.

    abs a `f x'  ->  `S' (abs a `f') (abs a `x')
    abs a `b'    ->  `K b'                     (b != a)
    abs a `a'    ->  `I'

E.g.
suppose we had combinator expressions for `+' and `3' and we wanted to
see what the combinator expression for doubling 3 looked like:

    trans `\x(+ x x) 3'
        -> (trans `\x(+ x x)') (trans `3')
        -> (abs x (trans `+ x x')) `3'
        -> (abs x `+ x x') `3'
        -> `S' (abs x `+ x') (abs x `x') `3'
        -> `S' (`S' (abs x `+') (abs x `x')) `I' `3'
        -> `S' (`S' (`K +') `I') `I' `3'
        -> `S (S (K +) I) I 3'

We can then check that this evaluates to the expected result:

    S (S (K +) I) I 3
        -> S (K +) I 3 (I 3)
        -> K + 3 (I 3) (I 3)
        -> + (I 3) (I 3)
        -> + 3 3

It is possible to perform a slightly more compact translation with a
couple of simple optimisations. For example, note that `K (a b)' is
the same as `S (K a) (K b)', because

    K (a b) x -> a b

and

    S (K a) (K b) x -> K a x (K b x)
                    -> a b

Also, `S (K a) I' is the same as just `a' because

    S (K a) I x -> K a x (I x)
                -> a x

The program simply reads a lambda expression from the standard input,
translates it into a combinator expression using the transformation
and optimisations described above, and prints out the result.

Why this is a practical application
===================================

Combinator expressions are easier to evaluate than lambda expressions
because there are no problems with variables. For example, if I
naively evaluated the expression `\x\y(x y) y' I might get `\y(y y)',
which is not the same as the correct result because the first `y' in
the body should refer to a global `y' rather than the local one.

David Turner was the first person to realise that because of this
fact, SK combinators provide a simple way to implement non-strict
functional programming languages, either in software or directly in
hardware. My program demonstrates this by being based on a combinator
evaluation engine which is used to implement a language called "OFL"
(pronounced like "offal" and short for "Obfuscated Functional
Language"). I then used OFL to implement all the high-level
functionality described above.
This contributes a lot to the obfuscation because even small amounts
of OFL are quite unpalatable, and large expressions are completely
incomprehensible.

It is fairly well known that the lambda calculus (and hence SK
combinators) can compute anything, but mere computation is no use if
you cannot communicate with the world. This is why OFL includes a few
concessions to reality: a combinator `E' for representing characters
and testing them for equality, two IO combinators, `G' and `P', for
reading and writing characters respectively, and a combinator `J' for
representing "false". I also implemented the `Y' combinator directly
rather than in terms of the primitive combinators, since it only
required five characters to do so (viz. `Y W q Q g'). However, this
code is not enabled because the definition of Y in terms of other
combinators is more cute.

The `E' combinator comes in a large number of forms, one for each
character and one for end of file. It is written `E(c)' for the
variant of `E' corresponding to the character `c'. An expression of
the form `E(c) x' is evaluated by first evaluating `x'. If `x' doesn't
evaluate to some form of `E', then evaluation of the expression
terminates. If `x' evaluates to `E(c)', then the expression evaluates
to `K'. If `x' evaluates to `E(d)' (where `d' is a different character
to `c'), then the expression evaluates to `J'. `K' and `J' are used to
represent "true" and "false" respectively; they correspond exactly to
the standard lambda calculus representations of true and false, viz.
`\t\f(t)' and `\t\f(f)' respectively. In this way a conditional
expression can just be written `cond then else', which is even more
terse than C's `cond ? then : else'. An expression that compares a
character with `a', say, can be written in the rather obfuscated form
`E('a') char then else'.

The `P' combinator, like the `E' combinator, starts off by evaluating
its first argument and checking that it is some form of `E'.
It then writes that character on stdout, and finally it calls its
second argument as a function with the argument `I', i.e.

    P x f -> f I

with a side-effect. The `G' combinator just reads a character from
stdin then calls its argument as a function with the appropriate `E'
combinator as an argument, i.e.

    G f -> f E(c)

with a side-effect.

One of the problems with non-strict languages is that it is hard to
predict in advance in which order side-effects will occur. In my
program I used a monadic IO structure like that used by the
programming language Haskell; this allows one to write programs that
manipulate external state in a manner remarkably similar to imperative
programming languages like C. An examination of my program should
provide an illustration of how well this technique works in practice.
(In particular, I used the CPS form of the IO monad from page 6 of
"Imperative Functional Programming".)

The `Y' combinator is used for implementing recursive functions. It
has the evaluation rule

    Y f -> f (Y f)

so that it can be used to call a function with itself as an argument.
For example, if I had the appropriate arithmetic operators defined, I
could write a factorial function in the lambda calculus like this:

    Y \f\n((= n 0) 1 (* n (f (- n 1))))

OFL provides a facility for naming expressions so that they can be
used more than once without writing them in full. A definition must be
written on one line in the form `l name b expression' where `l' is
short for "let" and `b' is short for "be". Names must be more than one
character long. A complete OFL program starts off by defining the
primitives it will use with lines of the form `p (S)' etc., then goes
on to define the names it will use, and finishes off with the main
expression followed by a single `e', short for "end". OFL expressions
must use the minimum number of brackets needed to be unambiguous,
except that the whole expression must be surrounded by brackets.
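
The rules for `E', `G', `P' and `Y' given above can be tried out in a
few lines of Python. This is only an illustrative sketch, not the OFL
code of the entry: Python is strict, so the two branches of
`cond then else' are wrapped in lambdas and the recursive call inside
`Y' is delayed, and the names `make_io', `echo', `step' and `EOF' are
assumptions made for this example. The case where `E(c) x' is given
something that is not an `E' is not modelled.

```python
import io

I = lambda x: x
K = lambda t: lambda f: t      # doubles as "true"
J = lambda t: lambda f: f      # "false"

def Y(f):
    """Y f -> f (Y f); the inner call is delayed because Python is strict."""
    return f(lambda x: Y(f)(x))

class E:
    """E(c): the character c, or end of file when ch is None."""
    def __init__(self, ch):
        self.ch = ch

    def __call__(self, x):
        # E(c) x -> K if x is the same character, otherwise J
        return K if x.ch == self.ch else J

EOF = E(None)

def make_io(inp, out):
    """Build G and P over a pair of text streams (hypothetical helper)."""
    def G(f):
        # G f -> f E(c), with the side effect of reading c
        c = inp.read(1)
        return f(E(c) if c else EOF)

    def P(x, f):
        # P x f -> f I, with the side effect of writing x
        out.write(x.ch)
        return f(I)

    return G, P

def echo(inp, out):
    """Copy inp to out one character at a time, stopping at end of file."""
    G, P = make_io(inp, out)

    def step(again):
        def body(_):
            # read c; if it is end of file stop, else write it and loop
            return G(lambda c: EOF(c)(lambda: I)(lambda: P(c, again))())
        return body

    return Y(step)(I)

out = io.StringIO()
echo(io.StringIO("abc"), out)
print(out.getvalue())          # abc
```

Each continuation runs only after the read or write before it has
happened, which is how the CPS style fixes the order of the
side-effects. The same `Y' also gives the factorial mentioned above:
`Y(lambda f: lambda n: 1 if n == 0 else n * f(n - 1))(5)' is 120.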

OFL has a standard library which adds another five combinators to the
basic eight. These combinators provide greater scope for optimisation
when translating an expression from the lambda calculus to
combinators, and therefore more compact programs. Their evaluation
rules are:

    B  f g x    ->  f (g x)
    C  f g x    ->  (f x) g
    BB c f g x  ->  c (f (g x))
    CC c f g x  ->  c (f x) g
    SS c f g x  ->  c (f x) (g x)

I used them to help fit my program into the required space;
unfortunately I couldn't fit the version that fully optimises the
lambda expressions it reads into the required space. The complete
optimisation rules, including the ones it doesn't use, are:

    S (K p) I        ->  p
    S (K p) (K r)    ->  K (p r)
    S (K p) (B r s)  ->  BB p r s
    S (K p) r        ->  B p r
    S (B p q) (K r)  ->  CC p q r
    S p (K r)        ->  C p r
    S (B p q) r      ->  SS p q r
    S p r            ->  S p r

The implementation of OFL
=========================

Although OFL has completely different syntax and semantics to C, all
you need to compile OFL programs is a C compiler and a header file
(which in my entry to the competition has been included in-line). The
header file implements a two-pass optimising compiler which produces a
representation of the program that is then interpreted.

The first pass of the compiler is implemented via the C preprocessor.
It implements one optimisation, namely function in-lining.
Unfortunately, this optimisation always increases object code size and
execution time. The opposite optimisation, common sub-expression
elimination (which would offer improvements in code size and execution
time), has not been implemented. The output of this pass is a C
program that may be compiled by a normal C compiler.

The second pass is performed by the initialisation phase of the C
program. It builds a data structure representing the abstract syntax
tree of the program. The program is then executed by an interpreter.
This operates by performing manipulations on the AST that correspond
to the evaluation rules of the combinators that I described above.
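
The optimisation table and the interpreter's rewriting step described
above can be sketched together in Python. This is an illustrative
sketch under stated assumptions, not the entry's code: terms are
represented as strings and tuples, the names `app', `spine',
`combine', `abstract', `trans', `normalise' and `show' are invented
for this example, and the atoms `S', `K', `I', `B', `C', `BB', `CC'
and `SS' are assumed always to mean the combinators. Unlike the entry,
the sketch applies the complete table.

```python
# Terms: an atom is a str, an application is a 2-tuple (f, x), and a
# lambda abstraction is a 3-tuple ("lam", var, body).

def app(f, *args):
    """Left-associated application: app(f, a, b) builds ((f a) b)."""
    for a in args:
        f = (f, a)
    return f

def spine(t):
    """Split a lambda-free term into its head atom and argument list."""
    args = []
    while isinstance(t, tuple):
        args.append(t[1])
        t = t[0]
    return t, args[::-1]

def combine(p, r):
    """Build `S p r', trying the rules of the optimisation table in order."""
    (ph, pa), (rh, ra) = spine(p), spine(r)
    p_k, r_k = (ph, len(pa)) == ("K", 1), (rh, len(ra)) == ("K", 1)
    p_b, r_b = (ph, len(pa)) == ("B", 2), (rh, len(ra)) == ("B", 2)
    if p_k and r == "I":
        return pa[0]                        # S (K p) I       -> p
    if p_k and r_k:
        return app("K", (pa[0], ra[0]))     # S (K p) (K r)   -> K (p r)
    if p_k and r_b:
        return app("BB", pa[0], *ra)        # S (K p) (B r s) -> BB p r s
    if p_k:
        return app("B", pa[0], r)           # S (K p) r       -> B p r
    if p_b and r_k:
        return app("CC", *pa, ra[0])        # S (B p q) (K r) -> CC p q r
    if r_k:
        return app("C", p, ra[0])           # S p (K r)       -> C p r
    if p_b:
        return app("SS", *pa, r)            # S (B p q) r     -> SS p q r
    return app("S", p, r)                   # S p r           -> S p r

def abstract(v, t):
    """abs v t, for a term t that contains no lambdas."""
    if t == v:
        return "I"
    if isinstance(t, tuple):
        return combine(abstract(v, t[0]), abstract(v, t[1]))
    return ("K", t)

def trans(t):
    """Translate a lambda term into a combinator term."""
    if isinstance(t, tuple) and len(t) == 3:
        return abstract(t[1], trans(t[2]))
    if isinstance(t, tuple):
        return (trans(t[0]), trans(t[1]))
    return t

RULES = {  # combinator -> (number of arguments, rewrite)
    "I": (1, lambda x: x),
    "K": (2, lambda x, y: x),
    "S": (3, lambda f, g, x: app(f, x, (g, x))),
    "B": (3, lambda f, g, x: (f, (g, x))),
    "C": (3, lambda f, g, x: app(f, x, g)),
    "BB": (4, lambda c, f, g, x: (c, (f, (g, x)))),
    "CC": (4, lambda c, f, g, x: app(c, (f, x), g)),
    "SS": (4, lambda c, f, g, x: app(c, (f, x), (g, x))),
}

def normalise(t):
    """Rewrite the leftmost-outermost redex until none is left."""
    while True:
        head, args = spine(t)
        rule = RULES.get(head)
        if rule is None or len(args) < rule[0]:
            return app(head, *[normalise(a) for a in args])
        n, rewrite = rule
        t = app(rewrite(*args[:n]), *args[n:])

def show(t, arg=False):
    """Print a term with the minimum number of brackets."""
    if not isinstance(t, tuple):
        return t
    s = show(t[0]) + " " + show(t[1], True)
    return "(" + s + ")" if arg else s

doubled = app(("lam", "x", app("+", "x", "x")), "3")   # \x(+ x x) 3
code = trans(doubled)
print(show(code))               # S + I 3
print(show(normalise(code)))    # + 3 3
```

The translation printed here is shorter than the unoptimised
`S (S (K +) I) I 3' derived earlier because the first rule of the
table rewrites `S (K +) I' to `+'. The sketch never frees anything and
does not share sub-terms, so it is only suitable for small examples.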

The first 54 lines of the program are mostly the interpreter; the next
8 lines are pass one of the compiler; the next nine declare the OFL
primitives; the next five define the OFL standard library; and the
rest implement the application described in the first part of this
document. This part of the program is transformed into the second pass
of the compiler by the first pass of the compiler.

The first pass of the compiler relies on some magic macros that expand
in a mutually-overlapping and almost recursive way in order to
perpetrate a complete change of syntax. Interestingly enough, if I
hadn't used the same identifier for more than one thing in various
places in the program, this transformation would work with old-style
cpp as well as ANSI cpp. It expands the program from about 2.2KB to
about 740KB. The resulting code does some tricky stuff to build a data
structure of the same form as the code itself.

The interpreter is written in fairly straightforward C, although it is
highly compressed. The C itself is written in a fairly functional
style, in that every function consists of a single expression. In the
tradition of functional programming languages it assumes that memory
is infinite and so never frees anything. It also assumes that the C
compiler implements the tail recursion optimisation, which is probably
not true.

References
==========

I used these references in the course of writing this program and they
explain some of the ideas behind it in more detail. Arthur Norman (one
of the authors of the first paper) did a lot to enthuse me about this
topic.

Clarke, T. J. W., et al., 1980.
    SKIM - the S, K, I Reduction Machine. Proceedings of the 1980 ACM
    LISP Conference, pp. 128-135.

Curry, H. B. and Feys, R., 1958.
    Combinatory Logic, Vol. 1. North-Holland.

Gordon, M. J. C., 1988.
    Programming Language Theory and its Implementation. Prentice-Hall.

Peterson, J., et al., 1997.
    Report on the Programming Language Haskell, version 1.4.

Peyton Jones, S. L., 1987.
    The Implementation of Functional Programming Languages, Chapters
    2, 11, 12, 16. Prentice-Hall.

Peyton Jones, S. L., and Wadler, P., 1992.
    Imperative Functional Programming. Proceedings of the ACM
    Symposium on the Principles of Programming Languages 1993,
    pp. 71-84.

Turner, D. A., 1979.
    A new implementation technique for applicative languages.
    Software - Practice and Experience. Vol. 9, pp. 31-49.

Turner, D. A., 1979.
    Another algorithm for bracket abstraction. Journal of Symbolic
    Logic. Vol. 44, no. 2, pp. 267-270.

Wadler, P., 1990.
    Comprehending Monads. Proceedings of the ACM conference on Lisp
    and Functional Programming, 1990.

Wadler, P., 1995.
    How to Declare an Imperative. Proceedings of the ACM International
    Logic Programming Symposium, 1995.