Wednesday, October 11, 2006 4:50 PM bart

DynCalc - Dynamic compilation illustrated - Part 1: Infix to postfix

Introduction

A few days ago I blogged about dynamic languages and Lightweight Code Generation in .NET 2.0. Today, I'm providing you with the first part of a sample called DynCalc to perform calculations dynamically by compiling the calculation to IL. No interpretation, just some level of code generation. This post is part of a personal project that focuses on compiler techniques and runtime engines (SSCLI to be exact). Some point in time I might publish a bit more on that project.

Infix versus postfix

Typical mathematical calculations are written in an infix notation, which looks familiar to everyone with some basic notion of maths. An example is shown below:

((1+2)+3)*2-8/4

Binary operators appear in between the operands. Computers can't deal with this syntax directly, rather they need a postfix notation, also known as "reverse Polish notation". The sample above translated into postfix notation is:

1 2 + 3 + 2 * 8 4 / -

To make the conversion it's important to take care of the operator precedence (e.g. 8/4 is calculated instead of the ... - 8 portion). Parentheses can help to override default precedence rules.

This postfix notation has the characteristic of being stack-alike. Basically, values are pushed on the stack and operations are executed on the last two values on the stack:

                                            / 
        +           +           *           4       -
    2   2       3   3       2   2       8   8   2   2
1   1   1   3   3   3   6   6   6  12  12  12  12  12  10

The goal of this post is to provide you with a simple piece of conversion code that translates an infix notation into the corresponding postfix notation.

Parsing an infix expression

So, how to parse a given string with an infix expression? The basic algorithm is outlined below:

  1. Read a character from the input.
  2. If it's a whitespace character, go back to 1.
  3. If it's a digit from 0 to 9, read till a non-digit character is found and write out the accumulated integer value. Go to 7.
  4. If it's a left parenthesis '(', put it on the stack. Go to 7.
  5. If it's an operator (+, -, * or /), do:
    • If the stack is empty, put the operator on the stack and go to 7.
    • If the top of the stack is '(', put the operator on the stack and go to 7.
    • If the operator on top of the stack takes precedence over the read parameter, put the operator on the stack and go to 7.
    • Else, pop the last item from the stack and write it out. Go back to 5.
  6. If it's a right parenthesis ')', pop the stack till the matching '(' is found while writing out the popped items. Go to 7.
  7. If there's more to read from the input expression, go to 1, else go to 8.
  8. Pop all remaining characters from the stack and write out.

An example - convert ((1+2)+3)*2-8/4 to postfix:

(   (   1   +   2   )   +   3   )   *   2   -   8   /   4

            +   +
    (   (   (   (       +   +                       /   /
(   (   (   (   (   (   (   (       *   *   -   -   -   -
S   S   S   S   S   S   S   S   S   S   S   S   S   S   S

        1   1   1   1   1   1   1   1   1   1   1   1   1
                2   2   2   2   2   2   2   2   2   2   2
                    +   +   +   +   +   +   +   +   +   +
                            3   3   3   3   3   3   3   3
                                +   +   +   +   +   +   +
                                        2   2   2   2   2
                                            *   *   *   *
                                                8   8   8
                                                        4
                                                        /
                                                        -

The result can be built using a Queue collection object. When reading input from the given infix expression, the algorithm is applied and "write out" then means enqueuing "something". What's that something? A token in parser speak:

internal class MathOpOrVal
{
   public MathOp
? Op;
   public int
? Value;

   public MathOpOrVal(MathOp
op)
   {
      Op = op;
   }

   public MathOpOrVal(int
value)
   {
      Value = value;
   }

   public override string
ToString()
   {
      return (Op != null
? Op.Value.ToString() : Value.Value.ToString());
   }
}

internal class MathOpOrVal
{
   public MathOp
? Op;
   public int
? Value;

   public MathOpOrVal(MathOp
op)
   {
      Op = op;
   }

   public MathOpOrVal(int
value)
   {
      Value = value;
   }

   public override string
ToString()
   {
      return (Op != null
? Op.Value.ToString() : Value.Value.ToString());
   }
}

In C/C++ one would typically use a union to represent this token, but we stick with two nullable fields wrapped in a class.

Next, we have a trivial parser exception object:

class ParseException : Exception
{
   public ParseException(string
message)
      :
base
(message)
   {
   }
}

class ParseException : Exception
{
   public ParseException(string
message)
      :
base
(message)
   {
   }
}

And now the real stuff: the parser code. Let's just stick with a static class "Expression" with one static method called "InfixToPostfix":

static class Expression
{
   private static Queue<MathOpOrVal
> queue;

   public static Queue<MathOpOrVal> InfixToPostfix(string
expr)
   {
      if (expr == null
)
         throw new ArgumentNullException
();

      if (expr == ""
)
         throw new ArgumentException("Empty infix expression"
);

      queue =
new Queue<MathOpOrVal
>();

      StringReader sr = new StringReader
(expr);
      Stack<char> stack = new Stack<char
>();

      do
      {
         char c = (char
)sr.Read();

         if
(char.IsWhiteSpace(c))
         {
            continue
;
         }
         else if (char
.IsDigit(c))
         {
            int val = (int)(c - '0'
);

            char p = (char
)sr.Peek();
            while (p >= '0' && p <= '9'
)
            {
               val = val * 10 + (
int)(sr.Read() - '0'
);
               p = (
char
)sr.Peek();
            }
            Append(val);
         }
        
else if (c == '('
)
         {
            stack.Push(c);
         }
         else if (c == '+' || c == '-' || c == '*' || c == '/'
)
         {
            bool
repeat;
            do
            {
               repeat =
false
;
               if
(stack.Count == 0)
                  stack.Push(c);
               else if (stack.Peek() == '('
)
                  stack.Push(c);
               else if
(Precedence(c) > Precedence(stack.Peek()))
                  stack.Push(c);
               else
               {
                  Append(stack.Pop());
                  repeat =
true
;
               }
            }
while
(repeat);
         }
         else if (c == ')'
)
         {
            bool ok = false
;
            while
(stack.Count > 0)
            {
               char
p = stack.Pop();
               if (p == '('
)
               {
                  ok =
true
;
                  break
;
               }
               else
                  Append(p);
            }
            if
(!ok)
               throw new ParseException("Unbalanced parentheses."
);
         }
         else
            throw new ParseException("Invalid character encountered: "
+ c);
      }
while
(sr.Peek() != -1);

      while
(stack.Count > 0)
      {
         char
p = stack.Pop();
         if (p == '('
)
            throw new ParseException("Unbalanced parentheses."
);
         Append(p);
      }

      return
queue;
   }

   private static void Append(char
c)
   {
      switch
(c)
      {
         case '+'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Add));
            break
;
         case '-'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Sub));
            break
;
         case '*'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Mul));
            break
;
         case '/'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Div));
            break
;
         default
:
            throw new ParseException("Parser error."
);
      }
   }

   private static void Append(int
v)
   {
      queue.Enqueue(
new MathOpOrVal
(v));
   }

   private static int Precedence(char
c)
   {
      switch
(c)
      {
         case '*'
:
         case '/'
:
            return
2;
         case '+'
:
         case '-'
:
            return
1;
         default
:
            throw new Exception
(); //should not happen
      }
   }
}

static class Expression
{
   private static Queue<MathOpOrVal
> queue;

   public static Queue<MathOpOrVal> InfixToPostfix(string
expr)
   {
      if (expr == null
)
         throw new ArgumentNullException
();

      if (expr == ""
)
         throw new ArgumentException("Empty infix expression"
);

      queue =
new Queue<MathOpOrVal
>();

      StringReader sr = new StringReader
(expr);
      Stack<char> stack = new Stack<char
>();

      do
      {
         char c = (char
)sr.Read();

         if
(char.IsWhiteSpace(c))
         {
            continue
;
         }
         else if (char
.IsDigit(c))
         {
            int val = (int)(c - '0'
);

            char p = (char
)sr.Peek();
            while (p >= '0' && p <= '9'
)
            {
               val = val * 10 + (
int)(sr.Read() - '0'
);
               p = (
char
)sr.Peek();
            }
            Append(val);
         }
        
else if (c == '('
)
         {
            stack.Push(c);
         }
         else if (c == '+' || c == '-' || c == '*' || c == '/'
)
         {
            bool
repeat;
            do
            {
               repeat =
false
;
               if
(stack.Count == 0)
                  stack.Push(c);
               else if (stack.Peek() == '('
)
                  stack.Push(c);
               else if
(Precedence(c) > Precedence(stack.Peek()))
                  stack.Push(c);
               else
               {
                  Append(stack.Pop());
                  repeat =
true
;
               }
            }
while
(repeat);
         }
         else if (c == ')'
)
         {
            bool ok = false
;
            while
(stack.Count > 0)
            {
               char
p = stack.Pop();
               if (p == '('
)
               {
                  ok =
true
;
                  break
;
               }
               else
                  Append(p);
            }
            if
(!ok)
               throw new ParseException("Unbalanced parentheses."
);
         }
         else
            throw new ParseException("Invalid character encountered: "
+ c);
      }
while
(sr.Peek() != -1);

      while
(stack.Count > 0)
      {
         char
p = stack.Pop();
         if (p == '('
)
            throw new ParseException("Unbalanced parentheses."
);
         Append(p);
      }

      return
queue;
   }

   private static void Append(char
c)
   {
      switch
(c)
      {
         case '+'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Add));
            break
;
         case '-'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Sub));
            break
;
         case '*'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Mul));
            break
;
         case '/'
:
            queue.Enqueue(
new MathOpOrVal(MathOp
.Div));
            break
;
         default
:
            throw new ParseException("Parser error."
);
      }
   }

   private static void Append(int
v)
   {
      queue.Enqueue(
new MathOpOrVal
(v));
   }

   private static int Precedence(char
c)
   {
      switch
(c)
      {
         case '*'
:
         case '/'
:
            return
2;
         case '+'
:
         case '-'
:
            return
1;
         default
:
            throw new Exception
(); //should not happen
      }
   }
}

If you do understand the algorithm aforementioned, this piece of code should be rather straightforward. It reads the supplied string character per character, while it builds a stack to deal with operator precedence (and the use of parentheses to override precedence).

There are a few limitations however:

  • No % operator or ^ operator.
  • Only works with integer values (no decimal separator character parsing); / is an integer division.
  • No support for mathematical functions like sin, cos, tan, etc.
  • No unary negation operator support.

Although it's no so difficult to extend the code to deal with the limitation mentioned above, I kept things easy because of the end goal: an example of dynamic compilation.

In the next post we'll take a look at the creation of an expression tree out of the postfix translation. This might seem useless because the postfix form can be translated into IL rather directly because of its stack-based character. However, I find it useful to discuss the use of trees as well, as a basic concept used in compiler creation.

To wet appetite, here's the end result we'll obtain:

You should be able to produce the postfix representation out of the code above. Tree construction will follow in part 2 and the mekka of this journey, dynamic calculation using IL translation, will be covered in part 3.

See you tomorrow!

kick it on DotNetKicks.com

Del.icio.us | Digg It | Technorati | Blinklist | Furl | reddit | DotNetKicks

Filed under: ,

Comments

# Getting started with C# 3.0 Expression Trees

Wednesday, November 22, 2006 2:20 PM by B# .NET Blog

Somewhere in October this year, a series of three posts appeared online on this blog, covering "DynCalc"

# DynCalc - Dynamic Compilation Illustrated - Part 4: C# 3.0 Expression Trees

Thursday, November 23, 2006 9:09 AM by B# .NET Blog

Introduction In the previous posts on DynCalc , we explored how to write a simple parser for simple calculations,

# The IQueryable tales - LINQ to LDAP - Part 3: Why do we need entities?

Saturday, April 07, 2007 6:32 PM by B# .NET Blog

Introduction Welcome back to the LINQ-to-LDAP series. So far, we've been discussing: Part 0: Introduction