Evaluating Infix Expressions

# Evaluating Infix Expressions

Infix expressions are, put simply, expressions written in the 'normal' way, with operators placed between operands. (1 + 2) * 3 is an example of an infix expression. This isn't the only way to write an arithmetic expression. Other options include prefix and postfix. A postfix expression has the operators placed after their operands (e.g. 1 2 + 3 *). Prefix expressions place operators before their operands (e.g. * 3 + 2 1).

Postfix expressions are trivial for computers to evaluate as they do away with all the complex features of infix expressions like parenthesis and order of operations. Infix expressions, in contrast, are incredibly difficult to process. The bulk of that complexity comes from operator precidence, which is not an issue for postfix expressions. We'll use an extended version of the shunting yard algorithm to convert our infix expression into a postfix expression. While all of our examples here will produce a postfix expression as output, the resulting postfix expression can be evaluated as it's being produced.

## Evaluating Postfix Expressions

Postfix expressions are evaluated strictly from left to right, using a stack to hold unprocessed operands and intermediate values.

The method is simple. Tokens are read one at a time, starting from the left-hand side of the expression. If the token is an operand, we push it on to the stack. If a token is an operator, we pop the number of operands it takes (according to the operator's arity) and push the computed result on to the stack. If the stack underflows, or the stack has more than one value after we've processed every token, we know the expression was invalid.

Let's try a simple example: 1 2 + 3 *

 Stack Expression Description [1] 2 + 3 * The first token is a number, so it gets pushed to the stack. 1 1 [2] + 3 * The second token is a number, so it gets pushed to the stack. 1 2 1 2 [+] 3 * The third token is a binary operator (it needs two operands), so ... 1 2 [+] 3 * ... pop the top two numbers off the stack and add them together ... 3 1 2 [+] 3 * ... push the result on to the stack. 3 1 2 + [3] * The forth token is a number, so push it to the stack 3 3 1 2 + 3 [*] The final token is a binary operator, so ... 1 2 + 3 [*] ... pop the top two numbers off the stack and multiply them ... 9 1 2 + 3 [*] ... push the result on to the stack.

We have just one value on the stack, the result, so that means our expression is correctly formed.

You can see how easy it would be to extend this to include unary operators (operators that take one value, like a unary minus) and fixed-arity functions.

## Infix to Postfix

We'll start with an extremely simplified version of the shunting yard algorithm and add to it as we go. For our first version, we'll just handle binary operators (following proper order of operations) without handling parenthesis, right-associative operators, functions, or unary operators.

#### Shunting yard version 1:

Using a single stack for operators and stream of input tokens:

1. For each token in the input stream:
1. If the token is an operand:
1. Output the token
2. If the token is an operator:
1. Pop any operators of greater or equal precedence off the stack and output them
2. Push the operator on to stack
2. When there are no more tokens:
1. Pop any remaining tokens off the stack and output them

Order of operations demands that multiplication and division operations are evaluated before addition and subtraction. That is, multiplication and division have a higher precedence than addition and subtraction. A table of common operators and their relative precidence is included at the end of this section.

 Output Stack Expression Description [1] + 2 * 3 The first token is a number. Output it. 1 1 [+] 2 * 3 The next token is an operator. The stack is empty, so there are no operators of greater or equal precedence on the stack. Push it to the stack. 1 + 1 + [2] * 3 The next token is a number. Output it. 1 2 + 1 + 2 [*] 3 The next token is an Operator: 1 2 [+] 1 + 2 [*] 3 The operator on the top of the stack has a lower precedence than the current operator, so push the current token on to the stack. 1 2 + * 1 + 2 * [3] The next token is a number. Output it. 1 2 3 + * 1 + 2 * 3 There are no more tokens, pop and output any tokens on the stack. 1 2 3 * + 1 + 2 * 3 ... 1 2 3 * + 1 + 2 * 3 ...

### Handling Parentheses

Adding support for parentheses from this point is simple, requiring the addition of just two new rules. If we encounter an open parenthesis, push it to the stack. If we encounter a close parenthesis, pop and output anything on the stack until we find an open parenthesis at the top of the stack (which is then discarded).

You only want to pop an open parenthesis off the stack when you're handling its matching close parenthesis. Consequently, when you're popping other operators off the stack for other reasons, you'll need to stop and exit your loop when you find an open parenthesis.

#### Shunting yard version 2:

Using a single stack for operators and stream of input tokens:

1. For each token in the input stream:
1. If the token is an operand:
1. output the token.
2. If the token is an operator:
1. Pop any operators of greater or equal precedence off the stack and output them
2. Push the operator on to stack
3. If the token is an open parenthesis:
1. push it on to the stack
4. If the token is a close parenthesis:
1. Pop any operators off the stack and output them until an open parenthesis is on the top of the stack.
2. Pop and discard the open parenthesis.
2. When there are no more tokens:
1. Pop any remaining tokens off the stack and output them
Let's try it out, using parenthesis to modify the order in which our operations are performed from the previous example by evaluating (1 + 2) * 3:
 Output Stack Expression Description [(] 1 + 2 ) * 3 The first token is an open parenthesis. Push it to the stack. ( ( [1] + 2 ) * 3 The next token is a number. Output it. 1 ( 1 [+] 2 * 3 The next token is an operator. The top of the stack has an open parenthesis, so there is nothing to pop. Push the token on to the stack. 1 ( + ( 1 + [2] ) * 3 The next token is a number. Output it. 1 2 ( + ( 1 + 2 [)] * 3 The next token is a close parenthesis: 1 2 ( [+] ( 1 + 2 [)] * 3 ...The operator on the top of the stack is not an open parenthesis, so pop and output it. 1 2 + [(] ( 1 + 2 [)] * 3 ...The top of the stack is an open parenthesis. Pop and discard it. 1 2 + ( 1 + 2 ) [*] 3 The next token is an Operator: 1 2 + ( 1 + 2 ) [*] 3 ...The stack is empty, so push it. 1 2 + * ( 1 + 2 ) * [3] The next token is a number. Output it. 1 2 + 3 * ( 1 + 2 ) * 3 There are no more tokens, pop and output any tokens on the stack. 1 2 + 3 * ( 1 + 2 ) * 3 ...

If you don't find a matching open parenthesis when handling a close parenthesis (stack underflow) or if you find an open parenthesis during the last step (while you're outputting any operators remaining on the stack) you'll know there was no matching close parenthesis, then you know that the expression wasn't correctly formed.

### Unary Operators

There are two kinds of unary operators, prefix and postfix. Prefix unary operators precede their operand (e.g. -3) while postfix unary operators follow their operand (e.g. 3!).

When you encounter a prefix unary operator, just push it on to the stack. There are, however, some important things to consider.

You'd expect a prefix unary operator to be applied immediately to the operand or expression immediately to the right, so unary operators should have a higher precedence than any other operator, though this isn't always desirable. Consider -32 (i.e. -3^2 or -3**2). Many programming languages, like Fortran and Cobol, will handle the unary minus first giving (-3)2 rather than the expected -(32). Other languages, like VBA, give exponentiation higher precedence than negation. Some scientific calculators will even prioritize implicit multiplication over negation interpreting -3x as -(3*x) instead of (-3)*x. A few languages, like Ada, prioritize both exponentiation and multiplication over negation.

Some prefix unary operators look identical to other operators. Unary minus and binary minus, for example. As you've probably already guessed, it's important to be able to distinguish between the two. Fortunately, you can easily identify a prefix unary operator in a properly formed infix expression by looking at the previous token in the input stream. If the previous token was an operator or an open parenthesis, we know that the current operator is a unary operator. If the previous token was a close parenthesis or an operand, we know that the operator can not be unary operator.

Postfix unary operators aren't terribly common. The postfix increment and decrement operators (e.g. x++, y--) are probably the most widely supported. Handling them is just as easy, as when you encounter one of those you need only output it as you generally want it to be applied to the operand you just output. As a side effect, this gives postfix unary operators the highest precedence, which you might not want. The alternative, naturally, is to push it to the stack after popping operators of greater (or greater or equal) precedence, if you want more control. Be aware, also, that you should not consider postfix unary operators as operators when trying to identify prefix unary operators using the method described above.

#### Shunting yard version 3:

Using a single stack for operators and a stream of input tokens:

1. For each token in the input stream:
1. If the token is an operand:
1. Output the token
2. If the token is a binary operator:
1. Pop any operators of greater or equal precedence off the stack and output them
2. Push the operator on to stack
3. If the token is a prefix unary operator:
1. Push the operator on to stack
4. If the token is a postfix unary operator:
1. Output the token
5. If the token is an open parenthesis
1. push it on to the stack
6. If the token is a close parenthesis:
1. Pop any operators off the stack and output them until an open parenthesis is on the top of the stack.
2. Pop and discard the open parenthesis.
2. When there are no more tokens:
1. Pop any remaining tokens off the stack and output them

Let's try the following expression: 1 - -3. We'll denote unary minus as u- to distinguish it from the binary minus operator.

 Output Stack Expression Description [1] - u- 3 The first token is a number. Output it. 1 1 [-] u- 3 The next token is a binary operator. The stack is empty, so push it to the stack. 1 - 1 - [u-] 3 The next token is a unary operator. There are no operators of higher precidence on the stack, so push it on to the stack. 1 - u- 1 - u- [3] The last token is a number. Output it. 1 3 - u- 1 - u- 3 There are no more tokens. Pop and output any remaining operators. 1 3 u- - 1 - u- 3 ... 1 3 u- - 1 - u- 3 ...

### Right-Associative Operators

Another bit of ambiguity infix notation introduces is operator associativity. Associativity determines how operators of equal precedence are prioritized. Consider two expressions 3 + 2 + 1 and 3 - 2 - 1. Addition has the property of associativity, so it doesn't matter if you evaluate the first expression as (3 + 2) + 1 or as 3 + (2 + 1). The order in which the operations are carried out doesn't affect the result. Subtraction, however, is not associative, so (3 - 2) - 1 is not equivalent to 3 - (2 - 1).

The general rule is to assume that operators of equal precedence are left associative and handle operations of equal precedence from left to right. There are, however, cases where we expect operations to be grouped right to left. The exponent operator (typically ^ or **), for example, should be right-associative. Stacked exponents like 321 should be evaluated as 3^(2^1), like Fortran and D, not as (3^2)^1, though many other languages, like Simula and ALGOL, do exactly this.

Right-associative binary operators, unsurprisingly, can be added very easily. They're handled exactly like left-associative binary operators execpt that you only pop operators of lower precidence off the stack.

#### Shunting yard version 4:

Using a single stack for operators and a stream of input tokens:

1. For each token in the input stream:
1. If the token is an operand:
1. Output the token
2. If the token is a left-associative binary operator:
1. Pop any operators of greater or equal precedence off the stack and output them
2. Push the operator on to stack
3. If the token is a right-associative binary operator:
1. Pop any operators of greater precedence off the stack and output them
2. Push the operator on to stack
4. If the token is a prefix unary operator:
1. Push the operator on to stack
5. If the token is a postfix unary operator:
1. Output the token
6. If the token is an open parenthesis
1. push it on to the stack
7. If the token is a close parenthesis:
1. Pop any operators off the stack and output them until an open parenthesis is on the top of the stack.
2. Pop and discard the open parenthesis.
2. When there are no more tokens:
1. Pop any remaining tokens off the stack and output them

Let's try this with our earlier example: 321

 Output Stack Expression Description [3] ^ 2 ^ 1 The first token is a number. Output it. 3 3 [^] 2 ^ 1 The next token is a right-associative operator. As there are no operators on the stack of greater precedence, push it to the stack. 3 ^ 3 ^ [2] ^ 1 The next token is a number. Output it. 3 2 ^ 3 ^ 2 [^] 1 The next token is a right-associative operator. As there are no operators on the stack of greater precedence, push it to the stack. 3 2 ^ ^ 3 ^ 2 ^ [1] The final token is a number. Output it. 3 2 1 ^ ^ 3 ^ 2 ^ 1 There are no more tokens, Pop and output any remaining operators. 3 2 1 ^ ^ 3 ^ 2 ^ 1 ... 3 2 1 ^ ^ 3 ^ 2 ^ 1 ...

Compare the final postfix expression to the resulting postfix expression derived from a similar expression using left-associative operators: 3 + 2 + 1

 Output Stack Expression Description [3] + 2 + 1 The first token is a number. Output it. 3 3 [+] 2 + 1 The next token is a left-associative operator. As there are no operators on the stack of greater or equal precedence, push it to the stack. 3 + 3 + [2] + 1 The next token is a number. Output it. 3 2 + 3 + 2 [+] 1 The next token is a left-associative operator. Pop any operators of greater or equal precidence and output them: 3 2 [+] 3 + 2 [+] 1 ... 3 2 + 3 + 2 [+] 1 ... 3 2 + 3 + 2 [+] 1 ... There are no more operators of greater or equal precidence left on the stack, so push the token on to the stack. 3 2 + + 3 + 2 + [1] The final token is a number. Output it. 3 2 + 1 + 3 + 2 + 1 There are no more tokens, Pop and output any remaining operators. 3 2 + 1 + 3 + 2 + 1 ...

### Functions and Commas

Functions naturally group their operands together in parenthesis. You might think of them as prefix unary operators that can act on more than one operand. As functions can take expressions as arguments, commas are used to separate function arguments. This is common to most languages.

Commas are easy to handle in the general case. When you encounter a comma, you simply treat it like a close parenthesis with the exception that you don't discard the open parenthesis when you find it.

As for functions themselves, when one is encountered you need only push it on to the stack. When you handle a close parenthesis, after popping the corresponding open parenthesis off the stack, you need to check to see if there is a function on top of the stack. If you find one, it should be output. This ensures that the function will immediately follow its operands.

As neat and clean as this process appears to be, there is a problem. At present, there is no way to determine if the function has been passed the correct number of arguments. There's also the problem of variadic functions, or functions without a fixed number of parameters.

In both cases, it's necessary to count the number of arguments being passed to the function. As functions can be nested, we'll need an additonal stack, we'll call the argument stack, to count arguments. We'll pop a value from this new stack and output it before popping our function off the stack. This way, when we later evaluate the postfix expression, the first operand on the stack when we encounter a function will be the number of arguments earlier passed to the function. For functions with a fixed arity, we can compare that number to the number of parameters the function has. For variadic functions, we'll know how many arguments to read from the operand stack.

Counting the number of arguments is simple. When we first encounter a function, we push it to the stack. When we encounter an open parenthesis, and the top of the stack contains a function, we'll push a 1 on to the argument stack. When we encounter a comma, we'll increment the number on the top of argument stack.

Even with all that additional complexity, we're still left with a major problem. That is, how do we handle functions that have no parameters or variadic functions that will accept zero arguments? One option is to define () as an operator, so when we encounter it we know to immediately pop and output the function. Unfortunately, this solution has its own problems. A better, if more complex, option is to push a 0 instead of a 1 on to the argument stack. Then, if we find a 0 there when processing any token other than a close parenthesis, we'll know there was at least one argument and we should increment it.

#### Shunting yard version 5:

Using a stack for operators, a stack for function arguments, and a stream of input tokens:

1. For each token in the input stream:
1. If the token is an operand:
1. Output the token
2. If the token is a left-associative binary operator:
1. Pop any operators of greater or equal precedence off the stack and output them
2. Push the operator on to stack
3. If the token is a right-associative binary operator:
1. Pop any operators of greater precedence off the stack and output them
2. Push the operator on to stack
4. If the token is a prefix unary operator:
1. Push the operator on to stack
5. If the token is a postfix unary operator:
1. Output the token
6. If the token is a function:
1. Push the token on to the stack
7. If the token is an open parenthesis:
1. If the top of the stack contains a function:
1. Push a 0 on to the argument stack
2. Push the token on to the stack
8. If the token is a close parenthesis:
1. Pop any operators off the stack and output them until an open parenthesis is on the top of the stack.
2. Pop and discard the open parenthesis.
3. If the top of the stack contains a function:
1. Pop and output the top of the argument stack
2. Pop and output the function on the stack
9. If the token is a comma:
1. Pop any operators off the stack and output them until an open parenthesis is on the top of the stack.
2. increment the value on the top of the argument stack
10. If the token is NOT a close parenthesis AND there is a 0 on top of the argument stack:
1. Pop the 0 off the argument stack and push a 1 in its place
2. When there are no more tokens:
1. Pop any remaining tokens off the stack and output them
 Output Stack Argument Stack Expression Description [3] + atan2 ( 2 , 5 ) The first token is a number. Output it. 3 3 [+] atan2 ( 2 , 5 ) The next token is an operator. The stack is empty, so push it to the stack. 3 + 3 + [atan2] ( 2 , 5 ) The next token is an function. Push it to the stack. 3 + atan2 3 + atan2 [(] 2 , 5 ) The next token is an open parenthesis: 3 + [atan2] 3 + atan2 [(] 2 , 5 ) ...There is a function on top of the stack. Push a 0 on to the Argument stack. 3 + atan2 0 3 + atan2 [(] 2 , 5 ) ...Push the token on to the stack. 3 + atan2 ( 0 3 + atan2 ( [2] , 5 ) The next token is a number: 3 + atan2 ( [0] 3 + atan2 ( [2] , 5 ) ...There is a 0 on top of the argument stack AND the current token is not a close parenthesis. Increment the value on top of the argument stack. 3 + atan2 ( 1 3 + atan2 ( [2] , 5 ) ...Output the token. 3 2 + atan2 ( 1 3 + atan2 ( 2 [,] 5 ) The next token is a comma: 3 2 + atan2 ( [1] 3 + atan2 ( 2 [,] 5 ) ...Increment the top of the argument stack 3 2 + atan2 ( 2 3 + atan2 ( 2 [,] 5 ) ...Pop and output from the stack until an open parenthesis is found: 3 2 + atan2 [(] 2 3 + atan2 ( 2 [,] 5 ) ... 3 2 + atan2 ( 2 3 + atan2 ( 2 , [5] ) The next token is a number. Output it. 3 2 5 + atan2 ( 2 3 + atan2 ( 2 , 5 [)] The next token is a close parenthesis. Pop and output from the stack until an open parenthesis is found: 3 2 5 + atan2 [(] 2 3 + atan2 ( 2 , 5 [)] ... Discard the close parenthesis. 3 2 5 + [atan2] 2 3 + atan2 ( 2 , 5 [)] ... There is a function on top of the stack: 3 2 5 + [atan2] 2 3 + atan2 ( 2 , 5 [)] ... There is a function on top of the stack: 3 2 5 + atan2 [2] 3 + atan2 ( 2 , 5 [)] ... Pop and output the top of the argument stack. 3 2 5 2 + [atan2] 3 + atan2 ( 2 , 5 [)] ... Pop and output the top of the stack. 3 2 5 2 atan2 + 3 + atan2 ( 2 , 5 ) There are no more tokens, pop and output any tokens remaining on the stack: 3 2 5 2 atan2 [+] 3 + atan2 ( 2 , 5 ) ... 3 2 5 2 atan2 + 3 + atan2 ( 2 , 5 ) ...

## Operator Precidence Table

Below is a table of common operators, their relative precidence, and associativity. As we've seen, not all languages assign the same precidence or associativity to these operators. You shouldn't consider this table definitive. It is merely a guide to help you when implementing the shunting yard algorithm in your own projects.

 Precidence Symbol Arity Associativity Description 14 ( ) 2 Left Parenthesis (grouping) 13 () 2 Left Function call 12 ^, ** 2 Right Power 11 - 1 Right Negation 11 !, ~, not 1 Right Not (logcial or bitwise) 11 ++ 1 Left Postfix Increment 11 -- 1 Left Postfix Decrement 10 * 2 Left Multiplication 10 / 2 Left Division 10 % 2 Left Modulo 9 + 2 Left Addition 9 - 2 Left Subtraction 8 < 2 Left Less-than (relational) 8 > 2 Left Greater-than (relational) 8 <= 2 Left Less-than or equal (relational) 8 >= 2 Left Greater-than or equal (relational) 8 !=, <> 2 Left Not equal (relational) 8 =, == 2 Left Equality (relational) 7 &, &&, and 2 Left And (bitwise) 6 ^, xor 2 Left Exclusive-or (bitwise) 5 |, ||, or 2 Left Or (bitwise) 4 &, &&, and 2 Left And (logcial) 3 ^, xor 2 Left Exclusive-or (logical) 2 |, ||, or 2 Left Or (logcial) 1 = 2 Right Assignment

## Verification

Verifying that the expression was correctly formed is only partially built-in to the algorithm (some errors will result in a stack underflow). You won't know for certain that the expression was correctly formed without a few additional checks.

• When emptying the stack after all the input tokens have been processed, if you encounter an open parenthesis, you'll know there was a missing close parenthesis. (You find missing open parentheses by detecting a stack underflow when processing close parentheses.)
• After you've produced the final output, if there are any items remaining on the parameter stack, you know that the expression wasn't correctly formed.
• For functions of fixed arity, you'll want to check the value on top of the argument stack against the known arity for the function before you output them. A mismatch indicates that too many or too few arguments were passed. Missing parameters (e.g. fn(,2)) won't be caught until you check the final postfix expression.
• You can complete the evaluation by checking to see if the resultant postfix expression is correctly formed. You can even do this as the output is being produced. Using a single scalar variable to represent the size of an imaginary stack, increment it when an operand is output, and decrement it by the operator's arity (or the value of the top of the argument stack, in the case of functions) when an operator is output. When you're done, it should have a value of 1, if the final postfix expression is correctly formed. Note, however, that if that value ever drops below 0 (indicating a stack underflow) the expression isn't correctly formed.

*Note: You could, optionally, evalute a postfix expression as it's being produced. Just use an additonal stack for output. Push output operands on to the stack and pop and process them as operators are output by your shunting yard implementation.