RESSA

$slides-only$

  • impl Iterator for Parser
  • Converts stream of Tokens into AST
  • Significantly more context

$slides-only-end$

$web-only$ Before we get into how to use ressa, it is a good idea to briefly touch on the scope of a parser, or syntax analyzer. The biggest thing to understand is that we are still not dealing with the semantic meaning of the program. That means ressa itself won't discover things like assigning to undeclared variables or attempting to call undefined functions, because that would require more context. To that end, ressa's true value isn't realized until it is embedded into another program that provides that context.

With that said, ressa provides a larger context than ress does. It achieves that by wrapping the Scanner in a struct called Parser. Essentially, Parser provides a way to keep track of what any given set of Tokens might mean. Parser also implements Iterator over Result<ProgramPart, Error>. ProgramPart is an enum, and the Ok variant holds one of its 3 cases, which represent the 3 different top level JavaScript constructs.

  • Decl - a variable/function/class declaration
    • Var - A top level variable declaration e.g. let x = 0;
    • Class - A named class definition at the top level
    • Func - A named function definition at the top level
    • Import - An ES Module import statement
    • Export - An ES Module export statement
  • Dir - A script directive, pretty much just 'use strict'
  • Stmt - A catch all for all other statements
    • Block - A collection of statements wrapped in curly braces
    • Break - A break statement will exit a loop or labeled statement early
    • Continue - A continue statement will short circuit a loop
    • Debugger - the literal text debugger
    • DoWhile - A do loop executes the body before testing whether to continue
    • Empty - A single semicolon
    • Expr - A catch-all for everything else
    • For - A c-style for loop e.g. for (var i = 0; i < 100; i++) ;
    • ForIn - A for loop that assigns the key of an enumerable at the top of each iteration
    • ForOf - A for loop that assigns the value of an iterable at the top of each iteration
    • If - A set of if/else if/else statements
    • Labeled - A statement that has been named by an attached identifier
    • Return - The return statement that resolves a function's value
    • Switch - A test Expression and a collection of CaseStmts
    • Throw - The throw keyword followed by an Expression
    • Try - A try/catch/finally block for catching Thrown items
    • Var - A non-top level variable declaration
    • While - A loop which continues based on a test Expression
    • With - An antiquated statement that changes the order of identifier resolution

Stmt is the real workhorse of the group. While a top level function definition would be a Decl, a non-top level function definition would be a Stmt. Both Decl and Stmt are themselves enums representing the different possible variations. Looking further into the Stmt variants, you may notice there is another catch-all in the Expr variant, which contains an Expr (expression) enum that defines an even more granular set of program parts.

  • Expr
    • Assign - Assigning a value to a variable, including any update-and-assign operations e.g. x = 1, x += 1, etc
    • Array - An array literal e.g. [1,2,3,4]
    • ArrowFunc - An arrow function expression
    • Await - Any expression preceded by the await keyword
    • Call - Calling a function or method
    • Class - A class expression is a class definition with an optional identifier that is assigned to a variable or used as an argument in a Call expression
    • Conditional - Also known as the "ternary" operator e.g. test ? consequent : alternate
    • Func - A function expression is a function definition with an optional identifier that is either self executing, assigned to a variable or used as a Call argument
    • Ident - The identifier of a variable, call argument, class, import, export or function
    • Lit - A primitive literal
    • Logical - Two expressions separated by && or ||
    • Member - Accessing a sub property on something. e.g. [0,1,2][1] or console.log
    • MetaProp - Currently the only MetaProperty is new.target, which can be checked in a function body to see if the function was called with the new keyword
    • New - A Call expression preceded by the new keyword
    • Obj - An object literal e.g. {a: 1, b: 2}
    • Seq - Any sequence of expressions separated by commas
    • Spread - the ... operator followed by an expression
    • Super - The super pseudo-keyword used for accessing properties of a super class
    • TaggedTemplate - An identifier followed by a template literal see MDN for more info
    • This - The this pseudo-keyword used for accessing instance properties
    • Unary - An operation (that is not an update) that requires one expression as an argument e.g. delete x, !true, etc
    • Update - An operation that uses the ++ or -- operator
    • Yield - The yield contextual keyword followed by an optional expression, for use in a generator function

Most of the Expr, Stmt, and Decl variants have associated values. To see more information about them, check out the documentation. There should be an example and description provided for each of the possible combinations.
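To make the nesting concrete, here is a minimal sketch of how enums with associated values are matched layer by layer, and how an iterator of Results can be consumed. Every type below is a simplified, hypothetical stand-in written for illustration (with String used as a placeholder error type); ressa's real enums have many more variants and richer associated values, and describe and describe_all are names invented for this sketch.

```rust
// Simplified, hypothetical stand-ins for the types described above.
// These are NOT ressa's real definitions.
#[derive(Debug, PartialEq)]
enum Expr {
    Ident(String),
    Lit(String),
}

#[derive(Debug, PartialEq)]
enum Decl {
    Func { name: String },
}

#[derive(Debug, PartialEq)]
enum Stmt {
    Empty,
    Expr(Expr),
}

#[derive(Debug, PartialEq)]
enum ProgramPart {
    Decl(Decl),
    Dir(String),
    Stmt(Stmt),
}

/// Describe a part by matching through each layer of nesting.
fn describe(part: &ProgramPart) -> String {
    match part {
        ProgramPart::Decl(Decl::Func { name }) => format!("function declaration `{}`", name),
        ProgramPart::Dir(text) => format!("directive `{}`", text),
        ProgramPart::Stmt(Stmt::Empty) => String::from("empty statement"),
        ProgramPart::Stmt(Stmt::Expr(Expr::Ident(name))) => {
            format!("identifier expression `{}`", name)
        }
        ProgramPart::Stmt(Stmt::Expr(Expr::Lit(raw))) => {
            format!("literal expression `{}`", raw)
        }
    }
}

/// Consume any iterator of `Result<ProgramPart, _>`, stopping at the first
/// error. `String` is a placeholder for a real error type.
fn describe_all<I>(parts: I) -> Result<Vec<String>, String>
where
    I: Iterator<Item = Result<ProgramPart, String>>,
{
    // Collecting into a `Result` short-circuits on the first `Err`.
    parts.map(|part| part.map(|p| describe(&p))).collect()
}
```

Because each item is a Result, a consumer can decide per part whether to stop, skip, or report a failure; collecting into a Result, as done here, is simply the stop-at-first-error choice.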

With that long-winded explanation of the basic structure out of the way, let's take a look at how we would use the Parser. In this example we have a JavaScript snippet that defines a function, Thing, which assigns its first argument, stuff, to the property this.stuff. $web-only-end$

use ressa::*;

static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";

fn main() {
    let parser = Parser::new(JS).expect("Failed to create parser");
    for part in parser {
        let part = part.expect("Failed to parse part");
        // {:#?} pretty-prints the Debug output across multiple lines
        println!("{:#?}", part);
    }
}

$web-only$ If we were to run the above we would get the following output. $web-only-end$

Decl(
    Func(
        Func {
            id: Some(
                Ident {
                    name: "Thing",
                },
            ),
            params: [
                Pat(
                    Ident(
                        Ident {
                            name: "stuff",
                        },
                    ),
                ),
            ],
            body: FuncBody(
                [
                    Stmt(
                        Expr(
                            Assign(
                                AssignExpr {
                                    operator: Equal,
                                    left: Expr(
                                        Member(
                                            MemberExpr {
                                                object: This,
                                                property: Ident(
                                                    Ident {
                                                        name: "stuff",
                                                    },
                                                ),
                                                computed: false,
                                            },
                                        ),
                                    ),
                                    right: Ident(
                                        Ident {
                                            name: "stuff",
                                        },
                                    ),
                                },
                            ),
                        ),
                    ),
                ],
            ),
            generator: false,
            is_async: false,
        },
    ),
)

$web-only$ If we walk through the output, we see the following:

  1. This program consists of a single part which is a ProgramPart::Decl
  2. Inside of that is a Decl::Func
  3. Inside of that is a Func
    1. It has an id, which is an optional Ident, here Some with the name "Thing"
    2. It has a one item vec of Pats in params
      1. Which is a Pat::Ident
      2. Inside of that is an Ident with the name "stuff"
    3. It has a body that is a one item vec of ProgramParts
      1. The item is a ProgramPart::Stmt
      2. Which is a Stmt::Expr
      3. Inside of that is an Expr::Assign
      4. Inside of that is an AssignExpr
        1. Which has an operator of Equal
        2. The left hand side is an Expr::Member
        3. Inside of that is a MemberExpr
          1. The object being Expr::This
          2. The property being Expr::Ident with the name of "stuff"
          3. computed is false
        4. The right hand side is an Expr::Ident with the name of "stuff"
    4. It is not a generator
    5. is_async is false
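The same walk can be expressed in code. The sketch below uses pared-down, hypothetical types shaped like the debug output above (they are not ressa's real definitions, and the extra Pat::Other variant is only a placeholder for patterns that are not plain identifiers) to pull out a function's name and parameter names. summarize is a name made up for this illustration.

```rust
// Hypothetical, pared-down types shaped like the debug output above.
// These are NOT ressa's real definitions.
struct Ident {
    name: String,
}

enum Pat {
    Ident(Ident),
    // Placeholder for the destructuring patterns a real parser would model.
    Other,
}

struct Func {
    id: Option<Ident>,
    params: Vec<Pat>,
    generator: bool,
    is_async: bool,
}

/// Build a one-line signature summary from the fields listed in the walkthrough.
fn summarize(func: &Func) -> String {
    // `id` is optional, so fall back to a marker when there is no name.
    let name = func
        .id
        .as_ref()
        .map(|id| id.name.as_str())
        .unwrap_or("<anonymous>");
    // Only plain identifier patterns carry a single name.
    let params: Vec<&str> = func
        .params
        .iter()
        .map(|pat| match pat {
            Pat::Ident(ident) => ident.name.as_str(),
            Pat::Other => "<pattern>",
        })
        .collect();
    let mut prefix = String::new();
    if func.is_async {
        prefix.push_str("async ");
    }
    prefix.push_str("function");
    if func.generator {
        prefix.push('*');
    }
    format!("{} {}({})", prefix, name, params.join(", "))
}
```

Under these assumptions, a Func with the id "Thing", a single identifier parameter "stuff", and both flags false summarizes to `function Thing(stuff)`.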

Phew! That is quite a lot of information! A big part of why we need to be that verbose is the "you can do anything" nature of JavaScript. Let's use the MemberExpr as an example. Below is a collection of ways to write a MemberExpr in JavaScript.

console.log;//member expr
console['log']; //member expr
const logVar = 'log';
console[logVar];//member expr
console[['l','o','g'].join('')];//member expr
class Log {
    toString() {
        return 'log';
    }
}
const logToString = new Log();
console[logToString];//member expr
function logFunc() {
    return 'log';
}
console[logFunc()];//member expr
function getConsole() {
    return console
}
getConsole()[logFunc()];//member expr
getConsole().log;//member expr

And with the way JavaScript has evolved, this probably isn't an exhaustive list of ways to construct a MemberExpr. With the level of information ressa provides, we have enough to truly understand the syntactic meaning of the text. This will enable us to build more powerful tools to analyze and/or manipulate any given JavaScript program. With the pervasiveness of print debugging, wouldn't it be nice if we had a tool that would automatically insert a console.log at the top of every function and method in a program? We could make it print the name of that function and also each of the arguments. Let's try and build one. $web-only-end$