RESSA
$slides-only$
- impl Iterator for Parser
- Converts stream of Tokens into AST
- Significantly more context
$slides-only-end$
$web-only$
Before we get into how to use ressa, it is a good idea to briefly touch on the scope of a parser, or syntax analyzer. The most important thing to understand is that we are still not dealing with the semantic meaning of the program. That means ressa itself won't discover things like assignments to undeclared variables or calls to undefined functions, because that would require more context. To that end, ressa's true value isn't realized until it is embedded into another program that provides that context.
That said, ressa does provide a larger context than ress. It achieves that by wrapping the Scanner in a struct called Parser. Essentially, Parser provides a way to keep track of what any given set of Tokens might mean. Parser also implements Iterator with an Item of Result<ProgramPart, Error>; the ProgramPart in the Ok variant has 3 cases representing the 3 different top-level JavaScript constructs.
- Decl - A variable/function/class declaration
  - Var - A top level variable declaration, e.g. let x = 0;
  - Class - A named class definition at the top level
  - Func - A named function definition at the top level
  - Import - An ES Module import statement
  - Export - An ES Module export statement
- Dir - A script directive, pretty much just 'use strict'
- Stmt - A catch-all for all other statements
  - Block - A collection of statements wrapped in curly braces
  - Break - A break statement will exit a loop or labeled statement early
  - Continue - A continue statement will short circuit a loop
  - Debugger - The literal text debugger
  - DoWhile - A do loop executes the body before testing whether to continue
  - Empty - A single semicolon
  - Expr - A catch-all for everything else
  - For - A C-style for loop, e.g. for (var i = 0; i < 100; i++) ;
  - ForIn - A for loop that assigns the key of an enumerable at the top of each iteration
  - ForOf - A for loop that assigns the value of an iterable at the top of each iteration
  - If - A set of if/else if/else statements
  - Labeled - A statement that has been named by an attached identifier
  - Return - The return statement that resolves a function's value
  - Switch - A test Expression and a collection of CaseStmts
  - Throw - The throw keyword followed by an Expression
  - Try - A try/catch/finally block for catching thrown items
  - Var - A non-top level variable declaration
  - While - A loop which continues based on a test Expression
  - With - An antiquated statement that changes the order of identifier resolution
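To get a feel for consuming those items, here is a minimal sketch that uses a hypothetical, simplified ProgramPart (plain Strings instead of ressa's real node types, and a String standing in for ressa's Error). The count_parts helper is also hypothetical; it accepts any iterator of Result<ProgramPart, String>, the same shape of item that Parser yields, deals with the error case first and then matches on the three top-level variants.

```rust
// Hypothetical, simplified stand-in for ressa's ProgramPart: the real
// variants carry Decl/Stmt node types and the error type is ressa's own.
#[derive(Debug, PartialEq)]
enum ProgramPart {
    Decl(String), // e.g. "function Thing"
    Dir(String),  // e.g. "use strict"
    Stmt(String), // every other statement
}

/// Counts (declarations, directives, statements) from any iterator of
/// parse results, returning the first error instead of panicking.
fn count_parts<I>(parts: I) -> Result<(usize, usize, usize), String>
where
    I: IntoIterator<Item = Result<ProgramPart, String>>,
{
    let (mut decls, mut dirs, mut stmts) = (0, 0, 0);
    for part in parts {
        // `?` hands the Err back to the caller, unlike `.expect`, which panics
        match part? {
            ProgramPart::Decl(_) => decls += 1,
            ProgramPart::Dir(_) => dirs += 1,
            ProgramPart::Stmt(_) => stmts += 1,
        }
    }
    Ok((decls, dirs, stmts))
}
```

With ressa, the same loop shape applies to the Parser directly, just with its real ProgramPart and Error types.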
Stmt is the real workhorse of the group: while a top-level function definition would be a Decl, a non-top-level function definition would be a Stmt. Both Decl and Stmt are themselves enums representing the different possible variations. Looking further into the Stmt variants, you may notice another catch-all in the Expr variant, which contains an Expr (expression) enum that defines an even more granular set of program parts.
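Because these enums nest, reaching a particular expression usually means matching through several layers at once. The sketch below uses hypothetical, heavily trimmed Stmt and Expr enums (ressa's real definitions have many more variants and fields) to show a single match arm drilling from Stmt::Expr into the Expr variant inside it.

```rust
// Hypothetical, heavily trimmed mirror of the nesting described above:
// a Stmt can wrap an Expr, and an Expr can wrap further Exprs.
#[derive(Debug)]
enum Expr {
    Ident(String),
    Lit(f64),
    Call(Box<Expr>, Vec<Expr>), // callee and arguments
}

#[derive(Debug)]
enum Stmt {
    Expr(Expr),           // the catch-all wrapper around an expression
    Return(Option<Expr>), // `return;` or `return <expr>;`
    Empty,                // a lone `;`
}

/// Labels a statement, drilling through `Stmt::Expr` to name the more
/// granular `Expr` variant inside it.
fn kind(stmt: &Stmt) -> &'static str {
    match stmt {
        Stmt::Expr(Expr::Ident(_)) => "Stmt::Expr(Expr::Ident)",
        Stmt::Expr(Expr::Lit(_)) => "Stmt::Expr(Expr::Lit)",
        Stmt::Expr(Expr::Call(..)) => "Stmt::Expr(Expr::Call)",
        Stmt::Return(_) => "Stmt::Return",
        Stmt::Empty => "Stmt::Empty",
    }
}
```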
- Expr
  - Assign - Assigning a value to a variable; this includes any update & assign operations, e.g. x = 1, x += 1, etc.
  - Array - An array literal, e.g. [1,2,3,4]
  - ArrowFunc - An arrow function expression
  - Await - Any expression preceded by the await keyword
  - Call - Calling a function or method
  - Class - A class expression: a class definition with an optional identifier that is assigned to a variable or used as an argument in a Call expression
  - Conditional - Also known as the "ternary" operator, e.g. test ? consequent : alternate
  - Func - A function expression: a function definition with an optional identifier that is either self-executing, assigned to a variable, or used as a Call argument
  - Ident - The identifier of a variable, call argument, class, import, export or function
  - Lit - A primitive literal
  - Logical - Two expressions separated by && or ||
  - Member - Accessing a sub property on something, e.g. [0,1,2][1] or console.log
  - MetaProp - Currently the only MetaProperty is new.target, which you can check in a function body to see if the function was called with the new keyword
  - New - A Call expression preceded by the new keyword
  - Obj - An object literal, e.g. {a: 1, b: 2}
  - Seq - Any sequence of expressions separated by commas
  - Spread - The ... operator followed by an expression
  - Super - The super pseudo-keyword used for accessing properties of a superclass
  - TaggedTemplate - An identifier followed by a template literal; see MDN for more info
  - This - The this pseudo-keyword used for accessing instance properties
  - Unary - An operation (that is not an update) that requires one expression as an argument, e.g. delete x, !true, etc.
  - Update - An operation that uses the ++ or -- operator
  - Yield - The yield contextual keyword followed by an optional expression, for use in generator functions
Most of the Expr, Stmt, and Decl variants have associated values; to learn more about them, check out the documentation, which should provide an example and a description for each of the possible combinations.
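As a small illustration of those associated values, here is a hypothetical mini model of the Assign and Update variants, loosely following the operator/left/right fields that show up in ressa's printed output for an assignment. The written_name helper is an assumption for illustration, not part of ressa; it shows how associated values let a tool answer questions such as "which variable does this expression write to?"

```rust
// Hypothetical mini model of two Expr variants from the list above.
// ressa's real nodes have more variants, more operators and borrowed names.
#[derive(Debug)]
enum AssignOp {
    Equal,     // =
    PlusEqual, // +=
}

#[derive(Debug)]
enum UpdateOp {
    Increment, // ++
    Decrement, // --
}

#[derive(Debug)]
enum Expr {
    Ident(String),
    Lit(f64),
    Assign { operator: AssignOp, left: Box<Expr>, right: Box<Expr> },
    Update { operator: UpdateOp, prefix: bool, argument: Box<Expr> },
}

/// If the expression writes to a plain identifier (via an Assign or an
/// Update), returns that identifier's name.
fn written_name(expr: &Expr) -> Option<&str> {
    match expr {
        // Both variants carry the written-to expression, under different field names
        Expr::Assign { left: target, .. } | Expr::Update { argument: target, .. } => {
            match &**target {
                Expr::Ident(name) => Some(name.as_str()),
                _ => None, // targets other than a bare identifier aren't modeled here
            }
        }
        _ => None,
    }
}
```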
With that long-winded explanation of the basic structure out of the way, let's take a look at how we would use the Parser. In this example we have a JavaScript snippet that defines a function, Thing, which assigns its first argument, stuff, to the property this.stuff.
$web-only-end$
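use ressa::*;

// The JavaScript we want to parse
static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";

fn main() {
    // Creating the parser can fail, so it returns a Result
    let parser = Parser::new(JS).expect("Failed to create parser");
    // Parser is an Iterator whose items are Result<ProgramPart, Error>
    for part in parser {
        let part = part.expect("Failed to parse part");
        // {:#?} pretty-prints the Debug output across multiple lines
        println!("{:#?}", part);
    }
}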
$web-only$ If we were to run the above, we would get the following output. $web-only-end$
Decl(
    Func(
        Func {
            id: Some(
                Ident {
                    name: "Thing",
                },
            ),
            params: [
                Pat(
                    Ident(
                        Ident {
                            name: "stuff",
                        },
                    ),
                ),
            ],
            body: FuncBody(
                [
                    Stmt(
                        Expr(
                            Assign(
                                AssignExpr {
                                    operator: Equal,
                                    left: Expr(
                                        Member(
                                            MemberExpr {
                                                object: This,
                                                property: Ident(
                                                    Ident {
                                                        name: "stuff",
                                                    },
                                                ),
                                                computed: false,
                                            },
                                        ),
                                    ),
                                    right: Ident(
                                        Ident {
                                            name: "stuff",
                                        },
                                    ),
                                },
                            ),
                        ),
                    ),
                ],
            ),
            generator: false,
            is_async: false,
        },
    ),
)
$web-only$ If we walk through the output, we see that:
- This program consists of a single part, which is a ProgramPart::Decl
  - Inside of that is a Decl::Func
    - Inside of that is a Func
      - It has an id, which is an optional Ident; here it is Some, with the name "Thing"
      - It has a one-item vec in params
        - The item is a Pat variant wrapping a Pat::Ident
          - Inside of that is an Ident with the name "stuff"
      - It has a body, a FuncBody holding a one-item vec of ProgramParts
        - The item is a ProgramPart::Stmt
          - Which is a Stmt::Expr
            - Inside of that is an Expr::Assign
              - Inside of that is an AssignExpr
                - Which has an operator of Equal
                - The left hand side is an Expr variant wrapping an Expr::Member
                  - Inside of that is a MemberExpr
                    - The object is Expr::This
                    - The property is Expr::Ident with the name "stuff"
                    - computed is false
                - The right hand side is an Expr::Ident with the name "stuff"
      - It is not a generator
      - is_async is false
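The same path can be followed in code. This sketch mirrors just enough of the printed structure to pull out a function's name and the names of its parameters. The types are hypothetical simplifications: field names follow the Debug output above, but the outer Pat(...) wrapper around each parameter is collapsed, and body, generator and is_async are left out. The signature helper is likewise hypothetical.

```rust
// Hypothetical, trimmed mirror of the types in the output above.
#[derive(Debug)]
struct Ident {
    name: String,
}

#[derive(Debug)]
enum Pat {
    Ident(Ident),
    Other, // stand-in for destructuring patterns such as `{ a, b }`
}

#[derive(Debug)]
struct Func {
    id: Option<Ident>,
    params: Vec<Pat>,
}

#[derive(Debug)]
enum Decl {
    Func(Func),
}

#[derive(Debug)]
enum ProgramPart {
    Decl(Decl),
    Stmt, // payload omitted in this sketch
}

/// Follows ProgramPart::Decl -> Decl::Func -> Func and returns the optional
/// function name plus the names of any plain identifier parameters.
fn signature(part: &ProgramPart) -> Option<(Option<&str>, Vec<&str>)> {
    match part {
        ProgramPart::Decl(Decl::Func(func)) => {
            let name = func.id.as_ref().map(|id| id.name.as_str());
            let params = func
                .params
                .iter()
                .filter_map(|p| match p {
                    Pat::Ident(id) => Some(id.name.as_str()),
                    Pat::Other => None, // skipped in this sketch
                })
                .collect();
            Some((name, params))
        }
        ProgramPart::Stmt => None,
    }
}
```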
Phew! That is quite a lot of information! A big part of why we need to be that verbose is the "you can do anything" nature of JavaScript. Take MemberExpr as an example; below is a collection of ways to write a MemberExpr in JavaScript.
console.log; // member expr
console['log']; // member expr
const logVar = 'log';
console[logVar]; // member expr
console[['l', 'o', 'g'].join('')]; // member expr
class Log {
    toString() {
        return 'log';
    }
}
const logToString = new Log();
console[logToString]; // member expr
function logFunc() {
    return 'log';
}
console[logFunc()]; // member expr
function getConsole() {
    return console;
}
getConsole()[logFunc()]; // member expr
getConsole().log; // member expr
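Structurally, all of these differ only in the object, the property and the computed flag seen in the earlier output (computed: false for dot access). A minimal sketch with a hypothetical simplified Expr: the static_property helper returns the property name only when the syntax alone determines it. For console[logVar] or console[logFunc()], working out the name requires the kind of semantic context that, as noted at the start, a parser does not have on its own.

```rust
// Hypothetical simplified Expr, just enough to model the MemberExpr shapes above.
#[derive(Debug)]
enum Expr {
    Ident(String),
    Str(String),     // stand-in for a string literal such as 'log'
    Call(Box<Expr>), // a call with no arguments, e.g. logFunc()
    Member {
        object: Box<Expr>,
        property: Box<Expr>,
        computed: bool, // true for obj[prop], false for obj.prop
    },
}

/// Returns the accessed property's name when the syntax alone determines it:
/// dot access with an identifier, or bracket access with a string literal.
fn static_property(expr: &Expr) -> Option<&str> {
    match expr {
        Expr::Member { property, computed, .. } => match (&**property, *computed) {
            (Expr::Ident(name), false) => Some(name.as_str()),
            (Expr::Str(value), true) => Some(value.as_str()),
            // console[logVar], console[logFunc()], ... need more than syntax
            _ => None,
        },
        _ => None,
    }
}
```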
Given how JavaScript has evolved, this probably isn't an exhaustive list of ways to construct a MemberExpr. With the level of information ressa provides, we have enough to truly understand the syntactic structure of the text. This will enable us to build more powerful tools to analyze and/or manipulate any given JavaScript program. Given how pervasive print debugging is, wouldn't it be nice to have a tool that automatically inserts a console.log at the top of every function and method in a program? We could make it print the name of that function and each of its arguments. Let's try to build one.
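Before touching the AST, it helps to pin down what each inserted statement should say. This is a minimal sketch assuming we build the statement as source text; a real tool built on ressa would more likely construct the equivalent AST nodes instead. The log_statement helper and its '<anonymous>' placeholder for unnamed functions are hypothetical choices.

```rust
/// Hypothetical helper: builds the source text of the statement we would want
/// at the top of a function, e.g. console.log('Thing', stuff);
fn log_statement(name: Option<&str>, params: &[&str]) -> String {
    // Anonymous functions get a placeholder label (an arbitrary choice)
    let label = name.unwrap_or("<anonymous>");
    // First argument: the function's name as a string literal.
    // JavaScript identifiers can't contain quotes, so no escaping is done here.
    let mut args = vec![format!("'{}'", label)];
    // Remaining arguments: each parameter by name, so its runtime value is logged
    args.extend(params.iter().map(|p| p.to_string()));
    format!("console.log({});", args.join(", "))
}
```

For the Thing example, log_statement(Some("Thing"), &["stuff"]) produces console.log('Thing', stuff);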
$web-only-end$