RESSA
impl Iterator for Parser
- Converts stream of Tokens into AST
- Significantly more context
Before we get into how to use ressa, it is a good idea to briefly touch on the scope of a parser, or syntax analyzer. The biggest thing to understand is that we still are not dealing with the semantic meaning of the program. That means ressa itself won't discover things like assignments to undeclared variables or calls to undefined functions, because that would require more context. To that end, ressa's true value isn't realized until it is embedded into another program that provides that context.
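Here is a small Rust sketch of the kind of check a host program might layer on top of a parser. It makes the split between syntax and semantics concrete. It uses a hypothetical two-variant `Stmt` type rather than ressa's real AST. It also ignores scopes, hoisting, and everything else a real checker would need. The point is only that answering "was this variable declared?" requires extra state, here the set of declared names, which a parser on its own does not track.

```rust
use std::collections::HashSet;

// Hypothetical, heavily simplified statement type; NOT ressa's AST.
#[derive(Debug)]
enum Stmt {
    // `let name = ...;`
    Declare(String),
    // `name = ...;`
    Assign(String),
}

/// Returns every name that is assigned before (or without) being declared.
/// The `declared` set is the extra context a parser alone doesn't keep.
fn undeclared_assignments(program: &[Stmt]) -> Vec<String> {
    let mut declared = HashSet::new();
    let mut problems = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Declare(name) => {
                declared.insert(name.clone());
            }
            Stmt::Assign(name) => {
                if !declared.contains(name) {
                    problems.push(name.clone());
                }
            }
        }
    }
    problems
}

fn main() {
    // Roughly: let x = 0; x = 1; y = 2;
    let program = vec![
        Stmt::Declare("x".to_string()),
        Stmt::Assign("x".to_string()),
        Stmt::Assign("y".to_string()),
    ];
    // prints ["y"]
    println!("{:?}", undeclared_assignments(&program));
}
```

All three statements are syntactically fine. Only a pass that remembers earlier declarations can flag `y`.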
With that said, ressa does provide a larger context than ress. It achieves that by wrapping ress's Scanner in a struct called Parser. Essentially, Parser keeps track of what any given sequence of Tokens might mean. Parser also implements Iterator, yielding a Result for each ProgramPart. ProgramPart is an enum with 3 cases representing the 3 different top level JavaScript constructs.
- Decl - A variable, function, class, import, or export declaration
  - Variable - A top level variable declaration e.g. let x = 0;
  - Class - A named class definition at the top level
  - Function - A named function definition at the top level
  - Import - An ES Module import statement
  - Export - An ES Module export statement
- Dir - A script directive, pretty much just 'use strict'
- Stmt - A catch all for all other statements
  - Block - A collection of statements wrapped in curly braces
  - Break - A break statement exits a loop, switch, or labeled statement early
  - Continue - A continue statement skips the rest of the current loop iteration
  - Debugger - The literal text debugger
  - DoWhile - A do loop executes the body before testing whether to continue
  - Empty - A single semicolon
  - Expr - A catch-all for everything else
  - For - A c-style for loop e.g. for (var i = 0; i < 100; i++) ;
  - ForIn - A for loop that assigns the next enumerable property key of an object at the top of each iteration
  - ForOf - A for loop that assigns the next value of an iterable at the top of each iteration
  - If - A set of if/else if/else statements
  - Labeled - A statement that has been named by an attached identifier
  - Return - The return statement that resolves a function's value
  - Switch - A test Expression and a collection of CaseStatements
  - Throw - The throw keyword followed by an Expression
  - Try - A try/catch/finally block for catching thrown items
  - Var - A non-top level variable declaration
  - While - A loop which continues based on a test Expression
  - With - An antiquated statement that changes the order of identifier resolution
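Here is a deliberately tiny sketch of how wrapping a token stream in an Iterator produces parts like these. The `Parser`, `ProgramPart`, `Decl`, and `Stmt` types below are hypothetical toys. They only mirror the shape described above; ressa's real types carry far more data. The "tokens" are pre-split strings, not output from ress's Scanner. Each call to `next` consumes just enough tokens for one top level part and yields a `Result`. This mirrors how the usage example later in this post calls `expect` on every item.

```rust
use std::iter::Peekable;
use std::vec::IntoIter;

// Hypothetical toy types that only mirror the shape of ProgramPart.
#[derive(Debug, PartialEq)]
enum Decl {
    Variable(String), // let x;
}

#[derive(Debug, PartialEq)]
enum Stmt {
    Debugger, // debugger;
    Empty,    // ;
}

#[derive(Debug, PartialEq)]
enum ProgramPart {
    Decl(Decl),
    Dir(String),
    Stmt(Stmt),
}

// A toy parser wrapping a stream of pre-split tokens.
struct Parser {
    tokens: Peekable<IntoIter<String>>,
}

impl Parser {
    fn new(tokens: Vec<&str>) -> Self {
        let tokens: Vec<String> = tokens.into_iter().map(String::from).collect();
        Parser { tokens: tokens.into_iter().peekable() }
    }

    // Consume a `;` or report what was found instead.
    fn expect_semicolon(&mut self) -> Result<(), String> {
        match self.tokens.next() {
            Some(t) if t == ";" => Ok(()),
            other => Err(format!("expected `;`, found {:?}", other)),
        }
    }
}

impl Iterator for Parser {
    type Item = Result<ProgramPart, String>;

    fn next(&mut self) -> Option<Self::Item> {
        // No tokens left: iteration ends.
        let token = self.tokens.next()?;
        let part = match token.as_str() {
            ";" => return Some(Ok(ProgramPart::Stmt(Stmt::Empty))),
            "debugger" => ProgramPart::Stmt(Stmt::Debugger),
            "let" => match self.tokens.next() {
                Some(name) => ProgramPart::Decl(Decl::Variable(name)),
                None => return Some(Err("expected a name after `let`".to_string())),
            },
            t if t.starts_with('\'') => ProgramPart::Dir(t.trim_matches('\'').to_string()),
            t => return Some(Err(format!("unexpected token {:?}", t))),
        };
        // Every other part in this toy grammar ends with `;`.
        Some(self.expect_semicolon().map(|_| part))
    }
}

fn main() {
    let parser = Parser::new(vec!["'use strict'", ";", "let", "x", ";", "debugger", ";", ";"]);
    for part in parser {
        println!("{:?}", part.expect("Failed to parse part"));
    }
}
```

Because the work happens lazily inside `next`, a caller can stop after the first error or the first interesting part without parsing the rest of the input.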
Stmt is the real work-horse of the group: while a top level function definition would be a Decl, a non-top level function definition would be a Stmt. Both Decl and Stmt are themselves enums representing the different possible variations. Looking further into the Stmt variants, you may notice another catch-all, the Expr variant. It contains an Expr (expression) enum, which defines an even more granular set of program parts.
Expression
- Assignment - Assigning a value to a variable; this includes any update & assign operations e.g. x = 1, x += 1, etc.
- Array - An array literal e.g. [1,2,3,4]
- ArrowFunction - An arrow function expression
- Await - Any expression preceded by the await keyword
- Call - Calling a function or method
- Class - A class expression is a class definition with an optional identifier that is assigned to a variable or used as an argument in a Call expression
- Conditional - Also known as the "ternary" operator e.g. test ? consequent : alternate
- Function - A function expression is a function definition with an optional identifier that is either self executing, assigned to a variable, or used as a Call argument
- Ident - The identifier of a variable, call argument, class, import, export, or function
- Literal - A primitive literal
- Logical - Two expressions separated by && or ||
- Member - Accessing a sub property on something e.g. [0,1,2][1] or console.log
- MetaProperty - At the time of writing, the only MetaProperty is new.target; in a function body you can check it to see if the function was called with the new keyword
- New - A Call expression preceded by the new keyword
- Object - An object literal e.g. {a: 1, b: 2}
- Sequence - Any sequence of expressions separated by commas
- Spread - The ... operator followed by an expression
- SuperExpression - The super pseudo-keyword used for accessing properties of a super class
- TaggedTemplate - An identifier followed by a template literal; see MDN for more info
- ThisExpression - The this pseudo-keyword used for accessing instance properties
- Unary - An operation (that is not an update) that requires one expression as an argument e.g. delete x, !true, etc.
- Update - An operation that uses the ++ or -- operator
- Yield - The yield contextual keyword followed by an optional expression, for use in generator functions
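Several of these variants differ only in which operator is involved, so a small lookup can make the boundaries clearer. The `classify_operator` helper and `ExprKind` enum below are hypothetical illustrations, not part of ressa, and the operator lists are not exhaustive. Note that `+` and `-` cannot be classified from the token alone. They may be unary (`-x`) or binary (`a - b`), and a parser has to decide that from the surrounding tokens.

```rust
// Hypothetical helper, not part of ressa: maps an operator token to the
// Expr variant the list above would file it under.
#[derive(Debug, PartialEq)]
enum ExprKind {
    Assignment,
    Update,
    Logical,
    Unary,
    Other,
}

fn classify_operator(op: &str) -> ExprKind {
    match op {
        // `x = 1`, `x += 1`, ...: plain and compound assignment
        "=" | "+=" | "-=" | "*=" | "/=" | "%=" | "**=" | "<<=" | ">>=" | ">>>="
        | "&=" | "|=" | "^=" => ExprKind::Assignment,
        // `x++`, `--x`
        "++" | "--" => ExprKind::Update,
        // `a && b`, `a || b`
        "&&" | "||" => ExprKind::Logical,
        // `!true`, `delete x`, `typeof x`, ...
        "!" | "~" | "typeof" | "void" | "delete" => ExprKind::Unary,
        // `+`/`-` depend on position (unary or binary); everything else
        // is outside this sketch.
        _ => ExprKind::Other,
    }
}

fn main() {
    for op in ["+=", "++", "&&", "delete", "-"].iter() {
        println!("{:>6} -> {:?}", op, classify_operator(op));
    }
}
```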
Most of the Expr, Stmt, and Decl variants have associated values; to see more information about them, check out the documentation. There should be an example and description provided for each of the possible combinations.
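Each layer is an enum carrying associated values, so a single nested pattern can reach through several layers at once. The sketch below uses hypothetical, trimmed-down `ProgramPart`, `Stmt`, and `Expr` types. These are not ressa's definitions, which have many more variants and fields. The sketch picks out the name being assigned in a statement like `x = 1;`.

```rust
// Hypothetical, trimmed-down mirrors of the nesting described above.
#[derive(Debug)]
enum Expr {
    Ident(String),
    Number(f64),
    Assignment { left: Box<Expr>, right: Box<Expr> },
}

#[derive(Debug)]
enum Stmt {
    Expr(Expr),
    Debugger,
}

#[derive(Debug)]
enum ProgramPart {
    Stmt(Stmt),
    Dir(String),
}

/// If `part` is an expression statement assigning to a plain identifier,
/// return that identifier's name.
fn assigned_name(part: &ProgramPart) -> Option<&str> {
    match part {
        // One nested pattern peels all three layers:
        // ProgramPart::Stmt -> Stmt::Expr -> Expr::Assignment
        ProgramPart::Stmt(Stmt::Expr(Expr::Assignment { left, .. })) => match &**left {
            Expr::Ident(name) => Some(name.as_str()),
            _ => None,
        },
        _ => None,
    }
}

fn main() {
    // `x = 1;` wrapped the way the text describes
    let part = ProgramPart::Stmt(Stmt::Expr(Expr::Assignment {
        left: Box::new(Expr::Ident("x".to_string())),
        right: Box::new(Expr::Number(1.0)),
    }));
    println!("{:?}", assigned_name(&part)); // Some("x")
    println!("{:?}", assigned_name(&ProgramPart::Stmt(Stmt::Debugger))); // None
    println!("{:?}", assigned_name(&ProgramPart::Dir("use strict".to_string()))); // None
}
```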
With that long-winded explanation of the basic structure out of the way, let's take a look at how we would use the Parser.
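use ressa::*;

// The JavaScript source we want to parse
static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";

fn main() {
    // Creating the parser can fail, hence the expect
    let parser = Parser::new(JS).expect("Failed to create parser");
    // Parser is an Iterator; each item is a Result wrapping one ProgramPart
    for part in parser {
        let part = part.expect("Failed to parse part");
        // {:#?} pretty-prints the Debug output across multiple lines
        println!("{:#?}", part);
    }
}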
If we were to run the above, we would get output along the lines of the following.
Script([
    Decl(
        Function(
            Function {
                id: Some(
                    "Thing"
                ),
                params: [
                    Pat(
                        Identifier(
                            "stuff"
                        )
                    )
                ],
                body: [
                    Stmt(
                        Expr(
                            Assignment(
                                AssignmentExpr {
                                    operator: Equal,
                                    left: Expr(
                                        Member(
                                            MemberExpr {
                                                object: This,
                                                property: Ident(
                                                    "stuff"
                                                ),
                                                computed: false
                                            }
                                        )
                                    ),
                                    right: Ident(
                                        "stuff"
                                    )
                                }
                            )
                        )
                    )
                ],
                generator: false,
                is_async: false
            }
        )
    )
])
If we walk through the output, we see that:
- This program consists of a single part, which is a ProgramPart::Decl
  - Inside of that is a Decl::Function
    - Inside of that is a Function
      - It has an id, which is an optional Identifier, with the value of Some("Thing")
      - It has a one item vec of Pats in params
        - Which is a Pat::Identifier
          - Inside of that is an Identifier with the value of "stuff"
      - It has a body that is a one item vec of ProgramParts
        - The item is a ProgramPart::Stmt
          - Which is a Stmt::Expr
            - Inside of that is an Expr::Assignment
              - Inside of that is an AssignmentExpr
                - Which has an operator of Equal
                - The left hand side is an Expr::Member
                  - The object being Expr::This
                  - The property being Expr::Ident with the value of "stuff"
                  - computed is false
                - The right hand side is an Expr::Ident with the value of "stuff"
      - It is not a generator
      - is_async is false
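Walking this tree is what the console.log tool proposed at the end of this section would need to do: find every function and read its `id` and `params`. Here is a sketch over hypothetical miniature types whose names follow the printed output (`Function`, `Pat::Identifier`, `ProgramPart`, `Decl`). They are not ressa's definitions. A real walker would also have to visit statements, expressions, class methods, and so on.

```rust
// Hypothetical miniature of the tree printed above; toy definitions only.
#[derive(Debug)]
enum Pat {
    Identifier(String),
}

#[derive(Debug)]
struct Function {
    id: Option<String>,
    params: Vec<Pat>,
    body: Vec<ProgramPart>,
}

#[derive(Debug)]
enum Decl {
    Function(Function),
}

#[derive(Debug)]
enum Stmt {
    // Stand-in for every statement this sketch doesn't care about.
    Other,
}

#[derive(Debug)]
enum ProgramPart {
    Decl(Decl),
    Stmt(Stmt),
}

/// Depth-first walk recording each function's name and parameter names.
/// It also descends into function bodies (a Vec<ProgramPart> in this toy).
fn collect_functions(parts: &[ProgramPart], out: &mut Vec<(String, Vec<String>)>) {
    for part in parts {
        if let ProgramPart::Decl(Decl::Function(f)) = part {
            let name = f.id.clone().unwrap_or_else(|| "<anonymous>".to_string());
            let params = f
                .params
                .iter()
                .map(|pat| match pat {
                    Pat::Identifier(name) => name.clone(),
                })
                .collect();
            out.push((name, params));
            collect_functions(&f.body, out);
        }
    }
}

fn main() {
    // function Thing(stuff) { this.stuff = stuff; }
    // function add(a, b) {}
    let program = vec![
        ProgramPart::Decl(Decl::Function(Function {
            id: Some("Thing".to_string()),
            params: vec![Pat::Identifier("stuff".to_string())],
            body: vec![ProgramPart::Stmt(Stmt::Other)],
        })),
        ProgramPart::Decl(Decl::Function(Function {
            id: Some("add".to_string()),
            params: vec![Pat::Identifier("a".to_string()), Pat::Identifier("b".to_string())],
            body: vec![],
        })),
    ];
    let mut found = Vec::new();
    collect_functions(&program, &mut found);
    for (name, params) in &found {
        // prints "Thing(stuff)" then "add(a, b)"
        println!("{}({})", name, params.join(", "));
    }
}
```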
Phew! That is quite a lot of information! A big part of why it needs to be that verbose is the "you can do anything" nature of JavaScript. Let's use the MemberExpr as an example; below is a collection of ways to write a MemberExpr in JavaScript.
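// Non-computed: the property is a plain identifier
console.log;
// Computed: the property is a string literal
console['log'];
// Computed: the property is a variable
const logVar = 'log';
console[logVar];
// Computed: the property is the result of a method call
console[['l','o','g'].join('')];
// Computed: the property is an object converted with toString
class Log {
    toString() {
        return 'log';
    }
}
const logToString = new Log();
console[logToString];
// Computed: the property is the result of a function call
function logFunc() {
    return 'log';
}
console[logFunc()];
// The object itself can also be the result of a call
function getConsole() {
    return console;
}
getConsole()[logFunc()];
getConsole().log;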
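The `computed` flag seen in the output is what separates `console.log` from `console[...]`. It also shows how much can be known from syntax alone. The sketch below uses a hypothetical, simplified `MemberExpr` with only the `object`, `property`, and `computed` fields from the printed output, plus a hypothetical `static_property_name` helper. A non-computed identifier or a computed string literal gives the property name directly. `console[logVar]` or `console[logFunc()]` would need the kind of semantic context discussed at the start of this section.

```rust
// Hypothetical, simplified expression and MemberExpr types; not ressa's.
#[derive(Debug)]
enum Expr {
    Ident(String),
    StringLit(String),
    Call(Box<Expr>), // callee only; arguments omitted
}

#[derive(Debug)]
struct MemberExpr {
    object: Expr,
    property: Expr,
    computed: bool,
}

/// Returns the property name when it can be known from syntax alone.
fn static_property_name(m: &MemberExpr) -> Option<&str> {
    match (&m.property, m.computed) {
        // console.log: a non-computed identifier is the name itself
        (Expr::Ident(name), false) => Some(name.as_str()),
        // console['log']: a computed string literal is still static
        (Expr::StringLit(s), true) => Some(s.as_str()),
        // console[logVar], console[logFunc()], ...: needs runtime values
        _ => None,
    }
}

fn main() {
    let console = || Expr::Ident("console".to_string());
    let examples = vec![
        ("console.log", MemberExpr { object: console(), property: Expr::Ident("log".into()), computed: false }),
        ("console['log']", MemberExpr { object: console(), property: Expr::StringLit("log".into()), computed: true }),
        ("console[logVar]", MemberExpr { object: console(), property: Expr::Ident("logVar".into()), computed: true }),
        ("console[logFunc()]", MemberExpr {
            object: console(),
            property: Expr::Call(Box::new(Expr::Ident("logFunc".into()))),
            computed: true,
        }),
    ];
    for (src, m) in &examples {
        println!("{:<20} -> {:?}", src, static_property_name(m));
    }
}
```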
And with the way JavaScript has evolved, this probably isn't an exhaustive list of ways to construct a MemberExpr. With the level of information ressa provides, we have enough to truly understand the syntactic meaning of the text. This will enable us to build more powerful tools to analyze and/or manipulate any given JavaScript program. With the pervasiveness of print debugging, wouldn't it be nice if we had a tool that would automatically insert a console.log at the top of every function and method in a program? We could make it print the name of that function and each of its arguments, so let's try and build one.