An Overview
$web-only$
To get started building development tools using the Rust programming language, we are going to use three crates. The first is ress, or Rusty ECMAScript Scanner, which converts JavaScript text into a series of Tokens. Next is ressa, or Rusty ECMAScript Syntax Analyzer, which takes that series of Tokens and builds an Abstract Syntax Tree, or AST. The AST types are provided by a third crate, resast. Either ress or ressa is useful for building development tools; however, since the output of ress is essentially flat, it can only support a much simpler kind of tool. Over the course of this book we will cover the basics of how to build a development tool with either of these crates.
$web-only-end$
$slides-only$
- What is RESS
- Overview
- Demo Project
- What is RESSA
- Overview
- Demo Project
- What is RESW (maybe)
- Overview
$slides-only-end$
RESS
$slides-only$
impl Iterator for Scanner
- Converts text into Tokens
- Flat Structure
$slides-only-end$
$web-only$
Before we start on any examples, let's dig a little into what ress does. The job of a scanner (sometimes called a tokenizer or lexer) in the parsing process is to convert raw text or bytes into logically separated parts called tokens, and ress does just that. It reads your JavaScript text and then tells you what a given word or symbol might represent. It does this through the Scanner interface; to construct a scanner, you pass it the text you would like it to tokenize.
$web-only-end$
let js = "var i = 0;";
let scanner = Scanner::new(js);
$web-only$
Now that we have prepared a scanner, how do we use it? Well, the Scanner implements Iterator, so we can actually use it in a for loop like so.
for token in scanner {
    println!("{:#?}", token);
}
If we were to run the above program, it would print the following to the terminal.
$web-only-end$
Item {
token: Keyword(
Var,
),
span: Span {
start: 0,
end: 3,
},
location: SourceLocation {
start: Position {
line: 1,
column: 1,
},
end: Position {
line: 1,
column: 4,
},
},
}
Item {
token: Ident(
Ident(
"i",
),
),
span: Span {
start: 4,
end: 5,
},
location: SourceLocation {
start: Position {
line: 1,
column: 5,
},
end: Position {
line: 1,
column: 6,
},
},
}
Item {
token: Punct(
Equal,
),
span: Span {
start: 6,
end: 7,
},
location: SourceLocation {
start: Position {
line: 1,
column: 7,
},
end: Position {
line: 1,
column: 8,
},
},
}
Item {
token: Number(
Number(
"0",
),
),
span: Span {
start: 8,
end: 9,
},
location: SourceLocation {
start: Position {
line: 1,
column: 9,
},
end: Position {
line: 1,
column: 10,
},
},
}
Item {
token: Punct(
SemiColon,
),
span: Span {
start: 9,
end: 10,
},
location: SourceLocation {
start: Position {
line: 1,
column: 10,
},
end: Position {
line: 1,
column: 11,
},
},
}
Item {
token: EoF,
span: Span {
start: 10,
end: 10,
},
location: SourceLocation {
start: Position {
line: 1,
column: 11,
},
end: Position {
line: 1,
column: 11,
},
},
}
$web-only$
The scanner's next() method returns a Result<Item, Error>. The Ok variant has 3 properties: token, span and location. The span is the byte index that starts and ends the token, the location property is the human readable location (line and column) of the token, and the token property is one variant of the Token enum, which has the following variants.
- Token::Boolean(BooleanLiteral) - The text true or false
- Token::Ident(Ident) - A variable, function, or class name
- Token::Null - The text null
- Token::Keyword(Keyword) - One of the 42 reserved words e.g. function, var, delete, etc
- Token::Number(Number) - A number literal, this can be an integer, a float, scientific notation, binary notation, octal notation, or hexadecimal notation e.g. 1.5e9, 0xfff, etc
- Token::Punct(Punct) - One of the 52+ reserved symbols or combinations of symbols e.g. *, &&, =>, etc
- Token::String(StringLit) - Either a double or single quoted string
- Token::RegEx(RegEx) - A Regular Expression literal e.g. /.+/g
- Token::Template(Template) - A template string literal e.g. `one ${2} three`
- Token::Comment(Comment) - A single line, multi-line or html comment
- Token::EoF - The end of the input, as seen in the last Item printed above
For a more in depth look at these tokens, take a look at the Appendix
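To make the idea of a flat stream of tokens with byte spans a little more concrete, here is a minimal sketch of a scanner built with only the standard library. This is not how ress is implemented. ToyScanner, ToyItem and ToyToken are hypothetical names used only for this illustration, and the sketch assumes ASCII words, ASCII digits and single character punctuation.

```rust
/// A drastically simplified token type (hypothetical, not ress's Token).
#[derive(Debug, PartialEq)]
pub enum ToyToken {
    Keyword(String),
    Ident(String),
    Number(String),
    Punct(char),
}

/// A token plus the byte offsets where it starts and ends.
#[derive(Debug, PartialEq)]
pub struct ToyItem {
    pub token: ToyToken,
    pub start: usize,
    pub end: usize,
}

pub struct ToyScanner<'a> {
    text: &'a str,
    pos: usize,
}

impl<'a> ToyScanner<'a> {
    pub fn new(text: &'a str) -> Self {
        ToyScanner { text, pos: 0 }
    }
}

fn is_word_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

impl<'a> Iterator for ToyScanner<'a> {
    type Item = ToyItem;

    fn next(&mut self) -> Option<ToyItem> {
        // Skip any whitespace before the next token.
        let rest = &self.text[self.pos..];
        let trimmed = rest.trim_start();
        self.pos += rest.len() - trimmed.len();
        // No characters left means the iterator is finished.
        let first = trimmed.chars().next()?;
        let start = self.pos;
        // Words and numbers run until the first non-word character,
        // anything else is treated as one character of punctuation.
        let len = if is_word_char(first) {
            trimmed.find(|c: char| !is_word_char(c)).unwrap_or(trimmed.len())
        } else {
            first.len_utf8()
        };
        let word = &trimmed[..len];
        self.pos += len;
        let token = if first.is_ascii_digit() {
            ToyToken::Number(word.to_string())
        } else if is_word_char(first) {
            match word {
                "var" | "let" | "const" | "function" => ToyToken::Keyword(word.to_string()),
                _ => ToyToken::Ident(word.to_string()),
            }
        } else {
            ToyToken::Punct(first)
        };
        Some(ToyItem { token, start, end: self.pos })
    }
}
```

Collecting ToyScanner::new("var i = 0;") yields five items whose start and end offsets line up with the span values printed by ress above. Notice that nothing in the output says the tokens form a variable declaration; that missing context is exactly what "flat" means here.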
Overall, the output of our scanner isn't going to provide any context for these tokens, which means that when we are building our development tools it is going to be a little harder to figure out what is going on with any given token. One way to work with that is to build a tool that is only concerned with token-level information. Say you work on a team of JavaScript developers that needs to adhere to a strict code style because the organization needs its website to be usable in Internet Explorer 8. With that restriction there are a large number of APIs that are off the table; looking over this list, we can see how big that number really is. It could be useful to have a linter that will check for the keywords and identifiers that are not available in IE8. Let's try to build one.
$web-only-end$
Building an IE8 Linter
$web-only$
To get started, we need to add ress to our dependencies. This project is also going to need serde, serde_derive and toml because it will rely on a .toml file to make the list of unavailable tokens configurable.
[package]
name = "lint-ie8"
version = "0.1.0"
authors = ["Robert Masen <r@robertmasen.pizza>"]
edition = "2018"
[dependencies]
ress = "0.7"
serde = "1"
serde_derive = "1"
toml = "0.5"
Next we want to use the Scanner and Token from ress; we can do this by importing all the contents of the prelude.
use ress::prelude::*;
Since we are using a .toml file to provide the list of banned tokens, let's create a struct that will represent our configuration.
use serde_derive::Deserialize;

#[derive(Deserialize)]
struct BannedTokens {
    idents: Vec<String>,
    keywords: Vec<String>,
    puncts: Vec<String>,
    strings: Vec<String>,
}
The toml file we are going to use is pretty big, but if you want to see what it looks like you can check it out here. Essentially it is a list of identifiers, strings, punctuation, and keywords that would cause an error when trying to run in IE8.
To start we need to deserialize that file; we can do that with the std::fs::read_to_string and toml::from_str functions.
use toml::from_str;

let config_text = ::std::fs::read_to_string("banned_tokens.toml").expect("failed to read config");
let banned: BannedTokens = from_str(&config_text).expect("Failed to deserialize banned tokens");
Now that we have a list of tokens that should not be included in our javascript, let's get the js text. It would be useful to be able to take a path argument or read the raw js from stdin. This function will check for an argument first and fall back to reading from stdin; it looks something like this.
use std::env::args;
use std::fs::read_to_string;
use std::io::{IsTerminal, Read};

fn get_js() -> Result<String, ::std::io::Error> {
    let mut cmd_args = args();
    let _ = cmd_args.next(); // discard bin name
    let js = if let Some(file_name) = cmd_args.next() {
        // a path was provided, read that file
        read_to_string(file_name)?
    } else {
        let mut std_in = ::std::io::stdin();
        let mut ret = String::new();
        // nothing was piped in, return the empty string
        if std_in.is_terminal() {
            return Ok(ret);
        }
        std_in.read_to_string(&mut ret)?;
        ret
    };
    Ok(js)
}
We will call it like this.
let js = match get_js() {
    Ok(js) => if js.len() == 0 {
        print_usage();
        std::process::exit(1);
    } else {
        js
    },
    Err(_) => {
        print_usage();
        std::process::exit(1);
    }
};
let finder = BannedFinder::new(&js, banned);
We want to handle the failure when attempting to get the js, so we will match on the call to get_js. If everything went well, we need to check if the text is an empty string; this means no argument was provided and the program was not piped any text. In either of these failure cases we want to print a nice message about how the command should have been written and then exit with a non-zero status code. print_usage is a pretty simple function that will just print to stdout the two ways to use the program.
fn print_usage() {
    println!("banned_tokens <infile>\ncat <path/to/file> | banned_tokens");
}
With that out of the way, we can now get into how we are going to solve the actual problem of finding these tokens in a javascript file. There are many ways to make this work, but for this example we are going to wrap the Scanner in another struct that implements Iterator. First, here is what that struct is going to look like.
struct BannedFinder<'a> {
    scanner: Scanner<'a>,
    banned: BannedTokens,
}
Before we get into the impl Iterator, we should go over an Error implementation that we are going to use. It is relatively straightforward: the actual struct is going to be a tuple struct with three items. The first item is going to be a message that will include the token and type; the second and third are going to be the line and column of the banned token. We need to implement Display (Error requires it), which will just create a nice error message for us.
#[derive(Debug)]
pub struct BannedError(String, usize, usize);

impl ::std::error::Error for BannedError {}

impl ::std::fmt::Display for BannedError {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        write!(f, "Banned {} found at {}:{}", self.0, self.1, self.2)
    }
}
Now we can add a method to BannedFinder
that will take an index and return the row/column pair.
Ok, now for the exciting part; we are going to impl Iterator for BannedFinder, which will look like this.
impl<'a> Iterator for BannedFinder<'a> {
    type Item = Result<(), BannedError>;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(item) = self.scanner.next() {
            match item {
                Ok(item) => {
                    Some(match &item.token {
                        Token::Ident(ref id) => {
                            let id = id.to_string();
                            if self.banned.idents.contains(&id) {
                                Err(BannedError(format!("identifier {}", id), item.location.start.line, item.location.start.column))
                            } else {
                                Ok(())
                            }
                        },
                        Token::Keyword(ref key) => {
                            if self.banned.keywords.contains(&key.to_string()) {
                                Err(BannedError(format!("keyword {}", key.to_string()), item.location.start.line, item.location.start.column))
                            } else {
                                Ok(())
                            }
                        },
                        Token::Punct(ref punct) => {
                            if self.banned.puncts.contains(&punct.to_string()) {
                                Err(BannedError(format!("punct {}", punct.to_string()), item.location.start.line, item.location.start.column))
                            } else {
                                Ok(())
                            }
                        },
                        Token::String(ref lit) => {
                            match lit {
                                StringLit::Double(inner)
                                | StringLit::Single(inner) => {
                                    if self.banned.strings.contains(&inner.to_string()) {
                                        Err(BannedError(format!("string {}", lit.to_string()), item.location.start.line, item.location.start.column))
                                    } else {
                                        Ok(())
                                    }
                                }
                            }
                        },
                        _ => Ok(()),
                    })
                },
                Err(_) => {
                    None
                }
            }
        } else {
            None
        }
    }
}
First we need to define what the Item for our Iterator is. It is going to be a Result<(), BannedError>, which will allow the caller to check if an item passed inspection. Now we can add the fn next(&mut self) -> Option<Self::Item> definition. Inside that we first want to make sure that the Scanner isn't returning None; if it is, we can just return None. If the scanner returns a Result<Item, Error>, we first need to check that it is Ok. In this example we are not going to report the Err case; we simply return None, which ends the iteration. Once we have an actual Item we want to check what kind of token it is, which we can do by matching on &item.token. We only care if the token is a Keyword, Ident, Punct or String; otherwise we can say that the token passed. For each of these tokens we are going to check if the actual text is included in the corresponding Vec<String> property of self.banned. If it is included, we return a BannedError where the first property is a message containing the name of the token type and the text that token represents, and the second and third are the line and column where the token starts.
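The wrapping technique itself does not depend on ress. As a rough sketch using only the standard library, the same shape can be applied to any iterator of words; WordChecker and BannedWord are hypothetical names invented for this illustration, and the position reported is just the index of the word rather than a line and column.

```rust
use std::fmt;

/// Error describing a banned word and its index in the input (hypothetical).
#[derive(Debug, PartialEq)]
pub struct BannedWord(pub String, pub usize);

impl fmt::Display for BannedWord {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Banned word {} found at position {}", self.0, self.1)
    }
}

impl std::error::Error for BannedWord {}

/// Wraps any iterator of words and checks each one against a ban list.
pub struct WordChecker<I> {
    inner: std::iter::Enumerate<I>,
    banned: Vec<String>,
}

impl<'a, I: Iterator<Item = &'a str>> WordChecker<I> {
    pub fn new(words: I, banned: Vec<String>) -> Self {
        WordChecker { inner: words.enumerate(), banned }
    }
}

impl<'a, I: Iterator<Item = &'a str>> Iterator for WordChecker<I> {
    // One result per word: Ok(()) if it passed, Err if it is banned.
    type Item = Result<(), BannedWord>;

    fn next(&mut self) -> Option<Self::Item> {
        // `?` ends our iteration as soon as the inner iterator is exhausted.
        let (idx, word) = self.inner.next()?;
        if self.banned.iter().any(|b| b.as_str() == word) {
            Some(Err(BannedWord(word.to_string(), idx)))
        } else {
            Some(Ok(()))
        }
    }
}
```

Because the wrapper is itself an Iterator, the caller can drive it with a for loop or collect it, exactly as with the inner iterator. Yielding a Result for every element, instead of only the failures, keeps the wrapper in step with the inner iterator one item at a time.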
Now that we have all of the underlying infrastructure set up, let's use the BannedFinder in our main.
let finder = BannedFinder::new(&js, banned);
for item in finder {
    match item {
        Ok(_) => (),
        Err(msg) => println!("{}", msg),
    }
}
That is pretty much it. If you wanted to see the full project you can find it in the lint-ie8 folder of this book's github repository.
$web-only-end$ $slides-only$
Demo
$slides-only-end$
RESSA
$slides-only$
impl Iterator for Parser
- Converts stream of Tokens into AST
- Significantly more context
$slides-only-end$
$web-only$
Before we get into how to use ressa, it is a good idea to briefly touch on the scope of a parser or syntax analyzer. The biggest thing to understand is that we still are not dealing with the semantic meaning of the program. That means ressa itself won't discover things like assigning to undeclared variables or attempting to call undefined functions because that would require more context. To that end, ressa's true value isn't realized until it is embedded into another program that provides that context.
With that said, ressa provides a larger context than ress does. It achieves that by wrapping the Scanner in a struct called Parser. Essentially, Parser provides a way to keep track of what any given set of Tokens might mean. Parser also implements Iterator over the enum Result<ProgramPart, Error>; the Ok variant has 3 cases representing the 3 different top level JavaScript constructs.
- Decl - a variable/function/class declaration
  - Var - A top level variable declaration e.g. let x = 0;
  - Class - A named class definition at the top level
  - Func - A named function definition at the top level
  - Import - An ES Module import statement
  - Export - An ES Module export statement
- Dir - A script directive, pretty much just 'use strict'
- Stmt - A catch all for all other statements
  - Block - A collection of statements wrapped in curly braces
  - Break - A break statement will exit a loop or labeled statement early
  - Continue - A continue statement will short circuit a loop
  - Debugger - the literal text debugger
  - DoWhile - A do loop executes the body before testing whether to continue
  - Empty - A single semicolon
  - Expr - A catch-all for everything else
  - For - A c-style for loop e.g. for (var i = 0; i < 100; i++) ;
  - ForIn - A for loop that assigns the key of an enumerable at the top of each iteration
  - ForOf - A for loop that assigns the value of an iterable at the top of each iteration
  - If - A set of if/else if/else statements
  - Labeled - A statement that has been named by an attached identifier
  - Return - The return statement that resolves a function's value
  - Switch - A test Expression and a collection of CaseStmts
  - Throw - The throw keyword followed by an Expression
  - Try - A try/catch/finally block for catching Thrown items
  - Var - A non-top level variable declaration
  - While - A loop which continues based on a test Expression
  - With - An antiquated statement that changes the order of identifier resolution
Stmt is the real work-horse of the group; while a top level function definition would be a Decl, a non-top level function definition would be a Stmt. Both Decl and Stmt themselves are enums representing the different possible variations. Looking further into the Stmt variants, you may notice there is another catch-all in the Expr variant, which contains an Expr (expression) enum that defines an even more granular set of program parts.
- Expr
  - Assign - Assigning a value to a variable, this includes any update & assign operations e.g. x = 1, x += 1, etc
  - Array - An array literal e.g. [1,2,3,4]
  - ArrowFunc - An arrow function expression
  - Await - Any expression preceded by the await keyword
  - Call - Calling a function or method
  - Class - A class expression is a class definition with an optional identifier that is assigned to a variable or used as an argument in a Call expression
  - Conditional - Also known as the "ternary" operator e.g. test ? consequent : alternate
  - Func - A function expression is a function definition with an optional identifier that is either self executing, assigned to a variable or used as a Call argument
  - Ident - The identifier of a variable, call argument, class, import, export or function
  - Lit - A primitive literal
  - Logical - Two expressions separated by && or ||
  - Member - Accessing a sub property on something e.g. [0,1,2][1] or console.log
  - MetaProp - Currently the only MetaProperty is new.target, which you can check in a function body to see if the function was called with the new keyword
  - New - A Call expression preceded by the new keyword
  - Obj - An object literal e.g. {a: 1, b: 2}
  - Seq - Any sequence of expressions separated by commas
  - Spread - the ... operator followed by an expression
  - Super - The super pseudo-keyword used for accessing properties of a super class
  - TaggedTemplate - An identifier followed by a template literal, see MDN for more info
  - This - The this pseudo-keyword used for accessing instance properties
  - Unary - An operation (that is not an update) that requires one expression as an argument e.g. delete x, !true, etc
  - Update - An operation that uses the ++ or -- operator
  - Yield - the yield contextual keyword followed by an optional expression for use in generator functions
Most of the Expr, Stmt, and Decl variants have associated values; to see more information about them, check out the documentation. There should be an example and description provided for each of the possible combinations.
With that long-winded explanation of the basic structure out of the way, let's take a look at how we would use the Parser. In this example we have a javascript snippet that defines a function Thing, which assigns its first argument stuff to a property of the function, this.stuff.
$web-only-end$
use ressa::*;

static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";

fn main() {
    let parser = Parser::new(JS).expect("Failed to create parser");
    for part in parser {
        let part = part.expect("Failed to parse part");
        // {:#?} pretty-prints the tree across multiple lines
        println!("{:#?}", part);
    }
}
$web-only$ If we were to run the above we would get the following output. $web-only-end$
Decl(
Func(
Func {
id: Some(
Ident {
name: "Thing",
},
),
params: [
Pat(
Ident(
Ident {
name: "stuff",
},
),
),
],
body: FuncBody(
[
Stmt(
Expr(
Assign(
AssignExpr {
operator: Equal,
left: Expr(
Member(
MemberExpr {
object: This,
property: Ident(
Ident {
name: "stuff",
},
),
computed: false,
},
),
),
right: Ident(
Ident {
name: "stuff",
},
),
},
),
),
),
],
),
generator: false,
is_async: false,
},
),
)
$web-only$ If we walk through the output, we see the following.
- This program consists of a single part which is a ProgramPart::Decl
  - Inside of that is a Decl::Func
    - Inside of that is a Func
      - It has an id, which is an optional Ident, with the value Some("Thing")
      - It has a one item vec of Pats in params
        - Which is a Pat::Ident
          - Inside of that is an Ident with the name "stuff"
      - It has a body that is a one item vec of ProgramParts
        - The item is a ProgramPart::Stmt
          - Which is a Stmt::Expr
            - Inside of that is an Expr::Assign
              - Inside of that is an AssignExpr
                - Which has an operator of Equal
                - The left hand side is an Expr::Member
                  - Inside of that is a MemberExpr
                    - The object being Expr::This
                    - The property being Expr::Ident with the name of "stuff"
                    - computed is false
                - The right hand side is an Expr::Ident with the name of "stuff"
      - It is not a generator
      - is_async is false
Phew! That is quite a lot of information! A big part of why we need to be that verbose is because of the "you can do anything" nature of JavaScript. Let's use the MemberExpr as an example; below is a collection of ways to write a MemberExpr in JavaScript.
console.log;//member expr
console['log']; //member expr
const logVar = 'log';
console[logVar];//member expr
console[['l','o','g'].join('')];//member expr
class Log {
toString() {
return 'log';
}
}
const logToString = new Log();
console[logToString];//member expr
function logFunc() {
return 'log';
}
console[logFunc()];//member expr
function getConsole() {
return console
}
getConsole()[logFunc()];//member expr
getConsole().log;//member expr
And with the way JavaScript has evolved, this probably isn't an exhaustive list of ways to construct a MemberExpr. With the level of information ressa provides, we have enough to truly understand the syntactic meaning of the text. This will enable us to build more powerful tools to analyze and/or manipulate any given JavaScript program. With the pervasiveness of print debugging, wouldn't it be nice if we had a tool that would automatically insert a console.log at the top of every function and method in a program? We could make it print the name of that function and also each of the arguments. Let's try to build one.
$web-only-end$
Building a Debug Helper
$slides-only$
Demo
$slides-only-end$ $web-only$ To simplify things, we are just going to lift the technique for getting the JavaScript text from the ress example, so we won't be covering that again.
With that out of the way, let's take a look at the Cargo.toml and use statements for our program.
[package]
name = "console_logify"
version = "0.1.0"
authors = ["Robert Masen <r@robertmasen.pizza>"]
edition = "2018"
[dependencies]
ressa = "0.7.0-beta-7"
resw = "0.4.0-beta-1"
resast = "0.4"
use ressa::Parser;
use resw::Writer;
use resast::prelude::*;
This will make sure that all of the items we will need from ressa and resast are in scope. Now we can start defining our method for inserting the debug logging into any functions that we find. To start, we are going to create a function that will generate a new ProgramPart::Stmt that will represent our call to console.log, which might look like this.
pub fn console_log<'a>(args: Vec<Expr<'a>>) -> ProgramPart<'a> {
    ProgramPart::Stmt(Stmt::Expr(Expr::Call(
        CallExpr {
            callee: Box::new(Expr::Member(
                MemberExpr {
                    computed: false,
                    object: Box::new(Expr::ident_from("console")),
                    property: Box::new(Expr::ident_from("log")),
                }
            )),
            arguments: args,
        }
    )))
}
This signature might look a little intimidating with all the lifetime annotations. The reason they need to be there is that at the heart of every resast node is a Cow (Clone On Write) slice of the original javascript string. Putting it in a Cow makes it possible to more easily manipulate the tree without having to pay the cost of allocating a new string for every node at parse time. The lifetime annotations just tell the compiler that our argument and our return value will live the same lifetime, since our arguments are going to be embedded in our return value. We will end up using this pattern quite often in this example. Now let's go over what is actually happening here. Our only argument is args, which supplies the arguments that will be passed into console.log.
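If Cow is new to you, the following sketch shows the borrowed and owned cases side by side using only the standard library. Name, name_from_source, generated_name and is_borrowed are hypothetical stand-ins for this illustration and are not part of resast.

```rust
use std::borrow::Cow;

/// A stand-in for an AST node that holds a slice of the source (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Name<'a> {
    pub name: Cow<'a, str>,
}

/// Borrow the name directly from the source text; no allocation happens here.
pub fn name_from_source<'a>(source: &'a str, start: usize, end: usize) -> Name<'a> {
    Name { name: Cow::Borrowed(&source[start..end]) }
}

/// Build a brand new name that does not exist anywhere in the source text.
/// The returned node shares the lifetime of the node it was derived from.
pub fn generated_name<'a>(prefix: &str, original: &Name<'a>) -> Name<'a> {
    Name { name: Cow::Owned(format!("{}_{}", prefix, original.name)) }
}

/// Reports whether a node still points into the original source.
pub fn is_borrowed(n: &Name) -> bool {
    matches!(n.name, Cow::Borrowed(_))
}
```

A node created by name_from_source only points into the source string, while one created by generated_name owns a freshly allocated String. Both have the same type, which is what lets parsed nodes and nodes we invent ourselves sit side by side in one tree.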
Now we are going to build the tree that represents the javascript, which will look like this:
- ProgramPart
  - Stmt
    - Expr
      - CallExpr
        - callee
          - Expr
            - MemberExpr
              - computed: false
              - object
                - Expr
                  - Ident
                    - name: "console"
              - property
                - Expr
                  - Ident
                    - name: "log"
        - arguments
          - Vec<Expr>
It might be easier to start from the innermost structure, the MemberExpr, which represents the console.log portion of the desired output. First, we want to set the computed property to false, which means we are using a . instead of []. Next we need to define the object, which will be the identifier console, and the property, which will be the identifier log. We nest this inside of a CallExpr as the callee; this represents everything up to the opening parenthesis. The second property, arguments, will, as the name suggests, represent the arguments; we'll simply assign that to the args provided by the caller. Moving up the tree, we wrap the CallExpr in an Expr, a Stmt and a ProgramPart.
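The inside-out construction described above can be tried without any dependencies. The sketch below uses a deliberately tiny, hypothetical expression type (MiniExpr is not resast's Expr) and adds a render function so we can check what the tree represents.

```rust
/// A tiny expression tree (hypothetical, far smaller than resast's Expr).
#[derive(Debug, Clone, PartialEq)]
pub enum MiniExpr {
    Ident(String),
    Str(String),
    Member { object: Box<MiniExpr>, property: Box<MiniExpr>, computed: bool },
    Call { callee: Box<MiniExpr>, arguments: Vec<MiniExpr> },
}

/// Build `console.log(<args>)` from the inside out.
pub fn mini_console_log(args: Vec<MiniExpr>) -> MiniExpr {
    // Innermost first: the member expression `console.log`.
    let callee = MiniExpr::Member {
        object: Box::new(MiniExpr::Ident("console".to_string())),
        property: Box::new(MiniExpr::Ident("log".to_string())),
        computed: false,
    };
    // Then wrap it in the call, handing over the caller's arguments.
    MiniExpr::Call { callee: Box::new(callee), arguments: args }
}

/// Render the tree back to JavaScript text so we can see what we built.
pub fn render(expr: &MiniExpr) -> String {
    match expr {
        MiniExpr::Ident(name) => name.clone(),
        MiniExpr::Str(s) => format!("'{}'", s),
        MiniExpr::Member { object, property, computed } => {
            if *computed {
                format!("{}[{}]", render(object), render(property))
            } else {
                format!("{}.{}", render(object), render(property))
            }
        }
        MiniExpr::Call { callee, arguments } => {
            let args: Vec<String> = arguments.iter().map(render).collect();
            format!("{}({})", render(callee), args.join(", "))
        }
    }
}
```

Passing a Str of Thing and an Ident of stuff to mini_console_log and rendering the result gives console.log('Thing', stuff). In the real project the rendering step is the job of resw's Writer rather than a hand-written function.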
Next, let's work on a few more helper functions. First up is one that will insert a ProgramPart at the top of a FuncBody.
fn insert_expr_into_func_body<'a>(expr: ProgramPart<'a>, body: &mut FuncBody<'a>) {
    body.0.insert(0, expr);
}
This one is pretty straightforward: we take the part and a mutable reference to the body we are modifying. A FuncBody is a tuple struct that wraps a Vec<ProgramPart>, which means we can use the insert method on Vec to add the new item to the first position.
Another useful utility would be a way to convert an Ident into a StringLit; it is something that we will be doing quite often.
fn ident_to_string_lit<'a>(i: &Ident<'a>) -> Expr<'a> {
    Expr::Lit(Lit::String(StringLit::Single(i.name.clone())))
}
This one is also pretty straightforward: we take a reference to an Ident and clone the name property into a StringLit::Single. We want to wrap that up into an Expr, and to do that we need to wrap it in a Lit::String first.
To continue that theme, let's build another function that takes in an expression and returns that expression's representation as a StringLit. To start, let's build a function that converts an Expr into a rust String. The problem is that not all Exprs can be easily converted into a rust String. This will be a good opportunity to use the Option type to filter out any of the expressions we might not want to pass into console.log.
fn expr_to_string(expr: &Expr) -> Option<String> {
    match expr {
        Expr::Ident(ref ident) => Some(ident.name.to_string()),
        Expr::This => Some("this".to_string()),
        Expr::Member(ref mem) => {
            let prefix = expr_to_string(&mem.object)?;
            let suffix = expr_to_string(&mem.property)?;
            Some(if mem.computed {
                format!("{}[{}]", prefix, suffix)
            } else {
                format!("{}.{}", prefix, suffix)
            })
        },
        Expr::Lit(lit) => {
            match lit {
                Lit::String(s) => Some(s.clone_inner().to_string()),
                Lit::Number(n) => Some(n.to_string()),
                Lit::Boolean(b) => Some(b.to_string()),
                Lit::RegEx(r) => Some(format!("/{}/{}", r.pattern, r.flags)),
                Lit::Null => Some("null".to_string()),
                _ => None,
            }
        },
        _ => None,
    }
}
This function is just a match expression. The first case is the Ident, for which we simply make a copy of the name property by calling to_string. Next is the This case, for which we just create a new string and return that. For a member expression, we want to return the object property converted to a string and the property property converted to a string, separated by a . (or with the property wrapped in [] when the member is computed); if either of these two can't be converted to a string, we just return None. The last case that we want to attempt to convert is the literal case; for that we simply extract the inner string in most cases. For the regex case, we reconstruct it by putting the pattern between two slashes and the flags at the end. For the null case we just return that as a new string. The last case we might handle is Template, which would be a little more complicated to reconstruct for this example, so we will just return None in that case. For any other expressions we want to return None, as it would be far more complicated and pretty uncommon to come up in our use case.
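The important trick in that function is the ? operator on an Option, which lets a failure deep inside a nested expression bubble all the way up. Here is a small sketch of just that behavior, using a hypothetical Node type instead of resast's Expr so it can be run on its own.

```rust
/// A tiny expression type (hypothetical) with one variant we cannot print.
pub enum Node {
    Ident(String),
    This,
    Member { object: Box<Node>, property: Box<Node>, computed: bool },
    Call(Box<Node>),
}

/// Returns None as soon as any part of the expression cannot be converted.
pub fn node_to_string(node: &Node) -> Option<String> {
    match node {
        Node::Ident(name) => Some(name.clone()),
        Node::This => Some("this".to_string()),
        Node::Member { object, property, computed } => {
            // `?` returns None from the whole function if either side fails.
            let prefix = node_to_string(object)?;
            let suffix = node_to_string(property)?;
            Some(if *computed {
                format!("{}[{}]", prefix, suffix)
            } else {
                format!("{}.{}", prefix, suffix)
            })
        }
        // Calls are the "too complicated" case in this sketch.
        Node::Call(_) => None,
    }
}
```

A member expression built from This and an Ident converts to this.stuff, while one whose object is a Call converts to None even though its property is a perfectly printable identifier.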
Now, we want to wrap the result of this new function into an Expr just like we did for our identifier.
fn expr_to_string_lit<'a>(e: &Expr<'a>) -> Option<Expr<'a>> {
    let inner = expr_to_string(e)?;
    Some(Expr::Lit(Lit::String(StringLit::Single(::std::borrow::Cow::Owned(inner)))))
}
Because modern javascript allows for patterns as function arguments, we are going to need a couple of helpers to handle these possibilities. Let's take this js as an example.
function Thing({a, b = 0}, [c, d, e]) {
}
Our goal would be to add a call to this function that looks like this.
console.log('Thing', a, b, c, d, e);
Before we get into these pattern arguments, we want to have an easy way to clone an Expr but only when it is an Ident.
fn clone_ident_from_expr<'a>(expr: &Expr<'a>) -> Option<Expr<'a>> {
    if let Expr::Ident(_) = expr {
        Some(expr.clone())
    } else {
        None
    }
}
Here we are just using an if let to test for an Ident and cloning if there is a match. Now let's dig into the Pat argument conversion.
fn extract_idents_from_pat<'a>(pat: &Pat<'a>) -> Vec<Option<Expr<'a>>> {
    match pat {
        Pat::Ident(i) => {
            vec![Some(Expr::Ident(i.clone()))]
        },
        Pat::Obj(obj) => {
            obj.iter().map(|part| {
                match part {
                    ObjPatPart::Rest(pat) => {
                        extract_idents_from_pat(pat)
                    },
                    ObjPatPart::Assign(prop) => {
                        match prop.key {
                            PropKey::Pat(ref pat) => {
                                extract_idents_from_pat(pat)
                            },
                            PropKey::Expr(ref expr) => {
                                vec![clone_ident_from_expr(expr)]
                            },
                            PropKey::Lit(ref lit) => {
                                vec![Some(Expr::Lit(lit.clone()))]
                            }
                        }
                    },
                }
            }).flatten().collect()
        },
        Pat::Array(arr) => {
            arr.iter().map(|p| {
                match p {
                    Some(ArrayPatPart::Expr(expr)) => {
                        vec![clone_ident_from_expr(expr)]
                    },
                    Some(ArrayPatPart::Pat(pat)) => {
                        extract_idents_from_pat(pat)
                    },
                    None => vec![],
                }
            }).flatten().collect()
        },
        Pat::RestElement(pat) => {
            extract_idents_from_pat(pat)
        },
        Pat::Assign(assign) => {
            extract_idents_from_pat(&assign.left)
        },
    }
}
Because patterns like the object or array pattern can contain multiple arguments (in our example a and b would be in the same pattern), we want to return a Vec of the optional identifiers. First, let's cover the simplest pattern, the Ident case. In this case we simply want to create a new Vec with a clone of the inner Ident wrapped up in an Expr as its only contents. Next we get something a little more complicated, the Obj case. Inside of a Pat::Obj is a Vec of an enum called ObjPatPart, which has 2 cases: the normal Assign and the Rest (preceded by ...). The nice thing about the Rest case is that we can simply use recursion to get the idents out of the inner Pat. The Assign case has a data structure called Prop; in this situation we only really care about the key property, since that is where our identifier would live. A property key can be either a Pat, Expr or Lit. In the first case we can use the same recursive call to get the identifiers it contains. For the expression case we are going to use the helper function we just wrote to get the ident out, if it is an ident. Finally, for the literal case we are going to just clone the literal into a new Expr. Since we need to do this for each of the ObjPatParts in the object pattern, we are going to use the Iterator trait's map to do the first step in the process. This will convert each element into a Vec of optional Exprs; to get that back down to a single Vec we can use the flatten method. Finally we will collect the iterator back together. Next we have the Array case, which is going to look very similar. First we are going to map the inner elements into our identifiers. Each element is an optional ArrayPatPart, which gives us 3 cases: the Expr, which we can pass off to our helper just like before, the Pat, which we will use recursion for again, and finally None, for which we can just return an empty Vec. The RestElement works just like the object pattern version; we just recurse with the inner value. Finally we have the Assign case, where we want to use the same recursion method but only on the left property. Whew, that one was a bit of a doozy!
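The map, flatten, collect chain is the part that tends to trip people up, so here it is in isolation. This is a sketch with a hypothetical MiniPat type standing in for resast's Pat; the only thing it shares with the real function is the shape of the recursion.

```rust
/// A tiny pattern type (hypothetical): a name, a nested list, or a hole.
pub enum MiniPat {
    Name(String),
    List(Vec<MiniPat>),
    Hole,
}

/// Collect every name in a pattern, keeping None for anything we skip.
pub fn names_in(pat: &MiniPat) -> Vec<Option<String>> {
    match pat {
        MiniPat::Name(n) => vec![Some(n.clone())],
        // Each child yields its own Vec, so map gives an iterator of Vecs
        // and flatten merges them back into one flat sequence.
        MiniPat::List(children) => children.iter().map(names_in).flatten().collect(),
        MiniPat::Hole => vec![None],
    }
}
```

The order of the names is preserved through the recursion because flatten visits each inner Vec in turn. The combination of map followed by flatten can also be written as a single flat_map call, which is the spelling clippy tends to suggest.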
We are just now starting to dig into the meat of this project; getting through these complicated mappings now is going to greatly simplify things for us later. Since we are going to be primarily working with the FuncArgs in any given Func or ArrowFunc, we should have a function that maps any list of arguments to a new list of identifiers and literals.
fn extract_idents_from_args<'a>(args: &[FuncArg<'a>]) -> Vec<Expr<'a>> {
    let mut ret = vec![];
    for arg in args {
        match arg {
            FuncArg::Expr(expr) => ret.push(clone_ident_from_expr(expr)),
            FuncArg::Pat(pat) => ret.extend(extract_idents_from_pat(pat)),
        }
    }
    ret.into_iter().filter_map(|e| e).collect()
}
In this function we are going to liberally use the last to helpers we put together. a FuncArg
can be either a Pat
or and Expr
, in the former we are dealing with a possible list of many new elements but for the latter there would be only one. With that in mind we are going to use the Vec
method push
for one element and extend
for possibly many. Once we have gone through each of the arguments provided we want to filter out any of the None
cases by using the filter_map
which will filter out any None
s and unwrap any Some
s for us automatically. We can then collect up the result to return.
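If the push, extend, and filter_map combination is new to you, the following standalone sketch shows the same flow with plain strings. The `Arg` enum and `names_from_args` are hypothetical names made up for this illustration, they only mirror the shape of the real function.

```rust
/// Hypothetical stand-in for a function argument: either a single
/// optional name or a pattern that may yield several.
enum Arg {
    Single(Option<String>),
    Many(Vec<Option<String>>),
}

fn names_from_args(args: &[Arg]) -> Vec<String> {
    let mut ret: Vec<Option<String>> = vec![];
    for arg in args {
        match arg {
            // One element: `push` it.
            Arg::Single(name) => ret.push(name.clone()),
            // Possibly many elements: `extend` with all of them.
            Arg::Many(names) => ret.extend(names.iter().cloned()),
        }
    }
    // `filter_map` with the identity closure drops every `None`
    // and unwraps every `Some`.
    ret.into_iter().filter_map(|e| e).collect()
}
```

Because `Option` is itself iterable, `ret.into_iter().flatten().collect()` would produce the same result, so which one you reach for is mostly a matter of taste.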
Last in our helper functions is going to be a way to go from an AssignLeft
into an Expr
with a StringLit
inside. For this we are going to use the expr_to_string_lit
helper in the Expr
case and we are going to match on the Pat
case, returning a call to the ident_to_string_lit
helper.
Armed with these helpers it is time to write our first mapping function. A pattern that will be true of all of our mapping functions is that they will always take a Vec
of Expr
s as the first argument. This is how we are going to track the prefix of any log we want to write. We are going to start with the Class
, which is primarily a collection of Func
s wrapped up in Prop
s so let's start at the property level.
#![allow(unused)] fn main() { fn map_class_prop<'a>(mut args: Vec<Expr<'a>>, mut prop: Prop<'a>) -> Prop<'a> { match prop.kind { PropKind::Ctor => { args.insert(args.len().saturating_sub(1), Expr::Lit(Lit::String(StringLit::single_from("new")))); }, PropKind::Get => { args.push( Expr::Lit(Lit::String(StringLit::single_from("get"))) ); }, PropKind::Set => { args.push( Expr::Lit(Lit::String(StringLit::single_from("set"))) ); }, _ => (), }; match &prop.key { PropKey::Expr(ref expr) => match expr { Expr::Ident(ref i) => { if i.name != "constructor" { args.push(ident_to_string_lit(i)); } } _ => (), }, PropKey::Lit(ref l) => match l { Lit::Boolean(_) | Lit::Number(_) | Lit::RegEx(_) | Lit::String(_) => { args.push(Expr::Lit(l.clone())) } Lit::Null => { args.push(Expr::Lit(Lit::String(StringLit::Single(::std::borrow::Cow::Owned(String::from("null")))))); } _ => (), }, PropKey::Pat(ref p) => { match p { Pat::Ident(ref i) => args.push(ident_to_string_lit(i)), _ => args.extend(extract_idents_from_pat(p).into_iter().filter_map(|e| e)), } }, } if let PropValue::Expr(expr) = prop.value { prop.value = PropValue::Expr(map_expr(args, expr)); } prop } }
To start, we want to look at the kind
property, there are 3 kinds that are important for us here. The first is Ctor
(short for constructor), if we find one of those we want to put the new
just before the class name, which should be the last element in the args. To make sure we don't underflow when the args happen to be empty we should use the saturating_sub
method on usize
to do the subtraction. Next are the Get
and Set
accessors, if we find one of those we just want to append this keyword to the end of the current args.
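Here is a small sketch of why the saturating subtraction matters, using plain string slices instead of AST nodes. The function name `insert_new` is made up for this example.

```rust
/// Insert "new" just before the last element of `args`
/// (or at the front when `args` is empty).
fn insert_new(mut args: Vec<&str>) -> Vec<&str> {
    // `args.len() - 1` would underflow on an empty Vec;
    // `saturating_sub` stops at 0 instead.
    let idx = args.len().saturating_sub(1);
    args.insert(idx, "new");
    args
}
```

With `["Thing"]` this produces `["new", "Thing"]`, and with an empty list it simply produces `["new"]` rather than failing.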
Now that we have that, we need to start digging into the ProgramPart
to identify anything we want to modify. Since Parser
implements Iterator
and its Item
is Result<ProgramPart, Error>
we first need to use filter_map
to extract the ProgramPart
from the result. It would probably be good to handle the error case here but for the sake of simplicity we are going to skip any errors. Now that we have an Iterator
over ProgramPart
s we can use map
to update each part.
fn main() {
    let js = get_js().expect("Unable to get JavaScript");
    let parser = Parser::new(&js).expect("Unable to construct parser");
    // Skip any parse errors and start each part with an empty log prefix.
    for part in parser.filter_map(|p| p.ok()).map(|part| map_part(vec![], part)) {
        //FIXME: Write updated program part to somewhere
    }
}
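The filter_map then map chain works on any iterator of Result values, not just the parser. As a minimal sketch, here is the same pattern with integer parsing standing in for the JavaScript parser; `double_valid` is a made-up name for this illustration.

```rust
/// Parse each input as an integer, silently skipping anything that fails,
/// then double what is left.
fn double_valid(inputs: &[&str]) -> Vec<i32> {
    inputs
        .iter()
        // Produces an iterator of Result<i32, ParseIntError>.
        .map(|s| s.parse::<i32>())
        // `ok()` turns Err into None, which `filter_map` then drops.
        .filter_map(|r| r.ok())
        // Update each surviving item.
        .map(|n| n * 2)
        .collect()
}
```

Just like in our project, the failed item disappears without a trace, which is convenient here but is exactly why a real tool would probably want to report the error instead.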
With that in mind the entry point is going to be a function that takes a ProgramPart
and returns a new ProgramPart
. It might look like this
#![allow(unused)] fn main() { fn map_part<'a>(args: Vec<Expr<'a>>, part: ProgramPart<'a>) -> ProgramPart<'a> { match part { ProgramPart::Decl(decl) => ProgramPart::Decl(map_decl(args, decl)), ProgramPart::Stmt(stmt) => ProgramPart::Stmt(map_stmt(args, stmt)), ProgramPart::Dir(_) => part, } } }
We are going to match on the part provided and either return that part if it is a Directive
or if it isn't we need to investigate further to discover if it is a function or not. We do that in two places map_decl
and map_stmt
both of which are going to utilize a similar method for digging further into the tree.
#![allow(unused)] fn main() { fn map_decl<'a>(mut args: Vec<Expr<'a>>, decl: Decl<'a>) -> Decl<'a> { match decl { Decl::Func(f) => Decl::Func(map_func(args, f)), Decl::Class(class) => Decl::Class(map_class(args, class)), Decl::Var(kind, del) => { Decl::Var(kind, del.into_iter().map(|part| { if let Pat::Ident(ref ident) = part.id { args.push(ident_to_string_lit(ident)); } VarDecl { id: part.id, init: part.init.map(|e| map_expr(args.clone(), e)) } }).collect()) } }
There are two ways for a Decl
to resolve into a function or method and that is with the Function
and Class
variants while a Stmt
can end up there if it is an Expr
. When we include map_expr
we see that there are cases for both Function
and Class
in the Expr
enum. That means once we get past those we will be handling the rest in the exact same way.
#![allow(unused)] fn main() { _ => decl.clone(), } } fn map_stmt<'a>(args: Vec<Expr<'a>>, stmt: Stmt<'a>) -> Stmt<'a> { match stmt { Stmt::Expr(expr) => Stmt::Expr(map_expr(args, expr)), _ => stmt.clone(), }
Finally we are going to start manipulating the AST in map_func
.
The first thing we are going to do is to clone the func
to give us a mutable version. Next we are going to check if the id
is Some
, if it is we can add that name to our console.log
arguments. Now function arguments can be pretty complicated, to try and keep things simple we are going to only worry about the ones that are either Expr::Ident
or Pat::Identifier
. To build something more robust it might be good to include destructured arguments or arguments with default values but for this example we are just going to keep it simple.
First we are going to filter_map
the func.params
to only get the items that ultimately resolve to Identifier
s, at that point we can wrap all of these identifiers in an Expr::Ident
and add them to the console.log
args. Now we can simply insert the result of passing those args to console_log
at the first position of the func.body
. Because functions can appear in the body of other functions we also want to map all of the func.body
program parts. Once that has completed we can return the updated func
to the caller.
The next thing we are going to want to deal with is Class
, we want to insert console.log into the top of each method on a class. This is a bit unique because we also want to provide the name of that class (if it exists) as the first argument to console.log. That might look like this.
#![allow(unused)] fn main() { fn map_func<'a>(mut args: Vec<Expr<'a>>, mut func: Func<'a>) -> Func<'a> { if let Some(ref id) = &func.id { args.push(ident_to_string_lit(id)); } let local_args = extract_idents_from_args(&func.params); func.body = FuncBody(func.body.0.into_iter().map(|p| map_part(args.clone(), p)).collect()); insert_expr_into_func_body(console_log(args.clone().into_iter().chain(local_args.into_iter()).collect()), &mut func.body); func } fn map_arrow_func<'a>(mut args: Vec<Expr<'a>>, mut f: ArrowFuncExpr<'a>) -> ArrowFuncExpr<'a> { args.extend(extract_idents_from_args(&f.params)); match &mut f.body { ArrowFuncBody::FuncBody(ref mut body) => { insert_expr_into_func_body(console_log(args), body) }, ArrowFuncBody::Expr(expr) => { f.body = ArrowFuncBody::FuncBody(FuncBody(vec![ console_log(args), ProgramPart::Stmt( Stmt::Return( Some(*expr.clone()) ) ) ])) } } f } fn map_class<'a>(mut args: Vec<Expr<'a>>, mut class: Class<'a>) -> Class<'a> { if let Some(ref id) = class.id { args.push(ident_to_string_lit(id)) } let mut new_body = vec![]; for item in class.body.0 { new_body.push(map_class_prop(args.clone(), item)) } class.body = ClassBody(new_body); class } fn map_class_prop<'a>(mut args: Vec<Expr<'a>>, mut prop: Prop<'a>) -> Prop<'a> { match prop.kind { PropKind::Ctor => { args.insert(args.len().saturating_sub(1), Expr::Lit(Lit::String(StringLit::single_from("new")))); }, PropKind::Get => { args.push( Expr::Lit(Lit::String(StringLit::single_from("get"))) ); }, PropKind::Set => { args.push( Expr::Lit(Lit::String(StringLit::single_from("set"))) ); }, _ => (), }; match &prop.key { PropKey::Expr(ref expr) => match expr { Expr::Ident(ref i) => { if i.name != "constructor" { args.push(ident_to_string_lit(i)); } } _ => (), }, PropKey::Lit(ref l) => match l { Lit::Boolean(_) | Lit::Number(_) | Lit::RegEx(_) | Lit::String(_) => { args.push(Expr::Lit(l.clone())) } Lit::Null => { 
args.push(Expr::Lit(Lit::String(StringLit::Single(::std::borrow::Cow::Owned(String::from("null")))))); } _ => (), }, PropKey::Pat(ref p) => { match p { Pat::Ident(ref i) => args.push(ident_to_string_lit(i)), _ => args.extend(extract_idents_from_pat(p).into_iter().filter_map(|e| e)), } }, } if let PropValue::Expr(expr) = prop.value { prop.value = PropValue::Expr(map_expr(args, expr)); } prop } fn assign_left_to_string_lit<'a>(left: &AssignLeft<'a>) -> Option<Expr<'a>> { match left { AssignLeft::Expr(expr) => expr_to_string_lit(expr), AssignLeft::Pat(pat) => { match pat { Pat::Ident(ident) => Some(ident_to_string_lit(ident)), _ => None, } } } } }
Here we have two functions, the first pulls out the id from the provided class or uses an empty string if it doesn't exist. We then just pass that off to map_class_prop
which will handle all of the different types of properties a class can have. The first thing this does is map the prefix
into the right format, so a call to new Thing()
would print new Thing
, or a get method would print Thing get
before the method name. Next we take a look at the property.key
, this will provide us with the name of our function, but according to the specification a class property key can be an identifier, a literal value, or a pattern, so we need to figure out what the name of this method is by digging into that value. First in the case that it is an ident we want to add it to the args, unless it is the value constructor
because we already put the new
keyword in that one. Next we can pull out the literal values and add those as they appear. Lastly we will only handle the pattern case when it is a Pat::Identifier
otherwise we will just skip it. Now to get the parameter names from the method definition we need to look at the property.value
which should always be an Expr::Function
. Once we match on that we simply repeat the process of map_function
pulling the args out but only when they are Ident
s and then passing that along to console_log
and inserting that Expr
at the top of the function body.
At this point we have successfully updated our AST to include a call to console.log
at the top of each function and method in our code. Now the big question is how do we write that out to a file. This problem is not a small one, in the next section we are going to cover a third crate resw
that we can use to finish this project.
$web-only-end$
RESW
$web-only$
While ress
and ressa
consume text and generate data structures, resw
is going to consume data structures and write out text. This means it can do the heavy lifting when solving the problem our debug logging project left us with. However instead of just sweeping that under the rug, we are going to go over how resw
works. Because of the nature of JavaScript, resw
makes some style decisions that might not work for everyone, by going over the project in detail the hope is that others will feel enabled to either contribute a configuration option into resw
or even implement their own project that consumes ressa
's AST and generates text.
If you are just interested in seeing how we are going to finish the project from the last chapter, feel free to move ahead.
Similar to the structure of ressa
, resw
exposes a struct that will keep track of the context for us called Writer
. There are 2 methods for constructing a Writer
, the first is the ::new
method the second is the ::builder
method that utilizes the builder pattern to customize some options. Those options include
- New line character (default
\n
)
- Quote (defaults to the original quotation mark)
  - Setting this to any value will force all of the string literals in the provided JavaScript to be re-written with the provided quote
- Indent (default 4 spaces)
With either method you are going to need to provide the destination, this can be anything that implements the std::io::Write
trait. For testing purposes the crate provides an implementor of Write
in WriteString
, we are not going to cover that here but a more detailed explanation can be found in the appendix.
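To get a feel for how a builder and a generic destination fit together, here is a rough sketch. Everything in it (`MiniWriter`, `MiniBuilder`, `write_line`, `into_inner`) is hypothetical and invented for this illustration, it is not the actual `resw` API and only models the new line and indent options.

```rust
use std::io::{self, Write};

/// Hypothetical miniature writer for illustration; NOT the real `resw::Writer`.
struct MiniWriter<T: Write> {
    out: T,
    new_line: String,
    indent: String,
}

/// Hypothetical builder carrying two of the options listed above.
struct MiniBuilder {
    new_line: String,
    indent: String,
}

impl MiniBuilder {
    /// Start from the defaults described in the text.
    fn new() -> Self {
        MiniBuilder {
            new_line: "\n".to_string(),
            indent: " ".repeat(4),
        }
    }

    fn new_line(mut self, new_line: &str) -> Self {
        self.new_line = new_line.to_string();
        self
    }

    fn indent(mut self, indent: &str) -> Self {
        self.indent = indent.to_string();
        self
    }

    /// Finish building by supplying the destination.
    fn build<T: Write>(self, out: T) -> MiniWriter<T> {
        MiniWriter { out, new_line: self.new_line, indent: self.indent }
    }
}

impl<T: Write> MiniWriter<T> {
    /// Write one line of text at the given nesting depth.
    fn write_line(&mut self, depth: usize, text: &str) -> io::Result<()> {
        let line = format!("{}{}{}", self.indent.repeat(depth), text, self.new_line);
        self.out.write_all(line.as_bytes())
    }

    /// Hand the destination back so its contents can be inspected.
    fn into_inner(self) -> T {
        self.out
    }
}
```

Since `Vec<u8>` implements `Write`, something like `MiniBuilder::new().indent("\t").build(Vec::new())` gives an in-memory destination that is handy for tests, while a `File` or `stdout` would work the same way for real output.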
Once a Writer
is constructed, it provides an API surface that should cover most of the ressa
AST. The primary entry point is going to be either write_program
or write_part
. For the most part, the primary role of the writer is going to be to incrementally move down the AST until we find something that we are confident we know exactly how to write. Let's take the following js as an example.
function Thing(stuff) {
    this.stuff = stuff;
}
let thing = new Thing('argument');
If we run that through the ressa::Parser
, we would see the following AST.
Decl(
    Function(
        Function {
            id: Some(
                "Thing"
            ),
            params: [
                Pat(
                    Identifier(
                        "stuff"
                    )
                )
            ],
            body: [
                Stmt(
                    Expr(
                        Assignment(
                            AssignmentExpr {
                                operator: Equal,
                                left: Expr(
                                    Member(
                                        MemberExpr {
                                            object: ThisExpr,
                                            property: Ident(
                                                "stuff"
                                            ),
                                            computed: false
                                        }
                                    )
                                ),
                                right: Ident(
                                    "stuff"
                                )
                            }
                        )
                    )
                )
            ],
            generator: false,
            is_async: false
        }
    )
)
Decl(
    Variable(
        Let,
        [
            VariableDecl {
                id: Identifier(
                    "thing"
                ),
                init: Some(
                    New(
                        NewExpr {
                            callee: Ident(
                                "Thing"
                            ),
                            arguments: [
                                Literal(
                                    String(
                                        "\'argument\'"
                                    )
                                )
                            ]
                        }
                    )
                )
            }
        ]
    )
)
Using that, let's take a look at how resw
would generate the text to represent our AST. First we would enter at write_part
with the first ProgramPart
.
#![allow(unused)] fn main() { pub fn write_part(&mut self, part: &ProgramPart) -> Res { self.at_top_level = true; self._write_part(part)?; self.write_new_line()?; Ok(()) } }
Interestingly enough, write_part
is really more concerned with maintaining a context flag for if we are at the top level or not, this becomes important when trying to determine if any expression needs to be wrapped in parentheses. Almost all of the work is going to be passed off to an internal private function _write_part
.
#![allow(unused)] fn main() { fn _write_part(&mut self, part: &ProgramPart) -> Res { self.write_leading_whitespace()?; match part { ProgramPart::Decl(decl) => self.write_decl(decl)?, ProgramPart::Dir(dir) => self.write_directive(dir)?, ProgramPart::Stmt(stmt) => self.write_stmt(stmt)?, } Ok(()) } }
The first thing we want to do is make sure that any leading whitespace is included with write_leading_whitespace
.
#![allow(unused)] fn main() { pub fn write_leading_whitespace(&mut self) -> Res { self.write(&self.indent.repeat(self.current_indent))?; Ok(()) } }
This is achieved by looking at the current_indent
and writing the configurable property indent
to the destination, repeated once for each level of our current indent, so if our indent was \t
and we were at level 2 it would write "\t\t"
. Internally the write
method just writes a single &str
to the destination. After we write our leading whitespace, we can start to descend the AST, we do that by matching on the part. You can see that there is a branch for each of the possible enum variants, looking back at the example, we know the next step would be to head to write_decl
.
#![allow(unused)] fn main() { pub fn write_decl(&mut self, decl: &Decl) -> Res { match decl { Decl::Variable(ref kind, ref decls) => self.write_variable_decls(kind, decls)?, Decl::Class(ref class) => { self.at_top_level = false; self.write_class(class)?; self.write_new_line()?; }, Decl::Function(ref func) => { self.at_top_level = false; self.write_function(func)?; self.write_new_line()?; }, Decl::Export(ref exp) => self.write_export_decl(exp)?, Decl::Import(ref imp) => self.write_import_decl(imp)?, }; Ok(()) } }
Moving further down we simply match on the declaration, handling each variant as needed. For our example we would move into the Decl::Function
branch. The first step in that branch is to set the context flag at_top_level
to false
and then move into the write_function
method.
#![allow(unused)] fn main() { pub fn write_function(&mut self, func: &Function) -> Res { if func.is_async { self.write("async ")?; } self.write("function")?; if let Some(ref id) = func.id { self.write(" ")?; if func.generator { self.write("*")?; } self.write(id)?; } else if func.generator { self.write("*")?; } self.write_function_args(&func.params)?; self.write(" ")?; self.write_function_body(&func.body) } }
Here we are going to actually start writing some information out to our destination. First we check the flag on Function
to see if we need to write the async
keyword, next we write the keyword function
followed by a check to see if the id is Some
. If so we need to check the flag on Function
to see if that function is a generator, if it is we need to add a *
before the id, and lastly we write the id.
Now that we have gotten through that we can start to look at the parameters and body. First we are going to pass off the parameters to write_function_args
.
#![allow(unused)] fn main() { /// Write the arguments of a function or method definition /// ```js /// function(arg1, arg2) { /// } /// ``` pub fn write_function_args(&mut self, args: &[FunctionArg]) -> Res { self.write("(")?; let mut after_first = false; for ref arg in args { if after_first { self.write(", ")?; } else { after_first = true; } self.write_function_arg(arg)?; } self.write(")")?; Ok(()) } }
The first step here is to write the open parenthesis, next we are going to use a flag after_first
to help with deciding whether a comma should be written before the argument. This is the first place that we have seen where resw
is making a style choice, function parameters will never include a trailing comma. Ideally style choices will be configurable in the future but currently this one is not. Now that we have handled the comma situation we can pass the argument off to write_function_arg
.
#![allow(unused)] fn main() { pub fn write_function_arg(&mut self, arg: &FunctionArg) -> Res { match arg { FunctionArg::Expr(ref ex) => self.write_expr(ex)?, FunctionArg::Pat(ref pa) => self.write_pattern(pa)?, } Ok(()) } }
Here we see another function that simply moves us further down the AST. Function arguments can be either expressions or patterns so we need to handle both. For our example we are going to head down the Pat
branch with write_pattern
.
#![allow(unused)] fn main() { pub fn write_pattern(&mut self, pattern: &Pat) -> Res { match pattern { Pat::Identifier(ref i) => self.write(i), Pat::Object(ref o) => self.write_object_pattern(o), Pat::Array(ref a) => self.write_array_pattern(a.as_slice()), Pat::RestElement(ref r) => self.write_rest_element(r), Pat::Assignment(ref a) => self.write_assignment_pattern(a), } } }
Most of the options here are simply going to continue branching down our AST, however for our example we are going to head down the first match arm with Pat::Identifier
and just write that string out to our destination.
Moving back up we only had one parameter for our function signature so we finish out write_function_args
with a closing parenthesis. That then leads us to write_function_body
.
#![allow(unused)] fn main() { pub fn write_function_body(&mut self, body: &FunctionBody) -> Res { if body.len() == 0 { self.write("{ ")?; } else { self.write_open_brace()?; self.write_new_line()?; } for ref part in body { self._write_part(part)?; } if body.len() == 0 { self.write("}")?; } else { self.write_close_brace()?; } Ok(()) } }
The first thing we need to do is take a look at the &FunctionBody
which is a type alias for Vec<ProgramPart>
. We check to see if this function has any body, if not we just write a single open curly brace, if it does we want to write the curly brace using write_open_brace
, this is a convenience method for writing the character and also incrementing the current_indent
, lastly we write a new line. Now we loop over each of the ProgramPart
s in body
and pass that off to _write_part
. For our example there is only going to be one part. This part is a ProgramPart::Stmt
which would be handled by write_stmt
.
#![allow(unused)] fn main() { pub fn write_stmt(&mut self, stmt: &Stmt) -> Res { let mut semi = true; let mut new_line = true; let cached_state = self.at_top_level; match stmt { Stmt::Empty => { new_line = false; }, Stmt::Debugger => self.write_debugger_stmt()?, Stmt::Expr(ref stmt) => { let wrap = match stmt { Expr::Literal(_) | Expr::Object(_) | Expr::Function(_) | Expr::Binary(_) => true, _ => false, }; if wrap { self.write_wrapped_expr(stmt)? } else { self.write_expr(stmt)? } }, Stmt::Block(ref stmt) => { self.at_top_level = false; self.write_block_stmt(stmt)?; semi = false; new_line = false; self.at_top_level = cached_state; } Stmt::With(ref stmt) => { self.write_with_stmt(stmt)?; semi = false; } Stmt::Return(ref stmt) => self.write_return_stmt(stmt)?, Stmt::Labeled(ref stmt) => { self.write_labeled_stmt(stmt)?; semi = false; } Stmt::Break(ref stmt) => self.write_break_stmt(stmt)?, Stmt::Continue(ref stmt) => self.write_continue_stmt(stmt)?, Stmt::If(ref stmt) => { self.write_if_stmt(stmt)?; semi = false; } Stmt::Switch(ref stmt) => { self.at_top_level = false; self.write_switch_stmt(stmt)?; semi = false; } Stmt::Throw(ref stmt) => self.write_throw_stmt(stmt)?, Stmt::Try(ref stmt) => { self.write_try_stmt(stmt)?; semi = false; } Stmt::While(ref stmt) => { new_line = self.write_while_stmt(stmt)?; semi = false; } Stmt::DoWhile(ref stmt) => self.write_do_while_stmt(stmt)?, Stmt::For(ref stmt) => { self.at_top_level = false; new_line = self.write_for_stmt(stmt)?; semi = false; } Stmt::ForIn(ref stmt) => { self.at_top_level = false; new_line = self.write_for_in_stmt(stmt)?; semi = false; } Stmt::ForOf(ref stmt) => { self.at_top_level = false; new_line = self.write_for_of_stmt(stmt)?; semi = false; } Stmt::Var(ref stmt) => self.write_var_stmt(stmt)?, }; if semi { self.write_empty_stmt()?; } if new_line { self.write_new_line()?; } self.at_top_level = cached_state; Ok(()) } }
That is a pretty big match statement! Before we enter that we have a couple of context flags to help us with formatting, semi
and new_line
, both with a default value of true
. Looking at our example, we would enter the Stmt::Expr
arm of the match which handles the possible requirement that this statement be wrapped in parentheses. Primitive literals, object literals, functions, and binary operations would require parentheses when not part of a larger statement. There is a convenience method called write_wrapped_expr
that just writes parentheses around a call to write_expr
.
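As a rough sketch of that wrapping decision, here is a miniature version that writes into a `String`. The three-variant `Expr` enum and the function `write_expr_stmt` are hypothetical simplifications for this illustration, they only mirror the literal and object cases of the real match.

```rust
/// Hypothetical, tiny expression type standing in for the real AST.
enum Expr {
    Ident(String),
    Number(String),
    /// An (empty) object literal, `{}`.
    Object,
}

fn write_expr(out: &mut String, expr: &Expr) {
    match expr {
        Expr::Ident(name) => out.push_str(name),
        Expr::Number(n) => out.push_str(n),
        Expr::Object => out.push_str("{}"),
    }
}

/// Write `expr` surrounded by parentheses.
fn write_wrapped_expr(out: &mut String, expr: &Expr) {
    out.push('(');
    write_expr(out, expr);
    out.push(')');
}

/// Write an expression statement, wrapping the kinds we chose to wrap
/// (here: number literals and object literals).
fn write_expr_stmt(out: &mut String, expr: &Expr) {
    let wrap = matches!(expr, Expr::Number(_) | Expr::Object);
    if wrap {
        write_wrapped_expr(out, expr);
    } else {
        write_expr(out, expr);
    }
    out.push(';');
}
```

The object literal is the case where the parentheses really earn their keep, a statement that starts with `{` would be read by a JavaScript parser as a block rather than an object.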
#![allow(unused)] fn main() { pub fn write_expr(&mut self, expr: &Expr) -> Res { let cached_state = self.at_top_level; match expr { Expr::Literal(ref expr) => self.write_literal(expr)?, Expr::This => self.write_this_expr()?, Expr::Super => self.write_super_expr()?, Expr::Array(ref expr) => self.write_array_expr(expr)?, Expr::Object(ref expr) => self.write_object_expr(expr)?, Expr::Function(ref expr) => { self.at_top_level = false; self.write_function(expr)?; self.at_top_level = cached_state; } Expr::Unary(ref expr) => self.write_unary_expr(expr)?, Expr::Update(ref expr) => self.write_update_expr(expr)?, Expr::Binary(ref expr) => self.write_binary_expr(expr)?, Expr::Assignment(ref expr) => { self.at_top_level = false; self.write_assignment_expr(expr)? }, Expr::Logical(ref expr) => self.write_logical_expr(expr)?, Expr::Member(ref expr) => self.write_member_expr(expr)?, Expr::Conditional(ref expr) => self.write_conditional_expr(expr)?, Expr::Call(ref expr) => self.write_call_expr(expr)?, Expr::New(ref expr) => self.write_new_expr(expr)?, Expr::Sequence(ref expr) => self.write_sequence_expr(expr)?, Expr::Spread(ref expr) => self.write_spread_expr(expr)?, Expr::ArrowFunction(ref expr) => { self.at_top_level = false; self.write_arrow_function_expr(expr)?; self.at_top_level = cached_state; } Expr::Yield(ref expr) => self.write_yield_expr(expr)?, Expr::Class(ref expr) => { self.at_top_level = false; self.write_class(expr)?; self.at_top_level = cached_state; } Expr::MetaProperty(ref expr) => self.write_meta_property(expr)?, Expr::Await(ref expr) => self.write_await_expr(expr)?, Expr::Ident(ref expr) => self.write_ident(expr)?, Expr::TaggedTemplate(ref expr) => self.write_tagged_template(expr)?, _ => unreachable!(), } Ok(()) } }
The first step here is to keep a copy of the previous at_top_level
flag so that we can revert back to it after writing, some of the arms are going to change it. Next we enter another very large match statement. Our example would take the Expr::Assignment
arm, passing further work off to write_assignment_expr
.
#![allow(unused)] fn main() { pub fn write_assignment_expr(&mut self, assignment: &AssignmentExpr) -> Res { let wrap_self = match &assignment.left { AssignmentLeft::Expr(ref e) => match &**e { Expr::Object(_) | Expr::Array(_) => true, _ => false, }, AssignmentLeft::Pat(ref p) => match p { Pat::Array(_) => true, Pat::Object(_) => true, _ => false, } }; if wrap_self { self.write("(")?; } match &assignment.left { AssignmentLeft::Expr(ref e) => self.write_expr(e)?, AssignmentLeft::Pat(ref p) => self.write_pattern(p)?, } self.write(" ")?; self.write_assignment_operator(&assignment.operator)?; self.write(" ")?; self.write_expr(&assignment.right)?; if wrap_self { self.write(")")?; } Ok(()) } }
Here we first need to determine if the whole assignment expression needs to be wrapped in parentheses, which would only be true if the left hand side was an object or array literal or pattern. Next we test the assignment.left
property since it can be either an Expr
or a Pat
, our example would take us back to the write_expr
method. This would take us back up through write_expr
but this time we would pass into the Expr::Member
arm which passes its work off to write_member_expr
.
#![allow(unused)] fn main() { pub fn write_member_expr(&mut self, member: &MemberExpr) -> Res { match &*member.object { Expr::Assignment(_) | Expr::Literal(Literal::Number(_)) | Expr::Conditional(_) | Expr::Logical(_) | Expr::Function(_) | Expr::ArrowFunction(_) | Expr::Object(_) | Expr::Binary(_) | Expr::Unary(_) | Expr::Update(_) => self.write_wrapped_expr(&member.object)?, _ => self.write_expr(&member.object)?, } if member.computed { self.write("[")?; } else { self.write(".")?; } self.write_expr(&member.property)?; if member.computed { self.write("]")?; } Ok(()) } }
Here we first check to see if the object
property is required to be wrapped in parentheses; for our example it is not, so we just want to pass that along to write_expr
. This time we are going to end up at Expr::This
which just writes out the literal word this
. Next we are going to look at the flag on MemberExpr
"computed" to see if this was written originally with the bracket notation (this['stuff']
) or the dot notation (this.stuff
), writing the appropriate character. Now we are again going to pass some work back to write_expr
, this time with the property
property. This would end on the branch for Expr::Ident
which just writes that value to the destination. If the member expression was computed we would need to write the ]
but for our example it is not.
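The bracket versus dot decision is easy to see in isolation. In this sketch `MemberExpr` is a hypothetical simplification where the object and property are plain strings rather than nested expressions.

```rust
/// Hypothetical, simplified member expression; `object` and `property`
/// are plain strings here rather than nested expressions.
struct MemberExpr {
    object: String,
    property: String,
    computed: bool,
}

fn write_member_expr(member: &MemberExpr) -> String {
    let mut out = String::new();
    out.push_str(&member.object);
    // Bracket notation for computed access, dot notation otherwise.
    out.push_str(if member.computed { "[" } else { "." });
    out.push_str(&member.property);
    if member.computed {
        out.push(']');
    }
    out
}
```

With `computed` set to false this produces `this.stuff`, and with it set to true (and the property holding `'stuff'`) it produces `this['stuff']`.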
At this point we are back up at write_assignment_expr
where we are going to write a single space and then pass the assignment.operator
off to write_assignment_operator
.
#![allow(unused)] fn main() { pub fn write_assignment_operator(&mut self, op: &AssignmentOperator) -> Res { let s = match op { AssignmentOperator::AndEqual => "&=", AssignmentOperator::DivEqual => "/=", AssignmentOperator::Equal => "=", AssignmentOperator::LeftShiftEqual => "<<=", AssignmentOperator::MinusEqual => "-=", AssignmentOperator::ModEqual => "%=", AssignmentOperator::OrEqual => "|=", AssignmentOperator::PlusEqual => "+=", AssignmentOperator::PowerOfEqual => "**=", AssignmentOperator::RightShiftEqual => ">>=", AssignmentOperator::TimesEqual => "*=", AssignmentOperator::UnsignedRightShiftEqual => ">>>=", AssignmentOperator::XOrEqual => "^=", }; self.write(s)?; Ok(()) } }
This is a relatively straightforward process of looking at which operator was provided and then writing out the text that represents that operator. For our example it would be =
, we then need to write a single space. The last step in write_assignment_expr
is to handle the assignment.right
which is also an Expr
so we pass that off to write_expr
. Our example will head to the Expr::Ident
match arm and then just write to the destination. With that we have now reached the last step in write_function_body
which is to write_close_brace
, which is similar to write_open_brace
except that here we are decrementing the current_indent
context property. That also brings us to the end of write_function
, write_decl
, and _write_part
. The last thing we do in write_part
is to add a trailing new line, another style choice.
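Since the indentation bookkeeping is spread across several methods, it may help to see it in one place. The `Indenter` below is a hypothetical miniature written for this illustration; in particular, having `write_close_brace` write its own leading whitespace is an assumption of this sketch, not something we saw in the code above.

```rust
/// Hypothetical miniature of the writer's indentation bookkeeping.
struct Indenter {
    out: String,
    indent: String,
    current_indent: usize,
}

impl Indenter {
    fn new(indent: &str) -> Self {
        Indenter { out: String::new(), indent: indent.to_string(), current_indent: 0 }
    }

    /// Write `indent` once per level of nesting.
    fn write_leading_whitespace(&mut self) {
        let ws = self.indent.repeat(self.current_indent);
        self.out.push_str(&ws);
    }

    /// Write `{` and step one level deeper.
    fn write_open_brace(&mut self) {
        self.out.push('{');
        self.current_indent += 1;
    }

    /// Step back out one level and write `}`.
    /// Assumption of this sketch: the brace writes its own indentation.
    fn write_close_brace(&mut self) {
        self.current_indent = self.current_indent.saturating_sub(1);
        self.write_leading_whitespace();
        self.out.push('}');
    }

    /// Write one indented line followed by a new line.
    fn write_line(&mut self, text: &str) {
        self.write_leading_whitespace();
        self.out.push_str(text);
        self.out.push('\n');
    }
}
```

Opening a brace, writing `this.stuff = stuff;` as a line, and closing the brace with a tab indent leaves the body indented by one tab and the indent level back at zero.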
As our example continues we would then start again at write_part
with the next part. This is going to move through _write_part
the same as before, however when we get to write_decl
we have a new branch to head down. This is the Decl::Variable
arm which passes its work off to write_variable_decls
.
#![allow(unused)] fn main() { pub fn write_variable_decls(&mut self, kind: &VariableKind, decls: &[VariableDecl]) -> Res { self.write_variable_kind(kind)?; let mut after_first = false; for decl in decls { if after_first { self.write(", ")?; } else { after_first = true; } self.write_variable_decl(decl)?; } self.write_empty_stmt()?; self.write_new_line() } }
As you might expect the first thing we want to do is to write the variable kind. We pass off the kind
variable to write_variable_kind
.
#![allow(unused)] fn main() { pub fn write_variable_kind(&mut self, kind: &VariableKind) -> Res { let s = match kind { VariableKind::Const => "const ", VariableKind::Let => "let ", VariableKind::Var => "var ", }; self.write(s) } }
Similar to our examination of write_assignment_operator
we are going to simply look at which keyword was used and then write that out, with a trailing space.
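Put together with the comma flag we met in write_function_args, the whole declaration list can be sketched in a few lines. This version is a simplification invented for illustration: it takes plain names instead of `VariableDecl`s, returns a `String` instead of writing to a destination, and `variable_keyword` is a made-up helper name.

```rust
#[derive(Debug, Clone, Copy)]
enum VariableKind {
    Const,
    Let,
    Var,
}

/// Map the kind to its keyword, including the trailing space.
fn variable_keyword(kind: VariableKind) -> &'static str {
    match kind {
        VariableKind::Const => "const ",
        VariableKind::Let => "let ",
        VariableKind::Var => "var ",
    }
}

/// Simplified: declarations are just names, with no initializers.
fn write_variable_decls(kind: VariableKind, names: &[&str]) -> String {
    let mut out = String::from(variable_keyword(kind));
    let mut after_first = false;
    for name in names {
        // Write a separator before every name except the first.
        if after_first {
            out.push_str(", ");
        } else {
            after_first = true;
        }
        out.push_str(name);
    }
    out.push(';');
    out
}
```

Keeping the trailing space inside the keyword string means the caller never has to remember to add it.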
Next we need to keep track of a single flag, after_first
which should be familiar from write_function_args
. In our loop, we pass off each of the declarations to write_variable_decl
.
#![allow(unused)] fn main() { pub fn write_variable_decl(&mut self, decl: &VariableDecl) -> Res { self.write_pattern(&decl.id)?; if let Some(ref init) = decl.init { self.write(" = ")?; self.write_expr(init)?; } Ok(()) } }
Here we first write out the id of this variable by passing it off to write_pattern
. Thankfully our example is pretty simple so we are again going to take that first branch for Pat::Identifier
and write the identifier to our destination. After that we want to check if this variable is initialized, ours is, and if so we would write the " = " and then write the expression by passing that off to write_expr
. For this pass through write_expr
we are going to travel down the Expr::New
arm which passes its work off to write_new_expr
.
#![allow(unused)] fn main() { pub fn write_new_expr(&mut self, new: &NewExpr) -> Res { self.write("new ")?; match &*new.callee { Expr::Assignment(_) | Expr::Call(_) => self.write_wrapped_expr(&new.callee)?, _ => self.write_expr(&new.callee)?, } self.write_sequence_expr(&new.arguments)?; Ok(()) } }
At this point we want to first write the new
keyword followed by a space. Next we want to write out what the new.callee
is which would again bring us to write_expr
. Our example would travel to the Expr::Ident
arm which just writes that out. Next we need to write an open parenthesis followed by the provided arguments. This time we are going to use the write_sequence_expr
method to do that.
#![allow(unused)] fn main() { pub fn write_sequence_expr(&mut self, sequence: &[Expr]) -> Res { let mut after_first = false; self.write("(")?; for ref e in sequence { if after_first { self.write(", ")?; } self.write_expr(e)?; after_first = true; } self.write(")")?; Ok(()) } }
At this point the structure of this function's body should look familiar, we are going to loop over the provided expressions and write them out with a comma and space before all but the first one. For our example we are only going to hit this once so no comma, then we are going to pass that off to write_expr
. This time as we pass through the match in write_expr
we are going to hit the Expr::Literal
arm which passes its work off to write_literal
.
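To see the separator logic in isolation, here is a minimal, self-contained sketch that writes to any impl Write. The names write_sequence and render are made up for this illustration, and plain string slices stand in for Expr values; it is not the resw implementation.

```rust
use std::io::{self, Write};

/// Writes `items` wrapped in parentheses and separated by ", ".
/// `write_sequence` is a hypothetical stand-in, not the resw method.
fn write_sequence<W: Write>(out: &mut W, items: &[&str]) -> io::Result<()> {
    // Stays false until the first item has been written.
    let mut after_first = false;
    out.write_all(b"(")?;
    for item in items {
        if after_first {
            // Every item except the first is preceded by a separator.
            out.write_all(b", ")?;
        }
        out.write_all(item.as_bytes())?;
        after_first = true;
    }
    out.write_all(b")")
}

/// Helper for the demo: collects the output into a String.
fn render(items: &[&str]) -> String {
    let mut buf = Vec::new();
    write_sequence(&mut buf, items).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("only UTF-8 was written")
}

fn main() {
    println!("{}", render(&["'argument'"]));
    println!("{}", render(&["a", "b", "c"]));
}
```

With a single argument, as in our example, the flag is never true when it is checked, so no comma is emitted.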
pub fn write_literal(&mut self, lit: &Literal) -> Res {
    match lit {
        Literal::Boolean(b) => self.write_bool(*b),
        Literal::Null => self.write("null"),
        Literal::Number(n) => self.write(&n),
        Literal::String(s) => self.write_string(s),
        Literal::RegEx(r) => self.write_regex(r),
        Literal::Template(t) => self.write_template(t),
    }
}
Here we see another match statement. Our example takes us down the Literal::String arm, which passes its work off to write_string. You may be wondering why that is, since writing strings is all we have really been doing. The answer is that this is one of the few style preferences that is currently configurable, as you'll see.
pub fn write_string(&mut self, s: &str) -> Res {
    if let Some(c) = self.quote {
        self.re_write_string(s, c)?;
    } else {
        self.write(s)?;
    }
    Ok(())
}
We first check whether the self.quote property has been set, which would indicate that the user has a quote preference. If it is set, we re-write the string to use this quote. That involves un-escaping any internal escaped occurrences of the old quote and escaping any occurrences of the new quote in the contents. If that property is None, we just write the string out as-is, since the ressa::node::Literal::String preserves the original quotation mark.
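The body of re_write_string isn't shown here, so the following is only a sketch of the idea under a few assumptions: the incoming string still carries its original surrounding quotes, the requested quote is either ' or ", and the function name requote is invented for this example.

```rust
/// Re-quotes a JavaScript string literal (including its surrounding quotes)
/// with `new_quote`. A hypothetical helper, not the resw implementation.
fn requote(raw: &str, new_quote: char) -> String {
    let mut chars = raw.chars();
    // Peel off the surrounding quotes; anything unexpected is returned as-is.
    let (old_quote, inner) = match (chars.next(), chars.next_back()) {
        (Some(first), Some(last)) if first == last && (first == '\'' || first == '"') => {
            (first, chars.as_str())
        }
        _ => return raw.to_string(),
    };
    if old_quote == new_quote {
        return raw.to_string();
    }
    let mut out = String::with_capacity(raw.len() + 2);
    out.push(new_quote);
    let mut rest = inner.chars();
    while let Some(c) = rest.next() {
        if c == '\\' {
            match rest.next() {
                // The old quote no longer needs its escape.
                Some(next) if next == old_quote => out.push(next),
                // Any other escape sequence is kept untouched.
                Some(next) => {
                    out.push('\\');
                    out.push(next);
                }
                None => out.push('\\'),
            }
        } else if c == new_quote {
            // The new quote must now be escaped inside the contents.
            out.push('\\');
            out.push(c);
        } else {
            out.push(c);
        }
    }
    out.push(new_quote);
    out
}

fn main() {
    println!("{}", requote(r#"'it\'s "x"'"#, '"'));
    println!("{}", requote("'argument'", '"'));
}
```

Handling the escape character first matters: it keeps sequences such as an escaped backslash intact instead of misreading the character that follows it.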
After that we return to write_sequence_expr, which writes the closing parenthesis, and then pass back through write_new_expr to the bottom of write_variable_decl. When we move up again to write_variable_decls we write a semicolon and new line to close that out. This brings us to the bottom of write_decl, _write_part, and write_part; it also brings us to the end of our example JavaScript. While we didn't touch every part of how resw works (there is a lot of surface area to cover), hopefully this has provided enough information for you to feel confident in how it works. For more information you can check out the ressa docs and the resw docs.
Up next we are going to see how you would use resw to complete our debug log helper.
$web-only-end$
$slides-only$
- Writer takes ProgramParts
- Somewhat Configurable
- Writes to impl Write
$slides-only-end$
Building a Writer
$web-only$
Thankfully, because resw exists, completing the console.log debugging tool is going to be trivial. The primary entry point for resw is the Writer struct, which has a method write_part that takes &mut self and a &ProgramPart, so we can use that in our for loop to write out the parts as they are parsed. That might look like this.
};

fn main() {
    let mut args = ::std::env::args();
    let _ = args.next();
    let file_name = args
        .next()
        .unwrap_or(String::from("./examples/insert_logging.js"));
    let js = read_to_string(file_name).expect("Unable to find js file");
With that complete we can see how well it works for us. Let's use the following example JavaScript.
function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
Just as a simple test, we could enter the following into our terminal.
$ echo "function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
" | console_logify
function Thing(stuff) {
console.log('Thing', stuff);
this.stuff = stuff;
}
let x = new Thing('argument');
That looks exactly like the output we were looking for. Let's double check that it will behave as expected by piping the output to node
$ echo "function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
" | console_logify | node -
Thing argument
It worked!
$web-only-end$ $slides-only$
Demo
$slides-only-end$
Conclusion
$web-only$ Hopefully you now have all you need to get started building your JavaScript development tools using Rust. If you do create one, please open an issue on this project's GitHub issues page with the project's name, a short description, and a link, and it will be added to the appendix.
If you run into any problems in any crates (including typos in this book) it would be wonderful of you to open an issue on GitHub.
If you want to get involved, there are probably a few open issues that could use some help. Each project does provide contributing guidelines.
$web-only-end$ $slides-only$
- Annotated version of this presentation
- https://FreeMasen.github.io/rusty-ecma-book
- Where to find me
- email: r.f.masen@gmail.com
- website: https://WiredForge.com
- twitter/github: @FreeMasen $slides-only-end$
Appendix
Tokens
Here is a list of all of the possible tokens ress
provides
Token
EoF
Boolean
-enum BooleanLiteral
True
False
Ident
-struct Ident(String)
Keyword
-enum Keyword
Await
Break
Case
Catch
Class
Const
Continue
Debugger
Default
Delete
Do
Else
Enum
Export
Finally
For
Function
If
Implements
Import
In
InstanceOf
Interface
Let
New
Package
Private
Protected
Public
Return
Static
Super
Switch
This
Throw
Try
TypeOf
Var
Void
While
With
Yield
Null
Numeric
-struct Number(String)
0
.0
0.0
0.0e1
0.0E1
.0e1
.0E1
0xfff
0Xfff
0o777
0O777
0b111
0B111
Punct
-enum Punct
And
-&
Assign
-=
Asterisk
-*
BitwiseNot
-~
Caret
-^
CloseBrace
-}
CloseBracket
-]
CloseParen
-)
Colon
-:
Comma
-,
ForwardSlash
-/
GreaterThan
->
LessThan
-<
Minus
--
Modulo
-%
Not
-!
OpenBrace
-{
OpenBracket
-[
OpenParen
-(
Period
-.
Pipe
-|
Plus
-+
QuestionMark
-?
SemiColon
-;
Spread
-...
UnsignedRightShiftAssign
->>>=
StrictEquals
-===
StrictNotEquals
-!==
UnsignedRightShift
->>>
LeftShiftAssign
-<<=
RightShiftAssign
->>=
ExponentAssign
-**=
LogicalAnd
-&&
LogicalOr
-||
Equal
-==
NotEqual
-!=
AddAssign
-+=
SubtractAssign
--=
MultiplyAssign
-*=
DivideAssign
-/=
Increment
-++
Decrement
---
LeftShift
-<<
RightShift
->>
BitwiseAndAssign
-&=
BitwiseOrAssign
-|=
BitwiseXOrAssign
-^=
ModuloAssign
-%=
FatArrow
-=>
GreaterThanEqual
->=
LessThanEqual
-<=
Exponent
-**
String
-enum StringLit
Single(String)
Double(String)
Regex
-struct Regex
body
-String
flags
-Option<String>
Template
-enum Template
NoSub(String)
Head(String)
Middle(String)
Tail(String)
Comment
-struct Comment
kind
-enum Kind
Single
-//comment
Multi
-/* comment */
Html
-<!-- comment --> trailing content
content
-String
tail_content
-Option<String>
banned_tokens.toml
idents = [
"Int8Array",
"Uint8Array",
"Uint8ClampedArray",
"Int16Array",
"Uint16Array",
"Int32Array",
"Uint32Array",
"Float32Array",
"Float64Array",
"Promise",
"Proxy",
"async",
"padStart",
"padEnd",
"includes",
"find",
"getComputedStyle",
"FontFace",
"FontFaceSet",
"FontFaceSetLoadEvent",
"MediaSource",
"sourceBuffers",
"activeSourceBuffers",
"readyState",
"duration",
"onsourceclose",
"onsourceended",
"addSourceBuffer",
"removeSourceBuffer",
"endOfStream",
"setLiveSeekableRange",
"clearLiveSeekableRange",
"isTypeSupported",
"TouchEvent",
"Touch",
"TouchList",
"onpointerover",
"onpointerenter",
"onpointerdown",
"onpointermove",
"onpointerup",
"onpointercancel",
"onpointerout",
"onpointerleave",
"ongotpointercapture",
"onlostpointercapture",
"setPointerCapture",
"releasePointerCapture",
"MutationObserver",
]
keywords = [
"let",
"const",
"class",
"await",
"import",
"export",
"yield",
]
puncts = [
"=>",
"**",
"...",
"`",
]
strings = [
"use strict",
"sourceopen",
"touchstart",
"touchend",
"touchmove",
"touchcancel",
"pointerenter",
"pointerdown",
"pointermove",
"pointerup",
"pointercancel",
"pointerout",
"pointerleave",
"gotpointercapture",
"lostpointercapture",
"pointerover",
]
AST
While it may be a bit of a cop-out, it seems silly to duplicate the AST docs provided by cargo-doc. In the future this page may include some more introspective information but for now please refer to the link below.
StringWriter
When building resw it became clear that the only way to validate the output would be to write a bunch of files to disk and then read them back, which didn't seem like the correct option. Because of this, resw includes a public module called write_str. In it you will find two structs, WriteString and ChildWriter. The basic idea is that you can write the values to a buffer that the resw::Writer hasn't taken ownership of and then read them back after the Writer is done. Below is an example of how you might use that.
fn test_round_trip() {
    let original = "let x = 0";
    let dest = WriteString::new();
    let parser = ressa::Parser::new(original).expect("Failed to create parser");
    // write_part takes &mut self, so the writer must be mutable
    let mut writer = resw::Writer::new(dest.generate_child());
    for part in parser {
        let part = part.expect("failed to parse part");
        // write_part takes a &ProgramPart
        writer.write_part(&part).expect("failed to write part");
    }
    assert_eq!(dest.get_string_lossy(), original.to_string());
}
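To make the ownership idea concrete, here is a standalone sketch of how a buffer can be shared between a caller and a writer that takes ownership of its destination. It is not the resw source: SharedBuffer, BufferHandle, and OwningWriter are hypothetical stand-ins for WriteString, ChildWriter, and resw::Writer, and the use of Rc<RefCell<...>> is an assumption about one way to get this behavior.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Stays with the caller and can read the bytes back later.
struct SharedBuffer {
    inner: Rc<RefCell<Vec<u8>>>,
}

/// A handle onto the same bytes that can be given away to a writer.
struct BufferHandle {
    inner: Rc<RefCell<Vec<u8>>>,
}

impl SharedBuffer {
    fn new() -> Self {
        SharedBuffer { inner: Rc::new(RefCell::new(Vec::new())) }
    }

    /// Creates a handle that appends to this buffer.
    fn handle(&self) -> BufferHandle {
        BufferHandle { inner: Rc::clone(&self.inner) }
    }

    /// Reads back everything written so far.
    fn contents_lossy(&self) -> String {
        let bytes = self.inner.borrow();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

impl Write for BufferHandle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Stand-in for any type that takes ownership of its destination.
struct OwningWriter<W: Write> {
    out: W,
}

impl<W: Write> OwningWriter<W> {
    fn new(out: W) -> Self {
        OwningWriter { out }
    }

    fn write_str(&mut self, s: &str) -> io::Result<()> {
        self.out.write_all(s.as_bytes())
    }
}

fn demo() -> String {
    let dest = SharedBuffer::new();
    // The writer owns the handle, but `dest` is still ours to inspect.
    let mut writer = OwningWriter::new(dest.handle());
    writer.write_str("let x = 0").expect("failed to write");
    writer.write_str(";").expect("failed to write");
    dest.contents_lossy()
}

fn main() {
    println!("{}", demo());
}
```

Because the caller keeps the original buffer, a test can compare its contents against the expected text without touching the file system.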
Projects
name | description | website |
---|---|---|
console_logger | A utility that inserts a console.log call at the top of each of your function bodies | repo |
lint-ie8 | A utility that checks for any JavaScript that would fail when executed by Internet Explorer 8 | repo |
RESS
Scanners
In the initial implementation of the ress scanner, it was more important to get something working correctly than to have something blazing fast. To that end, the original Scanner performs a significant amount of memory allocation, which slows everything down quite a bit. To improve upon that, ress offers a second option, the RefScanner, which is a bit unfortunately named as it doesn't actually use any references. The RefScanner provides almost the same information as the Scanner, but it does so without making any copies from the original JavaScript string. It then gives you the option to request the String for any Item, leaving that control with the user. Here is an example of the two approaches.
Example JS
function things() {
return [1,2,3,4];
}
Example Rust
use ress::Scanner;

fn main() {
    let js = include_str!("../example.js");
    let scanner = Scanner::new(js);
    for (i, item) in scanner.enumerate() {
        let item = item.unwrap();
        let prefix = if i < 10 {
            format!(" {}", i)
        } else {
            format!("{}", i)
        };
        println!("{} token: {:?}", prefix, item.token);
    }
}

#[cfg(test)]
mod test {
    use ress::*;

    #[test]
    fn chapter_1_1() {
        let js = "var i = 0;";
        let scanner = Scanner::new(js);
        for token in scanner {
            println!("{:#?}", token.unwrap());
        }
    }

    use ressa::Parser;

    #[test]
    fn ressa_ex1() {
        static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";
        let parser = Parser::new(JS).expect("Failed to create parser");
        for part in parser {
            let part = part.expect("Failed to parse part");
            println!("{:#?}", part);
        }
    }
}
Output
running 1 test
Decl(
Func(
Func {
id: Some(
Ident {
name: "Thing",
},
),
params: [
Pat(
Ident(
Ident {
name: "stuff",
},
),
),
],
body: FuncBody(
[
Stmt(
Expr(
Assign(
AssignExpr {
operator: Equal,
left: Expr(
Member(
MemberExpr {
object: This,
property: Ident(
Ident {
name: "stuff",
},
),
computed: false,
},
),
),
right: Ident(
Ident {
name: "stuff",
},
),
},
),
),
),
],
),
generator: false,
is_async: false,
},
),
)
test test::ressa_ex1 ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out
Let's look at token 7. The original token is Token::Numeric(Number(String::from("1"))), while the ref token is Token::Numeric(Number::Dec). Both give similar information, but the ref token doesn't allocate a new string for the text being represented; it just informs the user that it is a decimal number. If you wanted to know what that string was, you could use the RefScanner::string_for method by passing it RefItem.span. This returns an Option<String>, and so long as your span doesn't run past the end of the JavaScript provided, it will hold the value you are looking for.
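The span-based lookup can be sketched without ress at all. In the example below, Span, NumberKind, number_kind, and the free function string_for are hypothetical stand-ins written for this illustration. Only the Dec variant is named in the text above, so the other variants are assumptions based on the numeric prefixes listed in the Tokens appendix.

```rust
/// Byte offsets into the original JavaScript text.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// The kind of number a token represents, without storing its text.
/// Hex, Oct and Bin are assumed variants for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum NumberKind {
    Dec,
    Hex,
    Oct,
    Bin,
}

/// Classifies a numeric literal by its prefix.
fn number_kind(text: &str) -> NumberKind {
    let prefix = text.get(..2).map(|p| p.to_ascii_lowercase());
    match prefix.as_deref() {
        Some("0x") => NumberKind::Hex,
        Some("0o") => NumberKind::Oct,
        Some("0b") => NumberKind::Bin,
        _ => NumberKind::Dec,
    }
}

/// Allocates a String only when asked, returning None if the span
/// does not fit inside `js`.
fn string_for(js: &str, span: Span) -> Option<String> {
    js.get(span.start..span.end).map(str::to_string)
}

fn main() {
    let js = "return [1,2,3,4];";
    // The first array element sits at byte offsets 8..9.
    let span = Span { start: 8, end: 9 };
    println!("{:?}", string_for(js, span));
    println!("{:?}", number_kind("0xfff"));
}
```

Using str::get rather than indexing is what lets an out-of-range span produce None instead of a panic.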