An Overview
To get started building development tools using the Rust programming language, we are going to use two crates. The first is ress, or Rusty ECMAScript Scanner, which converts JavaScript text into a series of Tokens. The second is ressa, or Rusty ECMAScript Syntax Analyzer, which takes that series of Tokens and builds an Abstract Syntax Tree (AST). The AST types are provided by resast. Either of these crates is useful for building development tools; however, since the output of ress is essentially flat, it can only support a much simpler kind of tool. Over the course of this book we will cover the basics of how to build a development tool with either of these crates.
- What is RESS
  - Overview
  - Demo Project
- What is RESSA
  - Overview
  - Demo Project
- What is RESW (maybe)
  - Overview
RESS
- impl Iterator for Scanner
- Converts text into Tokens
- Flat Structure
Before we start on any examples, let's dig a little into what ress does. The job of a scanner or tokenizer in the parsing process is to convert raw text or bytes into logically separated parts called tokens, and ress does just that. It reads your JavaScript text and then tells you what a given word or symbol might represent. It does this through the Scanner interface. To construct a scanner, you pass it the text you would like it to tokenize.
let js = "var i = 0;";
let scanner = Scanner::new(js);
Now that you have prepared a scanner, how do we use it? Well, Scanner implements Iterator, so we can use it in a for loop like so.
for token in scanner {
    println!("{:#?}", token);
}
If we were to run the above program it would print to the terminal the following.
Item {
token: Keyword(
Var
),
span: Span {
start: 0,
end: 3
}
}
Item {
token: Ident(
Ident(
"i"
)
),
span: Span {
start: 4,
end: 5
}
}
Item {
token: Punct(
Assign
),
span: Span {
start: 6,
end: 7
}
}
Item {
token: Numeric(
Number(
"0"
)
),
span: Span {
start: 8,
end: 9
}
}
Item {
token: Punct(
SemiColon
),
span: Span {
start: 9,
end: 10
}
}
Item {
token: EoF,
span: Span {
start: 10,
end: 10
}
}
The scanner's next() method returns an Item, which has two properties: token and span. The span holds the byte indices at which the token starts and ends. The token property is one variant of the Token enum, which has the following variants.
- Token::Boolean(BooleanLiteral) - The text true or false
- Token::Ident(Ident) - A variable, function, or class name
- Token::Null - The text null
- Token::Keyword(Keyword) - One of the 42 reserved words e.g. function, var, delete, etc.
- Token::Numeric(Number) - A number literal; this can be an integer, a float, scientific notation, binary notation, octal notation, or hexadecimal notation e.g. 1.5e9, 0xfff, etc.
- Token::Punct(Punct) - One of the 52 reserved symbols or combinations of symbols e.g. *, &&, =>, etc.
- Token::String(StringLit) - Either a double or single quoted string
- Token::RegEx(RegEx) - A Regular Expression literal e.g. /.+/g
- Token::Template(Template) - A template string literal e.g. one ${2} three
- Token::Comment(Comment) - A single line, multi-line or html comment
For a more in-depth look at these tokens, take a look at the Appendix.
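To make the token/span pairing concrete, here is a minimal, std-only sketch of a flat scanner. It is not how ress is implemented: ToyToken, ToyItem, and toy_scan are hypothetical names, and the sketch only understands words, runs of digits, and single-character punctuation.

```rust
/// A hypothetical, much-simplified stand-in for ress's `Token`.
#[derive(Debug, PartialEq)]
enum ToyToken {
    Word(String),
    Number(String),
    Punct(char),
}

/// A token plus the byte range it was read from.
#[derive(Debug, PartialEq)]
struct ToyItem {
    token: ToyToken,
    start: usize,
    end: usize,
}

fn toy_scan(text: &str) -> Vec<ToyItem> {
    let mut items = Vec::new();
    let mut chars = text.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
            continue;
        }
        // Decide which kind of token starts at this character.
        let is_word = c.is_alphabetic() || c == '_' || c == '$';
        let is_num = c.is_ascii_digit();
        if !is_word && !is_num {
            chars.next();
            let end = start + c.len_utf8();
            items.push(ToyItem { token: ToyToken::Punct(c), start, end });
            continue;
        }
        // Consume the rest of the word or number.
        let mut end = start;
        while let Some(&(i, ch)) = chars.peek() {
            let keep = if is_word {
                ch.is_alphanumeric() || ch == '_' || ch == '$'
            } else {
                ch.is_ascii_digit()
            };
            if !keep {
                break;
            }
            end = i + ch.len_utf8();
            chars.next();
        }
        let raw = text[start..end].to_string();
        let token = if is_word { ToyToken::Word(raw) } else { ToyToken::Number(raw) };
        items.push(ToyItem { token, start, end });
    }
    items
}

fn main() {
    for item in toy_scan("var i = 0;") {
        println!("{:?}", item);
    }
}
```

Like the real scanner, the result is a flat list: nothing in it says that the tokens together form a variable declaration.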
Overall, the output of our scanner isn't going to provide any context for these tokens. That means when we are building our development tools it is going to be a little harder to figure out what is going on with any given token. One way to approach that is to build a tool that is only concerned with token-level information.
Say you work on a team of JavaScript developers that needs to adhere to a strict code style because the organization needs its website to be usable in Internet Explorer 8. With that restriction there are a large number of APIs that are off the table; looking over this list we can see how big that really is. It could be useful to have a linter that checks for the keywords and identifiers that are not available in IE8. Let's try and build one.
Building an IE8 Linter
To get started we need to add ress to our dependencies. This project is also going to need serde, serde_derive and toml because it will rely on a .toml file to make the list of unavailable tokens configurable, and atty so that we can tell whether anything was piped to stdin.
[package]
name = "lint-ie8"
version = "0.1.0"
authors = ["Robert Masen <r@robertmasen.pizza>"]
edition = "2018"
[dependencies]
ress = "0.6"
serde = "1"
serde_derive = "1"
toml = "0.5"
atty = "0.2"
Next we want to use the Scanner and Token from ress.
#[macro_use]
extern crate serde_derive;
use ress::{
    Scanner,
    Token,
};
Since we are using a .toml file to provide the list of banned tokens, let's create a struct that will represent our configuration.
#[derive(Deserialize)]
struct BannedTokens {
    idents: Vec<String>,
    keywords: Vec<String>,
    puncts: Vec<String>,
    strings: Vec<String>,
}
The toml file we are going to use is pretty big, but if you want to see what it looks like you can check it out here. Essentially it is a list of identifiers, strings, punctuation, and keywords that would cause an error when trying to run in IE8.
To start we need to deserialize that file. We can do that with the std::fs::read_to_string and toml::from_str functions.
let config_text = ::std::fs::read_to_string("banned_tokens.toml").expect("failed to read config");
let banned: BannedTokens = toml::from_str(&config_text).expect("Failed to deserialize banned tokens");
Now that we have a list of tokens that should not be included in our JavaScript, let's get that text. It would be useful to be able to take a path argument or read the raw JS from stdin. The function will check for an argument first and fall back to reading from stdin. It looks something like this.
use std::{
    env::args,
    fs::read_to_string,
    io::Read,
};

fn get_js() -> Result<String, ::std::io::Error> {
    let mut cmd_args = args();
    let _ = cmd_args.next(); //discard bin name
    let js = if let Some(file_name) = cmd_args.next() {
        let js = read_to_string(file_name)?;
        js
    } else {
        let mut std_in = ::std::io::stdin();
        let mut ret = String::new();
        // stdin is a terminal, so nothing was piped to us
        if atty::is(atty::Stream::Stdin) {
            return Ok(ret)
        }
        std_in.read_to_string(&mut ret)?;
        ret
    };
    Ok(js)
}
We will call it like this.
let js = match get_js() {
    Ok(js) => if js.len() == 0 {
        print_usage();
        std::process::exit(1);
    } else {
        js
    },
    Err(_) => {
        print_usage();
        std::process::exit(1);
    }
};
We want to handle the failure when attempting to get the JS, so we will match on the call to get_js. If everything went well we need to check if the text is an empty string; this means no argument was provided and no text was piped to the program. In either of these failure cases we want to print a nice message about how the command should have been written and then exit with a non-zero status code. print_usage is a pretty simple function that will just print to stdout the two ways to use the program.
fn print_usage() {
    println!("banned_tokens <infile>\ncat <path/to/file> | banned_tokens");
}
With that out of the way, we can now get into how we are going to solve the actual problem of finding these tokens in a JavaScript file. There are many ways to make this work, but for this example we are going to wrap the Scanner in another struct that implements Iterator. First, here is what that struct is going to look like.
struct BannedFinder {
    scanner: Scanner,
    banned: BannedTokens,
}
Before we get into the impl Iterator we should go over an Error implementation that we are going to use. It is relatively straightforward: the actual struct is going to be a tuple struct with three items. The first item is a message that will include the token and type; the second and third are the row and column of the banned token. We need to implement Display (Error requires it), which will just create a nice error message for us.
#[derive(Debug)]
pub struct BannedError(String, usize, usize);

impl ::std::error::Error for BannedError {

}

impl ::std::fmt::Display for BannedError {
    fn fmt(&self, f: &mut ::std::fmt::Formatter) -> ::std::fmt::Result {
        write!(f, "Banned {} found at {}:{}", self.0, self.1, self.2)
    }
}
The last thing we need to do is create a way to map from a byte index to a row/column pair. Thankfully the Scanner exposes the original text as a property, stream, so we can use that to figure out what line and column any index refers to. The first thing we need is the ability to tell when any given character is a new line character. JavaScript allows for 5 new line sequences (\r, \n, \r\n, \u{2028}, and \u{2029}), so a function that would test for that might look like this.
fn is_js_new_line(c: char) -> bool {
    c == '\n'
    || c == '\u{2028}'
    || c == '\u{2029}'
}
Notice that we aren't testing for \r. This could come back to bite us, but for this example the \n should be enough to catch \r\n, and for simplicity's sake we can just say that your team does not support the \r new line. Now we can add a method to BannedFinder that will take an index and return the row/column pair.
impl BannedFinder {
    fn get_position(&self, idx: usize) -> (usize, usize) {
        let (row, line_start) = self.scanner.stream[..idx]
            .char_indices()
            .fold((1, 0), |(row, line_start), (i, c)| if is_js_new_line(c) {
                (row + 1, i)
            } else {
                (row, line_start)
            });
        let col = if line_start == 0 {
            idx
        } else {
            idx.saturating_sub(line_start)
        };
        (row, col)
    }
}
We need to capture two pieces of information: the first is what row we are on, the second is the index at which that row started. We can get both by using the char_indices method on &str, which gives us an Iterator over tuples of the indices and chars in the string. We then fold that iterator into a single value; the row starts at 1 and the index starts at 0. If the current character is a new line we add one to the row and replace any previous index value, otherwise we move on. We only count the new lines from the start until the provided index, which makes sure we don't count any extra new lines. Now that we have the row number we need to calculate the column. If line_start is 0 that means we didn't find any new lines, so we can assume it is the first line and the index is already the column; otherwise we need to subtract line_start from the index.
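One thing worth noticing is that the method above reports the raw index on the first row, but the distance from the new line character on later rows. If a consistent, 1-based column is preferred, a free-function variant might look like the sketch below. The name position is hypothetical, the column is counted in bytes, and the same simplified new line test is assumed.

```rust
/// The same simplified new line test used in the text (a lone `\r` is ignored).
fn is_js_new_line(c: char) -> bool {
    c == '\n' || c == '\u{2028}' || c == '\u{2029}'
}

/// Hypothetical variant of `get_position`: maps a byte index into `text`
/// to a 1-based (row, column) pair, with the column counted in bytes.
fn position(text: &str, idx: usize) -> (usize, usize) {
    let mut row = 1;
    // Byte index of the first character on the current row.
    let mut line_start = 0;
    for (i, c) in text[..idx].char_indices() {
        if is_js_new_line(c) {
            row += 1;
            // The row starts just after the new line character, which may
            // be more than one byte wide (e.g. \u{2028}).
            line_start = i + c.len_utf8();
        }
    }
    (row, idx - line_start + 1)
}

fn main() {
    let js = "var a;\nlet b;";
    println!("{:?}", position(js, 4)); // the `a` on the first row
    println!("{:?}", position(js, 11)); // the `b` on the second row
}
```

Tracking the byte after the new line, rather than the new line itself, removes the need for the special case on the first row.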
Ok, now for the exciting part; we are going to impl Iterator for BannedFinder, which will look like this.
impl Iterator for BannedFinder {
    type Item = Result<(), BannedError>;
    fn next(&mut self) -> Option<Self::Item> {
        if let Some(item) = self.scanner.next() {
            Some(match &item.token {
                Token::Ident(ref id) => {
                    let id = id.to_string();
                    if self.banned.idents.contains(&id) {
                        let (row, column) = self.get_position(item.span.start);
                        Err(BannedError(format!("identifier {}", id), row, column))
                    } else {
                        Ok(())
                    }
                },
                Token::Keyword(ref key) => {
                    if self.banned.keywords.contains(&key.to_string()) {
                        let (row, column) = self.get_position(item.span.start);
                        Err(BannedError(format!("keyword {}", key.to_string()), row, column))
                    } else {
                        Ok(())
                    }
                },
                Token::Punct(ref punct) => {
                    if self.banned.puncts.contains(&punct.to_string()) {
                        let (row, column) = self.get_position(item.span.start);
                        Err(BannedError(format!("punct {}", punct.to_string()), row, column))
                    } else {
                        Ok(())
                    }
                },
                Token::String(ref lit) => {
                    if self.banned.strings.contains(&lit.no_quote()) {
                        let (row, column) = self.get_position(item.span.start);
                        Err(BannedError(format!("string {}", lit.to_string()), row, column))
                    } else {
                        Ok(())
                    }
                },
                _ => Ok(()),
            })
        } else {
            None
        }
    }
}
First we need to define what the Item for our Iterator is. It is going to be a Result<(), BannedError>, which allows the caller to check if an item passed inspection. Now we can add the fn next(&mut self) -> Option<Self::Item> definition. Inside that we first want to make sure that the Scanner isn't returning None; if it is, we can just return None. If the scanner returns an Item we want to check what kind of token it is, which we can do by matching on &item.token. We only care if the token is a Keyword, Ident, Punct or String; otherwise we can say that the token passed. For each of these tokens we check if the actual text is included in the matching Vec<String> property of self.banned. If it is included we return a BannedError where the first property is a message containing the name of the token type and the text that token represents, followed by the row and column from get_position.
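The wrapping technique itself does not depend on ress. As a minimal std-only sketch of the same shape, the example below wraps a whitespace splitter instead of a Scanner; WordFinder, BannedWord, and check are hypothetical names used only for illustration.

```rust
use std::fmt;

/// The banned word and its 1-based position in the input.
#[derive(Debug, PartialEq)]
struct BannedWord(String, usize);

impl fmt::Display for BannedWord {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Banned word {} found at position {}", self.0, self.1)
    }
}

impl std::error::Error for BannedWord {}

/// Wraps an inner iterator and checks each item against a ban list.
struct WordFinder<'a> {
    words: std::str::SplitWhitespace<'a>,
    banned: Vec<String>,
    seen: usize,
}

impl<'a> Iterator for WordFinder<'a> {
    type Item = Result<(), BannedWord>;
    fn next(&mut self) -> Option<Self::Item> {
        // `?` on an Option returns None once the inner iterator is finished.
        let word = self.words.next()?;
        self.seen += 1;
        if self.banned.iter().any(|b| b.as_str() == word) {
            Some(Err(BannedWord(word.to_string(), self.seen)))
        } else {
            Some(Ok(()))
        }
    }
}

fn check(text: &str, banned: &[&str]) -> Vec<Result<(), BannedWord>> {
    WordFinder {
        words: text.split_whitespace(),
        banned: banned.iter().map(|b| b.to_string()).collect(),
        seen: 0,
    }
    .collect()
}

fn main() {
    for result in check("let x = new Promise ( )", &["let", "Promise"]) {
        if let Err(e) = result {
            println!("{}", e);
        }
    }
}
```

Yielding a Result for every item, rather than only the failures, leaves it up to the caller to decide whether to stop at the first error or report them all.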
Now that we have all of the underlying infrastructure set up, let's use the BannedFinder in our main.
for item in finder {
    match item {
        Ok(_) => (),
        Err(msg) => println!("{}", msg),
    }
}
That is pretty much it. If you wanted to see the full project you can find it in the lint-ie8 folder of this book's github repository.
Demo
RESSA
- impl Iterator for Parser
- Converts stream of Tokens into AST
- Significantly more context
Before we get into how to use ressa, it is a good idea to briefly touch on the scope of a parser or syntax analyzer. The biggest thing to understand is that we still are not dealing with the semantic meaning of the program. That means ressa itself won't discover things like assigning to undeclared variables or attempting to call undefined functions because that would require more context. To that end, ressa's true value isn't realized until it is embedded into another program that provides that context.
With that said, ressa provides a larger context than ress does. It achieves that by wrapping the Scanner in a struct called Parser. Essentially, Parser provides a way to keep track of what any given set of Tokens might mean. Parser also implements Iterator over the enum ProgramPart, which has 3 cases representing the 3 different top level JavaScript constructs.
- Decl - a variable/function/class declaration
  - Variable - A top level variable declaration e.g. let x = 0;
  - Class - A named class definition at the top level
  - Function - A named function definition at the top level
  - Import - An ES Module import statement
  - Export - An ES Module export statement
- Dir - A script directive, pretty much just 'use strict'
- Stmt - A catch all for all other statements
  - Block - A collection of statements wrapped in curly braces
  - Break - A break statement will exit a loop or labeled statement early
  - Continue - A continue statement will short circuit a loop
  - Debugger - the literal text debugger
  - DoWhile - A do loop executes the body before testing whether to continue
  - Empty - A single semicolon
  - Expr - A catch-all for everything else
  - For - A c-style for loop e.g. for (var i = 0; i < 100; i++) ;
  - ForIn - A for loop that assigns the key of an enumerable at the top of each iteration
  - ForOf - A for loop that assigns the value of an iterable at the top of each iteration
  - If - A set of if/else if/else statements
  - Labeled - A statement that has been named by an attached identifier
  - Return - The return statement that resolves a function's value
  - Switch - A test Expression and a collection of CaseStatements
  - Throw - The throw keyword followed by an Expression
  - Try - A try/catch/finally block for catching Thrown items
  - Var - A non-top level variable declaration
  - While - A loop which continues based on a test Expression
  - With - An antiquated statement that changes the order of identifier resolution
Stmt is the real work-horse of the group: while a top level function definition would be a Decl, a non-top level function definition would be a Stmt. Both Decl and Stmt are themselves enums representing the different possible variations. Looking further into the Stmt variants, you may notice there is another catch all in the Expr variant, which contains an Expr (expression) enum that defines an even more granular set of program parts.
- Expression
  - Assignment - Assigning a value to a variable, this includes any update & assign operations e.g. x = 1, x += 1, etc.
  - Array - An array literal e.g. [1,2,3,4]
  - ArrowFunction - An arrow function expression
  - Await - Any expression preceded by the await keyword
  - Call - Calling a function or method
  - Class - A class expression is a class definition with an optional identifier that is assigned to a variable or used as an argument in a Call expression
  - Conditional - Also known as the "ternary" operator e.g. test ? consequent : alternate
  - Function - A function expression is a function definition with an optional identifier that is either self executing, assigned to a variable or used as a Call argument
  - Ident - The identifier of a variable, call argument, class, import, export or function
  - Literal - A primitive literal
  - Logical - Two expressions separated by && or ||
  - Member - Accessing a sub property on something e.g. [0,1,2][1] or console.log
  - MetaProperty - Currently the only MetaProperty is new.target, which can be checked in a function body to see if something was called with the new keyword
  - New - A Call expression preceded by the new keyword
  - Object - An object literal e.g. {a: 1, b: 2}
  - Sequence - Any sequence of expressions separated by commas
  - Spread - the ... operator followed by an expression
  - SuperExpression - The super pseudo-keyword used for accessing properties of a super class
  - TaggedTemplate - An identifier followed by a template literal, see MDN for more info
  - ThisExpression - The this pseudo-keyword used for accessing instance properties
  - Unary - An operation (that is not an update) that requires one expression as an argument e.g. delete x, !true, etc.
  - Update - An operation that uses the ++ or -- operator
  - Yield - the yield contextual keyword followed by an optional expression for use in generator functions
Most of the Expr, Stmt, and Decl variants have associated values; to see more information about them check out the documentation. There should be an example and description provided for each of the possible combinations.
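To get a feel for how these nested enums are consumed, here is a std-only sketch with a heavily trimmed, hypothetical mirror of the structure (ToyPart, ToyDecl, ToyStmt, ToyExpr, and ToyFunction are not resast types). It shows how a single match can reach through several layers at once.

```rust
#[derive(Debug, Clone, PartialEq)]
struct ToyFunction {
    id: Option<String>,
    body: Vec<ToyPart>,
}

#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Ident(String),
    Function(ToyFunction),
}

#[derive(Debug, Clone, PartialEq)]
enum ToyStmt {
    Expr(ToyExpr),
    Empty,
}

#[derive(Debug, Clone, PartialEq)]
enum ToyDecl {
    Function(ToyFunction),
}

#[derive(Debug, Clone, PartialEq)]
enum ToyPart {
    Decl(ToyDecl),
    Dir(String),
    Stmt(ToyStmt),
}

/// The name of a function, or a placeholder when it has none.
fn name(f: &ToyFunction) -> &str {
    f.id.as_deref().unwrap_or("<anonymous>")
}

/// Nested patterns let one `match` look several layers into the tree.
fn describe(part: &ToyPart) -> String {
    match part {
        ToyPart::Dir(text) => format!("directive {}", text),
        ToyPart::Decl(ToyDecl::Function(f)) => format!("function declaration {}", name(f)),
        ToyPart::Stmt(ToyStmt::Expr(ToyExpr::Function(f))) => {
            format!("function expression {}", name(f))
        }
        ToyPart::Stmt(ToyStmt::Expr(ToyExpr::Ident(i))) => format!("identifier {}", i),
        ToyPart::Stmt(ToyStmt::Empty) => "empty statement".to_string(),
    }
}

fn main() {
    let parts = vec![
        ToyPart::Dir("use strict".to_string()),
        ToyPart::Decl(ToyDecl::Function(ToyFunction {
            id: Some("Thing".to_string()),
            body: vec![ToyPart::Stmt(ToyStmt::Empty)],
        })),
        ToyPart::Stmt(ToyStmt::Expr(ToyExpr::Ident("x".to_string()))),
    ];
    for part in &parts {
        println!("{}", describe(part));
    }
}
```

The same function definition shows up in two places here, as a declaration and as an expression, which mirrors the Decl::Function and Expr::Function split described above.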
With that long-winded explanation of the basic structure we are working with, let's take a look at how we would use the Parser.
use ressa::*;

static JS: &str = "
function Thing(stuff) {
    this.stuff = stuff;
}
";

fn main() {
    let parser = Parser::new(JS).expect("Failed to create parser");
    for part in parser {
        let part = part.expect("Failed to parse part");
        println!("{:?}", part);
    }
}
If we were to run the above we would get the following output.
Script([
Decl(
Function(
Function {
id: Some(
"Thing"
),
params: [
Pat(
Identifier(
"stuff"
)
)
],
body: [
Stmt(
Expr(
Assignment(
AssignmentExpr {
operator: Equal,
left: Expr(
Member(
MemberExpr {
object: This,
property: Ident(
"stuff"
),
computed: false
}
)
),
right: Ident(
"stuff"
)
}
)
)
)
],
generator: false,
is_async: false
}
)
)
])
If we walk through the output, we can see the following.
- This program consists of a single part which is a ProgramPart::Decl
  - Inside of that is a Decl::Function
    - Inside of that is a Function
      - It has an id, which is an optional Identifier, with the value of Some("Thing")
      - It has a one item vec of Pats in params
        - Which is a Pat::Identifier
          - Inside of that is an Identifier with the value of "stuff"
      - It has a body that is a one item vec of ProgramParts
        - The item is a ProgramPart::Stmt
          - Which is a Stmt::Expr
            - Inside of that is an Expr::Assignment
              - Inside of that is an AssignmentExpr
                - Which has an operator of Equal
                - The left hand side is an Expr::Member
                  - The object being Expr::This
                  - The property being Expr::Ident with the value of "stuff"
                  - computed is false
                - The right hand side is an Expr::Ident with the value of "stuff"
      - It is not a generator
      - is_async is false
Phew! That is quite a lot of information! A big part of why we need to be that verbose is because of the "you can do anything" nature of JavaScript. Let's use the MemberExpr as an example; below is a collection of ways to write a MemberExpr in JavaScript.
console.log;
console['log'];
const logVar = 'log';
console[logVar];
console[['l','o','g'].join('')];
class Log {
toString() {
return 'log';
}
}
const logToString = new Log();
console[logToString];
function logFunc() {
return 'log';
}
console[logFunc()];
function getConsole() {
return console
}
getConsole()[logFunc()];
getConsole().log;
And with the way JavaScript has evolved this probably isn't an exhaustive list of ways to construct a MemberExpr. With the level of information ressa provides we have enough to truly understand the syntactic meaning of the text. This will enable us to build more powerful tools to analyze and/or manipulate any given JavaScript program. With the pervasiveness of print debugging, wouldn't it be nice if we had a tool that would automatically insert a console.log at the top of every function and method in a program? We could make it print the name of that function and also each of the arguments. Let's try and build one.
Building a Debug Helper
Demo
To simplify things, we are just going to lift the technique for getting the JavaScript text from the ress example, so we won't be covering that again.
With that out of the way, let's take a look at the Cargo.toml and use statements for our program.
[package]
name = "console_logify"
version = "0.1.0"
authors = ["Robert Masen <r@robertmasen.pizza>"]
edition = "2018"
[dependencies]
ressa = "0.5"
atty = "0.2"
resw = "0.2"
resast = "0.2"
use ressa::{
    Parser,
};
use std::{
    io::Read,
    env::args,
    fs::read_to_string,
};
use resw::Writer;
This will make sure that all of the items we will need from ressa and resast are in scope. Now we can start defining our method for inserting the debug logging into any functions that we find. To start, we are going to create a function that will generate a new ProgramPart::Stmt that will represent our call to console.log, which might look like this.
fn console_log(args: Vec<Expr>) -> ProgramPart {
    ProgramPart::Stmt(
        Stmt::Expr(
            Expr::call(
                Expr::member(
                    Expr::ident("console"),
                    Expr::ident("log"),
                    false,
                ),
                args,
            )
        )
    )
}
We need to make the arguments configurable so we can insert the context information for each instance of a function, but otherwise it is pretty straightforward. Now that we have that, we need to start digging into the ProgramPart to identify anything we want to modify. Since Parser implements Iterator and its Item is Result<ProgramPart, Error>, we first need to use filter_map to extract the ProgramPart from the result. It would probably be good to handle the error case here, but for the sake of simplicity we are going to skip any errors. Now that we have an Iterator over ProgramParts we can use map to update each part.
fn main() {
    let js = get_js().expect("Unable to get JavaScript");
    let parser = Parser::new(&js).expect("Unable to construct parser");
    for part in parser.filter_map(|p| p.ok()).map(map_part) {
        //FIXME: Write updated program part to somewhere
    }
}
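The filter_map(|p| p.ok()) step is plain standard library behavior and can be tried without a parser. In this small sketch, parse_number and double are hypothetical stand-ins for the fallible parser step and for map_part.

```rust
/// Hypothetical stand-in for a parsing step that can fail.
fn parse_number(raw: &str) -> Result<i64, std::num::ParseIntError> {
    raw.trim().parse::<i64>()
}

/// Hypothetical stand-in for `map_part`.
fn double(n: i64) -> i64 {
    n * 2
}

fn doubled_valid(inputs: &[&str]) -> Vec<i64> {
    inputs
        .iter()
        .map(|raw| parse_number(raw))
        // `Result::ok` turns Err(_) into None, so failures are dropped silently.
        .filter_map(|r| r.ok())
        .map(double)
        .collect()
}

fn main() {
    // "two" fails to parse and simply disappears from the output.
    println!("{:?}", doubled_valid(&["1", "two", " 3 "]));
}
```

The silent drop is the trade-off mentioned above: a part that fails to parse never reaches the mapping step, and the caller never hears about it.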
With that in mind, the entry point is going to be a function that takes a ProgramPart and returns a new ProgramPart. It might look like this.
fn map_part(part: ProgramPart) -> ProgramPart {
    match part {
        ProgramPart::Decl(ref decl) => ProgramPart::Decl(map_decl(decl)),
        ProgramPart::Stmt(ref stmt) => ProgramPart::Stmt(map_stmt(stmt)),
        ProgramPart::Dir(_) => part,
    }
}
We are going to match on the part provided and either return that part if it is a Dir, or, if it isn't, investigate further to discover whether it is a function or not. We do that in two places, map_decl and map_stmt, both of which use a similar method for digging further into the tree.
fn map_decl(decl: &Decl) -> Decl {
    match decl {
        Decl::Function(ref f) => Decl::Function(map_func(f)),
        Decl::Class(ref class) => Decl::Class(map_class(class)),
        _ => decl.clone()
    }
}

fn map_stmt(stmt: &Stmt) -> Stmt {
    match stmt {
        Stmt::Expr(ref expr) => Stmt::Expr(map_expr(expr)),
        _ => stmt.clone(),
    }
}
There are two ways for a Decl to resolve into a function or method, the Function and Class variants, while a Stmt can end up there if it is an Expr. When we include map_expr we see that there are cases for both Function and Class in the Expr enum. That means once we get past those we will be handling the rest in the exact same way.
fn map_expr(expr: &Expr) -> Expr {
    match expr {
        Expr::Function(ref f) => Expr::Function(map_func(f)),
        Expr::Class(ref c) => Expr::Class(map_class(c)),
        _ => expr.clone(),
    }
}
Finally we are going to start manipulating the AST in map_func.
fn map_func(func: &Function) -> Function {
    let mut f = func.clone();
    let mut args = vec![];
    if let Some(ref name) = f.id {
        args.push(
            Expr::string(&format!("'{}'", name))
        );
    }
    for arg in f.params.iter().filter_map(|a| match a {
        FunctionArg::Expr(e) => match e {
            Expr::Ident(i) => Some(i),
            _ => None,
        },
        FunctionArg::Pat(p) => match p {
            Pat::Identifier(i) => Some(i),
            _ => None,
        },
    }) {
        args.push(Expr::ident(arg));
    }
    f.body.insert(
        0,
        console_log(args),
    );
    f.body = f.body.into_iter().map(map_part).collect();
    f
}
The first thing we are going to do is clone the func to give us a mutable version. Next we check if the id is Some; if it is, we can add that name to our console.log arguments. Function arguments can be pretty complicated, so to keep things simple we are only going to worry about the ones that are either Expr::Ident or Pat::Identifier. To build something more robust it might be good to include destructured arguments or arguments with default values, but for this example we are going to keep it simple.
First we filter_map the func.params to only get the items that ultimately resolve to Identifiers. At that point we can wrap each of these identifiers in an Expr::Ident and add them to the console.log args. Now we can simply insert the result of passing those args to console_log at the first position of the func.body. Because functions can appear in the body of other functions, we also want to map all of the func.body program parts. Once that has completed we can return the updated func to the caller.
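The shape of that recursion can be seen without the full AST. The sketch below uses a deliberately tiny, hypothetical tree (Node, log_call, and map_node are not resast or ressa items) where every statement that is not a function is kept as raw text.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// Any statement we do not look inside of, kept as raw text.
    Other(String),
    /// A function with an optional name, parameter names, and a body.
    Func {
        name: Option<String>,
        params: Vec<String>,
        body: Vec<Node>,
    },
}

/// Builds the statement that gets inserted at the top of a function.
fn log_call(args: &[String]) -> Node {
    Node::Other(format!("console.log({});", args.join(", ")))
}

fn map_node(node: Node) -> Node {
    match node {
        Node::Func { name, params, body } => {
            let mut args = Vec::new();
            if let Some(ref n) = name {
                // The name is logged as a string literal.
                args.push(format!("'{}'", n));
            }
            // The parameters are logged by identifier.
            args.extend(params.iter().cloned());
            // New body: the log call first, then the old body with any
            // nested functions rewritten in the same way.
            let mut new_body = vec![log_call(&args)];
            new_body.extend(body.into_iter().map(map_node));
            Node::Func { name, params, body: new_body }
        }
        other => other,
    }
}

fn main() {
    let inner = Node::Func { name: None, params: vec!["x".to_string()], body: vec![] };
    let outer = Node::Func {
        name: Some("Thing".to_string()),
        params: vec!["stuff".to_string()],
        body: vec![inner],
    };
    println!("{:#?}", map_node(outer));
}
```

Unlike map_func, this sketch takes ownership of the node instead of cloning it, which is a design choice made only to keep the example short.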
The next thing we are going to want to deal with is Class. We want to insert console.log into the top of each method on a class. This is a bit unique because we also want to provide the name of that class (if it exists) as the first argument to console.log. That might look like this.
fn map_class(class: &Class) -> Class {
    let mut class = class.clone();
    let prefix = if let Some(ref id) = class.id {
        id.clone()
    } else {
        String::new()
    };
    class.body = class.body
        .iter()
        .map(|prop| map_class_prop(&prefix, prop))
        .collect();
    class
}

fn map_class_prop(prefix: &str, prop: &Property) -> Property {
    let mut prop = prop.clone();
    let mut args = match prop.kind {
        PropertyKind::Ctor => {
            vec![Expr::string(&format!("'new {}'", prefix))]
        },
        PropertyKind::Get => {
            vec![
                Expr::string(&format!("'{}'", prefix)),
                Expr::string("get"),
            ]
        },
        PropertyKind::Set => {
            vec![
                Expr::string(&format!("'{}'", prefix)),
                Expr::string("set"),
            ]
        },
        PropertyKind::Method => {
            vec![
                Expr::string(&format!("'{}'", prefix)),
            ]
        },
        _ => vec![],
    };
    match &prop.key {
        PropertyKey::Expr(ref e) => {
            match e {
                Expr::Ident(i) => if i != "constructor" {
                    args.push(Expr::string(&format!("'{}'", i)));
                },
                _ => (),
            }
        },
        PropertyKey::Literal(ref l) => {
            match l {
                Literal::Boolean(ref b) => {
                    args.push(Expr::string(&format!("'{}'", b)));
                },
                Literal::Null => {
                    args.push(Expr::string("'null'"));
                },
                Literal::Number(ref n) => {
                    args.push(Expr::string(&format!("'{}'", n)));
                }
                Literal::RegEx(ref r) => {
                    args.push(Expr::string(&format!("'/{}/{}'", r.pattern, r.flags)));
                },
                Literal::String(ref s) => {
                    if s != "constructor" {
                        args.push(Expr::string(s));
                    }
                },
                _ => (),
            }
        },
        PropertyKey::Pat(ref p) => {
            match p {
                Pat::Identifier(ref i) => {
                    args.push(Expr::string(&format!("'{}'", i)));
                },
                _ => (),
            }
        },
    }
    if let PropertyValue::Expr(ref mut expr) = prop.value {
        match expr {
            Expr::Function(ref mut f) => {
                for ref arg in &f.params {
                    match arg {
                        FunctionArg::Expr(ref expr) => {
                            match expr {
                                Expr::Ident(_) => args.push(expr.clone()),
                                _ => (),
                            }
                        },
                        FunctionArg::Pat(ref pat) => {
                            match pat {
                                Pat::Identifier(ref ident) => {
                                    args.push(Expr::ident(ident))
                                },
                                _ => {},
                            }
                        }
                    }
                }
                f.body.insert(0,
                    console_log(args)
                )
            },
            _ => (),
        }
    }
    prop
}
Here we have two functions. The first pulls out the id from the provided class, or uses an empty string if it doesn't exist. We then pass that off to map_class_prop, which will handle all of the different types of properties a class can have. The first thing this does is map the prefix into the right format, so a call to new Thing() would print new Thing, and a get method would print Thing get before the method name. Next we take a look at prop.key, which will provide us with the name of our function. According to the specification a class property key can be an identifier, a literal value, or a pattern, so we need to figure out what the name of this method is by digging into that value. First, in the case that it is an ident we want to add it to the args, unless it is the value constructor, because we already put the new keyword in that one. Next we can pull out the literal values and add those as they appear. Lastly we will only handle the pattern case when it is a Pat::Identifier; otherwise we will just skip it. Now, to get the parameter names from the method definition we need to look at prop.value, which should always be an Expr::Function. Once we match on that we simply repeat the process from map_func, pulling the args out but only when they are Idents, then passing them along to console_log and inserting the result at the top of the function body.
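The labeling rules in that description can be isolated into a small std-only sketch. Kind and log_labels are hypothetical names, and plain strings stand in for the Expr::string values used above.

```rust
/// Hypothetical mirror of the property kinds handled above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ctor,
    Get,
    Set,
    Method,
}

/// Builds the leading console.log labels for one class member.
fn log_labels(class_name: &str, kind: Kind, method_name: &str) -> Vec<String> {
    let mut labels = match kind {
        Kind::Ctor => vec![format!("new {}", class_name)],
        Kind::Get => vec![class_name.to_string(), "get".to_string()],
        Kind::Set => vec![class_name.to_string(), "set".to_string()],
        Kind::Method => vec![class_name.to_string()],
    };
    // The constructor is already covered by the `new` label.
    if method_name != "constructor" {
        labels.push(method_name.to_string());
    }
    labels
}

fn main() {
    println!("{:?}", log_labels("Thing", Kind::Ctor, "constructor"));
    println!("{:?}", log_labels("Thing", Kind::Get, "stuff"));
    println!("{:?}", log_labels("Thing", Kind::Method, "run"));
}
```

Separating the label logic from the tree walking like this would also make the prefix rules easy to test on their own.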
At this point we have successfully updated our AST to include a call to console.log at the top of each function and method in our code. Now the big question is how do we write that out to a file. This problem is not a small one; in the next section we are going to cover a third crate, resw, that we can use to finish this project.
RESW
While ress and ressa consume text and generate data structures, resw is going to consume data structures and write out text. This means it can do the heavy lifting when solving the problem our debug logging project left us with. However, instead of just sweeping that under the rug, we are going to go over how resw works. Because of the nature of JavaScript, resw makes some style decisions that might not work for everyone. By going over the project in detail, the hope is that others will feel enabled to either contribute a configuration option to resw or even implement their own project that consumes ressa's AST and generates text.
If you are just interested in seeing how we are going to finish the project from the last chapter, feel free to move ahead.
Similar to the structure of ressa, resw exposes a struct called Writer that will keep track of the context for us. There are 2 methods for constructing a Writer: the first is the ::new method, the second is the ::builder method, which uses the builder pattern to customize some options. Those options include
- New line character (default \n)
- Quote (default to use original quotation mark)
  - Setting this to any value will force all of the string literals in the provided JavaScript to be re-written with the provided quotes
- Indent (default 4 spaces)
With either method you are going to need to provide the destination, which can be anything that implements the std::io::Write trait. For testing purposes the crate provides an implementor of Write in WriteString; we are not going to cover that here, but a more detailed explanation can be found in the appendix.
Once a Writer is constructed, it provides an API surface that should cover most of the ressa AST. The primary entry point is going to be either write_program or write_part. For the most part, the primary role of the writer is to incrementally move down the AST until we find something for which we are confident about exactly what to write. Let's take the following JS as an example.
function Thing(stuff) {
this.stuff = stuff;
}
let thing = new Thing('argument');
If we run that through the ressa::Parser, we would see the following AST.
Decl(
    Function(
        Function {
            id: Some(
                "Thing"
            ),
            params: [
                Pat(
                    Identifier(
                        "stuff"
                    )
                )
            ],
            body: [
                Stmt(
                    Expr(
                        Assignment(
                            AssignmentExpr {
                                operator: Equal,
                                left: Expr(
                                    Member(
                                        MemberExpr {
                                            object: ThisExpr,
                                            property: Ident(
                                                "stuff"
                                            ),
                                            computed: false
                                        }
                                    )
                                ),
                                right: Ident(
                                    "stuff"
                                )
                            }
                        )
                    )
                )
            ],
            generator: false,
            is_async: false
        }
    )
)
Decl(
    Variable(
        Let,
        [
            VariableDecl {
                id: Identifier(
                    "thing"
                ),
                init: Some(
                    New(
                        NewExpr {
                            callee: Ident(
                                "Thing"
                            ),
                            arguments: [
                                Literal(
                                    String(
                                        "\'argument\'"
                                    )
                                )
                            ]
                        }
                    )
                )
            }
        ]
    )
)
Using that, let's take a look at how resw
would generate the text to represent our AST. First we would enter at write_part
with the first ProgramPart
.
pub fn write_part(&mut self, part: &ProgramPart) -> Res {
    self.at_top_level = true;
    self._write_part(part)?;
    self.write_new_line()?;
    Ok(())
}
Interestingly enough, write_part is really more concerned with maintaining a context flag for whether we are at the top level or not; this becomes important when trying to determine if an expression needs to be wrapped in parentheses. Almost all of the work is going to be passed off to an internal private function, _write_part.
fn _write_part(&mut self, part: &ProgramPart) -> Res {
    self.write_leading_whitespace()?;
    match part {
        ProgramPart::Decl(decl) => self.write_decl(decl)?,
        ProgramPart::Dir(dir) => self.write_directive(dir)?,
        ProgramPart::Stmt(stmt) => self.write_stmt(stmt)?,
    }
    Ok(())
}
The first thing we want to do is make sure that any leading whitespace is included with write_leading_whitespace
.
pub fn write_leading_whitespace(&mut self) -> Res {
    self.write(&self.indent.repeat(self.current_indent))?;
    Ok(())
}
This is achieved by looking at the current_indent and writing the configurable property indent to the destination, repeated once for each level of our current indent. So if our indent was \t and we were at level 2, it would write "\t\t". Internally the write method just writes a single &str to the destination. After we write our leading whitespace, we can start to descend the AST; we do that by matching on the part. You can see that there is a branch for each of the possible enum variants. Looking back at the example, we know the next step would be to head to write_decl.
pub fn write_decl(&mut self, decl: &Decl) -> Res {
    match decl {
        Decl::Variable(ref kind, ref decls) => self.write_variable_decls(kind, decls)?,
        Decl::Class(ref class) => {
            self.at_top_level = false;
            self.write_class(class)?;
            self.write_new_line()?;
        },
        Decl::Function(ref func) => {
            self.at_top_level = false;
            self.write_function(func)?;
            self.write_new_line()?;
        },
        Decl::Export(ref exp) => self.write_export_decl(exp)?,
        Decl::Import(ref imp) => self.write_import_decl(imp)?,
    };
    Ok(())
}
Moving further down, we simply match on the declaration, handling each variant as needed. For our example we would move into the Decl::Function branch. The first step in that branch is to set the context flag at_top_level to false and then move into the write_function method.
pub fn write_function(&mut self, func: &Function) -> Res {
    if func.is_async {
        self.write("async ")?;
    }
    self.write("function")?;
    if let Some(ref id) = func.id {
        self.write(" ")?;
        if func.generator {
            self.write("*")?;
        }
        self.write(id)?;
    } else if func.generator {
        self.write("*")?;
    }
    self.write_function_args(&func.params)?;
    self.write(" ")?;
    self.write_function_body(&func.body)
}
Here we are going to actually start writing some information out to our destination. First we check the flag on Function to see if we need to write the async keyword, next we write the keyword function, followed by a check to see if the id is Some. If so, we need to check the flag on Function to see if that function is a generator; if it is, we need to add a * before the id, and lastly we write the id. If there is no id, a generator still gets its * written. Now that we have gotten through that we can start to look at the parameters and body. First we are going to pass off the parameters to write_function_args.
/// Write the arguments of a function or method definition
/// ```js
/// function(arg1, arg2) {
/// }
/// ```
pub fn write_function_args(&mut self, args: &[FunctionArg]) -> Res {
    self.write("(")?;
    let mut after_first = false;
    for ref arg in args {
        if after_first {
            self.write(", ")?;
        } else {
            after_first = true;
        }
        self.write_function_arg(arg)?;
    }
    self.write(")")?;
    Ok(())
}
The first step here is to write the open parenthesis. Next we are going to use a flag, after_first, to help with handling whether a comma should be written before the argument. This is the first place that we have seen where resw is making a style choice: function parameters never include a trailing comma. Ideally style choices will be configurable in the future but currently this one is not. Now that we have handled the comma situation we can pass the argument off to write_function_arg.
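As a brief aside before following the argument itself, the after_first pattern can be isolated from the AST entirely. The sketch below is a hypothetical free function (write_comma_separated is not part of resw) that writes plain strings to any std::io::Write destination using the same flag.

```rust
use std::io::{self, Write};

/// Write `items` between parentheses, separated by ", ",
/// never emitting a trailing comma.
fn write_comma_separated<W: Write>(out: &mut W, items: &[&str]) -> io::Result<()> {
    out.write_all(b"(")?;
    let mut after_first = false;
    for item in items {
        if after_first {
            // Every item except the first is preceded by a separator.
            out.write_all(b", ")?;
        } else {
            after_first = true;
        }
        out.write_all(item.as_bytes())?;
    }
    out.write_all(b")")
}
```

Because the separator is written before an item rather than after it, an empty list produces just the two parentheses and a single item produces no comma at all.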
pub fn write_function_arg(&mut self, arg: &FunctionArg) -> Res {
    match arg {
        FunctionArg::Expr(ref ex) => self.write_expr(ex)?,
        FunctionArg::Pat(ref pa) => self.write_pattern(pa)?,
    }
    Ok(())
}
Here we see another function that simply moves us further down the AST. Function arguments can be either expressions or patterns so we need to handle both. For our example we are going to head down the Pat branch with write_pattern.
pub fn write_pattern(&mut self, pattern: &Pat) -> Res {
    match pattern {
        Pat::Identifier(ref i) => self.write(i),
        Pat::Object(ref o) => self.write_object_pattern(o),
        Pat::Array(ref a) => self.write_array_pattern(a.as_slice()),
        Pat::RestElement(ref r) => self.write_rest_element(r),
        Pat::Assignment(ref a) => self.write_assignment_pattern(a),
    }
}
Most of the options here are simply going to continue branching down our AST; however, for our example we are going to head down the first match arm with Pat::Identifier and just write that string out to our destination.
Moving back up we only had one parameter for our function signature so we finish out write_function_args
with a closing parenthesis. That then leads us to write_function_body
.
pub fn write_function_body(&mut self, body: &FunctionBody) -> Res {
    if body.len() == 0 {
        self.write("{ ")?;
    } else {
        self.write_open_brace()?;
        self.write_new_line()?;
    }
    for ref part in body {
        self._write_part(part)?;
    }
    if body.len() == 0 {
        self.write("}")?;
    } else {
        self.write_close_brace()?;
    }
    Ok(())
}
The first thing we need to do is take a look at the &FunctionBody, which is a type alias for Vec<ProgramPart>. We check to see if this function has any body; if not, we just write a single open curly brace followed by a space. If it does, we want to write the curly brace using write_open_brace, a convenience method for writing the character and also incrementing the current_indent, and then we write a new line. Now we loop over each of the ProgramParts in body and pass that off to _write_part. For our example there is only going to be one part. This part is a ProgramPart::Stmt, which would be handled by write_stmt.
pub fn write_stmt(&mut self, stmt: &Stmt) -> Res {
    let mut semi = true;
    let mut new_line = true;
    let cached_state = self.at_top_level;
    match stmt {
        Stmt::Empty => {
            new_line = false;
        },
        Stmt::Debugger => self.write_debugger_stmt()?,
        Stmt::Expr(ref stmt) => {
            let wrap = match stmt {
                Expr::Literal(_)
                | Expr::Object(_)
                | Expr::Function(_)
                | Expr::Binary(_) => true,
                _ => false,
            };
            if wrap {
                self.write_wrapped_expr(stmt)?
            } else {
                self.write_expr(stmt)?
            }
        },
        Stmt::Block(ref stmt) => {
            self.at_top_level = false;
            self.write_block_stmt(stmt)?;
            semi = false;
            new_line = false;
            self.at_top_level = cached_state;
        }
        Stmt::With(ref stmt) => {
            self.write_with_stmt(stmt)?;
            semi = false;
        }
        Stmt::Return(ref stmt) => self.write_return_stmt(stmt)?,
        Stmt::Labeled(ref stmt) => {
            self.write_labeled_stmt(stmt)?;
            semi = false;
        }
        Stmt::Break(ref stmt) => self.write_break_stmt(stmt)?,
        Stmt::Continue(ref stmt) => self.write_continue_stmt(stmt)?,
        Stmt::If(ref stmt) => {
            self.write_if_stmt(stmt)?;
            semi = false;
        }
        Stmt::Switch(ref stmt) => {
            self.at_top_level = false;
            self.write_switch_stmt(stmt)?;
            semi = false;
        }
        Stmt::Throw(ref stmt) => self.write_throw_stmt(stmt)?,
        Stmt::Try(ref stmt) => {
            self.write_try_stmt(stmt)?;
            semi = false;
        }
        Stmt::While(ref stmt) => {
            new_line = self.write_while_stmt(stmt)?;
            semi = false;
        }
        Stmt::DoWhile(ref stmt) => self.write_do_while_stmt(stmt)?,
        Stmt::For(ref stmt) => {
            self.at_top_level = false;
            new_line = self.write_for_stmt(stmt)?;
            semi = false;
        }
        Stmt::ForIn(ref stmt) => {
            self.at_top_level = false;
            new_line = self.write_for_in_stmt(stmt)?;
            semi = false;
        }
        Stmt::ForOf(ref stmt) => {
            self.at_top_level = false;
            new_line = self.write_for_of_stmt(stmt)?;
            semi = false;
        }
        Stmt::Var(ref stmt) => self.write_var_stmt(stmt)?,
    };
    if semi {
        self.write_empty_stmt()?;
    }
    if new_line {
        self.write_new_line()?;
    }
    self.at_top_level = cached_state;
    Ok(())
}
That is a pretty big match statement! Before we enter that we have a couple of context flags to help us with formatting, semi and new_line, both with a default value of true. Looking at our example, we would enter the Stmt::Expr arm of the match, which handles the possible requirement that this statement be wrapped in parentheses. Primitive literals, object literals, functions, and binary operations would require parentheses when not part of a larger statement. There is a convenience method called write_wrapped_expr that just writes parentheses around a call to write_expr.
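As an aside, the reason for the wrapping is that JavaScript reads a statement starting with { as a block and one starting with function as a declaration, so an object literal or function expression used as a statement has to be parenthesized to survive a round trip. The sketch below mirrors the wrap check with a hypothetical, heavily reduced MiniExpr type whose variants simply carry pre-rendered text; none of these names come from resw or ressa.

```rust
/// Hypothetical, reduced expression type; each variant carries its
/// already-rendered source text to keep the sketch short.
enum MiniExpr {
    Literal(String),
    Object(String),
    Function(String),
    Binary(String),
    Call(String),
}

/// Mirrors the `wrap` check: these kinds are parenthesized when they
/// stand alone as a statement.
fn needs_parens(expr: &MiniExpr) -> bool {
    match expr {
        MiniExpr::Literal(_)
        | MiniExpr::Object(_)
        | MiniExpr::Function(_)
        | MiniExpr::Binary(_) => true,
        MiniExpr::Call(_) => false,
    }
}

/// Render an expression statement, wrapping it when required and
/// always finishing with a semi-colon.
fn write_expr_stmt(expr: &MiniExpr) -> String {
    let text = match expr {
        MiniExpr::Literal(s)
        | MiniExpr::Object(s)
        | MiniExpr::Function(s)
        | MiniExpr::Binary(s)
        | MiniExpr::Call(s) => s,
    };
    if needs_parens(expr) {
        format!("({});", text)
    } else {
        format!("{};", text)
    }
}
```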
pub fn write_expr(&mut self, expr: &Expr) -> Res {
    let cached_state = self.at_top_level;
    match expr {
        Expr::Literal(ref expr) => self.write_literal(expr)?,
        Expr::This => self.write_this_expr()?,
        Expr::Super => self.write_super_expr()?,
        Expr::Array(ref expr) => self.write_array_expr(expr)?,
        Expr::Object(ref expr) => self.write_object_expr(expr)?,
        Expr::Function(ref expr) => {
            self.at_top_level = false;
            self.write_function(expr)?;
            self.at_top_level = cached_state;
        }
        Expr::Unary(ref expr) => self.write_unary_expr(expr)?,
        Expr::Update(ref expr) => self.write_update_expr(expr)?,
        Expr::Binary(ref expr) => self.write_binary_expr(expr)?,
        Expr::Assignment(ref expr) => {
            self.at_top_level = false;
            self.write_assignment_expr(expr)?
        },
        Expr::Logical(ref expr) => self.write_logical_expr(expr)?,
        Expr::Member(ref expr) => self.write_member_expr(expr)?,
        Expr::Conditional(ref expr) => self.write_conditional_expr(expr)?,
        Expr::Call(ref expr) => self.write_call_expr(expr)?,
        Expr::New(ref expr) => self.write_new_expr(expr)?,
        Expr::Sequence(ref expr) => self.write_sequence_expr(expr)?,
        Expr::Spread(ref expr) => self.write_spread_expr(expr)?,
        Expr::ArrowFunction(ref expr) => {
            self.at_top_level = false;
            self.write_arrow_function_expr(expr)?;
            self.at_top_level = cached_state;
        }
        Expr::Yield(ref expr) => self.write_yield_expr(expr)?,
        Expr::Class(ref expr) => {
            self.at_top_level = false;
            self.write_class(expr)?;
            self.at_top_level = cached_state;
        }
        Expr::MetaProperty(ref expr) => self.write_meta_property(expr)?,
        Expr::Await(ref expr) => self.write_await_expr(expr)?,
        Expr::Ident(ref expr) => self.write_ident(expr)?,
        Expr::TaggedTemplate(ref expr) => self.write_tagged_template(expr)?,
        _ => unreachable!(),
    }
    Ok(())
}
The first step here is to keep a copy of the previous at_top_level flag so that we can revert back to it after writing, since some of the arms are going to change it. Next we enter another very large match statement. Our example would take the Expr::Assignment arm, passing further work off to write_assignment_expr.
pub fn write_assignment_expr(&mut self, assignment: &AssignmentExpr) -> Res {
    let wrap_self = match &assignment.left {
        AssignmentLeft::Expr(ref e) => match &**e {
            Expr::Object(_) | Expr::Array(_) => true,
            _ => false,
        },
        AssignmentLeft::Pat(ref p) => match p {
            Pat::Array(_) => true,
            Pat::Object(_) => true,
            _ => false,
        }
    };
    if wrap_self {
        self.write("(")?;
    }
    match &assignment.left {
        AssignmentLeft::Expr(ref e) => self.write_expr(e)?,
        AssignmentLeft::Pat(ref p) => self.write_pattern(p)?,
    }
    self.write(" ")?;
    self.write_assignment_operator(&assignment.operator)?;
    self.write(" ")?;
    self.write_expr(&assignment.right)?;
    if wrap_self {
        self.write(")")?;
    }
    Ok(())
}
Here we first need to determine if the whole assignment expression needs to be wrapped in parentheses, which would only be true if the left hand side was an object or an array (either as an expression or as a pattern). Next we test the assignment.left property since it can be either an Expr or a Pat; our example would take us back to the write_expr method, but this time we would pass into the Expr::Member arm, which passes its work off to write_member_expr.
pub fn write_member_expr(&mut self, member: &MemberExpr) -> Res {
    match &*member.object {
        Expr::Assignment(_)
        | Expr::Literal(Literal::Number(_))
        | Expr::Conditional(_)
        | Expr::Logical(_)
        | Expr::Function(_)
        | Expr::ArrowFunction(_)
        | Expr::Object(_)
        | Expr::Binary(_)
        | Expr::Unary(_)
        | Expr::Update(_) => self.write_wrapped_expr(&member.object)?,
        _ => self.write_expr(&member.object)?,
    }
    if member.computed {
        self.write("[")?;
    } else {
        self.write(".")?;
    }
    self.write_expr(&member.property)?;
    if member.computed {
        self.write("]")?;
    }
    Ok(())
}
Here we first check to see if the object property is required to be wrapped in parentheses; for us, though, we just want to pass that along to write_expr. This time we are going to end up at the Expr::This arm (shown as ThisExpr in the AST above), which just writes out the literal word this. Next we are going to look at the computed flag on MemberExpr to see if this was written originally with the bracket notation (this['stuff']) or the dot notation (this.stuff), writing the appropriate character. Now we are again going to pass some work back to write_expr, this time with the property property. This would end on the branch for Expr::Ident, which just writes that value to the destination. If the member expression was computed we would need to write the ] but for our example it is not.
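The effect of the computed flag can be seen in isolation with a small sketch. Here write_member is a hypothetical helper that takes the object and property as already-rendered text instead of AST nodes, so it only illustrates the choice between the two notations.

```rust
/// Join an object and a property using bracket notation when
/// `computed` is true and dot notation otherwise.
fn write_member(object: &str, property: &str, computed: bool) -> String {
    let mut out = String::from(object);
    if computed {
        out.push('[');
    } else {
        out.push('.');
    }
    out.push_str(property);
    if computed {
        // Only the bracket form needs a closing character.
        out.push(']');
    }
    out
}
```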
At this point we are back up at write_assignment_expr
where we are going to write a single space and then pass the assignment.operator
off to write_assignment_operator
.
pub fn write_assignment_operator(&mut self, op: &AssignmentOperator) -> Res {
    let s = match op {
        AssignmentOperator::AndEqual => "&=",
        AssignmentOperator::DivEqual => "/=",
        AssignmentOperator::Equal => "=",
        AssignmentOperator::LeftShiftEqual => "<<=",
        AssignmentOperator::MinusEqual => "-=",
        AssignmentOperator::ModEqual => "%=",
        AssignmentOperator::OrEqual => "|=",
        AssignmentOperator::PlusEqual => "+=",
        AssignmentOperator::PowerOfEqual => "**=",
        AssignmentOperator::RightShiftEqual => ">>=",
        AssignmentOperator::TimesEqual => "*=",
        AssignmentOperator::UnsignedRightShiftEqual => ">>>=",
        AssignmentOperator::XOrEqual => "^=",
    };
    self.write(s)?;
    Ok(())
}
This is a relatively straightforward process of looking at which operator was provided and then writing out the text that represents that operator. For our example it would be =, after which we need to write a single space. The last step in write_assignment_expr is to handle the assignment.right, which is also an Expr, so we pass that off to write_expr. Our example will head to the Expr::Ident match arm and then just write to the destination. Unwinding through write_stmt, the semi and new_line flags are both still true, so a semi-colon and a new line are written. With that we have now reached the last step in write_function_body, which is to call write_close_brace; similar to write_open_brace, here we are decrementing the current_indent context property. That also brings us to the end of write_function, write_decl, and _write_part. The last thing we do in write_part is to add a trailing new line, another style choice.
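The interplay between the braces and current_indent can be sketched with a tiny string-backed writer. BraceWriter and render_function below are hypothetical and only reuse the method names from the walkthrough; in particular, writing the leading whitespace inside write_close_brace is an assumption of this sketch rather than something the code above confirms.

```rust
/// Hypothetical miniature writer that tracks nesting depth.
struct BraceWriter {
    out: String,
    indent: String,
    current_indent: usize,
}

impl BraceWriter {
    fn new(indent: &str) -> Self {
        BraceWriter { out: String::new(), indent: indent.to_string(), current_indent: 0 }
    }
    fn write(&mut self, s: &str) {
        self.out.push_str(s);
    }
    fn write_new_line(&mut self) {
        self.out.push('\n');
    }
    fn write_leading_whitespace(&mut self) {
        let ws = self.indent.repeat(self.current_indent);
        self.out.push_str(&ws);
    }
    /// Write "{" and step one level deeper.
    fn write_open_brace(&mut self) {
        self.write("{");
        self.current_indent += 1;
    }
    /// Step one level back out, then write "}" at the outer depth.
    fn write_close_brace(&mut self) {
        self.current_indent = self.current_indent.saturating_sub(1);
        self.write_leading_whitespace();
        self.write("}");
    }
}

/// Render the example function by hand to show the depth changes.
fn render_function() -> String {
    let mut w = BraceWriter::new("    ");
    w.write("function Thing(stuff) ");
    w.write_open_brace();
    w.write_new_line();
    w.write_leading_whitespace();
    w.write("this.stuff = stuff;");
    w.write_new_line();
    w.write_close_brace();
    w.write_new_line();
    w.out
}
```

Keeping the depth on the writer means no individual write method needs to know how deeply it is nested; each one just asks for the leading whitespace before it writes a line.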
As our example continues we would then start again at write_part with the next part. This is going to move through _write_part the same as before; however, when we get to write_decl we have a new branch to head down. This is the Decl::Variable arm, which passes its work off to write_variable_decls.
pub fn write_variable_decls(&mut self, kind: &VariableKind, decls: &[VariableDecl]) -> Res {
    self.write_variable_kind(kind)?;
    let mut after_first = false;
    for decl in decls {
        if after_first {
            self.write(", ")?;
        } else {
            after_first = true;
        }
        self.write_variable_decl(decl)?;
    }
    self.write_empty_stmt()?;
    self.write_new_line()
}
As you might expect the first thing we want to do is to write the variable kind. We pass off the kind
variable to write_variable_kind
.
pub fn write_variable_kind(&mut self, kind: &VariableKind) -> Res {
    let s = match kind {
        VariableKind::Const => "const ",
        VariableKind::Let => "let ",
        VariableKind::Var => "var ",
    };
    self.write(s)
}
Similar to our examination of write_assignment_operator, we are going to simply look at which keyword was used and then write that out, with a trailing space.
Next we need to keep track of a flag, after_first, which should be familiar from write_function_args. In our loop, we pass off each of the declarations to write_variable_decl.
pub fn write_variable_decl(&mut self, decl: &VariableDecl) -> Res {
    self.write_pattern(&decl.id)?;
    if let Some(ref init) = decl.init {
        self.write(" = ")?;
        self.write_expr(init)?;
    }
    Ok(())
}
Here we first write out the id of this variable by passing it off to write_pattern. Thankfully our example is pretty simple so we are again going to take that first branch for Pat::Identifier and write the identifier to our destination. After that we want to check if this variable is initialized (ours is) and if so we write the " = " and then write the expression by passing that off to write_expr. For this pass through write_expr we are going to travel down the Expr::New arm, which passes its work off to write_new_expr.
pub fn write_new_expr(&mut self, new: &NewExpr) -> Res {
    self.write("new ")?;
    match &*new.callee {
        Expr::Assignment(_) | Expr::Call(_) => self.write_wrapped_expr(&new.callee)?,
        _ => self.write_expr(&new.callee)?,
    }
    self.write_sequence_expr(&new.arguments)?;
    Ok(())
}
At this point we want to first write the new
keyword followed by a space. Next we want to write out what the new.callee
is which would again bring us to write_expr
. Our example would travel to the Expr::Ident
arm which just writes that out. Next we need to write an open parenthesis followed by the provided arguments. This time we are going to use the write_sequence_expr
method to do that.
pub fn write_sequence_expr(&mut self, sequence: &[Expr]) -> Res {
    let mut after_first = false;
    self.write("(")?;
    for ref e in sequence {
        if after_first {
            self.write(", ")?;
        }
        self.write_expr(e)?;
        after_first = true;
    }
    self.write(")")?;
    Ok(())
}
At this point the structure of this function's body should look familiar: we are going to loop over the provided expressions and write them out with a comma and space before all but the first one. For our example we are only going to hit this once, so no comma; then we are going to pass that off to write_expr. This time as we pass through the match in write_expr we are going to hit the Expr::Literal arm, which passes its work off to write_literal.
pub fn write_literal(&mut self, lit: &Literal) -> Res {
    match lit {
        Literal::Boolean(b) => self.write_bool(*b),
        Literal::Null => self.write("null"),
        Literal::Number(n) => self.write(&n),
        Literal::String(s) => self.write_string(s),
        Literal::RegEx(r) => self.write_regex(r),
        Literal::Template(t) => self.write_template(t),
    }
}
Here we see another match statement, our example will take us down the Literal::String
arm which passes off work to write_string
. You may be wondering why that is, since writing strings is all we have really been doing. The answer is that this is one of the few style preferences that is currently configurable as you'll see.
pub fn write_string(&mut self, s: &str) -> Res {
    if let Some(c) = self.quote {
        self.re_write_string(s, c)?;
    } else {
        self.write(s)?;
    }
    Ok(())
}
We first check to see if the self.quote
property has been set, this would indicate that the user has a quote preference. If it is set then we want to re-write the string to use this quote, this involves re-writing any internal escaped quotes for the old quote and escaping the new quote that might appear in the contents. If that property is None
then we would just write it out normally as the ressa::node::Literal::String
preserves the original quotation mark.
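The body of re_write_string is not shown here, so the following is only a sketch of what such a re-quoting step could look like under the rules just described. The function name re_quote and its exact behaviour are assumptions of this illustration: it expects a single- or double-quoted literal including its surrounding quotes, and returns anything else unchanged.

```rust
/// Re-wrap a quoted string literal in `new_quote`. An escaped old
/// quote loses its backslash and a bare new quote gains one.
fn re_quote(s: &str, new_quote: char) -> String {
    let old_quote = match s.chars().next() {
        Some(c) if c == '\'' || c == '"' => c,
        // Not a single- or double-quoted literal; leave it untouched.
        _ => return s.to_string(),
    };
    if s.len() < 2 || !s.ends_with(old_quote) {
        return s.to_string();
    }
    // Both quotes are one byte long, so slicing them off is safe.
    let inner = &s[1..s.len() - 1];
    let mut out = String::with_capacity(s.len() + 2);
    out.push(new_quote);
    let mut chars = inner.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                // The old quote no longer needs escaping.
                Some(next) if next == old_quote && old_quote != new_quote => out.push(next),
                // Any other escape sequence is kept as written.
                Some(next) => {
                    out.push('\\');
                    out.push(next);
                }
                None => out.push('\\'),
            }
        } else if c == new_quote {
            // A bare new quote would end the literal early.
            out.push('\\');
            out.push(c);
        } else {
            out.push(c);
        }
    }
    out.push(new_quote);
    out
}
```

For example, under these rules the literal 'it\'s "ok"' re-quoted with double quotes becomes "it's \"ok\"".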
After that we are back in write_sequence_expr, where the last thing to do is write the closing parenthesis. That finishes write_new_expr, after which we are at the bottom of write_variable_decl. When we move up again to write_variable_decls we would write a semi-colon and new line to close that out. This brings us to the bottom of write_decl, _write_part, and write_part; it also brings us to the end of our example JavaScript. We didn't touch every part of how resw works, as there is a lot of surface area to cover, but hopefully this has provided enough information for you to feel confident in how it works. For more information you can check out the ressa docs and the resw docs.
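To recap the whole walk in one place, here is a self-contained sketch of the same descent over a drastically reduced AST. The Expr and Part types, the function names, and the fixed 4-space indent are all hypothetical simplifications made for this illustration; the sketch also does not reproduce the extra blank lines that resw's style choices add after a function and after each part.

```rust
/// Hypothetical, heavily reduced AST; names only echo the walkthrough.
enum Expr {
    Ident(String),
    This,
    Str(String),
    Member { object: Box<Expr>, property: Box<Expr> },
    Assign { left: Box<Expr>, right: Box<Expr> },
    New { callee: Box<Expr>, arguments: Vec<Expr> },
}

enum Part {
    Function { id: String, params: Vec<String>, body: Vec<Part> },
    Let { id: String, init: Expr },
    ExprStmt(Expr),
}

/// Descend an expression until a leaf tells us exactly what to write.
fn write_expr(e: &Expr, out: &mut String) {
    match e {
        Expr::Ident(s) | Expr::Str(s) => out.push_str(s),
        Expr::This => out.push_str("this"),
        Expr::Member { object, property } => {
            write_expr(object, out);
            out.push('.');
            write_expr(property, out);
        }
        Expr::Assign { left, right } => {
            write_expr(left, out);
            out.push_str(" = ");
            write_expr(right, out);
        }
        Expr::New { callee, arguments } => {
            out.push_str("new ");
            write_expr(callee, out);
            out.push('(');
            for (i, arg) in arguments.iter().enumerate() {
                if i > 0 {
                    out.push_str(", ");
                }
                write_expr(arg, out);
            }
            out.push(')');
        }
    }
}

/// Write one part at the given depth, recursing into function bodies.
fn write_part(p: &Part, depth: usize, out: &mut String) {
    out.push_str(&"    ".repeat(depth));
    match p {
        Part::Function { id, params, body } => {
            out.push_str(&format!("function {}({}) {{\n", id, params.join(", ")));
            for inner in body {
                write_part(inner, depth + 1, out);
            }
            out.push_str(&"    ".repeat(depth));
            out.push_str("}\n");
        }
        Part::Let { id, init } => {
            out.push_str(&format!("let {} = ", id));
            write_expr(init, out);
            out.push_str(";\n");
        }
        Part::ExprStmt(e) => {
            write_expr(e, out);
            out.push_str(";\n");
        }
    }
}

/// The two parts from the example JavaScript, built by hand.
fn example_program() -> Vec<Part> {
    let ident = |s: &str| Expr::Ident(s.to_string());
    vec![
        Part::Function {
            id: "Thing".to_string(),
            params: vec!["stuff".to_string()],
            body: vec![Part::ExprStmt(Expr::Assign {
                left: Box::new(Expr::Member {
                    object: Box::new(Expr::This),
                    property: Box::new(ident("stuff")),
                }),
                right: Box::new(ident("stuff")),
            })],
        },
        Part::Let {
            id: "thing".to_string(),
            init: Expr::New {
                callee: Box::new(ident("Thing")),
                arguments: vec![Expr::Str("'argument'".to_string())],
            },
        },
    ]
}

fn render(parts: &[Part]) -> String {
    let mut out = String::new();
    for p in parts {
        write_part(p, 0, &mut out);
    }
    out
}
```

Calling render(&example_program()) in this sketch yields the original function followed by the let statement, each written by the deepest function that knew exactly what text to emit.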
Up next we are going to see how you would use resw
to complete our debug log helper.
- Writer takes ProgramParts
- Somewhat Configurable
- Writes to impl Write
Building a Writer
Thankfully because of the existence of resw
completing the console.log
debugging tool is going to be trivial. The primary entry point for resw
is the Writer
struct, which has a method write_part
that will take a &mut self
and &ProgramPart
, so we can use that in our for loop to write out the parts as they are parsed. That might look like this.
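use resast::prelude::*;
use ressa::Parser;
use resw::Writer;

fn main() {
    // get_js and map_part are assumed to be the helpers from the
    // earlier steps of this project
    let js = get_js().expect("Unable to get JavaScript");
    let parser = Parser::new(&js).expect("Unable to construct parser");
    let mut writer = Writer::new(::std::io::stdout());
    // parts that fail to parse are skipped, the rest are mapped and written
    for part in parser.filter_map(|p| p.ok()).map(map_part) {
        writer.write_part(&part).expect("Failed to write part");
    }
}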
With that complete we can see how well it works for us. Let's use the following example JavaScript.
function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
Just as a simple test we could enter the following into our terminal
$ echo "function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
" | console_logify
function Thing(stuff) {
console.log('Thing', stuff);
this.stuff = stuff;
}
let x = new Thing('argument');
That looks exactly like the output we were looking for. Let's double check that it will behave as expected by piping the output to node
$ echo "function Thing(stuff) {
this.stuff = stuff;
}
let x = new Thing('argument');
" | console_logify | node -
Thing argument
It worked!
Demo
Conclusion
Hopefully you now have all you need to get started building your JavaScript development tools using Rust. If you do create one, please open an issue on this project's GitHub issues page with the project's name, a short description, and a link, and it will be added to the appendix.
If you run into any problems in any crates (including typos in this book) it would be wonderful of you to open an issue on GitHub.
If you want to get involved, there are probably a few open issues that could use some help. Each project does provide contributing guidelines.
- Annotated version of this presentation
- https://FreeMasen.github.io/rusty-ecma-book
- Where to find me
- email: r.f.masen@gmail.com
- website: https://WiredForge.com
- twitter/github: @FreeMasen
Appendix
Tokens
Here is a list of all of the possible tokens ress
provides
Token
EoF
Boolean
-enum BooleanLiteral
True
False
Ident
-struct Ident(String)
Keyword
-enum Keyword
Await
Break
Case
Catch
Class
Const
Continue
Debugger
Default
Delete
Do
Else
Enum
Export
Finally
For
Function
If
Implements
Import
In
InstanceOf
Interface
Let
New
Package
Private
Protected
Public
Return
Static
Super
Switch
This
Throw
Try
TypeOf
Var
Void
While
With
Yield
Null
Numeric
-struct Number(String)
0
.0
0.0
0.0e1
0.0E1
.0e1
.0E1
0xfff
0Xfff
0o777
0O777
0b111
0B111
Punct
-enum Punct
And
-&
Assign
-=
Asterisk
-*
BitwiseNot
-~
Caret
-^
CloseBrace
-}
CloseBracket
-]
CloseParen
-)
Colon
-:
Comma
-,
ForwardSlash
-/
GreaterThan
->
LessThan
-<
Minus
--
Modulo
-%
Not
-!
OpenBrace
-{
OpenBracket
-[
OpenParen
-(
Period
-.
Pipe
-|
Plus
-+
QuestionMark
-?
SemiColon
-;
Spread
-...
UnsignedRightShiftAssign
->>>=
StrictEquals
-===
StrictNotEquals
-!==
UnsignedRightShift
->>>
LeftShiftAssign
-<<=
RightShiftAssign
->>=
ExponentAssign
-**=
LogicalAnd
-&&
LogicalOr
-||
Equal
-==
NotEqual
-!=
AddAssign
-+=
SubtractAssign
--=
MultiplyAssign
-*=
DivideAssign
-/=
Increment
-++
Decrement
---
LeftShift
-<<
RightShift
->>
BitwiseAndAssign
-&=
BitwiseOrAssign
-|=
BitwiseXOrAssign
-^=
ModuloAssign
-%=
FatArrow
-=>
GreaterThanEqual
->=
LessThanEqual
-<=
Exponent
-**
String
-enum StringLit
Single(String)
Double(String)
Regex
-struct Regex
body
-String
flags
-Option<String>
Template
-enum Template
NoSub(String)
Head(String)
Middle(String)
Tail(String)
Comment
-struct Comment
kind
-enum Kind
Single
-//comment
Multi
-/* comment */
Html
-<!-- comment --> trailing content
content
-String
tail_content
-Option<String>
banned_tokens.toml
idents = [
"Int8Array",
"Uint8Array",
"Uint8ClampedArray",
"Int16Array",
"Uint16Array",
"Int32Array",
"Uint32Array",
"Float32Array",
"Float64Array",
"Promise",
"Proxy",
"async",
"padStart",
"padEnd",
"includes",
"find",
"getComputedStyle",
"FontFace",
"FontFaceSet",
"FontFaceSetLoadEvent",
"MediaSource",
"sourceBuffers",
"activeSourceBuffers",
"readyState",
"duration",
"onsourceclose",
"onsourceended",
"addSourceBuffer",
"removeSourceBuffer",
"endOfStream",
"setLiveSeekableRange",
"clearLiveSeekableRange",
"isTypeSupported",
"TouchEvent",
"Touch",
"TouchList",
"onpointerover",
"onpointerenter",
"onpointerdown",
"onpointermove",
"onpointerup",
"onpointercancel",
"onpointerout",
"onpointerleave",
"ongotpointercapture",
"onlostpointercapture",
"setPointerCapture",
"releasePointerCapture",
"MutationObserver",
]
keywords = [
"let",
"const",
"class",
"await",
"import",
"export",
"yield",
]
puncts = [
"=>",
"**",
"...",
"`",
]
strings = [
"use strict",
"sourceopen",
"touchstart",
"touchend",
"touchmove",
"touchcancel",
"pointerenter",
"pointerdown",
"pointermove",
"pointerup",
"pointercancel",
"pointerout",
"pointerleave",
"gotpointercapture",
"lostpointercapture",
"pointerover",
]
AST
While it may be a bit of a cop-out, it seems silly to duplicate the AST docs provided by cargo-doc. In the future this page may include some more introspective information but for now please refer to the link below.
StringWriter
When building resw it became clear that the only way to validate the output would be to write a bunch of files to disk and then read them back, which didn't seem like the correct option. Because of this resw includes a public module called write_str. In it you will find two structs, WriteString and ChildWriter. The basic idea here is that you can use these to simply write the values to a buffer that the resw::Writer hasn't taken ownership of and then read them back after the Writer is done. Below is an example of how you might use that.
use resw::write_str::WriteString;

fn test_round_trip() {
    let original = "let x = 0";
    let dest = WriteString::new();
    let parser = ressa::Parser::new(original).expect("Failed to create parser");
    // write_part takes &mut self, so the writer has to be mutable
    let mut writer = resw::Writer::new(dest.generate_child());
    for part in parser {
        let part = part.expect("failed to parse part");
        // write_part takes the part by reference
        writer.write_part(&part).expect("failed to write part");
    }
    assert_eq!(dest.get_string_lossy(), original.to_string());
}
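The idea behind a buffer that the writer does not own can be sketched with the standard library alone. SharedBuffer and ChildHandle below are hypothetical stand-ins written for this illustration; they are not the actual WriteString and ChildWriter, whose internals may differ. The parent keeps one handle to a reference-counted buffer and gives a second handle, which implements Write, to whoever needs a destination.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Owner side: keeps a handle to the buffer so it can be read later.
#[derive(Default)]
struct SharedBuffer {
    inner: Rc<RefCell<Vec<u8>>>,
}

/// Writer side: can be moved into anything that wants an `impl Write`.
struct ChildHandle {
    inner: Rc<RefCell<Vec<u8>>>,
}

impl SharedBuffer {
    fn new() -> Self {
        Self::default()
    }
    /// Create a writable handle that shares this buffer.
    fn child(&self) -> ChildHandle {
        ChildHandle { inner: Rc::clone(&self.inner) }
    }
    /// Read back everything written so far, replacing invalid UTF-8.
    fn contents(&self) -> String {
        let text = String::from_utf8_lossy(&self.inner.borrow()).into_owned();
        text
    }
}

impl Write for ChildHandle {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.inner.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        // Nothing is buffered beyond the shared Vec itself.
        Ok(())
    }
}
```

Because the parent never gives up its own handle, the test can hand the child to a writer by value and still inspect the text afterwards.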
Projects
name | description | website |
---|---|---|
console_logger | A utility that will insert console.log to the top of all of your function bodies | repo |
lint-ie8 | A utility that will check for any javascript that would fail when executed by Internet Explorer 8 | repo |
RESS
Scanners
In the initial implementation of the ress scanner, it was more important to get something working correctly than to have something blazing fast. To that end, the original Scanner performs a significant amount of memory allocation, which slows everything down quite a bit. To improve upon that, ress offers a second option, the RefScanner, which is a bit unfortunately named as it doesn't actually use any references. The RefScanner provides almost the same information as the Scanner but it does so without making any copies from the original JavaScript string; it then has the option to request the String for any Item, giving the control to the user. Here is an example of the two approaches.
Example JS
function things() {
return [1,2,3,4];
}
Example Rust
use ress::{
    Scanner,
    refs::RefScanner,
};

fn main() {
    let js = include_str!("../example.js");
    let scanner = Scanner::new(js);
    let ref_scanner = RefScanner::new(js);
    for (i, (item, ref_item)) in scanner.zip(ref_scanner).enumerate() {
        let prefix = if i < 10 {
            format!(" {}", i)
        } else {
            format!("{}", i)
        };
        println!("{} token: {:?}", prefix, item.token);
        println!(" ref: {:?}", ref_item.token);
    }
}
Output
0 token: Keyword(Function)
ref: Keyword(Function)
1 token: Ident(Ident("things"))
ref: Ident
2 token: Punct(OpenParen)
ref: Punct(OpenParen)
3 token: Punct(CloseParen)
ref: Punct(CloseParen)
4 token: Punct(OpenBrace)
ref: Punct(OpenBrace)
5 token: Keyword(Return)
ref: Keyword(Return)
6 token: Punct(OpenBracket)
ref: Punct(OpenBracket)
7 token: Numeric(Number("1"))
ref: Numeric(Dec)
8 token: Punct(Comma)
ref: Punct(Comma)
9 token: Numeric(Number("2"))
ref: Numeric(Dec)
10 token: Punct(Comma)
ref: Punct(Comma)
11 token: Numeric(Number("3"))
ref: Numeric(Dec)
12 token: Punct(Comma)
ref: Punct(Comma)
13 token: Numeric(Number("4"))
ref: Numeric(Dec)
14 token: Punct(CloseBracket)
ref: Punct(CloseBracket)
15 token: Punct(SemiColon)
ref: Punct(SemiColon)
16 token: Punct(CloseBrace)
ref: Punct(CloseBrace)
17 token: EoF
ref: EoF
Let's look at token 7: the original token is Token::Numeric(Number(String::from("1"))) while the ref token is Token::Numeric(Number::Dec). Both give similar information, but the ref token doesn't allocate a new string for the text being represented, instead just informing the user that it is a decimal number. If you wanted to know what that string was, you could use the RefScanner::string_for method by passing it RefItem.span; this will return an Option<String>, and so long as your span doesn't overflow the length of the JS provided, it will have the value you are looking for.
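The span lookup itself can be sketched without ress. The Span struct and the free function string_for below are hypothetical stand-ins that assume a span is a pair of byte offsets into the source; the real RefScanner::string_for may be implemented differently.

```rust
/// Hypothetical stand-in for a span of byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Copy the text a span covers, or return None when the span does not
/// fit inside `source` (or does not fall on character boundaries).
fn string_for(source: &str, span: Span) -> Option<String> {
    source.get(span.start..span.end).map(|s| s.to_string())
}
```

The allocation only happens when the caller asks for it, which is the trade the RefScanner is making: cheap tokens by default, with the text available on request.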