RESS
$slides-only$
impl Iterator for Scanner
- Converts text into
Token
s - Flat Structure
$slides-only-end$
$web-only$
Before we start on any examples let's dig a little into what
ress
does. The job of a scanner (sometimes called a tokenizer or lexer) in the parsing process is to convert raw text or bytes into logically separated parts called tokens andress
does just that. It reads your JavaScript text and then tells you what a given word or symbol might represent. It does this through theScanner
interface, to construct a scanner you pass it the text you would like it to tokenize.
$web-only-end$
#![allow(unused)] fn main() { let js = "var i = 0;"; let scanner = Scanner::new(js); }
$web-only$
Now that you have prepared a scanner, how do we use it? Well, the Scanner
implements Iterator
so we can actually use it in a for loop like so.
#![allow(unused)] fn main() { for token in scanner { println!("{:#?}", token); } }
If we were to run the above program it would print to the terminal the following. $web-only-end$
Item {
token: Keyword(
Var,
),
span: Span {
start: 0,
end: 3,
},
location: SourceLocation {
start: Position {
line: 1,
column: 1,
},
end: Position {
line: 1,
column: 4,
},
},
}
Item {
token: Ident(
Ident(
"i",
),
),
span: Span {
start: 4,
end: 5,
},
location: SourceLocation {
start: Position {
line: 1,
column: 5,
},
end: Position {
line: 1,
column: 6,
},
},
}
Item {
token: Punct(
Equal,
),
span: Span {
start: 6,
end: 7,
},
location: SourceLocation {
start: Position {
line: 1,
column: 7,
},
end: Position {
line: 1,
column: 8,
},
},
}
Item {
token: Number(
Number(
"0",
),
),
span: Span {
start: 8,
end: 9,
},
location: SourceLocation {
start: Position {
line: 1,
column: 9,
},
end: Position {
line: 1,
column: 10,
},
},
}
Item {
token: Punct(
SemiColon,
),
span: Span {
start: 9,
end: 10,
},
location: SourceLocation {
start: Position {
line: 1,
column: 10,
},
end: Position {
line: 1,
column: 11,
},
},
}
Item {
token: EoF,
span: Span {
start: 10,
end: 10,
},
location: SourceLocation {
start: Position {
line: 1,
column: 11,
},
end: Position {
line: 1,
column: 11,
},
},
}
$web-only$
The scanner's ::next()
method returns an Result<Item, Error>
the Ok
variant has 3 properties token
, span
and location
. The span
is the byte index that starts and ends the token, the location
property is the human readable location of the token, the token
property is going to be one variant of the Token
enum which has the following variants.
Token::Boolean(BooleanLiteral)
- The texttrue
orfalse
Token::Ident(Ident)
- A variable, function, or class nameToken::Null
- The textnull
Token::Keyword(Keyword)
- One of the 42 reserved words e.g.function
,var
,delete
, etcToken::Numeric(Number)
- A number literal, this can be an integer, a float, scientific notation, binary notation, octal notation, or hexadecimal notation e.g.1.5e9
,0xfff
, etcToken::Punct(Punct)
- One of the 52+ reserved symbols or combinations of symbols e.g.*
,&&
,=>
, etcToken::String(StringLit)
- Either a double or single quoted stringToken::RegEx(RegEx)
- A Regular Expression literal e.g./.+/g
Token::Template(Template)
- A template string literal e.g.one ${2} three
Token::Comment(Comment)
- A single line, multi-line or html comment
For a more in depth look at these tokens, take a look at the Appendix
Overall the output of our scanner isn't going to provide any context for these tokens, that means when we are building our development tools it is going to be a little harder to figure out what is going on with any given token. One way we could take that is to just build a tool that is only concerned with the token level of information. Say you work on a team of JavaScript developers that need to adhere to a strict code style because the organization needs their website to be usable in Internet Explorer 8. With that restriction there are a large number of APIs that are off the table, looking over this list we can see how big that really is. It could be useful to have a linter that will check for the keywords and identifiers that are not available in IE8. let's try and build one.
$web-only-end$