How to Tokenize a String in JavaScript
In JavaScript, a token is the smallest meaningful unit produced when source text is scanned character by character: numbers, operators, identifiers, and so on. If we take 10+5
as an expression, then the lexer
(the process that classifies each valid character sequence as a token) will define 10 as a Number
token, +
as a Plus
token, and 5 as another Number
token.
After all the characters have been tokenized, that is, categorized, the tokens are handed to the parser. The parser's grammar rules then combine the tokens to form the expression.
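To make the lexing step concrete, here is a minimal sketch of a hand-written lexer for expressions like 10+5. The function name lex and the token shape are our own illustration, not part of JavaScript itself:

```javascript
// Minimal lexer sketch: scan the input one character at a time and
// emit typed tokens (Number and Plus), as described above.
function lex(input) {
  var tokens = [];
  var i = 0;
  while (i < input.length) {
    var ch = input[i];
    if (/\d/.test(ch)) {
      // Collect consecutive digits into one Number token.
      var num = '';
      while (i < input.length && /\d/.test(input[i])) {
        num += input[i];
        i++;
      }
      tokens.push({ type: 'Number', value: num });
    } else if (ch === '+') {
      tokens.push({ type: 'Plus', value: ch });
      i++;
    } else {
      i++; // skip anything unrecognized, e.g. whitespace
    }
  }
  return tokens;
}

console.log(lex('10+5'));
// [ { type: 'Number', value: '10' },
//   { type: 'Plus', value: '+' },
//   { type: 'Number', value: '5' } ]
```

A real lexer would also report errors for unknown characters; this sketch simply skips them.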
Follow this article for a more detailed explanation. We will work through one example that covers the concept of a token in JavaScript.
Use the split() Method to Tokenize a String in JavaScript
We will follow the lexer-and-parser idea to define each word in the following example. The full text is first scanned into individual words separated by non-word characters; the resulting tokens could then be handed to a parser. This gives a step-by-step path for splitting a string or any expression.
An abstract syntax tree (AST) is the usual visual representation of a parsed expression. Let's dive into the code for a demonstration.
var text = 'Is not it weird to live in a world like this? It is a 42';
var words = text.toLowerCase();
// Split on runs of non-word characters, then keep only the
// tokens that are exactly two characters long.
var okay = words.split(/\W+/).filter(function (token) {
  return token.length === 2;
});
console.log(okay);
Output:
[ 'is', 'it', 'to', 'in', 'it', 'is', '42' ]
So, the text
string is converted to lowercase, and then the split()
method does the tokenizing.
The procedure is abstracted, so we cannot see the internal steps. Finally, we filter the tokens to keep only the words of a specific length.
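As a variation on the approach above (our own sketch, not part of the original example), String.prototype.match with a global \w+ pattern collects the word tokens directly, which avoids the empty strings split() can produce when the text starts or ends with punctuation:

```javascript
// Alternative tokenizer sketch: match() with a global word pattern
// returns all word tokens, or null when there are none.
var text = 'Is not it weird to live in a world like this? It is a 42';
var tokens = text.toLowerCase().match(/\w+/g) || [];
var twoChars = tokens.filter(function (token) {
  return token.length === 2;
});

console.log(tokens);   // all 15 word tokens
console.log(twoChars); // same result as the split() version
```

Both versions produce the same filtered list here; match() is simply a more direct way of saying "give me every run of word characters."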