Lexical Analyzer in C++
- Concept of Tokens in Lexical Analyzer in C++
- The Purpose of the Lexical Analyzer in C++
- Steps to Use Lexical Analyzer in C++
A lexical analyzer (also called a lexer or scanner) is a program that breaks a stream of characters into tokens and labels each token with its type. It takes as input an arbitrarily long sequence of characters, called the input string, and produces as output a sequence of tokens, called the token sequence.
The output may be the tokens themselves or just enough information (such as a type and a position) to identify each one uniquely.
Lexical analyzers are often organized into two parts: one that reads characters from the input stream (usually through a buffer) and another that groups those characters into tokens.
A lexical analyzer can also detect lexical errors in the input, such as characters that do not belong to the language or malformed numeric literals. Syntax errors, by contrast, are detected later by the parser. Outside compilers, similar tokenizers are used to find patterns in natural-language text.
Concept of Tokens in Lexical Analyzer in C++
Tokens are the smallest meaningful units of a program; they cannot be divided further without losing their meaning. Each language defines its own set of token types.
Identifiers are the names the programmer assigns to parts of a program, such as variables, functions, and parameters. They are called this because they “identify” a particular entity in the program.
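Then there are keywords, the reserved words the language uses for specific purposes and which cannot be used as identifiers. In C++, these include if, else, for, break, continue, and so on. Note that cout and cin are not keywords; they are objects declared in the <iostream> header.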
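Punctuators, such as ;, {, }, (, and ), separate and group the parts of statements and expressions. On their own they carry little meaning; they are useful only in combination with identifiers, keywords, and other tokens. Other common token types include operators, such as + and =, and literals, such as 42.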
The Purpose of the Lexical Analyzer in C++
A lexical analyzer performs the following tasks.
- Identifies the tokens in the input text stream and groups them into meaningful categories.
- Provides information about each token, such as its type and value, which helps in understanding the meaning of the input text.
- Breaks the input text into smaller units of meaning (scanning) so that the parser can analyze them more easily.
Steps to Use Lexical Analyzer in C++
Let’s discuss the steps to writing a simple lexical analyzer in C++.
- Include the header files.
- Write a function to split the sentence into tokens.
- Define tokens and rules for each token type.
- Write code to output tokens from the input sentence.
- Test and debug your code until it works correctly.
Example of a simple lexical analyzer in C++ (it expects the tokens in its input to be separated by whitespace):
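#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Reserved words recognized as keywords (a subset shared by C and C++).
const vector<string> keywords = {
    "auto", "break", "case", "char", "const", "continue", "default",
    "do", "double", "else", "enum", "extern", "float", "for",
    "goto", "if", "int", "signed", "sizeof", "static", "struct",
    "switch", "typedef", "union", "unsigned", "void", "volatile", "while"};
const vector<string> operators = {"-", "*", "=", "+", "/"};
const vector<string> parentheses = {"(", ")"};
const vector<string> brackets = {"[", "]"};

// Returns true if q appears in list.
bool contains(const vector<string>& list, const string& q) {
    return find(list.begin(), list.end(), q) != list.end();
}

// Returns true if q is a non-empty run of decimal digits, such as "67".
bool isNumber(const string& q) {
    return !q.empty() && all_of(q.begin(), q.end(), [](unsigned char c) { return isdigit(c); });
}

// Returns true if q starts with a letter or '_' and continues with letters, digits, or '_'.
bool isIdentifier(const string& q) {
    if (q.empty() || !(isalpha(static_cast<unsigned char>(q[0])) || q[0] == '_')) return false;
    return all_of(q.begin(), q.end(), [](unsigned char c) { return isalnum(c) || c == '_'; });
}

// Prints the token followed by its category.
void printout(const string& q) {
    if (contains(keywords, q))
        cout << q << " \t keyword\n";
    else if (contains(operators, q))
        cout << q << " \t operator\n";
    else if (isNumber(q))
        cout << q << " \t number\n";
    else if (contains(parentheses, q))
        cout << q << " \t parenthesis\n";
    else if (contains(brackets, q))
        cout << q << " \t separator\n";
    else if (isIdentifier(q))
        cout << q << " \t identifier\n";
    else
        cout << q << " \t unknown\n";
}

int main() {
    // Read whitespace-separated words from standard input;
    // tokens must be separated by spaces, tabs, or newlines.
    string word;
    vector<string> sample;
    while (cin >> word) {
        sample.push_back(word);
    }
    for (const auto& q : sample) printout(q);
    return 0;
}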
Muhammad Adil is a seasoned programmer and writer with experience in various fields. He has been programming for over 5 years and has always loved the thrill of solving complex problems. He is skilled in PHP, Python, C++, Java, JavaScript, Ruby on Rails, AngularJS, ReactJS, HTML5, and CSS3. He enjoys putting his experience and knowledge into words.