In programming, a lexer is a program or function that breaks input code into a sequence of tokens, the smallest meaningful units of code. Go, a popular programming language, provides a standard library package called “text/scanner” that implements a ready-made scanner for Go-like source text. In this blog, we will explore how to write a lexer in Go and the different types of tokens that can be generated.
Types of Tokens
Before we dive into the implementation of a lexer in Go, let’s take a look at the different types of tokens that can be generated; a short sketch after the list shows how the first two categories can be checked in code.
- Identifiers: Identifiers are names given to variables, functions, and other program entities. In Go, identifiers start with a letter or an underscore, followed by any combination of letters, digits, and underscores.
- Keywords: Keywords are reserved words that have a specific meaning in the language. Some examples of keywords in Go include “if”, “else”, “for”, “func”, and “return”.
- Operators: Operators are symbols that perform an operation on one or more operands. Examples of operators in Go include “+”, “-”, “*”, “/”, “&&”, and “||”.
- Literals: Literals are values that are directly represented in the code. Examples of literals in Go include integers, floating-point numbers, strings, and boolean values.
- Delimiters: Delimiters are symbols that separate or mark the beginning or end of a program entity. Examples of delimiters in Go include parentheses, braces, and commas.
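To make the first two categories concrete, the standard “go/token” package can check both rules directly. Here is a minimal sketch; the sample strings are arbitrary examples:

package main

import (
	"fmt"
	"go/token"
)

func main() {
	for _, s := range []string{"x", "_tmp", "9lives", "for", "return"} {
		// IsIdentifier reports whether s is a valid Go identifier
		// (keywords do not count as identifiers); IsKeyword reports
		// whether s is a reserved word.
		fmt.Printf("%-8s identifier=%-5v keyword=%v\n",
			s, token.IsIdentifier(s), token.IsKeyword(s))
	}
}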
Implementing a Lexer in Go
Let’s now look at how to implement a lexer in Go using the “text/scanner” package.
package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	var s scanner.Scanner
	input := "x := 5 + 3\nif x > 5 {\n fmt.Println(\"x is greater than 5\")\n}"
	s.Init(strings.NewReader(input))
	for {
		tok := s.Scan()
		if tok == scanner.EOF {
			break
		}
		// TokenText is the text of the last token; TokenString gives
		// a printable name for its type.
		fmt.Printf("Token: %s, Type: %s\n", s.TokenText(), scanner.TokenString(tok))
	}
}
In this code, we first import the necessary packages: “fmt” for printing output, “strings” for wrapping our input in a reader, and “text/scanner” for the lexer itself. We then declare a scanner of type “scanner.Scanner” and initialize it with a string reader over our input.
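Because “Init” takes any “io.Reader”, the same loop works over files and other streams, not just strings. A minimal sketch, assuming a hypothetical source file named “input.src”:

package main

import (
	"fmt"
	"log"
	"os"
	"text/scanner"
)

func main() {
	// "input.src" is a hypothetical file name.
	f, err := os.Open("input.src")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var s scanner.Scanner
	s.Init(f) // Init accepts any io.Reader
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Println(s.TokenText())
	}
}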
Next, we enter a loop where we scan for the next token using the scanner’s “Scan()” method. The method returns the token’s type as a rune: named constants such as “scanner.Ident” and “scanner.Int” for multi-character tokens, or the character itself for single-character tokens like “+”. The text of the token is available from the “TokenText()” method.
We first check for the end-of-input token, “scanner.EOF”, and break out of the loop when we reach it; otherwise we print the token’s text and a printable name for its type using “scanner.TokenString()”.
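One detail worth knowing before we look at the output: the scanner also tracks source positions, which is useful for error messages. “s.Position” holds the start position of the most recently scanned token, and setting “s.Filename” labels it. A minimal sketch, where the filename is an arbitrary example:

package main

import (
	"fmt"
	"strings"
	"text/scanner"
)

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader("x := 5 + 3"))
	s.Filename = "example.go" // any label; shows up in position strings
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		// s.Position prints as file:line:column.
		fmt.Printf("%s: %s\n", s.Position, s.TokenText())
	}
}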
Let’s run this code and see what output we get:
Token: x, Type: Ident
Token: :, Type: ":"
Token: =, Type: "="
Token: 5, Type: Int
Token: +, Type: "+"
Token: 3, Type: Int
Token: if, Type: Ident
Token: x, Type: Ident
Token: >, Type: ">"
Token: 5, Type: Int
Token: {, Type: "{"
Token: fmt, Type: Ident
Token: ., Type: "."
Token: Println, Type: Ident
Token: (, Type: "("
Token: "x is greater than 5", Type: String
Token: ), Type: ")"
Token: }, Type: "}"
As we can see, the scanner splits the input into tokens and reports a type for each one. Notice two limitations of the ready-made scanner, though: it has no notion of Go keywords, so “if” comes back as a plain Ident, and multi-character operators such as “:=” are returned as two single-character tokens. A real lexer for a Go-like language would layer that knowledge on top of the raw token stream.
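As a rough sketch of how that layering might look, the following combines “text/scanner” with “go/token” to bucket tokens into the categories from the start of this post. The category names and the delimiter set are our own choices, not anything either package defines:

package main

import (
	"fmt"
	"go/token"
	"strings"
	"text/scanner"
)

// classify buckets a text/scanner token into one of the categories
// described earlier. The category names are our own.
func classify(tok rune, text string) string {
	switch tok {
	case scanner.Ident:
		if token.IsKeyword(text) {
			return "keyword" // text/scanner itself reports keywords as Ident
		}
		return "identifier"
	case scanner.Int, scanner.Float, scanner.Char, scanner.String, scanner.RawString:
		return "literal"
	case '(', ')', '{', '}', '[', ']', ',', ';':
		return "delimiter"
	default:
		return "operator" // any other single-character token, e.g. '+', '>'
	}
}

func main() {
	var s scanner.Scanner
	s.Init(strings.NewReader("if x > 5 { return x + 1 }"))
	for tok := s.Scan(); tok != scanner.EOF; tok = s.Scan() {
		fmt.Printf("%-6s %s\n", s.TokenText(), classify(tok, s.TokenText()))
	}
}

On the sample input “if x > 5 { return x + 1 }”, this prints “if” and “return” as keywords, “x” as an identifier, “5” and “1” as literals, “>” and “+” as operators, and the braces as delimiters.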