CMPS 144 Fall 2023
Prog. Assg. #2: Arithmetic Expression Scanner
Due: 11:59pm, Oct. 18

Assignment

For the purposes of this assignment, let us say that arithmetic expressions are composed of four kinds of elements, or tokens:

  1. left parenthesis: (
  2. right parenthesis: )
  3. operator:
  4. integer literal: A non-empty and maximal[1] sequence of the digit characters ('0'..'9').

The process of identifying the substrings that correspond to meaningful units within a given string is commonly referred to as scanning, and an agent with the ability to carry out this process is called a scanner. These meaningful units are often referred to as tokens.[2] For example, an instance of the java.util.Scanner class identifies as a token within a String any non-empty maximal substring that contains no whitespace characters and is preceded and followed by whitespace characters (or occurs at the beginning or end of the string).
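To make the java.util.Scanner behavior concrete, here is a minimal sketch that collects the tokens such a scanner finds when it uses its default (whitespace) delimiter. The class name WhitespaceTokenDemo and the method name tokens are made up for this illustration; they are not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class (not part of the assignment's code).
class WhitespaceTokenDemo {

    // Returns the tokens that java.util.Scanner identifies in s
    // using its default delimiter, which is whitespace.
    static List<String> tokens(String s) {
        List<String> result = new ArrayList<>();
        Scanner sc = new Scanner(s);
        while (sc.hasNext()) {
            result.add(sc.next());
        }
        sc.close();
        return result;
    }

    public static void main(String[] args) {
        // prints [12, +, (34*5)]
        System.out.println(tokens("  12 + (34*5)  "));
    }
}
```

Notice that (34*5) comes back as a single token, because nothing inside it is whitespace. That is exactly why java.util.Scanner alone is not enough for arithmetic expressions whose tokens may be adjacent.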

If you were developing a program that analyzed strings purported to be arithmetic expressions, it would be useful to have as a tool an object that identified the relevant elements within a given string. Such an object could reasonably be called an arithmetic expression scanner. The Java class ArithExprScanner, whose design was influenced by the java.util.Scanner class, is intended to provide such objects. Your task is to complete it so that it works as intended (as described in the comments preceding each method). As given, several of its methods are only stubs[3] (and are marked as such by comments). (Also included are a few comments that provide "suggestions" that you may ignore, if you wish.)

What makes this task non-trivial is that, unlike the FPAEs that you worked with in a recent lab, here we are not assuming that the tokens in an expression are separated from each other by spaces. That is, two consecutive tokens may occur with no spaces between them. Thus, given the (nonsense) string

234+7   -*!=< ) 46(

a scanner of arithmetic expressions should be able to identify the tokens within it, namely: 234, +, 7, -, *, !=, <, ), 46, and (, in that order.
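As a rough illustration of how tokens can be found without relying on spaces, the sketch below walks through a string from left to right and collects its tokens in a list. It is not the design of ArithExprScanner, whose methods are specified by the comments in the given class. The class name TokenSketch, its method names, and in particular the operator set are assumptions made only for this example, as is the choice to throw an exception on an unrecognized character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the ArithExprScanner class itself.
class TokenSketch {

    // ASSUMED operator set, chosen for illustration only.
    // Two-character operators are listed first so that, e.g.,
    // "<=" is matched in preference to "<".
    static final String[] OPERATORS =
        { "<=", ">=", "==", "!=", "+", "-", "*", "/", "<", ">" };

    // Returns the tokens of s, in left-to-right order.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;  // whitespace separates tokens but is not itself a token
            } else if (c == '(' || c == ')') {
                tokens.add(String.valueOf(c));
                i++;
            } else if (isDigit(c)) {
                // Extend to the right to obtain a maximal run of digits.
                int j = i;
                while (j < s.length() && isDigit(s.charAt(j))) {
                    j++;
                }
                tokens.add(s.substring(i, j));
                i = j;
            } else {
                String op = operatorAt(s, i);
                if (op == null) {
                    // Assumed behavior; the handout does not say what to do here.
                    throw new IllegalArgumentException(
                        "unrecognized character at index " + i);
                }
                tokens.add(op);
                i += op.length();
            }
        }
        return tokens;
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Returns the operator that begins at index i of s, or null if none does.
    static String operatorAt(String s, int i) {
        for (String op : OPERATORS) {
            if (s.startsWith(op, i)) {
                return op;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // prints [234, +, 7, -, *, !=, <, ), 46, (]
        System.out.println(tokenize("234+7   -*!=< ) 46("));
    }
}
```

The key idea is that the kind of token is decided by the character at the current position, and only integer literals and multi-character operators require looking further to the right.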

Note that a scanner is not responsible for checking the syntactic validity of an expression; rather, it simply identifies the tokens within it, going from left to right. Typically, a parser makes use of a scanner and is responsible for determining whether or not an expression is syntactically correct (and, in the case that it is, converting it to another form, often a tree).


Footnotes

[1] By a maximal sequence of digit characters we mean one that, in the context of the expression in which it lies, is neither immediately preceded nor immediately followed by another digit character. For example, in the expression 473 - 2, the integer literals are 473 and 2, but none of 4, 47, 73, or 3 occurs as an integer literal.

[2] Technically, the word token refers to a type (or category) of element that can occur in an expression (e.g., left parenthesis, relational operator, integer literal). An occurrence of a substring falling into a particular category (e.g., "573", "<=") is rightly called a lexeme.

[3] The term stub is often used to refer to a method that basically doesn't do anything but is intended to be completed later. In the case of a functional method (i.e., one that returns a value), it is necessary to put a return statement in its body in order to make it syntactically correct.
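For instance, stubs might look like the following sketch. The class and method names here are hypothetical and are not taken from ArithExprScanner; the returned values are arbitrary placeholders.

```java
// Hypothetical class used only to illustrate what stubs look like.
class StubDemo {

    // STUB: intended to report how many tokens have been scanned so far.
    static int tokenCount() {
        return 0;  // placeholder value; needed so that the method compiles
    }

    // STUB: intended to report whether the given string is an integer literal.
    static boolean isIntegerLiteral(String s) {
        return false;  // placeholder value; needed so that the method compiles
    }

    // STUB: a void method needs no return statement, so its body can be empty.
    static void reset() {
    }
}
```

A class containing such stubs compiles and can be called by other code, even though the stubbed methods do not yet produce meaningful results.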


Program Submission

Submit your Java source code file, named ArithExprScanner.java, to the appropriate dropbox.