Stefy

stefy.avm
Class Tokenizer

java.lang.Object
  |
  +--stefy.avm.Tokenizer

public class Tokenizer
extends java.lang.Object

Tokenizer class (DD) is an advanced tokenizer similar to the standard StringTokenizer or StreamTokenizer.


Field Summary
 double dval
          Double value of the current token.
 int ival
          Integer (int) value of the current token.
 java.lang.String[] saval
          An array of strings obtained after parsing some complex tokens (e.g., XML tags).
 java.lang.String sval
          String value of the current token.
static int TT_CONTROL
          The token type denoting a control word (reserved word, keyword).
static int TT_DOUBLE
          The token type denoting a floating-point number.
static int TT_EOF
          The token type denoting the end of file.
static int TT_EOL
          The token type denoting the end of line.
static int TT_HEADER
          The token type denoting header (e.g., RFC822 (e-mail) header).
static int TT_INTEGER
          The token type denoting an integer number.
static int TT_QUOTE
          Default quote token type.
static int TT_SPC
          The token type denoting white space (not EOL).
static int TT_WORD
          The token type denoting a word.
static int TT_XMLTAG
          The token type denoting XML tag (may not be precisely according to the standard).
 int ttype
          Current token type.
 
Constructor Summary
Tokenizer(java.io.InputStream is)
          Constructs tokenizer on a given InputStream.
Tokenizer(java.io.Reader r)
          Constructs tokenizer on a Reader.
Tokenizer(java.lang.String s)
          Constructs tokenizer on a given string.
 
Method Summary
 void addControlString(java.lang.String ctrl)
          Adds a control string, usually: a keyword, reserved word, multi-character operator, and similar.
 void alphanumChars(int low, int hi)
          Specifies that all characters c in the range low <= c <= hi have "alphanum" attribute set.
 void assertToken(int ttype, java.lang.String s)
          Asserts certain token type (ttype) and the value of sval.
 void assertToken(java.lang.String ttname, java.lang.String s)
          Asserts certain token type (ttype), given its name, and the value of sval.
 int colno()
          Get the current column number of the input.
 void defaultSyntax()
          Sets the tokenizer configuration to default values.
 int defineQuotes(java.lang.String pref, java.lang.String suf, java.lang.String ttname)
          Define quotes.
 void defQuoteSub(java.lang.String pref, java.lang.String suf, java.lang.String sub, java.lang.String rep)
          Defines a substitution of a quote type.
 void emailHeaders(boolean flag)
          Activates recognition of RFC822 (e-mail) headers.
 void error(java.lang.String s)
          Report an error while parsing.
 java.lang.String getLastPreRead()
          Returns a "raw" string value composed of all characters that were read during the recognition of the last token, but before the beginning of the token.
 java.lang.String getLastRead()
          Returns a "raw" string value composed of all characters that were read during the recognition of the last token.
 int getTtype(java.lang.String ttname)
          Get ttype number.
 boolean hasMoreTokens()
          Verifies whether there are more tokens (i.e., whether the current token is not of type TT_EOF).
 int lineno()
          Get the current line number of the input.
 int nextToken()
          Reads next token.
 void ordinaryChar(int c)
           
 boolean parseDoubles(boolean flag)
          Activates recognition of floating-point numbers (default is false).
 void parseIntegers(boolean flag)
          Specifies whether this tokenizer should recognize integers.
 void parseNumbers(boolean flag)
          Specifies whether this tokenizer should recognize numbers.
 Tokenizer popConfig()
          Pops the current tokenizer configuration from an internal stack.
 Tokenizer pushConfig()
          Pushes the current tokenizer configuration to an internal stack.
 void quoteChar(int b)
          Set a character to be a quote character.
 double readDval(int ttype)
          Verifies that the current token is of type ttype, returns its dval, and reads next token.
 double readDval(java.lang.String ttname)
          Verifies that the type name of the current token is ttname, returns its dval, and reads next token.
 int readIval(int ttype)
          Verifies that the current token is of type ttype, returns its ival, and reads next token.
 int readIval(java.lang.String ttname)
          Verifies that the type name of the current token is ttname, returns its ival, and reads next token.
 java.lang.String readSval(int ttype)
          Verifies that the current token is of type ttype, returns its sval, and reads next token.
 java.lang.String readSval(java.lang.String ttname)
          Verifies that the type name of the current token is ttname, returns its sval, and reads next token.
 void readToken(int ttype, java.lang.String s)
          Reports an error if it does not see a token of type ttype, and sval equal to s.
 void readToken(java.lang.String ttname, java.lang.String s)
          Reports an error if it does not see a token with token name ttname, and sval equal to s.
 int rereadToken()
          Return to the beginning of the last read token and read a token again (possibly with different rules).
 void resetSyntax()
          Resets syntax: all characters are ordinary characters.
 boolean seeToken(int ttype, java.lang.String s)
          Check if the current token is the token ttype with the string value s.
 boolean seeToken(java.lang.String ttname, java.lang.String s)
          Check if the current token has the token type name ttname, and has the sval value s.
 void setEscapeCharacter(int i)
          Sets the escape character for control tokens.
 void setIgnore(int ttype, boolean f)
          Specifies that a token type should be ignored.
 java.lang.String toString()
          Returns the string representation of the current stream token.
 java.lang.String ttName(int tt)
          Get ttype name.
 void whitespaceChars(int low, int hi)
          Specifies that all characters c in the range low <= c <= hi have "whitespace" attribute set.
 void wordChar(int c)
          Specifies that the character c has "alpha" and "alphanum" attributes set.
 void wordChars(int low, int hi)
          Specifies that all characters c in the range low <= c <= hi have "alpha" and "alphanum" attributes set.
 void wordChars(java.lang.String s)
          Specifies that all characters c in the string s have "alpha" and "alphanum" attributes set.
 void XMLtags(boolean flag)
          Activates recognition of XML tags.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

ttype

public int ttype
Current token type. It can either be one of the TT_ constants (which are negative numbers), or a positive number if it is a character.

TT_EOF

public static final int TT_EOF
The token type denoting the end of file.

TT_SPC

public static final int TT_SPC
The token type denoting white space (not EOL).

TT_EOL

public static final int TT_EOL
The token type denoting the end of line.

TT_INTEGER

public static final int TT_INTEGER
The token type denoting an integer number.

TT_WORD

public static final int TT_WORD
The token type denoting a word.

TT_CONTROL

public static final int TT_CONTROL
The token type denoting a control word (reserved word, keyword).

TT_HEADER

public static final int TT_HEADER
The token type denoting header (e.g., RFC822 (e-mail) header).

TT_XMLTAG

public static final int TT_XMLTAG
The token type denoting XML tag (may not be precisely according to the standard).

TT_QUOTE

public static final int TT_QUOTE
Default quote token type.

TT_DOUBLE

public static final int TT_DOUBLE
The token type denoting a floating-point number.

sval

public java.lang.String sval
String value of the current token. If the token is a quoted string, then the value is "clean"; e.g., if the string was 'quoted \'string\'', then sval is: quoted 'string'

ival

public int ival
Integer (int) value of the current token.

dval

public double dval
Double value of the current token.

saval

public java.lang.String[] saval
An array of strings obtained after parsing some complex tokens (e.g., XML tags).

Constructor Detail

Tokenizer

public Tokenizer(java.lang.String s)
Constructs tokenizer on a given string.
Parameters:
s - the string

Tokenizer

public Tokenizer(java.io.InputStream is)
Constructs tokenizer on a given InputStream.
Parameters:
is - the input stream

Tokenizer

public Tokenizer(java.io.Reader r)
Constructs tokenizer on a Reader.
Parameters:
r - the reader

Method Detail

pushConfig

public Tokenizer pushConfig()
Pushes the current tokenizer configuration to an internal stack. In this way, we can temporarily change the configuration, and later pop the old one.
Returns:
this tokenizer

popConfig

public Tokenizer popConfig()
Pops the current tokenizer configuration from an internal stack.
Returns:
this tokenizer
See Also:
pushConfig()
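
The save-and-restore pattern described for pushConfig() and popConfig() can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the class ConfigStackSketch, its string-keyed setIgnore, and the helper isIgnored are hypothetical, and the "configuration" is reduced to a set of ignored token type names. It does not reproduce how stefy.avm.Tokenizer stores its configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of a configuration stack; not the stefy.avm implementation. */
class ConfigStackSketch {
    // Simplified "configuration": the names of the token types being ignored.
    private Set<String> ignored = new HashSet<>();
    // Saved configurations, most recent on top.
    private final Deque<Set<String>> saved = new ArrayDeque<>();

    /** Saves a copy of the current configuration and returns this for chaining. */
    ConfigStackSketch pushConfig() {
        saved.push(new HashSet<>(ignored));
        return this;
    }

    /** Restores the most recently saved configuration and returns this. */
    ConfigStackSketch popConfig() {
        ignored = saved.pop();
        return this;
    }

    void setIgnore(String ttname, boolean f) {
        if (f) {
            ignored.add(ttname);
        } else {
            ignored.remove(ttname);
        }
    }

    boolean isIgnored(String ttname) {
        return ignored.contains(ttname);
    }

    public static void main(String[] args) {
        ConfigStackSketch t = new ConfigStackSketch();
        t.setIgnore("EOL", true);
        t.pushConfig().setIgnore("EOL", false); // temporary change
        System.out.println(t.isIgnored("EOL"));
        t.popConfig();                          // old configuration is back
        System.out.println(t.isIgnored("EOL"));
    }
}
```

Copying the configuration on push keeps later changes from leaking into the saved state, which is what makes the temporary change safe to undo.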

setIgnore

public void setIgnore(int ttype,
                      boolean f)
Specifies that a token type should be ignored. For example, a Java-like line comment can be specified as: setIgnore( defineQuotes("//", "\n", "IGNORED"), true) and a Java-like block comment can be specified as: setIgnore( defineQuotes("/*", "*/", "IGNORED"), true)
Parameters:
ttype - token type
f - boolean flag (true means ignore)
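
For comparison, the standard java.io.StreamTokenizer has built-in switches for skipping the same two comment styles. The sketch below uses only that standard class, not stefy.avm.Tokenizer; the class name IgnoreCommentsSketch and the method words are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Comment skipping with the standard StreamTokenizer, shown for comparison only. */
class IgnoreCommentsSketch {
    /** Returns the word tokens of src, with // and block comments skipped. */
    static List<String> words(String src) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(src));
        st.slashSlashComments(true); // skip "// ..." up to the end of the line
        st.slashStarComments(true);  // skip "/* ... */"
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("alpha // note\nbeta /* skip me */ gamma"));
    }
}
```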

resetSyntax

public void resetSyntax()
Resets syntax: all characters are ordinary characters.

defaultSyntax

public void defaultSyntax()
Sets the tokenizer configuration to default values. In the default state, all characters from 0 to ' ' are white space, A-Za-z_ are word characters, 0-9 are additional alphanumeric characters, numbers are recognized, the quotes '..' and ".." are recognized, and EOL and SPC (white space) tokens are ignored.

More precisely, the method consists of the following instructions:

config.reset();
whitespaceChars(0, ' ');
wordChars('A', 'Z');
wordChars('a', 'z');
wordChar('_');
alphanumChars('0', '9');
parseNumbers(true);
defineQuotes("'", "'", null);
defQuoteSub ("'", "'", "\\'", "'");
defQuoteSub ("'", "'", "\\\\", "\\");
defineQuotes("\"", "\"", null);
defQuoteSub ("\"", "\"", "\\\"", "\"");
defQuoteSub ("\"", "\"", "\\\\", "\\");
defQuoteSub ("\"", "\"", "\\n", "\n");
defQuoteSub ("\"", "\"", "\\b", "\b");
defQuoteSub ("\"", "\"", "\\f", "\f");
defQuoteSub ("\"", "\"", "\\r", "\r");
defQuoteSub ("\"", "\"", "\\t", "\t");
setIgnore(TT_EOL, true);
setIgnore(TT_SPC, true);

setEscapeCharacter

public void setEscapeCharacter(int i)
Sets the escape character for control tokens. For example, escape character `\' enables tokenizer to recognize tokens: \test, \12cm, but also \> and \+. Another example is `$': $test, $>, $$.
Parameters:
i - the ASCII value of the escape character, -1 for not having an escape character.
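
The examples above suggest one plausible recognition rule, sketched here as a stand-alone method: the escape character followed by a run of letters and digits, or else by a single other character. This rule is an assumption inferred from the examples (\test, \12cm, \>, $$), and the names EscapeTokenSketch and controlToken are hypothetical. The actual rule used by the tokenizer may differ.

```java
/** Hypothetical model of escape-prefixed control tokens; the rule is an assumption. */
class EscapeTokenSketch {
    /**
     * Returns the control token starting at pos, or null if input does not
     * have the escape character esc at pos followed by at least one character.
     */
    static String controlToken(String input, int pos, char esc) {
        if (pos >= input.length() || input.charAt(pos) != esc) {
            return null;
        }
        int i = pos + 1;
        if (i >= input.length()) {
            return null; // a lone escape character at the end of the input
        }
        if (Character.isLetterOrDigit(input.charAt(i))) {
            // Assumed rule 1: take the whole run of letters and digits.
            while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
                i++;
            }
        } else {
            // Assumed rule 2: take exactly one non-alphanumeric character.
            i++;
        }
        return input.substring(pos, i);
    }

    public static void main(String[] args) {
        System.out.println(controlToken("\\12cm+", 0, '\\'));
        System.out.println(controlToken("$$$", 0, '$'));
    }
}
```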

addControlString

public void addControlString(java.lang.String ctrl)
Adds a control string, usually: a keyword, reserved word, multi-character operator, and similar. Examples: if, then, but also: :=, <-, :-, ==, <=, -->
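
When several control strings share a prefix (for example = and ==), a tokenizer has to choose between them. A common choice is the longest match, sketched below. Whether stefy.avm.Tokenizer uses longest match is an assumption, and the class ControlStringSketch and its match method are hypothetical. The sketch also ignores word boundaries, so a keyword such as if would match at the start of iffy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical longest-match lookup of control strings; not the library's code. */
class ControlStringSketch {
    private final List<String> controls = new ArrayList<>();

    void addControlString(String ctrl) {
        controls.add(ctrl);
    }

    /** Returns the longest registered control string starting at pos, or null. */
    String match(String input, int pos) {
        String best = null;
        for (String c : controls) {
            if (input.startsWith(c, pos) && (best == null || c.length() > best.length())) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ControlStringSketch t = new ControlStringSketch();
        t.addControlString("=");
        t.addControlString("==");
        t.addControlString("-->");
        System.out.println(t.match("a == b", 2));
        System.out.println(t.match("x --> y", 2));
    }
}
```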

emailHeaders

public void emailHeaders(boolean flag)
Activates recognition of RFC822 (e-mail) headers. No guarantee that it completely complies with the standard.
Parameters:
flag - true to activate, false to deactivate.

XMLtags

public void XMLtags(boolean flag)
Activates recognition of XML tags. No guarantee that it completely complies with the standard.
Parameters:
flag - true to activate, false to deactivate.

parseDoubles

public boolean parseDoubles(boolean flag)
Activates recognition of floating-point numbers (default is false).
Parameters:
flag - true to activate, false to deactivate.
Returns:
the old value of the flag

ttName

public java.lang.String ttName(int tt)
Get ttype name.
Parameters:
tt - a token type (ttype)
Returns:
the name of the token type

getTtype

public int getTtype(java.lang.String ttname)
Get ttype number.
Parameters:
ttname - a token type name
Returns:
the token type number (ttype); -7 if not valid.

quoteChar

public void quoteChar(int b)
Set a character to be a quote character. For example, quoteChar('\'') defines the apostrophe to be a quote character. You may want to invoke: defQuoteSub(""+(char)b, ""+(char)b, "\\"+(char)b, ""+(char)b) afterwards.
Parameters:
b - the quote character

defineQuotes

public int defineQuotes(java.lang.String pref,
                        java.lang.String suf,
                        java.lang.String ttname)
Define quotes.
Parameters:
pref - is the prefix string
suf - is the suffix string
ttname - token type name, if null, the generic QUOTE is used
Returns:
the token type assigned to the quotes

defQuoteSub

public void defQuoteSub(java.lang.String pref,
                        java.lang.String suf,
                        java.lang.String sub,
                        java.lang.String rep)
Defines a substitution of a quote type. Examples: defQuoteSub("\"","\"","\\\"","\"") and defQuoteSub("\"","\"","\\n","\n"). It calls defineQuotes(pref,suf,null).
Parameters:
pref - the quote prefix string
suf - the quote suffix string
sub - the string to be substituted
rep - the replacement
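
The effect of substitutions on the "clean" sval of a quoted token can be modelled as follows. This is a minimal sketch, assuming that a substitution is tried before the closing suffix at each position, so that an escaped quote does not end the token. The class QuoteSubSketch and the methods defSub and clean are hypothetical and are not part of stefy.avm.Tokenizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of quote substitutions; not the stefy.avm implementation. */
class QuoteSubSketch {
    // Substitutions (sub -> rep) for one quote type, in definition order.
    private final Map<String, String> subs = new LinkedHashMap<>();

    void defSub(String sub, String rep) {
        subs.put(sub, rep);
    }

    /**
     * Scans input from start (the position right after the opening quote) and
     * returns the cleaned value up to the first unsubstituted occurrence of suf.
     */
    String clean(String input, int start, String suf) {
        StringBuilder out = new StringBuilder();
        int i = start;
        scan:
        while (i < input.length()) {
            // Assumption: substitutions take priority over the closing suffix.
            for (Map.Entry<String, String> e : subs.entrySet()) {
                if (input.startsWith(e.getKey(), i)) {
                    out.append(e.getValue());
                    i += e.getKey().length();
                    continue scan;
                }
            }
            if (input.startsWith(suf, i)) {
                return out.toString();
            }
            out.append(input.charAt(i++));
        }
        throw new IllegalArgumentException("unterminated quote");
    }

    public static void main(String[] args) {
        QuoteSubSketch q = new QuoteSubSketch();
        q.defSub("\\'", "'");   // \' becomes '
        q.defSub("\\\\", "\\"); // \\ becomes \
        // Input text: 'quoted \'string\''
        System.out.println(q.clean("'quoted \\'string\\''", 1, "'"));
    }
}
```

With the two substitutions above, the input 'quoted \'string\'' yields quoted 'string', matching the example given for the sval field.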

wordChar

public void wordChar(int c)
Specifies that the character c has "alpha" and "alphanum" attributes set. A word token consists of an "alpha" character followed by zero or more "alphanum" characters.
Parameters:
c - the character.
See Also:
wordChars(String), wordChars(int, int), sval, TT_WORD, ttype

ordinaryChar

public void ordinaryChar(int c)

wordChars

public void wordChars(java.lang.String s)
Specifies that all characters c in the string s have "alpha" and "alphanum" attributes set. A word token consists of an "alpha" character followed by zero or more "alphanum" characters.
Parameters:
s - the string of characters.
See Also:
wordChar(int), wordChars(int, int), sval, TT_WORD, ttype

wordChars

public void wordChars(int low,
                      int hi)
Specifies that all characters c in the range low <= c <= hi have "alpha" and "alphanum" attributes set. A word token consists of an "alpha" character followed by zero or more "alphanum" characters.
Parameters:
low - the low end of the range.
hi - the high end of the range.

whitespaceChars

public void whitespaceChars(int low,
                            int hi)
Specifies that all characters c in the range low <= c <= hi have "whitespace" attribute set.
Parameters:
low - the low end of the range.
hi - the high end of the range.

alphanumChars

public void alphanumChars(int low,
                          int hi)
Specifies that all characters c in the range low <= c <= hi have "alphanum" attribute set.
Parameters:
low - the low end of the range.
hi - the high end of the range.
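
The rule "a word token consists of an alpha character followed by zero or more alphanum characters" can be modelled with a per-character attribute table. The sketch below is an illustration only: the class CharAttrSketch, the bit-flag layout, the 256-entry table, and the method word are assumptions, not the internals of stefy.avm.Tokenizer.

```java
/** Hypothetical character-attribute table; not the stefy.avm implementation. */
class CharAttrSketch {
    static final int ALPHA = 1;    // may start a word
    static final int ALPHANUM = 2; // may continue a word
    private final int[] attr = new int[256]; // assumed table size

    void wordChars(int low, int hi) {
        for (int c = Math.max(low, 0); c <= hi && c < attr.length; c++) {
            attr[c] |= ALPHA | ALPHANUM;
        }
    }

    void alphanumChars(int low, int hi) {
        for (int c = Math.max(low, 0); c <= hi && c < attr.length; c++) {
            attr[c] |= ALPHANUM;
        }
    }

    private boolean has(char c, int flag) {
        return c < attr.length && (attr[c] & flag) != 0;
    }

    /** Returns the word starting at pos, or null if no word starts there. */
    String word(String s, int pos) {
        if (pos >= s.length() || !has(s.charAt(pos), ALPHA)) {
            return null;
        }
        int i = pos + 1;
        while (i < s.length() && has(s.charAt(i), ALPHANUM)) {
            i++;
        }
        return s.substring(pos, i);
    }

    public static void main(String[] args) {
        CharAttrSketch t = new CharAttrSketch();
        t.wordChars('A', 'Z');
        t.wordChars('a', 'z');
        t.wordChars('_', '_');
        t.alphanumChars('0', '9');
        System.out.println(t.word("x1_y + 2", 0)); // digits may continue a word
        System.out.println(t.word("9lives", 0));   // but may not start one
    }
}
```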

parseIntegers

public void parseIntegers(boolean flag)
Specifies whether this tokenizer should recognize integers. If the argument is true, the characters:
0 1 2 3 4 5 6 7 8 9
get the "digit" attribute set, and their "alpha" attribute (starting a word) is reset.

If the argument is false, then all characters get the attribute "digit" reset.

For all integers, the values of the fields ival and sval are set, and the ttype gets value TT_INTEGER.

Parameters:
flag - true indicates that integers are parsed; false indicates that integers are not parsed.
See Also:
sval, TT_INTEGER, ttype, parseNumbers(boolean)

parseNumbers

public void parseNumbers(boolean flag)
Specifies whether this tokenizer should recognize numbers. It calls parseIntegers and parseDoubles.
Parameters:
flag - true indicates that numbers are parsed; and false indicates that numbers are not parsed.

lineno

public int lineno()
Get the current line number of the input.
Returns:
the current line number.

colno

public int colno()
Get the current column number of the input.
Returns:
the current column number.

rereadToken

public int rereadToken()
                throws java.io.IOException
Return to the beginning of the last read token and read a token again (possibly with different rules).
Returns:
the token
Throws:
java.io.IOException - in case of an error

getLastPreRead

public java.lang.String getLastPreRead()
Returns a "raw" string value composed of all characters that were read during the recognition of the last token, but before the beginning of the token. It is usually a string of white-space characters.
Returns:
the last read raw string value

getLastRead

public java.lang.String getLastRead()
Returns a "raw" string value composed of all characters that were read during the recognition of the last token.
Returns:
the last read raw string value

nextToken

public int nextToken()
              throws java.io.IOException
Reads next token.
Returns:
the token
Throws:
java.io.IOException - in case of an error

hasMoreTokens

public boolean hasMoreTokens()
Verifies whether there are more tokens (i.e., whether the current token is not of type TT_EOF).
Returns:
true if there are more tokens (ttype != TT_EOF), false otherwise

seeToken

public boolean seeToken(int ttype,
                        java.lang.String s)
Check if the current token is the token ttype with the string value s. If the string is null, it will not be compared.
Parameters:
ttype - token type
s - the string value (if null, it is not compared)
Returns:
true if the current token is the given token, false otherwise.

seeToken

public boolean seeToken(java.lang.String ttname,
                        java.lang.String s)
Check if the current token has the token type name ttname, and has the sval value s. If s is null, it will not be compared.
Parameters:
ttname - token type name
s - the string value (if null, it is not compared)
Returns:
true if the current token is the given token, false otherwise.

readToken

public void readToken(int ttype,
                      java.lang.String s)
               throws java.io.IOException
Reports an error if it does not see a token of type ttype, and sval equal to s. Otherwise, goes to the next token.
Parameters:
ttype - expected token type
s - expected sval

readToken

public void readToken(java.lang.String ttname,
                      java.lang.String s)
               throws java.io.IOException
Reports an error if it does not see a token with token name ttname, and sval equal to s. Otherwise, it goes to the next token.
Parameters:
ttname - expected token type name
s - expected sval

assertToken

public void assertToken(int ttype,
                        java.lang.String s)
                 throws java.io.IOException
Asserts certain token type (ttype) and the value of sval.
Parameters:
ttype - expected token type
s - expected sval, if null then it is not checked.

assertToken

public void assertToken(java.lang.String ttname,
                        java.lang.String s)
                 throws java.io.IOException
Asserts certain token type (ttype), given its name, and the value of sval.
Parameters:
ttname - expected token type name
s - expected sval, if null then it is not checked.

readSval

public java.lang.String readSval(int ttype)
                          throws java.io.IOException
Verifies that the current token is of type ttype, returns its sval, and reads next token.
Parameters:
ttype - expected token type
Returns:
sval

readSval

public java.lang.String readSval(java.lang.String ttname)
                          throws java.io.IOException
Verifies that the type name of the current token is ttname, returns its sval, and reads next token.
Parameters:
ttname - expected token type name
Returns:
sval
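
The seeToken, readToken and readSval methods are the usual building blocks of a hand-written recursive-descent parser. The stand-alone model below shows the pattern over a pre-built token list. It is a sketch under assumptions: TokenCursorSketch, Tok and parseLet are hypothetical, the type names "WORD", "CONTROL" and "EOF" are assumed, and the model throws an exception where the real class reports an error through error().

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the see/read pattern; not the stefy.avm implementation. */
class TokenCursorSketch {
    static final class Tok {
        final String type;
        final String sval;
        Tok(String type, String sval) {
            this.type = type;
            this.sval = sval;
        }
    }

    private final List<Tok> toks; // must end with an "EOF" token
    private int pos = 0;

    TokenCursorSketch(List<Tok> toks) {
        this.toks = toks;
    }

    /** Checks the current token without consuming it; a null s is not compared. */
    boolean seeToken(String ttname, String s) {
        Tok t = toks.get(pos);
        return t.type.equals(ttname) && (s == null || s.equals(t.sval));
    }

    /** Fails unless the expected token is current, then moves to the next token. */
    void readToken(String ttname, String s) {
        if (!seeToken(ttname, s)) {
            throw new IllegalStateException("expected " + ttname + " " + s + " at token " + pos);
        }
        advance();
    }

    /** Fails unless the current token has the given type, returns its sval, and advances. */
    String readSval(String ttname) {
        if (!seeToken(ttname, null)) {
            throw new IllegalStateException("expected " + ttname + " at token " + pos);
        }
        String v = toks.get(pos).sval;
        advance();
        return v;
    }

    private void advance() {
        if (pos < toks.size() - 1) {
            pos++; // never move past the final EOF token
        }
    }

    /** Parses the token sequence "let NAME = VALUE" and returns "NAME=VALUE". */
    static String parseLet(TokenCursorSketch t) {
        t.readToken("CONTROL", "let");
        String name = t.readSval("WORD");
        t.readToken("CONTROL", "=");
        String value = t.readSval("WORD");
        return name + "=" + value;
    }

    public static void main(String[] args) {
        TokenCursorSketch t = new TokenCursorSketch(Arrays.asList(
                new Tok("CONTROL", "let"), new Tok("WORD", "x"),
                new Tok("CONTROL", "="), new Tok("WORD", "y"),
                new Tok("EOF", null)));
        System.out.println(parseLet(t));
    }
}
```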

readIval

public int readIval(int ttype)
             throws java.io.IOException
Verifies that the current token is of type ttype, returns its ival, and reads next token.
Parameters:
ttype - expected token type
Returns:
ival

readIval

public int readIval(java.lang.String ttname)
             throws java.io.IOException
Verifies that the type name of the current token is ttname, returns its ival, and reads next token. The name "NUMBER" is special since it allows for INTEGER as well as DOUBLE type.
Parameters:
ttname - expected token type name
Returns:
ival

readDval

public double readDval(int ttype)
                throws java.io.IOException
Verifies that the current token is of type ttype, returns its dval, and reads next token.
Parameters:
ttype - expected token type
Returns:
dval

readDval

public double readDval(java.lang.String ttname)
                throws java.io.IOException
Verifies that the type name of the current token is ttname, returns its dval, and reads next token. The name "NUMBER" is special since it allows for INTEGER as well as DOUBLE type.
Parameters:
ttname - expected token type name
Returns:
dval

toString

public java.lang.String toString()
Returns the string representation of the current stream token.
Overrides:
toString in class java.lang.Object
Returns:
a string representation of the token specified by the ttype, nval, and sval fields.
See Also:
java.io.StreamTokenizer#nval, java.io.StreamTokenizer#sval, java.io.StreamTokenizer#ttype

error

public void error(java.lang.String s)
Report an error while parsing. It uses ErrorHandler class to report the error and exit.
Parameters:
s - the error message.


Copyright 1998-2004 Vlado Keselj. All Rights Reserved.