/MEng/System/Runtime/StringTokenizer

ClassPath:MEng.System.Runtime.StringTokenizer
Parent ClassPath:MEng.Object
Copyable:No
Final:Yes

MEng.System.Runtime.StringTokenizer provides tools for breaking strings up into 'tokens', i.e. shorter strings that are delimited by defined 'white space' characters, which are characters that are not part of the content. This class provides two basic tokenizing mechanisms.

The first tokenizing scheme allows you to put a string into the tokenizer, along with the set of white space characters that should delimit the tokens. You can then iterate through the tokens using a 'get next' call, until there are no more tokens to get.

The other scheme is provided by the ParseCmdLine() method, which parses a format that is commonly used for command lines to programs. It breaks out all of the tokens at once and fills in a vector of strings with the resulting parameter values. The rules for this type of command line format are:

  • Whitespace is a delimiter for parameters, so:
          myprogram foo bar
    will be broken into three parameters.
  • Use single or double quotes if a parameter has spaces, so:
          myprogram "parm one" parmtwo
    will be broken into three parameters.
  • Quotes are only delimiters, so they are only special when there is whitespace before and after the quoted parameter, so:
          myprogram Parm'one two
    will only have two parameters after the program name, which are "Parm'one" and "two".
  • To have a quote literally around a parameter, surround it with the other type of quote, like this:
          myprogram '"A quoted String"'
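The rules above can be modeled in a few lines. The following is a minimal Python sketch, not the CML implementation: the function name parse_cmd_line is hypothetical, and raising ValueError for an unterminated quote is an assumption (the actual class throws its own StrTokErrors values, and which one applies here is not stated on this page).

```python
def parse_cmd_line(to_parse):
    """Illustrative model of the documented command line rules."""
    params = []
    i, n = 0, len(to_parse)
    while i < n:
        ch = to_parse[i]
        if ch.isspace():
            # Whitespace only separates parameters.
            i += 1
        elif ch in "'\"":
            # A quote at the start of a parameter opens a quoted parameter.
            # It is closed by the same quote character followed by
            # whitespace or the end of the line.
            j = i + 1
            while j < n:
                if to_parse[j] == ch and (j + 1 == n or to_parse[j + 1].isspace()):
                    break
                j += 1
            if j >= n:
                # Assumption: an unterminated quote is treated as an error.
                raise ValueError("unterminated quoted parameter")
            params.append(to_parse[i + 1:j])
            i = j + 1
        else:
            # Unquoted parameter: runs to the next whitespace. A quote
            # in the middle of it is ordinary text.
            j = i
            while j < n and not to_parse[j].isspace():
                j += 1
            params.append(to_parse[i:j])
            i = j
    return params
```

In this model, the four example lines above produce three, three, three and two entries respectively, with the program name counted as the first entry in each case.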

It's important to keep the two schemes separate. The iteration scheme works on the string that you set on the tokenizer, whereas the command line parser works on a string that you pass to the method, and doesn't affect the state of the tokenizer at all.

 

Nested Classes:

Enum=StrTokErrors
    BadCmdLine    : "";
    TokenizeErr   : "";
EndEnum;

This enumerated type defines the errors that are thrown from this class. Some don't have any text because they just get the underlying C++ error text.


VectorOf[String]  ParmStrList;

This is a vector of String objects, used to return lists of tokens.

 

Literals:

String kDefWhitespace("\r\n\t\a\f\v ");

As a convenience, this literal provides a default string of white space characters that can be passed to the tokenizer to use in string tokenizing, since white space is a very common delimiter scheme.

 

Constructors:

Constructor();

There is just a default constructor available, which creates a tokenizer that needs to be set up with a string to tokenize.

 

Final, Const Methods:

MoreTokens() Returns MEng.Boolean;

Returns True if there are more tokens available, else False.

ParseCmdLine([In] MEng.String ToParse, [Out] VectorOf[String] ToFill);

Parses the passed command line, ToParse, and breaks out the parameters it holds, putting them into the passed vector of strings. If no parameters are in the string, the vector will be empty. Check the element count on the vector to see how many parameters were parsed out.

This method does not affect the state of this object at all, since it works on the passed in string and returns its results in the output parameter. So it is separate from the iteration scheme and can be used without affecting any ongoing iteration.

See the class comments above on this page for the format rules of command lines that this method parses.

PeekRestOfLine([Out] String ToFill) Returns Boolean;

Non-destructively returns the text from the start of the next token (the one that would be returned right now if you called GetNextToken()), to the end of the input text string. It returns True if there was any text to get, else it returns False to indicate that you were already at the end of the string. Whitespace removal is not done; you just get the rest of the line as is.

If the return is True, then you are not at the end of the input line and you can continue to get more tokens if you want.

Tokenize([In] String ToTokenize, [In] String Whitespace, [Out] ParmStrList ToFill) Returns Boolean;

Fully tokenizes a string all in one pass, putting the resulting tokens into the passed vector of strings. The Whitespace parameter indicates the characters that should be considered separators. Any vector of strings can be passed; ParmStrList is just used to define the parameter type.
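As a rough illustration of one-pass tokenizing, here is a minimal Python sketch. It is a model only, not the CML method: the name tokenize and the returned list are hypothetical stand-ins for the output parameter, DEF_WHITESPACE simply mirrors the kDefWhitespace literal above, and collapsing runs of separators (so that no empty tokens are produced) is an assumption.

```python
# Mirrors the kDefWhitespace literal documented above.
DEF_WHITESPACE = "\r\n\t\a\f\v "

def tokenize(to_tokenize, whitespace=DEF_WHITESPACE):
    """Split to_tokenize on any character found in whitespace."""
    tokens = []
    current = []
    for ch in to_tokenize:
        if ch in whitespace:
            # A separator ends the token in progress, if there is one.
            # Assumption: consecutive separators yield no empty tokens.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens
```

Passing a different separator string, such as "," for a comma separated list, changes only what counts as a delimiter; the rest of the logic is the same.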

 

Final, Non-Const Methods:

GetNextToken([Out] MEng.String ToFill) Returns MEng.Boolean;
EatNextToken() Returns MEng.Boolean;

Destructively gets or eats the next token, if there is another one available. If getting, it copies the token text into the passed ToFill parameter. If eating, it just skips the next token without returning its text. In either case, if there was another token, it returns True, else it returns False to indicate that the token iteration is complete.

GetRestOfLine([Out] String ToFill) Returns Boolean;
PeekRestOfLine([Out] String ToFill) Returns Boolean;

Destructively or non-destructively returns the text from the start of the next token (the one that would be returned right now if you called GetNextToken()), to the end of the input text string. It returns True if there was some text to get, else it returns False to indicate that you were already at the end of the string. Whitespace removal is not done; you just get the rest of the line as is.

If this call returns True, then you were not already at the end of the line and some text was returned. After a Get, all of the input has been consumed and no more tokens will be available. After a Peek, the current position is unchanged.

Reset();

If you hit the end of the available tokens and want to iterate back over the same string again, you can call Reset() and start iterating from the beginning again.

Set([In] MEng.String ToIterate, [In] MEng.String Whitespace);

Sets up the tokenizer to iterate through the tokens of the ToIterate string. It makes a copy of the string to iterate, so the original is unchanged. The Whitespace string indicates the characters that you want to be considered delimiters, i.e. that separate the tokens.
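The iteration methods described in this section can be pictured with a small Python model. This is a sketch under stated assumptions, not the CML class: the class name TokenizerModel and its method names are hypothetical, output parameters are replaced by return values (None standing in for a False return), and treating a line with only whitespace left as "at the end" is an assumption.

```python
class TokenizerModel:
    """Illustrative model of the iteration scheme only."""

    def __init__(self):
        self._text = ""
        self._white = ""
        self._pos = 0

    def set(self, to_iterate, whitespace):
        # Python strings are immutable, so the caller's string is untouched.
        self._text = to_iterate
        self._white = whitespace
        self._pos = 0

    def reset(self):
        # Start iterating over the same string again.
        self._pos = 0

    def _token_start(self):
        # Index of the first non-whitespace character at or after _pos.
        i = self._pos
        while i < len(self._text) and self._text[i] in self._white:
            i += 1
        return i

    def more_tokens(self):
        return self._token_start() < len(self._text)

    def get_next_token(self):
        # Destructive: advances past the token that is returned.
        start = self._token_start()
        end = start
        while end < len(self._text) and self._text[end] not in self._white:
            end += 1
        self._pos = end
        return self._text[start:end] if end > start else None

    def eat_next_token(self):
        # Skips the next token and reports only whether there was one.
        return self.get_next_token() is not None

    def peek_rest_of_line(self):
        # Non-destructive: the position is not changed.
        start = self._token_start()
        return self._text[start:] if start < len(self._text) else None

    def get_rest_of_line(self):
        # Destructive: consumes all remaining input.
        rest = self.peek_rest_of_line()
        self._pos = len(self._text)
        return rest


tok = TokenizerModel()
tok.set("  set volume  to 11 ", " \t")
first = tok.get_next_token()       # the first token
skipped = tok.eat_next_token()     # skips the second token
peeked = tok.peek_rest_of_line()   # rest of line, position unchanged
rest = tok.get_rest_of_line()      # same text, but the input is now consumed
```

Note that the rest-of-line text keeps its internal and trailing whitespace, since only the whitespace before the next token is skipped.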

SetWhitespace([In] MEng.String Whitespace);

Changes the whitespace list. This can be done in the middle of tokenizing an input string, so that you can change the whitespace list as required for the format of the string you are parsing.
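To show why changing the whitespace list mid-stream is useful, here is a minimal Python sketch. The helper next_token and the sample input are hypothetical, and the exact behavior of the CML class at the point where the list changes is an assumption; the sketch simply applies whatever list is current when the next token is requested.

```python
def next_token(text, pos, whitespace):
    """Return (token, new_pos); token is None when no tokens remain."""
    n = len(text)
    # Skip separators in front of the token, using the current list.
    while pos < n and text[pos] in whitespace:
        pos += 1
    if pos >= n:
        return None, pos
    end = pos
    while end < n and text[end] not in whitespace:
        end += 1
    return text[pos:end], end


line = "SET volume=11"

# The command word is delimited by spaces only.
cmd, pos = next_token(line, 0, " ")

# The remainder is a key=value pair, so '=' is added to the list.
key, pos = next_token(line, pos, " =")
value, pos = next_token(line, pos, " =")
```

The space is kept in the second list because the position is still in front of the space that followed the command word; dropping it would make that space part of the next token in this model.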