PHP Class pQuery\TokenizerBase

Can convert any string into tokens. The base class only supports identifier/whitespace tokens; to support more token types, the class can easily be extended. Use like:

  <?php
    $a = new TokenizerBase($doc);
    while ($a->next() !== $a::TOK_NULL) {
      echo $a->token, ': ', $a->getTokenString(), "\n";
    }
  ?>
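Below is a minimal, self-contained sketch of the identifier/whitespace tokenizing loop described above. MiniTokenizer is a hypothetical stand-in and not pQuery's implementation. The TOK_* values and the character classes are assumptions: ASCII letters, digits and underscore count as identifier characters, and space, tab, CR and LF count as whitespace.

```php
<?php
// Hypothetical, simplified sketch of what TokenizerBase describes; NOT
// pQuery's code. Token constant values and character classes are assumptions.
class MiniTokenizer {
    const TOK_NULL = 0;
    const TOK_UNKNOWN = 1;
    const TOK_WHITESPACE = 2;
    const TOK_IDENTIFIER = 3;

    public $doc;
    public $pos = -1;           // index of the last character of the current token
    public $token_start = null; // index of the first character of the current token
    public $token = self::TOK_NULL;

    public function __construct(string $doc) {
        $this->doc = $doc;
    }

    private static function isWhitespace(string $c): bool {
        return strpos(" \t\r\n", $c) !== false;
    }

    private static function isIdentifier(string $c): bool {
        return ctype_alnum($c) || $c === '_';
    }

    // Advance to the next token and return its type (TOK_NULL at end of input).
    public function next(): int {
        $size = strlen($this->doc);
        if (++$this->pos >= $size) {
            $this->pos = $size;
            $this->token_start = null;
            return $this->token = self::TOK_NULL;
        }
        $this->token_start = $this->pos;
        $c = $this->doc[$this->pos];
        if (self::isWhitespace($c)) {
            $type = self::TOK_WHITESPACE;
        } elseif (self::isIdentifier($c)) {
            $type = self::TOK_IDENTIFIER;
        } else {
            return $this->token = self::TOK_UNKNOWN; // single unmapped character
        }
        // Consume the whole run of characters of the same class.
        while ($this->pos + 1 < $size) {
            $n = $this->doc[$this->pos + 1];
            $same = ($type === self::TOK_WHITESPACE) ? self::isWhitespace($n) : self::isIdentifier($n);
            if (!$same) {
                break;
            }
            $this->pos++;
        }
        return $this->token = $type;
    }

    public function getTokenString(): string {
        return substr($this->doc, $this->token_start, $this->pos - $this->token_start + 1);
    }
}

$a = new MiniTokenizer('hello world!');
$tokens = [];
while ($a->next() !== $a::TOK_NULL) {
    $tokens[] = [$a->token, $a->getTokenString()];
}
```

Each call to next() consumes one run of same-class characters. The loop ends once next() reports TOK_NULL.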
Project: tburry/pquery

Public Properties

Property Type Description
$char_map array Automatically built character map. Built using $identifiers, $whitespace and $custom_char_map
$custom_char_map array All characters, other than whitespace and identifier characters, that should be mapped to a token or function
$doc string The document that is being tokenized
$errors array All errors found while parsing the document
$identifiers array | string List of all characters that can be considered identifier characters
$line_pos Current (Line/Column) position in document
$pos integer Current (character) position in the document
$size integer The size of the document (length of string)
$token integer Current token
$token_start integer Start position of token. If NULL, then current position is used.
$whitespace array | string List of all characters that can be considered whitespace

Public Methods

Method Description
__construct ( string $doc = '', integer $pos ) Class constructor
addError ( string $error ) Adds an error to the $errors array and appends the current position
getDoc ( ) : string Returns target document
getIdentifiers ( boolean $as_string = true ) : string | array Returns identifier characters as string/array
getLinePos ( ) : array Returns current position in document (Line/Char)
getPos ( ) : integer Returns current position in document (Index)
getToken ( ) : integer Returns current token
getTokenString ( integer $start_offset, integer $end_offset ) : string Returns current token as string
getWhitespace ( boolean $as_string = true ) : string | array Returns whitespace characters as string/array
mapChar ( string $char, integer | string $map ) Maps a custom character to a token/function
next ( ) : integer Continues to the next token
next_no_whitespace ( ) : integer Finds the next token, but skips whitespace
next_pos ( string $needle, boolean $callback = true ) : integer Finds the next token by searching for a string
next_search ( string | array $characters, boolean $callback = true ) : integer Finds the next token using stop characters.
setDoc ( string $doc, integer $pos ) Sets target document
setIdentifiers ( string | array $ident ) Sets characters to be recognized as identifier
setPos ( integer $pos ) Sets position in document
setWhitespace ( string | array $ws ) Sets characters to be recognized as whitespace
unmapChar ( string $char ) Removes a char mapped with mapChar()

Protected Methods

Method Description
buildCharMap ( ) Builds the $char_map array
expect ( string | integer $token, boolean | integer $do_next = true, boolean | integer $try_next = false, boolean | integer $next_on_match = 1 ) : boolean Expect a specific token or character. Adds error if token doesn't match.
parse_identifier ( ) : integer Parse identifiers
parse_linebreak ( ) Parse line breaks and increase line number
parse_whitespace ( ) : integer Parse whitespace

Method Details

__construct() public method

Class constructor
See also: setDoc()
See also: setPos()
public __construct ( string $doc = '', integer $pos )
$doc string Document to be tokenized
$pos integer Position to start parsing

addError() public method

Adds an error to the $errors array and appends the current position
public addError ( string $error )
$error string
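Here is a small sketch of the idea behind addError(): the message is stored together with the position at which it was raised. ErrorCollector and the message format are hypothetical, since pQuery's actual wording is not documented here. The sketch assumes the position is kept as a 0-based [line, column] pair and reported 1-based.

```php
<?php
// Hypothetical sketch: store each error together with the position at which
// it was raised. Not pQuery's code; the message format is made up.
class ErrorCollector {
    public $errors = [];
    public $line_pos = [0, 0]; // assumed: [line, column], both 0-based

    public function addError(string $error): void {
        // Report the position 1-based, as editors usually display it.
        $this->errors[] = sprintf('%s at line %d, column %d',
            $error, $this->line_pos[0] + 1, $this->line_pos[1] + 1);
    }
}

$c = new ErrorCollector();
$c->line_pos = [2, 4];
$c->addError('Unexpected "<"');
```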

buildCharMap() protected method

Builds the $char_map array
protected buildCharMap ( )
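The following sketch shows how a combined character map could be assembled from the three sources named for $char_map. build_char_map() is hypothetical. The precedence is also an assumption: whitespace and identifier entries override custom ones.

```php
<?php
// Hypothetical sketch: merge custom mappings with whitespace and identifier
// characters into one lookup table (char => token or handler name).
// The precedence order is an assumption.
function build_char_map(array $whitespace, array $identifiers, array $custom): array {
    $map = $custom; // e.g. ['=' => 5] or ['<' => 'parse_tag']
    foreach ($whitespace as $ch => $_) {
        $map[$ch] = 'parse_whitespace';
    }
    foreach ($identifiers as $ch => $_) {
        $map[$ch] = 'parse_identifier';
    }
    return $map;
}

$map = build_char_map(
    array_fill_keys([' ', "\t", "\n"], true),
    array_fill_keys(str_split('abc_'), true),
    ['=' => 5]
);
```

With such a map in place, deciding how to handle the current character is a single array lookup instead of a chain of comparisons.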

expect() protected method

Expect a specific token or character. Adds error if token doesn't match.
protected expect ( string | integer $token, boolean | integer $do_next = true, boolean | integer $try_next = false, boolean | integer $next_on_match = 1 ) : boolean
$token string | integer Character or token to expect
$do_next boolean | integer Go to next character before evaluating. 1 for next char, true to ignore whitespace
$try_next boolean | integer Try next character if current doesn't match. 1 for next char, true to ignore whitespace
$next_on_match boolean | integer Go to next character after evaluating. 1 for next char, true to ignore whitespace
return boolean
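This is a simplified sketch of what expect() does for a single expected character. It assumes $do_next === true (skip whitespace first) and $next_on_match === 1 (step past the match). expect_char() and its error text are hypothetical, and $try_next is not modelled.

```php
<?php
// Hypothetical, simplified take on expect(): skip whitespace, compare the
// current character with the expected one, record an error on mismatch and
// step past the character on match. Not pQuery's implementation.
function expect_char(string $doc, int &$pos, string $expected, array &$errors): bool {
    $size = strlen($doc);
    while ($pos < $size && ctype_space($doc[$pos])) {
        $pos++; // like $do_next === true: ignore whitespace
    }
    $found = $pos < $size ? $doc[$pos] : '';
    if ($found !== $expected) {
        $errors[] = sprintf('Expected "%s", but found "%s" at %d', $expected, $found, $pos);
        return false;
    }
    $pos++; // like $next_on_match: move past the matched character
    return true;
}

$pos = 0;
$errors = [];
$ok1 = expect_char('  = x', $pos, '=', $errors); // matches '=' at index 2
$ok2 = expect_char('  = x', $pos, ';', $errors); // finds 'x' instead
```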

getDoc() public method

Returns target document
See also: setDoc()
public getDoc ( ) : string
return string

getIdentifiers() public method

Returns identifier characters as string/array
See also: setIdentifiers()
public getIdentifiers ( boolean $as_string = true ) : string | array
$as_string boolean Should the result be a string or an array?
return string | array

getLinePos() public method

Returns current position in document (Line/Char)
public getLinePos ( ) : array
return array array(Line, Column)

getPos() public method

Returns current position in document (Index)
See also: setPos()
public getPos ( ) : integer
return integer

getToken() public method

Returns current token
public getToken ( ) : integer
return integer

getTokenString() public method

Returns current token as string
public getTokenString ( integer $start_offset, integer $end_offset ) : string
$start_offset integer Offset from token start
$end_offset integer Offset from token end
return string
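The next sketch shows how the two offsets could adjust the returned slice. It assumes the current token spans $token_start through $pos inclusive and that both offsets default to 0. token_string() is a hypothetical free function. Its fallback when the start is NULL follows the $token_start property description.

```php
<?php
// Hypothetical sketch of getTokenString()-style slicing. Offsets shift the
// start and end of the inclusive span $token_start..$pos. Default offsets of
// 0 are an assumption.
function token_string(string $doc, ?int $token_start, int $pos, int $start_offset = 0, int $end_offset = 0): string {
    $start = ($token_start ?? $pos) + $start_offset; // NULL start: use current position
    $len = $pos - $start + 1 + $end_offset;
    return $len > 0 ? substr($doc, $start, $len) : '';
}

$doc = 'say "hi" now';
$quoted = token_string($doc, 4, 7);        // the whole token, quotes included
$inner  = token_string($doc, 4, 7, 1, -1); // drop one character at each end
$single = token_string($doc, null, 7);     // only the character at $pos
```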

getWhitespace() public method

Returns whitespace characters as string/array
See also: setWhitespace()
public getWhitespace ( boolean $as_string = true ) : string | array
$as_string boolean Should the result be a string or an array?
return string | array

mapChar() public method

Maps a custom character to a token or function. Used like: mapChar('a', self::TOK_IDENTIFIER) or mapChar('a', 'parse_identifier');
See also: unmapChar()
public mapChar ( string $char, integer | string $map )
$char string Character that should be mapped. If it is already mapped, the existing mapping is overridden
$map integer | string If a function name, then $this->{$map}() will be called; otherwise the token is set to $map
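The sketch below illustrates the dispatch rule described for $map. An integer becomes the token directly, while a string names a method that is called to produce the token. CharDispatcher and its constant values are hypothetical. unmapChar() is included to show that removing a mapping restores the default.

```php
<?php
// Hypothetical sketch of mapChar()-style dispatch; not pQuery's code.
class CharDispatcher {
    const TOK_UNKNOWN = 1;
    const TOK_IDENTIFIER = 3;

    public $custom_char_map = [];
    public $token = self::TOK_UNKNOWN;

    public function mapChar(string $char, $map): void {
        $this->custom_char_map[$char] = $map; // overrides any previous mapping
    }

    public function unmapChar(string $char): void {
        unset($this->custom_char_map[$char]);
    }

    // Integer map: the token is set directly. String map: the named method is called.
    public function dispatch(string $char): int {
        if (!isset($this->custom_char_map[$char])) {
            return $this->token = self::TOK_UNKNOWN;
        }
        $map = $this->custom_char_map[$char];
        if (is_string($map)) {
            return $this->token = $this->$map();
        }
        return $this->token = $map;
    }

    protected function parse_identifier(): int {
        return self::TOK_IDENTIFIER;
    }
}

$d = new CharDispatcher();
$d->mapChar('a', 'parse_identifier');
$d->mapChar('=', 7);
$viaMethod = $d->dispatch('a');
$direct = $d->dispatch('=');
$d->unmapChar('=');
$afterUnmap = $d->dispatch('=');
```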

next() public method

Continues to the next token
public next ( ) : integer
return integer Next token (TOK_NULL if none)

next_no_whitespace() public method

Finds the next token, but skips whitespace
public next_no_whitespace ( ) : integer
return integer Next token (TOK_NULL if none)
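Here is a sketch of skipping whitespace by calling next() until a non-whitespace token appears. To stay self-contained, the hypothetical TokenStream replays a precomputed list of token codes instead of scanning text. The code values are assumptions.

```php
<?php
// Hypothetical sketch: skip whitespace by repeatedly calling next().
// Tokens come from a precomputed list to keep the example self-contained.
class TokenStream {
    const TOK_NULL = 0;
    const TOK_WHITESPACE = 2;

    private $tokens;
    private $i = -1;
    public $token = self::TOK_NULL;

    public function __construct(array $tokens) {
        $this->tokens = $tokens;
    }

    public function next(): int {
        $this->i++;
        return $this->token = $this->tokens[$this->i] ?? self::TOK_NULL;
    }

    public function next_no_whitespace(): int {
        do {
            $this->next();
        } while ($this->token === self::TOK_WHITESPACE);
        return $this->token;
    }
}

$s = new TokenStream([3, 2, 2, 5, 2]);
$seen = [];
while ($s->next_no_whitespace() !== TokenStream::TOK_NULL) {
    $seen[] = $s->token;
}
```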

next_pos() public method

Finds the next token by searching for a string
public next_pos ( string $needle, boolean $callback = true ) : integer
$needle string The needle that's being searched for
$callback boolean Should the function check the charmap after finding the needle?
return integer Next token (TOK_NULL if none)
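This sketch shows needle-based scanning in the spirit of next_pos(). Instead of reading one token at a time, the scanner jumps straight to the next occurrence of $needle, which is handy for closing delimiters such as -->. scan_to() is hypothetical and returns the new position plus the skipped text. It does not model the $callback char-map check.

```php
<?php
// Hypothetical sketch: jump from the current position straight to the next
// occurrence of $needle. Returns [new position, skipped text]; the position
// is strlen($doc) when the needle is not found.
function scan_to(string $doc, int $pos, string $needle): array {
    $size = strlen($doc);
    if ($pos + 1 >= $size) {
        return [$size, '']; // nothing left to search
    }
    $p = strpos($doc, $needle, $pos + 1);
    if ($p === false) {
        return [$size, substr($doc, $pos + 1)];
    }
    return [$p, substr($doc, $pos + 1, $p - $pos - 1)];
}

// Position 3 is the last character of "<!--"; find the closing "-->".
[$pos, $skipped] = scan_to('<!-- note -->rest', 3, '-->');
```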

parse_identifier() protected method

Parse identifiers
protected parse_identifier ( ) : integer
return integer Token

parse_linebreak() protected method

Parse line breaks and increase line number
protected parse_linebreak ( )
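The following sketch shows line/column tracking. Every line break increments the line counter and records where the new line starts, so the column can be derived from the character index. line_pos_at() is hypothetical and counts from 0. It treats only "\n" as a line break, which is an assumption.

```php
<?php
// Hypothetical sketch of line/column tracking: each "\n" increments the line
// number and remembers where the new line starts. Both values are 0-based.
function line_pos_at(string $doc, int $pos): array {
    $line = 0;
    $line_start = 0;
    for ($i = 0; $i < $pos && $i < strlen($doc); $i++) {
        if ($doc[$i] === "\n") {
            $line++;
            $line_start = $i + 1;
        }
    }
    return [$line, $pos - $line_start];
}

$doc = "ab\ncde\nf";
$where = line_pos_at($doc, 4); // the 'd' on the second line
```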

parse_whitespace() protected method

Parse whitespace
protected parse_whitespace ( ) : integer
return integer Token

setDoc() public method

Sets target document
See also: getDoc()
See also: setPos()
public setDoc ( string $doc, integer $pos )
$doc string Document to be tokenized
$pos integer Position to start parsing

setIdentifiers() public method

Sets characters to be recognized as identifier characters. Used like: setIdentifiers('ab') or setIdentifiers(array('a' => true, 'b', 'c'));
public setIdentifiers ( string | array $ident )
$ident string | array
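Here is a sketch of normalizing both accepted input forms into a lookup set keyed by character: a string, or an array like the one in the usage example. The same applies to setWhitespace(). to_char_set() is hypothetical. Reading 'a' => true by its key and plain list entries by their value is an assumed interpretation of the example.

```php
<?php
// Hypothetical sketch: normalize a string or an array of characters into a
// set (char => true). Entries whose value is `true` contribute their key;
// other entries contribute their value (an assumed interpretation).
function to_char_set($chars): array {
    if (is_string($chars)) {
        $chars = str_split($chars);
    }
    $set = [];
    foreach ($chars as $key => $value) {
        $char = ($value === true) ? (string) $key : (string) $value;
        $set[$char] = true;
    }
    return $set;
}

$fromString = to_char_set('ab');
$fromArray = to_char_set(['a' => true, 'b', 'c']);
```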

setPos() public method

Sets position in document
See also: getPos()
public setPos ( integer $pos )
$pos integer

setWhitespace() public method

Sets characters to be recognized as whitespace. Used like: setWhitespace('ab') or setWhitespace(array('a' => true, 'b', 'c'));
public setWhitespace ( string | array $ws )
$ws string | array

unmapChar() public method

Removes a char mapped with mapChar()
See also: mapChar()
public unmapChar ( string $char )
$char string Character that should be unmapped

Property Details

$char_map public property

Automatically built character map. Built using $identifiers, $whitespace and $custom_char_map
public array $char_map
return array

$custom_char_map public property

All characters, other than whitespace and identifier characters, that should be mapped to a token or function
See also: mapChar()
See also: unmapChar()
public array $custom_char_map
return array

$doc public property

The document that is being tokenized
See also: setDoc()
See also: getDoc()
public string $doc
return string

$errors public property

All errors found while parsing the document
See also: addError()
public array $errors
return array

$identifiers public property

List of all characters that can be considered identifier characters
See also: setIdentifiers()
See also: getIdentifiers()
public array|string $identifiers
return array | string

$line_pos public property

Current (Line/Column) position in document
See also: getLinePos()
public $line_pos

$pos public property

Current (character) position in the document
See also: setPos()
See also: getPos()
public int $pos
return integer

$size public property

The size of the document (length of string)
public int $size
return integer

$token public property

Current token
See also: getToken()
public int $token
return integer

$token_start public property

Start position of token. If NULL, then current position is used.
See also: getTokenString()
public int $token_start
return integer

$whitespace public property

List of all characters that can be considered whitespace
See also: setWhitespace()
See also: getWhitespace()
public array|string $whitespace
return array | string