PHP Class SimpleLexer, simpletest

Includes some optimisation to make sure the content is scanned by the PHP regex parser only once. Lexer modes must not start with leading underscores.

Public Methods

Method Description
__construct ( SimpleSaxParser $parser, string $start = 'accept', boolean $case = false ) Sets up the lexer, with case-insensitive matching by default.
addEntryPattern ( string $pattern, string $mode, string $new_mode ) Adds a pattern that will enter a new parsing mode.
addExitPattern ( string $pattern, string $mode ) Adds a pattern that will exit the current mode and re-enter the previous one.
addPattern ( string $pattern, string $mode = 'accept' ) Adds a token search pattern for a particular parsing mode.
addSpecialPattern ( string $pattern, string $mode, string $special ) Adds a pattern that has a special mode.
mapHandler ( string $mode, string $handler ) Adds a mapping from a mode to another handler.
parse ( string $raw ) : boolean Splits the page text into tokens.

Protected Methods

Method Description
decodeSpecial ( string $mode ) : string Strips the magic underscore marking single token modes.
dispatchTokens ( string $unmatched, string $matched, string $mode = false ) : boolean Sends the matched token and any leading unmatched text to the parser, changing the lexer to a new mode if one is listed.
invokeParser ( string $content, boolean $is_match ) Calls the parser method named after the current mode.
isModeEnd ( string $mode ) : boolean Tests to see if the new mode is actually to leave the current mode and pop an item from the matching mode stack.
isSpecialMode ( string $mode ) : boolean Tests whether the mode is a special mode, that is, one entered for this token only and left automatically immediately afterwards.
reduce ( string $raw ) : array/boolean Tries to match a chunk of text and if successful removes the recognised chunk and any leading unparsed data.

Method Details

__construct() public method

Sets up the lexer, with case-insensitive matching by default.
public __construct ( SimpleSaxParser $parser, string $start = 'accept', boolean $case = false )
$parser SimpleSaxParser Handling strategy by reference.
$start string Starting handler.
$case boolean True for case sensitive.

addEntryPattern() public method

Useful for entering parentheses, strings, tags, etc.
public addEntryPattern ( string $pattern, string $mode, string $new_mode )
$pattern string Perl style regex, but ( and ) lose the usual meaning.
$mode string Should only apply this pattern when dealing with this type of input.
$new_mode string Change parsing to this new nested mode.

addExitPattern() public method

Adds a pattern that will exit the current mode and re-enter the previous one.
public addExitPattern ( string $pattern, string $mode )
$pattern string Perl style regex, but ( and ) lose the usual meaning.
$mode string Mode to leave.

addPattern() public method

The pattern does not change the current mode.
public addPattern ( string $pattern, string $mode = 'accept' )
$pattern string Perl style regex, but ( and ) lose the usual meaning.
$mode string Should only apply this pattern when dealing with this type of input.

addSpecialPattern() public method

Acts as an entry and exit pattern in one go, effectively calling a special parser handler for this token only.
public addSpecialPattern ( string $pattern, string $mode, string $special )
$pattern string Perl style regex, but ( and ) lose the usual meaning.
$mode string Should only apply this pattern when dealing with this type of input.
$special string Use this mode for this one token.

decodeSpecial() protected method

Strips the magic underscore marking single token modes.
protected decodeSpecial ( string $mode ) : string
$mode string Mode to decode.
Returns string Underlying mode name.
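
As a minimal sketch of the underscore convention, and assuming a special mode is marked by a single leading underscore, the test and the decoding step might look like the following. The function names isSpecialModeSketch and decodeSpecialSketch are hypothetical and are not part of SimpleLexer. The sketch also suggests why ordinary mode names must not start with an underscore: they would be mistaken for single-token modes.

```php
<?php
// Hypothetical helpers for illustration only; not SimpleLexer's own code.

// A mode is treated as "special" (single-token) when its name
// begins with the underscore marker.
function isSpecialModeSketch(string $mode): bool
{
    return strncmp($mode, '_', 1) === 0;
}

// Drop the first character (the marker) to recover the underlying mode name.
function decodeSpecialSketch(string $mode): string
{
    return (string) substr($mode, 1);
}

$underlying = decodeSpecialSketch('_comment'); // 'comment'
```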

dispatchTokens() protected method

Sends the matched token and any leading unmatched text to the parser, changing the lexer to a new mode if one is listed.
protected dispatchTokens ( string $unmatched, string $matched, string $mode = false ) : boolean
$unmatched string Unmatched leading portion.
$matched string Actual token match.
$mode string Mode after match. A boolean false mode causes no change.
Returns boolean False if there was any error from the parser.

invokeParser() protected method

Empty content will be ignored. The lexer has a parser handler for each mode in the lexer.
protected invokeParser ( string $content, boolean $is_match )
$content string Text parsed.
$is_match boolean Token is recognised rather than unparsed data.

isModeEnd() protected method

Tests to see if the new mode is actually to leave the current mode and pop an item from the matching mode stack.
protected isModeEnd ( string $mode ) : boolean
$mode string Mode to test.
Returns boolean True if this is the exit mode.
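
The entry and exit behaviour described on this page implies a stack of modes: an entry pattern pushes a nested mode, and an exit pattern pops back to the previous one. The class below is a hedged sketch of such a stack. ModeStackSketch and its method names are assumptions made for illustration, not the structure SimpleLexer actually uses.

```php
<?php
// Hypothetical mode stack for illustration only.
final class ModeStackSketch
{
    /** @var string[] Modes from outermost (index 0) to current (last). */
    private array $stack;

    public function __construct(string $start)
    {
        $this->stack = [$start];
    }

    // The mode whose patterns currently apply.
    public function current(): string
    {
        return $this->stack[count($this->stack) - 1];
    }

    // Called when an entry pattern matches: nest into a new mode.
    public function enter(string $mode): void
    {
        $this->stack[] = $mode;
    }

    // Called when an exit pattern matches: return to the previous mode.
    // Returns false when only the starting mode is left.
    public function leave(): bool
    {
        if (count($this->stack) <= 1) {
            return false;
        }
        array_pop($this->stack);
        return true;
    }
}

$modes = new ModeStackSketch('accept');
$modes->enter('string'); // an entry pattern matched
$modes->leave();         // an exit pattern matched, back to 'accept'
```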

isSpecialMode() protected method

Tests whether the mode is a special mode, that is, one entered for this token only and left automatically immediately afterwards.
protected isSpecialMode ( string $mode ) : boolean
$mode string Mode to test.
Returns boolean True if this is a special (single-token) mode.

mapHandler() public method

Adds a mapping from a mode to another handler.
public mapHandler ( string $mode, string $handler )
$mode string Mode to be remapped.
$handler string New target handler.
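
To illustrate how a mode-to-handler mapping could interact with a parser method named after the mode, here is a small sketch. It assumes an unmapped mode dispatches to a method of the same name and that empty content is skipped. HandlerDispatchSketch and RecordingParserSketch are hypothetical classes, not part of simpletest.

```php
<?php
// Hypothetical dispatcher for illustration only.
final class HandlerDispatchSketch
{
    /** @var array<string, string> Mode name => handler method name. */
    private array $handlers = [];
    private object $parser;

    public function __construct(object $parser)
    {
        $this->parser = $parser;
    }

    // Route tokens found in $mode to a differently named handler.
    public function mapHandler(string $mode, string $handler): void
    {
        $this->handlers[$mode] = $handler;
    }

    // Call the handler for $mode; empty content is ignored.
    public function invoke(string $mode, string $content, bool $isMatch): bool
    {
        if ($content === '') {
            return true;
        }
        $handler = $this->handlers[$mode] ?? $mode;
        return (bool) $this->parser->$handler($content, $isMatch);
    }
}

// Hypothetical parser that records every call it receives.
final class RecordingParserSketch
{
    public array $calls = [];

    public function accept(string $content, bool $isMatch): bool
    {
        $this->calls[] = [$content, $isMatch];
        return true;
    }
}

$parser = new RecordingParserSketch();
$dispatch = new HandlerDispatchSketch($parser);
$dispatch->mapHandler('quoted', 'accept');
$dispatch->invoke('quoted', '"hi"', true); // reaches $parser->accept()
$dispatch->invoke('accept', '', false);    // skipped: empty content
```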

parse() public method

Will fail if the handlers report an error or if no content is consumed. If successful then each unparsed and parsed token invokes a call to the held listener.
public parse ( string $raw ) : boolean
$raw string Raw HTML text.
Returns boolean True on success, else false.

reduce() protected method

Empty strings will not be matched.
protected reduce ( string $raw ) : array/boolean
$raw string The subject to parse. This is the content that will be eaten.
Returns array/boolean
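
One way to scan the content only once per token is to join all patterns of a mode into a single alternation, with each pattern in its own capture group. That would also explain why ( and ) lose their usual meaning in patterns. The function below is a sketch under that assumption. The name reduceSketch, the label-to-pattern array, and the return shape (false on no match) are hypothetical and may differ from what SimpleLexer does, and the sketch assumes pattern fragments contain no already-escaped parentheses.

```php
<?php
// Hypothetical single-pass reduction for illustration only.
// $patterns maps a label to a regex fragment written without delimiters.
// Returns [remaining text, leading unparsed text, matched token, label],
// or false when the input is empty or nothing matches.
function reduceSketch(string $raw, array $patterns, bool $caseSensitive = false)
{
    if ($raw === '' || count($patterns) === 0) {
        return false;
    }
    $labels = array_keys($patterns);
    $groups = [];
    foreach ($patterns as $fragment) {
        // Escape the delimiter and parentheses so that each pattern
        // occupies exactly one capture group in the compound regex.
        $escaped = str_replace(['/', '(', ')'], ['\/', '\(', '\)'], $fragment);
        $groups[] = '(' . $escaped . ')';
    }
    $regex = '/' . implode('|', $groups) . '/' . ($caseSensitive ? 'ms' : 'msi');
    if (!preg_match($regex, $raw, $matches, PREG_OFFSET_CAPTURE)) {
        return false;
    }
    [$token, $offset] = $matches[0];
    // The first group with a real offset tells us which pattern matched.
    $label = null;
    for ($i = 1; $i < count($matches); $i++) {
        if ($matches[$i][1] >= 0) {
            $label = $labels[$i - 1];
            break;
        }
    }
    $unparsed = (string) substr($raw, 0, $offset);
    $rest = (string) substr($raw, $offset + strlen($token));
    return [$rest, $unparsed, $token, $label];
}

$result = reduceSketch('abc<b>def', ['tag' => '<[a-z]+>', 'num' => '\d+']);
// $result is ['def', 'abc', '<b>', 'tag']
```

Calling such a function repeatedly on the remaining text, until it returns false, would hand over each unparsed chunk and each recognised token in order.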