PHP Class Tale\Jade\Lexer

Project: talesoft/tale-jade

Public Methods

Method	Description
__construct ( array $options = null )	Creates a new lexer instance.
dump ( string $input )	Dumps jade-input into a set of string-represented tokens.
getIndentStyle ( ) : string	Returns the detected or previously passed indentation style.
getIndentWidth ( ) : integer	Returns the detected or previously passed indentation width.
getInput ( ) : string	Returns the current input-string worked on.
getLastMatches ( ) : array \| null	Returns the last array of matches through ->match.
getLastPeekResult ( ) : string \| null	Returns the last result of ->peek().
getLength ( ) : integer	Returns the total length of the current input-string.
getLevel ( ) : integer	Returns the current indentation level we are on.
getLine ( ) : integer	Returns the line we are working on in the current input-string.
getOffset ( ) : integer	Gets the offset on a line (Line-start is 0) in the current input-string.
getPosition ( ) : integer	Returns the total position in the current input-string.
lex ( string $input ) : Generator	Returns a generator that will lex the passed $input sequentially.

Protected Methods

Method	Description
consume ( integer \| null $length = null )	Consumes a length or the length of the last peeked string.
consumeMatch ( )	Consumes a match previously read and matched by ->match().
createToken ( string $type ) : array	Creates a new token.
getMatch ( integer \| string $index ) : mixed \| null	Gets a match from the last ->match() call
isAtEnd ( ) : boolean	Checks if our read pointer is at the end of the code.
match ( string $pattern, string $modifiers = '' ) : boolean	Matches a pattern against the start of the current $input.
peek ( integer $length = 1 ) : string	Shows the next characters in our input.
read ( callable $callback, integer $length = 1 ) : string	Peeks and consumes chars until the passed callback returns false.
readBracketContents ( array $breakChars = null ) : string	Reads a "value", 'value' or value style string really gracefully.
readSpaces ( ) : string	Reads all TAB (\t) and SPACE ( ) chars until something else is found.
scanAssignment ( ) : Generator	Scans for an assignment-token (begins with ampersand (&)).
scanAttributes ( ) : Generator	Scans for an attribute-block.
scanBlock ( ) : Generator	Scans for block-tokens.
scanCase ( ) : Generator	Scans for a case-token.
scanClasses ( ) : Generator	Scans for a class-token (begins with dot (.)).
scanCode ( ) : Generator	Scans for a code-block initiated with a dash (-) character.
scanComment ( ) : Generator	Scans for //-? comments, yielding a comment-token if found as well as a stack of text-block tokens.
scanConditional ( ) : Generator	Scans for a conditional-token.
scanControlStatement ( string $type, array $names, string \| null $nameAttribute = null ) : Generator	Scans for a control-statement-kind of token.
scanDo ( ) : Generator	Scans for a do-token.
scanDoctype ( ) : Generator	Scans for a doctype-token.
scanEach ( ) : Generator	Scans for an each-token.
scanExpansion ( ) : Generator	Scans for an expansion-token.
scanExpression ( ) : Generator	Scans for !=-style expression.
scanFilter ( ) : Generator	Scans for :-style filters and yields a token if found.
scanFor ( array $scans, boolean \| false $throwException = false ) : Generator	Keeps scanning for all types of tokens passed as the first argument.
scanForLoop ( ) : Generator	Scans for a for-token.
scanId ( ) : Generator	Scans for an id-token (begins with hash (#)).
scanImport ( ) : Generator	Scans for imports and yields an import-token if found.
scanIndent ( ) : Generator \| void	Scans for indentation and automatically keeps the $_level updated through all tokens.
scanMarkup ( ) : Generator	Scans for HTML-markup based on a starting '<'.
scanMixin ( ) : Generator	Scans for a mixin definition token (mixin).
scanMixinCall ( ) : Generator	Scans for a mixinCall-token (begins with plus (+)).
scanNewLine ( ) : Generator	Scans for a new-line character and yields a newLine-token if found.
scanSub ( ) : Generator	Scans sub-expressions of elements, e.g. a text-block initiated with a dot (.) or a block expansion.
scanTag ( ) : Generator	Scans for a tag-token.
scanText ( boolean $escaped = false ) : Generator	Scans for text until the end of the current line and yields a text-token if found.
scanTextBlock ( $escaped = false ) : Generator	Scans for text and keeps scanning text, if you indent once until it is outdented again (e.g. .-text-blocks, expressions, comments).
scanTextLine ( ) : Generator	Scans for a \|-style text-line and yields it along with a text-block, if it has any.
scanToken ( string $type, string $pattern, string $modifiers = '' ) : Generator	Scans for a specific token-type based on a pattern and converts it to a valid token automatically.
scanVariable ( ) : Generator	Scans for a variable-token.
scanWhen ( ) : Generator	Scans for a when-token.
scanWhile ( ) : Generator	Scans for a while-token.
strlen ( string $string ) : integer	mb_* compatible version of PHP's strlen.
strpos ( string $haystack, string $needle, integer \| null $offset = null ) : integer \| false	mb_* compatible version of PHP's strpos.
substr ( string $string, integer $start, integer \| null $range = null ) : string	mb_* compatible version of PHP's substr.
substr_count ( string $haystack, string $needle ) : integer	mb_* compatible version of PHP's substr_count.
throwException ( string $message )	Throws a lexer-exception.

Method Details

__construct() public method

The options should be an associative array.

Valid options are:

indentStyle: The indentation character (auto-detected)
indentWidth: How often to repeat indentStyle (auto-detected)
encoding: The encoding when working with mb_*-functions (Default: UTF-8)
scans: An array of scans that will be performed

Passing an indentation style forces you to stick to that style. If none is passed, the lexer takes the first indentation type it finds as the indentation for the whole input. Mixed indentation is not possible, since it would be hard to calculate without taking away configuration freedom.

Add a new scan to 'scans' to extend the lexer. Note that you need the fitting 'handle*'-method in the parser, or you will get unhandled-token exceptions.

public __construct ( array $options = null )
$options	array	the options passed to the lexer instance

consume() protected method

Internally, $input = substr($input, $length) is performed, so everything before the consumed length is cut off and removed from memory (it has most likely been tokenized already, since the lexer works sequentially).

protected consume ( integer \| null $length = null )
$length	integer \| null	the length to consume or null, to use the length of the last peeked string

consumeMatch() protected method

Consumes a match previously read and matched by ->match().

protected consumeMatch ( )

createToken() protected method

A token is an associative array. The following keys always exist:

type: The type of the node (e.g. newLine, tag, class, id)
line: The line we encountered this token on
offset: The offset on a line we encountered it on

Before adding a new token-type, make sure that the Parser knows how to handle it and the Compiler knows how to compile it.

protected createToken ( string $type ) : array
$type	string	the type to give that token
return	array	the token array

dump() public method

This makes debugging the lexer easier.

public dump ( string $input )
$input	string	the jade input to dump the tokens of

getIndentStyle() public method

Returns the detected or previously passed indentation style.

public getIndentStyle ( ) : string
return	string

getIndentWidth() public method

Returns the detected or previously passed indentation width.

public getIndentWidth ( ) : integer
return	integer

getInput() public method

Returns the current input-string worked on.

public getInput ( ) : string
return	string

getLastMatches() public method

Returns the last array of matches through ->match.

public getLastMatches ( ) : array \| null
return	array \| null

getLastPeekResult() public method

Returns the last result of ->peek().

public getLastPeekResult ( ) : string \| null
return	string \| null

getLength() public method

Returns the total length of the current input-string.

public getLength ( ) : integer
return	integer

getLevel() public method

Returns the current indentation level we are on.

public getLevel ( ) : integer
return	integer

getLine() public method

Returns the line we are working on in the current input-string.

public getLine ( ) : integer
return	integer

getMatch() protected method

Gets a match from the last ->match() call

protected getMatch ( integer \| string $index ) : mixed \| null
$index	integer \| string	the index of the usual PREG $matches argument
return	mixed \| null	the value of the match or null, if none found

getOffset() public method

Gets the offset on a line (Line-start is 0) in the current input-string.

public getOffset ( ) : integer
return	integer

getPosition() public method

Returns the total position in the current input-string.

public getPosition ( ) : integer
return	integer

isAtEnd() protected method

Checks if our read pointer is at the end of the code.

protected isAtEnd ( ) : boolean
return	boolean

lex() public method

If you don't move the generator, the lexer does nothing. Only when you iterate the generator or call next()/current() on it will the lexer start its work and emit tokens sequentially. This approach takes less memory during the lexing process.

Tokens are always an array and always provide the following keys:

[ 'type' => The token type, 'line' => The line this token is on, 'offset' => The offset this token is at ]

public lex ( string $input ) : Generator
$input	string	the Jade-string to lex into tokens
return	Generator	a generator that can be iterated sequentially
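
The lazy behavior described for lex() can be illustrated with a plain PHP generator. This is a minimal sketch that does not use Tale\Jade\Lexer; `sketchLex()`, the `$log` parameter and the extra `value` key are hypothetical names chosen for the illustration.

```php
<?php
// Hypothetical stand-in for a lexer: yields one token per non-empty line.
// $log records when the generator body actually starts running.
function sketchLex(string $input, array &$log): Generator
{
    $log[] = 'started';
    foreach (explode("\n", $input) as $i => $text) {
        if ($text === '') {
            continue;
        }
        // Each token carries the three documented keys (plus a made-up 'value').
        yield ['type' => 'text', 'line' => $i + 1, 'offset' => 0, 'value' => $text];
    }
}

$log = [];
$tokens = sketchLex("a\nb", $log);
$before = count($log);            // the body has not run yet

$types = [];
foreach ($tokens as $token) {     // iterating is what drives the work
    $types[] = $token['type'] . '@' . $token['line'];
}
```

Creating the generator does no work; the body only runs once the `foreach` starts pulling tokens.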

match() protected method

Note that this always takes the current pointer position as its reference, since consuming means cutting off the front of the input string.

After a successful match, you can retrieve the matches with ->getMatch() and consume the whole match with ->consumeMatch().

^ is automatically prepended to the pattern (since it makes no sense for a sequential lexer to search inside the input).

protected match ( string $pattern, string $modifiers = '' ) : boolean
$pattern	string	the regular expression without delimiters and without a ^-prefix
$modifiers	string	the usual PREG RegEx-modifiers
return	boolean
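
A minimal sketch of how match(), getMatch() and consumeMatch() fit together. The class `MatchSketch` and its `rest()` helper are hypothetical and not part of the library. The sketch assumes `/` as delimiter (so patterns must not contain an unescaped `/`) and uses byte-based string functions for brevity.

```php
<?php
// Hypothetical miniature of the match()/getMatch()/consumeMatch() trio.
final class MatchSketch
{
    private string $input;
    private ?array $lastMatches = null;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Anchors the pattern at the start of the remaining input.
    public function match(string $pattern, string $modifiers = ''): bool
    {
        $ok = preg_match('/^' . $pattern . '/' . $modifiers, $this->input, $m);
        $this->lastMatches = $ok === 1 ? $m : null;
        return $ok === 1;
    }

    // Returns a numbered or named group of the last match, or null.
    public function getMatch($index)
    {
        return $this->lastMatches[$index] ?? null;
    }

    // Cuts the whole last match off the front of the input.
    public function consumeMatch(): void
    {
        if ($this->lastMatches !== null) {
            $this->input = substr($this->input, strlen($this->lastMatches[0]));
        }
    }

    public function rest(): string
    {
        return $this->input;
    }
}

$s = new MatchSketch('div.box text');
$matchedTag = $s->match('(?<name>[a-z]+)');  // matches "div" at the start
$name = $s->getMatch('name');
$s->consumeMatch();
$matchedInside = $s->match('box');           // "box" is not at the start
```

Because of the prepended `^`, the second call fails even though "box" occurs later in the remaining input.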

peek() protected method

Pass a $length to get more than one character. The characters won't be consumed here, they are only shown; the position pointer is not moved forward. The result is saved in $_lastPeekResult.

protected peek ( integer $length = 1 ) : string
$length	integer	the length of the string we want to peek on
return	string	the peeked string
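
The interplay of peek() and consume() can be sketched on a plain string. `CursorSketch` is a hypothetical class written for this illustration; it assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical cursor showing peek()/consume() semantics.
final class CursorSketch
{
    private string $input;
    private ?string $lastPeek = null;
    private int $position = 0;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Returns the next $length characters without moving the cursor.
    public function peek(int $length = 1): string
    {
        $this->lastPeek = mb_substr($this->input, 0, $length, 'UTF-8');
        return $this->lastPeek;
    }

    // Drops $length characters, or the length of the last peek when null.
    public function consume(?int $length = null): void
    {
        $length = $length ?? mb_strlen((string) $this->lastPeek, 'UTF-8');
        $this->input = mb_substr($this->input, $length, null, 'UTF-8');
        $this->position += $length;
    }

    public function getInput(): string
    {
        return $this->input;
    }

    public function getPosition(): int
    {
        return $this->position;
    }
}

$c = new CursorSketch('päge');
$first = $c->peek(2);   // look at two characters
$again = $c->peek(2);   // peeking twice returns the same characters
$c->consume();          // drops what was peeked last
```

Peeking is repeatable; only consume() shortens the input and advances the position.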

read() protected method

The callback takes the current character as the first argument. This works great with ctype_*-functions.

If the last character doesn't match, it also won't be consumed. You can always go on reading right after a call to ->read(), e.g.

$alNumString = $this->read('ctype_alnum')
$spaces = $this->read('ctype_space')

protected read ( callable $callback, integer $length = 1 ) : string
$callback	callable	the callback to check the current character against
$length	integer	the length to peek. This will also increase the length of the characters passed to the callback
return	string	the read string
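
A small sketch of the read-while-callback-matches idea. `readWhile()` is a hypothetical free function, not the library method; it works on bytes and always inspects one character at a time.

```php
<?php
// Hypothetical free-function version of read(): consumes characters from the
// front of $input for as long as $callback returns true for them.
function readWhile(string &$input, callable $callback): string
{
    $result = '';
    while ($input !== '' && $callback($input[0])) {
        $result .= $input[0];
        $input = substr($input, 1);
    }
    return $result;
}

$input  = 'abc123  rest';
$alNum  = readWhile($input, 'ctype_alnum');  // stops at the first space
$spaces = readWhile($input, 'ctype_space');  // continues right where we stopped
```

The character that ends a read stays in the input, so the next read can pick it up.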

readBracketContents() protected method

It will stop on all chars passed to $breakChars as well as on a closing ')' when not inside an expression initiated with either ", ', (, [ or {.

$breakChars might be [','] as an example to read sequential arguments into an array: scan for ',', skip spaces, repeat readBracketContents.

Brackets are counted, strings are respected. Inside a " string, \" escaping is possible; inside a ' string, \' escaping is possible.

As soon as a ) is found and we're outside a string and outside any kind of bracket, the reading will stop and the value, including any quotes, will be returned.

Examples ('`' marks the parts that are read, understood and returned by this function):

(arg1=`abc`, arg2=`"some expression"`, `'some string expression'`)
some-mixin(`'some arg'`, `[1, 2, 3, 4]`, `(isset($complex) ? $complex : 'complex')`)
and even
some-mixin(callback=`function($input) { return trim($input, '\'"'); }`)

protected readBracketContents ( array $breakChars = null ) : string
$breakChars	array	the chars to break on.
return	string	the (possibly quote-enclosed) result string
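
The bracket counting and string handling described for readBracketContents() can be sketched as follows. `readValue()` is a hypothetical function that takes the input as an argument instead of reading from the lexer state, works on bytes, and stops on any unmatched closing bracket, not only ')'.

```php
<?php
// Hypothetical sketch: reads one value from the front of $input and stops at a
// break character or an unmatched closing bracket outside strings.
function readValue(string $input, array $breakChars = []): string
{
    $closers = [')' => '(', ']' => '[', '}' => '{'];
    $depth = 0;       // number of currently open brackets
    $quote = null;    // the quote character we are inside, if any
    $value = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            // Inside a string: a backslash escapes the next character.
            if ($char === '\\' && $i + 1 < $length) {
                $value .= $char . $input[++$i];
                continue;
            }
            if ($char === $quote) {
                $quote = null;
            }
        } elseif ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif (in_array($char, $closers, true)) {
            $depth++;                 // opening bracket
        } elseif (isset($closers[$char])) {
            if ($depth === 0) {
                break;                // closes the surrounding argument list
            }
            $depth--;
        } elseif ($depth === 0 && in_array($char, $breakChars, true)) {
            break;                    // separator outside brackets and strings
        }

        $value .= $char;
    }

    return trim($value);
}

$firstArg = readValue("'some arg', [1, 2, 3, 4])", [',']);
```

Commas inside brackets or strings do not end the value; only a separator at depth zero does.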

readSpaces() protected method

This is primarily used to parse the indentation at the beginning of each line.

protected readSpaces ( ) : string
return	string	the spaces that have been found

scanAssignment() protected method

Assignment-Tokens always have: name, which is the name of the assignment

protected scanAssignment ( ) : Generator
return	Generator

scanAttributes() protected method

Attribute blocks always consist of the following parts:

('(') -> Indicates that attributes start here
... (name*=*value*) -> Name and value are both optional, but at least one of them needs to be provided. Multiple attributes are separated by a comma (,) or white-space ( , \n, \t)
(')') -> Required. Indicates the end of the attribute block

If there is an attribute block, this function always yields an attribute-start token first. Attribute blocks can be split across multiple lines and don't respect indentation of any kind, except for the attribute-start token.

After that it continues to yield attribute tokens containing:

name, which is the name of the attribute (Default: null)
value, which is the value of the attribute (Default: null)
escaped, which indicates that the attribute expression result should be escaped

After that it always requires and yields an attribute-end token. If the attribute-end token is not found, this function throws an exception.

Between the attribute-start, attribute and attribute-end tokens, as well as around the = and , of the attributes, you can use as many spaces and new-lines as you like.

protected scanAttributes ( ) : Generator
return	Generator

scanBlock() protected method

Blocks can have three styles:

block append|prepend|replace name
append|prepend|replace name
or simply
block (for mixin blocks)

Block-tokens may have:

mode, which is either "append", "prepend" or "replace"
name, which is the name of the block

protected scanBlock ( ) : Generator
return	Generator

scanCase() protected method

Case-tokens always have: subject, which is the expression between the parentheses

protected scanCase ( ) : Generator
return	Generator

scanClasses() protected method

Class-tokens always have: name, which is the name of the class

protected scanClasses ( ) : Generator
return	Generator

scanCode() protected method

If the dash-character stands alone on a line, a multi-line code block will be opened.

Examples:

- if ($something):
    p Do something
- endif;

-
    doSomething();
    doSomethingElse();

Code-tokens always have: single, which indicates that the expression is not multi-line

protected scanCode ( ) : Generator
return	Generator

scanComment() protected method

Scans for //-? comments, yielding a comment-token if found as well as a stack of text-block tokens.

protected scanComment ( ) : Generator
return	Generator

scanConditional() protected method

Conditional-tokens always have:

conditionType, which is either "if", "unless", "elseif", "else if" or "else"
subject, which is the expression between the parentheses

protected scanConditional ( ) : Generator
return	Generator

scanControlStatement() protected method

e.g. control-statement-name ($expression)

One statement kind is a special case and gets handled very specifically inside this function (but correctly).

If the condition can have a subject, the subject will be set as the "subject"-value of the token.

protected scanControlStatement ( string $type, array $names, string \| null $nameAttribute = null ) : Generator
$type	string	The token type that should be created if scan is successful
$names	array	The names the statement can have (e.g. do, while, if, else etc.)
$nameAttribute	string \| null	The attribute the name gets saved into, if wanted
return	Generator

scanDo() protected method

Do-tokens are always stand-alone

protected scanDo ( ) : Generator
return	Generator

scanDoctype() protected method

Doctype-tokens always have: name, which is the passed name of the doctype or a custom-doctype, if the named doctype isn't provided

protected scanDoctype ( ) : Generator
return	Generator

scanEach() protected method

Each-tokens always have:

itemName, which is the name of the item for each iteration
subject, which is the expression to iterate

Each-tokens may have:

keyName, which is the name of the key for each iteration

protected scanEach ( ) : Generator
return	Generator

scanExpansion() protected method

(a: b-style expansion or a:b-style tags)

Expansion-tokens always have: withSpace, which indicates whether there's a space after the colon. Usually, if there's no space, it should be handled as part of a tag-name.

protected scanExpansion ( ) : Generator
return	Generator

scanExpression() protected method

e.g.

!= expr
= expr

Expression-tokens always have:

escaped, which indicates that the expression result should be escaped
value, which is the code of the expression

protected scanExpression ( ) : Generator
return	Generator

scanFilter() protected method

Filter-tokens always have: name, which is the name of the filter

protected scanFilter ( ) : Generator
return	Generator

scanFor() protected method

If a token is encountered that's not in $scans, the function breaks, or throws an exception if the second argument is true.

The passed scans are converted to method names, e.g. newLine => scanNewLine, blockExpansion => scanBlockExpansion etc.

protected scanFor ( array $scans, boolean \| false $throwException = false ) : Generator
$scans	array	the scans to perform
$throwException	boolean \| false	throw an exception if no tokens in $scans found anymore
return	Generator	the generator yielding all tokens found
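
The name-to-method dispatch described for scanFor() can be sketched like this. `ScanForSketch` and its two toy scans (`scanWord`, `scanNewLine`) are hypothetical; the real lexer's loop may differ in detail.

```php
<?php
// Hypothetical mini-scanner: each scan name maps to a scan<Name>() method.
final class ScanForSketch
{
    private string $input;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Keeps trying the given scans until none of them yields a token.
    public function scanFor(array $scans, bool $throwException = false): Generator
    {
        while ($this->input !== '') {
            $found = false;
            foreach ($scans as $name) {
                $method = 'scan' . ucfirst($name);   // newLine => scanNewLine
                foreach ($this->$method() as $token) {
                    $found = true;
                    yield $token;
                }
                if ($found) {
                    continue 2;   // something matched: start over with the first scan
                }
            }
            if ($throwException) {
                throw new RuntimeException('Unexpected input: ' . $this->input);
            }
            break;
        }
    }

    private function scanWord(): Generator
    {
        if (preg_match('/^[a-z]+/', $this->input, $m)) {
            $this->input = substr($this->input, strlen($m[0]));
            yield ['type' => 'word', 'value' => $m[0]];
        }
    }

    private function scanNewLine(): Generator
    {
        if ($this->input[0] === "\n") {
            $this->input = substr($this->input, 1);
            yield ['type' => 'newLine'];
        }
    }
}

$types = [];
foreach ((new ScanForSketch("ab\ncd!"))->scanFor(['word', 'newLine']) as $token) {
    $types[] = $token['type'];   // scanning stops quietly at "!"
}
```

With `$throwException` set to true, the same input would raise an exception at the "!" instead of stopping silently.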

scanForLoop() protected method

For-tokens always have: subject, which is the expression between the parentheses

protected scanForLoop ( ) : Generator
return	Generator

scanId() protected method

ID-tokens always have: name, which is the name of the id

protected scanId ( ) : Generator
return	Generator

scanImport() protected method

Import-tokens always have:

importType, which is either "extends" or "include"
path, the (relative) path to which the import points

Import-tokens may have:

filter, which is an optional filter that should only be usable on "include"

protected scanImport ( ) : Generator
return	Generator

scanIndent() protected method

Upon reaching a higher level, an indent-token is yielded; upon reaching a lower level, an outdent-token is yielded. If you outdented 3 levels, 3 outdent-tokens are yielded.

The first indentation this function encounters will be used as the indentation style for this document. Unlike most Jade implementations, you can indent with anything between 1 space and a few million tabs.

protected scanIndent ( ) : Generator \| void
return	Generator \| void
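
A rough sketch of the indentation tracking described for scanIndent(). `sketchIndentTokens()` is a hypothetical function that returns token type names only. It assumes that a jump to a deeper level produces a single indent token, and it rejects mixed indentation with an exception.

```php
<?php
// Hypothetical sketch: turns the leading whitespace of each line into
// indent/outdent tokens. The first indentation seen fixes style and width.
function sketchIndentTokens(string $input): array
{
    $style = null;   // "\t" or " ", detected from the first indented line
    $width = null;   // how many $style characters make up one level
    $level = 0;
    $tokens = [];

    foreach (explode("\n", $input) as $number => $line) {
        if (trim($line) === '') {
            continue;                        // blank lines keep the current level
        }
        $spaces = substr($line, 0, strspn($line, " \t"));
        if ($spaces !== '' && $style === null) {
            $style = $spaces[0];
            $width = strlen($spaces);
        }
        if ($spaces !== '' && str_replace($style, '', $spaces) !== '') {
            throw new RuntimeException('Mixed indentation on line ' . ($number + 1));
        }
        $newLevel = $spaces === '' ? 0 : intdiv(strlen($spaces), $width);

        if ($newLevel > $level) {
            $tokens[] = 'indent';            // one token per deeper line (assumption)
        }
        for ($i = $level; $i > $newLevel; $i--) {
            $tokens[] = 'outdent';           // one token per level closed
        }
        $level = $newLevel;
    }

    return $tokens;
}

$tokens = sketchIndentTokens("a\n  b\n    c\nd");
```

Going from the third line back to the first level closes two levels, so two outdent tokens are produced.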

scanMarkup() protected method

The whole markup will be kept and yielded as a single token.

protected scanMarkup ( ) : Generator
return	Generator

scanMixin() protected method

Mixin-tokens always have: name, which is the name of the mixin you want to define

protected scanMixin ( ) : Generator
return	Generator

scanMixinCall() protected method

Mixin-Call-Tokens always have: name, which is the name of the called mixin

protected scanMixinCall ( ) : Generator
return	Generator

scanNewLine() protected method

Scans for a new-line character and yields a newLine-token if found.

protected scanNewLine ( ) : Generator
return	Generator

scanSub() protected method

Yields whatever scanTextBlock() and scanExpansion() yield

protected scanSub ( ) : Generator
return	Generator

scanTag() protected method

Tag-tokens always have: name, which is the name of the tag

protected scanTag ( ) : Generator
return	Generator

scanText() protected method

Scans for text until the end of the current line and yields a text-token if found.

protected scanText ( boolean $escaped = false ) : Generator
$escaped	boolean
return	Generator

scanTextBlock() protected method

Yields any text, newLine, indent and outdent tokens it encounters.

protected scanTextBlock ( $escaped = false ) : Generator
return	Generator

scanTextLine() protected method

Scans for a |-style text-line and yields it along with a text-block, if it has any.

protected scanTextLine ( ) : Generator
return	Generator

scanToken() protected method

All matches that have a name (RegEx (?<name>...)-directive) will directly get a key with that name and value on the token array.

For matching, ->match() is used internally.

protected scanToken ( string $type, string $pattern, string $modifiers = '' ) : Generator
$type	string	the token type to create, if matched
$pattern	string	the pattern to match
$modifiers	string	the regex-modifiers for the pattern
return	Generator
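
How named groups end up as token keys can be sketched as follows. `sketchScanToken()` is a hypothetical free function; it takes the input by reference instead of using lexer state, assumes `/` as delimiter, and omits the line and offset keys that real tokens carry.

```php
<?php
// Hypothetical sketch of scanToken(): named capture groups become token keys.
function sketchScanToken(string &$input, string $type, string $pattern, string $modifiers = ''): Generator
{
    if (!preg_match('/^' . $pattern . '/' . $modifiers, $input, $matches)) {
        return;   // no match: yield nothing and leave the input untouched
    }

    $token = ['type' => $type];
    foreach ($matches as $key => $value) {
        if (is_string($key)) {            // skip the numeric duplicates
            $token[$key] = $value;
        }
    }

    $input = substr($input, strlen($matches[0]));   // consume the whole match
    yield $token;
}

$input = '#main.box';
$tokens = iterator_to_array(
    sketchScanToken($input, 'id', '#(?<name>[a-zA-Z][\w-]*)'),
    false
);
```

After the call, the matched part is gone from `$input` and the token carries a `name` key taken from the named group.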

scanVariable() protected method

Variable-tokens always have: name, which is the name of the variable to work on

protected scanVariable ( ) : Generator
return	Generator

scanWhen() protected method

When-tokens always have:

name, which is either "when" or "default"
subject, which is the expression behind "when ..."

When-tokens may have:

default, which indicates that this is the "default"-case

protected scanWhen ( ) : Generator
return	Generator

scanWhile() protected method

While-tokens always have: subject, which is the expression between the parentheses

protected scanWhile ( ) : Generator
return	Generator

strlen() protected method

(so we don't require mbstring.func_overload)

protected strlen ( string $string ) : integer
$string	string	the string to get the length of
return	integer	the multi-byte-respecting length of the string

strpos() protected method

(so we don't require mbstring.func_overload)

protected strpos ( string $haystack, string $needle, integer \| null $offset = null ) : integer \| false
$haystack	string	the string to search in
$needle	string	the string we search for
$offset	integer \| null	the offset at which we might expect it
return	integer \| false	the offset of the string or false, if not found

substr() protected method

(so we don't require mbstring.func_overload)

protected substr ( string $string, integer $start, integer \| null $range = null ) : string
$string	string	the string to get a sub-string of
$start	integer	the start-index
$range	integer \| null	the amount of characters we want to get
return	string	the sub-string

substr_count() protected method

(so we don't require mbstring.func_overload)

protected substr_count ( string $haystack, string $needle ) : integer
$haystack	string	the string we want to count sub-strings in
$needle	string	the sub-string we want to count inside $haystack
return	integer	the amount of occurrences of $needle in $haystack

throwException() protected method

The current line and offset of the exception are automatically appended to the message.

protected throwException ( string $message )
$message	string	A meaningful error message
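
The four mb_*-compatible wrappers (strlen(), strpos(), substr() and substr_count()) can be sketched as thin delegations to the mbstring functions with an explicit encoding. `MbSketch` is a hypothetical class written for this illustration and assumes the mbstring extension is loaded; the library's own methods may differ in detail.

```php
<?php
// Hypothetical helper mirroring the four wrappers: the encoding is passed on
// every call instead of relying on global function overloading.
final class MbSketch
{
    private string $encoding;

    public function __construct(string $encoding = 'UTF-8')
    {
        $this->encoding = $encoding;
    }

    public function strlen(string $string): int
    {
        return mb_strlen($string, $this->encoding);
    }

    /** @return int|false */
    public function strpos(string $haystack, string $needle, ?int $offset = null)
    {
        return mb_strpos($haystack, $needle, $offset ?? 0, $this->encoding);
    }

    public function substr(string $string, int $start, ?int $range = null): string
    {
        return mb_substr($string, $start, $range, $this->encoding);
    }

    public function substr_count(string $haystack, string $needle): int
    {
        return mb_substr_count($haystack, $needle, $this->encoding);
    }
}

$mb = new MbSketch();
$chars = $mb->strlen('héllo');   // counts characters
$bytes = strlen('héllo');        // counts bytes ("é" takes two in UTF-8)
```

The character count and the byte count differ as soon as the input contains multi-byte characters, which is why positions and offsets are computed through wrappers like these.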