PHP Class Tale\Jade\Lexer

Project: talesoft/tale-jade

Public Methods

Method Description
__construct ( array $options = null ) Creates a new lexer instance.
dump ( string $input ) Dumps jade-input into a set of string-represented tokens.
getIndentStyle ( ) : string Returns the detected or previously passed indentation style.
getIndentWidth ( ) : integer Returns the detected or previously passed indentation width.
getInput ( ) : string Returns the current input-string worked on.
getLastMatches ( ) : array | null Returns the last array of matches through ->match().
getLastPeekResult ( ) : string | null Returns the last result of ->peek().
getLength ( ) : integer Returns the total length of the current input-string.
getLevel ( ) : integer Returns the current indentation level we are on.
getLine ( ) : integer Returns the line we are working on in the current input-string.
getOffset ( ) : integer Gets the offset on a line (Line-start is 0) in the current input-string.
getPosition ( ) : integer Returns the total position in the current input-string.
lex ( string $input ) : Generator Returns a generator that will lex the passed $input sequentially.

Protected Methods

Method Description
consume ( integer | null $length = null ) Consumes a length or the length of the last peeked string.
consumeMatch ( ) Consumes a match previously read and matched by ->match().
createToken ( string $type ) : array Creates a new token.
getMatch ( integer | string $index ) : mixed | null Gets a match from the last ->match() call.
isAtEnd ( ) : boolean Checks if our read pointer is at the end of the code.
match ( string $pattern, string $modifiers = '' ) : boolean Matches a pattern against the start of the current $input.
peek ( integer $length = 1 ) : string Shows the next characters in our input.
read ( callable $callback, integer $length = 1 ) : string Peeks and consumes chars until the passed callback returns false.
readBracketContents ( array $breakChars = null ) : string Reads a "value", 'value' or value style string really gracefully.
readSpaces ( ) : string Reads all TAB (\t) and SPACE ( ) chars until something else is found.
scanAssignment ( ) : Generator Scans for an assignment-token (begins with ampersand (&)).
scanAttributes ( ) : Generator Scans for an attribute-block.
scanBlock ( ) : Generator Scans for block-tokens.
scanCase ( ) : Generator Scans for a case-token.
scanClasses ( ) : Generator Scans for a class-token (begins with dot (.)).
scanCode ( ) : Generator Scans for a code-block initiated with a dash (-) character.
scanComment ( ) : Generator Scans for //-? comments, yielding a comment-token if found as well as a stack of text-block tokens.
scanConditional ( ) : Generator Scans for a conditional-token.
scanControlStatement ( string $type, array $names, string | null $nameAttribute = null ) : Generator Scans for a control-statement-kind of token.
scanDo ( ) : Generator Scans for a do-token.
scanDoctype ( ) : Generator Scans for a doctype-token.
scanEach ( ) : Generator Scans for an each-token.
scanExpansion ( ) : Generator Scans for an expansion-token.
scanExpression ( ) : Generator Scans for !=-style expression.
scanFilter ( ) : Generator Scans for :-style filters and yields a token if found.
scanFor ( array $scans, boolean | false $throwException = false ) : Generator Keeps scanning for all types of tokens passed as the first argument.
scanForLoop ( ) : Generator Scans for a for-token.
scanId ( ) : Generator Scans for an id-token (begins with hash (#)).
scanImport ( ) : Generator Scans for imports and yields an import-token if found.
scanIndent ( ) : Generator | void Scans for indentation and automatically keeps the $_level updated through all tokens.
scanMarkup ( ) : Generator Scans for HTML-markup based on a starting '<'.
scanMixin ( ) : Generator Scans for a mixin definition token (mixin).
scanMixinCall ( ) : Generator Scans for a mixinCall-token (begins with plus (+)).
scanNewLine ( ) : Generator Scans for a new-line character and yields a newLine-token if found.
scanSub ( ) : Generator Scans sub-expressions of elements, e.g. a text-block initiated with a dot (.) or a block expansion.
scanTag ( ) : Generator Scans for a tag-token.
scanText ( boolean $escaped = false ) : Generator Scans for text until the end of the current line and yields a text-token if found.
scanTextBlock ( $escaped = false ) : Generator Scans for text and, once you indent, keeps scanning text until it is outdented again (e.g. .-text-blocks, expressions, comments).
scanTextLine ( ) : Generator Scans for a |-style text-line and yields it along with a text-block, if it has any.
scanToken ( string $type, string $pattern, string $modifiers = '' ) : Generator Scans for a specific token-type based on a pattern and converts it to a valid token automatically.
scanVariable ( ) : Generator Scans for a variable-token.
scanWhen ( ) : Generator Scans for a when-token.
scanWhile ( ) : Generator Scans for a while-token.
strlen ( string $string ) : integer mb_* compatible version of PHP's strlen.
strpos ( string $haystack, string $needle, integer | null $offset = null ) : integer | false mb_* compatible version of PHP's strpos.
substr ( string $string, integer $start, integer | null $range = null ) : string mb_* compatible version of PHP's substr.
substr_count ( string $haystack, string $needle ) : integer mb_* compatible version of PHP's substr_count.
throwException ( string $message ) Throws a lexer-exception.

Method Details

__construct() public method

The options should be an associative array. Valid options are:
indentStyle: The indentation character (auto-detected)
indentWidth: How often to repeat indentStyle (auto-detected)
encoding: The encoding when working with mb_*-functions (Default: UTF-8)
scans: An array of scans that will be performed
Passing an indentation style forces you to stick to that style. If none is passed, the lexer takes the first indentation it finds as the indentation style. Mixed indentation is not possible, since it would be hard to calculate without taking away configuration freedom.
Add a new scan to 'scans' to extend the lexer. Notice that you need the fitting 'handle*'-method in the parser or you will get unhandled-token-exceptions.
public __construct ( array $options = null )
$options array the options passed to the lexer instance
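
As a minimal sketch, an options array could look like the following. Only the option keys come from the description above; the concrete values are illustrative, and the commented-out constructor call assumes tale-jade is installed and autoloaded.

```php
<?php
// Illustrative values; only the keys are taken from the option list above.
$options = [
    'indentStyle' => "\t",    // force tabs instead of relying on auto-detection
    'indentWidth' => 1,       // one tab per indentation level
    'encoding'    => 'UTF-8', // the documented default, spelled out
];

// Assuming tale-jade is autoloaded, the array would be passed like this:
// $lexer = new \Tale\Jade\Lexer($options);
```

Leaving out indentStyle and indentWidth keeps the auto-detection described above.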

consume() protected method

Internally $input = substr($input, $length) is done, so everything _before_ the consumed length is cut off and freed from memory (it has most likely been tokenized already, since the lexer works sequentially).
protected consume ( integer | null $length = null )
$length integer | null the length to consume or null, to use the length of the last peeked string

consumeMatch() protected method

Consumes a match previously read and matched by ->match().
protected consumeMatch ( )

createToken() protected method

A token is an associative array. The following keys _always_ exist:
type: The type of the node (e.g. newLine, tag, class, id)
line: The line we encountered this token on
offset: The offset on a line we encountered it on
Before adding a new token-type, make sure that the Parser knows how to handle it and the Compiler knows how to compile it.
protected createToken ( string $type ) : array
$type string the type to give that token
return array the token array
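
The token shape described above can be sketched with a stand-alone helper. `createTokenSketch` is a hypothetical function, not part of the library; the real method is protected and reads line and offset from the lexer's own state.

```php
<?php
// Hypothetical helper mirroring the documented token shape.
function createTokenSketch(string $type, int $line, int $offset): array
{
    return ['type' => $type, 'line' => $line, 'offset' => $offset];
}

$token = createTokenSketch('tag', 3, 4);
// Individual scans then add their type-specific keys, e.g. a tag name.
$token['name'] = 'div';
```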

dump() public method

This makes debugging the lexer easier.
public dump ( string $input )
$input string the jade input to dump the tokens of

getIndentStyle() public method

Returns the detected or previously passed indentation style.
public getIndentStyle ( ) : string
return string

getIndentWidth() public method

Returns the detected or previously passed indentation width.
public getIndentWidth ( ) : integer
return integer

getInput() public method

Returns the current input-string worked on.
public getInput ( ) : string
return string

getLastMatches() public method

Returns the last array of matches through ->match().
public getLastMatches ( ) : array | null
return array | null

getLastPeekResult() public method

Returns the last result of ->peek().
public getLastPeekResult ( ) : string | null
return string | null

getLength() public method

Returns the total length of the current input-string.
public getLength ( ) : integer
return integer

getLevel() public method

Returns the current indentation level we are on.
public getLevel ( ) : integer
return integer

getLine() public method

Returns the line we are working on in the current input-string.
public getLine ( ) : integer
return integer

getMatch() protected method

Gets a match from the last ->match() call.
protected getMatch ( integer | string $index ) : mixed | null
$index integer | string the index of the usual PREG $matches argument
return mixed | null the value of the match or null, if none found

getOffset() public method

Gets the offset on a line (Line-start is 0) in the current input-string.
public getOffset ( ) : integer
return integer

getPosition() public method

Returns the total position in the current input-string.
public getPosition ( ) : integer
return integer

isAtEnd() protected method

Checks if our read pointer is at the end of the code.
protected isAtEnd ( ) : boolean
return boolean

lex() public method

If you don't move the generator, the lexer does nothing. Only when you iterate the generator or call next()/current() on it will the lexer start its work and emit tokens sequentially. This approach takes less memory during the lexing process. Tokens are always arrays and always provide the following keys: [ 'type' => The token type, 'line' => The line this token is on, 'offset' => The offset this token is at ]
public lex ( string $input ) : Generator
$input string the Jade-string to lex into tokens
return Generator a generator that can be iterated sequentially
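
The lazy behaviour can be illustrated without the library. The sketch below uses a hypothetical `lexSketch` generator that emits one text token per non-empty line; it is a stand-in for lex(), not its implementation, and the `$log` parameter exists only to show when work happens.

```php
<?php
// Hypothetical stand-in for lex(): one token per non-empty line.
function lexSketch(string $input, array &$log): Generator
{
    foreach (explode("\n", $input) as $index => $text) {
        if ($text === '') {
            continue;
        }
        $log[] = 'lexing line ' . ($index + 1);
        yield ['type' => 'text', 'line' => $index + 1, 'offset' => 0, 'value' => $text];
    }
}

$log = [];
$tokens = lexSketch("p Hello\n\np World", $log);

// Creating the generator runs none of the function body yet.
$workBeforeIteration = count($log);

$lines = [];
foreach ($tokens as $token) {
    $lines[] = $token['line']; // each step pulls exactly one token
}
```

With the real lexer, the same foreach loop would be applied to the generator returned by lex().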

match() protected method

Notice that this always takes the current pointer position as its reference, since consuming means cutting off the front of the input string. After a successful match, you can retrieve the matches with ->getMatch() and consume the whole match with ->consumeMatch(). A ^ gets automatically prepended to the pattern (since it makes no sense for a sequential lexer to search _inside_ the input).
protected match ( string $pattern, string $modifiers = '' ) : boolean
$pattern string the regular expression without delimiters and without the ^-prefix
$modifiers string the usual PREG RegEx-modifiers
return boolean
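
The interplay of match(), getMatch() and consumeMatch() can be sketched as follows. `MatchSketch` is a hypothetical class written for illustration; it uses byte-based string functions and "/" as the delimiter, both of which are assumptions rather than the library's actual choices.

```php
<?php
// Hypothetical illustration of anchored matching on a shrinking input.
final class MatchSketch
{
    private string $input;
    private ?array $lastMatches = null;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    public function match(string $pattern, string $modifiers = ''): bool
    {
        // Prepend ^ so the pattern only matches at the current position.
        $found = preg_match("/^(?:$pattern)/$modifiers", $this->input, $matches) === 1;
        $this->lastMatches = $found ? $matches : null;
        return $found;
    }

    public function getMatch($index)
    {
        return $this->lastMatches[$index] ?? null;
    }

    public function consumeMatch(): void
    {
        if ($this->lastMatches !== null) {
            // Cut the whole match off the front of the input.
            $this->input = substr($this->input, strlen($this->lastMatches[0]));
        }
    }

    public function getInput(): string
    {
        return $this->input;
    }
}

$sketch = new MatchSketch('div.content');
$matched = $sketch->match('(?<name>[a-z]+)');
$name = $sketch->getMatch('name');
$sketch->consumeMatch();
$rest = $sketch->getInput();
$inside = $sketch->match('content'); // "content" is not at the front
```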

peek() protected method

Pass a $length to get more than one character. The characters _won't_ be consumed here, they are just shown. The position pointer won't be moved forward. The result gets saved in $_lastPeekResult.
protected peek ( integer $length = 1 ) : string
$length integer the length of the string we want to peek on
return string the peeked string

read() protected method

The callback takes the current character as the first argument. This works great with ctype_*-functions. If the last character doesn't match, it also won't be consumed. You can always go on reading right after a call to ->read(), e.g.
$alNumString = $this->read('ctype_alnum')
$spaces = $this->read('ctype_space')
protected read ( callable $callback, integer $length = 1 ) : string
$callback callable the callback to check the current character against
$length integer the length to peek. This will also increase the length of the characters passed to the callback
return string the read string
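
A stand-alone approximation of this behaviour is sketched below. `readSketch` is a hypothetical function that works on a plain string passed by reference and always checks one byte at a time, so it ignores the $length parameter and multi-byte handling of the real method.

```php
<?php
// Hypothetical sketch: consume characters while the callback accepts them.
function readSketch(string &$input, callable $callback): string
{
    $result = '';
    while ($input !== '' && $callback($input[0])) {
        $result .= $input[0];
        $input = substr($input, 1); // the accepted character is consumed
    }
    return $result; // the first rejected character stays in $input
}

$input = 'abc123  rest';
$alNumString = readSketch($input, 'ctype_alnum');
$spaces = readSketch($input, 'ctype_space');
```

After both calls, $input still holds the unread remainder, so reading can continue from there.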

readBracketContents() protected method

It will stop on all chars passed to $breakChars as well as on a closing ')' when _not_ inside an expression initiated with either ", ', (, [ or {.
$breakChars might be [','] as an example to read sequential arguments into an array: scan for ',', skip spaces, repeat readBracketContents.
Brackets are counted, strings are respected. Inside a " string, \" escaping is possible; inside a ' string, \' escaping is possible.
As soon as a ) is found and we're outside a string and outside any kind of bracket, the reading will stop and the value, including any quotes, will be returned.
Examples (each comma-separated argument is a part that is read, understood and returned by this function):
(arg1=abc, arg2="some expression", 'some string expression')
some-mixin('some arg', [1, 2, 3, 4], (isset($complex) ? $complex : 'complex'))
and even
some-mixin(callback=function($input) { return trim($input, '\'"'); })
protected readBracketContents ( array $breakChars = null ) : string
$breakChars array the chars to break on.
return string the (possibly quote-enclosed) result string
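
The counting rules above can be sketched as a stand-alone function. `readBracketContentsSketch` is hypothetical: it takes the input as an argument instead of reading from lexer state, treats any backslash inside a string as an escape, and trims the result, none of which is confirmed by the description above.

```php
<?php
// Hypothetical sketch of bracket- and string-aware reading.
function readBracketContentsSketch(string $input, array $breakChars = []): string
{
    $closers = ['(' => ')', '[' => ']', '{' => '}'];
    $stack = [];   // closing brackets we still expect
    $quote = null; // active string delimiter, if any
    $result = '';
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($quote !== null) {
            $result .= $char;
            if ($char === '\\' && $i + 1 < $length) {
                $result .= $input[++$i]; // keep the escaped character as-is
            } elseif ($char === $quote) {
                $quote = null;
            }
            continue;
        }

        if ($char === '"' || $char === "'") {
            $quote = $char;
        } elseif (isset($closers[$char])) {
            $stack[] = $closers[$char];
        } elseif ($stack !== [] && $char === end($stack)) {
            array_pop($stack);
        } elseif ($stack === [] && ($char === ')' || in_array($char, $breakChars, true))) {
            break; // outside strings and brackets: stop here
        }
        $result .= $char;
    }

    return trim($result);
}

$first = readBracketContentsSketch("'some arg', [1, 2, 3, 4])", [',']);
$list = readBracketContentsSketch('[1, 2, 3, 4], rest)', [',']);
$quoted = readBracketContentsSketch('"a, b") tail', [',']);
```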

readSpaces() protected method

This is primarily used to parse the indentation at the beginning of each line.
protected readSpaces ( ) : string
return string the spaces that have been found

scanAssignment() protected method

Assignment-Tokens always have: name, which is the name of the assignment
protected scanAssignment ( ) : Generator
return Generator

scanAttributes() protected method

Attribute blocks always consist of the following tokens:
('(') -> Indicates that attributes start here
... (name*=*value*) -> Name and value are both optional, but one of them needs to be provided. Multiple attributes are separated by a comma (,) or white-space ( , \n, \t)
(')') -> Required. Indicates the end of the attribute block
This function will always yield an attributeStart-token first, if there's an attribute block. Attribute blocks can be split across multiple lines and don't respect indentation of any kind except for the attributeStart-token.
After that it will continue to yield attribute-tokens containing:
> name, which is the name of the attribute (Default: null)
> value, which is the value of the attribute (Default: null)
> escaped, which indicates that the attribute expression result should be escaped
After that it will always require and yield an attributeEnd-token. If the attributeEnd-token is not found, this function will throw an exception.
Between the attributeStart-, attribute- and attributeEnd-tokens as well as around = and , of the attributes you can use as many spaces and new-lines as you like.
protected scanAttributes ( ) : Generator
return Generator

scanBlock() protected method

Blocks can have three styles:
block append|prepend|replace name
append|prepend|replace name
or simply
block (for mixin blocks)
Block-tokens may have:
mode, which is either "append", "prepend" or "replace"
name, which is the name of the block
protected scanBlock ( ) : Generator
return Generator

scanCase() protected method

Case-tokens always have: subject, which is the expression between the parentheses
protected scanCase ( ) : Generator
return Generator

scanClasses() protected method

Class-tokens always have: name, which is the name of the class
protected scanClasses ( ) : Generator
return Generator

scanCode() protected method

If the dash-character stands alone on a line, a multi-line code block will be opened.
Examples:
- if ($something):
    p Do something
- endif;

-
    doSomething();
    doSomethingElse();
Code-tokens always have: single, which indicates that the expression is not multi-line
protected scanCode ( ) : Generator
return Generator

scanComment() protected method

Scans for //-? comments, yielding a comment-token if found as well as a stack of text-block tokens.
protected scanComment ( ) : Generator
return Generator

scanConditional() protected method

Conditional-tokens always have: conditionType, which is either "if", "unless", "elseif", "else if" or "else"; subject, which is the expression between the parentheses
protected scanConditional ( ) : Generator
return Generator

scanControlStatement() protected method

e.g. control-statement-name ($expression). Since the do-statement is a special case, it gets handled very specifically inside this function (but correctly). If the condition can have a subject, the subject will be set as the "subject"-value of the token.
protected scanControlStatement ( string $type, array $names, string | null $nameAttribute = null ) : Generator
$type string The token type that should be created if scan is successful
$names array The names the statement can have (e.g. do, while, if, else etc.)
$nameAttribute string | null The attribute the name gets saved into, if wanted
return Generator

scanDo() protected method

Do-tokens are always stand-alone
protected scanDo ( ) : Generator
return Generator

scanDoctype() protected method

Doctype-tokens always have: name, which is the passed name of the doctype or a custom-doctype, if the named doctype isn't provided
protected scanDoctype ( ) : Generator
return Generator

scanEach() protected method

Each-tokens always have: itemName, which is the name of the item for each iteration subject, which is the expression to iterate Each-tokens may have: keyName, which is the name of the key for each iteration
protected scanEach ( ) : Generator
return Generator

scanExpansion() protected method

(a: b-style expansion or a:b-style tags) Expansion-tokens always have: withSpace, which indicates whether there's a space after the colon. Usually, if there's no space, it should be handled as part of a tag-name.
protected scanExpansion ( ) : Generator
return Generator

scanExpression() protected method

e.g. != expr or = expr. Expression-tokens always have: escaped, which indicates that the expression result should be escaped; value, which is the code of the expression
protected scanExpression ( ) : Generator
return Generator

scanFilter() protected method

Filter-tokens always have: name, which is the name of the filter
protected scanFilter ( ) : Generator
return Generator

scanFor() protected method

If a token is encountered that's not in $scans, the function breaks, or throws an exception if the second argument is true. The passed scans get converted to method names, e.g. newLine => scanNewLine, blockExpansion => scanBlockExpansion etc.
protected scanFor ( array $scans, boolean | false $throwException = false ) : Generator
$scans array the scans to perform
$throwException boolean | false throw an exception if no tokens in $scans found anymore
return Generator the generator yielding all tokens found
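
The dispatch described above can be sketched with a small class. `ScanForSketch` and its two scans (digits, letters) are hypothetical; the name conversion via ucfirst() is an assumption that happens to reproduce the newLine => scanNewLine example, and the real method may differ in how it loops and fails.

```php
<?php
// Hypothetical sketch: scan names are mapped to scan*-methods and tried in order.
final class ScanForSketch
{
    private string $input;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    public function scanFor(array $scans, bool $throwException = false): Generator
    {
        while ($this->input !== '') {
            foreach ($scans as $name) {
                $method = 'scan' . ucfirst($name); // digits => scanDigits
                $found = false;
                foreach ($this->$method() as $token) {
                    $found = true;
                    yield $token;
                }
                if ($found) {
                    continue 2; // restart with the first scan
                }
            }
            if ($throwException) {
                throw new RuntimeException('Unexpected input: ' . $this->input);
            }
            return; // nothing matched: stop silently
        }
    }

    private function scanDigits(): Generator
    {
        yield from $this->scanPattern('digits', '\d+');
    }

    private function scanLetters(): Generator
    {
        yield from $this->scanPattern('letters', '[a-z]+');
    }

    private function scanPattern(string $type, string $pattern): Generator
    {
        if (preg_match("/^$pattern/", $this->input, $matches) === 1) {
            $this->input = substr($this->input, strlen($matches[0]));
            yield ['type' => $type, 'value' => $matches[0]];
        }
    }
}

$types = [];
foreach ((new ScanForSketch('ab12cd'))->scanFor(['digits', 'letters']) as $token) {
    $types[] = $token['type'];
}
```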

scanForLoop() protected method

For-tokens always have: subject, which is the expression between the parentheses
protected scanForLoop ( ) : Generator
return Generator

scanId() protected method

ID-tokens always have: name, which is the name of the id
protected scanId ( ) : Generator
return Generator

scanImport() protected method

Import-tokens always have: importType, which is either "extends" or "include"; path, the (relative) path to which the import points. Import-tokens may have: filter, which is an optional filter that should only be usable on "include"
protected scanImport ( ) : Generator
return Generator

scanIndent() protected method

Upon reaching a higher level, an indent-token is yielded; upon reaching a lower level, an outdent-token is yielded. If you outdented 3 levels, 3 outdent-tokens are yielded. The first indentation this function encounters will be used as the indentation style for this document. Unlike most Jade implementations, you can indent with anything between 1 space and a few million tabs.
protected scanIndent ( ) : Generator | void
return Generator | void
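
The level bookkeeping can be sketched as follows. `indentTokensSketch` is a hypothetical function that takes the indentation style and width as parameters instead of detecting them, uses the token names indent and outdent as an assumption, and emits one token per level in both directions.

```php
<?php
// Hypothetical sketch: turn leading indentation into indent/outdent tokens.
function indentTokensSketch(array $lines, string $indentStyle = ' ', int $indentWidth = 2): array
{
    $level = 0;
    $tokens = [];
    foreach ($lines as $line) {
        // Count the leading indentation characters of this line.
        $indentLength = strlen($line) - strlen(ltrim($line, $indentStyle));
        $newLevel = intdiv($indentLength, $indentWidth);

        for (; $level < $newLevel; $level++) {
            $tokens[] = 'indent';
        }
        for (; $level > $newLevel; $level--) {
            $tokens[] = 'outdent'; // one token per level that was closed
        }
    }
    return $tokens;
}

$tokens = indentTokensSketch(['a', '  b', '    c', '      d', 'e']);
```

The last line drops from level 3 to level 0, which produces three outdent entries in a row.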

scanMarkup() protected method

The whole markup will be kept and yielded as a single token
protected scanMarkup ( ) : Generator
return Generator

scanMixin() protected method

Mixin-tokens always have: name, which is the name of the mixin you want to define
protected scanMixin ( ) : Generator
return Generator

scanMixinCall() protected method

Mixin-Call-Tokens always have: name, which is the name of the called mixin
protected scanMixinCall ( ) : Generator
return Generator

scanNewLine() protected method

Scans for a new-line character and yields a newLine-token if found.
protected scanNewLine ( ) : Generator
return Generator

scanSub() protected method

Yields whatever scanTextBlock() and scanExpansion() yield
protected scanSub ( ) : Generator
return Generator

scanTag() protected method

Tag-tokens always have: name, which is the name of the tag
protected scanTag ( ) : Generator
return Generator

scanText() protected method

Scans for text until the end of the current line and yields a text-token if found.
protected scanText ( boolean $escaped = false ) : Generator
$escaped boolean
return Generator

scanTextBlock() protected method

Yields any text-, newLine-, indent- and outdent-tokens it encounters
protected scanTextBlock ( $escaped = false ) : Generator
return Generator

scanTextLine() protected method

Scans for a |-style text-line and yields it along with a text-block, if it has any.
protected scanTextLine ( ) : Generator
return Generator

scanToken() protected method

All matches that have a name (RegEx (?<name>...)-directive) will directly get a key with that name and value on the token array. For matching, ->match() is used internally.
protected scanToken ( string $type, string $pattern, string $modifiers = '' ) : Generator
$type string the token type to create, if matched
$pattern string the pattern to match
$modifiers string the regex-modifiers for the pattern
return Generator
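
How named groups end up as token keys can be sketched with a stand-alone function. `scanTokenSketch` is hypothetical: it returns the token or null instead of a Generator, leaves out the line and offset keys, and uses "/" as the regex delimiter.

```php
<?php
// Hypothetical sketch: named regex groups become keys on the token array.
function scanTokenSketch(string &$input, string $type, string $pattern, string $modifiers = ''): ?array
{
    if (preg_match("/^(?:$pattern)/$modifiers", $input, $matches) !== 1) {
        return null;
    }

    $token = ['type' => $type];
    foreach ($matches as $key => $value) {
        if (is_string($key)) {
            $token[$key] = $value; // numeric groups are skipped
        }
    }

    // Consume the matched part of the input.
    $input = substr($input, strlen($matches[0]));
    return $token;
}

$input = '#main.content';
$token = scanTokenSketch($input, 'id', '#(?<name>[a-zA-Z_][a-zA-Z0-9_-]*)');
```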

scanVariable() protected method

Variable-tokens always have: name, which is the name of the variable to work on
protected scanVariable ( ) : Generator
return Generator

scanWhen() protected method

When-tokens always have: name, which is either "when" or "default"; subject, which is the expression behind "when ...". When-tokens may have: default, which indicates that this is the "default"-case
protected scanWhen ( ) : Generator
return Generator

scanWhile() protected method

While-tokens always have: subject, which is the expression between the parentheses
protected scanWhile ( ) : Generator
return Generator

strlen() protected method

(so we don't require mbstring.func_overload)
See also: strlen
See also: mb_strlen
protected strlen ( string $string ) : integer
$string string the string to get the length of
return integer the multi-byte-respecting length of the string
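
The difference such a wrapper is meant to hide can be shown with PHP's own functions. This is a sketch that assumes the mbstring extension is loaded and uses UTF-8, the documented default encoding.

```php
<?php
// "Grüße" written with escapes so the example does not depend on file encoding.
$string = "Gr\u{00FC}\u{00DF}e";

$byteLength = strlen($string);             // counts bytes
$charLength = mb_strlen($string, 'UTF-8'); // counts characters
```

Line offsets computed from byte lengths would drift on such input, which is presumably why the lexer routes its string functions through mb_*-aware wrappers.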

strpos() protected method

(so we don't require mbstring.func_overload)
See also: strpos
See also: mb_strpos
protected strpos ( string $haystack, string $needle, integer | null $offset = null ) : integer | false
$haystack string the string to search in
$needle string the string we search for
$offset integer | null the offset at which we might expect it
return integer | false the offset of the string or false, if not found

substr() protected method

(so we don't require mbstring.func_overload)
See also: substr
See also: mb_substr
protected substr ( string $string, integer $start, integer | null $range = null ) : string
$string string the string to get a sub-string of
$start integer the start-index
$range integer | null the amount of characters we want to get
return string the sub-string

substr_count() protected method

(so we don't require mbstring.func_overload)
See also: substr_count
See also: mb_substr_count
protected substr_count ( string $haystack, string $needle ) : integer
$haystack string the string we want to count sub-strings in
$needle string the sub-string we want to count inside $haystack
return integer the number of occurrences of $needle in $haystack

throwException() protected method

The current line and offset of the exception get automatically appended to the message
protected throwException ( string $message )
$message string A meaningful error message