PHP Class cebe\jssearch\tokenizer\StandardTokenizer

Author: Carsten Brandt
Inheritance: implements cebe\jssearch\TokenizerInterface
Project: cebe/js-search

Public Properties

Property      Type  Description
$delimiters         a list of characters that should be used as word delimiters.
$stopWords          a list of stopwords to remove from the token list.

Public Methods

Method                                 Description
tokenize ( string $string ) : array    Tokenizes a string and returns an array of token/weight pairs, with [[stopWords]] removed.
tokenizeJs ( ) : string                Returns a JavaScript equivalent of [[tokenize]] that is used on the client side to tokenize the search query.

Method Details

tokenize() public method

Tokenizes a string and returns an array of the following format: [['t' => 'word', 'w' => 2], ['t' => 'other', 'w' => 1]], where 't' holds the token string and 'w' holds its weight. Also removes [[stopWords]] from the list.
public tokenize ( string $string ) : array
$string string the string to tokenize
return array
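
The return format can be illustrated with a minimal standalone sketch. It is not the library's implementation: the function name sketchTokenize, the lowercasing step, and the reading of the weight as an occurrence count are assumptions made for illustration. Delimiters and stop words are passed as arguments here rather than read from $delimiters and $stopWords.

```php
<?php
// Hypothetical stand-in for illustration; not the library's actual tokenize().
function sketchTokenize(string $string, string $delimiters, array $stopWords): array
{
    // Split on whitespace and on any of the given delimiter characters.
    $pattern = '/[\s' . preg_quote($delimiters, '/') . ']+/';
    $words = preg_split($pattern, strtolower($string), -1, PREG_SPLIT_NO_EMPTY);

    // Count occurrences per word, skipping stop words.
    // Assumption: the weight 'w' is the number of occurrences.
    $weights = [];
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue;
        }
        $weights[$word] = ($weights[$word] ?? 0) + 1;
    }

    // Convert the word => count map into the documented list-of-pairs format.
    $tokens = [];
    foreach ($weights as $word => $weight) {
        $tokens[] = ['t' => (string) $word, 'w' => $weight];
    }
    return $tokens;
}

$tokens = sketchTokenize('Word, the other word.', '.,;:', ['the']);
// $tokens is [['t' => 'word', 'w' => 2], ['t' => 'other', 'w' => 1]]
```

With the real class, the same kind of input would be passed to tokenize() on a StandardTokenizer instance; the exact weights it assigns may differ from this sketch.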

tokenizeJs() public method

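Returns a JavaScript equivalent of [[tokenize]] that is used on the client side to tokenize the search query. This ensures that the same tokenizer is used for building the index and for searching.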
public tokenizeJs ( ) : string
return string
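
One way to keep the server-side and client-side tokenizers in sync is to generate the JavaScript source from the same delimiter and stop-word settings. The sketch below shows that idea only. The helper name sketchTokenizeJs and the shape of the generated function are assumptions, not the string that tokenizeJs() actually returns, and the generated function does not merge duplicate words.

```php
<?php
// Hypothetical helper for illustration; not the library's actual tokenizeJs().
function sketchTokenizeJs(string $delimiters, array $stopWords): string
{
    // json_encode yields valid JavaScript literals for the string and the array.
    $delimitersJs = json_encode($delimiters);
    $stopWordsJs = json_encode(array_values($stopWords));

    // Build a JavaScript function expression that splits a query on
    // whitespace and delimiters and drops stop words.
    return <<<JS
function (query) {
    var delimiters = $delimitersJs;
    var stopWords = $stopWordsJs;
    var tokens = [], word = '';
    var chars = query.toLowerCase() + ' ';
    for (var i = 0; i < chars.length; i++) {
        var c = chars.charAt(i);
        if (/\s/.test(c) || delimiters.indexOf(c) !== -1) {
            if (word !== '' && stopWords.indexOf(word) === -1) {
                tokens.push({t: word, w: 1});
            }
            word = '';
        } else {
            word += c;
        }
    }
    return tokens;
}
JS;
}

$js = sketchTokenizeJs('.,;:', ['the']);
```

The returned string can then be embedded in the generated search script, so that the query is split with the same delimiters and stop words that were used to build the index.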

Property Details

$delimiters public property

a list of characters that should be used as word delimiters.
public $delimiters

$stopWords public property

a list of stopwords to remove from the token list.
public $stopWords