PHP Class SolrWebService, ojs

Inheritance: extends XmlWebService
Project: pkp/ojs

Public Properties

Property Description
$_fieldCache A cache containing the available search fields.
$_instId The unique ID identifying this OJS installation to the solr server.
$_issueCache An issue cache.
$_journalCache A journal cache.
$_serviceMessage A description of the last error or message that occurred when calling the service.
$_solrCore The solr core we get our data from.
$_solrSearchHandler The solr search handler name we place our searches on.
$_solrServer The base URL of the solr server without core and search handler.
$_useProxySettings Whether the proxy settings in config.inc.php should be considered for the web service request.

Public Methods

Method Description
__construct ( $searchHandler, $username, $password, $instId, $useProxy = false ) Constructor
_addArticleXml ( &$articleDoc, &$article, &$journal, $markToDelete = false ) Add the metadata XML of a single article to an XML article list.
_addSubquery ( $fieldList, $searchPhrase, $params ) Add a subquery to the search query.
_cacheMiss ( $cache, $id ) : array Refresh the cache from the solr server.
_convertDate ( $timestamp ) : string Convert a date from local time (unix timestamp or ISO date string) to UTC time as understood by solr.
_deleteFromIndex ( $xml ) : boolean Delete documents from the index (by ID or by query).
_expandFieldList ( $fields ) : string Expand the given list of fields.
_getAdminUrl ( ) : string Identifies the general solr admin endpoint from the search handler URL.
_getArticleListXml ( &$articles, $totalCount, &$numDeleted ) : string Retrieve the XML for a batch of articles to be updated.
_getAutosuggestUrl ( $autosuggestType ) : string Returns the solr auto-suggestion endpoint.
_getCache ( ) : FileCache Get the field cache.
_getCoreAdminUrl ( ) : string Identifies the solr core-specific admin endpoint from the search handler URL.
_getDihUrl ( ) : string Returns the solr DIH endpoint.
_getDocumentsProcessed ( $result ) : integer Retrieve the number of indexed documents from a DIH XML response.
_getFacetingAutosuggestions ( $url, $searchRequest, $userInput, $fieldName ) : array Retrieve auto-suggestions from the faceting service.
_getFieldNames ( $fieldType ) : array Return a list of all text fields that may occur in the index.
_getInterestingTermsUrl ( ) : string Returns the solr endpoint to retrieve "interesting terms" from a given document.
_getIssue ( $issueId, $journalId ) : Issue Retrieve an issue (possibly from the cache).
_getJournal ( $journalId ) : Journal Retrieve a journal (possibly from the cache).
_getLocalesAndFormats ( $field ) : array Identify all format/locale versions of the given field.
_getOrdering ( $field, $direction ) : string Generate the ordering parameter of a search query.
_getReloadExternalFilesUrl ( ) Returns the solr endpoint to reload external files.
_getSearchQueryParameters ( &$searchRequest ) : array | null Create the edismax query parameters from a search request.
_getSearchUrl ( ) : string Returns the solr search endpoint.
_getSuggesterAutosuggestions ( $url, $userInput, $fieldName ) : array Retrieve auto-suggestions from the suggester service.
_getUpdateUrl ( ) : string Returns the solr update endpoint.
_indexingTransaction ( $sendXmlCallback, $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null ) This method encapsulates an indexing transaction (pull or push).
_isArticleAccessAuthorized ( &$article ) : boolean Check whether access to the given article is authorized to the requesting party (i.e. the Solr server).
_makeRequest ( $url, $params = [], $method = 'GET' ) : DOMXPath Make an HTTP request and return the XML response as a DOMXPath object.
_pushIndexingCallback ( &$articleXml, $batchCount, $numDeleted ) : integer Handle push indexing.
_setQuery ( $fieldList, $searchPhrase, $spellcheck = false ) Set the query parameters for a search query.
_translateSearchPhrase ( $searchPhrase, $backwards = false ) : string Translate query keywords.
deleteArticleFromIndex ( $articleId ) : boolean Deletes the given article from the Solr index.
deleteArticlesFromIndex ( $journalId = null ) : boolean Deletes all articles of a journal or of the installation from the Solr index.
flushFieldCache ( ) Flush the field cache.
getArticleFromIndex ( $articleId ) : array Retrieve a document directly from the index (for testing/debugging purposes only).
getAutosuggestions ( $searchRequest, $fieldName, $userInput, $autosuggestType ) : array Retrieve auto-suggestions from the solr index corresponding to the given user input.
getAvailableFields ( $fieldType ) : array Returns an array with all (dynamic) fields in the index.
getInterestingTerms ( $articleId ) : array Retrieve "interesting terms" from a document to be used in a "similar documents" search.
getServerStatus ( ) : integer Checks the solr server status.
getServiceMessage ( ) : string Get the last service message.
markArticleChanged ( $articleId ) Mark a single article "changed" so that the indexing back-end will update it during the next batch update.
markJournalChanged ( $journalId ) : integer Mark the given journal for re-indexing.
pullChangedArticles ( $pullIndexingCallback, $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null ) : integer Retrieves a batch of articles in XML format.
pushChangedArticles ( $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null ) : integer (Re-)indexes all changed articles in Solr.
rebuildDictionaries ( ) Rebuilds the spelling/auto-suggest dictionaries.
reloadExternalFiles ( ) Reloads external files.
retrieveResults ( &$searchRequest, &$totalResults ) : array Execute a search against the Solr search server.

Method Details

__construct() public method

Constructor
public __construct ( $searchHandler, $username, $password, $instId, $useProxy = false )
$searchHandler string The search handler URL (the embedded server is assumed by default).
$username string The HTTP BASIC authentication username.
$password string The corresponding password.
$instId string The unique ID of this OJS installation to partition a shared index.
$useProxy boolean Whether the proxy settings from config.inc.php should be considered.

_addArticleXml() public method

Add the metadata XML of a single article to an XML article list.
public _addArticleXml ( &$articleDoc, &$article, &$journal, $markToDelete = false )
$articleDoc DOMDocument
$article PublishedArticle
$journal Journal
$markToDelete boolean If true the returned XML will only contain a deletion marker.

_addSubquery() public method

Add a subquery to the search query. NB: Subqueries do not support collation (for alternative spelling suggestions).
public _addSubquery ( $fieldList, $searchPhrase, $params )
$fieldList string A list of fields to be queried, separated by '|'.
$searchPhrase string The search phrase to be added.
$params array The existing query parameters.
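For illustration, a standalone PHP sketch of one way to add such a subquery, using Solr's _query_ nested-query syntax with edismax local params. The function name addSubquery, combining subqueries with AND, and the quote escaping are assumptions, not the OJS implementation.

```php
<?php
// Hypothetical helper (not the OJS method): append an edismax subquery for a
// '|'-separated field list to the existing query parameters, using Solr's
// _query_ nested-query syntax with local params.
function addSubquery(string $fieldList, string $searchPhrase, array $params): array
{
    // edismax expects a space-separated qf list.
    $qf = str_replace('|', ' ', $fieldList);
    // Escape double quotes so the phrase can sit inside the quoted subquery.
    $escaped = str_replace('"', '\\"', $searchPhrase);
    $subquery = '_query_:"{!edismax qf=\'' . $qf . '\'}' . $escaped . '"';
    // Assumption: a new subquery is ANDed onto any existing query.
    if (isset($params['q']) && $params['q'] !== '') {
        $params['q'] .= ' AND ' . $subquery;
    } else {
        $params['q'] = $subquery;
    }
    return $params;
}
```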

_cacheMiss() public method

Refresh the cache from the solr server.
public _cacheMiss ( $cache, $id ) : array
$cache FileCache
$id string The field type.
return array The available field names.

_convertDate() public method

Convert a date from local time (unix timestamp or ISO date string) to UTC time as understood by solr. NB: Using intermediate unix timestamps can be a problem in older PHP versions, especially on Windows, where negative timestamps are not supported. As the Solr integration requires PHP 5, this should not be a big problem in practice, except for electronic publications dating back to before 1901. Such a situation is unlikely to arise with OJS.
public _convertDate ( $timestamp ) : string
$timestamp int|string Unix timestamp or local ISO time.
return string ISO UTC timestamp
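A minimal sketch of the conversion, assuming PHP's default time zone represents "local time" and using the UTC ISO 8601 form with a trailing Z that Solr expects for date fields. The function name is hypothetical. On 64-bit PHP builds the integer range also covers dates before 1901.

```php
<?php
// Standalone sketch: convert a unix timestamp (int) or a local date/time
// string to the UTC ISO 8601 form Solr uses for date fields.
function convertDate($timestamp): string
{
    if (!is_int($timestamp)) {
        // Strings are interpreted in PHP's default (local) time zone.
        $parsed = strtotime((string) $timestamp);
        if ($parsed === false) {
            throw new InvalidArgumentException("Cannot parse date: $timestamp");
        }
        $timestamp = $parsed;
    }
    // gmdate() formats in UTC; the literal Z marks the UTC offset for Solr.
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}
```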

_deleteFromIndex() public method

Delete documents from the index (by ID or by query).
public _deleteFromIndex ( $xml ) : boolean
$xml string The documents to delete.
return boolean true, if successful, otherwise false.
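A sketch of the two kinds of XML delete messages accepted by Solr's update handler: delete by unique key and delete by query. The helper names are hypothetical, and this sketch does not reflect the document ID scheme or field names that OJS actually uses.

```php
<?php
// Standalone sketch: build Solr XML update messages that delete documents
// either by unique key (<id>) or by query (<query>).
function buildDeleteById(array $ids): string
{
    $xml = '<delete>';
    foreach ($ids as $id) {
        // Escape XML special characters in each ID.
        $xml .= '<id>' . htmlspecialchars((string) $id, ENT_XML1 | ENT_QUOTES) . '</id>';
    }
    return $xml . '</delete>';
}

function buildDeleteByQuery(string $query): string
{
    // The query string is embedded as escaped element text.
    return '<delete><query>' . htmlspecialchars($query, ENT_XML1 | ENT_QUOTES) . '</query></delete>';
}
```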

_expandFieldList() public method

Expand the given list of fields.
public _expandFieldList ( $fields ) : string
$fields array
return string A space-separated field list (e.g. to be used in edismax's qf parameter).
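A hypothetical sketch of the expansion, assuming index fields are named <field>_<locale> and that the locale list is passed in explicitly. In the class itself this information would presumably come from the index (cf. _getLocalesAndFormats()), so the extra parameter is an assumption.

```php
<?php
// Hypothetical sketch: expand each base field name into per-locale index
// fields and join them into a space-separated list for edismax's qf.
// The "<field>_<locale>" naming scheme is an assumption.
function expandFieldList(array $fields, array $locales): string
{
    $expanded = [];
    foreach ($fields as $field) {
        foreach ($locales as $locale) {
            $expanded[] = $field . '_' . $locale;
        }
    }
    return implode(' ', $expanded);
}
```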

_getAdminUrl() public method

Identifies the general solr admin endpoint from the search handler URL.
public _getAdminUrl ( ) : string
return string
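A sketch of how the admin endpoints could be derived, assuming the search handler URL has the form <server>/<core>/<handler> (e.g. http://localhost:8983/solr/ojs/search), which matches the split into $_solrServer, $_solrCore and $_solrSearchHandler. The function names and the admin/ path suffixes are assumptions.

```php
<?php
// Hypothetical sketch: split a search handler URL of the assumed form
// <server>/<core>/<handler> into its parts and derive admin endpoints.
function splitSearchHandlerUrl(string $url): array
{
    $parts = explode('/', rtrim($url, '/'));
    $handler = array_pop($parts);
    $core = array_pop($parts);
    // Whatever remains is the server base URL, kept with a trailing slash.
    return ['server' => implode('/', $parts) . '/', 'core' => $core, 'handler' => $handler];
}

// General (server-wide) admin endpoint.
function getAdminUrl(string $searchHandlerUrl): string
{
    return splitSearchHandlerUrl($searchHandlerUrl)['server'] . 'admin/';
}

// Core-specific admin endpoint.
function getCoreAdminUrl(string $searchHandlerUrl): string
{
    $p = splitSearchHandlerUrl($searchHandlerUrl);
    return $p['server'] . $p['core'] . '/admin/';
}
```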

_getArticleListXml() public method

Retrieve the XML for a batch of articles to be updated.
public _getArticleListXml ( &$articles, $totalCount, &$numDeleted ) : string
$articles DBResultFactory The articles to be included in the list.
$totalCount integer The overall number of changed articles (not only the current batch).
$numDeleted integer An output parameter that returns the number of documents that will be deleted.
return string The XML ready to be consumed by the Solr data import service.

_getAutosuggestUrl() public method

Returns the solr auto-suggestion endpoint.
public _getAutosuggestUrl ( $autosuggestType ) : string
$autosuggestType string One of the SOLR_AUTOSUGGEST_* constants
return string

_getCache() public method

Get the field cache.
public _getCache ( ) : FileCache
return FileCache

_getCoreAdminUrl() public method

Identifies the solr core-specific admin endpoint from the search handler URL.
public _getCoreAdminUrl ( ) : string
return string

_getDihUrl() public method

Returns the solr DIH endpoint.
public _getDihUrl ( ) : string
return string

_getDocumentsProcessed() public method

Retrieve the number of indexed documents from a DIH XML response.
public _getDocumentsProcessed ( $result ) : integer
$result DOMXPath
return integer
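A sketch of reading the counter with DOMXPath, assuming the DataImportHandler reports it as a <str name="Total Documents Processed"> entry inside <lst name="statusMessages">. Verify this layout against the Solr version in use; the function name is hypothetical.

```php
<?php
// Standalone sketch: extract the "Total Documents Processed" status message
// from a DIH XML response. Returns 0 if the entry is missing.
function getDocumentsProcessed(DOMXPath $result): int
{
    $nodes = $result->query(
        '/response/lst[@name="statusMessages"]/str[@name="Total Documents Processed"]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return 0;
    }
    return (int) $nodes->item(0)->textContent;
}
```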

_getFacetingAutosuggestions() public method

Retrieve auto-suggestions from the faceting service.
public _getFacetingAutosuggestions ( $url, $searchRequest, $userInput, $fieldName ) : array
$url string
$searchRequest SolrSearchRequest
$userInput string
$fieldName string
return array The generated suggestions.

_getFieldNames() public method

Return a list of all text fields that may occur in the index.
public _getFieldNames ( $fieldType ) : array
$fieldType string "search", "sort" or "all"
return array

_getInterestingTermsUrl() public method

Returns the solr endpoint to retrieve "interesting terms" from a given document.
public _getInterestingTermsUrl ( ) : string
return string

_getIssue() public method

Retrieve an issue (possibly from the cache).
public _getIssue ( $issueId, $journalId ) : Issue
$issueId int
$journalId int
return Issue

_getJournal() public method

Retrieve a journal (possibly from the cache).
public _getJournal ( $journalId ) : Journal
$journalId int
return Journal

_getLocalesAndFormats() public method

Identify all format/locale versions of the given field.
public _getLocalesAndFormats ( $field ) : array
$field string A field name without any extension.
return array A list of index fields.

_getOrdering() public method

Generate the ordering parameter of a search query.
public _getOrdering ( $field, $direction ) : string
$field string the field to order by
$direction boolean true for ascending, false for descending
return string The ordering to be used (default: descending relevance).
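A minimal sketch of the ordering parameter, assuming Solr's "<field> asc|desc" sort syntax and treating a missing field as the relevance default ("score desc"). Any mapping from OJS field names to index sort fields is left out, and the function name is hypothetical.

```php
<?php
// Standalone sketch: build a Solr sort parameter. Without a field, fall back
// to descending relevance, the documented default.
function getOrdering(?string $field, bool $direction): string
{
    if ($field === null || $field === '') {
        return 'score desc';
    }
    // true = ascending, false = descending.
    return $field . ' ' . ($direction ? 'asc' : 'desc');
}
```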

_getReloadExternalFilesUrl() public method

Returns the solr endpoint to reload external files.
public _getReloadExternalFilesUrl ( )

_getSearchQueryParameters() public method

Create the edismax query parameters from a search request.
public _getSearchQueryParameters ( &$searchRequest ) : array | null
$searchRequest SolrSearchRequest
return array | null A parameter array or null if something went wrong.

_getSearchUrl() public method

Returns the solr search endpoint.
public _getSearchUrl ( ) : string
return string

_getSuggesterAutosuggestions() public method

Retrieve auto-suggestions from the suggester service.
public _getSuggesterAutosuggestions ( $url, $userInput, $fieldName ) : array
$url string
$userInput string
$fieldName string
return array The generated suggestions.

_getUpdateUrl() public method

Returns the solr update endpoint.
public _getUpdateUrl ( ) : string
return string

_indexingTransaction() public method

This method encapsulates an indexing transaction (pull or push). It consists of generating the XML, transferring it to the server and marking the transferred articles as "indexed".
public _indexingTransaction ( $sendXmlCallback, $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null )
$sendXmlCallback callback This function will be called with the generated XML.
$batchSize integer The maximum number of articles to be returned.
$journalId integer If given, only retrieves articles for the given journal.
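A hypothetical sketch of the transaction's three steps. The in-memory $changed map stands in for the article change flags stored by OJS, the placeholder XML is not the real article list format, and the callback signature (XML plus batch size) is an assumption.

```php
<?php
// Hypothetical sketch of one indexing transaction: select up to $batchSize
// changed articles, pass the generated XML to a callback, and mark only the
// transferred articles as indexed. $changed maps article ID => "changed" flag.
function indexingTransaction(array &$changed, callable $sendXml, int $batchSize): ?int
{
    // IDs of articles still flagged as changed, capped at the batch size.
    $batch = array_slice(array_keys(array_filter($changed)), 0, $batchSize);
    if ($batch === []) {
        return 0;
    }
    // Placeholder XML; the real article list format is not documented here.
    $xml = '<articleList>';
    foreach ($batch as $id) {
        $xml .= '<article id="' . (int) $id . '"/>';
    }
    $xml .= '</articleList>';

    $processed = $sendXml($xml, count($batch));
    if ($processed === null) {
        // Transfer failed: keep the "changed" flags so the batch is retried.
        return null;
    }
    foreach ($batch as $id) {
        $changed[$id] = false;
    }
    return $processed;
}
```

Leaving the flags untouched on failure means that a failed transfer is simply retried in the next run.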

_isArticleAccessAuthorized() public method

Check whether access to the given article is authorized to the requesting party (i.e. the Solr server).
public _isArticleAccessAuthorized ( &$article ) : boolean
$article Article
return boolean True if authorized, otherwise false.

_makeRequest() public method

Make an HTTP request and return the XML response as a DOMXPath object.
public _makeRequest ( $url, $params = [], $method = 'GET' ) : DOMXPath
$url string The request URL
$params mixed Request parameters, either as an array of key/value pairs or as a string.
$method string GET or POST
return DOMXPath An XPath object with the response loaded. Null if an error occurred. See _serviceMessage for more details about the error.
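A standalone sketch of the two non-network parts of such a request: building a GET URL from array or string parameters, and loading the XML response into a DOMXPath, returning null with a message on failure. The HTTP transfer itself (including POST, proxy settings and authentication) is omitted, and both helper names are hypothetical.

```php
<?php
// Build a GET URL from key/value pairs or a preformatted query string.
function buildRequestUrl(string $url, $params): string
{
    $query = is_array($params) ? http_build_query($params) : (string) $params;
    if ($query === '') {
        return $url;
    }
    return $url . (strpos($url, '?') === false ? '?' : '&') . $query;
}

// Load a raw XML response body; on error return null and set a message,
// similar in spirit to the documented _serviceMessage behaviour.
function loadXmlResponse(string $body, ?string &$serviceMessage): ?DOMXPath
{
    if (trim($body) === '') {
        $serviceMessage = 'Empty response.';
        return null;
    }
    $doc = new DOMDocument();
    // Collect libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $ok = $doc->loadXML($body);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($ok === false) {
        $serviceMessage = 'Invalid XML response.';
        return null;
    }
    return new DOMXPath($doc);
}
```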

_pushIndexingCallback() public method

Handle push indexing. This method pushes XML with index changes directly to the Solr data import handler for immediate processing.
public _pushIndexingCallback ( &$articleXml, $batchCount, $numDeleted ) : integer
$articleXml string The XML with index changes to be pushed to the Solr server.
$batchCount integer The number of articles in the XML list (i.e. the expected number of documents to be indexed).
$numDeleted integer The number of articles in the XML list that are marked for deletion.
return integer The number of articles processed or null if an error occurred. After an error the method SolrWebService::getServiceMessage() will return details of the error.

_setQuery() public method

Set the query parameters for a search query.
public _setQuery ( $fieldList, $searchPhrase, $spellcheck = false )
$fieldList string A list of fields to be queried, separated by '|'.
$searchPhrase string The search phrase to be added.
$spellcheck boolean Whether to switch spellchecking on.
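A sketch of the resulting edismax parameters, assuming the query is expressed through Solr's standard defType, qf and q parameters and that spellchecking maps to the spellcheck and spellcheck.collate parameters. This standalone version returns the array, whereas the class method's return value is not documented here; the function name is hypothetical.

```php
<?php
// Standalone sketch: turn a '|'-separated field list and a search phrase
// into edismax query parameters, optionally switching on spellchecking.
function setQuery(string $fieldList, string $searchPhrase, bool $spellcheck = false): array
{
    $params = [
        'defType' => 'edismax',
        'qf' => str_replace('|', ' ', $fieldList),
        'q' => $searchPhrase,
    ];
    if ($spellcheck) {
        // Collation builds alternative spelling suggestions for the whole query.
        $params['spellcheck'] = 'true';
        $params['spellcheck.collate'] = 'true';
    }
    return $params;
}
```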

_translateSearchPhrase() public method

Translate query keywords.
public _translateSearchPhrase ( $searchPhrase, $backwards = false ) : string
$searchPhrase string
return string The translated search phrase.

deleteArticleFromIndex() public method

Deletes the given article from the Solr index.
public deleteArticleFromIndex ( $articleId ) : boolean
$articleId integer The ID of the article to be deleted.
return boolean true if successful, otherwise false.

deleteArticlesFromIndex() public method

Deletes all articles of a journal or of the installation from the Solr index.
public deleteArticlesFromIndex ( $journalId = null ) : boolean
$journalId integer If given, only articles from this journal will be deleted.
return boolean true if successful, otherwise false.

flushFieldCache() public method

Flush the field cache.
public flushFieldCache ( )

getArticleFromIndex() public method

Retrieve a document directly from the index (for testing/debugging purposes only).
public getArticleFromIndex ( $articleId ) : array
$articleId integer The ID of the article to be retrieved.
return array The document fields.

getAutosuggestions() public method

Retrieve auto-suggestions from the solr index corresponding to the given user input.
public getAutosuggestions ( $searchRequest, $fieldName, $userInput, $autosuggestType ) : array
$searchRequest SolrSearchRequest Active search filters. Choosing the faceting auto-suggest implementation via $autosuggestType will pre-filter auto-suggestions based on this search request. In case of the suggester component, the search request will simply be ignored.
$fieldName string The field to suggest values for. Values are queried on field level to improve relevance of suggestions.
$userInput string Partial query input. This input will be split up and only the last query term will be used to suggest values.
$autosuggestType string One of the SOLR_AUTOSUGGEST_* constants. The faceting implementation is slower but returns more relevant suggestions. The suggester implementation is faster and scales better in large deployments. However, it returns terms from a field-specific global dictionary, e.g. terms from different journals.
return array A list of suggested queries
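A sketch of the input handling described for $userInput: the input is split on whitespace, only the last term is used for the lookup, and the preceding terms are prepended to each suggested term to form complete suggested queries. The function names and the whitespace tokenization are assumptions.

```php
<?php
// Split partial user input into [preceding terms, last (incomplete) term].
function splitAutosuggestInput(string $userInput): array
{
    $terms = preg_split('/\s+/', trim($userInput), -1, PREG_SPLIT_NO_EMPTY);
    $last = array_pop($terms) ?? '';
    return [implode(' ', $terms), $last];
}

// Re-attach the preceding terms to each suggested term.
function toSuggestedQueries(string $prefix, array $suggestedTerms): array
{
    return array_map(
        fn($term) => $prefix === '' ? $term : "$prefix $term",
        $suggestedTerms
    );
}
```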

getAvailableFields() public method

Returns an array with all (dynamic) fields in the index. NB: This is cached data, so after an index update the field cache may have to be flushed (see flushFieldCache()) to re-read the current index state.
public getAvailableFields ( $fieldType ) : array
$fieldType string Either 'search' or 'sort'.
return array

getInterestingTerms() public method

Retrieve "interesting terms" from a document to be used in a "similar documents" search.
public getInterestingTerms ( $articleId ) : array
$articleId integer The article from which we retrieve "interesting terms".
return array An array of terms that can be used to execute a search for similar documents.

getServerStatus() public method

Checks the solr server status.
public getServerStatus ( ) : integer
return integer One of the SOLR_STATUS_* constants.

getServiceMessage() public method

Get the last service message.
public getServiceMessage ( ) : string
return string

markArticleChanged() public method

Mark a single article "changed" so that the indexing back-end will update it during the next batch update.
public markArticleChanged ( $articleId )
$articleId integer The ID of the article to be marked.

markJournalChanged() public method

Mark the given journal for re-indexing.
public markJournalChanged ( $journalId ) : integer
$journalId integer The ID of the journal to be (re-)indexed.
return integer The number of articles that have been marked.

pullChangedArticles() public method

Retrieves a batch of articles in XML format. This is the pull-indexing implementation of the Solr web service. To control memory usage and response time, we index articles in batches. Batches should be as large as possible to reduce index commit overhead.
public pullChangedArticles ( $pullIndexingCallback, $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null ) : integer
$batchSize integer The maximum number of articles to be returned.
$journalId integer If given, only returns articles from the given journal.
return integer The number of articles processed or null if an error occurred. After an error the method SolrWebService::getServiceMessage() will return details of the error.

pushChangedArticles() public method

(Re-)indexes all changed articles in Solr. This is the push-indexing implementation of the Solr web service. To control memory usage and response time, we index articles in batches. Batches should be as large as possible to reduce index commit overhead.
public pushChangedArticles ( $batchSize = SOLR_INDEXING_MAX_BATCHSIZE, $journalId = null ) : integer
$batchSize integer The maximum number of articles to be indexed in this run.
$journalId integer If given, restricts index updates to the given journal.
return integer The number of articles processed or null if an error occurred. After an error the method SolrWebService::getServiceMessage() will return details of the error.

rebuildDictionaries() public method

Rebuilds the spelling/auto-suggest dictionaries.
public rebuildDictionaries ( )

reloadExternalFiles() public method

Reloads external files.
public reloadExternalFiles ( )

retrieveResults() public method

Execute a search against the Solr search server.
public retrieveResults ( &$searchRequest, &$totalResults ) : array
$searchRequest SolrSearchRequest
$totalResults integer An output parameter returning the total number of search results found by the query. This may differ from the number of results actually returned, as the search can be limited.
return array An array of search results. The main keys are result types. These are "scoredResults" and "alternativeSpelling". The keys in the "scoredResults" sub-array are scores (1-9999) and the values are article IDs. The alternative spelling sub-array returns an alternative query string (if any) and the number of hits for this string. Null if an error occurred while querying the server.

Property Details

$_fieldCache public property

A cache containing the available search fields.
public $_fieldCache

$_instId public property

The unique ID identifying this OJS installation to the solr server.
public $_instId

$_issueCache public property

An issue cache.
public $_issueCache

$_journalCache public property

A journal cache.
public $_journalCache

$_serviceMessage public property

A description of the last error or message that occurred when calling the service.
public $_serviceMessage

$_solrCore public property

The solr core we get our data from.
public $_solrCore

$_solrSearchHandler public property

The solr search handler name we place our searches on.
public $_solrSearchHandler

$_solrServer public property

The base URL of the solr server without core and search handler.
public $_solrServer

$_useProxySettings public property

Whether the proxy settings in config.inc.php should be considered for the web service request.
public $_useProxySettings