PHP Class Graby\SiteConfig\SiteConfig

Each instance of this class holds extraction patterns and other directives for a website. See the ContentExtractor class for how it is used.

Author: Keyvan Minoukadeh

Project: j0k3r/graby

Public Properties

Property	Type	Description
$author		Use first matching element as author (0 or more xpath expressions)
$autodetect_on_failure		Autodetect title/body if xpath expressions fail to produce results (bool or null if undeclared)
$body		Use first matching element as body (0 or more xpath expressions)
$cache_key		Cannot be set in the config files which this class represents
$date		Use first matching element as date (0 or more xpath expressions)
$find_string		Strings to search for in HTML before processing begins (used with $replace_string)
$http_header		Additional HTTP headers to send
$login_extra_fields		Extra fields to POST to the site's login form.
$login_password_field	string	Name of the site's login form password field. Example: password.
$login_uri	string	Site's login form URI, if applicable.
$login_username_field	string	Name of the site's login form username field. Example: username.
$next_page_link
$not_logged_in_xpath	string	XPath query to detect if login is requested in a page from the site.
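$parser		Which parser to use for turning raw HTML into a DOMDocument, either 'libxml' or 'html5lib' (string or null if undeclared)
$prune		Clean up content block by attempting to remove elements that appear to be superfluous (bool or null if undeclared)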
$replace_string		Strings to replace those found in $find_string before HTML processing begins
$requires_login	boolean	Whether fetching the site's content requires authentication.
$single_page_link		Identifies a link to a single-page version of the article. If present, that page is retrieved and the rest of the options in this config are applied to it.
$strip		Strip elements matching these xpath expressions (0 or more)
$strip_id_or_class		Strip elements which contain these strings (0 or more) in the id or class attribute
$strip_image_src		Strip images which contain these strings (0 or more) in the src attribute
$test_url		Test URL - if present, can be used to test the config above
$tidy		Process HTML with tidy before creating DOM (bool or null if undeclared)
$title		Use first matching element as title (0 or more xpath expressions)

Protected Properties

Property	Type	Description
$default_autodetect_on_failure
$default_parser
$default_prune
$default_tidy

Public Methods

Method	Description
autodetect_on_failure ( boolean $use_default = true ) : boolean | null	Autodetect title/body if xpath expressions fail to produce results.
parser ( boolean $use_default = true ) : string | null	Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib').
prune ( boolean $use_default = true ) : boolean | null	Clean up content block by attempting to remove elements that appear to be superfluous.
tidy ( boolean $use_default = true ) : boolean | null	Process HTML with tidy before creating DOM (bool or null if undeclared).

Method Details

autodetect_on_failure() public method

Autodetect title/body if xpath expressions fail to produce results.

public autodetect_on_failure ( boolean $use_default = true ) : boolean | null
$use_default	boolean
Returns	boolean | null

parser() public method

Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib').

public parser ( boolean $use_default = true ) : string | null
$use_default	boolean
Returns	string | null

prune() public method

Clean up content block by attempting to remove elements that appear to be superfluous.

public prune ( boolean $use_default = true ) : boolean | null
$use_default	boolean
Returns	boolean | null

tidy() public method

Process HTML with tidy before creating DOM (bool or null if undeclared).

public tidy ( boolean $use_default = true ) : boolean | null
$use_default	boolean
Returns	boolean | null
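
The four accessors above share a `$use_default` flag, and each has a matching protected `$default_*` property. One plausible reading is that the accessor returns the declared value when there is one and otherwise falls back to the default, unless `$use_default` is false. The sketch below illustrates that reading only. `SiteConfigSketch` is a hypothetical stand-in, not the Graby class, and the default value `false` is an assumption because this reference does not state the real defaults.

```php
<?php
// Hypothetical stand-in for illustration; this is not Graby\SiteConfig\SiteConfig.
class SiteConfigSketch
{
    /** @var bool|null null means the site config did not declare a value */
    public $tidy = null;

    /** @var bool assumed fallback; the real default is not given in this reference */
    protected $default_tidy = false;

    public function tidy($use_default = true)
    {
        // Fall back only when asked to and when nothing was declared.
        if ($use_default && $this->tidy === null) {
            return $this->default_tidy;
        }

        return $this->tidy;
    }
}

$config = new SiteConfigSketch();
var_dump($config->tidy());      // undeclared: the assumed default is returned
var_dump($config->tidy(false)); // undeclared and no fallback requested: null

$config->tidy = true;
var_dump($config->tidy());      // a declared value takes precedence
```

Calling the accessor with `false` lets a caller distinguish "not declared" from "declared as false", which matters if several configs are merged.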

Property Details

$author public property

Use first matching element as author (0 or more xpath expressions)

public $author

$autodetect_on_failure public property

Autodetect title/body if xpath expressions fail to produce results (bool or null if undeclared)

public $autodetect_on_failure

$body public property

Use first matching element as body (0 or more xpath expressions)

public $body

$cache_key public property

Cannot be set in the config files which this class represents

public $cache_key

$date public property

Use first matching element as date (0 or more xpath expressions)

public $date

$default_autodetect_on_failure protected property

protected $default_autodetect_on_failure

$default_parser protected property

protected $default_parser

$default_prune protected property

protected $default_prune

$default_tidy protected property

protected $default_tidy

$find_string public property

Strings to search for in HTML before processing begins (used with $replace_string)

public $find_string

$http_header public property

Additional HTTP headers to send

public $http_header

$login_extra_fields public property

Extra fields to POST to the site's login form.

public $login_extra_fields

$login_password_field public property

Name of the site's login form password field. Example: password.

public string $login_password_field
Returns	string

$login_uri public property

Site's login form URI, if applicable.

public string $login_uri
Returns	string

$login_username_field public property

Name of the site's login form username field. Example: username.

public string $login_username_field
Returns	string
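
The login field properties name the form inputs rather than carrying credentials. As a rough sketch of how they might combine into a POST body, the example below merges the username field, the password field, and any extra fields. The function `buildLoginPostBody`, the sample credentials, and the name => value shape used for `$login_extra_fields` are all assumptions made for illustration; this reference does not describe the property's actual format.

```php
<?php
// Sample values; only the property names come from this reference.
$login_username_field = 'username';
$login_password_field = 'password';
$login_extra_fields   = ['remember' => '1']; // assumed shape: name => value

// Hypothetical helper: builds a URL-encoded body for the login form.
function buildLoginPostBody($userField, $passField, array $extraFields, $user, $pass)
{
    // Credentials are merged last so they cannot be overridden by extra fields.
    $fields = array_merge($extraFields, [
        $userField => $user,
        $passField => $pass,
    ]);

    return http_build_query($fields);
}

$body = buildLoginPostBody(
    $login_username_field,
    $login_password_field,
    $login_extra_fields,
    'alice',
    's3cret'
);

echo $body, PHP_EOL;
```

The resulting string would be sent to the address in `$login_uri` with a `Content-Type` of `application/x-www-form-urlencoded`.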

$next_page_link public property

public $next_page_link

$not_logged_in_xpath public property

XPath query to detect if login is requested in a page from the site.

public string $not_logged_in_xpath
Returns	string
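
A query like this is typically evaluated against the fetched page, with any match taken to mean the site is asking for a login. The following is a minimal sketch using PHP's DOM extension. The function name `loginRequested`, the sample markup, and the sample expression are hypothetical, and the real check in Graby may differ.

```php
<?php
// Hypothetical helper: true when the XPath query matches at least one node.
function loginRequested($html, $notLoggedInXpath)
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($notLoggedInXpath);

    // query() returns false for a malformed expression.
    return $nodes !== false && $nodes->length > 0;
}

$not_logged_in_xpath = "//form[@id='login']"; // sample expression
$loginPage   = '<html><body><form id="login"><input name="username"></form></body></html>';
$articlePage = '<html><body><div class="article">Text</div></body></html>';

var_dump(loginRequested($loginPage, $not_logged_in_xpath));
var_dump(loginRequested($articlePage, $not_logged_in_xpath));
```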

$parser public property

Which parser to use for turning raw HTML into a DOMDocument, either 'libxml' or 'html5lib' (string or null if undeclared)

public $parser

$prune public property

Clean up content block by attempting to remove elements that appear to be superfluous (bool or null if undeclared)

public $prune

$replace_string public property

Strings to replace those found in $find_string before HTML processing begins

public $replace_string
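
`$find_string` and `$replace_string` work as a pair on the raw HTML. The sketch below assumes the two lists are matched by position and applied as plain substring replacements, which is how PHP's `str_replace` treats two arrays. Whether Graby applies them exactly this way is not stated in this reference, and the sample markup is invented.

```php
<?php
// Sample values: entry N of $find_string is replaced by entry N of $replace_string.
$find_string    = ['<div class="lead">', '</div><!-- lead -->'];
$replace_string = ['<p class="lead">', '</p>'];

$html = '<div class="lead">Intro</div><!-- lead --><div>Body</div>';

// With two arrays, str_replace pairs the entries by index and applies them in order.
$processed = str_replace($find_string, $replace_string, $html);

echo $processed, PHP_EOL;
```

Because the replacement runs on the raw string before any DOM is built, it can repair markup that a parser would otherwise misread.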

$requires_login public property

Whether fetching the site's content requires authentication.

public bool $requires_login
Returns	boolean

$single_page_link public property

Identifies a link to a single-page version of the article. If present, that page is retrieved and the rest of the options in this config are applied to it.

public $single_page_link

$strip public property

Strip elements matching these xpath expressions (0 or more)

public $strip
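
Stripping by XPath amounts to evaluating each expression and detaching every node it matches. The following is a minimal sketch with PHP's DOM extension. The function `stripByXpath`, the sample expressions, and the sample markup are hypothetical and are not taken from Graby.

```php
<?php
// Hypothetical helper: removes every node matched by any of the expressions.
function stripByXpath(DOMDocument $doc, array $expressions)
{
    $xpath   = new DOMXPath($doc);
    $removed = 0;

    foreach ($expressions as $expression) {
        $nodes = $xpath->query($expression);
        if ($nodes === false) {
            continue; // malformed expression, skip it
        }

        // Copy the matches into an array before mutating the tree.
        foreach (iterator_to_array($nodes) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
                $removed++;
            }
        }
    }

    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div><p>Keep</p><div class="ad">Ad</div><script>x()</script></div></body></html>');

$strip   = ['//script', "//div[@class='ad']"]; // sample expressions
$removed = stripByXpath($doc, $strip);
$result  = $doc->saveHTML();

echo $removed, PHP_EOL;
echo $result;
```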

$strip_id_or_class public property

Strip elements which contain these strings (0 or more) in the id or class attribute

public $strip_id_or_class
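
A substring match on `id` or `class` can be expressed as an XPath `contains()` test on both attributes. The sketch below builds such an expression for each string and removes the matches. The helper `idOrClassXpath`, the sample strings, and the markup are assumptions for illustration, and the expression Graby actually generates may differ.

```php
<?php
// Hypothetical helper: XPath matching elements whose id or class contains $needle.
function idOrClassXpath($needle)
{
    // Drop quote characters so the needle cannot break out of the XPath string literal.
    $needle = strtr($needle, ["'" => '', '"' => '']);

    return "//*[contains(@class, '$needle') or contains(@id, '$needle')]";
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body><div id="sidebar-left">S</div>'
    . '<p class="article-text">T</p>'
    . '<div class="share-buttons">B</div></body></html>'
);
$xpath = new DOMXPath($doc);

$strip_id_or_class = ['sidebar', 'share']; // sample strings

foreach ($strip_id_or_class as $needle) {
    // Copy the matches into an array before mutating the tree.
    foreach (iterator_to_array($xpath->query(idOrClassXpath($needle))) as $node) {
        $node->parentNode->removeChild($node);
    }
}

$result = $doc->saveHTML();
echo $result;
```

Because `contains()` matches anywhere in the attribute value, a short string such as `ad` would also match `header` or `loading`, so longer, more specific strings are safer.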

$strip_image_src public property

Strip images which contain these strings (0 or more) in the src attribute

public $strip_image_src

$test_url public property

Test URL - if present, can be used to test the config above

public $test_url

$tidy public property

Process HTML with tidy before creating DOM (bool or null if undeclared)

public $tidy

$title public property

Use first matching element as title (0 or more xpath expressions)

public $title
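
The "first matching element" rule, which `$title`, `$body`, `$author` and `$date` all share, can be read as: try the expressions in order and take the first node of the first expression that matches anything. The sketch below follows that reading. The function `firstMatchText`, the sample expressions, and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: text of the first node matched by the first successful expression.
function firstMatchText(DOMXPath $xpath, array $expressions)
{
    foreach ($expressions as $expression) {
        $nodes = $xpath->query($expression);
        if ($nodes !== false && $nodes->length > 0) {
            return trim($nodes->item(0)->textContent);
        }
    }

    return null; // no expression matched
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><head><title>Site | Fallback</title></head>'
    . '<body><h1 class="headline"> Real headline </h1></body></html>'
);
$xpath = new DOMXPath($doc);

// Sample expressions, ordered from most to least specific.
$title = ["//h2[@class='missing']", "//h1[@class='headline']", '//title'];

$found = firstMatchText($xpath, $title);
echo $found, PHP_EOL;
```

Listing the most specific expression first and a generic one last gives a graceful fallback when a site changes its markup.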