PHP Class Graby\SiteConfig\SiteConfig

Each instance of this class holds the extraction patterns and other directives for one website. See the ContentExtractor class for how it is used.
Author: Keyvan Minoukadeh
Project: j0k3r/graby

Public Properties

Property Type Description
$author Use first matching element as author (0 or more xpath expressions)
$autodetect_on_failure bool or null if undeclared
$body Use first matching element as body (0 or more xpath expressions)
$cache_key Cannot be set in the config files which this class represents
$date Use first matching element as date (0 or more xpath expressions)
$find_string Strings to search for in HTML before processing begins (used with $replace_string)
$http_header Additional HTTP headers to send
$login_extra_fields Extra fields to POST to the site's login form.
$login_password_field string Name of the site's login form password field. Example: password.
$login_uri string Site's login form URI, if applicable.
$login_username_field string Name of the site's login form username field. Example: username.
$next_page_link
$not_logged_in_xpath string XPath query to detect if login is requested in a page from the site.
$parser string or null if undeclared
$prune bool or null if undeclared
$replace_string Strings to replace those found in $find_string before HTML processing begins
$requires_login boolean Whether fetching the site's content requires authentication.
$single_page_link If a matching link is found, that page is retrieved and the rest of the options in this config are applied to it.
$strip Strip elements matching these xpath expressions (0 or more)
$strip_id_or_class Strip elements which contain these strings (0 or more) in the id or class attribute
$strip_image_src Strip images which contain these strings (0 or more) in the src attribute
$test_url Test URL - if present, can be used to test the config above
$tidy Process HTML with tidy before creating DOM (bool or null if undeclared)
$title Use first matching element as title (0 or more xpath expressions)

Protected Properties

Property Type Description
$default_autodetect_on_failure
$default_parser
$default_prune
$default_tidy

Public Methods

Method Description
autodetect_on_failure ( boolean $use_default = true ) : boolean | null Autodetect title/body if xpath expressions fail to produce results.
parser ( boolean $use_default = true ) : string | null Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib').
prune ( boolean $use_default = true ) : boolean | null Clean up content block - attempt to remove elements that appear to be superfluous.
tidy ( boolean $use_default = true ) : boolean | null Process HTML with tidy before creating DOM (bool or null if undeclared).

Method Details

autodetect_on_failure() public method

Autodetect title/body if xpath expressions fail to produce results.
public autodetect_on_failure ( boolean $use_default = true ) : boolean | null
$use_default boolean
Returns boolean | null

parser() public method

Which parser to use for turning raw HTML into a DOMDocument (either 'libxml' or 'html5lib').
public parser ( boolean $use_default = true ) : string | null
$use_default boolean
Returns string | null

prune() public method

Clean up content block - attempt to remove elements that appear to be superfluous.
public prune ( boolean $use_default = true ) : boolean | null
$use_default boolean
Returns boolean | null

tidy() public method

Process HTML with tidy before creating DOM (bool or null if undeclared).
public tidy ( boolean $use_default = true ) : boolean | null
$use_default boolean
Returns boolean | null
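
The four accessors above share a `$use_default` flag and a `boolean | null` (or `string | null`) return type. A minimal sketch of how such a fallback could work, assuming that a property left at `null` means "undeclared" and that the protected `$default_*` value is returned in that case. `SiteConfigSketch` and the default value chosen here are hypothetical stand-ins, not the library's source.

```php
<?php
// Hypothetical stand-in class for illustration only.
class SiteConfigSketch
{
    /** @var bool|null null is taken to mean "not declared in the config file" */
    public $tidy = null;

    /** @var bool default picked for this sketch only (an assumption) */
    protected $default_tidy = false;

    public function tidy($use_default = true)
    {
        // Fall back to the default only when asked to and nothing was declared.
        if ($use_default && $this->tidy === null) {
            return $this->default_tidy;
        }

        return $this->tidy;
    }
}

$config = new SiteConfigSketch();
$undeclared = $config->tidy();      // undeclared, so the default is returned
$raw = $config->tidy(false);        // undeclared and no fallback, so null
$config->tidy = true;
$declared = $config->tidy();        // a declared value takes precedence
```

Calling an accessor with `false` lets a caller tell "explicitly disabled" apart from "never declared", which the `null` in the return type suggests is the intent.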

Property Details

$author public property

Use first matching element as author (0 or more xpath expressions)
public $author

$autodetect_on_failure public property

bool or null if undeclared
public $autodetect_on_failure

$body public property

Use first matching element as body (0 or more xpath expressions)
public $body

$cache_key public property

Cannot be set in the config files which this class represents
public $cache_key

$date public property

Use first matching element as date (0 or more xpath expressions)
public $date

$default_autodetect_on_failure protected property

protected $default_autodetect_on_failure

$default_parser protected property

protected $default_parser

$default_prune protected property

protected $default_prune

$default_tidy protected property

protected $default_tidy

$find_string public property

Strings to search for in HTML before processing begins (used with $replace_string)
public $find_string

$http_header public property

Additional HTTP headers to send
public $http_header

$login_extra_fields public property

Extra fields to POST to the site's login form.
public $login_extra_fields

$login_password_field public property

Name of the site's login form password field. Example: password.
public string $login_password_field
Type string

$login_uri public property

Site's login form URI, if applicable.
public string $login_uri
Type string

$login_username_field public property

Name of the site's login form username field. Example: username.
public string $login_username_field
Type string
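
The login properties above name the form fields rather than hold credentials. A rough sketch of how a caller might combine them into a POST body, assuming extra fields are given as `name=value` strings (that format is an assumption, as is the helper name `buildLoginPayload`; neither comes from this page).

```php
<?php
// Hypothetical helper, not part of the library.
function buildLoginPayload(
    string $usernameField,
    string $passwordField,
    array $extraFields,
    string $username,
    string $password
): array {
    $payload = [
        $usernameField => $username,
        $passwordField => $password,
    ];

    foreach ($extraFields as $field) {
        // Split on the first "=" only, so values may themselves contain "=".
        $parts = explode('=', $field, 2);
        $payload[trim($parts[0])] = isset($parts[1]) ? trim($parts[1]) : '';
    }

    return $payload;
}

$payload = buildLoginPayload('username', 'password', ['remember=1'], 'alice', 's3cret');
$body = http_build_query($payload); // "username=alice&password=s3cret&remember=1"
```

The resulting string would be sent to the URI held in `$login_uri`.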

$not_logged_in_xpath public property

XPath query to detect if login is requested in a page from the site.
public string $not_logged_in_xpath
Type string

$parser public property

string or null if undeclared
public $parser

$prune public property

bool or null if undeclared
public $prune

$replace_string public property

Strings to replace those found in $find_string before HTML processing begins
public $replace_string
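
`$find_string` and `$replace_string` are described as a pair that is applied to the raw HTML before processing. A minimal sketch of that idea, assuming both are arrays of equal length matched by position. The helper name `applyFindReplace` and the choice to skip mismatched lists are assumptions made for this example.

```php
<?php
// Hypothetical helper, not part of the library.
function applyFindReplace(string $html, array $find, array $replace): string
{
    // Unpaired lists are ambiguous, so this sketch leaves the HTML untouched.
    if (count($find) !== count($replace)) {
        return $html;
    }

    // str_replace() with two arrays replaces each search string with the
    // replacement at the same index, working through the pairs in order.
    return str_replace($find, $replace, $html);
}

$html = '<p>Hello<br/><br/>world</p>';
$out = applyFindReplace($html, ['<br/><br/>'], ['</p><p>']);
// $out is '<p>Hello</p><p>world</p>'
```

Because the replacement works on plain strings, it can repair markup that would otherwise confuse the HTML parser.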

$requires_login public property

Whether fetching the site's content requires authentication.
public bool $requires_login
Type boolean

$strip public property

Strip elements matching these xpath expressions (0 or more)
public $strip

$strip_id_or_class public property

Strip elements which contain these strings (0 or more) in the id or class attribute
public $strip_id_or_class
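
One way to picture the substring matching described for `$strip_id_or_class` is an XPath `contains()` test on both attributes. The sketch below uses PHP's DOM extension. The helper name `stripByIdOrClass` and the exact query are assumptions and may differ from what the library does.

```php
<?php
// Hypothetical helper, not part of the library.
function stripByIdOrClass(DOMDocument $doc, array $needles): void
{
    $xpath = new DOMXPath($doc);

    foreach ($needles as $needle) {
        // Note: this naive query breaks if $needle contains a single quote.
        $query = "//*[contains(@id, '" . $needle . "') or contains(@class, '" . $needle . "')]";

        // Copy the matches into an array before removing them from the tree.
        foreach (iterator_to_array($xpath->query($query)) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="sidebar-left">ads</div><p class="content">keep</p></body></html>');
stripByIdOrClass($doc, ['sidebar']);
$result = $doc->saveHTML();
```

Here the `div` is removed because its id contains `sidebar`, while the paragraph survives.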

$strip_image_src public property

Strip images which contain these strings (0 or more) in the src attribute
public $strip_image_src

$test_url public property

Test URL - if present, can be used to test the config above
public $test_url

$tidy public property

Process HTML with tidy before creating DOM (bool or null if undeclared)
public $tidy

$title public property

Use first matching element as title (0 or more xpath expressions)
public $title
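
`$title`, `$body`, `$author` and `$date` are each described as "use first matching element" over zero or more XPath expressions. One plausible reading is that the expressions are tried in order and the first one that yields a node wins. The sketch below illustrates that reading with PHP's DOM extension. The helper name `firstMatch` and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper, not part of the library.
function firstMatch(DOMXPath $xpath, array $expressions): ?string
{
    foreach ($expressions as $expression) {
        $nodes = $xpath->query($expression);

        // query() returns false for an invalid expression; skip those too.
        if ($nodes !== false && $nodes->length > 0) {
            return trim($nodes->item(0)->textContent);
        }
    }

    // No expression matched (this also covers an empty list).
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML('<html><body><h1 class="headline">Example title</h1><div id="content">Body text</div></body></html>');
$xpath = new DOMXPath($doc);

$title = firstMatch($xpath, ["//h2[@class='title']", "//h1[@class='headline']"]);
$missing = firstMatch($xpath, ["//div[@id='byline']"]);
```

A `null` result is the situation in which a fallback such as `autodetect_on_failure()` would become relevant.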