PHP class Graby\Extractor\ContentExtractor

Uses patterns specified in site config files and auto detection (hNews/PHP Readability) to extract content from HTML files.
Project: j0k3r/graby

Public properties

Property Type Description
$readability

Public methods

Method Description
__construct ( array $config = [], Psr\Log\LoggerInterface $logger = null, ConfigBuilder $configBuilder = null )
buildSiteConfig ( string $url, string $html = '', boolean $addToCache = true ) : SiteConfig Returns SiteConfig instance (joined in order: exact match, wildcard, fingerprint, global, default).
findHostUsingFingerprints ( string $html ) : string | false Tries to find a host based on a meta tag that may be present in the HTML.
getContent ( )
getLanguage ( )
getNextPageUrl ( )
getSiteConfig ( )
getTitle ( )
process ( string $html, string $url, SiteConfig $siteConfig = null, boolean $smartTidy = true ) : boolean $smartTidy indicates that if tidy is used and no results are produced, we will try again without it.
reset ( )
setLogger ( Psr\Log\LoggerInterface $logger )

Private methods

Method Description
extractBody ( boolean $detectBody, string $xpathExpression, DOMNode $node, string $type ) : boolean Extracts the body from a node using the given XPath expression.
extractTitle ( boolean $detectTitle, string $cssClass, DOMNode $node, string $logMessage ) : boolean Extracts the title from a node for a given CSS class.
hasElements ( DOMNodeList $elems ) : boolean Check if given node list exists and has length more than 0.
removeElements ( DOMNodeList $elems, string $logMessage = null ) Remove elements.
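The two helpers hasElements and removeElements are only described by their signatures here. A minimal sketch of what such helpers might look like follows, written as standalone functions on top of PHP's DOM extension. The function bodies are assumptions, not the library's code; in particular, this removeElements returns a count and omits the $logMessage parameter so the result can be inspected.

```php
<?php
// Sketch only: standalone stand-ins for the documented private helpers.
// Only the names come from the listing; the bodies are assumptions.

function hasElements($elems): bool
{
    // DOMXPath::query() returns false on failure, so check the type before the length.
    return $elems instanceof DOMNodeList && $elems->length > 0;
}

function removeElements($elems): int
{
    if (!hasElements($elems)) {
        return 0;
    }

    $removed = 0;
    // Walk backwards so that removing a node never affects the items still to visit.
    for ($i = $elems->length - 1; $i >= 0; --$i) {
        $node = $elems->item($i);
        if ($node !== null && $node->parentNode !== null) {
            $node->parentNode->removeChild($node);
            ++$removed;
        }
    }

    return $removed;
}

// Hypothetical input document with two unwanted blocks.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div class="ad">x</div><p>Text</p><div class="ad">y</div></body></html>');
$xpath = new DOMXPath($doc);

$removedCount = removeElements($xpath->query('//div[@class="ad"]'));
$remaining = $xpath->query('//div[@class="ad"]');
$paragraphs = $xpath->query('//p');
```

After the call, a fresh query for the removed nodes comes back empty while the paragraph is still present.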

Method details

__construct() public method

public __construct ( array $config = [], Psr\Log\LoggerInterface $logger = null, ConfigBuilder $configBuilder = null )
$config array
$logger Psr\Log\LoggerInterface
$configBuilder Graby\SiteConfig\ConfigBuilder

buildSiteConfig() public method

Returns SiteConfig instance (joined in order: exact match, wildcard, fingerprint, global, default).
public buildSiteConfig ( string $url, string $html = '', boolean $addToCache = true ) : SiteConfig
$url string
$html string
$addToCache boolean
Returns Graby\SiteConfig\SiteConfig
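To illustrate what "joined in order" could mean, the sketch below merges rule lists from several candidate configs in the documented precedence order. Everything in it is hypothetical: the array shape, the 'body' key, the XPath rules, and the joinInOrder helper are illustrative only, and the page does not say whether the real merge appends, de-duplicates, or overrides. The sketch assumes appending with de-duplication.

```php
<?php
// Hypothetical data: each candidate config is reduced to a list of body XPath rules,
// listed in the documented order (exact match, wildcard, fingerprint, global, default).
$candidates = [
    'exact match' => ['body' => ["//article[@id='story']"]],
    'wildcard'    => ['body' => ["//div[@class='post']"]],
    'fingerprint' => ['body' => []],
    'global'      => ['body' => ["//div[@class='post']", "//main"]],
    'default'     => ['body' => []],
];

// Assumed merge strategy: append rules in order, skipping ones already present.
function joinInOrder(array $candidates): array
{
    $joined = ['body' => []];
    foreach ($candidates as $config) {
        foreach ($config['body'] as $rule) {
            if (!in_array($rule, $joined['body'], true)) {
                $joined['body'][] = $rule;
            }
        }
    }

    return $joined;
}

$siteConfig = joinInOrder($candidates);
```

With this strategy, rules from the exact match are tried first and the duplicate rule from the global config is dropped.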

findHostUsingFingerprints() public method

This makes it possible to determine whether a website is generated by WordPress, Blogger, etc.
public findHostUsingFingerprints ( string $html ) : string | false
$html string
Returns string | false
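Fingerprint detection of this kind can be sketched as a lookup of known HTML fragments in the page source. The fingerprint table, the host names, and the function name findHostUsingFingerprintsSketch below are assumptions chosen for illustration; the actual fingerprints come from the extractor's configuration, which is not shown on this page.

```php
<?php
// Hypothetical fingerprint table: HTML fragment => host name used for the config lookup.
$fingerprints = [
    '<meta name="generator" content="WordPress' => 'fingerprint.wordpress.com',
    '<meta name="generator" content="Blogger"' => 'fingerprint.blogspot.com',
];

/**
 * Returns the host of the first fingerprint found in the HTML, or false if none matches.
 *
 * @return string|false
 */
function findHostUsingFingerprintsSketch(string $html, array $fingerprints)
{
    foreach ($fingerprints as $needle => $host) {
        // Case-insensitive substring search; no HTML parsing is needed for this check.
        if (stripos($html, $needle) !== false) {
            return $host;
        }
    }

    return false;
}

$wordpressHost = findHostUsingFingerprintsSketch(
    '<html><head><meta name="generator" content="WordPress 6.4" /></head></html>',
    $fingerprints
);
$noHost = findHostUsingFingerprintsSketch(
    '<html><head><title>Plain page</title></head></html>',
    $fingerprints
);
```

The string | false return type matches the documented signature: a host name when a fingerprint is found, false otherwise.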

getContent() public method

public getContent ( )

getLanguage() public method

public getLanguage ( )

getNextPageUrl() public method

public getNextPageUrl ( )

getSiteConfig() public method

public getSiteConfig ( )

getTitle() public method

public getTitle ( )

process() public method

Tidy helps us deal with PHP's patchy HTML parsing most of the time but it has problems of its own which we try to avoid with this option.
public process ( string $html, string $url, SiteConfig $siteConfig = null, boolean $smartTidy = true ) : boolean
$html string
$url string
$siteConfig Graby\SiteConfig\SiteConfig If provided, avoids recalculating the site config
$smartTidy boolean If tidy is used and produces no results, whether to try again without it
Returns boolean true on success, false on failure
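The $smartTidy behaviour described above (try with tidy, and if nothing is extracted, try again without it) can be sketched as a small retry wrapper. The processWithSmartTidy function, the $extract callable, and the fake extractor are all hypothetical; they only illustrate the control flow and are not the library's implementation.

```php
<?php
// Sketch: $extract is a hypothetical callable (string $html, bool $useTidy): ?string
// that returns the extracted content, or null when nothing could be extracted.
function processWithSmartTidy(string $html, callable $extract, bool $smartTidy = true): array
{
    // First attempt: with tidy enabled.
    $usedTidy = true;
    $content = $extract($html, true);

    // Second attempt: only when the first one produced nothing and $smartTidy allows it.
    if ($content === null && $smartTidy) {
        $usedTidy = false;
        $content = $extract($html, false);
    }

    return ['success' => $content !== null, 'usedTidy' => $usedTidy, 'content' => $content];
}

// Fake extractor that fails whenever tidy is enabled, to exercise the retry path.
$fakeExtract = function (string $html, bool $useTidy): ?string {
    return $useTidy ? null : strip_tags($html);
};

$withRetry = processWithSmartTidy('<p>Hello</p>', $fakeExtract, true);
$withoutRetry = processWithSmartTidy('<p>Hello</p>', $fakeExtract, false);
```

With the retry enabled, the second pass succeeds; with it disabled, the call reports failure after the single tidy-based attempt.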

reset() public method

public reset ( )

setLogger() public method

public setLogger ( Psr\Log\LoggerInterface $logger )
$logger Psr\Log\LoggerInterface

Property details

$readability public property

public $readability