PHP Class PicoFeed\Scraper\CandidateParser

Author: Frederic Guillot
Inheritance: implements PicoFeed\Scraper\ParserInterface
Project: fguillot/picofeed

Public Methods

Method Description
__construct ( string $html ) Constructor.
execute ( ) : string Get the relevant content with the list of potential attributes.
findContentWithArticle ( ) : string Find <article> tag.
findContentWithBody ( ) : string Find <body> tag.
findContentWithCandidates ( ) : string Find content based on the list of tag candidates.
findNextLink ( ) : string Find link for next page of the article.
shouldRemove ( DomDocument $dom, DomNode $node ) : boolean Return false if the node should not be removed.
stripAttributes ( DomDocument $dom, DOMXPath $xpath ) Remove blacklisted attributes.
stripGarbage ( string $content ) : string Strip useless tags.
stripTags ( DOMXPath $xpath ) Remove blacklisted tags.

Method Details

__construct() public method

Constructor.
public __construct ( string $html )
$html string

execute() public method

Get the relevant content with the list of potential attributes.
public execute ( ) : string
return string

findContentWithArticle() public method

Find <article> tag.
public findContentWithArticle ( ) : string
return string
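
The lookup described above can be illustrated with PHP's built-in DOM extension. This is a minimal sketch, not PicoFeed's actual implementation: the function name findArticleHtml is hypothetical, and returning an empty string when no <article> element exists is an assumption.

```php
<?php

// Hypothetical helper (not part of PicoFeed): returns the HTML of the first
// <article> element, or an empty string when the document has none.
function findArticleHtml(string $html): string
{
    $dom = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting PHP
    // warnings; older libxml versions report HTML5 tags as invalid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//article');

    if ($nodes === false || $nodes->length === 0) {
        return '';
    }

    // Passing a node to saveHTML() serializes only that subtree.
    return $dom->saveHTML($nodes->item(0));
}

echo findArticleHtml('<html><body><nav>Menu</nav><article><p>Hello</p></article></body></html>');
```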

findContentWithBody() public method

Find <body> tag.
public findContentWithBody ( ) : string
return string

findContentWithCandidates() public method

Find content based on the list of tag candidates.
public findContentWithCandidates ( ) : string
return string
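
One way to picture a candidate-based search is to try a list of likely class or id values in order and return the first match. The sketch below assumes exactly that; the CANDIDATES list and the function name findByCandidates are hypothetical, and PicoFeed's real candidate list and matching rules may differ.

```php
<?php

// Hypothetical candidate list, for illustration only.
const CANDIDATES = ['article-body', 'post-content', 'entry-content'];

// Hypothetical helper: returns the HTML of the first element whose class or
// id contains one of the candidate strings, or an empty string otherwise.
function findByCandidates(string $html, array $candidates = CANDIDATES): string
{
    $dom = new DOMDocument();

    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    foreach ($candidates as $candidate) {
        // Candidates are assumed to contain no double quotes, so they can be
        // embedded directly in the XPath string literal.
        $query = sprintf('//*[contains(@class, "%1$s") or contains(@id, "%1$s")]', $candidate);
        $nodes = $xpath->query($query);

        if ($nodes !== false && $nodes->length > 0) {
            return $dom->saveHTML($nodes->item(0));
        }
    }

    return '';
}
```

Candidates earlier in the list take priority, so ordering the list from most to least specific is a reasonable design choice under this assumption.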

shouldRemove() public method

Return false if the node should not be removed.
public shouldRemove ( DomDocument $dom, DomNode $node ) : boolean
$dom DomDocument
$node DomNode
return boolean

stripAttributes() public method

Remove blacklisted attributes.
public stripAttributes ( DomDocument $dom, DOMXPath $xpath )
$dom DomDocument
$xpath DOMXPath
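
Attribute removal of this kind can be sketched with an XPath query per blacklisted name. The function name stripBlacklistedAttributes and the blacklist passed to it are hypothetical; the attributes PicoFeed actually removes are not listed here.

```php
<?php

// Hypothetical helper: removes every attribute whose name appears in
// $blacklist from the document bound to $xpath.
function stripBlacklistedAttributes(DOMXPath $xpath, array $blacklist): void
{
    foreach ($blacklist as $attribute) {
        // "//@name" selects all attribute nodes with that name.
        $nodes = $xpath->query('//@' . $attribute);

        if ($nodes === false) {
            continue;
        }

        // Work on an array copy so the loop is independent of the node list
        // while the document is being modified.
        foreach (iterator_to_array($nodes) as $attr) {
            $attr->ownerElement->removeAttributeNode($attr);
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p style="color:red" class="intro">Hi</p></body></html>');
stripBlacklistedAttributes(new DOMXPath($dom), ['style', 'class']);
echo $dom->saveHTML($dom->documentElement);
```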

stripGarbage() public method

Strip useless tags.
public stripGarbage ( string $content ) : string
$content string
return string

stripTags() public method

Remove blacklisted tags.
public stripTags ( DOMXPath $xpath )
$xpath DOMXPath
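
Tag removal can be sketched the same way, by querying each blacklisted element name and detaching the matches from their parents. The function name stripBlacklistedTags and the example blacklist are hypothetical and do not reflect PicoFeed's actual list.

```php
<?php

// Hypothetical helper: removes every element whose tag name appears in
// $blacklist, together with its children.
function stripBlacklistedTags(DOMXPath $xpath, array $blacklist): void
{
    foreach ($blacklist as $tag) {
        $nodes = $xpath->query('//' . $tag);

        if ($nodes === false) {
            continue;
        }

        foreach (iterator_to_array($nodes) as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><script>alert(1)</script><p>Keep</p><iframe src="x"></iframe></body></html>');
stripBlacklistedTags(new DOMXPath($dom), ['script', 'iframe']);
echo $dom->saveHTML($dom->documentElement);
```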