PHP class Goose\Modules\Formatters\OutputFormatter

Inheritance: extends Goose\Modules\AbstractModule, implements Goose\Modules\ModuleInterface, uses trait Goose\Traits\ArticleMutatorTrait, uses trait Goose\Traits\NodeGravityTrait, uses trait Goose\Traits\NodeCommonTrait
Project: scotteh/php-goose

Protected Properties

Property Type Description
$CLEANUP_IGNORE_SELECTOR string
$SIBLING_BASE_LINE_SCORE double

Public Methods

Method Description
run ( Goose\Article $article )

Private Methods

Method Description
addSiblings ( DOMWrap\Element $topNode )
cleanupHtml ( ) : string Scrapes the node content and returns the HTML
convertLinksToText ( DOMWrap\Element $topNode ) Cleans up and converts any nodes that should be considered text into text
convertToHtml ( DOMWrap\Element $topNode ) : string
convertToText ( DOMWrap\Element $topNode ) : string Takes an element and turns its P tags into \n\n
getBaselineScoreForSiblings ( DOMWrap\Element $topNode ) : integer Long articles can contain a large number of paragraphs, so calculating the base score against the total text score of those paragraphs would be unfair. The score is therefore normalized to the average score of the paragraphs within the top node. For example, if 10 paragraphs have a total score of 1000, each averages 100, so 100 should be the base.
getFormattedText ( ) : string Removes all unnecessary elements and formats the selected text nodes
getSiblingContent ( DOMWrap\Element $currentSibling, integer $baselineScoreForSiblingParagraphs ) : DOMWrap\Element[] Adds any siblings that may have a decent score to this node
getTagCleanedText ( DOMWrap\Element $item ) : string
isNodeScoreThreshholdMet ( DOMWrap\Element $topNode, DOMWrap\Element $node ) : boolean
isTableTagAndNoParagraphsExist ( DOMWrap\Element $topNode ) : boolean
postExtractionCleanup ( ) Removes any divs that look like non-content, clusters of links, or paragraphs with no substance
removeNodesWithNegativeScores ( DOMWrap\Element $topNode ) Removes any elements inside the top node that have a negative gravity score
removeParagraphsWithFewWords ( DOMWrap\Element $topNode ) Removes paragraphs that have fewer than x words, which would indicate that they are some sort of link
removeSmallParagraphs ( DOMWrap\Element $topNode )
replaceTagsWithText ( DOMWrap\Element $topNode ) Replaces common tags with just their text so that no stray formatting ends up in the output
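
The normalization described for getBaselineScoreForSiblings can be illustrated with a minimal sketch. It is not the library's implementation: the function name baselineScore, the plain array of per-paragraph scores, and the $fallback value returned for an empty list are all assumptions made for illustration.

```php
<?php

/**
 * Hypothetical helper (not part of php-goose): returns the average
 * paragraph score, which serves as the baseline instead of the total.
 *
 * @param int[] $paragraphScores one score per paragraph (assumed input shape)
 * @param int   $fallback        assumed value to return when there are no paragraphs
 */
function baselineScore(array $paragraphScores, int $fallback = 0): int
{
    $count = count($paragraphScores);
    if ($count === 0) {
        // Avoid division by zero when the top node has no scored paragraphs.
        return $fallback;
    }

    // Integer average: total score divided by the number of paragraphs.
    return intdiv((int) array_sum($paragraphScores), $count);
}

// The example from the description: 10 paragraphs, total score 1000.
$scores = array_fill(0, 10, 100);
echo baselineScore($scores), "\n"; // prints 100
```

Averaging keeps a long article from setting an unreachable bar: a sibling is compared against what a typical paragraph scores, not against the sum of all of them.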
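
The ideas behind removeParagraphsWithFewWords and convertToText can be sketched together on plain strings. This is an illustration only: the real methods operate on DOMWrap\Element nodes, the function name paragraphsToText is hypothetical, and the threshold of 5 words is an assumed stand-in for the unspecified "x".

```php
<?php

/**
 * Hypothetical helper (not part of php-goose): drops paragraphs with
 * fewer than $minWords words, then joins the rest with "\n\n".
 *
 * @param string[] $paragraphs text of each paragraph (assumed input shape)
 * @param int      $minWords   assumed threshold; the source only says "x"
 */
function paragraphsToText(array $paragraphs, int $minWords = 5): string
{
    $kept = [];
    foreach ($paragraphs as $paragraph) {
        $text = trim($paragraph);
        // Split on runs of whitespace; NO_EMPTY keeps blank input from counting as a word.
        $words = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) >= $minWords) {
            $kept[] = $text;
        }
    }

    // One blank line between surviving paragraphs.
    return implode("\n\n", $kept);
}

$paragraphs = [
    'Read more',
    'The first real paragraph has enough words to keep.',
    'Share this',
    'The second paragraph also survives the word filter.',
];
echo paragraphsToText($paragraphs), "\n";
```

Very short paragraphs such as "Read more" are typically link labels rather than article text, which is why a word-count filter is a cheap first pass before joining.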

Method Details

run() public method

public run ( Goose\Article $article )
$article Goose\Article

Property Details

$CLEANUP_IGNORE_SELECTOR protected static property

protected static string $CLEANUP_IGNORE_SELECTOR
Type: string

$SIBLING_BASE_LINE_SCORE protected static property

protected static double $SIBLING_BASE_LINE_SCORE
Type: double