PHP Class Goose\Modules\Formatters\OutputFormatter

Inheritance: extends Goose\Modules\AbstractModule, implements Goose\Modules\ModuleInterface, uses traits Goose\Traits\ArticleMutatorTrait, Goose\Traits\NodeGravityTrait, Goose\Traits\NodeCommonTrait

Protected properties

Property Type Description
$CLEANUP_IGNORE_SELECTOR string
$SIBLING_BASE_LINE_SCORE double

Public methods

Method Description
run ( Goose\Article $article )

Private methods

Method Description
addSiblings ( DOMWrap\Element $topNode )
cleanupHtml ( ) : string Scrape the node content and return the html
convertLinksToText ( DOMWrap\Element $topNode ) cleans up and converts any nodes that should be considered text into text
convertToHtml ( DOMWrap\Element $topNode ) : string
convertToText ( DOMWrap\Element $topNode ) : string Takes an element and turns the P tags into \n\n
getBaselineScoreForSiblings ( DOMWrap\Element $topNode ) : integer Long articles can contain many paragraphs, so computing the base score against the total text score of those paragraphs would be unfair. Instead, the score is normalized to the average score of the paragraphs within the top node. For example, if 10 paragraphs have a total score of 1000, their average score of 100 becomes the base.
getFormattedText ( ) : string Removes all unnecessary elements and formats the selected text nodes
getSiblingContent ( DOMWrap\Element $currentSibling, integer $baselineScoreForSiblingParagraphs ) : DOMWrap\Element[] Adds any siblings that may have a decent score to this node
getTagCleanedText ( DOMWrap\Element $item ) : string
isNodeScoreThreshholdMet ( DOMWrap\Element $topNode, DOMWrap\Element $node ) : boolean
isTableTagAndNoParagraphsExist ( DOMWrap\Element $topNode ) : boolean
postExtractionCleanup ( ) Removes any divs that look like non-content, clusters of links, or paragraphs with no real substance
removeNodesWithNegativeScores ( DOMWrap\Element $topNode ) Removes any elements inside the top node that have a negative gravity score
removeParagraphsWithFewWords ( DOMWrap\Element $topNode ) Removes paragraphs with fewer than x words, since such short paragraphs usually indicate some sort of link
removeSmallParagraphs ( DOMWrap\Element $topNode )
replaceTagsWithText ( DOMWrap\Element $topNode ) Replaces common formatting tags with just their text, so the output has no stray formatting issues
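The averaging described for getBaselineScoreForSiblings can be sketched as follows. This is a minimal illustration under stated assumptions, not Goose's implementation: the function name averageParagraphScore and its $fallback parameter are hypothetical, and the sketch takes already-computed per-paragraph scores instead of deriving them from a DOMWrap\Element.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of Goose): returns the average of the given
 * per-paragraph scores, or $fallback when there are no paragraphs.
 * Assumption: the scores were already computed by some other routine.
 *
 * @param array<int|float> $paragraphScores
 */
function averageParagraphScore(array $paragraphScores, float $fallback = 0.0): float
{
    if (count($paragraphScores) === 0) {
        return $fallback; // no paragraphs: avoid division by zero
    }
    // Average rather than total, so long articles do not inflate the baseline.
    return array_sum($paragraphScores) / count($paragraphScores);
}

// 10 paragraphs scoring 100 each: total 1000, baseline 100.
echo averageParagraphScore(array_fill(0, 10, 100)), PHP_EOL;
```

Averaging rather than summing keeps the baseline independent of article length: a 10-paragraph article and a 50-paragraph article whose paragraphs each score 100 both get a baseline of 100.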
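The idea behind removeParagraphsWithFewWords can be illustrated with PHP's built-in DOM extension. This is a hypothetical sketch, not the library's code: it uses DOMDocument instead of DOMWrap\Element, the function name removeShortParagraphs is invented for illustration, words are split on whitespace, and the threshold is a parameter because the actual value of x is not given here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not Goose's removeParagraphsWithFewWords): removes every
 * <p> whose text has fewer than $minWords whitespace-separated words and
 * returns how many paragraphs were removed.
 */
function removeShortParagraphs(DOMDocument $doc, int $minWords): int
{
    $removed = 0;
    // Copy the live node list into an array so removals do not skip nodes.
    $paragraphs = iterator_to_array($doc->getElementsByTagName('p'));
    foreach ($paragraphs as $p) {
        $words = preg_split('/\s+/u', trim($p->textContent), -1, PREG_SPLIT_NO_EMPTY);
        if (count($words) < $minWords) {
            $p->parentNode->removeChild($p);
            $removed++;
        }
    }
    return $removed;
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>Read more</p><p>This paragraph has enough words to be kept as content.</p></div>');
echo removeShortParagraphs($doc, 5), PHP_EOL; // "Read more" has 2 words, so it is dropped
```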
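What convertToText describes (turning P tags into text separated by \n\n) can be sketched in the same way. Again this is a hypothetical illustration, not the library's code: it uses plain DOMElement rather than DOMWrap\Element, the name paragraphsToText is invented, and it assumes only the text of P elements is kept while empty paragraphs are skipped.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not Goose's convertToText): collects the trimmed text of
 * every <p> under $topNode and joins the non-empty ones with "\n\n".
 */
function paragraphsToText(DOMElement $topNode): string
{
    $parts = [];
    foreach ($topNode->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $parts[] = $text; // skip empty paragraphs so no blank gaps appear
        }
    }
    return implode("\n\n", $parts);
}

$doc = new DOMDocument();
$doc->loadHTML('<div><p>First paragraph.</p><p> </p><p>Second paragraph.</p></div>');
$div = $doc->getElementsByTagName('div')->item(0);
echo paragraphsToText($div), PHP_EOL;
```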

Method details

run() public method

public run ( Goose\Article $article )
$article Goose\Article

Property details

$CLEANUP_IGNORE_SELECTOR protected static property

protected static string $CLEANUP_IGNORE_SELECTOR
Type: string

$SIBLING_BASE_LINE_SCORE protected static property

protected static double $SIBLING_BASE_LINE_SCORE
Type: double