PHP Class PressBooks\Modules\Import\Html\Xhtml

Inheritance: extends PressBooks\Modules\Import\Import

Public Methods

Method	Description
import ( array $current_import ) : boolean
kneadHtml ( string $html, string $type, string $domain ) : string	Pummel the HTML into WordPress compatible dough.
kneadandInsert ( $html, string $post_type, integer $chapter_parent, string $domain )	Pummel then insert HTML into our database
setCurrentImportOption ( array $upload ) : boolean

Protected Methods

Method	Description
extractCCLicense ( string $url ) : string	Expects a URL string with Creative Commons domain similar in form to: http://creativecommons.org/licenses/by-sa/4.0/
fetchAndSaveUniqueImage ( string $url ) : string	Extract URL and load into WP using media_handle_sideload(). Will return an empty string if something went wrong.
getAuthors ( string $html ) : array	Looks for meta data in the <head> section of an HTML document.
getLicenseAttribution ( string $html ) : array	Looks for div class created by the license module in PB, returns author and license information.
regexSearchReplace ( string $html ) : string	Cherry pick likely content areas, then cull known, unwanted content areas
scrapeAndKneadImages ( DOMDocument $doc, string $domain ) : DOMDocument	Parse HTML snippet, save all found <img> tags using media_handle_sideload(), return the HTML with changed <img> paths.
scrapeAndKneadMeta ( DOMDocument $doc ) : array	Extracts section/book author and section/book license if they exist.
tidy ( string $html ) : string	Enforces compliance with XHTML standards and removes cruft generated by word processors.

Expects a URL string with Creative Commons domain similar in form to: http://creativecommons.org/licenses/by-sa/4.0/

protected extractCCLicense ( string $url ) : string
$url	string
Returns	string	license meta value
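
A minimal sketch of how a Creative Commons URL of the form shown above might be reduced to a license value. The function name `sketch_extract_cc_license` and the `cc-` prefix on the returned value are assumptions for illustration; the source only states that the method returns a "license meta value" and does not show its implementation.

```php
<?php
// Hypothetical helper for illustration, not the PressBooks implementation.
function sketch_extract_cc_license(string $url): string
{
    $parts = parse_url($url);
    if (!is_array($parts) || !isset($parts['host'], $parts['path'])) {
        return '';
    }

    // Only accept the Creative Commons domain.
    $host = strtolower($parts['host']);
    if ($host !== 'creativecommons.org' && $host !== 'www.creativecommons.org') {
        return '';
    }

    // "/licenses/by-sa/4.0/" becomes ['licenses', 'by-sa', '4.0'].
    $segments = array_values(array_filter(explode('/', $parts['path']), 'strlen'));
    if (count($segments) < 2 || $segments[0] !== 'licenses') {
        return '';
    }

    // Assumed output format: "cc-" followed by the license code.
    return 'cc-' . $segments[1];
}
```

Under these assumptions, `http://creativecommons.org/licenses/by-sa/4.0/` yields `cc-by-sa`, and any URL outside the Creative Commons domain yields an empty string.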

Looks for meta data in the <head> section of an HTML document. Priority is given to PB generated meta data.

protected getAuthors ( string $html ) : array
$html	string
Returns	array	$authors
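
The "priority" rule for author meta data can be sketched roughly as follows. This is an illustration only: the function name `sketch_get_authors` and the meta names `pb-authors` and `author` are assumptions, since the source does not list which meta tags the method actually reads.

```php
<?php
// Hypothetical helper for illustration, not the PressBooks implementation.
function sketch_get_authors(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $pbAuthors = [];      // assumed PB-generated meta data
    $genericAuthors = []; // generic <meta name="author">
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        $content = trim($meta->getAttribute('content'));
        if ($content === '') {
            continue;
        }
        if ($name === 'pb-authors') {
            $pbAuthors[] = $content;
        } elseif ($name === 'author') {
            $genericAuthors[] = $content;
        }
    }

    // PB-generated values win when both kinds are present.
    return $pbAuthors !== [] ? $pbAuthors : $genericAuthors;
}
```

Keeping the two lists separate until the end makes the precedence explicit and independent of the order in which the meta tags appear.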

Looks for div class created by the license module in PB, returns author and license information.

protected getLicenseAttribution ( string $html ) : array
$html	string
Returns	array	$meta

public import ( array $current_import ) : boolean
$current_import	array
Returns	boolean

Pummel the HTML into WordPress compatible dough.

public kneadHtml ( string $html, string $type, string $domain ) : string
$html	string
$type	string	front-matter, part, chapter, back-matter, ...
$domain	string	domain name of the webpage
Returns	string

Pummel then insert HTML into our database

public kneadandInsert ( $html, string $post_type, integer $chapter_parent, string $domain )
$post_type	string
$chapter_parent	integer
$domain	string	domain name of the webpage

Cherry pick likely content areas, then cull known, unwanted content areas

protected regexSearchReplace ( string $html ) : string
$html	string
Returns	string	$html
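
The "cherry pick, then cull" idea can be sketched in two regex passes. The patterns below (keep the contents of `<body>`, then drop scripts, styles, and comments) and the name `sketch_regex_search_replace` are assumptions chosen for illustration; the source does not say which content areas the real method selects or removes.

```php
<?php
// Hypothetical helper for illustration, not the PressBooks implementation.
function sketch_regex_search_replace(string $html): string
{
    // Cherry pick: keep only the contents of <body> when one is present.
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $match) === 1) {
        $html = $match[1];
    }

    // Cull: remove areas that are assumed to be unwanted in imported content.
    $unwanted = [
        '/<script\b[^>]*>.*?<\/script>/is',
        '/<style\b[^>]*>.*?<\/style>/is',
        '/<!--.*?-->/s',
    ];
    $result = preg_replace($unwanted, '', $html);

    // preg_replace() returns null on error; fall back to the picked HTML.
    return trim($result ?? $html);
}
```

Regular expressions are fragile on nested markup, so a sketch like this is only reasonable for coarse pre-filtering before the HTML is handed to a real parser.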

Parse HTML snippet, save all found tags using media_handle_sideload(), return the HTML with changed paths.

protected scrapeAndKneadImages ( DOMDocument $doc, string $domain ) : DOMDocument
$doc	DOMDocument
$domain	string	domain name of the webpage
Returns	DOMDocument
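
A rough sketch of the image pass described above, runnable outside WordPress. Instead of calling media_handle_sideload(), it takes a callable that stands in for fetchAndSaveUniqueImage(); the function name `sketch_knead_images`, the callable parameter, and the choice to leave an image untouched when saving fails are all assumptions for illustration.

```php
<?php
// Hypothetical helper for illustration, not the PressBooks implementation.
// $saveImage receives the original src and returns the new local URL,
// or an empty string on failure (mirroring fetchAndSaveUniqueImage()).
function sketch_knead_images(DOMDocument $doc, callable $saveImage): DOMDocument
{
    foreach ($doc->getElementsByTagName('img') as $img) {
        $oldSrc = $img->getAttribute('src');
        if ($oldSrc === '') {
            continue;
        }

        $newSrc = $saveImage($oldSrc);
        if ($newSrc === '') {
            // Assumption: keep the original path when the image could not be saved.
            continue;
        }

        $img->setAttribute('src', $newSrc);
    }

    return $doc;
}
```

Passing the save step in as a callable keeps the DOM rewriting testable without a WordPress installation; in a real import it would wrap the sideload call.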

Extracts section/book author and section/book license if they exist. Focus is given to Creative Commons license information generated by PB.

protected scrapeAndKneadMeta ( DOMDocument $doc ) : array
$doc	DOMDocument
Returns	array	$meta

public setCurrentImportOption ( array $upload ) : boolean
$upload	array
Returns	boolean

Enforces compliance with XHTML standards and removes cruft generated by word processors.

protected tidy ( string $html ) : string
$html	string
Returns	string	$html

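Extract URL and load into WP using media_handle_sideload(). Will return an empty string if something went wrong.

protected fetchAndSaveUniqueImage ( string $url ) : string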
$url	string
Returns	string	$src