PHP Class PressBooks\Modules\Import\Html\Xhtml

Inheritance: extends PressBooks\Modules\Import\Import
Project: pressbooks/pressbooks

Public Methods

Method Description
import ( array $current_import ) : boolean
kneadHtml ( string $html, string $type, string $domain ) : string Pummel the HTML into WordPress-compatible dough.
kneadandInsert ( $html, string $post_type, integer $chapter_parent, string $domain ) Pummel, then insert the HTML into our database.
setCurrentImportOption ( array $upload ) : boolean

Protected Methods

Method Description
extractCCLicense ( string $url ) : string Expects a URL string with Creative Commons domain similar in form to: http://creativecommons.org/licenses/by-sa/4.0/
fetchAndSaveUniqueImage ( string $url ) : string Extract the URL and load the image into WP using media_handle_sideload(). Returns an empty string if something went wrong.
getAuthors ( string $html ) : array Looks for metadata in the <head> section of an HTML document.
getLicenseAttribution ( string $html ) : array Looks for the div class created by the PB license module and returns author and license information.
regexSearchReplace ( string $html ) : string Cherry-pick likely content areas, then cull known, unwanted content areas.
scrapeAndKneadImages ( DOMDocument $doc, string $domain ) : DOMDocument Parse the HTML snippet, save all found <img> tags using media_handle_sideload(), and return the HTML with changed paths.
scrapeAndKneadMeta ( DOMDocument $doc ) : array Extracts section/book author and section/book license if they exist.
tidy ( string $html ) : string Ensure compliance with XHTML standards and rid the HTML of cruft generated by word processors.

Method Details

extractCCLicense() protected method

Expects a URL string with Creative Commons domain similar in form to: http://creativecommons.org/licenses/by-sa/4.0/
protected extractCCLicense ( string $url ) : string
$url string
return string license meta value
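A minimal sketch of the parsing idea, written as a standalone function rather than the real protected method. It assumes the license meta value takes the form "cc-<type>" (for example "cc-by-sa"); the function name extract_cc_license_sketch() is hypothetical and the values Pressbooks actually stores may differ.

```php
<?php
// Hypothetical sketch of extractCCLicense(): map a Creative Commons license URL
// to a license meta value. Assumption: values look like "cc-by", "cc-by-sa", ...
function extract_cc_license_sketch(string $url): string
{
    $parts = parse_url($url);
    // Only accept URLs whose host is (a subdomain of) creativecommons.org.
    if (!isset($parts['host'], $parts['path'])
        || !preg_match('/(^|\.)creativecommons\.org$/i', $parts['host'])) {
        return '';
    }
    // Expect a path such as /licenses/by-sa/4.0/ and capture the license type segment.
    if (!preg_match('#^/licenses/([a-z-]+)/#i', $parts['path'], $m)) {
        return '';
    }
    return 'cc-' . strtolower($m[1]);
}

echo extract_cc_license_sketch('http://creativecommons.org/licenses/by-sa/4.0/'), "\n"; // cc-by-sa
```

URLs outside the creativecommons.org domain, or paths that do not follow the /licenses/<type>/<version>/ shape, fall through to an empty string.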

fetchAndSaveUniqueImage() protected method

Extract the URL and load the image into WP using media_handle_sideload(). Returns an empty string if something went wrong.
See also: media_handle_sideload
protected fetchAndSaveUniqueImage ( string $url ) : string
$url string
return string $src
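The "Unique" in the method name suggests each remote URL is saved only once. A minimal sketch of that deduplication, assuming PHP 7.4+; the UniqueImageSaver class and its $save callable are hypothetical stand-ins for the WordPress download and media_handle_sideload() step, which needs a running WordPress install.

```php
<?php
// Hypothetical sketch: save each remote image URL at most once and return ''
// on failure. $save stands in for the download + media_handle_sideload() step.
final class UniqueImageSaver
{
    /** @var array<string, string> remote URL => local src already saved */
    private array $seen = [];

    /** @var callable */
    private $save;

    public function __construct(callable $save)
    {
        $this->save = $save;
    }

    public function fetch(string $url): string
    {
        if (isset($this->seen[$url])) {
            return $this->seen[$url]; // already saved: reuse the earlier local src
        }
        if (filter_var($url, FILTER_VALIDATE_URL) === false) {
            return ''; // invalid URL: return the documented empty string
        }
        $src = ($this->save)($url);
        if (!is_string($src) || $src === '') {
            return ''; // saving failed
        }
        return $this->seen[$url] = $src;
    }
}

$calls = 0;
$saver = new UniqueImageSaver(function (string $url) use (&$calls) {
    $calls++;
    return '/wp-content/uploads/' . basename((string) parse_url($url, PHP_URL_PATH));
});
echo $saver->fetch('http://example.com/img/cat.png'), "\n"; // /wp-content/uploads/cat.png
echo $saver->fetch('http://example.com/img/cat.png'), "\n"; // same src, $save not called again
echo $calls, "\n"; // 1
```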

getAuthors() protected method

Priority is given to PB generated meta data.
protected getAuthors ( string $html ) : array
$html string
return array $authors
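A minimal sketch of the lookup using PHP's DOM extension. The meta names "pb-authors" (standing in for PB-generated metadata, checked first) and "author" (generic fallback), as well as the comma-separated content format, are assumptions for illustration; get_authors_sketch() is a hypothetical name.

```php
<?php
// Hypothetical sketch of getAuthors(): read author names from <meta> tags in <head>,
// preferring PB-generated metadata. The meta names used here are assumptions.
function get_authors_sketch(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world, non-valid HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Try the PB-specific name first, then fall back to the generic one.
    foreach (['pb-authors', 'author'] as $name) {
        $authors = [];
        foreach ($xpath->query('//head/meta[@name="' . $name . '"]/@content') as $attr) {
            foreach (explode(',', $attr->nodeValue) as $author) {
                $author = trim($author);
                if ($author !== '') {
                    $authors[] = $author;
                }
            }
        }
        if ($authors !== []) {
            return $authors;
        }
    }
    return [];
}

$html = '<html><head><meta name="author" content="Ada Lovelace">'
      . '<meta name="pb-authors" content="Jane Doe, John Roe"></head><body></body></html>';
echo implode('; ', get_authors_sketch($html)), "\n"; // Jane Doe; John Roe
```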

getLicenseAttribution() protected method

Looks for the div class created by the PB license module and returns author and license information.
protected getLicenseAttribution ( string $html ) : array
$html string
return array $meta
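A minimal sketch with DOMXPath. The class name "license-attribution", the property="cc:attributionName" author marker and the rel="license" link are assumptions about the markup the license module emits; get_license_attribution_sketch() and its return keys are hypothetical.

```php
<?php
// Hypothetical sketch of getLicenseAttribution(). The markup it looks for
// (class, property and rel values) is assumed, not taken from Pressbooks.
function get_license_attribution_sketch(string $html): array
{
    $meta = ['author' => '', 'license_url' => ''];
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    // Match the div even when it carries additional classes.
    $divs = $xpath->query('//div[contains(concat(" ", normalize-space(@class), " "), " license-attribution ")]');
    if ($divs->length === 0) {
        return $meta;
    }
    $div = $divs->item(0);

    // Author name, searched only inside the attribution div.
    $author = $xpath->query('.//*[@property="cc:attributionName"]', $div);
    if ($author->length > 0) {
        $meta['author'] = trim($author->item(0)->textContent);
    }
    // License URL from the first rel="license" link.
    $license = $xpath->query('.//a[@rel="license"]/@href', $div);
    if ($license->length > 0) {
        $meta['license_url'] = $license->item(0)->nodeValue;
    }
    return $meta;
}

$meta = get_license_attribution_sketch(
    '<html><body><div class="license-attribution"><p>'
    . '<span property="cc:attributionName">Jane Doe</span> '
    . '<a rel="license" href="http://creativecommons.org/licenses/by/4.0/">CC BY</a>'
    . '</p></div></body></html>'
);
echo $meta['author'], "\n"; // Jane Doe
```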

import() public method

public import ( array $current_import ) : boolean
$current_import array
return boolean

kneadHtml() public method

Pummel the HTML into WordPress compatible dough.
public kneadHtml ( string $html, string $type, string $domain ) : string
$html string
$type string front-matter, part, chapter, back-matter, ...
$domain string domain name of the webpage
return string

kneadandInsert() public method

Pummel, then insert the HTML into our database.
public kneadandInsert ( $html, string $post_type, integer $chapter_parent, string $domain )
$post_type string
$chapter_parent integer
$domain string domain name of the webpage

regexSearchReplace() protected method

Cherry-pick likely content areas, then cull known, unwanted content areas.
protected regexSearchReplace ( string $html ) : string
$html string
return string $html
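A minimal sketch of the cherry-pick-then-cull approach. Keeping only the <body> contents and removing <script>, <style> and HTML comments are illustrative choices, not the patterns Pressbooks actually uses; regex_search_replace_sketch() is a hypothetical name.

```php
<?php
// Hypothetical sketch of regexSearchReplace(). The patterns are illustrative only.
function regex_search_replace_sketch(string $html): string
{
    // Cherry-pick: if a <body> element exists, keep only its inner HTML.
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $m)) {
        $html = $m[1];
    }
    // Cull: drop blocks that never belong in imported book content.
    $unwanted = [
        '#<script\b[^>]*>.*?</script>#is',
        '#<style\b[^>]*>.*?</style>#is',
        '#<!--.*?-->#s',
    ];
    return trim(preg_replace($unwanted, '', $html));
}

echo regex_search_replace_sketch(
    '<html><head><title>x</title></head><body class="a"><!-- note --><p>Hi</p><script>alert(1)</script></body></html>'
), "\n"; // <p>Hi</p>
```

Regular expressions are fragile on arbitrary HTML, which is presumably why the rest of the pipeline also works on a DOMDocument.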

scrapeAndKneadImages() protected method

Parse the HTML snippet, save all found <img> tags using media_handle_sideload(), and return the HTML with changed paths.
protected scrapeAndKneadImages ( DOMDocument $doc, string $domain ) : DOMDocument
$doc DOMDocument
$domain string domain name of the webpage
return DOMDocument
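A minimal sketch of the image pass. A hypothetical $save callable replaces the media_handle_sideload() call, and relative src values are resolved against the domain root over http://, which is a simplification (paths relative to the page's own directory are not handled).

```php
<?php
// Hypothetical sketch of scrapeAndKneadImages(): save every <img> through $save
// (a stand-in for media_handle_sideload()) and point src at the saved copy.
function scrape_and_knead_images_sketch(DOMDocument $doc, string $domain, callable $save): DOMDocument
{
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue;
        }
        if (substr($src, 0, 2) === '//') {
            $src = 'http:' . $src; // protocol-relative URL
        } elseif (parse_url($src, PHP_URL_SCHEME) === null) {
            // Relative path: resolve against the domain root (simplified).
            $src = 'http://' . rtrim($domain, '/') . '/' . ltrim($src, '/');
        }
        $newSrc = $save($src);
        if (is_string($newSrc) && $newSrc !== '') {
            $img->setAttribute('src', $newSrc); // keep the original src on failure
        }
    }
    return $doc;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body><img src="/img/a.png"><img src="http://cdn.example.org/b.jpg"></body></html>');
$saved = [];
$doc = scrape_and_knead_images_sketch($doc, 'example.com', function (string $url) use (&$saved): string {
    $saved[] = $url;
    return '/wp-content/uploads/' . basename($url);
});
foreach ($doc->getElementsByTagName('img') as $img) {
    echo $img->getAttribute('src'), "\n"; // /wp-content/uploads/a.png, then /wp-content/uploads/b.jpg
}
```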

scrapeAndKneadMeta() protected method

Focus is given to Creative Commons license information generated by PB.
protected scrapeAndKneadMeta ( DOMDocument $doc ) : array
$doc DOMDocument
return array $meta

setCurrentImportOption() public method

public setCurrentImportOption ( array $upload ) : boolean
$upload array
return boolean

tidy() protected method

Ensure compliance with XHTML standards and rid the HTML of cruft generated by word processors.
protected tidy ( string $html ) : string
$html string
return string $html
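A minimal sketch that swaps in PHP's DOM extension for the cleanup: it strips two common Microsoft Word artifacts (<o:p> tags and class="Mso..." attributes) and re-serializes the fragment with saveXML(), which closes void elements such as <br>. The real method may use a different library and rule set; tidy_sketch() is a hypothetical name.

```php
<?php
// Hypothetical sketch of tidy(): remove Word cruft, then re-serialize as well-formed XML.
function tidy_sketch(string $html): string
{
    // Remove Microsoft Office namespace tags such as <o:p></o:p>.
    $html = preg_replace('#</?o:p[^>]*>#i', '', $html);
    // Remove class="Mso..." attributes added by Word.
    $html = preg_replace('#\sclass="Mso[^"]*"#i', '', $html);

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // The <?xml encoding?> prefix is a common idiom to make libxml read the input as UTF-8;
    // the wrapper div gives the fragment a single root to iterate over.
    $doc->loadHTML('<?xml encoding="utf-8"?><div>' . $html . '</div>');
    libxml_clear_errors();

    $out = '';
    $root = $doc->getElementsByTagName('div')->item(0); // the wrapper comes first in document order
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveXML($child); // XML serialization: closed tags, quoted attributes
    }
    return $out;
}

$out = tidy_sketch('<p class="MsoNormal">Hello<o:p></o:p><br>world</p>');
echo $out, "\n";
```

Note that saveXML() also collapses empty non-void elements (for example <p></p> becomes <p/>), so a production version would need to handle those separately.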