PHP Class Html2Text\Html2Text

Project: soundasleep/html2text

Public Methods

Method
    Description

convert ( string $html ) : string
    Tries to convert the given HTML into a plain text format, best suited for e-mail display and similar uses.
fixMSEncoding ( DOMDocument $doc ) : DOMDocument
    Microsoft Exchange emails often include HTML which, when passed through html2text, results in many doubled line returns.
fixNewlines ( string $text ) : string
    Unify newlines; in particular, \r\n becomes \n, and then \r becomes \n. This means that all newlines (Unix, Windows, Mac) become \n.
isOfficeDocument ( $html )
    Can we guess that this HTML is generated by Microsoft Office?
iterateOverNode ( $node )
nextChildName ( $node )
prevChildName ( $node )

Method Details

convert() static public method

Tries to convert the given HTML into a plain text format, best suited for e-mail display and similar uses. In particular, it tries to maintain the following features:

  • Links are maintained, with the 'href' copied over
  • Information in the <head> is lost

static public convert ( string $html ) : string
  $html   string   the input HTML
  return  string   the HTML converted, as best as possible, to text

fixMSEncoding() static public method

Microsoft Exchange emails often include HTML which, when passed through html2text, results in many doubled line returns. To fix this, any element with a className of msoNormal (the standard class name that Microsoft exports and Outlook use for a paragraph that behaves like a line return) is changed to a line with a break afterwards. The cleaned-up document can then be processed as normal through Html2Text.

static public fixMSEncoding ( DOMDocument $doc ) : DOMDocument
  $doc    DOMDocument   the document to clean up
  return  DOMDocument   the modified document with fewer unnecessary paragraphs
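
The clean-up described for fixMSEncoding() can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual source: the function name replaceMsoNormalParagraphs is hypothetical, only <p> elements are inspected, the class comparison is case-insensitive, and each matching paragraph is reduced to its text content, so any inline markup inside it is dropped.

```php
<?php
// Hypothetical helper illustrating the documented behaviour; not the library's code.
function replaceMsoNormalParagraphs(DOMDocument $doc): DOMDocument
{
    $paragraphs = $doc->getElementsByTagName('p');

    // Walk backwards: the node list is live, so replacing a node shifts later indexes.
    for ($i = $paragraphs->length - 1; $i >= 0; $i--) {
        $p = $paragraphs->item($i);

        // Assumption: match the class name case-insensitively.
        if (strcasecmp($p->getAttribute('class'), 'msoNormal') !== 0) {
            continue;
        }

        // Replace the paragraph with its text followed by a single <br>.
        $fragment = $doc->createDocumentFragment();
        $fragment->appendChild($doc->createTextNode($p->textContent));
        $fragment->appendChild($doc->createElement('br'));
        $p->parentNode->replaceChild($fragment, $p);
    }

    return $doc;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body>'
    . '<p class="MsoNormal">Line one</p>'
    . '<p class="MsoNormal">Line two</p>'
    . '<p>Kept</p>'
    . '</body></html>'
);

$cleaned = replaceMsoNormalParagraphs($doc);
echo $cleaned->saveHTML();
```

In this sketch the two Office-style paragraphs each become a line of text plus a break, while the ordinary paragraph is left untouched.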

fixNewlines() static public method

Unify newlines; in particular, \r\n becomes \n, and then \r becomes \n. This means that all newlines (Unix, Windows, Mac) become \n.

static public fixNewlines ( string $text ) : string
  $text   string   text with any number of \r, \r\n and \n combinations
  return  string   the fixed text
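
The two-step replacement described for fixNewlines() can be illustrated with a small stand-alone function. This is a sketch of the documented behaviour only; the name unifyNewlines is hypothetical, and the library's own implementation may differ in detail.

```php
<?php
// Hypothetical stand-in for the documented behaviour; not the library's source.
function unifyNewlines(string $text): string
{
    // Handle Windows line endings first, so "\r\n" does not become two newlines.
    $text = str_replace("\r\n", "\n", $text);

    // Any remaining lone "\r" (classic Mac line ending) becomes "\n".
    return str_replace("\r", "\n", $text);
}

$mixed = "unix\nwindows\r\nmac\rend";
$unified = unifyNewlines($mixed);
```

The order of the two replacements matters: converting lone \r characters first would turn every \r\n pair into \n\n, which is the kind of doubled line break the method is meant to avoid.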

isOfficeDocument() static public method

Can we guess that this HTML is generated by Microsoft Office?
static public isOfficeDocument ( $html )

iterateOverNode() static public method

static public iterateOverNode ( $node )

nextChildName() static public method

static public nextChildName ( $node )

prevChildName() static public method

static public prevChildName ( $node )