PHP Class Symfony\Component\DomCrawler\Crawler

Author: Fabien Potencier
Inheritance: extends SplObjectStorage
Project: symfony/dom-crawler

Protected Properties

Property Type Description
$uri string The current URI

Public Methods

Method Description
__construct ( mixed $node = null, string $currentUri = null, string $baseHref = null )
add ( DOMNodeList | DOMNode | array | string | null $node ) Adds a node to the current list of nodes.
addContent ( string $content, null | string $type = null ) Adds HTML/XML content.
addDocument ( DOMDocument $dom ) Adds a \DOMDocument to the list of nodes.
addHtmlContent ( string $content, string $charset = 'UTF-8' ) Adds HTML content to the list of nodes.
addNode ( DOMNode $node ) Adds a \DOMNode instance to the list of nodes.
addNodeList ( DOMNodeList $nodes ) Adds a \DOMNodeList to the list of nodes.
addNodes ( array $nodes ) Adds an array of \DOMNode instances to the list of nodes.
addXmlContent ( string $content, string $charset = 'UTF-8', integer $options = LIBXML_NONET ) Adds XML content to the list of nodes.
attr ( string $attribute ) : string | null Returns the attribute value of the first node of the list.
children ( ) : Crawler Returns the child nodes of the current selection.
clear ( ) Removes all the nodes.
count ( ) : integer
each ( Closure $closure ) : array Calls an anonymous function on each node of the list.
eq ( integer $position ) : Crawler Returns a node given its position in the node list.
evaluate ( string $xpath ) : array | Crawler Evaluates an XPath expression.
extract ( array $attributes ) : array Extracts information from the list of nodes.
filter ( string $selector ) : Crawler Filters the list of nodes with a CSS selector.
filterXPath ( string $xpath ) : Crawler Filters the list of nodes with an XPath expression.
first ( ) : Crawler Returns the first node of the current selection.
form ( array $values = null, string $method = null ) : Form Returns a Form object for the first node in the list.
getBaseHref ( ) : string Returns base href.
getIterator ( ) : ArrayIterator
getNode ( integer $position ) : DOMElement | null
getUri ( ) : string Returns the current URI.
html ( ) : string Returns the first node of the list as HTML.
image ( ) : Symfony\Component\DomCrawler\Image Returns an Image object for the first node in the list.
images ( ) : Symfony\Component\DomCrawler\Image[] Returns an array of Image objects for the nodes in the list.
last ( ) : Crawler Returns the last node of the current selection.
link ( string $method = 'get' ) : Symfony\Component\DomCrawler\Link Returns a Link object for the first node in the list.
links ( ) : Symfony\Component\DomCrawler\Link[] Returns an array of Link objects for the nodes in the list.
nextAll ( ) : Crawler Returns the next sibling nodes of the current selection.
nodeName ( ) : string Returns the node name of the first node of the list.
parents ( ) : Crawler Returns the parent nodes of the current selection.
previousAll ( ) : Crawler Returns the previous sibling nodes of the current selection.
reduce ( Closure $closure ) : Crawler Reduces the list of nodes by calling an anonymous function.
registerNamespace ( string $prefix, string $namespace )
selectButton ( string $value ) : Crawler Selects a button by name or alt value for images.
selectImage ( string $value ) : Crawler Selects images by alt value.
selectLink ( string $value ) : Crawler Selects links by name or alt value for clickable images.
setDefaultNamespacePrefix ( string $prefix ) Overloads a default namespace prefix to be used with XPath and CSS expressions.
siblings ( ) : Crawler Returns the sibling nodes of the current selection.
slice ( integer $offset, integer $length = null ) : Crawler Slices the list of nodes by $offset and $length.
text ( ) : string Returns the node value of the first node of the list.
xpathLiteral ( string $s ) : string Converts string for XPath expressions.

Protected Methods

Method Description
sibling ( DOMElement $node, string $siblingDir = 'nextSibling' ) : array

Private Methods

Method Description
createDOMXPath ( DOMDocument $document, array $prefixes = [] ) : DOMXPath
createSubCrawler ( DOMElement | DOMElement[] | DOMNodeList | null $nodes ) : static Creates a crawler for some subnodes.
discoverNamespace ( DOMXPath $domxpath, string $prefix ) : string
filterRelativeXPath ( string $xpath ) : Crawler Filters the list of nodes with an XPath expression.
findNamespacePrefixes ( string $xpath ) : array
relativize ( string $xpath ) : string Make the XPath relative to the current context.

Method Details

__construct() public method

public __construct ( mixed $node = null, string $currentUri = null, string $baseHref = null )
$node mixed A Node to use as the base for the crawling
$currentUri string The current URI
$baseHref string The base href value

add() public method

This method uses the appropriate specialized add*() method based on the type of the argument.
public add ( DOMNodeList | DOMNode | array | string | null $node )
$node DOMNodeList | DOMNode | array | string | null A node

addContent() public method

If the charset is not set via the content type, it is assumed to be ISO-8859-1, which is the default charset defined by the HTTP 1.1 specification.
public addContent ( string $content, null | string $type = null )
$content string A string to parse as HTML/XML
$type null | string The content type of the string
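
A minimal sketch of passing the charset through the content type, so that the ISO-8859-1 fallback described above does not apply. It assumes symfony/dom-crawler is installed via Composer; the autoloader path and the sample markup are illustrative assumptions.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// "caf\xC3\xA9" is the UTF-8 byte sequence for "café" (sample data).
$html = "<html><body><p>caf\xC3\xA9</p></body></html>";

$crawler = new Crawler();
// Declare the charset in the content type instead of relying on the default.
$crawler->addContent($html, 'text/html; charset=UTF-8');

// Read the paragraph back; the text should round-trip as UTF-8.
$text = $crawler->filterXPath('//p')->text();
```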

addDocument() public method

Adds a \DOMDocument to the list of nodes.
public addDocument ( DOMDocument $dom )
$dom DOMDocument A \DOMDocument instance

addHtmlContent() public method

libxml errors are suppressed while the content is parsed. To retrieve parsing errors, enable internal errors via libxml_use_internal_errors(true), then read them with libxml_get_errors(). Clear them afterward with libxml_clear_errors().
public addHtmlContent ( string $content, string $charset = 'UTF-8' )
$content string The HTML content
$charset string The charset
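
The error-handling sequence described above might look like the following sketch. It assumes symfony/dom-crawler is installed via Composer; the autoloader path and the deliberately malformed markup are illustrative, and the exact messages reported depend on the libxml version.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Enable internal error collection and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$crawler = new Crawler();
// Sample input with an unknown tag, likely to trigger a parser warning.
$crawler->addHtmlContent('<html><body><nosuchtag>text</nosuchtag></body></html>');

// Collect whatever libxml recorded while parsing.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Clear the buffer and restore the previous error-handling mode.
libxml_clear_errors();
libxml_use_internal_errors($previous);
```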

addNode() public method

Adds a \DOMNode instance to the list of nodes.
public addNode ( DOMNode $node )
$node DOMNode A \DOMNode instance

addNodeList() public method

Adds a \DOMNodeList to the list of nodes.
public addNodeList ( DOMNodeList $nodes )
$nodes DOMNodeList A \DOMNodeList instance

addNodes() public method

Adds an array of \DOMNode instances to the list of nodes.
public addNodes ( array $nodes )
$nodes array An array of \DOMNode instances

addXmlContent() public method

libxml errors are suppressed while the content is parsed. To retrieve parsing errors, enable internal errors via libxml_use_internal_errors(true), then read them with libxml_get_errors(). Clear them afterward with libxml_clear_errors().
public addXmlContent ( string $content, string $charset = 'UTF-8', integer $options = LIBXML_NONET )
$content string The XML content
$charset string The charset
$options integer Bitwise OR of the libxml option constants. LIBXML_PARSEHUGE is dangerous; see http://symfony.com/blog/security-release-symfony-2-0-17-released

attr() public method

Returns the attribute value of the first node of the list.
public attr ( string $attribute ) : string | null
$attribute string The attribute name
return string | null The attribute value or null if the attribute does not exist

children() public method

Returns the child nodes of the current selection.
public children ( ) : Crawler
return Crawler A Crawler instance with the child nodes

clear() public method

Removes all the nodes.
public clear ( )

count() public method

public count ( ) : integer
return integer

each() public method

The anonymous function receives the node, wrapped in a Crawler instance, and its position in the list as arguments.
Example:
$crawler->filter('h1')->each(function ($node, $i) { return $node->text(); });
public each ( Closure $closure ) : array
$closure Closure An anonymous function
return array An array of values returned by the anonymous function
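
A slightly fuller sketch of each(), showing both closure arguments in use. It assumes symfony/dom-crawler is installed via Composer; the autoloader path and markup are illustrative. filterXPath() is used here instead of filter() only to avoid the extra CssSelector dependency.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><h1>First</h1><h1>Second</h1></body></html>');

// The closure gets each node as a Crawler plus its zero-based position.
$titles = $crawler->filterXPath('//h1')->each(function (Crawler $node, $i) {
    return $i.': '.$node->text();
});

// $titles now holds one string per matched h1, in document order.
print_r($titles);
```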

eq() public method

Returns a node given its position in the node list.
public eq ( integer $position ) : Crawler
$position integer The position
return Crawler A new instance of the Crawler with the selected node, or an empty Crawler if it does not exist

evaluate() public method

Since an XPath expression might evaluate to either a simple type or a \DOMNodeList, this method will return either an array of simple types or a new Crawler instance.
public evaluate ( string $xpath ) : array | Crawler
$xpath string An XPath expression
return array | Crawler An array of evaluation results or a new Crawler instance
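
The two return shapes can be sketched as follows, assuming symfony/dom-crawler is installed via Composer (the autoloader path and markup are illustrative). An expression that yields a scalar gives an array with one entry per node in the crawler; an expression that yields nodes gives a new Crawler.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>');

// Scalar result: count() is evaluated once for each node in the selection.
$counts = $crawler->filterXPath('//ul')->evaluate('count(./li)');

// Node-set result: the matched nodes come back wrapped in a Crawler.
$items = $crawler->evaluate('//li');

var_dump(is_array($counts), $items instanceof Crawler);
```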

extract() public method

You can extract attributes and/or the node value (_text).
Example:
$crawler->filter('h1 a')->extract(array('_text', 'href'));
public extract ( array $attributes ) : array
$attributes array An array of attributes
return array An array of extracted values
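
A sketch of the shape of the returned data, assuming symfony/dom-crawler is installed via Composer (the autoloader path and markup are illustrative). When more than one attribute is requested, each node contributes one sub-array in the order the attributes were listed.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html><body>'
      . '<h1><a href="/one">One</a></h1>'
      . '<h1><a href="/two">Two</a></h1>'
      . '</body></html>';
$crawler = new Crawler($html);

// Pull the link text and the href attribute from every matched anchor.
$rows = $crawler->filterXPath('//h1/a')->extract(array('_text', 'href'));

foreach ($rows as $row) {
    list($text, $href) = $row;
    echo $text.' => '.$href."\n";
}
```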

filter() public method

This method only works if you have installed the CssSelector Symfony Component.
public filter ( string $selector ) : Crawler
$selector string A CSS selector
return Crawler A new instance of Crawler with the filtered list of nodes

filterXPath() public method

The XPath expression is evaluated in the context of the crawler, which is considered as a fake parent of the elements inside it. This means that a child selector "div" or "./div" will match only the div elements of the current crawler, not their children.
public filterXPath ( string $xpath ) : Crawler
$xpath string An XPath expression
return Crawler A new instance of Crawler with the filtered list of nodes
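
The context rule above can be illustrated with nested elements. This sketch assumes symfony/dom-crawler is installed via Composer; the autoloader path, the markup, and the id values are illustrative.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><div id="outer"><div id="inner"></div></div></body></html>');

// Start from a crawler that contains only the outer div.
$outer = $crawler->filterXPath('//div[@id="outer"]');

// "div" is resolved against the crawler as a fake parent, so it matches
// the outer div itself rather than the div nested inside it.
$sameLevel = $outer->filterXPath('div');

// A descendant query also reaches the nested div.
$all = $outer->filterXPath('//div');

echo $sameLevel->attr('id'), ' ', count($all), "\n";
```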

first() public method

Returns the first node of the current selection.
public first ( ) : Crawler
return Crawler A Crawler instance with the first selected node

form() public method

Returns a Form object for the first node in the list.
public form ( array $values = null, string $method = null ) : Form
$values array An array of values for the form fields
$method string The method for the form
return Form A Form instance

getBaseHref() public method

Returns base href.
public getBaseHref ( ) : string
return string

getIterator() public method

public getIterator ( ) : ArrayIterator
return ArrayIterator

getNode() public method

public getNode ( integer $position ) : DOMElement | null
$position integer
return DOMElement | null

getUri() public method

Returns the current URI.
public getUri ( ) : string
return string

html() public method

Returns the first node of the list as HTML.
public html ( ) : string
return string The node html

image() public method

Returns an Image object for the first node in the list.
public image ( ) : Symfony\Component\DomCrawler\Image
return Symfony\Component\DomCrawler\Image An Image instance

images() public method

Returns an array of Image objects for the nodes in the list.
public images ( ) : Symfony\Component\DomCrawler\Image[]
return Symfony\Component\DomCrawler\Image[] An array of Image instances

last() public method

Returns the last node of the current selection.
public last ( ) : Crawler
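return Crawler A Crawler instance with the last selected node

link() public method

Returns a Link object for the first node in the list.
public link ( string $method = 'get' ) : Symfony\Component\DomCrawler\Link
$method string The method for the link (get by default)
return Symfony\Component\DomCrawler\Link A Link instance

links() public method

Returns an array of Link objects for the nodes in the list.
public links ( ) : Symfony\Component\DomCrawler\Link[]
return Symfony\Component\DomCrawler\Link[] An array of Link instances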

nextAll() public method

Returns the next sibling nodes of the current selection.
public nextAll ( ) : Crawler
return Crawler A Crawler instance with the next sibling nodes

nodeName() public method

Returns the node name of the first node of the list.
public nodeName ( ) : string
return string The node name

parents() public method

Returns the parent nodes of the current selection.
public parents ( ) : Crawler
return Crawler A Crawler instance with the parent nodes of the current selection

previousAll() public method

Returns the previous sibling nodes of the current selection.
public previousAll ( ) : Crawler
return Crawler A Crawler instance with the previous sibling nodes

reduce() public method

To remove a node from the list, the anonymous function must return false.
public reduce ( Closure $closure ) : Crawler
$closure Closure An anonymous function
return Crawler A Crawler instance with the selected nodes
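
A short sketch of reduce(), assuming symfony/dom-crawler is installed via Composer (the autoloader path and markup are illustrative). The closure receives each node as a Crawler together with its position; nodes for which it returns false are dropped.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><ul><li>a</li><li>b</li><li>c</li></ul></body></html>');

// Keep only the nodes at even positions (0, 2, ...); odd ones return false.
$kept = $crawler->filterXPath('//li')->reduce(function (Crawler $node, $i) {
    return 0 === $i % 2;
});

// Collect the remaining node values to inspect the result.
$values = $kept->each(function (Crawler $node) {
    return $node->text();
});
print_r($values);
```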

registerNamespace() public method

public registerNamespace ( string $prefix, string $namespace )
$prefix string
$namespace string

selectButton() public method

Selects a button by name or alt value for images.
public selectButton ( string $value ) : Crawler
$value string The button text
return Crawler A new instance of Crawler with the filtered list of nodes

selectImage() public method

Selects images by alt value.
public selectImage ( string $value ) : Crawler
$value string The image alt
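return Crawler A new instance of Crawler with the filtered list of nodes

selectLink() public method

Selects links by name or alt value for clickable images.
public selectLink ( string $value ) : Crawler
$value string The link text
return Crawler A new instance of Crawler with the filtered list of nodes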

setDefaultNamespacePrefix() public method

Overloads a default namespace prefix to be used with XPath and CSS expressions.
public setDefaultNamespacePrefix ( string $prefix )
$prefix string

sibling() protected method

protected sibling ( DOMElement $node, string $siblingDir = 'nextSibling' ) : array
$node DOMElement
$siblingDir string
return array

siblings() public method

Returns the sibling nodes of the current selection.
public siblings ( ) : Crawler
return Crawler A Crawler instance with the sibling nodes

slice() public method

Slices the list of nodes by $offset and $length.
public slice ( integer $offset, integer $length = null ) : Crawler
$offset integer
$length integer
return Crawler A Crawler instance with the sliced nodes

text() public method

Returns the node value of the first node of the list.
public text ( ) : string
return string The node value

xpathLiteral() public static method

Escaped characters are: quotes (") and apostrophe (').
Examples:
echo Crawler::xpathLiteral('foo " bar'); // prints 'foo " bar'
echo Crawler::xpathLiteral("foo ' bar"); // prints "foo ' bar"
echo Crawler::xpathLiteral('a\'b"c'); // prints concat('a', "'", 'b"c')
public static xpathLiteral ( string $s ) : string
$s string String to be escaped
return string Converted string
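
One plausible use is embedding arbitrary text in an XPath query without breaking its quoting. The sketch below assumes symfony/dom-crawler is installed via Composer; the autoloader path, the markup, and the search string are illustrative.

```php
<?php
// Assumed Composer autoloader location; adjust to your project layout.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$crawler = new Crawler('<html><body><p>It\'s "quoted"</p><p>plain</p></body></html>');

// Text containing both an apostrophe and double quotes (sample input).
$needle = 'It\'s "quoted"';

// Convert the text into a valid XPath string expression.
$literal = Crawler::xpathLiteral($needle);

// Use the literal inside a predicate to find the matching paragraph.
$match = $crawler->filterXPath('//p[text()='.$literal.']');

echo $literal, "\n", count($match), "\n";
```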

Property Details

$uri protected property

The current URI
protected $uri