PHP Class Graby\Extractor\HttpClient

Project: j0k3r/graby

Public Methods

Method Description
__construct ( Client $client, array $config = [], Psr\Log\LoggerInterface $logger = null )
fetch ( string $url, boolean $skipTypeVerification = false, array $httpHeader = [] ) : array Grab information from a URL: the final URL (after any redirection), the raw content, and the content type header.
setLogger ( Psr\Log\LoggerInterface $logger )

Private Methods

Method Description
checkNumberRedirects ( string $url ) : boolean Check that the maximum number of redirects has not been reached.
cleanupUrl ( string $url ) : string Clean up the URL and return the final URL to be called.
getMetaRefreshURL ( string $url, string $html ) : false | string Try to find the refresh URL in the HTML meta refresh tag.
getReferer ( string $url, array $httpHeader = [] ) : string Find a Referer for this URL.
getUglyURL ( string $url, string $html ) : false | string Some websites (such as Blogger) define an alternative URL for robots, so that they can crawl a site that is otherwise rendered almost entirely in JavaScript.
getUserAgent ( string $url, array $httpHeader = [] ) : string Find a User-Agent for this URL.
headerOnlyType ( string $contentType ) : boolean Look for a full MIME type (e.g. image/jpeg) or just a top-level type (e.g. image) to determine whether the request is for a binary resource.
possibleUnsupportedType ( string $url ) : string | false Try to determine whether the URL is a direct link to a binary resource by checking its file extension.
sendResults ( array $data ) : array Return the results from fetch() and reset the static variable for the next request.
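
The descriptions of headerOnlyType() and possibleUnsupportedType() suggest two checks: matching a content type against a list of full or top-level MIME types, and guessing a type from the URL's file extension. The sketch below illustrates that idea only. It is not Graby's actual code: the function names isHeaderOnlyType() and guessBinaryTypeFromUrl(), their extra parameters, and the sample lists are all assumptions made for this example.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Graby): true when $contentType matches an
 * entry in $headerOnlyTypes, either as a full MIME type ("image/jpeg") or as
 * a top-level type ("image").
 */
function isHeaderOnlyType(string $contentType, array $headerOnlyTypes): bool
{
    // Drop parameters such as "; charset=utf-8" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    if ($mime === '') {
        return false;
    }

    // "image/jpeg" -> "image"
    $topLevel = explode('/', $mime)[0];

    return in_array($mime, $headerOnlyTypes, true)
        || in_array($topLevel, $headerOnlyTypes, true);
}

/**
 * Hypothetical helper (not part of Graby): returns the MIME type guessed from
 * the file extension in the URL path, or false when nothing matches.
 *
 * @return string|false
 */
function guessBinaryTypeFromUrl(string $url, array $extensionMap)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        // No path component, or a malformed URL.
        return false;
    }

    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    return $extensionMap[$extension] ?? false;
}

// Sample values chosen for illustration only.
$headerOnlyTypes = ['image', 'audio', 'video', 'application/pdf'];
$extensionMap = [
    'pdf' => 'application/pdf',
    'jpg' => 'image/jpeg',
    'mp3' => 'audio/mpeg',
];

var_dump(isHeaderOnlyType('image/jpeg', $headerOnlyTypes));
var_dump(isHeaderOnlyType('text/html; charset=utf-8', $headerOnlyTypes));
var_dump(guessBinaryTypeFromUrl('https://example.com/files/report.PDF?download=1', $extensionMap));
```

Passing the lists in as arguments keeps the sketch self-contained; the real class presumably reads equivalent values from its $config array.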

Method Details

__construct() public method

public __construct ( Client $client, array $config = [], Psr\Log\LoggerInterface $logger = null )
$client GuzzleHttp\Client Guzzle client
$config array
$logger Psr\Log\LoggerInterface

fetch() public method

Grab information from a URL: the final URL (after any redirection), the raw content, and the content type header.
public fetch ( string $url, boolean $skipTypeVerification = false, array $httpHeader = [] ) : array
$url string
$skipTypeVerification boolean Skip MIME type detection, i.e. force a GET request instead of a possible HEAD request
$httpHeader array Custom HTTP headers from SiteConfig
return array With keys effective_url, body and headers
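
As a rough illustration of the documented return shape, the sketch below consumes an array with the keys effective_url, body and headers. The helper describeFetchResult() is hypothetical, the sample array is hand-built rather than fetched, and treating headers as a content type string is an assumption based on the method description.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Graby): summarise an array shaped like the
 * documented return value of fetch() (keys: effective_url, body, headers).
 */
function describeFetchResult(array $result): string
{
    foreach (['effective_url', 'body', 'headers'] as $key) {
        if (!array_key_exists($key, $result)) {
            throw new InvalidArgumentException("Missing key: $key");
        }
    }

    // Assumption: "headers" holds the content type header as a string.
    return sprintf(
        '%s (%s, %d bytes)',
        $result['effective_url'],
        $result['headers'],
        strlen($result['body'])
    );
}

// With the real class, the array would come from something like:
//   $httpClient = new Graby\Extractor\HttpClient(new GuzzleHttp\Client());
//   $result = $httpClient->fetch('https://example.com/article');
// Here a hand-built array stands in for it.
$result = [
    'effective_url' => 'https://example.com/article',
    'body' => '<html><body>Hello</body></html>',
    'headers' => 'text/html; charset=utf-8',
];

echo describeFetchResult($result), PHP_EOL;
```

Validating the keys before use keeps the caller from silently working with a partial result if the shape differs between Graby versions.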

setLogger() public method

public setLogger ( Psr\Log\LoggerInterface $logger )
$logger Psr\Log\LoggerInterface