PHP: check if a URL is available

How to check if an HTTPS site exists in PHP

But as an alternative (if you have cURL enabled) you can use the following function:
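The original snippet is not shown here, so this is only a minimal sketch of such a cURL-based existence check; the name urlExists, the timeout, and the 2xx/3xx test are illustrative choices:

function urlExists($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo anything
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // don't hang on dead hosts
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code >= 200 && $code < 400;             // treat 2xx/3xx as "exists"
}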

If you just need the HTTP status code you can modify the function like this:
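Again, the original modification is not shown; a hedged sketch of the same function reduced to returning the raw status code could look like this (0 means the connection failed entirely):

function getHttpStatus($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // HTTP status code, e.g. 200 or 404
    curl_close($ch);
    return $code;
}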

If you don’t have cURL you could try the following function:

function getUrlHeaders($url) {
    $url_info = parse_url($url);

    // HTTPS goes over the ssl:// transport on port 443 (needs OpenSSL support)
    if (isset($url_info['scheme']) && $url_info['scheme'] == 'https') {
        $port = isset($url_info['port']) ? $url_info['port'] : 443;
        @$fp = fsockopen('ssl://' . $url_info['host'], $port, $errno, $errstr, 10);
    } else {
        $port = isset($url_info['port']) ? $url_info['port'] : 80;
        @$fp = fsockopen($url_info['host'], $port, $errno, $errstr, 10);
    }

    if ($fp) {
        stream_set_timeout($fp, 10);

        // send a HEAD request so only the headers come back
        $head  = "HEAD " . @$url_info['path'] . "?" . @$url_info['query'];
        $head .= " HTTP/1.0\r\nHost: " . @$url_info['host'] . "\r\n\r\n";
        fputs($fp, $head);

        $headers = array();
        while (!feof($fp)) {
            if ($header = trim(fgets($fp, 1024))) {
                $sc_pos = strpos($header, ':');
                if ($sc_pos === false) {
                    $headers['status'] = $header; // e.g. "HTTP/1.0 200 OK"
                } else {
                    $label = substr($header, 0, $sc_pos);
                    $value = substr($header, $sc_pos + 1);
                    $headers[strtolower($label)] = trim($value);
                }
            }
        }
        return $headers;
    } else {
        return false;
    }
}

Note that for HTTPS support you need SSL support enabled (on Windows, uncomment extension=php_openssl.dll in php.ini).

If you can’t edit your php.ini and don’t have SSL support it will be difficult to get the (encrypted) headers.

You can check your wrappers (openssl and http/https) with:

$w = stream_get_wrappers();
echo 'openssl: ', extension_loaded('openssl') ? 'yes' : 'no', "\n";
echo 'http wrapper: ', in_array('http', $w) ? 'yes' : 'no', "\n";
echo 'https wrapper: ', in_array('https', $w) ? 'yes' : 'no', "\n";
echo 'wrappers: ';
var_dump($w);

You can check this question on SO for a similar problem.


Checking for existence of URL in PHP

If I display a URL link in PHP, is there any way to check its validity after the user clicks, so as to display a nice custom message that the URL is broken or something like that? I do not mean a 404 error page. I guess a 404 error is only for internal website pages, not external links. Please correct me if I am wrong.

4 Answers

My suggestion: Write a PHP batch job that regularly checks URLs (with curl or fsockopen) and marks them in your data. This way, you know that the URL is broken before you display it to the user.
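A minimal sketch of such a batch job, assuming a hypothetical links table with id, url and is_broken columns (adapt the names and DSN to your schema):

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

foreach ($pdo->query('SELECT id, url FROM links') as $row) {
    $ch = curl_init($row['url']);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // mark the URL as broken if it errored or did not answer at all
    $stmt = $pdo->prepare('UPDATE links SET is_broken = ? WHERE id = ?');
    $stmt->execute(array(($code >= 400 || $code == 0) ? 1 : 0, $row['id']));
}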

Clicking a link causes the user's browser to request the resource from the server it is hosted on. Your server is not involved.

You could write some JavaScript that cancels the normal behavior of the link, uses Ajax to make a request to your server, has PHP on your server make a request to the third-party site to check the response, responds to the Ajax request, and then sets location to either the original URL or one for your error message … but that would be a significant slowdown for the user.

If you are worried about links you provide being broken, periodically check them. You could automate this (e.g. with checklink)

You should use fsockopen to find the response code of the link first, then display the link only if the code is less than 400. See http://www.scriptol.com/how-to/http-status-code-in-php.php . Perhaps the best idea would be to display a warning next to broken links (easily doable via CSS or JavaScript).

Rolling your own HTTP stack by hand? Wouldn't it be simpler to just use the cURL functions to fetch the header?

@symcbean It'd be even easier to use the built-in get_headers() function (docs.php.net/get_headers)

404 is for any resource not found. If a link points to one, it is the task of the server where the resource does not exist to give you this message. Quoting the RFC for HTTP Status Codes:

10.4.5 404 Not Found

The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why the request has been refused, or when no other response is applicable.

In addition, clicking a link in the browser takes place long after PHP is done with the page. PHP is server-side, while user interaction happens in complete isolation from it on the user's computer. So there is no way to capture the click with PHP.

If you wanted to capture clicks (technically it isn't capturing), you'd have to route all external links through your own webserver, which could prefetch the link via get_headers() to see if the resource exists. If not, you could present your own custom page to the user. But keep in mind that this actually makes two HTTP requests: first from the user to your server, then from your server to the external page. Your external links would probably look like this:

http://www.yourserver.com/external.php?url=example.com/someuri 
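A hedged sketch of what such an external.php could look like; the file name comes from the example above, everything else is illustrative (and real code would need to validate or whitelist the url parameter):

// external.php - verify an outbound link before redirecting to it
$url = 'http://' . $_GET['url'];   // NB: sanitize this in real code!
$headers = @get_headers($url);

// crude check: only the first status line is inspected
if ($headers !== false && strpos($headers[0], '200') !== false) {
    header('Location: ' . $url);   // resource answered, send the user on
    exit;
}
echo 'Sorry, this link appears to be broken.';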


PHP URL availability check

I want to check whether the URLs in my database are available. I chose fopen, but testing just 30 rows from my database takes nearly 20 seconds. Is there any way to make it more efficient? Thanks.

$start_t = microtime(true);
// ... fetch the URL rows from the database into $rows ...
foreach ($rows as $row) {
    if (@fopen($row['url'], 'r')) {
        echo $row['url'] . ' ok<br>';
    } else {
        echo $row['url'] . ' no<br>';
    }
}
$end_t = microtime(true);
$totaltime = $end_t - $start_t;
echo '<br>' . $totaltime . ' s';

If you want to check if the DNS is okay or the server is online you can use the snippet from Rakesh. For checking the availability of the content you can use curl (see karim79's answer) or get_headers() (see yes123's answer).

4 Answers

Try using fsockopen, which is faster than fopen.

@Rakesh, surprising speed, but both http://www.google.com and http://google.com return failure?

@Yuli like @yes123 said, my mistake. Generally you should pass just the hostname, not the full URL, to the fsockopen call. You then need to provide the URI, minus the host/port, in the actual HTTP request headers.
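A hedged sketch of the pattern being described (the hostname goes to fsockopen, the path goes into the request line; host and timeout here are illustrative):

$host = 'www.google.com';
$path = '/';

$fp = @fsockopen($host, 80, $errno, $errstr, 5); // 5-second connect timeout
if ($fp) {
    fputs($fp, "HEAD $path HTTP/1.0\r\nHost: $host\r\n\r\n");
    $status = fgets($fp, 1024);                  // e.g. "HTTP/1.0 200 OK"
    fclose($fp);
    echo $status;
} else {
    echo "connection failed: $errstr ($errno)";
}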

You can try using cURL with the CURLOPT_NOBODY option set, which switches the request method to HEAD and avoids downloading the entire page:

$ch = curl_init($row['url']);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_exec($ch);
$retcode = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 200 means found; 400 and above means an error such as 404 Not Found
curl_close($ch);

TRUE to exclude the body from the output. Request method is then set to HEAD. Changing this to FALSE does not change it to GET.

Try bulk URL checks, that is, in blocks of 10 or 20.

Use the cURL options for NOBODY and header only, so your responses come back much faster.

Also don't forget to set a TIMEOUT for cURL, or one bad URL may take far too much time.

I was doing 50 URL checks in 20 seconds.
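A hedged sketch of such a bulk check using curl_multi; the block size, timeout, and function name are illustrative:

function checkBlock(array $urls) {
    $mh = curl_multi_init();
    $handles = array();

    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // cap time per URL
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // run all transfers in parallel until done
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);

    $codes = array();
    foreach ($handles as $url => $ch) {
        $codes[$url] = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $codes; // map of URL => HTTP status code (0 = no connection)
}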

You can't speed things up like that.

With 30 rows I assume you are connecting to 30 different URLs. 20 seconds is already a good time for that.

Also, I suggest you use file_get_contents to retrieve the HTML, or, if you need to know the header response, use get_headers().
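For reference, a quick sketch of get_headers() in use (the URL is illustrative):

// get_headers() fetches the response headers for a URL.
// $headers[0] holds the status line, e.g. "HTTP/1.1 200 OK".
$headers = @get_headers('http://www.example.com');
if ($headers !== false) {
    echo $headers[0];
} else {
    echo 'request failed';
}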

If you want to speed up the process, just spawn more processes; each of them will fetch a share of the URLs.

Addendum

Also, don't forget about the great Zend_Http_Client; it is very good for such a task.


is URL valid or not [duplicate]

I'm looking for a PHP function that returns TRUE or FALSE depending on whether a URL is valid: isValidURL($url); I think it's that simple. It should take into account every possible kind of URL. By valid I mean it refers to an existing page on the web, or some other kind of file. It just should exist.


2 Answers

This does not check TLDs, though.

This is a good one, but your own example validates to true 😉 http://stackoverflow.invalid. So it's not reliable either 😉

@Nemoden: What's wrong with the URL stackoverflow.invalid? You can set up your own local DNS to handle all sorts of TLDs. I have *.lan pointing to several local development sites behind a firewall. 😉

stackoverflow is also a valid URL; extensions are not required for valid URLs, and localhost is a valid URL too.

You can check whether a URL is valid using the parse_url function, which returns false if the URL is not valid and an array of components otherwise.
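A quick sketch of that check (the URLs are illustrative):

// parse_url() returns an array of components for a parsable URL
// and false for a seriously malformed one.
var_dump(parse_url('http://www.example.com/path?x=1')); // array with scheme, host, path, query
var_dump(parse_url('http:///example.com'));             // bool(false)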

Then preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url); isn't reliable either. But I have my own reason not to care about the domain zone: they will register more domain zones tomorrow, and will I then have to add all of them to all my projects manually? To check whether a URL REALLY exists I would use cURL or sockets. For a quick check this function is enough for me. I just proposed it to the OP and I don't mind if he doesn't mark my answer as accepted. I just offer the approach I use myself. And it's cleaner than a regex; that's why I like it 😉

Not reliable? The scheme check is good; you should then check that the host actually exists, with gethostbyname:

function isValidURL($url) {
    $parts = parse_url($url);
    if ((bool)$parts) {
        // gethostbyname() returns the hostname unchanged on failure,
        // so a different return value means the name resolved to an IP
        return $parts['host'] != gethostbyname($parts['host']);
    } else {
        return false;
    }
}
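A quick usage example, assuming the function above:

var_dump(isValidURL('http://www.example.com'));      // bool(true) if the host resolves
var_dump(isValidURL('http://no-such-host.invalid')); // bool(false)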


The end goal is a system that is capable of flagging URLs as potentially broken so that an administrator can review them.

The script will be written in PHP and will most likely run on a daily basis via cron.

The script will be processing approximately 1000 urls at a go.

  • Are there any bigtime gotchas with an operation like this, what issues have you run into?
  • What is the best method for checking the status of a url in PHP considering both accuracy and performance?

200 is not the only good code. A 3xx code means redirection, and in many cases the redirect brings you to the page you want (but it's not guaranteed). A 401 isn't necessarily "bad" either, but it's not a 200.

You should be careful not to hammer the same website continuously or the owner might get upset. Maybe sort the list, and for multiple URLs from the same site institute some type of delay before the next request (or go on to another site and come back to that one later).
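A hedged sketch of that idea, grouping URLs by host and pausing between requests to the same site (the two-second delay is arbitrary, and check_url() stands in for whatever checker you use, e.g. the is_available() function shown below):

// sort URLs by host so repeat visits to a site are adjacent
usort($urls, function ($a, $b) {
    return strcmp(parse_url($a, PHP_URL_HOST), parse_url($b, PHP_URL_HOST));
});

$lastHost = null;
foreach ($urls as $url) {
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === $lastHost) {
        sleep(2); // be polite: wait before hitting the same host again
    }
    check_url($url); // hypothetical checker
    $lastHost = $host;
}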

9 Answers

Use the PHP cURL extension. Unlike fopen() it can also make HTTP HEAD requests, which are sufficient to check the availability of a URL and save you a ton of bandwidth, since you don't have to download the entire body of the page.

As a starting point you could use some function like this:

function is_available($url, $timeout = 30) {
    $ch = curl_init(); // get cURL handle

    // set cURL options
    $opts = array(
        CURLOPT_RETURNTRANSFER => true,    // do not output to browser
        CURLOPT_URL            => $url,    // set URL
        CURLOPT_NOBODY         => true,    // do a HEAD request only
        CURLOPT_TIMEOUT        => $timeout // set timeout
    );
    curl_setopt_array($ch, $opts);

    curl_exec($ch); // do it!
    $retval = curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200; // check if HTTP OK
    curl_close($ch); // close handle

    return $retval;
}
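Usage is then a one-liner; for example (the URL and timeout are illustrative):

if (is_available('http://www.example.com', 10)) {
    echo 'URL is up';
} else {
    echo 'URL is down or did not answer with 200';
}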

However, there's a ton of possible optimizations: You might want to re-use the cURL instance and, if checking more than one URL per host, even re-use the connection.

Oh, and this code checks strictly for HTTP response code 200. It does not follow redirects (302) -- but there is also a cURL option for that (CURLOPT_FOLLOWLOCATION).

