PHP Buddies,
Do you have any clue why I'm getting variable problems that I have never faced before?
The following is an attempt to build a web crawler.
cURL fetches the page. If the fetch fails, the script reports an error.
If it succeeds, the script checks the page for banned words (content filter) and replaces any it finds.
Then it extracts all the links found on the page.
Finally, it is supposed to dump the data (keywords found, links found, links count, etc.).
I have not yet finished the parts of the script that extract and count the images, count the links that contain the sought keywords, count the links that don't contain them, and count the internal and external links. For now, I initialized these counter variables to 0 and tried incrementing them on each foreach iteration to simulate their counts. I will correct that code later.
But first things first. The values of these variables are not incremented on each foreach iteration. That's problem number 2.
The variables have been set, yet I get undefined variable errors. That's problem number 1.
The variables are not being saved, so their columns in the MySQL table show NULL. Since they hold an initial value of 0, that value at least should have been saved, but it isn't. That's problem number 3.
Please check the code below. I have repeated these questions in the code comments in CAPITALS so you can spot them easily.
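One possible cause of problems 1 and 2, offered as a minimal sketch and not a confirmed diagnosis: a PHP function has its own variable scope, so a counter defined outside linkExtractor() is not visible inside it unless it is passed in (or declared global). The helper names seesGlobalCounter() and countLinks() are hypothetical and are not part of the script.

```php
<?php
// Hypothetical demo, separate from the crawler script.
$links_count = 0; // defined in the global scope

function seesGlobalCounter(): bool
{
    // Inside a function, $links_count refers to a local variable that was never set.
    return isset($links_count);
}

function countLinks(array $links, int &$counter): void
{
    // $counter is passed by reference, so the caller's variable is updated.
    foreach ($links as $link) {
        $counter++;
    }
}

$visible = seesGlobalCounter();
countLinks(['a.html', 'b.html', 'c.html'], $links_count);

var_dump($visible);                 // bool(false)
echo "links_count: $links_count\n"; // links_count: 3
```

If this is what is happening, passing the counters by reference (or returning them from the function) would let the increments survive the call.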
PHP:
<?php
//Required PHP Files.
include 'config.php';
include 'header.php';
//1). Set Banned Words.
$banned_words = array("asshole", "nut", "bullshit");
$url = 'https://en.wikipedia.org/wiki/HTTP_403';
// 2). $curl holds the cURL handle returned by curl_init().
$curl = curl_init();
// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// 4). Run cURL (execute http request).
$result = curl_exec($curl);
if (curl_errno($curl))
{
    echo 'Error:' . curl_error($curl);
}
$response = curl_getinfo( $curl );
//If page is fetched then replace banned words found on page.
if ($response['http_code'] === 200)
{
    $regex = '/\b';
    $regex .= implode('\b|\b', $banned_words);
    $regex .= '\b/i';
    $substitute = 'BANNED WORD REPLACED';
    $clean_result = preg_replace($regex, $substitute, $result);
    //Present the banned words filtered webpage.
    echo $clean_result;
}
else
{
    //Show error if page fetching fails.
    echo "Page fetching problem!";
    echo $response['http_code'];
    exit();
}
curl_close($curl);
//PROBLEM NUMBER 1:
//I HAVE DEFINED THE FOLLOWING VARIABLES BUT GET AN ERROR THAT THEY HAVE NOT BEEN DEFINED! WHY IS THAT?
//Define Variables
$keywords_count = 0;
$links_count = 0;
$keywords_links_count = 0;
$images_count = 0;
$keywords_images_count = 0;
$keywords_internal_links_count = 0;
$keywords_external_links_count = 0;
//Link Extractor starts here. It will extract all links present on the page.
function linkExtractor($clean_result)
{
    $linkArray = array();
    if (preg_match_all('/<a\s+.*?href=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/a>/i', $clean_result, $link_matches, PREG_SET_ORDER)) {
        foreach ($link_matches as $link_match) {
            //Collect the href captured by the first group so the function returns the links.
            $linkArray[] = $link_match[1];
            //PROBLEM NUMBER 2:
            //WHY DON'T THE FOLLOWING VARIABLE INT VALUES INCREMENT?
            //Echo the following variable values on each foreach loop.
            echo "url: $url<br>";
            echo "link_match: $link_match[1]<br>";
            $links_count++;
            echo "links_count: $links_count<br>";
            $keywords_links_count++;
            echo "keywords_links_count: $keywords_links_count<br>";
            $images_count++;
            echo "images_count: $images_count<br>";
            $keywords_images_count++;
            echo "keywords_images_count: $keywords_images_count<br>";
            $keywords_internal_links_count++;
            echo "keywords_internal_links_count: $keywords_internal_links_count<br>";
            $keywords_external_links_count++;
            echo "keywords_external_links_count: $keywords_external_links_count<br>";
        }
    }
    return $linkArray;
}
echo '<pre>' . print_r(linkExtractor($clean_result), true) . '</pre>';
//Keyword indexing starts here. The filtered page is split into words and each word is stored as a keyword.
$pieces = explode(" ", $clean_result);
$keywords_count = 0;
foreach ($pieces as $keyword)
{
    echo $keyword."\n";
    echo "keyword: $keyword<br>";
    $keywords_count++;
    echo "keywords_count: $keywords_count<br>";
    print_r($pieces);
    //Insert the crawled data into the MySQL database using PHP's SQL injection prevention method, "Prepared Statements".
    $stmt = mysqli_prepare($conn, "INSERT INTO searchengine_index(url,keywords,keywords_count,links,links_count,keywords_links_count,images_count,keywords_images_count,keywords_internal_links_count,keywords_external_links_count) VALUES (?,?,?,?,?,?,?,?,?,?)");
    //PROBLEM NUMBER 3:
    //WHY DO ALL THE FOLLOWING VARIABLES (THAT COME AFTER $url AND $keyword) NOT GET DUMPED INTO THEIR APPROPRIATE COLUMNS IN THE MYSQL TBL? COLUMNS SHOW AS "NULL".
    mysqli_stmt_bind_param($stmt, 'ssisiiiiii', $url,$keyword,$keywords_count,$link_match[$keywords_links_count],$links_count,$keywords_links_count,$images_count,$keywords_images_count,$keywords_internal_links_count,$keywords_external_links_count);
    //Check if data was successfully submitted or not (mysqli_stmt_execute() returns false on failure).
    if (!mysqli_stmt_execute($stmt))
    {
        echo "Sorry! Our system is currently experiencing a problem indexing your website. We will try some other time!";
        exit();
    }
}
?>
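For the counts that are still unfinished (links containing a keyword, internal links, external links), here is a rough sketch of one way they could be computed and returned from a function instead of being incremented as outside variables. The function name classifyLinks(), its array keys, the sample HTML, and the rule that a link counts as "internal" when it is relative or shares the page's host are all assumptions made for illustration.

```php
<?php
// Sketch only: classifyLinks() and its array keys are hypothetical names.
function classifyLinks(string $html, string $pageUrl, string $keyword): array
{
    $counts = ['links' => 0, 'keyword_links' => 0, 'internal' => 0, 'external' => 0];
    $pageHost = (string) parse_url($pageUrl, PHP_URL_HOST);

    // Capture the href value of each <a> tag (simplified pattern).
    preg_match_all('/<a\s[^>]*href=["\']?([^"\' >]+)/i', $html, $matches);

    foreach ($matches[1] as $href) {
        $counts['links']++;
        if (stripos($href, $keyword) !== false) {
            $counts['keyword_links']++;
        }
        $host = parse_url($href, PHP_URL_HOST);
        // Assumption: no host (relative link) or the page's own host means internal.
        if ($host === null || $host === false || strcasecmp($host, $pageHost) === 0) {
            $counts['internal']++;
        } else {
            $counts['external']++;
        }
    }
    return $counts;
}

// Made-up sample markup, not taken from the fetched page.
$html = '<a href="/wiki/HTTP">HTTP</a> '
      . '<a href="https://example.com/http-status">status</a> '
      . '<a class="n" href="https://en.wikipedia.org/wiki/Main_Page">main</a>';

$counts = classifyLinks($html, 'https://en.wikipedia.org/wiki/HTTP_403', 'status');
print_r($counts);
```

Returning the counts as an array keeps them available to the calling code, which could then bind them to the INSERT statement.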