Finding Banned Words On A Page And Not Within Other Words!

sunny_pro

New member
Joined
Jun 18, 2017
Messages
86
Points
0
PHP lovers!

I want to find banned words on a loaded page (meta tags and content). I am only looking for whole words, not banned words that appear inside other words.

So if I am looking for the word "cock", the word "cockerel" should not trigger the filter.

I just tested the code below and, as expected, it works, but it burns a lot of CPU. One moment the page loads, the next it goes grey and shows signs that it is taking too long to load. And all this on localhost. I can imagine what my web host would do!
So we will have to come up with a better solution. Any ideas?
How about we do not make the script check the loaded page for every banned word? How about it halts as soon as one banned word is found and echoes which banned word was found and where on the page (meta tags, body content, etc.)?
Any code suggestions? :)

Here is what I got so far:

Code:
<?php

/*
ERROR HANDLING
*/

// 1). $curl is going to be data type curl resource.
$curl = curl_init();

// 2). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// 3). Run cURL (execute http request).
$result = curl_exec($curl);
$response = curl_getinfo($curl);

// Only scan the page if the request succeeded.
if ($result !== false && $response['http_code'] == 200)
   {
    // Set banned words.
    $banned_words = array("Prick", "Dick", "***");

    // Separate each word found on the cURL-fetched page.
    $word = explode(" ", $result);

    //var_dump($word);

    // Use < rather than <= so the loop does not read past the last element.
    for ($i = 0; $i < count($word); $i++)
       {
       foreach ($banned_words as $ban)
          {
          // Case-insensitive comparison of one page word against one banned word.
          if (strtolower($word[$i]) == strtolower($ban))
             {
              echo "word: $word[$i]<br />";
              echo "Match: $ban<br>";
             }
          else
             {
              echo "word: $word[$i]<br />";
              echo "No Match: $ban<br>";
             }
          }
       }
   }

// 4). Close cURL resource.
curl_close($curl);
 

Rob Whisonant

Moderator
Joined
May 24, 2016
Messages
2,490
Points
113
Using explode and then looping through every word on the page is what is slowing you down. Attack the problem with a different method.

Load the page into a string.

Use preg_match with "word boundaries" (\b) on the loaded string and loop through your banned words.

To see examples, search for "preg_match whole words" in Google or Bing.
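
A minimal sketch of that suggestion, assuming the page has already been fetched into a string. The function name find_first_banned_word and the sample sentence are illustrative and do not come from the thread. It tests one banned word at a time with preg_match and word boundaries and returns on the first hit, so "cockerel" does not count as a match for "cock".

```php
<?php
/**
 * Return details of the first banned word that occurs as a whole word
 * in $text, or null if none does. Hypothetical helper, not from the thread.
 */
function find_first_banned_word(string $text, array $bannedWords): ?array
{
    foreach ($bannedWords as $ban) {
        // preg_quote() escapes any regex metacharacters in the word;
        // \b anchors the match to word boundaries; the i flag ignores case.
        $pattern = '/\b' . preg_quote($ban, '/') . '\b/i';

        if (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE) === 1) {
            // Stop at the first hit instead of checking the remaining words.
            return [
                'banned' => $ban,      // the entry from the banned list
                'found'  => $m[0][0],  // the text as it appears on the page
                'offset' => $m[0][1],  // byte offset of the match in $text
            ];
        }
    }

    return null;
}

$hit = find_first_banned_word('The cockerel met a Cock today.', ['prick', 'cock']);
if ($hit !== null) {
    echo "Banned word '{$hit['banned']}' found at byte offset {$hit['offset']}\n";
}
```

The offset is a byte position within the string that was scanned. To report whether a hit was in the meta tags or the body, one option would be to extract each part of the page first and call the function once per part.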
 

sunny_pro


I did as suggested, but I see a completely blank white page:

Code:
<?php

/*
ERROR HANDLING
*/
declare(strict_types=1);
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);


// 1). Set banned words.
$banned_words = array("Prick","Dick","Fuck");

// 2). $curl is going to be data type curl resource.
$curl = curl_init();

// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-
words-
you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true );

// 4). Run cURL (execute http request).
$result = curl_exec($curl);
$response = curl_getinfo( $curl );

if($response['http_code'] == '200' )
     {
          $regex = '/\b'; // The beginning of the regex string syntax
          $regex .= implode('\b|\b', $banned_words); // joins all the banned words to the string with correct regex syntax
          $regex .= '\b/i'; // Adds ending to regex syntax. Final i makes it case insensitive
          $substitute = '****';
          $cleanresult = preg_replace($regex, $substitute, $result);
          echo $cleanresult;
     }

  curl_close($curl);

  ?>
 

Rob Whisonant

Add some echoes to make sure you are getting the expected results.

1. echo $result to make sure the page loaded.
2. echo $response['http_code'] to make sure you are getting a 200.
3. echo $regex after you have built it to make sure it looks correct.
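
The three checks above can be sketched as one small helper. This is only an illustration: describe_fetch is a hypothetical name, and the values passed in at the bottom are made up so the example runs without a network connection. In the real script the arguments would come from curl_exec, curl_getinfo and curl_error.

```php
<?php
/**
 * Summarise the three checks: was a body received, what was the HTTP
 * status, and does the regex compile. Hypothetical helper for debugging.
 *
 * $body is left untyped because curl_exec() returns a string on success
 * and false on failure.
 */
function describe_fetch($body, int $httpCode, string $curlError, string $regex): array
{
    $report = [];

    // 1. Did the page load?
    if ($body === false || $body === '') {
        $report[] = 'No body received' . ($curlError !== '' ? ": $curlError" : '');
    } else {
        $report[] = 'Body received: ' . strlen($body) . ' bytes';
    }

    // 2. Is the status code the expected 200?
    $report[] = "HTTP status: $httpCode";

    // 3. Does the pattern compile? preg_match() returns false on a bad pattern;
    //    @ suppresses the warning so only the report line is shown.
    $valid = @preg_match($regex, '') !== false;
    $report[] = 'Regex ' . ($valid ? 'compiles' : 'is invalid') . ": $regex";

    return $report;
}

// Made-up values that mimic a request that never reached the server.
$lines = describe_fetch(false, 0, 'Could not resolve host', '/\bnut\b/i');
echo implode("\n", $lines), "\n";
```

A blank page with this script usually means the if block was never entered, so an empty body or a status other than 200 is the first thing to look for.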
 

sunny_pro

I was having a word-wrapping problem in Notepad++, which had split the URL across several lines. Sorted now.
This edited code works.

Code:
<?php
/*
ERROR HANDLING
*/

// 1). Set banned words.
$banned_words = array("blow", "nut", "asshole");

// 2). $curl is going to be data type curl resource.
$curl = curl_init();

// 3). Set cURL options.
curl_setopt($curl, CURLOPT_URL, 'https://www.buzzfeed.com/mjs538/the-68-words-you-cant-say-on-tv?utm_term=.xlN0R1Go89#.pbdl8dYm3X');
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

// 4). Run cURL (execute http request).
$result = curl_exec($curl);
if (curl_errno($curl)) {
    echo 'Error: ' . curl_error($curl);
}
$response = curl_getinfo($curl);

// 5). Censor whole-word matches only if the page actually loaded.
if ($result !== false && $response['http_code'] == 200)
{
    $regex = '/\b';                              // Start of the pattern: a word boundary.
    $regex .= implode('\b|\b', $banned_words);   // Join the banned words as alternatives.
    $regex .= '\b/i';                            // Closing boundary; i makes it case-insensitive.
    $substitute = '****';
    $cleanresult = preg_replace($regex, $substitute, $result);
    echo $cleanresult;
}

// 6). Close cURL resource.
curl_close($curl);
?>
Original code newbies can grab:
http://phpfiddle.org/main/code/0trx-6fng
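
One possible hardening of the pattern-building step, offered as a sketch rather than as the thread's own code. The function names build_banned_regex and censor_banned_words are hypothetical. The sketch escapes each word with preg_quote, so a banned word containing characters such as "." or "*" cannot break the pattern, and it wraps the alternatives in a single non-capturing group instead of repeating \b around every word.

```php
<?php
/**
 * Build one case-insensitive whole-word pattern from a list of banned words.
 * Hypothetical helper, not from the thread.
 */
function build_banned_regex(array $bannedWords): string
{
    // Escape regex metacharacters (and the / delimiter) in every word.
    $escaped = array_map(
        static function (string $word): string {
            return preg_quote($word, '/');
        },
        $bannedWords
    );

    // \b(?:a|b|c)\b matches any listed word only when it stands alone.
    return '/\b(?:' . implode('|', $escaped) . ')\b/i';
}

/**
 * Replace every whole-word occurrence of a banned word with $substitute.
 */
function censor_banned_words(string $html, array $bannedWords, string $substitute = '****'): string
{
    // An empty list would produce a pattern that matches at every word
    // boundary, so return the input unchanged instead.
    if ($bannedWords === []) {
        return $html;
    }

    $clean = preg_replace(build_banned_regex($bannedWords), $substitute, $html);
    if ($clean === null) {
        // preg_replace() returns null when the regex engine reports an error.
        throw new RuntimeException('preg_replace failed: ' . preg_last_error());
    }

    return $clean;
}

$sample = censor_banned_words('Blow the nut, not the walnut.', ['blow', 'nut', 'asshole']);
echo $sample, "\n";
```

In the script above, the echo of $cleanresult could then be replaced by a call such as censor_banned_words($result, $banned_words). Note that the replacement runs over the raw HTML, so it also touches matches inside tags and attributes.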
 

sunny_pro

Rob,

Do you think the code in my previous post is OK, marvellous, or bad?
 