How to create your own profanity and spam blocker

Alejandro Schmeichler

In this tutorial you will learn how to create your own profanity and spam blocker for the forms in JReviews that accept user-generated content. We are going to take advantage of Developer Filters, introduced in JReviews 3, which let you extend the functionality of your directory and review site.

This is a pretty straightforward tutorial, so let's get started. You can also find the code used in this tutorial on GitHub.

Requirements to create your own profanity and spam blocker

To be able to complete this tutorial you need:

  • JReviews 3
  • File access to your server to create the necessary folders and files.

How does the profanity and spam blocker work?

Before we jump into the implementation, I want to give you a general overview of how it works so it's easier to follow the steps below. To implement this feature, we need to intercept the submitted form data, analyze the text, and decide whether to let the submission continue or stop it and add a validation message. To do this we use the form validation filters available in JReviews. The documentation lists the following filters:

  • listing_submit_validation
  • inquiry_submit_validation
  • review_submit_validation
  • discussion_submit_validation
  • owner_reply_submit_validation
  • report_submit_validation
  • claim_submit_validation
  • resources_submit_validation

We can use all of the above filters to create the profanity and spam blocker for the corresponding forms. The nice thing about filters is that we can hook all of them up to the same code, which greatly simplifies development and maintenance.
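To make the idea of a shared callback more concrete, here is a minimal standalone sketch of how a validation filter chain behaves. It does not use JReviews at all: the demo_add_filter and demo_apply_filters helpers are hypothetical stand-ins for the real hook system, and only illustrate that each callback receives the validation array and returns it, possibly with extra messages.

```php
<?php
// Hypothetical stand-in for a filter registry; NOT the JReviews implementation.
$demoFilters = [];

function demo_add_filter(array &$filters, string $name, callable $callback, int $priority = 10): void
{
    $filters[$name][$priority][] = $callback;
}

function demo_apply_filters(array $filters, string $name, array $validation, array $params): array
{
    $byPriority = $filters[$name] ?? [];
    ksort($byPriority); // lower priority numbers run first

    foreach ($byPriority as $callbacks) {
        foreach ($callbacks as $callback) {
            // Each callback gets the current messages and returns the updated list
            $validation = $callback($validation, $params);
        }
    }

    return $validation;
}

// One callback shared by two different forms.
$blockSpam = function (array $validation, array $params): array {
    if (stripos(implode(' ', $params['data']), 'spam1') !== false) {
        $validation[] = 'Blocked word found';
    }
    return $validation;
};

demo_add_filter($demoFilters, 'review_submit_validation', $blockSpam);
demo_add_filter($demoFilters, 'inquiry_submit_validation', $blockSpam);

$clean   = demo_apply_filters($demoFilters, 'review_submit_validation', [], ['data' => ['title' => 'Great place']]);
$blocked = demo_apply_filters($demoFilters, 'inquiry_submit_validation', [], ['data' => ['text' => 'Buy SPAM1 now']]);
```

In this sketch $clean stays empty, so that form would go through, while $blocked contains one message, which is what would stop the submission.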

Steps to create your own profanity and spam blocker

  • Create the JReviews 3 developer filter file
  • Add spam and profanity lists

Create the JReviews 3 developer filter file

Filters will live in the filter_functions.php file in the JReviews overrides folder, but in order to better organize the code, we will also create a new dedicated file for the antispam filter that is called within filter_functions.php. So in this step we will create 2 files.

In Joomla create the following files:

/templates/jreviews_overrides/filters/filter_functions.php

/templates/jreviews_overrides/filters/antispam_filter.php

In WordPress create the following files:

/jreviews_overrides/filters/filter_functions.php

/jreviews_overrides/filters/antispam_filter.php

In the filter_functions.php file, add the code shown below and save the file. If you already use filters for other functionality, keep your existing file and just add the require_once line to it so it loads the antispam_filter.php code.

<?php
defined('MVC_FRAMEWORK') or die;

// Load the antispam filter from the same folder as this file
require_once __DIR__.'/antispam_filter.php';

In the antispam_filter.php file copy the code below and save the file.

<?php
defined('MVC_FRAMEWORK') or die;

function antispam_filter($validation, $params)
{
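  // Load every word list found in the spamlists folder
  $files = glob(__DIR__.DS.'spamlists/*.conf') ?: [];

  $list = '';

  foreach ($files as $file)
  {
    $list .= file_get_contents($file)."\n";
  }

  // Split into one entry per line, whatever the line-ending style
  $list = preg_split("/((\r?\n)|(\r\n?))/", $list, -1, PREG_SPLIT_NO_EMPTY);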

  foreach ( ['__raw','valid_fields','ListingType','CriteriaRating','controller','action'] as $key )
  {
    unset($params['data'][$key]);
  }

  $text = strip_tags(implode(' ',S2Array::flatten($params['data'])));

  $badWords = [];

  $context = [];
    
  foreach ($list as $regex) 
  {
      $regex = preg_replace('/(^\s+|\s+$|\s*#.*$)/i', "", $regex);

      if (empty($regex)) continue;

      $match = preg_match('/(?:[^ ]+|(?:[^ ]+ )){0,1}('.$regex.')(?: [^ ]+){0,1}/i',$text,$matches);

      // Blocked word found
      if ($match)
      {
        $badWords[$matches[1]] = $matches[1];
        $context[$matches[0]] = $matches[0];
      }

  }

  if ( !empty($badWords) ) 
  {
    // $validation[] = 'Your message doesn\'t comply with our submission guidelines';
  
    $validation[] = 'Your message doesn\'t comply with our submission guidelines. The following words are not allowed and need to be removed before your submission is accepted: <strong>'.implode(', ',$badWords).'</strong>';

  }

  return $validation;
}

Clickfwd\Hook\Filter::add('listing_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('inquiry_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('review_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('discussion_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('owner_reply_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('report_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('claim_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('resources_submit_validation', 'antispam_filter', 10);

The code is quite simple. At the bottom of the file we register the filters we want to use. These are all the form validation filters, and we provide the same callback function, antispam_filter, for all of them. The first parameter of the callback is the $validation array, which we can modify on the fly to add validation messages that prevent the form from being submitted. The second parameter, $params, contains the form data that we analyze for profanity and spam.

At the beginning of the function we load any files containing lists of words we want to block. I will talk more about this in a moment. Before scanning, the function removes a few internal keys from the submitted data, flattens the remaining values into a single string, and strips HTML tags. It then goes over each list entry, treats it as a case-insensitive regular expression, and if it finds a match in the submitted text it adds the matched word to the $badWords array.

Finally, after going through the whole list, the code checks whether the $badWords array has any data, and if it does, it adds the validation message. I've included two example messages, with the first one commented out, and you can modify them to meet your requirements.
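If you want to see what the matching step does on its own, the following sketch isolates it. It reuses the same pattern as the filter, but the demo_find_blocked helper and the sample sentences are made up for illustration and are not part of JReviews.

```php
<?php
// Hypothetical helper that isolates the matching step of antispam_filter().
function demo_find_blocked(string $text, string $regex): ?array
{
    // Same pattern as in the filter: the blocked term plus an optional
    // neighboring word before and after it, matched case-insensitively.
    $pattern = '/(?:[^ ]+|(?:[^ ]+ )){0,1}(' . $regex . ')(?: [^ ]+){0,1}/i';

    if (preg_match($pattern, $text, $matches)) {
        // $matches[1] is the blocked term, $matches[0] the term with its context
        return ['word' => $matches[1], 'context' => $matches[0]];
    }

    return null;
}

$hit  = demo_find_blocked('Buy Spam1 today and save', 'spam1');
$miss = demo_find_blocked('A perfectly clean review', 'spam1');
```

Here $hit holds the word "Spam1" together with the context "Buy Spam1 today", and $miss is null. The word is what ends up in the validation message; the context is only collected in the $context array.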

Add spam and profanity lists

Before you can start using your brand new profanity and spam blocker, it's necessary to load it up with some words and strings you wish to block.

In Joomla create the following file:

/templates/jreviews_overrides/filters/spamlists/badwords.conf

In WordPress create the following file:

/jreviews_overrides/filters/spamlists/badwords.conf

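Inside the badwords.conf file, add the words and strings you wish to block, one per line. You can include comments and empty lines. Comments begin with the pound sign #. Because each line is used as a case-insensitive regular expression, escape special characters such as . ? ( ) and / if you want them matched literally. For example: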

# Profanity

word1
word2
A profanity phrase

# Spam

spam1
spam2
A spam phrase
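The sketch below shows how a list like this gets reduced to usable entries. It applies the same comment-stripping expression as the filter to a sample string instead of a real .conf file; the sample contents and the $entries variable are only for illustration, and the line-splitting expression is a simplified equivalent of the one in the filter.

```php
<?php
// Sample list contents; in the tutorial these come from the .conf files.
$raw = "# Profanity\n\nword1\nword2   # trailing comment\r\nA profanity phrase\n";

// Split on any line-ending style and drop empty lines.
$lines = preg_split('/\r\n|\r|\n/', $raw, -1, PREG_SPLIT_NO_EMPTY);

$entries = [];

foreach ($lines as $line) {
    // Trim whitespace and strip everything from "#" to the end of the line.
    $line = preg_replace('/(^\s+|\s+$|\s*#.*$)/', '', $line);

    if ($line === '') {
        continue; // comment-only or blank line
    }

    $entries[] = $line;
}
```

After the loop, $entries contains word1, word2 and "A profanity phrase". Note that anything after a # is dropped, so an entry cannot contain the pound sign itself.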

If you are going to have long lists, you can create multiple .conf files inside the spamlists folder and these will be automatically loaded for you. You can search for existing definitions of profanity and spam to add to your lists. In the following link you can also find some examples of spam definitions you can use for your list(s).
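One thing to keep in mind when building your lists: since each entry is inserted into a regular expression, a short entry also matches inside longer, harmless words. The sketch below, which uses a hypothetical demo_is_blocked helper built around the filter's pattern, shows the problem and one possible way around it using the \b word-boundary anchor in the list entry.

```php
<?php
// Hypothetical helper: reports whether an entry would block the given text.
function demo_is_blocked(string $text, string $entry): bool
{
    $pattern = '/(?:[^ ]+|(?:[^ ]+ )){0,1}(' . $entry . ')(?: [^ ]+){0,1}/i';

    return preg_match($pattern, $text) === 1;
}

$text = 'A class act with great service';

$plain     = demo_is_blocked($text, 'ass');               // matches inside "class"
$bounded   = demo_is_blocked($text, '\bass\b');           // whole word only, no match here
$wholeWord = demo_is_blocked('what an ass', '\bass\b');   // still blocks the standalone word
```

In a .conf file the bounded entry would simply be written as \bass\b on its own line.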


Taking it a step further with logging

Once you have this working, you may still wonder whether the blocker is rejecting "clean" submissions. To help with that, you can add logging to the filter so it keeps a record of every blocked word together with its surrounding words. To do that we need a new antispam_filter_log function that we call inside the filter. The updated filter code looks like this:

<?php
defined('MVC_FRAMEWORK') or die;

function antispam_filter($validation, $params)
{
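  // Load every word list found in the spamlists folder
  $files = glob(__DIR__.DS.'spamlists/*.conf') ?: [];

  $list = '';

  foreach ($files as $file)
  {
    $list .= file_get_contents($file)."\n";
  }

  // Split into one entry per line, whatever the line-ending style
  $list = preg_split("/((\r?\n)|(\r\n?))/", $list, -1, PREG_SPLIT_NO_EMPTY);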

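  // Skip internal keys, as in the first version of the filter
  foreach ( ['__raw','valid_fields','ListingType','CriteriaRating','controller','action'] as $key )
  {
    unset($params['data'][$key]);
  }

  $text = strip_tags(implode(' ',S2Array::flatten($params['data'])));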

  $badWords = [];

  $context = [];
    
  foreach ($list as $regex) 
  {
      $regex = preg_replace('/(^\s+|\s+$|\s*#.*$)/i', "", $regex);

      if (empty($regex)) continue;

      $match = preg_match('/(?:[^ ]+|(?:[^ ]+ )){0,1}('.$regex.')(?: [^ ]+){0,1}/i',$text,$matches);

      // Blocked word found
      if ($match)
      {
        $badWords[$matches[1]] = $matches[1];
        $context[$matches[0]] = $matches[0];
      }

  }

  if ( !empty($badWords) ) 
  {
    // $validation[] = 'Your message doesn\'t comply with our submission guidelines';
  
    $validation[] = 'Your message doesn\'t comply with our submission guidelines. The following words are not allowed and need to be removed before your submission is accepted: <strong>'.implode(', ',$badWords).'</strong>';

    antispam_filter_log($context);
  }

  return $validation;
}

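function antispam_filter_log($words)
{
    // Timestamp and visitor IP prefix for the log entry
    $datetime = '['.date('D Y-m-d h:i:s A').'] [IP '.($_SERVER['REMOTE_ADDR'] ?? 'unknown').'] ';

    $entry = $datetime;

    // Each blocked word is logged together with its neighboring words
    foreach ( $words as $word )
    {
      $entry .= $word."\r\n";
    }

    // One log file per day
    $filename = 'spamlog_'.date('Ymd').'.txt';

    $filepath = __DIR__; // Current directory

    $handle = fopen($filepath.DS.$filename,'a+');

    // Skip logging if the file cannot be opened
    if ($handle === false) return;

    fwrite($handle,$entry);

    fclose($handle);
}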
 
Clickfwd\Hook\Filter::add('listing_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('inquiry_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('review_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('discussion_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('owner_reply_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('report_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('claim_submit_validation', 'antispam_filter', 10);
Clickfwd\Hook\Filter::add('resources_submit_validation', 'antispam_filter', 10);

IMPORTANT! Notice the $filepath variable inside the antispam_filter_log function. This is the location where the log files will be stored. By default it uses the same public directory where the filter_functions.php file resides, but I recommend that you change this to a path outside your site's public folder so the logs cannot be read from the browser. For example, if you use DigitalOcean (affiliate link) as your host, you can change it to:

$filepath = '/srv/users/serverpilot/apps/joomla';

Here the app name is "joomla" and the public folder is '/srv/users/serverpilot/apps/joomla/public', so the logs are written one level above it. On other hosts it would be any path outside the 'public_html' folder, such as its parent directory.
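If you prefer not to hard-code the path inside the function, one option is to pass the log directory in as an argument. The following is only a sketch of that idea: demo_antispam_log, its parameters and the sample IP address are hypothetical, and it swaps fopen/fwrite for file_put_contents with the FILE_APPEND and LOCK_EX flags. For the demo it writes to a throwaway folder in the system temp directory rather than a real path on your server.

```php
<?php
// Hypothetical variant of antispam_filter_log() that takes the log directory as an argument.
function demo_antispam_log(array $context, string $logDir, string $ip = 'unknown'): string
{
    // One log file per day, stored in the directory passed in by the caller
    $file  = rtrim($logDir, '/\\') . DIRECTORY_SEPARATOR . 'spamlog_' . date('Ymd') . '.txt';
    $entry = '[' . date('D Y-m-d h:i:s A') . '] [IP ' . $ip . '] ' . implode(' | ', $context) . PHP_EOL;

    // FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved
    // writes when several requests log at the same time.
    file_put_contents($file, $entry, FILE_APPEND | LOCK_EX);

    return $file;
}

// Demo only: a throwaway directory instead of a real path outside the public folder.
$logDir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'antispam_demo_' . uniqid();
mkdir($logDir);

$logFile = demo_antispam_log(['Buy Spam1 today'], $logDir, '203.0.113.5');
$logged  = file_get_contents($logFile);
```

Inside the real filter you would call it with the $context array and the non-public path you chose above.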

I hope you find this tutorial on creating your own profanity and spam blocker useful! Please let me know in the comments below.