
Inversoft Profanity Filter Documentation

Getting started using the Inversoft Profanity Filter

This guide will help you get started using the Inversoft Profanity Filter. It covers the basics of using the regular expression filter and the Inversoft Profanity Database. For more advanced topics, consult the JavaDoc for the filter.

Installation

To install the Inversoft Profanity Filter, add the appropriate JAR file to your application's classpath. If you are building a JEE web application, you can drop the JAR file into the web application's WEB-INF/lib directory. Other applications might require different handling to add the JAR file to the classpath.

There are two JAR files that you might use: the JDK 1.4 version and the JDK 5.0+ version. If you are using JDK 1.4, you will need to include the JDK 1.4 JAR file. In all other cases, we recommend using the JDK 5.0+ JAR file.

If you are using version 2.0.3 or later, you will need your license file. If you have purchased a license for the Inversoft Profanity Filter, you can regenerate your license file at any time by visiting the Account section of the website, selecting the correct purchase, and clicking the Regenerate License File link. If you are evaluating the Inversoft Profanity Filter and have lost your license file, you can generate a new file by downloading the evaluation version again.

Once you have your license file, place this file in the root of the application's classpath. For example, if you are building a JEE web application, you can place your license file in the WEB-INF/classes directory. If you are building a different type of application, you can place the license file in a directory on the file system and then include that directory in the classpath like this:

  Assume you have placed the license file in the directory $HOME/licenses

  $ java -cp $HOME/licenses:<application-classpath> ...

On Windows this might look something like this:

  Assume you have placed the license file in the directory %HOMEPATH%\licenses

  c:\> java -cp %HOMEPATH%\licenses;<application-classpath> ...

Once you have installed the JAR file and the license file, you can start using the filter in your application.
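
If the filter cannot find the license at runtime, it can help to confirm that the file really is visible at the root of the classpath. The following is a minimal sketch that uses only the standard Java class loader API. The file name "inversoft.license" is a hypothetical placeholder, not a name taken from this guide; substitute the actual name of your license file.

```java
import java.net.URL;

class LicenseCheck {
    // Returns true when the named resource is visible at the root of the classpath.
    static boolean onClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        URL url = loader.getResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL file name; pass the real name of your license file as an argument.
        String name = args.length > 0 ? args[0] : "inversoft.license";
        String status = onClasspath(name) ? " found" : " NOT found";
        System.out.println(name + status + " on the classpath");
    }
}
```

Run this check with the same -cp value that you use for your application so that it sees the same classpath.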

The ProfanityFilter interface

The main interface to the filter is com.inversoft.profanity.ProfanityFilter. This interface contains all the methods to check for, locate and replace profanity within Strings. This guide will walk through how to locate all of the profanity in a String, which is the most common usage of the filter.

The findProfanity method has two variations. The first version returns an array of ProfanityResult objects, one for each profanity found. The second takes a ProfanityListener, which is called when the filter finds profanity within the String. This second method does not return a value since the listener is called for each profanity and can store the results as they are encountered. To keep things simple, we will look at the first version of this method. The JavaDoc contains information about all the other methods on this interface.

The signature of this method looks like this:

  ProfanityResult[] findProfanity(String str, int tolerance, String... types);

This method can be called from JDK 1.4 by passing an array of Strings or null as the final parameter.

The str parameter is the String to be searched. The tolerance parameter defines how lenient the filter should be with respect to profanity. The types parameter defines the types of profanity to search for. To support these parameters, the Inversoft Profanity Database stores two additional attributes for each definition it contains. Together with the parameters passed to the filter method, these attributes can also speed up filtering by reducing the work the filter has to do. These attributes are:

  • Rating
  • Type

The rating defines on a scale from 1 to 10 the severity of the word where 10 is the most severe and offensive and 1 is the least. The type defines the category of the word such as Slang, Swear, Drug, etc. The tolerance parameter determines the lowest rating of words to use from the database. For example, if 5 is passed as the tolerance to the filter, the filter will only search for words whose rating is 5-10. Likewise, the types parameter defines which categories of words to search for. Passing in new String[]{"Swear"} will only search for words whose type is Swear. By reducing the total set of words to search for, the filtering time is reduced.
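
The selection rule described above can be sketched in a few lines. This is an illustration only and not the filter's actual implementation: the Definition and DefinitionSelector classes are hypothetical stand-ins, and treating a null or empty types array as "all types" is an assumption that this guide does not confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one database definition; not the library's real class.
class Definition {
    final String word;
    final int rating;   // 1 (least severe) to 10 (most severe)
    final String type;  // for example "Swear", "Slang" or "Drug"

    Definition(String word, int rating, String type) {
        this.word = word;
        this.rating = rating;
        this.type = type;
    }
}

class DefinitionSelector {
    // Keeps the definitions rated at or above the tolerance whose type was requested.
    // ASSUMPTION: a null or empty types array means "all types".
    static List<Definition> select(List<Definition> all, int tolerance, String... types) {
        List<String> wanted = (types == null) ? new ArrayList<String>() : Arrays.asList(types);
        List<Definition> selected = new ArrayList<Definition>();
        for (Definition d : all) {
            boolean ratingOk = d.rating >= tolerance;
            boolean typeOk = wanted.isEmpty() || wanted.contains(d.type);
            if (ratingOk && typeOk) {
                selected.add(d);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Definition> all = Arrays.asList(
            new Definition("darn", 2, "Swear"),
            new Definition("heck", 5, "Swear"),
            new Definition("dope", 6, "Drug"));
        // Tolerance 5 and type "Swear" leave a single definition to search for.
        for (Definition d : select(all, 5, "Swear")) {
            System.out.println(d.word);
        }
    }
}
```

With a tolerance of 5 and the type Swear, only one of the three sample definitions remains, which is the kind of reduction that shortens filtering time.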

The result of this method is an array of ProfanityResult objects. Each object contains information about one match found in the String: the offset, the length and the profanity matched. The array is sorted by the offset (start position) of the match within the String.
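
To show how an offset and a length identify a span of the original String, here is a small sketch that masks each matched span with asterisks. The Match class is a hypothetical stand-in that carries only an offset and a length; it is not the library's ProfanityResult. The ProfanityFilter interface has its own replace methods, so treat this purely as an illustration of the two values.

```java
// Hypothetical stand-in for a match result; it carries only an offset and a length.
class Match {
    final int offset;
    final int length;

    Match(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

class Masker {
    // Replaces every character inside each matched span with '*'.
    // Spans are clamped to the bounds of the String.
    static String mask(String str, Match[] matches) {
        char[] chars = str.toCharArray();
        for (Match m : matches) {
            int start = Math.max(m.offset, 0);
            int end = Math.min(m.offset + m.length, chars.length);
            for (int i = start; i < end; i++) {
                chars[i] = '*';
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // Two spans of four characters, starting at offsets 4 and 14.
        Match[] matches = { new Match(4, 4), new Match(14, 4) };
        System.out.println(mask("oh, darn that heck", matches)); // prints "oh, **** that ****"
    }
}
```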

The ProfanityFilterFactory class

The simplest way to get up and running quickly is to use the com.inversoft.profanity.ProfanityFilterFactory class. This allows you to create a ProfanityFilter quickly. One thing to note is that creating ProfanityFilter instances is expensive and should be done sparingly. Here is an example of using the factory to create a ProfanityFilter:

  ProfanityFilter filter = ProfanityFilterFactory.newRegexFilter(5, "profanity-database-2.0.xml");

The first parameter to the method controls the tolerance level for the filter. Setting this value to 5 means that the filter instance returned will only filter profanity with a rating of 5 or higher, regardless of any other setting or parameters. Setting this value to 0 means that the filter will filter all profanity. However, this also means that the entire database will be loaded and cached in memory, which could be too large for the application. You will have to experiment to determine what setting makes the most sense for your application.

The second parameter is the name of the database. This parameter can be a file reference, a URL or a classpath entry.
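
Because creating filter instances is expensive, one common approach is to build a single instance and reuse it. The sketch below shows the initialization-on-demand holder idiom with a hypothetical ExpensiveFilter class standing in for the real filter, so that it runs without the library. Whether a ProfanityFilter can safely be shared between threads is an assumption this guide does not confirm; check the JavaDoc before sharing one instance across threads.

```java
// Hypothetical stand-in for an object that is costly to build, such as a filter.
class ExpensiveFilter {
    static int instancesCreated = 0;

    ExpensiveFilter() {
        instancesCreated++;
    }
}

class FilterHolder {
    // The JVM initializes Lazy only on first access, exactly once, in a thread-safe way.
    private static class Lazy {
        // In a real application this line would call the factory instead.
        static final ExpensiveFilter INSTANCE = new ExpensiveFilter();
    }

    static ExpensiveFilter get() {
        return Lazy.INSTANCE;
    }

    public static void main(String[] args) {
        ExpensiveFilter first = get();
        ExpensiveFilter second = get();
        System.out.println(first == second);                  // prints "true"
        System.out.println(ExpensiveFilter.instancesCreated); // prints "1"
    }
}
```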

Calling the filter

Once the filter has been instantiated it is a simple matter of calling the findProfanity method. Here is an example of calling the method:

  // Example call using JDK 5.0
  String str = getStringFromSomewhere();
  ProfanityResult[] results = filter.findProfanity(str, 4, "Swear", "Slang");

  // using JDK 1.4 this would become
  // ProfanityResult[] results = filter.findProfanity(str, 4, new String[]{"Swear", "Slang"});

  // Print the start position of each match; the loop body never runs when nothing is found
  for (int i = 0; i < results.length; i++) {
    System.out.println("Found profanity at " + results[i].getOffset());
  }