Toll Free: 1-888-423-7814

Profanity Filtering 101: Embedded Words & The Scunthorpe Problem

The sixth in a series of posts about the finer points of profanity filtering…

Embedded words occur when a dictionary word or proper name contains profanity:

  1. Don’t assume profanity filters are inaccurate
  2. Harry Lipshitz has a hard time creating accounts on web sites
  3. This has been documented as the Scunthorpe problem

This case is actually quite simple to handle, as a sophisticated profanity filter can look for dictionary words that contain profanity and safely ignore them, either preemptively or during the filtering process. Poorly written filters often get caught on these simple cases and flag a large number of dictionary words as profanity. CleanSpeak pulls from a large set of dictionary words and proper names in real time, over 140,000 in all, to handle this situation correctly and avoid a potentially large number of false positives without hindering performance.
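
As a rough illustration of the idea (not CleanSpeak's actual implementation), the sketch below skips any token that appears in a list of known words before checking it for embedded profanity. The names BLOCKED, DICTIONARY, and find_profanity are hypothetical, and "smurf" stands in for a real blocked word:

```python
import re

# Hypothetical stand-ins: a real filter would load far larger lists.
BLOCKED = {"smurf"}                       # stand-in for actual profanity
DICTIONARY = {"smurfberry", "smurfette"}  # stand-in for legitimate words and names


def find_profanity(text):
    """Return blocked terms found in text, ignoring known dictionary words."""
    hits = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in DICTIONARY:
            continue  # the blocked term is embedded in a legitimate word
        for term in BLOCKED:
            if term in token:
                hits.append(term)
    return hits


print(find_profanity("I love smurfberry pie"))  # []
print(find_profanity("What a smurf"))           # ['smurf']
```

Checking the dictionary first keeps the common case cheap, since a set lookup is all that most legitimate words ever need.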

Profanity Filtering 101: Separators

The fifth in a series of posts about the finer points of profanity filtering…

One of the more sophisticated attacks that users employ against profanity filters involves inserting separators, such as spaces or periods, between the characters of a word so that the word is still easy to read.

The following examples illustrate how the simple process of inserting additional non-alphabetic characters between the characters of the word does not interfere with the reader’s ability to identify the word correctly:

  1. s…….m…..u…..r……f
  2. s m u r f
  3. s….m u r….f
  4. I’m going to smash it (false positive!)

It might be difficult to see the profanity in #4, but if you look at the last four letters on their own, ignoring the space between them, you’ll see it.
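
One simple way to catch separator attacks, sketched here under the assumption that the filter works on lowercase text, is to build a regular expression that tolerates any run of non-letters between the letters of a blocked word. The helper names are hypothetical, and "smurf" is again a stand-in:

```python
import re


def separator_pattern(word):
    """Build a regex matching word with any non-letter runs between its letters."""
    return re.compile(r"[^a-z]*".join(re.escape(ch) for ch in word))


PATTERN = separator_pattern("smurf")  # stand-in for a real blocked word


def contains_separated(text):
    """Return True if the blocked word appears, with or without separators."""
    return PATTERN.search(text.lower()) is not None


print(contains_separated("s…….m…..u…..r……f"))  # True
print(contains_separated("s m u r f"))          # True
print(contains_separated("surf the web"))       # False
print(contains_separated("the chasm, u r fine"))  # True (a false positive)
```

The last call shows the same trap as example #4: because spaces count as separators, letters from neighboring words can line up into a blocked word, so a naive pattern like this needs additional checks before it flags anything.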

Profanity Filtering 101: Repeated Characters

The fourth in a series of posts about the finer points of profanity filtering…

“Repeated characters” is another commonly used filter attack that involves the simple repetition of characters in a word. This straightforward tactic still fools many profanity filters, most of which are not designed to ignore multiple instances of the same character:

  • heeeeeeeeeeellllllllllllooooooooooooo

CleanSpeak is capable of detecting this type of filter attack and will correctly and automatically identify words regardless of repetition.
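
A minimal sketch of one way to neutralize repetition (an illustration only, not a description of how CleanSpeak does it) is to collapse every run of the same character to a single character on both the input and the word list before comparing. The function names are hypothetical:

```python
import re


def collapse_repeats(text):
    """Lowercase text and reduce each run of a repeated character to one."""
    return re.sub(r"(.)\1+", r"\1", text.lower())


def matches_ignoring_repeats(token, word):
    """Compare two words after collapsing repeated characters in both."""
    return collapse_repeats(token) == collapse_repeats(word)


print(collapse_repeats("heeeeeeeeeeellllllllllllooooooooooooo"))  # helo
print(matches_ignoring_repeats("heeeeeeeeeeellllllllllllooooooooooooo", "hello"))  # True
print(matches_ignoring_repeats("help", "hello"))  # False
```

Collapsing both sides is what lets "hello", with its legitimate double "l", still match. The trade-off is that distinct words such as "to" and "too" become identical after collapsing, so this step alone can introduce false positives.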

Profanity Filtering 101: Swapping & Collapsing

The third in a series of posts about the finer points of profanity filtering…

Character swapping and collapsing is the process of replacing characters with other alphabetic characters (or removing unnecessary characters) while still retaining the phonetic structure of the word. This tactic is often used to attack filters that do not understand phonetics:

  1. Teech me guitar
  2. Attak the main castle gate

Example #1 is a simple character swap of an “a” to an “e” that still retains the same phonetic structure of the word and allows the reader to infer the original word.

Example #2, on the other hand, is an example of character collapsing. In this example the “ck” in the word “Attack” has been collapsed to a single “k” character.
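
To make the two examples concrete, here is a deliberately tiny sketch that reduces words to a crude phonetic key using a hand-written rule table. The RULES list and phonetic_key function are hypothetical and cover only the two examples above; a real filter would need a much richer phonetic model:

```python
# Hypothetical rewrite rules: each pair maps a spelling to a simpler one
# that sounds roughly the same.
RULES = [
    ("ck", "k"),  # collapsing: "attack" and "attak" share a key
    ("ea", "e"),  # swapping: "teach" and "teech" share a key
    ("ee", "e"),
]


def phonetic_key(word):
    """Return a crude phonetic key by applying each rewrite rule in order."""
    key = word.lower()
    for old, new in RULES:
        key = key.replace(old, new)
    return key


print(phonetic_key("Teech") == phonetic_key("teach"))   # True
print(phonetic_key("Attak") == phonetic_key("attack"))  # True
print(phonetic_key("Teech"))                            # tech
```

Note that the key for "Teech" is "tech", which is also a word in its own right. Rules this coarse merge unrelated words, which is one reason phonetic matching has to be combined with other checks.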

Profanity Filtering 101: Character Replacement & Leet Speak

The second in a series of posts about the finer points of profanity filtering…

Character replacement is the process of replacing certain characters with others, usually symbols, that look the same or similar. This is a popular method, often referred to as “Leet” or “L33t” speak, used to attack traditional content and profanity filters that ignore or don’t play well with non-alphabetic characters. Some examples…

  1. $ally is my neighbor
  2. |)on’t be a menace
  3. |<nive$ can be dangerous
  4. \/\/hat are you doing?

Examples #2, #3 and #4 show a user substituting multiple characters for a single character. #2 uses the “|” (pipe) character and the “)” (right-parenthesis) character to create a capital “D” character. #4 uses a combination of forward and backward slashes to create a capital “W” character. Example #3 goes a step further and replaces two different characters in the text: both the “K” and the “S”.
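
The four examples above can be undone with a simple substitution table, sketched below. LEET_MAP and normalize_leet are hypothetical names, and the table holds only the sequences used in these examples. Longer sequences are replaced first so that a multi-character symbol is treated as a unit:

```python
# Hypothetical lookup table covering only the examples in this post.
LEET_MAP = {
    r"\/\/": "w",
    "|)": "d",
    "|<": "k",
    "$": "s",
}


def normalize_leet(text):
    """Lowercase text and replace known symbol sequences with letters."""
    result = text.lower()
    # Replace longer sequences first so multi-character symbols stay intact.
    for seq in sorted(LEET_MAP, key=len, reverse=True):
        result = result.replace(seq, LEET_MAP[seq])
    return result


print(normalize_leet("$ally is my neighbor"))     # sally is my neighbor
print(normalize_leet("|<nive$ can be dangerous"))  # knives can be dangerous
print(normalize_leet(r"\/\/hat are you doing?"))   # what are you doing?
```

After normalization, the text can be passed to whatever word matching the filter already performs. A blanket mapping like this will also rewrite legitimate uses of the symbols, such as a "$" in front of a price, so it suits matching better than rewriting the text users actually see.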

Profanity Filtering 101: The Grawlix

The first in a series of posts about the finer points of profanity filtering…

You’ve seen it all over the @#$%&! place, but probably didn’t know that this ubiquitous string of characters has a name that was coined almost 50 years ago by cartoonist Mort Walker, the creator of “Beetle Bailey” and “Hi and Lois.”

“Grawlixes” is one of a series of great words (Agitrons, Blurgits, Plewds, Farkles, Digitrons, and many more…) that Walker invented to describe the devices cartoonists employ to convey specific types of information in their comic strips. Loudly used to express obscenities right in the middle of the family-oriented funny pages, the Grawlix reigns as the granddaddy of all profanity filtering options and practically defines the category of “replacement characters,” a topic we’ll explore in more detail in an upcoming post.

Contact Information

Inversoft Inc.
1425 Market Street Suite 10
Denver, CO 80202

Sales: sales@inversoft.com
Support: support@inversoft.com