If you have been on the internet this week you are aware of the fake news crisis spiralling out of control. But just in case you missed it, recent headlines read something like this: Facebook is being blamed for Trump’s election, Google and Facebook Take Aim at Fake News Sites, Facebook’s fake news crisis deepens.
With great power comes great responsibility
Facebook has over 1 billion active users who use the platform to post, share and comment on news. When Facebook was accused of influencing the election, Zuckerberg was quick to say that was a “pretty crazy idea.” Is it really that crazy? Facebook has become a catalyst for the spread of fake news given the ease of its “share” button. Regardless, fake news isn’t going away anytime soon and will likely worsen. While Facebook has taken steps to limit fake news sites’ use of its ad network, there has been no push to eliminate fake news from the News Feed.
This daunting issue is not Facebook’s alone. Any platform that allows user-generated content would be wise to get ahead of this growing problem in order to prevent this spam and protect its brand.
It’s complicated, but not impossible
Google is not new to this fight. They have spent years attempting to minimize the spread of spam/fake links and misleading content. To combat this, Google built an algorithm that prioritizes the quality and relevance of an article.
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
- PR(A) is the PageRank of page A,
- PR(Ti) is the PageRank of pages Ti which link to page A,
- C(Ti) is the number of outbound links on page Ti and
- d is a damping factor which can be set between 0 and 1
PageRank follows this general rule of thumb: the more links that point to a specific page containing the search keyword, the more popular it is. As pages get more popular, the weight of their votes increases proportionally and the resulting score increases.
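The formula above can be applied repeatedly until the scores settle. Here is a minimal sketch in Python, assuming a tiny made-up three-page link graph and a damping factor of 0.85. The function name, the graph and the iteration count are illustrative assumptions, and this is not how Google's production ranking works.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) over a small link graph.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    ranks = {page: 1.0 for page in pages}  # start every page at the same score
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each linking page Ti passes on its rank divided by its outbound link count C(Ti).
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


# Hypothetical graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects votes from both A and B, so it ends up with the highest score, while B, which only receives half of A's vote, ends up lowest.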
People can still write anything (fake or not), but not just any piece of content will show up in the first few pages of a Google search result. As a result of this vetting process people still trust the quality and validity of articles on Google searches. Facebook would be wise to follow Google’s lead.
Google’s algorithm for determining quality rests on attention: if people are linking to a site or visiting that site, it is considered more worthwhile than a site with fewer inbound links and fewer visits (assuming all other things are equal). Google then uses that relative worth, or authority, to value outbound links. The more authoritative the site that links to an article, the more value Google gives that link.
Attention = value = authority.
For fake news it is much more difficult. Attention doesn’t necessarily equal truth or authority. Facebook has proven that many times over. So, what does?
It is important to assess not only the quality of the shared content, but also the authority of the people who share it. An authoritative user could flag an article as fake, and if a threshold was crossed the article could be marked as untrusted. The problem is: how does a user become authoritative? And what safeguards prevent an authoritative user from misusing their power?
Authority is a problem we have had to tackle for our CleanSpeak clients that utilize comment/article/user reporting. A user may report a comment or article, not because it’s fake, spam or hateful, but because they don’t agree with it. CleanSpeak applies a model to every user’s behavior, which yields an authority value. Reports on messages or users are taken in aggregate and weighted based on those authorities.
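To make the idea of weighting reports by authority concrete, here is a minimal sketch in Python. It is not CleanSpeak's actual model: the function name, the authority values, the default for unknown users and the threshold are all assumptions chosen for illustration.

```python
def is_untrusted(reports, authority, threshold=1.0, default_authority=0.1):
    """Return True when the authority-weighted sum of reports crosses the threshold.

    `reports` is a list of user names that flagged one article (hypothetical data).
    `authority` maps user names to a score between 0 and 1 (assumed scale).
    """
    # Count each reporting user once so repeat reports cannot inflate the score.
    score = sum(authority.get(user, default_authority) for user in set(reports))
    return score >= threshold


authority = {"alice": 0.9, "bob": 0.2, "mallory": 0.05}

# One high-authority report plus one low-authority report crosses the threshold.
print(is_untrusted(["alice", "bob"], authority))  # True

# Repeated reports from low-authority users do not.
print(is_untrusted(["mallory", "mallory", "bob"], authority))  # False
```

Weighting in aggregate means a single user who reports content simply because they disagree with it has little effect, while several trusted users acting together do.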
While newly available extensions can create alerts based on a manual list of False, Misleading, Clickbait-y, and/or Satirical “News” Sources, this might not catch everything. It’s a great start, but authoritative users should have the ability to tag and flag sources, too, as the number of these fake news sites is bound to outgrow the list.
CleanSpeak can filter many types of user-generated content (e.g., chat messages, forum posts and reviews). Running this material through CleanSpeak on a “per message” basis ensures each piece of content is acceptable before allowing it to be seen in your community. Filtering by message makes sense for these specific use cases. But what if you have big data that you want to filter as a whole?
According to Wikipedia, batch processing is the execution of a series of jobs in a program on a computer without manual intervention (non-interactive). Strictly speaking, it is a processing mode: the execution of a series of programs each on a set or “batch” of inputs, rather than a single input (which would instead be a custom job).
So when might you consider batch processing?
Maybe you purchased a list of names & addresses and want to make sure they don’t contain any vulgar language before including them in your marketing campaign?
Perhaps you allow users to upload files and want to make sure they don’t contain inappropriate content?
Or you gather a list of reviews and want to check them all at once to ensure the language is acceptable before posting to your site?
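As a rough illustration of the batch approach, here is a minimal sketch in Python that checks a whole list of reviews in one unattended pass. The `is_acceptable` check is a deliberately simple stand-in and the blocked word is a mild placeholder. Neither represents the CleanSpeak API or its filtering logic.

```python
def is_acceptable(text, blocked_words):
    """Stand-in check (hypothetical): reject text that contains a blocked word."""
    words = text.lower().split()
    return not any(word.strip(".,!?") in blocked_words for word in words)


def filter_batch(items, blocked_words):
    """Run every item through the check in one pass and return (accepted, rejected)."""
    accepted, rejected = [], []
    for item in items:
        if is_acceptable(item, blocked_words):
            accepted.append(item)
        else:
            rejected.append(item)
    return accepted, rejected


# Hypothetical batch of reviews gathered before posting.
reviews = ["Great product, works well.", "This darn thing broke!", "Fast shipping."]
accepted, rejected = filter_batch(reviews, blocked_words={"darn"})
print(len(accepted), "accepted;", len(rejected), "held for review")
```

The same loop could just as well read names and addresses from a purchased list or lines from an uploaded file. The point is that the whole set is processed without anyone reviewing items one at a time.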
Earlier this summer, we published a comprehensive Guide to User Data Security detailing steps to harden a server and secure applications. To stand by our claim, we provisioned a couple of Linode servers and hardened them to the guide’s specifications. We shared the IP addresses and proposed a challenge.
Hack This: https://hackthis.inversoft.com
We dared anyone to hack our database. To add incentive, we offered a fully loaded MacBook Pro as a reward.
The “build vs buy” decision is paramount when discussing a company’s software needs. Building custom software solutions can provide a host of benefits, but it often comes at a cost. An intelligent profanity filtering and moderation platform is a significant investment; building a comprehensive profanity filter could involve years of development time accruing significant costs. Consider the following factors when deciding whether to purchase an existing profanity filtering technology or build it internally:
When building proprietary software, you retain control of all aspects of product design, allowing you to create a customized solution that best fits your company’s needs.
- Control enhancements and development schedule
- Avoid the costs associated with software license fees – and in some cases maintenance and support fees
- Fully customize to fit your project scope and needs
It is important to remember that by building your own profanity filter you assume the risk if it fails.
The talented team at AOL implemented an internal profanity filter and was embarrassed by the now infamous Scunthorpe Problem. Years later, the Google filter and Facebook were stumped by the same issue. These filter issues produced scores of false positives, which required significant man-hours in moderation support to address.
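The Scunthorpe Problem arises when a filter matches blocked words anywhere inside a longer, innocent word. Here is a minimal sketch in Python that contrasts a naive substring check with a word-boundary check. The blocked list uses a mild placeholder word and both function names are assumptions. Word boundaries alone do not handle leet speak or inflections, so this is only a demonstration of the failure mode, not a complete filter and not CleanSpeak's approach.

```python
import re

BLOCKED = ["hell"]  # mild placeholder word for illustration


def naive_filter(text):
    """Flag text if a blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED)


def word_boundary_filter(text):
    """Flag text only when a blocked word appears as a whole word."""
    pattern = r"\b(?:" + "|".join(re.escape(word) for word in BLOCKED) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


address = "Michelle lives on Shell Road"
print(naive_filter(address))          # True: a false positive
print(word_boundary_filter(address))  # False: innocent words pass
print(word_boundary_filter("Go to hell"))  # True: the whole word is still caught
```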
When you purchase such a solution you get the benefit of a professionally developed and vetted technology with years of market use and added intelligence.
- Offloading filtering and moderation allows you to focus resources on core product features – essential to the long-term health of your business
- The software has been used and trusted by well-known brands with strict quality requirements
- Consistent product upgrades and new features
- Complete product support and software maintenance. When bugs or errors are discovered, you can rely on the vendor to troubleshoot and fix them rather than exhaust internal resources
- Quick deployment time
- You have a technology partner who is focused on helping you succeed and providing a better user experience
Building a filter requires extensive knowledge of natural language processing and language rules. A filter that fails to understand complex language produces misses and false positives that can damage a brand. Get peace of mind with a proven solution.
It is easy to overlook the cost and time involved in developing a new technology as complex as a profanity filter. A homegrown solution requires costly development time and ongoing maintenance, moderation and support. In contrast, a purchased solution can be integrated and running in just days; it provides continual upgrades based on market insights and advancements.
Profanity Filter Solutions
There is a range of profanity filtering solutions available on the market. Here are some of the reasons teams choose CleanSpeak.
Minimized False Positives. We have been working on our filtering technology for 9 years and continue to improve on it each year. We solve the Scunthorpe problem, handle all leet speak and automatically build inflections. In addition, we parse and filter BBCode markup language.
Superior Speed. Response times for our profanity filter average under 5 milliseconds, allowing significant throughput to support requests at peak volume without hindering user experience.
Cloud or On-Premise. CleanSpeak has flexible hosting options to best suit your InfoSec needs.
Free. Test out CleanSpeak with a free 14-day trial. No credit card required.
We have a dedicated team of developers whose number one goal is to maintain the most accurate and efficient profanity filter on the market. Using CleanSpeak lets you devote more time to building your application, rather than your own filter.
More information about CleanSpeak can be found on the Inversoft website. Don’t hesitate to contact us with any questions you might have.
GhostShell leaked an estimated 36 million account details from 110 poorly configured MongoDB servers. This hack, dubbed Project Vori Dazel, marks one of the largest breaches this year.
“I am leaking more than 36 million accounts/records of internal data from [networks] to raise awareness about what happens when you decide not to even add a username and password as root or check for open ports, let alone encrypt the data.” GhostShell via Pastebin