If you have been on the internet this week you are aware of the fake news crisis spiralling out of control. But just in case you missed it, recent headlines read something like this: Facebook is being blamed for Trump’s election, Google and Facebook Take Aim at Fake News Sites, Facebook’s fake news crisis deepens.
With great power comes great responsibility
Facebook has over 1 billion active users who use the platform to post, share and comment on news. When Facebook was accused of influencing the election, Zuckerberg was quick to say that was a “pretty crazy idea.” Is it really that crazy? Facebook has become a catalyst for the spread of fake news given the ease of its “share” button. Regardless, fake news isn’t going away anytime soon and will likely worsen. While Facebook has taken steps to limit fake news sites’ use of its ad network, there has been no push to eliminate fake news from the News Feed.
This daunting issue is not Facebook’s alone. Any platform that allows user-generated content would be wise to get out ahead of this growing problem in order to prevent this spam and protect its brand.
It’s complicated, but not impossible
Google is not new to this fight. They have spent years attempting to minimize the spread of spam/fake links and misleading content. To combat this, Google built an algorithm that prioritizes the quality and relevance of an article.
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
- PR(A) is the PageRank of page A,
- PR(Ti) is the PageRank of pages Ti which link to page A,
- C(Ti) is the number of outbound links on page Ti and
- d is a damping factor which can be set between 0 and 1
PageRank follows this general rule of thumb: the more links that point to a specific page containing the search keyword, the more popular it is. As pages get more popular, the weight of their votes increases proportionally and the resulting score increases.
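To make the formula concrete, here is a minimal Python sketch that applies it repeatedly to a tiny, made-up link graph. The function name, the example graph, the damping factor of 0.85 and the fixed iteration count are all assumptions chosen for illustration; this is not how Google actually runs its ranking.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(Ti)/C(Ti)) over a small link graph.

    `links` maps each page to the list of pages it links to (hypothetical data).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Start every page with the same score.
    pr = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each page Ti linking here contributes PR(Ti) / C(Ti),
            # where C(Ti) is the number of outbound links on Ti.
            inbound = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# Made-up graph: A links to B and C, B links to C, C links back to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, C collects votes from both A and B, so it ends up ranked highest, while B, which only receives half of A's vote, ends up lowest.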
People can still write anything (fake or not), but not just any piece of content will show up in the first few pages of a Google search result. As a result of this vetting process people still trust the quality and validity of articles on Google searches. Facebook would be wise to follow Google’s lead.
Google’s algorithm for determining quality rests on attention: if people are linking to a site or visiting that site, it is considered more worthwhile than a site with fewer inbound links and fewer visits (assuming all other things are equal). Google then uses that relative worth, or authority, to value outbound links. The more authoritative the site that links to an article, the more value Google gives that link.
Attention = value = authority.
For fake news it is much more difficult. Attention doesn’t necessarily = truth or authority. Facebook has proven that many times over. So, what does?
It is important to assess not only the quality of the shared content, but also the authority of the people who share it. An authoritative user could flag an article as fake, and if a threshold was crossed the article could be marked as untrusted. The problem is: how does a user become authoritative? And what safeguards prevent an authoritative user from misusing their power?
Authority is a problem we have had to tackle for our CleanSpeak clients that utilize comment/article/user reporting. A user may report a comment or article, not because it’s fake, spam or hateful, but because they don’t agree with it. CleanSpeak applies a model to every user’s behavior, which yields an authority value. Reports on messages or users are taken in aggregate and weighted based on those authorities.
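The idea of weighting reports by authority can be sketched in a few lines of Python. Everything here is hypothetical: the `Report` type, the authority values, the default weight for unknown users and the threshold are assumptions made for illustration, and this is not CleanSpeak's actual model.

```python
from dataclasses import dataclass


@dataclass
class Report:
    reporter: str  # id of the user who filed the report
    target: str    # id of the reported article or comment


def weighted_score(reports, authority, target, default_authority=0.1):
    """Sum the authority of each distinct user who reported `target`."""
    seen = set()
    score = 0.0
    for report in reports:
        # Ignore reports about other content and repeat reports by one user.
        if report.target != target or report.reporter in seen:
            continue
        seen.add(report.reporter)
        score += authority.get(report.reporter, default_authority)
    return score


def is_untrusted(reports, authority, target, threshold=1.0):
    """Mark content as untrusted once the weighted reports cross the threshold."""
    return weighted_score(reports, authority, target) >= threshold


# Hypothetical authority values, e.g. derived from each user's past behavior.
authority = {"alice": 0.9, "dave": 0.3, "bob": 0.05, "carol": 0.05}
reports = [
    Report("bob", "article-1"),
    Report("carol", "article-1"),
    Report("bob", "article-1"),    # duplicate report, counted once
    Report("alice", "article-2"),
    Report("dave", "article-2"),
]
print(is_untrusted(reports, authority, "article-1"))
print(is_untrusted(reports, authority, "article-2"))
```

With these made-up numbers, two low-authority users who simply disagree with an article cannot get it flagged on their own, while reports from users with an established track record carry it over the threshold.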
While newly available extensions can create alerts based on a manual list of False, Misleading, Clickbait-y, and/or Satirical “News” Sources, this might not catch everything. It’s a great start, but authoritative users should have the ability to tag and flag sources, too, as the number of these fake news sites is bound to outgrow the list.
The App Store is a developer’s best friend, until your app is rejected. (Are you suffering from App Store Rejection? You aren’t alone – watch this humorous video.)
App Store Guidelines
“We will reject Apps for any content or behavior that we believe is over the line. What line, you ask? Well, as a Supreme Court Justice once said, “I’ll know it when I see it”. And we think that you will also know it when you cross it.”
(App Store Review Guidelines)
At Inversoft, we like open source and we like Java.
When we built out our platform to support our new cloud product offerings we started using Chef to help us manage our deployment strategy.
When we began working on some new backend features for our cloud product offerings, I set out to find a Chef Client written in Java in order to simplify our integration.
As luck wouldn’t have it (yes, you read that correctly), I was unable to find a Java library that really made my life easier. There are other Chef libraries out there, but all of them were very lightweight wrappers around HTTP calls. Some went so far as to return the JSON response from the Chef server as a String rather than a proper POJO.
Rather than limping along with a library that was essentially a glorified URLConnection, I did what any software engineer would do: I wrote it myself.
Behold Barista! A native binding for Chef that provides rich domain objects and REST bindings to work with a Chef server.
Building a properly authenticated HTTP request to Chef is not great fun, so I don’t suggest you do it yourself unless you enjoy the pain. We’ve done the heavy lifting for you, and we did it without using any third-party encryption libraries. This means you can pick up this library without dragging along any unnecessary dependencies like Bouncy Castle.
CleanSpeak can filter many types of user-generated content (e.g., chat messages, forum posts and reviews). Running this material through CleanSpeak on a “per message” basis ensures each piece of content is acceptable before allowing it to be seen in your community. Filtering by message makes sense for these specific use cases. But what if you have big data that you want to filter as a whole?
According to Wikipedia, Batch processing is the execution of a series of jobs in a program on a computer without manual intervention (non-interactive). Strictly speaking, it is a processing mode: the execution of a series of programs each on a set or “batch” of inputs, rather than a single input (which would instead be a custom job).
So when might you consider batch processing?
Maybe you purchased a list of names & addresses and want to make sure they don’t contain any vulgar language before including them in your marketing campaign?
Perhaps you allow users to upload files and want to make sure they don’t contain inappropriate content?
Or you gather a list of reviews and want to check them all at once to ensure the language is acceptable before posting to your site?
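As a rough illustration of the batch idea, the Python sketch below checks a whole list of records in one non-interactive pass and splits it into accepted and rejected rows. The blocked-term list, the `contains_blocked_term` helper and the sample data are hypothetical stand-ins for a real filter; this is not the CleanSpeak API.

```python
import csv
import io
import re

# Hypothetical stand-in for a real filter's word list.
BLOCKED_TERMS = {"darn", "heck"}


def contains_blocked_term(text):
    """Return True if any whole word in `text` is on the blocked list."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKED_TERMS for word in words)


def filter_batch(rows, fields):
    """Check every row in one pass and split the batch into two lists."""
    accepted, rejected = [], []
    for row in rows:
        if any(contains_blocked_term(row[field]) for field in fields):
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected


# Made-up input; in practice this could be a purchased list or an uploaded file.
sample = io.StringIO(
    "name,address\n"
    "Jane Doe,1 Main St\n"
    "Heck Smith,2 Oak Ave\n"
)
rows = list(csv.DictReader(sample))
accepted, rejected = filter_batch(rows, ["name", "address"])
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The point of the design is that nobody reviews items one at a time: the whole list goes in, and a clean list plus a list of rows needing attention comes out.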
Earlier this summer, we published a comprehensive Guide to User Data Security detailing steps to harden a server and secure applications. To stand by our claim, we provisioned a couple of Linode servers and hardened them to the guide’s specifications. We shared the IP addresses and proposed a challenge.
Hack This: https://hackthis.inversoft.com
We dared anyone to hack our database. To add incentive, we offered a fully loaded MacBook Pro as a reward.