Blog spam...it's the new (well, not so new) scourge of the internet. On my site alone I get several hundred attempts a day from spammers wanting to leave their garbage on my blog. Fortunately Community Server ships with a built-in spam blocking tool; unfortunately it's not very well documented and can be a bit vague to new users/site admins. In this Tidbit I'll attempt to clear up any ambiguity about the Spam Blocker and demonstrate how to get it configured properly so that site admins can (hopefully) rid their sites of spam.
CS uses a rules-based scoring system to try to identify potential spam. Any content CS receives has to pass through the rules configured in the Spam Blocker (content meaning a blog post, a comment on a blog, a post to a forum or gallery, etc...anything and everything that users can post to your site). Within each rule you specify how to score the content by assigning points (I like to call them penalties) based on criteria that you define in the rule. When content comes into CS it has a score of 0; as it passes through the rules engine it accrues penalties based on the thresholds you define within each rule. By default, the Spam Blocker marks content as spam at 5 penalties (meaning it will still get stored in your database but won't be published...you can review that content to decide if it actually is spam) and automatically deletes content that reaches 10 penalties. These are the values I will use in this post; both are configurable in the Spam Blocker administration tool in Control Panel.
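To make the scoring model concrete, here is a minimal sketch in Python. It is not Community Server code; the names (`MODERATE_AT`, `DELETE_AT`, `classify`) are made up for illustration, and I'm assuming "reaches" means "greater than or equal to" for both thresholds.

```python
from enum import Enum

# Thresholds quoted in this post; both are configurable in Control Panel.
MODERATE_AT = 5   # stored but not published, held for review
DELETE_AT = 10    # deleted automatically


class Verdict(Enum):
    PUBLISH = "publish"
    MODERATE = "moderate"
    DELETE = "delete"


def classify(penalties_per_rule):
    """Sum the penalties from every rule and map the total to a verdict.

    Assumption: a threshold is hit when the total is >= the threshold.
    """
    total = sum(penalties_per_rule)
    if total >= DELETE_AT:
        return Verdict.DELETE
    if total >= MODERATE_AT:
        return Verdict.MODERATE
    return Verdict.PUBLISH
```

So content that picks up 5 penalties from one rule and 5 from another would be deleted, while content that picks up 2 from each of two rules would still be published.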
Let's dig a little deeper as this can seem a bit complex at first glance. The 4 rules that ship with CS are:
- Forbidden Word Rule: Rates spam factor based on the existence of configured forbidden words. By default this rule will assign 5 penalties for each forbidden word (from the list you define within the rule) that a post contains. So, if you want to flag posts that mention Windows or Linux as spam (just using a hypothetical example), you would add those words to the list, and if content comes in that mentions both of these words, it would accrue 10 penalties and be automatically deleted. If it mentions just one of the words it will accrue 5 penalties and then you can decide whether to publish the content or not.
- Bad Word Count Rule: Rates spam factor based on number of occurrences of configured bad words. By default this rule will assign 2 penalties per occurrence of a specified bad word (and CS ships with a predefined list). Again, this is configurable within the rule itself so if you wanted to allow 3 occurrences of a bad word before assigning penalties, you'd simply change the "maximum number of times" setting to 3.
- Link Count Rule: Rates spam factor based on the number of links (hrefs) in the post. This rule has no default settings, but it is the most important rule to configure IMO as it will weed out 90% of spam; most spam is simply a list of links. My experience (at least as it relates to blogging; this is probably very different when it comes to the forums realm) has been that any content with more than 3 links is probably spam, so I assign 5 penalties per link that exceeds this threshold.
- IP Count Rule: Rates spam factor based on the number of recent posts from the same IP address. This rule is a little more complicated than the other rules, but it is also a key one to configure properly as most spam is automated; the spammer connects to your site with a bot and posts as much content as possible within a short timeframe (usually just a couple of minutes). This rule assigns penalties when the number of posts from the same IP address within a specified time window exceeds a limit you set. My settings are to assign 5 penalties to content that exceeds 5 attempts within 60 seconds. You can exclude IP addresses from this rule if you're running tests on your site or whatnot, or if you are a CS application owner and regularly make posts with lots of links you can exclude your IP so that your content won't be flagged as spam.
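Here is a rough Python sketch of how the Link Count and IP Count rules behave with my settings. This is an illustration only, not how CS implements them; `link_count_penalty`, `IpCountRule`, and all of their parameter names are hypothetical, and counting `href=` occurrences is just my stand-in for however CS counts links.

```python
import re
from collections import defaultdict, deque

# Assumption: a "link" is any href attribute in the posted markup.
HREF_PATTERN = re.compile(r"href\s*=", re.IGNORECASE)


def link_count_penalty(body, max_links=3, penalty_per_extra_link=5):
    """Assign a penalty for every link beyond max_links."""
    links = len(HREF_PATTERN.findall(body))
    return max(0, links - max_links) * penalty_per_extra_link


class IpCountRule:
    """Penalize posts from an IP that exceeds max_posts within window_seconds."""

    def __init__(self, max_posts=5, window_seconds=60, penalty=5, excluded_ips=()):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self.penalty = penalty
        self.excluded_ips = set(excluded_ips)
        self._recent = defaultdict(deque)  # ip -> timestamps of recent posts

    def score(self, ip, now):
        """Record a post from ip at time now (in seconds) and return its penalty."""
        if ip in self.excluded_ips:
            return 0
        stamps = self._recent[ip]
        stamps.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while now - stamps[0] > self.window_seconds:
            stamps.popleft()
        return self.penalty if len(stamps) > self.max_posts else 0
```

With these settings a comment containing 5 links picks up 10 penalties (2 links over the limit, 5 penalties each), and the sixth post from one IP inside a minute picks up 5.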
Generally speaking, these rules should cover all of your spam blocking needs. The key factors to consider when configuring your Spam Blocker are:
- How many penalties to assign content based on exceeded thresholds (this seems to trip up site admins more than any other factor).
- How many penalties content can accrue before being automatically moderated and/or deleted.
This will of course vary from site to site. On my site (which is blog-centric) I only have the Link Count and IP Count rules enabled and am virtually spam free (at the very least the spam isn't published to my site; I can go into Control Panel and delete it). For a forums-centric site the Forbidden Word and Bad Word Count rules are probably equally important and should be configured properly as well, but the Link Count rule would need a higher link count threshold, since legitimate forum posts tend to include more links.
Of course, like the rest of CS, the Spam Blocker is extensible and you can write your own rules if you'd like (which is outside the scope of this post, but perhaps I'll delve into that later). For example, you could write a rule that permanently bans an IP address if it exceeds certain thresholds, or, as Thomas Freudenberg did, you can write a rule that plugs into a third-party service (Akismet in this case).
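Just to give a feel for the IP-banning idea, here is a loose Python sketch. It has nothing to do with the actual CS extensibility API (which I'm not covering here); `BanRepeatOffenderRule` and everything in it is hypothetical, including the choice of 3 flagged posts as the ban threshold.

```python
class BanRepeatOffenderRule:
    """Hypothetical custom rule: ban an IP after too many flagged posts."""

    def __init__(self, max_flagged=3, penalty=10):
        self.max_flagged = max_flagged
        self.penalty = penalty
        self._flagged = {}   # ip -> number of posts already flagged as spam
        self.banned = set()

    def record_flagged(self, ip):
        """Call this whenever another rule has flagged a post from ip."""
        self._flagged[ip] = self._flagged.get(ip, 0) + 1
        if self._flagged[ip] >= self.max_flagged:
            self.banned.add(ip)

    def score(self, ip):
        """Banned IPs get enough penalties to hit the delete threshold."""
        return self.penalty if ip in self.banned else 0
```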
The Spam Blocker is a fantastic addition to Community Server, and when configured properly works quite well. Hats off to Jose Lema (aka the Fajita Man) for doing the bulk of the work on this; well done sir.