Last week when I was bitching about 4Comedy.com’s stealing my humor blog’s content, I had no idea I was dealing with a “feed scraper” site. All I knew was that I was being ripped off — every time I posted in this humor blog, my entire post appeared on 4Comedy.com within minutes. Needless to say, I was a very pissed off blogger.
So what did I do? First I posted comments at 4Comedy.com demanding that its thievery stop. At least I tried to. But not surprisingly, my comments never appeared there.
Next I reported its copyright violations to Google AdSense, via a link at 4comedy.com’s site. The following day I received an email telling me how to formally file a Google AdSense DMCA (Digital Millennium Copyright Act) infringement complaint.
Unfortunately, the procedure involved tons of time-consuming work, requiring me to assemble all sorts of documentation of 4Comedy.com’s many infringements. Then (I’m told) this documentation is forwarded to the alleged infringer. And after that, heaven only knows what happens.
I responded to Google’s email with a request for some accelerated action. My justification was that 4comedy.com had even reproduced, at http://4comedy.com/?p=463, the very post in which I called it a content thief.
I didn’t receive any response to my email, but apparently it got their attention. How do I know? Because today the Google AdSense text ads disappeared from the top of 4Comedy.com’s site. Hallelujah!
But I’m getting ahead of myself. While I was waiting to hear from Google, I did some research on how to deal with RSS feed scraping content thieves. And I found some great resources, including:
1. AntiLeech, a plugin that “helps prevent content theft by sploggers,” and a detailed article explaining the benefits of AntiLeech, Splog Stopper: Fighting Back Against Content Thieves;
3. A tutorial, Blocking bad bots and site rippers (aka offline browsers);
4. An article entitled How you can stop dirty feedscrapers in 3 easy steps; and
5. This article about Attacking scrapers and content thieves legally.
I found the material posted at all of those links very informative, and I’m planning to give that AntiLeech plugin a try. But I was feeling a bit lazy and I was looking for some instant gratification. And, happily, I found it: A commenter named Robert posted the following suggestion here:
Alternatively, to curl up even less unproductive work, add this line to .htaccess:
Deny from 74.52.58.162
Which would even allow you to block a whole range of IP addresses in case it proves necessary…
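Robert’s range-blocking idea might look something like the sketch below. It assumes the same Apache 2.2-style access directives as his one-liner (on Apache 2.4 these only work through the mod_access_compat module), and the address block is purely illustrative: nothing here establishes which range, if any, a given scraper actually controls.

```apache
# Hypothetical example: the /24 range below is an assumption for
# illustration, not a range known to belong to any scraper.

# Evaluate Allow rules first, then Deny rules; a request that
# matches both is denied.
Order Allow,Deny
Allow from all

# Block a single address (the one from Robert's comment)
Deny from 74.52.58.162

# Block a whole range, 74.52.58.0 through 74.52.58.255, in CIDR notation
Deny from 74.52.58.0/24
```

Keep in mind that a range block also shuts out every legitimate visitor who happens to share that range, so it seems wise to keep it as narrow as possible.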
Armed with this simple-sounding solution, I decided to try it. My first step was to identify 4Comedy.com’s IP address. So I checked its trackback data, which showed the IP as 74.53.110.146. I then confirmed that address by pinging the site from my computer’s Run box: ping 4comedy.com. Next I checked my server logs and verified that the same IP was routinely showing up there.
Now that I had the infringer’s IP, I added this code to my .htaccess file:
Deny from 74.53.110.146
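A note for anyone on a newer server: Deny from is Apache 2.2-era syntax, and on Apache 2.4 it only works if the mod_access_compat module is loaded. A rough 2.4-style equivalent might look like the sketch below. It assumes mod_authz_core and mod_authz_host are available and that your host lets you use these directives in .htaccess (AllowOverride AuthConfig), so check with your host before relying on it.

```apache
# Assumes Apache 2.4 with mod_authz_core and mod_authz_host.
# Every condition inside RequireAll must hold for a request to get in.
<RequireAll>
    # Let everyone in ...
    Require all granted
    # ... except requests coming from the scraper's address
    Require not ip 74.53.110.146
</RequireAll>
```

Either way, requests from the blocked address should get a 403 Forbidden response instead of your feed.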
Finally, I uploaded the revised .htaccess file via FTP and, like magic, 4Comedy.com’s content thievery came to a halt: MadKane.com has been freed from the slings and arrows of 4Comedy.com’s feed scraping infringements.
Of course, that low-life feed scraper is still taking material from other sites like Comedy Central. Hey Comedy Central! Try this. You’ll like it.
UPDATE WITH ADDITIONAL RESOURCES: As I learn about additional good resources on this topic, I’ll be adding them here. Feel free to make suggestions via a comment to this post.
a: What To Do When Someone Steals Your Content is a must read.
b: Dnsstuff.com is a source of many fine tools, including a “DNS Lookup” tool — helpful in ascertaining IP addresses.
c: Plagiarism Today is an excellent source of information about plagiarism, content theft, and copyright issues online.
UPDATE 2: 4Comedy.com seems to have disappeared. I can only hope it stays that way.
UPDATE 3: 4Comedy.com now leads to a different domain: domainnamesbusiness.info/4comedy/index.php. But the IP is the same, so my blog should still be protected from its feed scraping.