As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/
Showing posts with label blocklist.

While browsing commandlinefu.com, I came across the entry "Block the 6700 worst spamhosts":
wget -q -O - http://someonewhocares.org/hosts/hosts | grep ^127 >> /etc/hosts

As of this writing (2012-03-24T08:01:18Z), the list, maintained by Dan Pollock, has grown to 9,502 domains. That is insane! Look how many spam websites there are, although not all of them are spam: some of the entries are legitimate advertising distributors.
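If you want to check the size of the list yourself, a rough count of distinct blocked domains in a downloaded copy could look like the sketch below. It assumes active entries look like "127.0.0.1 domain" (or "0.0.0.0 domain"), one per line, which is what the grep ^127 one-liner above relies on. count_blocked_domains is a hypothetical helper name; it skips commented lines, localhost, and duplicates, so treat the result as an approximation.

```shell
#!/bin/bash
# Hypothetical helper: count distinct blocked domains in a hosts-format
# blocklist. Assumes active entries look like "127.0.0.1 domain" or
# "0.0.0.0 domain"; lines starting with "#" never match and are skipped.
count_blocked_domains() {
  awk '
    ($1 == "127.0.0.1" || $1 == "0.0.0.0") && $2 != "" && $2 != "localhost" {
      if (!seen[$2]++) n++   # count each domain only once
    }
    END { print n + 0 }
  ' "$1"
}

# Usage, with a downloaded copy of the list saved as hosts.hosts:
#   count_blocked_domains hosts.hosts
```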

To be honest, I was really tempted to use it, but the sheer number of entries held me back.

If you want to try it, here is a short script you can run as a system cron task. I haven't tested it and wrote it on the fly, so use it at your own risk:
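#!/bin/bash
cd /etc || exit 1
# Stop if the current hosts file hasn't been saved as hosts.local yet
[[ -f hosts.local ]] || exit 1
# -z hosts.hosts: download only if the remote file is newer than our copy
# -w '%{http_code}': print the HTTP status; 200 means a new copy was saved
if [[ "$(curl -s -L -z hosts.hosts -o hosts.hosts -w '%{http_code}' http://someonewhocares.org/hosts/hosts)" == "200" ]]; then
  # Rebuild /etc/hosts: local entries first, then the downloaded list
  cat hosts.local hosts.hosts > hosts
fi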
Before the first run, save your current hosts file (as root):
cp /etc/hosts{,.local}
The script concatenates your current hosts file and the one downloaded from the website. Set it up as a daily cron task; it only downloads the file when it has been updated, using the method described in this blog post.

Be sure to read the comments on the website, which offers some alternative modifications and even an RSS feed for update notifications.

Yesterday, I wanted to turn off Personalized Results so I could see what a signed-out user would see. I didn't want to use a second browser, a private session, or the like, because I remembered there used to be a link below the search results that temporarily turned off personalized results, but it has long gone.

I headed over to Search Settings. Funnily enough, I didn't see such an option there, even though a help entry, "Turn off personal results," mentions it. I don't have Web History enabled, but I do use Google+, and I can see the additional results and annotations from my social circles. Sometimes Google Help doesn't help at all because things have changed but the help documentation hasn't been updated.

Anyway, what I did see was "Blocking unwanted results." I don't know how long it has been sitting there, but I am glad I finally noticed it!


"You may block up to 500 sites." Oh yeah, that should be enough for me. I tried one and the website was removed from the results immediately.


I don't intend to block spam websites or content farms; there may be more of those sites than cockroaches in the entire world. I want to block some archive websites, as I mentioned in a post about the Google Chrome Personal Blocklist extension.

As for spam, I can only hope Google Search's algorithm keeps it from ever seeing the light, rusting in the deepest dungeon. Don't even think about it: they are rats, not dragon slayers. This is not a video game, you nerdy dude! ;)

As for archive-type websites, I don't need to see them when the original source is public, even though they are legitimate by my definition.

I have seen them outrank the original source many times, which is never a good thing in my opinion. Besides, a single source can be archived by four or more websites. That means in every five results there may be only one piece of unique content; the only differences are the design/layout and the ads.

Back to the blocking setting: you can download the list as a text file, but I don't see an option to upload one. If you have many entries, an upload option would make the list easy to maintain on your end with your favorite editor.

How so? I wouldn't keep it as plain lines of URLs; I would group the entries with comments. For example, if an original site is down, I could comment out the archive sites group and upload the temporarily updated list, so those blocked archive sites show up again.

Of course, I would have a custom shell script to generate the text file for uploading, with the comments removed from the output. Actually, you only need one grep.
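Since there is no upload option, this is only a sketch of the preparation step described above, under my own assumptions: one site per line, and full-line "#" comments marking groups. strip_comments is the single grep; build_upload_list is a hypothetical helper that also checks the result against the 500-site limit the blocking settings mention.

```shell
#!/bin/bash
# Sketch: turn a hand-maintained, commented blocklist into a plain list
# of sites, one per line. The layout below is only an example:
#
#   # == Archive sites ==
#   archive.example.com
#   # == Software download sites ==
#   downloads.example.net
#
# The "one grep": drop full-line comments and blank lines.
strip_comments() {
  grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Hypothetical helper: write the cleaned list to $2 and refuse lists
# longer than the 500-site limit shown in the blocking settings.
build_upload_list() {
  local src=$1 out=$2 n
  # grep exits 1 when every line was a comment; an empty list is fine here
  strip_comments "$src" > "$out" || true
  n=$(wc -l < "$out")
  if (( n > 500 )); then
    echo "warning: $n sites, more than the 500 allowed" >&2
    return 1
  fi
}
```

For example, build_upload_list blocklist.txt upload.txt (both file names hypothetical) would leave only the active sites in upload.txt; disabling a group is just a matter of commenting out its lines before running it again.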

But that's just how I would use it if uploading were possible. Who knows, we might even get an API for this. I really just hope this setting stays; Google has shut down too much good stuff.

Alright, enough talking; time for action. Gotta add more blocked sites to the list, it's Friday!

When I saw that Google had released the Personal Blocklist extension, I tried it out in Chromium immediately. I would love to help improve the search results and punish those crappy websites.

There are many garbage-class websites around, and they really need to be spanked. You bad boys! Some are exactly what this extension intends to target: content farms. They are just awful; they fake content or gather other people's content. Those website owners are shameless.

There is another type of website: the archive type. A typical case is a mailing list archive. I don't like this type of website, either. They often outrank the originals in search results.

Another type is the software download website, which I rate very low. Partly that's because I use Linux, so I don't need those websites: I either use a package manager or compile software myself. Some of those websites scrape FOSS hosting sites such as Google Code or SourceForge, and even the Chrome Store (formerly Chrome Extensions).

Some websites I classify as rip-offs: StackOverflow rip-offs, Google Groups rip-offs, Usenet rip-offs, manpage rip-offs. I dislike those; they have nothing original.

I hope that someday I will see a real improvement in search results.

As for the extension, I have been hoping Google would release the same for other browsers. They haven't, and I don't know if they will. Chrome is only the third most popular browser, and no web browser dominates the market. Although Internet Explorer has a 40%+ share according to Wikipedia, that still doesn't count as domination. I also believe that number doesn't truly reflect which browser everyone prefers.

The blocklist is stored and processed on your computer. If you recall, there was a period when Google let you remove a specific page from its results. I think it was a search experiment. There was a cross icon next to the page title; when you clicked it, you saw a "puff" animation. Next to the cross icon was an up arrow, which let you pin a page, just like the current star icon.

But it has been gone for quite some time. If I recall correctly, that processing happened on Google's servers, and it worked on a per-page basis, unlike this extension, which works on a per-domain basis. I really liked that feature; too bad it was removed.
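To make the per-page versus per-domain difference concrete, here is a small bash sketch. It is an illustration under my own assumptions, not how the extension or Google's servers actually matched results; url_host, blocked_by_domain and blocked_by_page are hypothetical names.

```shell
#!/bin/bash
# Illustration only, not the extension's or Google's actual logic.
# All function names here are hypothetical.

# Host part of a URL, lowercased: http://WWW.A.com:80/x?y -> www.a.com
url_host() {
  local rest=${1#*://}    # drop the scheme
  rest=${rest%%[/?#]*}    # drop path, query string and fragment
  rest=${rest%%:*}        # drop a port number
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Domain basis: blocking example.com hides every page on example.com
# and on any subdomain such as www.example.com.
blocked_by_domain() {   # $1 = result URL, $2 = blocked domain
  local host
  host=$(url_host "$1")
  [[ $host == "$2" || $host == *."$2" ]]
}

# Per-page basis: only the exact page that was removed is hidden.
blocked_by_page() {     # $1 = result URL, $2 = removed page URL
  [[ $1 == "$2" ]]
}
```

One domain entry covers every page a site will ever have, which is why a domain blocklist stays short; a per-page list would need a new entry for each unwanted result.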