As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/
Showing posts with label blocking.

I began to use Block unwanted websites in late February. Just a month later it broke, and it has stayed broken for almost four months now; a discussion about it was started back in late March. As with some Google products, if it's not a currently hot product like Google+, you often get a late response, or nothing at all, from Google's staff. Luckily, this time we did get a couple of replies from a Google employee.

On March 18, the OP posted about the issue; three months later, on June 19, a Google employee finally replied to acknowledge it. Better late than never, right?

Half a month later, on July 4, a second reply from the same employee said the team was working on the issue and offered little more than a workaround for unblocking, which I don't need, since I already know about the unblocking function. The most important part is still missing: no mention of the problem itself, of why the function doesn't work.

When I first noticed the issue, it looked as if someone had deliberately pulled a minified JavaScript file from Google's server.


To me, it doesn't look like something actually broke in the code. Like I said, it looks like the script was pulled, and therefore the functions are simply not available.

I don't know what the real cause is, only Google does, but I guess we, the users, must have blocked a lot of sites. 500 sites are allowed per user; that is a lot. Maybe Google can't handle that kind of per-user filtering. But that's only my guess.

The thing is, if they did pull the JavaScript, Google must have known from the beginning, and I really hate Google for the late response and the lack of proper handling. Besides, when suddenly no one could block websites, they must have noticed the database stopped growing. There is no way on Earth they didn't know when the issue appeared. They are Google; this blocking data is worth at least some statistics, even if they never planned to use it in the ranking algorithm.

If they want to pull the functionality, that's fine by me. But they need to tell people. Just put up a notification saying the function is temporarily disabled; that would be okay with most people. Disappointing, yes, but much better than not knowing the cause.

I sincerely feel I have come to dislike Google's way of managing things more and more over recent years, and this case is just one of the reasons. They keep talking about government transparency, but they aren't even transparent enough to tell us the cause. They don't need to give technical details; most people wouldn't understand them anyway. A simple summary would satisfy those of us who have been waiting for an answer and a resolution for nearly four months.

Assuming this referrer is truly from Scansafe.


I don't know how this Scansafe works. Are they (or their client) trying to block that page, or have they already blocked it for some reason? If so, for what? And I can't find a way to check on their website. By the way, their website has a big chunk of Flash.

Because it is a referrer, someone must have been viewing that page, and its URL is that looooooooooooooong. Only a few days ago, I posted about not-so-good URLs; Google Search's are long, but this Scansafe one is a real champion.

I had to mask portions of the screenshot. I didn't try to decrypt the string, but someone might want to, and they might own malicious websites, which would certainly qualify as the kind of dangerous sites that whoever uses Scansafe gets checked against. It is like a reverse honeypot.

That long cryptic text may or may not contain encrypted sensitive information, but I would guess it does not. You hardly ever see URLs mistakenly include sensitive information nowadays.

There is one more strange thing: HTTP. I thought a Referer header is not supposed to be sent when moving from an HTTPS page to an HTTP one; checking the spec, RFC 2616 (section 15.1.3) indeed says clients should not include a Referer header in a non-secure request if the referring page was transferred with a secure protocol. Anyway, Scansafe is a security product, so how come it is only an HTTP connection, when a client should be ensured maximal security while using the Scansafe website?

Of course, all of the above assumes the referrer is legitimate. What if it is not, if it is bogus? Then the question is who sent it, and why.

If it was sent by Scansafe for some testing or checking purpose, then they are being a bad bot; if it was sent by someone else, then what is the purpose of impersonating Scansafe? I don't have an answer for that.

Off-topic: what is a good way to block by a specific referrer on Blogger? It seems JavaScript is the only way, but that is not real blocking, just masking the content when a certain referrer matches.

When I was browsing on commandlinefu.com, I saw this entry, Block the 6700 worst spamhosts: (URL edited to point to the plain text file)
wget -q -O - http://someonewhocares.org/hosts/hosts | grep ^127 >> /etc/hosts

As of writing (2012-03-24T08:01:18Z), the list, maintained by Dan Pollock, has grown to 9,502 domains. That is insane! See how many spam websites we have; although not all of them are spam, some entries are legitimate advertising distributors.
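If you are curious, you can count the entries yourself; a quick one-liner, assuming the list still marks its entries with 127 as in the command above:
curl -s http://someonewhocares.org/hosts/hosts | grep -c '^127'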

To be honest, I was really tempted to use it, but the huge number of entries held me back completely.

If you want to try it, I can propose a short script to run as a system cron task. I didn't test it and I am writing it on the fly, so use it at your own risk:
#!/bin/bash
cd /etc || exit 1
# bail out if the original hosts file has not been saved as hosts.local
[[ ! -f hosts.local ]] && exit 1
# -z only downloads when the remote file is newer than hosts.hosts,
# and curl reports 200 only when a fresh copy was actually fetched
if [[ "$(curl http://someonewhocares.org/hosts/hosts -z hosts.hosts -o hosts.hosts -s -L -w '%{http_code}')" == "200" ]]; then
  cat hosts.local hosts.hosts > hosts
fi
You will need to run this as root first:
cp /etc/hosts{,.local}
The script concatenates your current hosts.local with the list downloaded from the website. Set up a daily cron task for it; it will only download the file when the file has actually been updated, using the method described in this blog post.
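For example, assuming you saved the script as /usr/local/sbin/update-hosts.sh (the name and path are just my example), a line like this in root's crontab (crontab -e) would run it once a day:
# run the hosts updater daily at 04:00
0 4 * * * /usr/local/sbin/update-hosts.sh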

Be sure to read the comments on the website, which also provide some different modifications and even an RSS feed for notifications.

Yesterday, I was thinking of turning off Personalized Results, so I could see what a non-signed-in user would see. I didn't want to use a second browser, a private session, etc., because I remembered there used to be a link below the search results to temporarily turn off personalized results, but it is long gone.

I headed over to the Search Settings. The funny thing is, I didn't see such an option, even though a help entry, Turn off personal results, mentions it. I don't have Web History enabled, but I do have Google+, and I can see the additional results and annotations from my social circles. Sometimes Google Help doesn't help at all, because things have changed but the help documentation hasn't been updated.

Anyway, what I did see is Blocking unwanted results. I don't know how long it has been sitting there, but I am glad I finally noticed it!


"You may block up to 500 sites." Oh yeah, that should be enough for me. I tried one and the website was removed from the results immediately.


I don't intend to block spam websites or content farms; those sites probably outnumber the cockroach population of the entire world. I want to block some archive websites, as I mentioned in a post about the Google Chrome Personal Blocklist extension.

As for the spam sites, I can only hope Google Search's algorithm makes sure they never see the light of day, rusting in the deep dungeon. Don't even think about it: they are rats, not dragon slayers. This is not a video game, you nerdy dude! ;)

As for archive-type websites, I don't really need to see them when the original public source is available, even though they are legit by my definition.

I have seen many times that they outrank the original source, which is never a good thing in my opinion. Besides, one source could have four or more websites archiving it. That means among every five results there is only one piece of unique content; the only differences are the design/layout and the ads.

Back to the blocking setting: you can download the list as a text file, but I don't see an option to upload one. If you really have many entries, it would be easy to maintain them on your end with your favorite editor.

How so? Because I wouldn't just keep it as simple lines of URLs; it would be grouped with comments. For example, if an original source site went down, I could comment out the archive-sites group, then upload the temporarily updated list, so I could see those otherwise-blocked archive sites.

Of course, I would have a custom shell script to generate the upload list, with the comments removed from the output. Well, you only need one grep, actually.
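A minimal sketch, with made-up file names:
# keep only non-comment, non-blank lines for uploading
grep -v -e '^#' -e '^$' blocklist-master.txt > blocklist-upload.txt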

But that's just how I would like to use it, if uploading were possible. Who knows, we might even get an API for this. I only really hope this setting will stay; Google has terminated too much good stuff.

Alright, enough talking, time for more action. Gotta add more blocked sites to the list. It's Friday!