As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/

I think many Blogger users have been waiting for such improvements. For me, the most useful option is probably the robots settings.

Robots header tags


I used to have a special meta tag in my template so that search engines wouldn't index archive pages or search result pages. Now, I have set up this option:


You can read this page for details of those options. The robots tags are also available on a per-post basis, so you can change them for a specific post.

The result shows up in the HTTP response header, not in the HTML (response body):
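
Header-based robots directives are normally delivered in an X-Robots-Tag response header. As a minimal sketch, assuming the header value is a plain comma-separated list such as "noindex, nofollow" (the function names here are made up for illustration), checking it could look like this:

```javascript
// Split an X-Robots-Tag header value into lowercase directives.
// Assumption: a plain comma-separated list with no user-agent prefix.
function parseRobotsHeader(value) {
  return String(value || '')
    .split(',')
    .map(function (part) { return part.trim().toLowerCase(); })
    .filter(function (part) { return part.length > 0; });
}

// A page is treated as indexable unless "noindex" or "none" is present.
function isIndexable(value) {
  var directives = parseRobotsHeader(value);
  return directives.indexOf('noindex') === -1 &&
         directives.indexOf('none') === -1;
}

console.log(parseRobotsHeader('noindex, nofollow'));
console.log(isIndexable('noindex, nofollow'));
```

The header value itself can be read with any HTTP client; only the parsing is shown here.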

robots.txt

(added at 2012-03-27T10:35:58Z)

I noticed some Dynamic Views entries showing up in search results, and I do not want those, so I customized robots.txt, starting from the current default robots.txt on Blogspot:
User-agent: Mediapartners-Google
Disallow: 

User-agent: *
Disallow: /search
Disallow: /view
Allow: /

Sitemap: /feeds/posts/default?orderby=update
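
To see what these rules do, here is a rough sketch of how a crawler that follows the common longest-match convention would read the "User-agent: *" group. It is a simplification: no wildcards, and the function name is only illustrative.

```javascript
// Rules for "User-agent: *", copied from the robots.txt above.
var rules = [
  { type: 'disallow', path: '/search' },
  { type: 'disallow', path: '/view' },
  { type: 'allow', path: '/' }
];

// The longest matching path prefix wins; on a tie, allow wins.
// A path that matches no rule is allowed.
function isAllowed(path, ruleList) {
  var best = null;
  ruleList.forEach(function (rule) {
    if (!rule.path || path.indexOf(rule.path) !== 0) return;
    if (best === null ||
        rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.type === 'allow')) {
      best = rule;
    }
  });
  return best === null || best.type === 'allow';
}

console.log(isAllowed('/search?q=robots', rules));   // false
console.log(isAllowed('/view/classic', rules));      // false
console.log(isAllowed('/2012/03/post.html', rules)); // true
```

Since matching is by prefix, "Disallow: /view" also covers any other path that merely starts with /view.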

404


There is also a new setting for writing your own 404 message. The text (HTML is allowed) is basically put into a small message box, like the one with the Not Found message you get when you search for something that is not in your blog.

I wrote a 404 message that I think fits this blog's style. Try putting something in the URL and see for yourself.

There seems to be a new data:blog.pageType value, error_page. If you customize your template, this may come in handy when you want a totally different layout for the error page.

Redirection


The last one I think is useful is the Custom Redirection. I used to have a page called Selections, but I never updated it after I created it, so I deleted it a while ago. If I want, I can now set up a redirection for it.

The other scenario is a typo in a post title. We all make mistakes, and sometimes you want to fix them. Say you publish a post and only later notice a typo in its title. By then it may be too late to delete it and recreate the post with the correct title. You cannot change the slug in the post URL; you must re-post in order to fix it.

The post has probably already been indexed by search engines and even shared by your readers. If you delete it, whoever follows the link will not find it.

With this new redirection feature, there is no problem. Just publish the corrected post, then set up a redirection from the embarrassing one to the new one.

Search Description

To be honest, this is not so practical in my opinion. For the homepage,


Even though you can see Google uses post contents for the description, I don't really care, because it does not read like globgrubglab.

As for posts:


The first one should not have the post title and time tag in the description, but the second one is correct.

With the Search Description feature, the issues above can be avoided. The question is: are you willing to, and can you remember to, write a short description for each post? I am not and probably can't. But that's just me.

Still, this is a good addition; it's good to have even if I don't need it. You never know if someday this may save your life.

As some of you may know, you can search "time UK" for the current local time on Google or Yahoo. I use this feature quite often, but one thing I could never understand is why I can't query by timezone.

It makes no sense to me. Since it's about time, using the timezone as the key is so natural to me. When you are participating in international events remotely, the timetable does not always list a geographical location; a timezone abbreviation is what you are most likely to see. In practice, an event is often listed with two or three different local times with their respective timezones.

I just don't understand why Google engineers don't get it or, worst of all, have not even thought about it.

Okay, no timezones, fine. Then how about ISO-3166-1 alpha-2/3?
  • "time DE" gets you time in Delaware. I can't really argue about this one, because it's US-regional Google Search.
  • "time DEU" gets nothing.
  • "time GER" gets nothing.
  • "time germ" still nothing.
  • "time germa" still nothing.
  • "time german" frak still nothing.
  • "time germany" yea, finally.
"time CET" gets nothing on Yahoo, either, but "time DE" will get you the time in Delaware, and the time in Germany on the second line. You can also try "time NL".

Yahoo is smarter when you ask about time. Note that it's not as if Google can't give you a list of times for the key you use; see "time US".

Maybe time is not so important in Google?

I had thought about writing a web page that supports querying by timezone, but DST is really a pain in the ass and I don't really know how to get accurate DST start and end information.
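
In a current JavaScript runtime, the DST part can be delegated to the built-in timezone data through Intl.DateTimeFormat. The sketch below is only an illustration: the abbreviation table is hypothetical, and because abbreviations are ambiguous, each one is pinned to a single IANA zone by assumption.

```javascript
// Hypothetical lookup table; extend as needed. Each abbreviation is
// mapped to one IANA zone purely for illustration.
var ZONES = { CET: 'Europe/Berlin', JST: 'Asia/Tokyo', UTC: 'UTC' };

// Return "HH:MM" local time for a timezone abbreviation, or null if
// the abbreviation is not in the table. DST is applied by the runtime.
function timeIn(abbr, date) {
  var zone = ZONES[String(abbr).toUpperCase()];
  if (!zone) return null;
  var parts = new Intl.DateTimeFormat('en-GB', {
    timeZone: zone,
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23'
  }).formatToParts(date || new Date());
  var out = {};
  parts.forEach(function (p) { out[p.type] = p.value; });
  return out.hour + ':' + out.minute;
}

console.log('time CET: ' + timeIn('CET'));
console.log('time JST: ' + timeIn('JST'));
```

Passing an explicit Date as the second argument makes it easy to compare winter and summer results for the same zone.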


PURGE THEM!

Darn useless results and screw you shameless net garbage generator trash website owners.

Frak! Screw them!

Where is the quality you keep blah blah blah us?

When I saw Google release this Personal Blocklist extension, I tried it out in Chromium immediately. I would love to help improve the search results and punish those crappy websites.

There are many garbage-class websites around, and they really need to be spanked. You bad boys! Some are exactly what this extension intends to target: content farms. They are just awful; they fake content or gather others' content. Those website owners are shameless.

There is another type of website, the archive type. A typical case is a mailing list archive. I don't like this type of website, either. Often, they outrank the originals in search results.

Another type is the software download website. I give software download websites a very low score, partly because I don't need them: I am using Linux, so I either use the package manager or compile on my own. Some of those websites scrape FOSS hosting websites such as Google Code or Sourceforge, even the Chrome Store (formerly Chrome Extension).

Some websites I classify as rip-offs: StackOverflow rip-offs, Google Groups rip-offs, usenet rip-offs, manpages rip-offs. I dislike those; they have nothing original.

I am hoping someday, I will see a real improvement in search results.

As for the extension, I have been hoping Google would release the same thing for other browsers. But they haven't, and I don't know if Google will. Chrome is only the third most popular browser, and no web browser dominates the market. Though Internet Explorer has a 40%+ share according to Wikipedia, that still does not count as domination. I believe the number doesn't truly reflect everyone's preferred browser.

The blocklist is stored and processed on your computer. If you recall, there was a period of time when Google allowed you to delete a certain page from its results. I think it was a search experiment. It had a cross icon next to the page title; you clicked it, then you would see a Puff animation. Along with that cross icon, there was an Up arrow, which allowed you to pin a page, just like the current star icon.

But it has been gone for quite some time. If I recall correctly, the processing was done on Google's servers. It worked on a per-page basis, unlike this extension, which works on a per-domain basis. I really liked the feature; too bad it was removed.
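
The difference between per-page and per-domain blocking is easy to see in code. This is only a sketch of the per-domain idea, not how the extension is actually implemented; the blocklist entries and function names are made up.

```javascript
// Hypothetical blocklist, kept locally as a plain array of domains.
var blocked = ['contentfarm.example', 'ripoff.example'];

// True when the URL's host equals a blocked domain or is a subdomain
// of one, so a single entry hides every page on that site.
function isBlocked(url, list) {
  var host = new URL(url).hostname.toLowerCase();
  return list.some(function (domain) {
    return host === domain ||
           host.slice(-(domain.length + 1)) === '.' + domain;
  });
}

// Drop blocked results from a list of result URLs.
function filterResults(urls, list) {
  return urls.filter(function (url) { return !isBlocked(url, list); });
}

console.log(filterResults([
  'http://www.contentfarm.example/how-to-anything',
  'http://original.example/mailing-list/post'
], blocked));
```

A per-page blocklist would compare whole URLs instead of hosts, which is why it needs one entry for every unwanted page.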

I found that some search hits on my blog were actually false positives. The keywords the visitors used were actually matched in the Popular Posts list.

I don't like this, so I removed it. It's the same reason why my blog doesn't have Recent Posts or whatever other lists there are. I hate it when I click on a result and find out that it's not what I am looking for at all. A few times I have encountered results that came from a blog's blogroll. What can you do about it? Nothing; no one cares. Search engine providers do not care, blog owners do not care. Hits are all they care about. (The last sentence is not totally true, but close.)

I even added <meta content='noindex' name='robots'/> to yearly/monthly archive pages to prevent more such occurrences. Search engines are still too dumb, that's all I can say.

I am thinking of adding an AJAX version of the Popular Posts list, but I haven't got a nice idea for how to do so. Obviously, I couldn't get the list from the official gadget, since there is no API available.

If I could, I would put some buttons on the page. Visitors could decide whether they want to load lists such as Popular Posts, Recent Posts, Recent Comments, etc.

By the way, while I was removing the gadget, I saw Blogger has used a new good-looking dialog:

2010-11-21--07:18:27

Updated 2010-11-21


I was thinking of doing something like this, which uses the FeedBurner Awareness API to list popular entries. I have written working code; here is the diff against my existing script:

diff -r c29f74858f02 src/static/g/fb/fb.js
--- a/src/static/g/fb/fb.js Mon Nov 01 19:45:52 2010 +0800
+++ b/src/static/g/fb/fb.js Sun Nov 21 14:19:49 2010 +0800
@@ -33,6 +33,21 @@
   }
 
 
+function lilbtn_g_fb_get_resyndication_data(feed_url, dates, callback) {
+  if (!feed_url || !callback)
+    return
+  google.setOnLoadCallback(function() {
+    var query = "select * from xml where url='https://feedburner.google.com/api/awareness/1.0/GetResyndicationData?uri=" + encodeURIComponent(feed_url) + (!!dates ? "&dates=" + dates : "") + "'"
+    $.getJSON("http://query.yahooapis.com/v1/public/yql?q=" + encodeURIComponent(query) + "&format=json&callback=?", function(json) {
+      if (json.error)
+        callback(json)
+      else
+        callback(json.query.results.rsp);
+      });
+    })
+  }
+
+
 function lilbtn_g_fb_text_render(feed_url, container, _texts) {
   lilbtn_g_fb_get_feed_data(feed_url, function(rsp) {
     if (container == undefined)
@@ -89,4 +104,46 @@
       ele.attr('class', ele.attr('class') + ' lilbtn_g_fb_feedcount');
     });
   }
+
+
+function lilbtn_g_fb_popular_items_render(feed_url, container) {
+  lilbtn_g_fb_get_resyndication_data(feed_url, '2010-11-11,2010-11-19', function(rsp) {
+    if (container == undefined)
+      container = 'lilbtn_g_fb_popular_items';
+    var $ele = $('#' + container);
+    if (rsp.error || rsp.stat != 'ok') {
+      var msg = 'Error on retrieving resyndication data: ' + ((rsp.error) ? rsp.error.description : rsp.err.msg) + '; Click to check out lil\u2218btn website for help';
+      $ele.html('<a href="http://lilbtn.appspot.com/help" class="lilbtn_g_fb_popular_items lilbtn_error"><img src="http://lilbtn.appspot.com/img/error.png" alt="' + msg + '" title="' + msg + '"/></a>');
+      return;
+      }
+    var items = {};
+    $.each(rsp.feed.entry, function(idx_e, entry) {
+      if(entry.item)
+      $.each(entry.item, function(idx_i, item) {
+        if (items[item.url]) {
+          items[item.url].score += parseInt(item.itemviews);
+          }
+        else {
+          items[item.url] = {title: item.title, url: item.url, score: parseInt(item.itemviews)};
+          }
+        });
+      });
+    var items_sort = [];
+    $.each(items, function(url, item) {
+      items_sort.push([item.score, item]);
+      });
+    // Sort from highest to lowest score
+    items_sort.sort(function(a,b){return b[0] - a[0];});
+    items = $.map(items_sort, function(item) {return item[1]});
+    delete items_sort;
+    var $ul = $('<ul/>');
+    $.each(items, function(idx, item) {
+      var $li = $('<li/>');
+      var $a = $('<a/>').attr('href', item.url).text(item.score + ': ' + item.title);
+      $li.append($a);
+      $ul.append($li);
+      });
+    $ele.append($ul);
+    });
+  }
 // vim:ts=2:sw=2:et:ai:

I am not going to put this on. After I saw the results, I knew it was not a good idea, because only entries currently in the feed get view counts or click-throughs, which doesn't reflect the real view counts. You cannot really call them popular posts. "Popular recent posts" might make more sense.
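
For reference, the scoring step in the diff above can be separated from jQuery and the network call. This is a rough sketch, assuming the same input shape the diff reads (entries with an optional item list of title, url and itemviews); the function name is made up.

```javascript
// Sum item views per URL across all daily entries, then sort from the
// highest to the lowest score.
// Assumption: entry.item may be missing, a single object or an array,
// so it is normalized with [].concat before iterating.
function rankItems(entries) {
  var byUrl = {};
  entries.forEach(function (entry) {
    [].concat(entry.item || []).forEach(function (item) {
      var views = parseInt(item.itemviews, 10) || 0;
      if (byUrl[item.url]) {
        byUrl[item.url].score += views;
      } else {
        byUrl[item.url] = { title: item.title, url: item.url, score: views };
      }
    });
  });
  return Object.keys(byUrl)
    .map(function (url) { return byUrl[url]; })
    .sort(function (a, b) { return b.score - a.score; });
}

console.log(rankItems([
  { item: [{ title: 'A', url: '/a', itemviews: '3' },
           { title: 'B', url: '/b', itemviews: '5' }] },
  { item: { title: 'A', url: '/a', itemviews: '4' } },
  {}
]));
```

The ranking can only be as good as the data: a post that has dropped out of the feed never appears in the entries, so it never gets a score.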

I guess Popular Posts wouldn't come back anytime soon.

Or you will be googled for anything about you:

Google is more and more scary

I recently found out you can look up more than the weather and the time. Age, eye color, and hair color are also possible to get on Google.

Someday we might be able to type this "barack obama whole life."

Updated: I tried two more...

2010-11-12--22:18:39

2010-11-12--22:18:52

and

2010-11-12--22:22:14

I also tried children, daughter, and dog; those didn't work.

This is another not so useful script of mine. It is a Bash script that gathers search result counts via the Google/Yahoo/Bing APIs to make a historical chart for a specified keyword, using the Annotated Timeline (with no annotations, :-)) of the Google Visualization API.

I made this script, search-result-count.sh, because I wanted to have a historical chart for the keyword livibetter. Yep, I search for my nickname regularly, I admit it! I like watching the number of results climb, which makes me feel better. :-D

I wasn't planning on using Yahoo and Bing because their APIs require AppIDs, and the IDs would obviously be exposed to the public if I wrote this in Bash. I don't like that, but I couldn't resist seeing the result counts from them.

Because the script is still new, I do not have much data to show you. The data in the following chart was collected over about two weeks. (Bing results were not included.)



The following is the screenshot of rendered HTML page:

I googled myself and made a chart. on Twitpic

Please be aware of a few things if you want to use this script:
  • You can use cron to run it regularly, several times a day. Don't worry, it will only update the data file once a day.
  • It will only update when the counts from all three search engines are available. If any of them fails to return a result, you may end up with a missing data point. But that should be okay, since the counts do not change much from day to day.
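
The two rules above can be sketched as a small function. The actual script is written in Bash; this is only an illustration of the logic, and the row layout and names are assumptions.

```javascript
// rows:   existing data, each { date: 'YYYY-MM-DD', google, yahoo, bing }
// counts: the latest results; a failed engine is null or undefined.
// Returns a new array with the row appended, or the original array when
// today's row already exists or any of the three counts is missing.
function appendDaily(rows, date, counts) {
  var engines = ['google', 'yahoo', 'bing'];
  var complete = engines.every(function (name) {
    return typeof counts[name] === 'number' && !isNaN(counts[name]);
  });
  var already = rows.some(function (row) { return row.date === date; });
  if (!complete || already) return rows;
  return rows.concat([{
    date: date,
    google: counts.google,
    yahoo: counts.yahoo,
    bing: counts.bing
  }]);
}

var data = appendDaily([], '2010-11-01', { google: 1200, yahoo: 800, bing: 300 });
data = appendDaily(data, '2010-11-01', { google: 1210, yahoo: 805, bing: 310 });
console.log(data.length); // 1
```

Running it several times a day from cron is harmless under these rules, since a later run only fills in the day if the earlier ones failed.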