I always want to know who links to my blog, so I check the referrer data in Blogger Stats and the Google Analytics reports, and I have also set up Google Alerts. I even search for this blog's domain name to see if there are any new hits. (Use the "Past 24 hours" time range; it's very useful.)

But that doesn't seem to be enough for me. Those methods always seem to miss some links, and Alerts hasn't sent me anything for a long time.

In Webmaster Tools, you can download a CSV of incoming links by clicking the "Download more sample links" button (so, this is not complete?) under Your site on the web / Links to your site / Who links the most / "More". (Lost?)

It is a list of external pages that link to your site. Since the list is long, there is no humanly possible way to tell which links are new.

So I wrote a simple Bash script to do the job. Run it with the CSV files as arguments.
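To give a rough idea of the approach, here is a minimal sketch, not the original script. It assumes the URL sits in the first CSV column and that already-known links are kept in a plain text file; the file name seen-links.txt and the function names are made up for illustration. URLs that contain commas would be cut short by this simple parsing.

```bash
#!/usr/bin/env bash
# Sketch only, not the original script.
# Assumptions: the URL is the first CSV column; known links are stored in
# a hypothetical file named seen-links.txt (override with SEEN_FILE).

SEEN_FILE="${SEEN_FILE:-seen-links.txt}"

# Print the first column of every given CSV, strip quotes and carriage
# returns, and keep only lines that look like URLs (this drops header rows).
extract_urls() {
  cut -d, -f1 "$@" | tr -d '"\r' | grep -E '^https?://' | LC_ALL=C sort -u
}

# Print URLs that are not in SEEN_FILE yet, then record them so that the
# next run no longer reports them.
new_links() {
  local known new
  touch "$SEEN_FILE"
  known=$(mktemp)
  LC_ALL=C sort -u "$SEEN_FILE" > "$known"
  # comm -23 keeps lines that appear only in the first (new) input.
  new=$(extract_urls "$@" | LC_ALL=C comm -23 - "$known")
  rm -f "$known"
  if [ -n "$new" ]; then
    printf '%s\n' "$new" | tee -a "$SEEN_FILE"
  fi
}

# Run only when CSV files are given on the command line.
if [ "$#" -gt 0 ]; then
  new_links "$@"
fi
```

Saved under a name such as new-links.sh (again, a made-up name), it could be run as `bash new-links.sh *.csv`. Because every reported URL is appended to the text file, running it again on the same CSV files prints nothing.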


You can run it with CSV files from different websites; it has no problem with that. Once the CSV files are processed, they are safe to remove. You only need to keep the first two files in the last file list in the screenshot above.

This script has a few predefined regular expressions to filter out some common duplicate URLs, such as WordPress's and Blogger's archive or index-like pages. What you really want to see are the posts that link to your website in their content.
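As an illustration of that kind of filtering, here is a small sketch. The patterns are my guesses at typical Blogger and WordPress archive, label, category, tag and paging URLs, not the expressions from the actual script, and the names SKIP_PATTERNS and filter_posts are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative patterns only; they are assumptions, not the script's own.
#   /search/label/, /search?   Blogger label and search pages
#   _archive.html              Blogger archive pages
#   /category/, /tag/, ...     WordPress index-like pages
#   /page/N/                   paged listings
#   /YYYY/ or /YYYY/MM/        date archives (URL ends right after the date)
SKIP_PATTERNS='/search/label/|/search\?|_archive\.html|/(category|tag|author)/|/page/[0-9]+/?$|/[0-9]{4}/([0-9]{2}/)?$'

# Read URLs on stdin, drop the ones matching any pattern, print the rest.
filter_posts() {
  grep -Ev "$SKIP_PATTERNS"
}
```

Feeding a list of URLs through `filter_posts` should leave post URLs such as http://example.blogspot.com/2011/05/some-post.html untouched, since the date pattern only matches when the URL ends right after the year or month.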