As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/

ASCII-Pony is a screenshot information tool, but instead of displaying a distribution logo like screenFetch does, it shows ponies from My Little Pony: Friendship Is Magic (2010-), an animated television show for children.

https://lh3.googleusercontent.com/-4firlCDbHHk/VohyIxUeyII/AAAAAAAAIyM/l6fTI8UqOXQ/s800-Ic42/ASCII-Pony.gif

The script is even named systempony, how fitting for this pony overloading tool. You pony it for a screenshot; pretty sure it's a verb now. As for what it actually means, I have no clue.

christma.sh shows a calm animation with music. It looks like a Christmas card on a computer screen.

https://lh3.googleusercontent.com/-bF_h4TGMCvE/Vn3P-fHy55I/AAAAAAAAIu8/81X5epr0Bwc/s800-Ic42/christma.sh.gif

The snowman is adorable with its not-so-little cane. The focal point, the Christmas tree, stands right in front of you without any snow on it; it must be a magical tree. Snow falls in the background, and smoke comes out of the chimney, which looks very creative in ASCII art.

At first glance, I wished there was a bit of red here and a bit of green there. However, the more I look at it, the less it needs any colors; simple monochrome does better. It's white, and that's Christmas.

christma.sh is written in Bash under the MIT License, currently git-07076bc (2015-12-25) with music Faith Noel (2004) by Trans-Siberian Orchestra.

A couple of years back I wrote about the builtin vs. keyword conditional expressions, but I didn't think about another conditional use, with arithmetic, that is, tests like [ $a -gt $b ] or [[ $a -ne 0 ]].

I used code similar to the following to benchmark 1 > 0; this is not the real code, but you get the idea:

time for ((i = 0; i < 10000; i++)); do <test> 1 >/-gt 0 ; done
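For example, the four concrete loops would look roughly like this (my reconstruction from the template above, not the exact benchmark code):

time for ((i = 0; i < 10000; i++)); do [ 1 -gt 0 ]; done
time for ((i = 0; i < 10000; i++)); do [[ 1 -gt 0 ]]; done
time for ((i = 0; i < 10000; i++)); do (( 1 > 0 )); done
time for ((i = 0; i < 10000; i++)); do test 1 -gt 0; done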

The result is:

<test> with      time      % slower
[                6.647s    47.9%
[[               4.692s    4.4%
((               4.493s    fastest
test (builtin)   6.538s    45.5%

(( is just marginally faster than [[, by my definition. Of course, both are faster than the builtin [ and test. As for /usr/bin/test, it's an external command; there is no point testing it, because it's slow for sure.

With this result, there really isn't much difference between the two; however, 1 > 0 is more readable than 1 -gt 0, literally and mathematically.
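To illustrate the readability point, here are the two styles side by side in an if statement (my own illustration):

# the [[ style with -gt
if [[ $a -gt $b ]]; then echo bigger; fi
# the arithmetic (( )) style, no $ needed inside
if (( a > b )); then echo bigger; fi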

Note

Yes, [[ 1 > 0 ]] is valid syntax, but it doesn't do what you think: the comparison is not arithmetic but lexicographic. In short, it's for strings, see bash(1), and there is no such thing as [[ 1 >= 0 ]].
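A quick demonstration of the difference (my own example):

[[ 10 > 9 ]]; echo $?   # 1: as strings, "10" sorts before "9"
(( 10 > 9 )); echo $?   # 0: arithmetically true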

Six months ago, I wrote Performance in shell and with C about converting a Bash project, td.sh, into C. During that work, I got some numbers which showed me the performance of shell builtins versus C. Almost two years ago, I wrote about the sleep command, also talking about the cost of invoking an external command.

Last night, I suddenly realized that I could have given a very simple example just by using true, with Bash's time builtin to time the loops:

test                                      time
for i in {1..10000}; do      true; done   0.049s
for i in {1..10000}; do /bin/true; done   11.556s

The builtin true is 23,484% faster than the external /bin/true; that figure was computed with the following commands, each looped for 1,000 iterations, which also serves as another example:

test                                time
e      '(11.556-0.049)/0.049*100'   0.028s
bc <<< '(11.556-0.049)/0.049*100'   2.560s

Note

bc actually returns 23400 when scale is not set.

The e above is a shell builtin, from the e.bash project I forked off e, a tiny expression evaluator. After I learned about the cost, I stopped using bc just for simple floating-point calculations; with e, I do floating-point calculation in Bash scripts again.

These numbers should make the point about using external commands very clear. If you have a long loop with lots of uses of external commands, then you should really consider rewriting in another programming language.

This June, I started working on transitioning td.sh into a C project, as well as using GNU Autotools to build it. When I decided to do it, it didn't come to my mind that I would have the following statistics to look at:

name        rate    type
Bash implementation:
  print_td  1856    Bash function calls
  td.sh     159     Bash script executions
C implementation:
  td        52      C executions
Bash with C loadable extension:
  td        21631   Bash loadable executions
Python bindings:
  td.py     35      Python 2 script executions
  td.py     12      Python 3 script executions

As you can see, the C loadable extension for Bash undoubtedly beats everyone else, more than 10 times better in performance. Because of this, I decided to bring back vimps1, which was quietly disabled last August when I couldn't get it to work with the latest vcprompt code at the time, which was linked into vimps1.

At the time, I didn't really have a way to measure the performance, but I knew it ran faster. How would I know? Well, that's simple: just hold down Enter and watch the rate at which the prompt shows up. You could feel the difference between the loadable extension and a pure Bash PS1.

Because of td.sh, I have now brought back vimps1. Although it has a problem with Git repositories, you can still see the benefits:

name       rate
vimps1     1355
bash_ps1   141

The real difference is more than 10 times, because bash_ps1 is a seriously stripped-down version of my normal Bash PS1.
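For context, a Bash loadable builtin like vimps1 is loaded into the running shell with enable -f; the path and the PS1 usage below are my assumptions, not the project's documented setup:

enable -f ./vimps1.so vimps1   # load the compiled loadable builtin into this shell
PS1='$(vimps1) '               # hypothetical: call the builtin each time the prompt is drawn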

Not many people are aware of how much their shell or prompt is wasting, and probably nobody else cares. However, I am beginning to think that maybe I need a shell that has performance on its feature list. Bash is not a bad shell, but from time to time I do wish it could run faster. It might be wrong to think so, since I haven't really seen anyone talk about shell performance.

After I finished td.sh's C transition, I thought about making an extension that would do arithmetic with floating-point numbers, but that's just the wrong thing to do: if you want to make Bash do floats, you need to patch Bash or write the whole thing in another programming language. I have seen a lot of people using bc for this. Frankly, they truly are doing it all wrong.

Note

In September 2014, I did make a project, e.bash, by forking e, a tiny expression evaluator. It is a Bash builtin, and I did it because I can.

They don't understand how costly an invocation is; just look at the first table, 52 vs 21631 runs, that's the price you have to pay. If you didn't know, here is a tip for you: don't use an external command in a Bash script unless it's something Bash can't do.
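A small example of the kind of swap this implies (my example, not from the post): string handling usually done with external tools can be done with Bash parameter expansion, which costs no fork at all:

path=/usr/share/sounds/generic.wav
# external commands: one fork+exec each
dir=$(dirname "$path"); file=$(basename "$path")
# pure Bash parameter expansion: no subprocess
dir=${path%/*}; file=${path##*/}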

Note

In February 2015, I realized there is a very simple example using true to see the performance and the cost.

And if most of the code relies on external commands, maybe you should consider writing it in a language that can do all the tasks. Nevertheless, if that's a "because I can," then be my guest.

Ever since I started writing Bash scripts, I have always used [[ for if conditional statements, the double square brackets, never the single-bracket version, unless compatibility has to be taken into account. Why did I choose it? Because I read that [[ is faster. I've never tried to confirm it myself, although I do know that [ is a shell builtin command and [[ is a shell keyword, and that [ is equivalent to test, which has also been a builtin command in Bash for a very long time.[1]

Note

[ and [[ can also be used for arithmetic comparison, such as with the -gt operator; see another test with Arithmetic Evaluation, ((. This post only focuses on strings.

From time to time, I see people still using [ even in scripts with a few Bash-only syntaxes or features. They are clearly not writing with compatibility as a requirement. If I have a chance, I will probably advise the coder to change to [[; I have done so a few times in the past.

However, I had never seen the numbers, so I used the following to test:

time for ((i = 0; i < 10000; i++)); do <test> -z '' ; done
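Spelled out, the four runs would be something like this (again a reconstruction, not the literal benchmark code):

time for ((i = 0; i < 10000; i++)); do [ -z '' ]; done
time for ((i = 0; i < 10000; i++)); do [[ -z '' ]]; done
time for ((i = 0; i < 10000; i++)); do test -z ''; done
time for ((i = 0; i < 10000; i++)); do /usr/bin/test -z ''; done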

The result is:

<test> with      time       % slower
[                0.149s     63.7%
[[               0.091s     fastest
test (builtin)   0.150s     64.8%
/usr/bin/test    13.798s    15,063.6%

[ and test are possibly synonyms, since their times are pretty close.

As you can see, [[ definitely is the winner, and the /usr/bin/test external command is the slowest. The problem with /usr/bin/test isn't that it is inefficient, but that invoking an external command is costly.

If you are new to Bash, just use [[.
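Besides speed, [[ is also more forgiving about quoting, which is another reason for newcomers to prefer it (my example, not from the post):

var='two words'
[ $var = 'two words' ]     # error: too many arguments, because of word splitting
[[ $var == 'two words' ]]  # works: no word splitting inside [[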

[1] If you don't know the differences between a keyword, a builtin command, and an external command, google them.

A couple of days ago, I typed in the following one-liner:

$ sleeptil -v 19:00; while [[ -z $(chk-jtv-lives.sh) ]]; do sleep 1m; done; beeps
Thu Dec 27 19:00:00 CST 2012 in 37 minutes 18 seconds

What it did was:

  • Wait until 7 PM using sleeptil,
  • Check whether chk-jtv-lives.sh outputs anything,
  • If not, sleep for one minute, then go to #2,
  • If yes, leave the loop and run beeps.

There is a channel which broadcasts around 7:30 PM every day; sometimes it does, sometimes it doesn't. I don't need to check, keep pressing the refresh button, or run that checking script to know if the stream is live. I just let this one-liner run and I will hear a beeping noise when the stream goes live.

When I wrote those scripts and functions, I didn't know one day I would use all of them in a one-liner. No need for a special app or browser add-on; just piece together a few pieces of code I have already written and I have a unique, custom-made notifier.

It may not look as epic or fancy as other notifiers, but it tells me: when you write useful code, you will know it eventually. Not only does it help you, it also saves your time.

I really like shell scripting; you can use Bash to do many tasks. Oftentimes, a simple one-liner does the task perfectly, and that makes you feel great.

Have you had similar experience?

I use the command line a lot and sometimes a command takes time to execute. So, months ago, I started to use the following alias:
alias beeps='for i in {1..5}; do aplay -q /usr/share/sounds/generic.wav; sleep 0.5s; done'
I ran it like:
command ; beeps
I just left the command to run and came back when I heard the beeping sound. That alias was ugly simple and did what I needed: getting notified when the job is done. It's very helpful for commands that don't have notifications, which most command-line programs don't. I just append beeps after the commands, such as eix-sync, emerge, etc.

I updated the alias so it plays a different sound depending on the exit status, and it also retains the exit status after playing the sound. I don't think retaining the exit status is really necessary, but it's not hard to implement and nice to have.
alias beeps='(BEEPS_RET=$? ; ((BEEPS_RET)) && BEEPS=error || BEEPS=generic ; for i in {1..5} ; do (aplay -q /usr/share/sounds/$BEEPS.wav &) >/dev/null ; sleep 1 ; done ; exit $BEEPS_RET)'
With pretty-print:
(
  BEEPS_RET=$?
  ((BEEPS_RET)) && BEEPS=error || BEEPS=generic
  for i in {1..5}; do
    (aplay -q /usr/share/sounds/$BEEPS.wav &) >/dev/null
    sleep 1
  done
  exit $BEEPS_RET
)
I use a sub-shell so the variables don't contaminate the current shell. This might look better as a function, but I started with an alias, so I continued with an alias. If you want to use a function, just replace the sub-shell with a function body, declare the variables local, and replace exit with return, as in the sketch below.
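A sketch of that function version, following the steps just described (my reconstruction, not code from the post):

beeps() {
  local BEEPS_RET=$? BEEPS i
  ((BEEPS_RET)) && BEEPS=error || BEEPS=generic
  for i in {1..5}; do
    (aplay -q /usr/share/sounds/$BEEPS.wav &) >/dev/null
    sleep 1
  done
  return $BEEPS_RET
}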

This is probably my oldest script still in use, besides .bashrc.

This g script was born at 2007-12-26T03:01:29+0800. (Yep, I am a timestamping nut.) Four years ago, I wrote a post about how to use it, but I doubt anyone has ever wanted to.

It's a simple script for quickly switching the working directory. You can switch by index or keyword, and it also supports Bash completion. There are a lot of scripts like this around, but I am happy with mine, even though it had some issues.

After all these years, I finally updated it to add the keyword support I always wanted. I used to memorize the indexes of those directories; it wasn't a big issue for me. Besides that, a Makefile was written for easy installation.

Other than fixing some syntax and weird scripting (it still reads weird), it's basically the same as what you read in that old blog post.

If you haven't tried a directory switcher, why not try mine and create some issues for me to fix? (I am sure there are plenty to fix.)

Have you ever written a long and fancy shell one-liner, executed it, then gotten an error due to a silly typo sitting in the middle of that command? Or have you needed to compose a command which you know will be long before it's written?

You can always create a file, but sometimes it's a one-time-only task. So, is there any way to get the job done easily? Of course there is, and more than one method.

1   Substitution

For fixing a typo in a command you just type, you can use history expansion:

$ typo bash
-bash: typo: command not found
$ ^typo^type
type bash
bash is /bin/bash

or using fc:

$ fc -s typo=type

or substituting in a command other than the last one:

$ !-3:s/foo/bar/

The command above substitutes foo with bar in the third previous command.

2   Editing

If you just run fc, it brings your editor up and feeds it the last command. You can even give fc a range, so you can edit a list of commands at once:

$ fc -5 -3

You will have the commands from the 5th previous to the 3rd previous in your editor. If you are not sure about the numbers, you can use the -l option to get a numbered list, then issue the command with the corresponding numbers:

$ fc -l
...
$ fc 100 105

Another easy way to edit the current command in Bash is to press Ctrl+X then Ctrl+E, or Esc then V if Vi mode is enabled.

Having a little fun with the touchpad:

https://i.ytimg.com/vi/HrSgK9pe0Ko/sddefault.jpg

A/V out of sync, sorry about that!

It's a Bash script; you will need to enable SHMConfig for monitoring touchpad activity. I divided my touchpad into 3x2 cells, each with one audio file from /usr/share/sounds. The sound files may not be available on your system; just edit the script to use audio files you have.

Synaptics actually provides three programs, called Just for Fun. They are Windows-only, which is not much of a surprise. I don't know exactly what these programs are like or how much fun they could be, but I decided to create a simple one of my own.

After I had working code, I realized there is a thing called a keyboard, and it would work even better than a touchpad. I made this, well, because I can.

I was thinking of allowing multiple taps, but that's not really possible. Even though there is information about how many fingers are on the touchpad, you don't have the position of each finger, only the first one's.

There are a few different approaches for displaying a message to the user around the login screen: /etc/issue (pre-login message and identification file), /etc/motd (message of the day), the fortune files, and probably more. If you use a graphical login interface (a display manager), those will not work for you; your system may go into the X window environment before you can even catch a glimpse of issue.

I use Console Display Manager (CDM), so the issue file works best for me among those options, and I have already had it showing gentoo in purple text for almost three years, since I started using Gentoo.

I wanted to make it more interesting; seeing the same purple text every day had gotten me bored. Need to spice it up, don't you think? This is what I have now:

http://2.bp.blogspot.com/-oBtPxQQMJWk/T4FWHIz4nfI/AAAAAAAADNM/SeQDlDUCSY0/s1600/2012-04-08--06:20:42.png

Gotta hear something That's What She Said, from TWSS.

The code is not so complicated, basically a one-liner:

cat /etc/issue.logo \
    <(echo -e '\e[1;34m') \
    <(xmllint --xpath '//item[1]/description/text()' <(wget -q -O - http://www.twssstories.com/rss.xml) |
      sed 's/&lt;\/\?p&gt;//g;q' |
      fold -s) \
    <(echo -e '\e[0m') \
    /etc/issue.orig \
    > /etc/issue

I use xmllint to extract the text of the first description node, remove the unwanted escaped HTML tags, word-wrap the text, then produce the file with the color code and the purple text.

Generally, I only log in once a day, so I put this script in a daily system cron job, since I will only see it once a day anyway. Besides, TWSS doesn't get many entries a day, not even one a day.
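A daily system cron entry for it could look like this (the script path and time are hypothetical; they are not given in the post):

# /etc/crontab, hypothetical entry: regenerate /etc/issue once a day
0 5 * * * root /usr/local/sbin/update-issue.sh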

Originally, I wasn't planning on using TWSS but xkcd comics, hoping img2txt of libcaca could magically output good ASCII from an image, but that's just impossible with only 80 characters of width. I am thinking that with a framebuffer it might have a chance of rendering the image.

TWSS is the first source for the text; I am planning to include more sources and randomly pick one, something like quotes or some ASCII art. As long as a website provides an API or RSS feed, I will try to add it to my issue file generation script.

1   Introduction

This is a series about Bash scripting. The content is what I have learned from scripting, or some interesting points I have read about. What will be in the series? Scripting, of course: my experience, tips, tricks, and pitfalls, which I have learned over the years in good ways or bad. The purpose of this series is to make your Bash scripting easy.

The posts will be written in blog-post format and kept as short and clean as possible: basically independent posts, not relying on previous posts. I don't plan to have a regular posting schedule; when I have something to write about, I write and publish it.

What will not be in the series:

  • Interactive shell usage - this is about shell scripting, not how you use the Bash interactive shell.
  • (Common) command-line tools - the topics will not focus on how to use awk, sed, cut, etc. The main focus is Bash scripting, though some tools may be mentioned, for example, a parsing tip using Bash with a little help from other tools.

2   Feedback

Please feel free to comment on the posts, or here if it's about the entire series. You can suggest anything you want to read about Bash scripting.

3   Index

3.1   By published order

When I was browsing commandlinefu.com, I saw this entry, Block the 6700 worst spamhosts: (URL edited to the plain text file)
wget -q -O - http://someonewhocares.org/hosts/hosts | grep ^127 >> /etc/hosts

As of writing (2012-03-24T08:01:18Z), the list, maintained by Dan Pollock, has grown to 9,502 domains. That is insane! See how many spam websites we have, although not all of them are spam; some of the entries are legitimate advertising distributors.

To be honest, I was really tempted to use it, but the huge number of entries held me back completely.

If you want to try it, I can propose a short script as a system cron task. I didn't test it and I am writing it on the fly, so use it at your own risk:
cd /etc
# just in case, you haven't saved current hosts as hosts.local
[[ ! -f hosts.local ]] && exit 1
if [[ "$(curl http://someonewhocares.org/hosts/hosts -z hosts.hosts -o hosts.hosts -s -L -w %{http_code})" == "200" ]]; then
  cat hosts.local hosts.hosts > hosts
fi
You will need to run this as root first:
cp /etc/hosts{,.local}
The script concatenates your current hosts (saved as hosts.local) and the one downloaded from the website. Set up a daily cron task for it; it will only download the file when the remote copy gets updated, using the method described in this blog post.

Be sure to read the comments on the website, which also provides some different variants and even an RSS feed for notifications.

You may want to read a similar post for cURL, where I wrote about the reasoning behind preventing re-downloading of the same content.

1   The flow

It seems fairly easy, too:

% wget http://example.com -S -N
--2012-03-23 20:27:23--  http://example.com/
Resolving example.com... 192.0.43.10
Connecting to example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 302 Found
  Location: http://www.iana.org/domains/example/
  Server: BigIP
  Connection: Keep-Alive
  Content-Length: 0
Location: http://www.iana.org/domains/example/ [following]
--2012-03-23 20:27:23--  http://www.iana.org/domains/example/
Resolving www.iana.org... 192.0.32.8
Connecting to www.iana.org|192.0.32.8|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 23 Mar 2012 12:27:24 GMT
  Server: Apache/2.2.3 (CentOS)
  Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
  Vary: Accept-Encoding
  Connection: close
  Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]
Server file no newer than local file `index.html' -- not retrieving.

You may have noticed the difference: it does not use If-Modified-Since as cURL does. This request should only need to be a HEAD request; Wget determines whether to GET based on the Last-Modified and Content-Length headers. In the example above Content-Length was not sent by the server, and it becomes a problem when a server does send it with a length of 0 while the content's length is actually non-zero. A case of this is Blogger:

% wget http://oopsbroken.blogspot.com --server-response --timestamping --no-verbose
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:38:47 GMT
  Date: Fri, 23 Mar 2012 12:38:47 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Content-Length: 0
  Server: GSE
  Connection: Keep-Alive
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:38:47 GMT
  Date: Fri, 23 Mar 2012 12:38:47 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
2012-03-23 20:38:48 URL:http://oopsbroken.blogspot.com/ [44596] -> "index.html" [1]

Every time you run it, the file gets downloaded again even though the content is the same.

In the infopage of Wget:

   A file is considered new if one of these two conditions are met:

  1. A file of that name does not already exist locally.

  2. A file of that name does exist, but the remote file was modified
     more recently than the local file.

  [snip]

   If the local file does not exist, or the sizes of the files do not
match, Wget will download the remote file no matter what the time-stamps
say.

If the sizes do not match, then Wget will GET the file. In the case of Blogger, the server returns:

Content-Length: 0

This is incorrect, since the content's length isn't really zero. The problem is that Wget believes it. The downloaded file length is 44596, so they do not match, and therefore Wget downloads the file again.

To avoid this, you need the --ignore-length option:

% wget http://oopsbroken.blogspot.com --server-response --timestamping --no-verbose --ignore-length
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:42:06 GMT
  Date: Fri, 23 Mar 2012 12:42:06 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Content-Length: 0
  Server: GSE

Now Wget does not try to GET the file because of the bogus Content-Length.

2   Issues

There are several issues or difficulties when using Wget instead of cURL.

2.1   Incompatible with -O

As you can see, I didn't use -O to specify the output file, because it is incompatible with -N (--timestamping); it disables -N. You need to deal with the downloaded filename yourself: basically, you can take the basename of the URL, or it can be index.html or some bizarre name if a query string is present.

2.2   When to process

You cannot rely on the status code; you need to check whether the file got updated, either by watching the file's timestamp change or by parsing Wget's output to see if a file was saved. Not really a smart way to deal with it.

However, if you saved the timestamp yourself, it could be used for the check and you wouldn't really need to keep a local file. Well, not true: you still need to keep a local file, since Wget needs to get the timestamp from the local file. I can't find any way to specify a timestamp directly.

3   Conclusion

I recommend using cURL instead of Wget. You could manually add request headers to deal with these issues, but using cURL is much easier, so why bother?
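For comparison, the conditional download in cURL amounts to roughly this one option (a sketch; the filenames are placeholders):

# re-download only when the server copy is newer than the local file,
# via an If-Modified-Since request built from the local file's mtime
curl -s -L -z index.html -o index.html http://example.com/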

If you have a good way to deal with the issues (not manually doing the whole process in your script), feel free to comment with code. There may be some useful options I missed when I read the manpage and info page.

I just accidentally discovered that there is a Bash builtin help (by typing hel and pressing Tab for completion), and it is really helpful.

The following is an example:
$ help help
help: help [-dms] [pattern ...]
    Display information about builtin commands.
    
    Displays brief summaries of builtin commands.  If PATTERN is
    specified, gives detailed help on all commands matching PATTERN,
    otherwise the list of help topics is printed.
    
    Options:
      -d        output short description for each topic
      -m        display usage in pseudo-manpage format
      -s        output only a short usage synopsis for each topic matching
        PATTERN
    
    Arguments:
      PATTERN   Pattern specifiying a help topic
    
    Exit Status:
    Returns success unless PATTERN is not found or an invalid option is given.
You can read the options, arguments, and explanations. I realized I had spent (wasted) too much time scrolling and searching in the Bash manpage for the same information.

I love timestamps. I have keys in Vim for them, so I added a new key binding for the Bash shell:

# Timestamping
"\C-xt": "\"$(date --utc +%Y-%m-%dT%H:%M:%SZ)\" "

I use readline to input the string, the same trick I use to input quotation marks. I found no way to grab the output of an external command in readline, so a string with command substitution is the only option I have.

When I press Ctrl+X then t, the string "$(date --utc +%Y-%m-%dT%H:%M:%SZ) " will be entered and it will be replaced with the timestamp by Bash.
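Assuming the binding lives in ~/.inputrc (my assumption about where it is kept), it can be reloaded into a running shell without restarting:

bind -f ~/.inputrc   # re-read the readline bindings
# then press Ctrl+X followed by t at the prompt; Bash expands the command substitution when the line runs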

Every day, new services are born and some have to be shut down. There is no eternity for many things; websites certainly don't have it.

So, here is the one-liner for Gists:
page=0; while let page++; wget -q -O - "https://api.github.com/users/$USER/gists?page=$page&per_page=100" | grep -o 'git://.*\.git'; do :; done | while read git_url; do git clone $git_url; done

and for public repos:
page=0; while let page++; wget -q -O - "https://api.github.com/users/$USER/repos?page=$page&per_page=100" | grep -o 'git://.*\.git'; do :; done | while read git_url; do git clone $git_url; done

You may need to edit $USER to match your username on GitHub.

This is only for a one-time run and doesn't have any error handling. You should run it in a directory created specifically for storing the repos. It won't update repos if you add new ones afterwards, but you can add a condition check to update when a repo is already present in the filesystem.
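The condition check mentioned above could be as simple as this sketch (my own, hypothetical): replace the final "while read git_url; do git clone $git_url; done" with a loop that clones when the directory is missing and pulls when it is already there.

while read git_url; do
  dir=$(basename "$git_url" .git)
  if [[ -d $dir ]]; then
    (cd "$dir" && git pull)   # repo already cloned: just update it
  else
    git clone "$git_url"      # new repo: clone it
  fi
done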

It's not time for me or anyone else to use it, since GitHub is alive and probably won't go out of business or get closed anytime soon.

I had this thought because another service I used when I was still on Twitter was closed, due to being merged into a bigger company.

Six or seven years ago, I lost data on a hard disk. Since then, I have been trying not to store data on local disks. I am lazy and never want to do backups. It's not that it's hard, the script is easy to write; I just don't like putting the backup hard drive online when the system only needs it once in a while, while a backup is in progress.

Certainly, you can pay some money for so-called cloud storage or just remote backup storage. It doesn't seem to matter to me; someday they will be gone, and doing a backup of a backup feels like, well, WTH was I doing that in the first place?

Anyway, backing up public stuff on GitHub is easy.

Updated on 2012-03-15: If you also use Bitbucket, here is the one-liner.

I always want to know who links to my blog, so I check referrer data in Blogger Stats and the Google Analytics report, and I have also set up Google Alerts. I even search for this blog's domain name to see if there are any new hits. (Use the "Past 24 hours" time range, it's very useful.)

But that doesn't seem to be enough for me. Those methods always seem to miss some links, and the Alerts haven't even brought me anything for a long time.

In Webmaster Tools, you can download a CSV of inbound links by clicking the "Download more sample links" button (so, this is not complete?) under Your site on the web / Links to your site / Who links the most / More. (Lost?)

It is a list of external pages which have links to your site. Since it is long, there is no humanly possible way to know which links are new.

So, I wrote a simple Bash script to do the job, run it with CSV files as arguments.


You can run it with CSV files from different websites; it has no problem with that. Once the CSV files are processed, they are safe to remove. You only need to keep the first two files in the last file list in the screenshot above.

This script has a few predefined regular expressions to filter out some common duplicate URLs, such as WordPress's and Blogger's archive or index-like pages. What you really want to see is the posts which have a link to your website in their content.
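A minimal sketch of the idea, since the script itself is not shown here (my own reconstruction; the script name and the CSV layout of one URL in the first column are assumptions): keep a file of already-seen URLs, filter the archive-like pages, and print only links that have not appeared before.

#!/bin/bash
# usage: ./newlinks.sh links-site1.csv links-site2.csv ...   (hypothetical name)
SEEN=seen-urls.txt
touch "$SEEN"
for csv in "$@"; do
  cut -d, -f1 "$csv"                       # take the URL column
done |
  grep -Ev '/(archive|page|tag|label)/' |  # drop common archive/index-like pages
  sort -u |
  comm -13 <(sort -u "$SEEN") - |          # keep only URLs not seen before
  tee -a "$SEEN"                           # remember them and print the new ones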