
If you are a user of Pentadactyl, you might have noticed that whenever you press y on a YouTube video, you always get the shortlink. It's been a few months since I first noticed this nice feature.

Recently, I wanted to learn how it works. Before I dug into the code to find out where this fantastic feature comes from, I had always assumed it was somehow YouTube's doing. I thought there must be some way to intercept URL copying in the address bar. Some hook, I supposed. But I was totally wrong; gladly, though, I didn't have to dive into minified JavaScript code.

I first discovered that YouTube has a shortlink relation in its HTML:
<link rel="shortlink" href="http://example.com/promo">
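If you want to check that yourself from the command line, here is a minimal sketch, assuming the page still carries the shortlink relation; VIDEO_ID is just a placeholder:

# Fetch the page and print the shortlink href (stderr silenced to hide HTML parser warnings)
curl -s 'http://www.youtube.com/watch?v=VIDEO_ID' \
    | xmllint --html --xpath 'string(//link[@rel="shortlink"]/@href)' - 2>/dev/null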
That shortlink relation really gave me a hint, so I read Pentadactyl's changes and saw this addition:

Added 'yankshort' option. [b8]

 It is all clear to me now. By default,
:set yankshort
--- Options ---
  yankshort=youtube.com,bugzilla.mozilla.org
There seems to be no option to disable shortlink yanking other than removing the site from the yankshort option, if you really need the full URL and don't want to copy it from the address bar or statusline. A quick resolution is to map a new key to yank the full URL:
map Y :yank gBrowser.contentDocument.location.href<CR>
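If you would rather disable shortlink yanking for a site altogether, removing it from the list should do it. A minimal sketch, assuming Pentadactyl accepts the Vim-style -= syntax for list options:

:set yankshort-=youtube.com

After that, y yanks the full URL on YouTube again.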
I really have no idea why I thought YouTube did that. Anyway, I think browsers should provide a link-copying button or menu item with the shortlink as an option; I saw none in Firefox. I guess this is the reason YouTube provides a shortened-link option in its share panel.

When I was trying out the latest features of Blogger, I didn't think much about Custom Redirection. The only uses I had in mind were for when you delete a page or a post, or you re-post with a typo corrected.

1   For a page or a post

A couple of days ago, I realized we can do more with it. For example, many bloggers have an About page, and as you know, pages on Blogger have a slightly different URL path, which is:

/p/page-slug-name.html

I will bet some people dislike /p, to say the least, and probably page-slug-name.html as well. With redirection, you can set up one or all of the following as redirected URLs:

/PageSlugName
/page-slug-name
/Page%20Slug%20Name

Imagine you have /About instead of the awkward /p/about.html. It is easier to type. The last one catches a URL like /Page Slug Name being entered; browsers automatically convert the spaces to %20 to meet the specification.

As for posts, it's the best way to get rid of /????/??/blah.html. If you have an outstanding post, create a redirection for it with an easily memorable name.

2   For an index

I recently started two series, labeled with the special labels Song of the Day and Bash Scripting Made Easy. On Blogger, you can click on a label and get all posts which carry it. That feature is part of the search functionality and its URL is a little longer. For instance:

/search/label/Song%20of%20the%20Day

With the redirection, I can shorten it to just:

/SotD

Isn't that much easier to remember and to type when you want to share with friends? You don't need to find the post and copy the link. You just type it in.

3   For a label feed

You can also shorten the URL of a label feed, which is way too long for anyone to memorize:

/feeds/posts/default?category=Bash%20Scripting%20Made%20Easy

to:

/BSME-Feed
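To check that a redirect actually resolves, a quick curl probe works. A minimal sketch, assuming your blog lives at yourblog.blogspot.com and the redirect is in place:

# HEAD request; the Location header should point at the long /feeds/posts/default?category=... URL
curl -sI http://yourblog.blogspot.com/BSME-Feed | grep -i '^location'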

4   For bizarre URLs

Google Webmaster Tools offers you a report of 404 URLs which are linked from within your website, for example:

2012-03-26--12:45:32

I know some of them are parsed from the inline JavaScript I embed in posts. Googlebot really works too hard, finding links which may not even be valid ones.

I can either obscure the link look-alikes in the JavaScript, which requires editing that I don't want to do; or I can set up a redirection for each one to get rid of the error reports. (Or just leave them there; I chose this way ;)

5   Final thought

You can redirect anything to anything; that about sums it up.

I was reading the UTF-8 and Unicode FAQ for Unix/Linux and found many links are dead. That's why I wrote this script, linkckr.sh.

(Screenshot: http://farm6.static.flickr.com/5100/5416743679_3bb1f5e404_z.jpg)

Give it a filename or a URL:

./linkckr.sh test.html
./linkckr.sh http://example.com

It does the rest for you. You might want to tee the output, because there is no user interface; it just prints results, and if a page has many links, you may flood the scrollback buffer. The script is simple; actually, it does more than enough for me. (Ha, who needs coloring.)
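For example, to keep a copy of the results while they scroll by (the output file name is just illustrative):

./linkckr.sh test.html | tee linkckr-results.txt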

I don't grep the links from the HTML source; a regular expression always misses something, or it ends up looking like the Hulk. I decided to see if I could use xmllint to get valid links. That means only links from normal <a/> elements, not those hidden somewhere or opened via JavaScript, nor URLs that sit in the page as plain text rather than as links. It only checks HTTP(S) URLs.

The checking uses cURL and only sends a HEAD request, so you might get a 405, and this script does not re-check with a normal GET request. You may also see 000, which likely means a timeout after 10 seconds of waiting for a response. If a URL is redirected with 3xx, cURL is instructed to follow it, and the last URL is shown to you.

There were a few interesting points while I wrote this script. Firstly, I learned that xmllint can select nodes with XPath:

xmllint --shell --html "$1" <<<"cat //a[starts-with(@href,'http')]"

And standard input is treated as command input in xmllint's shell.

Secondly, cURL supports a custom output format via -w:

curl -s -I -L -m 10 -w '%{http_code} %{url_effective}\n' "$url"

Note that even if you specify a format, the response headers are still printed; the formatted output is appended at the end. The script retrieves the last line using sed '$q;d'. If you are not familiar with such syntax, you should learn it; sed is quite interesting. Then it parses the line with the built-in read, another interesting thing I learned on my own long ago. Using cut is not necessary, and it's not as good, though read would have problems with extra spaces if those carried significant meaning.

The rest is boring Bash. There is one bug I have noticed: HTML entities in a link (such as &amp;) would cause issues.
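Putting the pieces together, here is a minimal sketch of the idea. It is not the actual linkckr.sh; the href extraction with grep/sed and all the names are illustrative:

#!/bin/bash
# Minimal link-checker sketch: pull http(s) links out of a local HTML file
# with xmllint, probe each with a cURL HEAD request, and print
# "STATUS FINAL_URL ORIGINAL_LINK". (The real script also accepts a URL.)

src=$1

# Select <a> elements whose href starts with "http" via XPath in xmllint's
# shell, then fish the href values out of the dumped nodes.
links=$(xmllint --shell --html "$src" 2>/dev/null \
          <<<"cat //a[starts-with(@href,'http')]" \
        | grep -o 'href="http[^"]*"' \
        | sed 's/^href="//; s/"$//' \
        | sort -u)

while read -r url; do
  [ -z "$url" ] && continue
  # HEAD request, follow redirects, give up after 10 seconds; the -w line
  # is appended after the headers, so keep only the last line.
  result=$(curl -s -I -L -m 10 \
             -w '%{http_code} %{url_effective}\n' "$url" \
           | sed '$q;d')
  read -r code effective <<< "$result"
  echo "$code $effective $url"
done <<< "$links"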

I made two small Bash scripts. They are very short and simple, so I just paste them here:

chk-url-browser.sh

#!/bin/bash
# A script to check if the X selection is a URL; if so, open it in a browser

# Or http://search.yahoo.com/search?p=
# Or http://search.live.com/results.aspx?q=
SEARCH_ENGINE="http://www.google.com/search?q="
# Or use firefox, opera, etc.
LAUNCHER=xdg-open

url=$(xsel -o)
[[ $(egrep "(ht|f)tps?://" <<< $url) ]] || exit 1
$LAUNCHER "$url"

terminal-search.sh

#!/bin/bash
# A script to search text from X clipboard

# Or http://search.yahoo.com/search?p=
# Or http://search.live.com/results.aspx?q=
SEARCH_ENGINE="http://www.google.com/search?q="
# Or use firefox, opera, etc.
LAUNCHER=xdg-open

text=$(xsel -o)
# URL-encode the selection (Python 2); passing it as an argument avoids quoting problems
text=$(python -c 'import urllib, sys; print urllib.quote(sys.argv[1])' "$text")
$LAUNCHER "$SEARCH_ENGINE$text"

If you are using a VTE-based terminal, then you don't need the first one. You may not need it in other cases either; check out this post.

The second one relies on Python to do the URL encoding. I don't know if there is a common dedicated program for that, but Python is quite common too.
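For the record, the encoding can also be done in plain Bash. A minimal sketch (ASCII-only; non-ASCII characters in the selection would not be encoded correctly):

urlencode() {
  # Percent-encode everything except unreserved characters.
  local s=$1 c out= i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

text=$(urlencode "$(xsel -o)")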

I bind Win+R to chk-url-browser.sh and Win+G to terminal-search.sh in Fluxbox, using the following in ~/.fluxbox/keys:

# Open url
Mod4 R :Exec ~/bin/chk-url-browser.sh
# Search from terminal
Mod4 G :Exec ~/bin/terminal-search.sh