
A week ago, I removed all packages related to Java from my system. I thought I had absolutely no need of them, until I tried to minify CSS via a Makefile using YUI Compressor, which is written in Java. The same thing would have happened when I needed to minify JavaScript files: I use Google Closure Compiler, which is also written in Java.

Oh, right, that's why I still kept the Java runtime environment, the open source one, IcedTea, I thought to myself.

At this point, you can bet that I wanted to get rid of Java for real. The IcedTea binary package is about 30+ MB, plus nearly 9 MB for YUI Compressor and Google Closure Compiler. To be fair, they don't use a lot of space, but I just don't like having Java on my system when only two programs need it. Besides, installing IcedTea pulls in two virtual packages and two more packages for configuration.

So, my choice was to use the online ones, as I already knew Google hosts one at http://closure-compiler.appspot.com/ and there is also a popular Online YUI Compressor hosted by Mike Horn.

Going Java-less is only a matter of two curl commands; the first one is for YUI Compressor, the second for Google Closure Compiler:

curl -L -F type=CSS -F redirect=1 -F 'compressfile[]=@input.css;filename=input.css' -o output.min.css http://refresh-sf.com/yui/
curl --data output_info=compiled_code --data-urlencode js_code@input.js http://closure-compiler.appspot.com/compile > output.min.js

Replace the input and output filenames with your own. If you need to pipe, use - instead of the input file (i.e. @-); - tells curl the content comes from standard input.
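For example, here is a minimal sketch of piping concatenated stylesheets straight into the online YUI Compressor; the file names are just placeholders:

cat reset.css main.css | curl -L -F type=CSS -F redirect=1 -F 'compressfile[]=@-;filename=input.css' -o site.min.css http://refresh-sf.com/yui/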

YUI Compressor can also minify JavaScript, but I found Google Closure Compiler does a slightly better job. If you use YUI Compressor for JavaScript as well, simply change to type=JS.
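Assuming the form takes the same fields for JavaScript, that would look something like:

curl -L -F type=JS -F redirect=1 -F 'compressfile[]=@input.js;filename=input.js' -o output.min.js http://refresh-sf.com/yui/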

Both the Online YUI Compressor and Google Closure Compiler have some options you can simply add to the command. It shouldn't be hard since you have a command template to work from. I only use the default compression options; they are good enough for me.
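For instance, the Closure Compiler service accepts a compilation_level parameter; adding it is just one more --data, something like this (I stick with the defaults myself):

curl --data output_info=compiled_code --data compilation_level=ADVANCED_OPTIMIZATIONS --data-urlencode js_code@input.js http://closure-compiler.appspot.com/compile > output.min.js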

I was reading UTF-8 and Unicode FAQ for Unix/Linux and found that many of its links are dead. That's how I came to write this script, linkckr.sh.

(Image: http://farm6.static.flickr.com/5100/5416743679_3bb1f5e404_z.jpg)

Give it a filename or a URL:

./linkckr.sh test.html
./linkckr.sh http://example.com

It does the rest for you. You might want to tee the output, because there is no user interface; it just prints results. If a page has many links, you may flood the scrollback buffer. The script is simple; actually, it already does more than I need. (Ha, who needs coloring.)
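For example, to keep a copy of the results while they scroll by (the output file name is only an example):

./linkckr.sh http://example.com | tee linkckr-results.txt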

I don't grep the links from the HTML source; there is always something a regular expression misses, or the regular expression ends up looking like the Hulk. I decided to see if I could use xmllint to get valid links instead. That means only links from normal <a> elements, not ones hidden somewhere, opened via JavaScript, or plain-text URLs that are not marked up as links. It only checks HTTP(S) URLs.

The checking uses cURL with only a HEAD request, so you might get a 405, and this script does not re-check with a normal GET request. You may also see 000, which usually means a timeout after 10 seconds of waiting for a response. If a URL is redirected with 3xx, cURL is instructed to follow it, and the final URL is shown to you.
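A rough sketch of how those codes could be interpreted, assuming the status ends up in a variable named code (my naming, not necessarily the script's):

case "$code" in
  000) echo "timeout or no response: $url" ;;
  200) echo "OK: $url" ;;
  405) echo "HEAD not allowed, the page may still be fine: $url" ;;
  *)   echo "$code: $url" ;;
esac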

There are a few interesting things I picked up while writing this script. Firstly, I learned that xmllint can select nodes with XPath:

xmllint --shell --html "$1" <<<"cat //a[starts-with(@href,'http')]"

And standard input is treated as command input in xmllint's shell.
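The shell output still contains prompts and element markup, so the href values have to be pulled out afterwards; a minimal sketch, assuming grep and sed are acceptable for that step:

xmllint --shell --html "$1" <<<"cat //a[starts-with(@href,'http')]" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'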

Secondly, cURL supports a custom output format using -w:

curl -s -I -L -m 10 -w '%{http_code} %{url_effective}\n' "$url"

Note that even if you specify a format, the response headers are still printed; the formatted output is just appended at the end. The script retrieves the last line using sed '$q;d'; if you are not familiar with that syntax, you should learn it, sed is quite interesting. Then it parses the line with the built-in read, another trick I learned on my own long ago. Using cut is not necessary and not as good, though read would have trouble with extra spaces if they carry significant meaning.
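Putting the two together, a small sketch of how the last line could be captured and split with read; the variable names are mine, not necessarily the script's:

last=$(curl -s -I -L -m 10 -w '%{http_code} %{url_effective}\n' "$url" | sed '$q;d')
read -r code final_url <<< "$last"
echo "$code $final_url"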

The rest is boring Bash. There is one bug I have noticed: an HTML entity in a link (for example &amp;) would cause an issue.
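A possible workaround, assuming &amp; is the main offender, would be to decode it before handing the URL to cURL, something like:

url=$(sed 's/&amp;/\&/g' <<< "$url")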