As of 2016-02-26, there will be no more posts for this blog. s/blog/pba/

You may want to read a similar post about cURL, in which I explained why you would want to avoid downloading the same content again.

1   The flow

It seems fairly easy, too:

% wget http://example.com -S -N
--2012-03-23 20:27:23--  http://example.com/
Resolving example.com... 192.0.43.10
Connecting to example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 302 Found
  Location: http://www.iana.org/domains/example/
  Server: BigIP
  Connection: Keep-Alive
  Content-Length: 0
Location: http://www.iana.org/domains/example/ [following]
--2012-03-23 20:27:23--  http://www.iana.org/domains/example/
Resolving www.iana.org... 192.0.32.8
Connecting to www.iana.org|192.0.32.8|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 23 Mar 2012 12:27:24 GMT
  Server: Apache/2.2.3 (CentOS)
  Last-Modified: Wed, 09 Feb 2011 17:13:15 GMT
  Vary: Accept-Encoding
  Connection: close
  Content-Type: text/html; charset=UTF-8
Length: unspecified [text/html]
Server file no newer than local file `index.html' -- not retrieving.

You may have noticed the difference: Wget does not send If-Modified-Since as cURL does. The request should be a HEAD request, and Wget decides whether to GET or not based on Last-Modified and Content-Length. In this example the server did not send Content-Length at all. The problem arises when a server sends Content-Length: 0 while the actual content length is non-zero. Blogger is one such case:

% wget http://oopsbroken.blogspot.com --server-response --timestamping --no-verbose
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:38:47 GMT
  Date: Fri, 23 Mar 2012 12:38:47 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Content-Length: 0
  Server: GSE
  Connection: Keep-Alive
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:38:47 GMT
  Date: Fri, 23 Mar 2012 12:38:47 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Server: GSE
2012-03-23 20:38:48 URL:http://oopsbroken.blogspot.com/ [44596] -> "index.html" [1]

Every time you run it, the file gets downloaded again even though the content is the same.

From the Wget info page:

   A file is considered new if one of these two conditions are met:

  1. A file of that name does not already exist locally.

  2. A file of that name does exist, but the remote file was modified
     more recently than the local file.

  [snip]

   If the local file does not exist, or the sizes of the files do not
match, Wget will download the remote file no matter what the time-stamps
say.

If the sizes do not match, Wget will GET the file. In the case of Blogger, the server returns:

Content-Length: 0

This is incorrect, since the content length is not really zero. The problem is that Wget believes it. The local file length is 44596, the two sizes do not match, and therefore Wget updates the file.

To avoid this, you need the --ignore-length option:

% wget http://oopsbroken.blogspot.com --server-response --timestamping --no-verbose --ignore-length
  HTTP/1.0 200 OK
  X-Robots-Tag: noindex, nofollow
  Content-Type: text/html; charset=UTF-8
  Expires: Fri, 23 Mar 2012 12:42:06 GMT
  Date: Fri, 23 Mar 2012 12:42:06 GMT
  Cache-Control: private, max-age=0
  Last-Modified: Fri, 23 Mar 2012 10:49:23 GMT
  ETag: "f5024c0a-c96f-464f-b96b-d89efdd69010"
  X-Content-Type-Options: nosniff
  X-XSS-Protection: 1; mode=block
  Content-Length: 0
  Server: GSE

Now Wget no longer re-downloads the file on account of the bogus Content-Length.
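The rule quoted from the info page, together with the effect of --ignore-length, can be modelled roughly as a shell function. This is only a sketch of the documented behaviour, not Wget's actual code: needs_download is a hypothetical name, the remote values are passed in by hand, and GNU stat is assumed. An empty size stands for a missing or ignored Content-Length.

```shell
#!/bin/bash
# Sketch of the documented -N decision; hypothetical helper, not Wget's code.
# Usage: needs_download FILE REMOTE_MTIME_EPOCH REMOTE_SIZE
# Returns 0 when the file would be downloaded, 1 when it would be skipped.
needs_download() {
  local file=$1 remote_mtime=$2 remote_size=$3
  local local_size local_mtime

  # No local copy yet: always download.
  [[ -e $file ]] || return 0

  local_size=$(stat -c %s -- "$file")    # size in bytes (GNU stat)
  local_mtime=$(stat -c %Y -- "$file")   # mtime in epoch seconds (GNU stat)

  # A size mismatch forces a download, whatever the timestamps say.
  # An empty REMOTE_SIZE models "no Content-Length" or --ignore-length.
  if [[ -n $remote_size && $remote_size != "$local_size" ]]; then
    return 0
  fi

  # Otherwise download only when the remote copy is newer.
  (( remote_mtime > local_mtime ))
}
```

With Blogger's Content-Length: 0 the size comparison always fails, so the function says "download" on every run; passing an empty size, as --ignore-length effectively does, leaves only the timestamp comparison.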

2   Issues

There are several issues or difficulties when using Wget instead of cURL.

2.1   Incompatible with -O

As you can see, I didn't use -O to specify the output file, because -O is incompatible with -N (--timestamping): using it disables -N. So you need to work out the downloaded filename yourself. Basically, it is the basename of the URL, but it can also be index.html, or something bizarre if the URL carries a query string.
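A rough way to guess the filename in a script is sketched below. local_name is a hypothetical helper, and it only approximates what Wget does: it takes the last path component, falls back to index.html, and appends the query string. Wget's real naming rules (character escaping, clobbering suffixes, redirects) are more involved.

```shell
#!/bin/bash
# Hypothetical helper: guess the local filename Wget would use for a URL.
# Only an approximation of Wget's behaviour.
local_name() {
  local rest=${1#*://} query='' path='' name
  rest=${rest%%#*}                 # drop any fragment

  # Split off the query string, keeping the leading "?".
  if [[ $rest == *\?* ]]; then
    query=?${rest#*\?}
    rest=${rest%%\?*}
  fi

  # Drop the host part; what remains is the path without its leading slash.
  if [[ $rest == */* ]]; then
    path=${rest#*/}
  fi

  name=${path##*/}                 # last path component, may be empty
  printf '%s\n' "${name:-index.html}${query}"
}

local_name http://example.com/a/b.html
```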

2.2   When to process

You cannot rely on the exit status. You need to check whether the file got updated, either by watching for a change in the file's timestamp or by parsing Wget's output to see whether a file was saved. Neither is a smart way to deal with it.

However, if you save the timestamp yourself, it could be used for the check and you would not really need to keep a local file. Well, not true: you still need to keep a local file, since Wget reads the timestamp from the local file. I can't find any way to specify a timestamp.
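The timestamp check could be sketched like this. mtime and updated_by are hypothetical helper names, and GNU stat is assumed. The helper runs any command and reports whether the given file's modification time changed, so it is not tied to Wget.

```shell
#!/bin/bash
# Print a file's mtime in epoch seconds, or 0 if it does not exist (GNU stat).
mtime() {
  stat -c %Y -- "$1" 2>/dev/null || echo 0
}

# Usage: updated_by FILE COMMAND [ARGS...]
# Runs COMMAND and returns 0 only if FILE's mtime changed as a result.
# Returns 2 when COMMAND itself fails.
updated_by() {
  local file=$1 before after
  shift
  before=$(mtime "$file")
  "$@" || return 2
  after=$(mtime "$file")
  [[ $before != "$after" ]]
}

# Intended use (not run here, it needs network access):
#   if updated_by index.html wget -N --ignore-length -nv http://example.com; then
#     echo "index.html changed, process it"
#   fi
```

This works because Wget stamps the downloaded file with the server's Last-Modified time, so a fresh download changes the local timestamp.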

3   Conclusion

I recommend using cURL instead of Wget. You could manually add request headers to deal with these issues, but using cURL is much easier. So why bother?

If you have a good way to deal with these issues (one that does not amount to doing the whole process manually in your script), feel free to comment with code. There may be some useful options I missed when I read the man page and info page.

After I added a keybinding for Bash, I thought: why not make it work globally?

Here is what I just added to my DWM configuration:

diff --git a/dwm-config.h b/dwm-config.h
index 1994463..517976b 100644
--- a/dwm-config.h
+++ b/dwm-config.h
@@ -125,6 +125,8 @@ static const char *monitor_expand_cmd[] = { "monitorExpand.sh", NULL };

 static const char *lock_cmd[] = { "xlock", "-mode", "blank", "-startCmd", "monitorOff.sh", "-timeout", "15", "-dpmsoff", "1", NULL };

+static const char *ts_cmd[] = SHCMD("xdotool keyup t ; xdotool type --clearmodifiers $(date --utc +%Y-%m-%dT%H:%M:%SZ)");
+
 static Key keys[] = {
   /* modifier                     key         function        argument */
   { MODKEY,                       XK_p,       spawn,          {.v = bashrun_cmd } },
@@ -155,6 +157,8 @@ static Key keys[] = {
   { MODKEY,                       XK_F2,      spawn,          {.v = monitor_switch_cmd} },
   { MODKEY,                       XK_F3,      spawn,          {.v = monitor_expand_cmd} },

+  { MODKEY|ShiftMask,             XK_t,       spawn,          {.v = ts_cmd} },
+
   { MODKEY|ShiftMask,             XK_m,       toggle_ffm,     {0} },
   { MODKEY|ShiftMask,             XK_r,       toggle_rules,   {0} },

It uses xdotool to send the key events. Since I bound it to t, it is necessary to send a keyup event first, or the timestamp string will be eaten entirely or partially, depending on when I release the key. Also, --clearmodifiers is required, or the characters in the timestamp string may trigger other keybindings in DWM.

You can bind the key in your own window manager. Some may not be able to run a command through sh, only a simple executable file; in that case you can always put the commands in a script and make it executable.

I love timestamps. I have keys in Vim for them, so I added a new keybinding for the Bash shell:

# Timestamping
"\C-xt": "\"$(date --utc +%Y-%m-%dT%H:%M:%SZ)\" "

I use readline to input the string, the same trick I use to input quotation marks. I found no way to grab the output of an external command in readline, so a string with command substitution is the only option I have.

When I press Ctrl+X then t, the string "$(date --utc +%Y-%m-%dT%H:%M:%SZ)" followed by a space is inserted at the cursor, and Bash replaces it with the timestamp when the command line is executed.
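To see what the inserted text turns into, the same command substitution can be run by hand. A small sketch, assuming GNU date; the variable name ts is only for illustration:

```shell
#!/bin/bash
# The same command substitution the keybinding inserts, expanded by Bash.
ts="$(date --utc +%Y-%m-%dT%H:%M:%SZ)"

# Prints an ISO 8601 style UTC timestamp such as 2012-03-23T12:27:24Z.
echo "$ts"
```

Because the expansion happens only when the command runs, the timestamp reflects the moment of execution, not the moment the key was pressed.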

ts is a Perl script from moreutils. You use it in a pipe, where it receives the stdout of another program. The basic idea is:

command | ts

It's useful when you need to know when a program processes something. But it's not enough for me, so I wrote a Bash script, ts.sh, to do more.

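#!/bin/bash
# Timestamp a command's stdout (blue) and stderr (red) separately.

DATECMD='date +%H:%M:%S'

# Prefix each line from stdin with a blue-background timestamp.
# IFS= and -r keep leading whitespace and backslashes intact.
process_stdout() {
  while IFS= read -r line; do
    printf '\e[44m%s\e[0m %s\n' "$($DATECMD)" "$line"
  done
}

# Same, with a red-background timestamp, written to stderr.
process_stderr() {
  while IFS= read -r line; do
    printf '\e[41m%s\e[0m %s\n' "$($DATECMD)" "$line" 1>&2
  done
}

if (( $# == 0 )); then
  # No command given: timestamp whatever arrives on stdin.
  process_stdout
else
  # fd 3 keeps the real stdout. The command's stdout goes through
  # process_stdout to fd 3, while its stderr is piped to process_stderr.
  exec 3>&1
  ( bash -c "$*" | process_stdout ) 2>&1 1>&3 | process_stderr
  exec 3>&-
fi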

It can timestamp the stdout and stderr of a command in different colors. See the following example:

http://farm2.static.flickr.com/1380/5160212248_15b5bf58b7.jpg

The timestamps in red mark the stderr of emerge; the rest, in blue, mark its stdout. This script can also be used in a pipe, but then it only timestamps its stdin, which is the command's stdout coming through the pipe. It is best to use it like:

ts.sh command arg1 arg2 ...

An example with a shell snippet as the command:

ts.sh while true \; do echo tik tok \; sleep 1 \; done
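The descriptor juggling that lets ts.sh treat the two streams separately can be isolated in a smaller sketch. tag and tag_streams are hypothetical names, and plain text prefixes stand in for the colored timestamps so the result is easy to inspect.

```shell
#!/bin/bash
# Prefix every line read from stdin with the given label.
tag() {
  local line
  while IFS= read -r line; do
    printf '%s: %s\n' "$1" "$line"
  done
}

# Run a command with its stdout and stderr labelled separately.
# fd 3 is a copy of the real stdout. Inside the group, the command's stdout
# is piped to "tag out", which writes to fd 3, while stderr is redirected
# into the outer pipe and handled by "tag err", which writes to stderr.
tag_streams() {
  { { "$@" | tag out; } 2>&1 1>&3 | tag err 1>&2; } 3>&1
}

tag_streams bash -c 'echo to stdout; echo to stderr >&2'
```

The order of 2>&1 1>&3 matters: stderr is pointed at the pipe first, and only then is stdout moved back to the saved descriptor.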