
Every day, new services are born and some have to shut down. Nothing lasts forever, and websites certainly don't.

So, here is the one-liner for Gists:
page=0; while let page++; wget -q -O - "https://api.github.com/users/$USER/gists?page=$page&per_page=100" | grep -o 'git://.*\.git'; do :; done | while read git_url; do git clone $git_url; done

and for public repos:
page=0; while let page++; wget -q -O - "https://api.github.com/users/$USER/repos?page=$page&per_page=100" | grep -o 'git://.*\.git'; do :; done | while read git_url; do git clone $git_url; done

You may need to edit $USER to match your username on GitHub.

This is only meant for a one-time run and doesn't have any error handling. You should run it in a directory created specifically for storing the repos. It won't update anything if you add new repos afterwards, but you can add a condition check to update when a repo is already present in the filesystem, something like the sketch below.
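
Here is a rough, commented version of that idea, assuming bash and using the repos endpoint as the example; GITHUB_USER is just a placeholder for your GitHub username, and the API call and grep pattern are the same ones used in the one-liners above:
#!/bin/bash
# rough sketch: clone new repos, pull the ones already on disk
GITHUB_USER=yourname   # placeholder, put your GitHub username here
page=0
while :; do
  page=$((page + 1))
  # same API call and URL extraction as the one-liners above
  urls=$(wget -q -O - "https://api.github.com/users/$GITHUB_USER/repos?page=$page&per_page=100" | grep -o 'git://.*\.git')
  [ -z "$urls" ] && break
  for git_url in $urls; do
    dir=$(basename "$git_url" .git)
    if [ -d "$dir" ]; then
      # repo already present: just update it
      (cd "$dir" && git pull)
    else
      git clone "$git_url"
    fi
  done
done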

It's not really time for me or anyone else to use it, since GitHub is alive and well and probably won't go out of business or shut down anytime soon.

I had this thought because another service, which I used back when I was still on Twitter, has been shut down after being merged into a bigger company.

Six or seven years ago, I lost the data on a hard disk. Since then, I have been trying not to store data on local disks. I am lazy and never want to do backups. It's not that it's hard, the script is easy to write; I just don't like putting the backup hard drive online when the system only needs it once in a while, when a backup is actually in progress.

Certainly, you can pay some money for so-called cloud storage or just remote backup storage. It doesn't seem to matter to me; someday they will be gone too, and making a backup of a backup is like, well, WTH was I doing in the first place?

Anyway, backing up public stuff on GitHub is easy.

Updated on 2012-03-15: If you also use Bitbucket, here is the one-liner.

I just downloaded the data from one of my App Engine applications by following Uploading and Downloading. I used the new and experimental bulkloader.py to download the data into a SQLite3 database; you don't need to create the Loader/Exporter classes with this new method.

It does explain how to download and upload, but the uploading part only covers the production server. You have to look into the command-line options yourself; it's not complicated.
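
The quickest way to see those options is to ask the tool itself (assuming the same SDK checkout path as in the examples below):
$ python googleappengine/python/bulkloader.py --help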

Here is a complete example to dump data:
$ python googleappengine/python/bulkloader.py --dump --kind=Kind --url=http://app-id.appspot.com/remote_api --filename=app-id-Kind.db /path/to/app.yaml/
[INFO ] Logging to bulkloader-log-20091111.001712
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Opening database: bulkloader-progress-20091111.001712.sql3
[INFO ] Opening database: bulkloader-results-20091111.001712.sql3
[INFO ] Connecting to app-id.appspot.com/remote_api
Please enter login credentials for app-id.appspot.com
Email: [email protected]
Password for [email protected]:
.[INFO ] Kind: No descending index on __key__, performing serial download
.......................................................................................................................................................................................
.................................
[INFO ] Have 2160 entities, 0 previously transferred
[INFO ] 2160 entities (0 bytes) transferred in 134.6 seconds
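
Since the dump is just a SQLite3 database, you can take a quick peek at it with the sqlite3 command-line shell if you want a sanity check; the table layout is bulkloader's own internal format, though, so don't rely on it for anything serious:
$ sqlite3 app-id-Kind.db
sqlite> .tables
sqlite> .quit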

And the following uploads to the development server, using the SQLite3 database we just downloaded (not a CSV):
$ python googleappengine/python/bulkloader.py --restore --kind=Kind --url=http://localhost:8080/remote_api --filename=app-id-Kind.db --app_id=app-id
[INFO ] Logging to bulkloader-log-20091111.004013
[INFO ] Throttling transfers:
[INFO ] Bandwidth: 250000 bytes/second
[INFO ] HTTP connections: 8/second
[INFO ] Entities inserted/fetched/modified: 20/second
[INFO ] Opening database: bulkloader-progress-20091111.004013.sql3
Please enter login credentials for localhost
Email: [email protected] <- This does not matter, type anything
Password for [email protected]: <- Does not matter
[INFO ] Connecting to localhost:8080/remote_api
[INFO ] Starting import; maximum 10 entities per post
........................................................................................................................................................................................................................
[INFO ] 2160 entities total, 0 previously transferred
[INFO ] 2160 entities (0 bytes) transferred in 31.3 seconds
[INFO ] All entities successfully transferred

You will need to specify the app id, and it must match the one the development server is running with.
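
To double-check which app id that is, look at the application field in app.yaml; for example, assuming the same SDK checkout as above and /path/to/app/ as the application directory:
$ grep '^application:' /path/to/app/app.yaml
application: app-id
$ python googleappengine/python/dev_appserver.py /path/to/app/
# the development server listens on http://localhost:8080 by default,
# which matches the --url used in the restore command above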

This may not be needed once bulkloader.py becomes stable.

I wrote a simple Python script which utilizes gdata-python-client to download the layout templates of your Blogger blogs. You can read and download from BackupTemplate.py.

It is easy to use; just make sure you have gdata-python-client on your computer. You will be prompted for your email and password. Here is a screenshot of it in action:


Layout templates of all blogs will be saved in the current directory with filenames like
blogId-blogName-YYMMDD-HHMMSS.xml
You can also add a file called BackupTemplate.yaml:
email: [email]
password: [password]
So you won't be asked for your credentials every time.
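
For example (the email and password values here are placeholders, and I'm assuming you run the script from the directory holding the YAML file):
$ cat BackupTemplate.yaml
email: [email protected]
password: your-password
$ python BackupTemplate.py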