
Blog Backup & Wget

I posted a question to Blogger support about a week ago about backing up my blog. I had realized that I had all this work out there, but I was completely relying on their servers to keep it safe. I also had no easy way to move the data over to my own FTP server if I ever wanted to.

This was a few days before weblogs.com abruptly closed.

Blogger posted a solution on their help site under advanced topics. Their solution was a complicated manual configuration change that puts all your blog entries on one page, which you then save. I was in no rush to do this. I was tempted to just browse to each entry and save it by hand.
I had also set the option to have Blogger email me each post, and I set up a filter for those emails. But there is a problem with this approach: you only get the initial version of an entry; you don't get any updates if you edit it later.

Then I suddenly realized that I might be able to use wget to back up my blog. Wget is a command-line utility for fetching web pages.
It turned out to be even easier than I expected! All I had to do was add the '-r' (recursive) command-line switch, and wget traced through all my entries and saved them to my hard drive, keeping the site's relative directory structure. I was really impressed by all the great options in wget.
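For readers who want to script this step, here is a minimal Python sketch that wraps the same wget call. It is not the getmyblog script (which is a shell script); the function names and the blog URL `http://example.blogspot.com/` are hypothetical, and it assumes wget is installed and on the PATH.

```python
import shutil
import subprocess


def build_mirror_cmd(blog_url: str, only_newer: bool = False) -> list[str]:
    """Return the wget argument list for a recursive backup of blog_url.

    -r  follow links recursively, saving each page under a local
        directory tree named after the host.
    -N  timestamping: skip files whose local copy is not older than
        the copy on the server.
    """
    cmd = ["wget", "-r"]
    if only_newer:
        cmd.append("-N")
    cmd.append(blog_url)
    return cmd


def mirror_blog(blog_url: str, only_newer: bool = False) -> int:
    """Run wget (assumed to be on PATH) and return its exit status."""
    if shutil.which("wget") is None:
        raise FileNotFoundError("wget was not found on PATH")
    return subprocess.run(build_mirror_cmd(blog_url, only_newer)).returncode
```

Calling `mirror_blog("http://example.blogspot.com/")` is equivalent to typing `wget -r http://example.blogspot.com/` at a Cygwin or UN*X prompt.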

But I decided I wanted more. I wanted to back up my image files too (posted on my ISP's web server and on Hello's photos1.blogger.com server). So I wrote a shell script to extract the URLs of all the image files and put them in a list file for wget to use.
Then I updated the script to run the wget of my site first, and I set both wget commands to fetch only newer files. I run it on my Windows machine under Cygwin; I imagine it would run fine under Linux or other UN*X platforms.
I am sharing my script under a GNU-like open source license: getmyblog.
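The image step can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not getmyblog itself: it assumes the recursive wget has already saved the pages under a local directory, it uses the standard library's `html.parser` to pull `src` attributes out of `<img>` tags, and it keeps only absolute `http(s)` URLs, since images on other hosts (such as photos1.blogger.com) are the ones a recursive wget of the blog's own host does not fetch by default. The names `extract_image_urls`, `write_image_list`, and `images.txt` are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class _ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img ... /> tags are routed here as well.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def extract_image_urls(html: str) -> set[str]:
    """Return the absolute http(s) image URLs found in one HTML page."""
    parser = _ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return {s for s in parser.srcs if s.startswith(("http://", "https://"))}


def write_image_list(mirror_dir: str, list_path: str) -> list[str]:
    """Scan every saved .html/.htm page under mirror_dir and write one image
    URL per line to list_path, the format wget's -i option reads."""
    urls: set[str] = set()
    for page in Path(mirror_dir).rglob("*"):
        if page.is_file() and page.suffix.lower() in (".html", ".htm"):
            text = page.read_text(encoding="utf-8", errors="replace")
            urls |= extract_image_urls(text)
    ordered = sorted(urls)  # stable order makes the list file easy to diff
    Path(list_path).write_text("".join(u + "\n" for u in ordered), encoding="utf-8")
    return ordered
```

Once the list exists, `wget -N -i images.txt` fetches every URL in it, with `-N` skipping files whose local copy is already current.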

For those running Windows without Cygwin, I found that there is a Windows port of wget (and a bunch of other UN*X utilities) at unxutils.sourceforge.net. I have tested the wget.exe program, and it works great for a vanilla Windows backup of a blog site, though only for the HTML files.

On an interesting note, I had been playing with wget a few days before as a way to post XML-RPC requests to Blogger's API (which is being deprecated; the old API will be going away). It worked very well for a simple command, but I did not test creating an entry with it. I imagine it would not be hard to write a script that uses wget to upload a blog's backed-up files through the XML-RPC interface.
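The post doesn't show the XML-RPC request itself. As a hedged sketch, Python's standard `xmlrpc.client.dumps` can produce the request body for a `blogger.newPost` call. The method name and parameter order (appkey, blog ID, username, password, content, publish) are assumptions based on the old Blogger 1.0 API, which has since been retired, and every argument value here is a placeholder.

```python
import xmlrpc.client


def build_new_post_request(appkey: str, blog_id: str, username: str,
                           password: str, content: str,
                           publish: bool = True) -> str:
    """Serialize an assumed Blogger 1.0-style blogger.newPost call to XML.

    The parameter order is an assumption about the retired API.
    """
    params = (appkey, blog_id, username, password, content, publish)
    # dumps() with methodname produces a complete <methodCall> document.
    return xmlrpc.client.dumps(params, methodname="blogger.newPost")
```

The resulting XML could be saved to a file and sent with something like `wget --post-file=request.xml --header="Content-Type: text/xml" <endpoint URL>`, where the endpoint is left as a placeholder.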

Comments

Keith Horowitz said…
Update: it seems that Blogger is sending an email on at least some updates. I don't know if they made a change, if I just missed it on a few edits, or if it matters what you edit.
Hm, I made a minor correction to the latest entry, but I didn't receive an updated email. Perhaps it was because it was during the same minute as the original post? Anyway, I still say you can't count on email as a backup for your blog, but it does add to the safety.
