Blog Backup & Wget

I posted a question to Blogger support about a week ago about backing up my blog. I had realized that all my work was out there, and I was relying completely on their servers to keep it safe. I also had no easy way to move the data over to my own FTP server if I ever wanted to.

This was a few days before weblogs.com abruptly closed.

Blogger posted a solution on their help site under advanced topics. Their solution was to make a complicated manual configuration change that puts all your blog entries on one page, then save that page. I was not in a rush to do this. I was tempted to just browse to each entry and save it by hand.
I had also set the option to have Blogger email me each post, and I set up a filter for the emails. But there is a problem with this solution: you only get the initial entry; you don't get any updates if you edit an entry.

Then I suddenly realized that I might be able to use wget to back up my blog. Wget is a command-line utility that can be used to fetch a web page.
It turned out to be even easier than I expected! All I had to do was add the '-r' (recursive) command-line switch, and it traced through all my entries and saved them to files on my hard drive, in directories relative to where I ran it. I was really impressed by all the great options in wget.
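For anyone who wants to try the same thing, here is a minimal sketch of a recursive fetch wrapped in a small function. Only '-r' comes from the description above; the '-np' and '-P' switches, the function name `backup_blog`, and the example address are assumptions added for illustration.

```shell
#!/bin/sh
# Mirror a blog into a local directory with wget.
# $1: blog URL, $2: directory to save into.
# backup_blog is a hypothetical helper name, not part of wget.
backup_blog() {
    # -r   recurse through the links found on each page
    # -np  do not climb above the starting directory (assumed addition)
    # -P   save everything under the given directory (assumed addition)
    wget -r -np -P "$2" "$1"
}

# Example call with a placeholder address:
# backup_blog "http://example.blogspot.com/" "blog-backup"
```

Leaving the call commented out means nothing is downloaded until you fill in your own blog address.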

But I decided I wanted more. I wanted to back up my image files too (posted on my ISP web server and Hello's photos1.blogger.com server). So I wrote a shell script to extract all the image file URLs and put them in a list file to use with wget.
Then I updated the script to run the wget for my site first. And I set both wget commands to fetch only newer files. I run it on my Windows machine under Cygwin. I would imagine it would run fine under Linux or other UN*X platforms.
I am sharing my script under a GNU-like open source license: getmyblog.
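The getmyblog script itself is not reproduced here. What follows is only a rough sketch of the same idea, under several assumptions: the function names, the `images.txt` list file, the `images` subdirectory, and the set of image extensions are all made up for illustration, and '-N' (timestamping) is assumed to be the "only newer files" option.

```shell
#!/bin/sh
# Collect image URLs from saved HTML pages into a list file.
# $1: directory holding the mirrored pages, $2: list file to write.
extract_image_urls() {
    # Pull every absolute src="http..." value, strip the attribute
    # wrapper, keep common image extensions only, and de-duplicate.
    find "$1" -type f -name '*.html' -exec cat {} + |
        grep -o -i 'src="http[^"]*"' |
        sed -e 's/^[^"]*"//' -e 's/"$//' |
        grep -i -E '\.(jpe?g|gif|png)$' |
        sort -u > "$2"
}

# Mirror the site first, then fetch the images named in the list.
# $1: blog URL, $2: backup directory.
backup_blog_with_images() {
    # -N skips files that are not newer than the local copy.
    wget -r -N -P "$2" "$1"
    extract_image_urls "$2" "$2/images.txt"
    # -i reads the URLs to fetch from the list file.
    wget -N -P "$2/images" -i "$2/images.txt"
}

# Example call with a placeholder address:
# backup_blog_with_images "http://example.blogspot.com/" "blog-backup"
```

The sketch only looks at files ending in `.html` and only at absolute `src` URLs, so pages saved without an extension or images linked with relative paths would need extra handling.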

For those running Windows without Cygwin, I found that there is a Windows port of wget (and a bunch of other UN*X utilities) at unxutils.sourceforge.net. I have tested the wget.exe program, and it works great for a plain Windows backup of a blog site, for just the HTML files.

On an interesting note, I had been playing with wget a few days before as a way to post XML-RPC requests to Blogger's API (which is being deprecated; the old API will be going away). It worked very well for a simple command, but I did not test creating an entry with it. I imagine it would not be hard to write a script that uploads the backed-up files for a blog by using wget to talk to the XML-RPC interface.
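As a rough illustration of what posting XML-RPC with wget can look like, here is a small sketch. The helper names `xmlrpc_body` and `xmlrpc_post`, the endpoint address, and the choice of `blogger.getUsersBlogs` as the example method are all assumptions; the post does not say which command was actually tried.

```shell
#!/bin/sh
# Print a minimal XML-RPC request body with string parameters.
# $1: method name, remaining arguments: string parameters.
# Note: parameters are not XML-escaped, so avoid <, > and & in them.
xmlrpc_body() {
    method=$1
    shift
    printf '<?xml version="1.0"?>\n'
    printf '<methodCall><methodName>%s</methodName><params>' "$method"
    for p in "$@"; do
        printf '<param><value><string>%s</string></value></param>' "$p"
    done
    printf '</params></methodCall>\n'
}

# POST a request body to an XML-RPC endpoint and print the response.
# $1: endpoint URL, $2: file containing the request body.
xmlrpc_post() {
    # --post-file sends the file contents as the POST body;
    # -O - writes the server's reply to standard output.
    wget -q -O - --header='Content-Type: text/xml' --post-file="$2" "$1"
}

# Example with placeholder values:
# xmlrpc_body blogger.getUsersBlogs APPKEY username password > request.xml
# xmlrpc_post "http://example.com/api/RPC2" request.xml
```

The request is built separately from the POST so the XML can be inspected in a file before anything is sent.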

Comments

Keith Horowitz said…
Update - it seems that blogger is sending an email on at least some updates. I don't know if they made a change, or I just missed it on a few edits, or if it matters what you edit.
Hm - I made a minor correction to the latest entry, but I didn't receive an updated email. Perhaps it was because it was during the same minute as the original post? Anyway - I still say you can't count on email as a backup for your blog, but it does add to the safety.
