[ home ]

Publishing a Web site with a handful of bash scripts

Using Gedit to edit this page. Terminal is running on the panel

Why use a large and complex application like WordPress when all you want to do is publish about one new page per week?

I used Wordpress to publish this site from October 2004 until February 2010, just over five years. I have made a modest donation. I have never had any kind of spam or exploit problem. I have noticed that the time between upgrades is decreasing, and that the size and complexity of the WordPress application itself is increasing.

Before WordPress, I used Movable Type, and before that I tried GreyMatter and Open Journal. The fascination was being able to write from any Web browser.

Recently, I realised that my blog posts fell into two categories: quick link posts and longer reflective pieces. The longer pieces were almost always written on this desktop PC in the back bedroom 'office', about once every two weeks. This realisation coincided with a WordPress security issue that was a bit more serious than usual. Upgrading a WordPress installation on a hosted server account is not rocket science, but it can be time-consuming. It involves backing up the WordPress folder itself, including all the image and document files I have uploaded (about 125 MB over the five years), then backing up and downloading the MySQL database, then uploading the new script files (7.5 MB of them with WordPress 2.9!) and upgrading the database. There have been a few glitches with the upgraded database as well, at least on this particular server.

Around this time, I set up a pinboard bookmarking account, mainly because I wanted a way of synchronising bookmarks across this PC, my own laptop and the College laptop, but also because of the unusual fee model and the very clean design. It slowly began to dawn on me that I could run a very nice link-blog from the pinboard account. Maciej Cegłowski, who develops Pinboard, wrote an article suggesting that a server-side application may not be the best way for most people to publish. He then followed this up with instructions on how to run WordPress locally using XAMPP, gather the Web pages using wget, and then mirror these static HTML pages up to plain, non-scripted Web server space. I implemented this arrangement, and realised that I did not really need the WordPress step at all.

So I googled about Bash scripts to run on this Linux box. What I ended up with is a small collection of shell and perl scripts that build the index page and upload the site.

I love the fact that you can use commands in backticks to invoke the scripts that list the pages and grab the RSS feed, storing their output as variables in a second bash script, like this...

 PINBOARD=`perl rss2html.pl http://feeds.pinboard.in/rss/u:keithpeter/`      
 PAGELIST=`perl page-list.pl 2*.html`
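For instance, a make-index-page script along these lines could splice those variables into the page. This is a minimal sketch: the echo commands stand in for the real perl helpers, which are not reproduced here, and the markup is made up.

```shell
#!/bin/sh
# Sketch of a make-index-page.sh: capture helper output in variables with
# backticks, then echo the variables into an HTML skeleton.
PINBOARD=`echo "<li>a bookmark from pinboard</li>"`    # really: perl rss2html.pl <feed-url>
PAGELIST=`echo "<li>20100301--example_page.html</li>"` # really: perl page-list.pl 2*.html
echo "<html><body>"
echo "<h2>Pages</h2><ul>${PAGELIST}</ul>"
echo "<h2>Recent bookmarks</h2><ul>${PINBOARD}</ul>"
echo "</body></html>"
```

Redirecting the script's output to index.html, as the publish script below does, completes the step.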

In the directory above 'pages' on my PC, I have another 'publish' script that runs lftp with the mirror command.

Writing and publishing pages on my Web site are now separate processes; the writing phase involves...

  1. Write the page in Gedit using Markdown syntax and save it under the file name convention described below
  2. Run webconvert on the file to pass it through the Markdown script and capture the result as an HTML file with a header and a footer, the footer containing the date
  3. View the new HTML page in the Web browser
  4. Change the page text so the words say what I want them to say
  5. Iterate
  6. Check a day or so later
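Step 2 might look roughly like the sketch below. This is an assumption, not Caballero's actual webconvert script: a cat stands in for the Markdown converter, and a sample input file is created so the sketch is self-contained.

```shell
#!/bin/sh
# Hypothetical sketch of the convert step: wrap a Markdown source file in a
# header and a footer carrying the date, writing file.txt out as file.html.
cd `mktemp -d`                                        # work in a scratch directory
echo "Example body text" > 20100301--example_page.txt # sample input file
IN=20100301--example_page.txt
OUT=`echo "$IN" | sed 's/\.txt$/.html/'`
{
    echo "<html><head><title>bodmas</title></head><body>"
    cat "$IN"       # the real script would run the Markdown converter here
    echo "<p>Last modified: `date`</p>"
    echo "</body></html>"
} > "$OUT"
```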

This writing process is familiar from TeX and LaTeX: editing in a text editor, 'compiling', and viewing in a DVI viewer. It feels more relaxed than typing in a textbox on a Web page. The publishing phase involves...

  1. Change up a directory and run my 'publish' script, which
  2. Runs the make-index-page script, which in turn
  3. Calls rss2html to capture and convert my Pinboard bookmarks and the page-list script to make a list of the Web pages in the directory, formats these into the index page, and finally
  4. Runs the lftp mirror command that uploads all the modified pages to the Web server

The 'publish' script looks like this

 cd pages
 ./make-index-page.sh > index.html
 cd ../
 lftp -f bodmas-mirror.txt

and is run from the directory above the one with the collection of Web pages. lftp is used in script mode, and the bodmas-mirror.txt file contains commands like this...

 # lftp script file - uploads all the files in the /home/user/yourdocs/pages/ directory including subfolders 
 # but excluding certain types of file. This file is saved with a txt extension.
 lftp -u username,password ftp.yoursite.com/directory-above-pages
 mirror -R -c -v -X *.sh -X *.pl -X *.sed -X *.inc --log=/home/user/yourdocs/lftp.log /home/user/yourdocs/pages/

As you can see, I decided to prevent the script files being uploaded to the remote server using the -X options. The capital X, or --exclude-glob, option allows the use of 'globs' like *.pl.

The great thing about off-line authoring is that I can tidy up this rather string-and-sealing-wax collection of hacks into a single perl script or similar at some point in the future. All I have to do is stick to the file name conventions I've started using. There are no data or processes on the server, which just serves HTML files. I chose Markdown over Textile or one of the wiki formatting syntaxes because Markdown was part of Caballero's Webconvert and because I used Markdown a lot when I published this site through Movable Type.

File name convention

The 'articles' have names like this: YYYYMMDD--category_title_words.txt which are then converted to .html files by Webconvert. Encoding the date when the first draft of the page was started means that it is easy to produce a list of pages in reverse order of addition (using the 'pop' as opposed to the 'shift' keyword in the indexing script). Webconvert's makefooter script always adds the last modified date to the page.
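Because the date leads the file name, a plain lexical sort is also a chronological sort, so a newest-first listing needs no date parsing at all. A small sketch (sample files are created in a scratch directory so it runs anywhere):

```shell
#!/bin/sh
# Newest-first ordering falls out of the YYYYMMDD prefix for free.
cd `mktemp -d`
touch 20100115--linux_first_post.html 20100301--maths_second_post.html
ls 2*.html | sort -r    # the 20100301 page lists before the 20100115 page
```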

At present, the only other kind of file is index.html. However, using the --category element in the file name will make it possible to have category.html pages in future by applying a regexp filter in the page indexing script.
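Such a category filter could be as simple as a grep on the file names. A hypothetical sketch, with made-up category tokens:

```shell
#!/bin/sh
# Pick out pages whose names carry a given category between date and title.
cd `mktemp -d`
touch 20100115--linux_first.html 20100301--maths_second.html
ls 2*.html | grep -- '--linux_'    # only the linux-category page survives
```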

As long as I stick to this convention, I can hack the scripts I actually use into something decent, perhaps in perl, so the 'system' becomes transferable between Linux, Mac OS and Windows.

Using wget to save the static appearance of the WordPress blog

The wget mirror command was used to grab the static appearance of the WordPress blog as a set of Web pages with local relative links. These were then re-uploaded to the same directory on my Web space formerly occupied by the WordPress php scripts.
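The exact flags used on the original run are not recorded; a standard GNU wget mirroring invocation looks something like the sketch below. The URL is a placeholder, and the command is only echoed here so the sketch does not touch the network.

```shell
#!/bin/sh
# Sketch of a wget mirror run: --mirror recurses with timestamping,
# --convert-links rewrites links to be local and relative, and
# --page-requisites pulls in images and stylesheets.
SITE="http://www.example.com/blog/"   # placeholder for the real blog URL
CMD="wget --mirror --convert-links --page-requisites --no-parent $SITE"
echo "$CMD"    # run the command by hand once the URL is filled in
```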

I may restore the WordPress blog (or run it locally) at some point in the future.

More on pinboard for link blogging

Weblogs started in the days before search engines: a Weblog simply logged useful or interesting pages elsewhere on the Web within a certain subject area. The same function is still needed because of the huge number of Web sites available now. Google and Bing can help you find the useful material, but using someone else's searches as a starting point can save time. I use pinboard for this kind of short signposting post. I include the latest 50 pinboard posts on the bodmas index page using a script by Jonathan Eisenzopf called rss2html.pl. The script runs on my local machine and fetches the pinboard RSS feed as formatted HTML.

On my stock Debian Lenny system, I had to install the following perl modules...

 perl -MCPAN -e "install XML::Parser"
 perl -MCPAN -e "install XML::RSS"

Note added 5th April 2010: I installed Ubuntu 10.04 beta ('Lucid Lynx') on this box and found that I needed to install the libxml-rss-perl package using aptitude, as the second command above failed.

Then running the command

 perl rss2html.pl http://feeds.pinboard.in/rss/u:keithpeter/ > pinboard.html

at the command prompt fetched an HTML version of my 50 most recent pinboard entries. As mentioned above, the command can be issued from within a bash script by enclosing it in backticks; the output can then be captured as a variable and included in the HTML echoed to the prompt later in the script.

Keith Burnett, Last update: Sun Aug 28 2011