Settlers 3 is a PC strategy game from the late 90s that I spent countless hours in. It included a very detailed manual on the disc, full of images, tips, and stats. I’ve begun playing the game’s 2018 remaster, and discovered the manual’s original site format was not available anywhere online… until now!
Here are a few sample pages:
As with this blog and my portfolio, the manual is hosted on GitHub Pages. I had to make a few changes to get the site working, and documenting them here might help similar projects.
The manual was originally created to run on Windows, where filenames are case-insensitive, and as such had a mixture of uppercase and lowercase filenames. Once served from a case-sensitive web server, this suddenly became an issue, as one page might link to another using the wrong casing!
Luckily I’ve used Ant Renamer many, many times before and it can handle bulk renaming of a thousand or so files easily:
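For anyone who would rather script this step, a rough Python equivalent might look like the sketch below. It is not what Ant Renamer does internally, just one way to get the same result. The function name `lowercase_filenames` and the `.renaming` temporary suffix are my own inventions for illustration.

```python
from pathlib import Path


def lowercase_filenames(root: Path) -> int:
    """Rename every file and folder under root to its lowercase form."""
    renamed = 0
    # Deepest paths first, so a folder is only renamed after its contents.
    for path in sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True):
        # Compare the names as strings: Path equality ignores case on Windows.
        if path.name == path.name.lower():
            continue
        # Going via a temporary name (hypothetical suffix) lets a case-only
        # rename work on case-insensitive filesystems too.
        temp = path.with_name(path.name + ".renaming")
        path.rename(temp)
        temp.rename(path.with_name(path.name.lower()))
        renamed += 1
    return renamed
```

Calling `lowercase_filenames(Path("manual"))` on a copy of the site (the folder name is just an example) returns how many entries were renamed. It does not check for two names that collide once lowercased, so run it on a backup first.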
Now that the files were all consistent, the links needed to be fixed! I had a bit of trouble with this, but got there eventually with VS Code’s regex find and replace.
I wrote a regex to find every string between `href=` and `.htm` (e.g. `a href="/EXAMPLE/example.htm`) using `(?<=href=)(.*?)(?=\.htm)`. I then replaced all of these with a lowercased version (`\L$1`). In most cases this didn’t change anything, but it fixed 60-70 links that were broken. I then repeated this process with
Finally, I did a simple find and replace for `.htm`, to make sure all the links went to the correct extension.
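The link fix can also be reproduced outside the editor. Here is a minimal Python sketch using the same lookbehind / lookahead pattern. Python's replacement strings have no `\L` modifier, so a small function does the lowercasing instead. The name `lowercase_links` is mine, not something from VS Code.

```python
import re

# Same pattern as in the editor: whatever sits between href= and .htm.
LINK_TARGET = re.compile(r"(?<=href=)(.*?)(?=\.htm)")


def lowercase_links(html: str) -> str:
    """Lowercase every link target that ends in .htm, leaving the rest alone."""
    # The replacement function receives each match and returns it lowercased.
    return LINK_TARGET.sub(lambda match: match.group(1).lower(), html)


print(lowercase_links('<a href="/EXAMPLE/Example.htm">Example</a>'))
# <a href="/example/example.htm">Example</a>
```

Only the path inside the link changes; the visible link text keeps its capital letter. The match is lazy, so it stops at the first `.htm` after each `href=`.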
Now that navigation was sorted and the sites looked good, I uploaded them. I sorted out GitHub Pages, the DNS records etc., thinking I was done, and checked it in the browser, only to be greeted by… question marks sprinkled liberally over the sites, especially the German one!
It worked perfectly on my machine, but got filled with question marks when served from GitHub Pages. Hmmm…
This ended up taking far too long to diagnose, but I eventually discovered the sites were encoded in `utf-8`. Detecting encoding is much trickier than detecting filetype: a file can be valid in multiple encodings at once, so there’s no guaranteed way to tell. My local files were ambiguous, and Windows / Chrome was figuring it out, but my remote files were explicitly UTF-8 like almost every other page on the internet.
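To see why the same bytes can look fine in one place and broken in another, here is a small Python illustration. It assumes a Windows-style single-byte encoding (`cp1252`) purely as an example; I'm not claiming that is exactly what the manual's files used.

```python
# "Über" saved in a single-byte Windows encoding (an assumption for this demo).
raw = "Über".encode("cp1252")

# Two different single-byte encodings both accept these bytes without complaint,
# which is why the encoding can't be detected with certainty.
as_cp1252 = raw.decode("cp1252")
as_latin1 = raw.decode("latin-1")

# Read as UTF-8, the umlaut's byte is invalid and becomes U+FFFD,
# the replacement character browsers draw as a question mark.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_cp1252, as_latin1, as_utf8)
```

The first two results are identical and correct, while the third has lost the umlaut. That would also explain why a German page, full of umlauts, suffers the most.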
VS Code and Notepad++ both have “convert encoding” functionality, but neither seemed able to actually do the job, and certainly not in bulk. Luckily, File Encoding Checker can do it easily:
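If you'd rather script the conversion, a rough Python sketch could look like this. It is not how File Encoding Checker works, and the source encoding (`cp1252` here) is an assumption you would need to confirm for your own files. `convert_to_utf8` is a made-up name.

```python
from pathlib import Path


def convert_to_utf8(root: Path, source_encoding: str = "cp1252") -> int:
    """Re-save every .htm file under root as UTF-8; return how many changed."""
    converted = 0
    for path in root.rglob("*.htm"):
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already valid UTF-8 (plain ASCII included), leave it alone
        except UnicodeDecodeError:
            pass
        # source_encoding is a guess; decoding raises if a byte isn't defined in it.
        text = raw.decode(source_encoding)
        # Working in bytes keeps the original line endings untouched.
        path.write_bytes(text.encode("utf-8"))
        converted += 1
    return converted
```

Files that already decode cleanly as UTF-8 are skipped, so running it twice is harmless. As with the renaming, try it on a copy first.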
Finally, I added a quick and dirty language selector linking to the three manuals. Done!