I used to maintain a blog/forum/wall type thing at www.tedp.net. It started when I was an undergraduate at the University of Toronto Scarborough. Each student was assigned something like 10MB of webspace to play with. I used Perl and CGI to create a blog, and eventually created a very rudimentary user management system so that my classmates could post as well. They could even submit a profile photo and attach pictures to their posts. The page “went live” on March 21, 2002 at 1:25pm according to the HTML I wrote at the time. I suppose if I’d had any entrepreneurial ambition at the time, I could have beaten Facebook to the punch. I eventually moved it to paid hosting when 10MB became a pretty impractical limit.
I decided recently to take the site down, mainly because it hasn’t been actively updated in years and I didn’t want to pay for the hosting anymore. But I thought that I should maintain some record of it, as my classmates and I were pretty active in posting there for a few years. I wrote a small C# program that would connect to the MySQL database that backed the site, download all the post information, and output an HTML document containing each post as it would have been rendered on the site. A typical post looked something like this:
The HTML document I created was simply hundreds of these stacked on top of one another. All good so far. However, I also wanted to print the document to PDF, where page breaks suddenly became an issue. Simply using Chrome’s “Save as PDF” functionality would result in documents that look like this:
Notice how the break occurs in the middle of one “post”. My first stab at fixing this used the CSS page-break-after property on a div placed between each post like so:
<div style="page-break-after: always;"></div>
This was slightly better, but now each post was on its own page. What I wanted was to suggest that a page break should occur there. Turns out CSS doesn’t have a way to suggest a page break, but it does have a way to do the opposite, suggest against one. I simply wrapped each post in a div with the page-break-inside property set as follows:
<div style="page-break-inside: avoid;">
...post goes here...
</div>
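The wrapper fits naturally into whatever loop writes out the posts. Below is a minimal C# sketch of such a loop, not the original program. It assumes each post is an (Author, Posted, Body) tuple that has already been read from the database (the MySQL query is not shown), and that post bodies are plain text to be HTML-encoded; if the stored posts contained their own HTML markup, the encoding step would have to go. `BuildDocument` and the tuple field names are hypothetical. It also sets `break-inside`, the current CSS name for which `page-break-inside` is a legacy alias, and sorts oldest-first to match the chronological order of the PDF. It uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper: builds one printable HTML page from a list of posts.
// Each post is an (Author, Posted, Body) tuple; loading them from MySQL is not shown.
// Assumes bodies are plain text, so they are HTML-encoded before output.
string BuildDocument(IEnumerable<(string Author, DateTime Posted, string Body)> posts)
{
    var html = new StringBuilder();
    html.AppendLine("<!DOCTYPE html>");
    html.AppendLine("<html><head><meta charset=\"utf-8\"></head><body>");

    // Oldest first, so the printed document reads chronologically.
    foreach (var post in posts.OrderBy(p => p.Posted))
    {
        // Ask the print engine not to split this post across two pages.
        // page-break-inside is the legacy name; break-inside is the current one.
        html.AppendLine("<div style=\"page-break-inside: avoid; break-inside: avoid;\">");
        string stamp = post.Posted.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);
        html.AppendLine($"<p><b>{WebUtility.HtmlEncode(post.Author)}</b> {stamp}</p>");
        html.AppendLine($"<p>{WebUtility.HtmlEncode(post.Body)}</p>");
        html.AppendLine("</div>");
    }

    html.AppendLine("</body></html>");
    return html.ToString();
}

// Example usage with made-up posts.
var samplePosts = new List<(string Author, DateTime Posted, string Body)>
{
    ("Second Poster", new DateTime(2002, 3, 22, 9, 0, 0), "Replying to you."),
    ("First Poster", new DateTime(2002, 3, 21, 13, 25, 0), "Hello <world>!"),
};
Console.WriteLine(BuildDocument(samplePosts));
```

Because nothing forces a break, the browser packs as many whole posts onto a page as fit and only moves a post to the next page when it would otherwise be split.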
I also ran into some problems with Unicode characters not rendering properly. I think that because in the early days I used to edit my posts in Word and then copy/paste them into the website, I ended up with many characters from the Windows-1252 character set in the posts. Judging by the code below, their byte values (0x91 to 0x97) ended up stored as the Unicode code points U+0091 to U+0097, which are invisible control characters rather than punctuation. They rendered with varying degrees of success but looked terrible in Chrome. I had problems with left and right single and double quotes, en and em dashes, and upside down question marks. The reason for the last one is that for a time there was a fad amongst my group of friends for using upside down question marks instead of normal ones on instant messaging and websites like tedp.net. Chris Sorensen started it all. This is the code I used to replace them with plain ASCII equivalents:
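postText = postText.Replace("\u0091", "\u0027"); // Windows-1252 left single quote -> '
postText = postText.Replace("\u0092", "\u0027"); // Windows-1252 right single quote -> '
postText = postText.Replace("\u0093", "\u0022"); // Windows-1252 left double quote -> "
postText = postText.Replace("\u0094", "\u0022"); // Windows-1252 right double quote -> "
postText = postText.Replace("\u0096", "\u002D"); // Windows-1252 en dash -> -
postText = postText.Replace("\u0097", "\u002D"); // Windows-1252 em dash -> -
postText = postText.Replace("\u00BF", "\u003F"); // inverted question mark -> ?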
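The same mappings can also live in a lookup table applied in a single pass, which is easier to extend if more stray characters turn up. This is a minimal sketch of an equivalent approach, not the code the original program used; `asciiFor` and `ReplaceLegacyChars` are hypothetical names, and it uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// The same character mappings as the Replace() calls above, kept in one table.
// U+0091..U+0097 are Windows-1252 punctuation bytes that were stored as
// Unicode control characters; U+00BF is the inverted question mark.
var asciiFor = new Dictionary<char, char>
{
    ['\u0091'] = '\'', // left single quote
    ['\u0092'] = '\'', // right single quote / apostrophe
    ['\u0093'] = '"',  // left double quote
    ['\u0094'] = '"',  // right double quote
    ['\u0096'] = '-',  // en dash
    ['\u0097'] = '-',  // em dash
    ['\u00BF'] = '?',  // inverted question mark
};

// Hypothetical helper: one pass over the string instead of seven Replace() calls.
string ReplaceLegacyChars(string text)
{
    var sb = new StringBuilder(text.Length);
    foreach (char c in text)
    {
        // Use the ASCII equivalent when the table has one; otherwise keep the character.
        sb.Append(asciiFor.TryGetValue(c, out char plain) ? plain : c);
    }
    return sb.ToString();
}

Console.WriteLine(ReplaceLegacyChars("\u0093Hola\u0094 \u00BFque tal?")); // prints: "Hola" ?que tal?
```

Each `Replace()` call scans the whole string and allocates a new one when it finds a match, so the single pass does less work, though for a few hundred posts the difference is negligible. The main benefit is that adding another character is a one-line change to the table.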
The full PDF can be found here. It’s in chronological order, which is the opposite of how it would have been presented on the website, but makes it easier to read as a document. At first it’s just me; others join in after a few pages.