For my first real blog entry, I will talk about some important programming aspects involved in displaying Greek text on the web. There are several issues involved that are frequently handled very poorly by various web pages. This blog entry will go step by step through the process, including the actual code later on. The four parts to be considered are:
- Character encoding
- Character sets, font files and Unicode
- Preferences and customization
- Conversions
The first one involves declaring the correct character encoding for the document (the charset parameter that accompanies its MIME type), which ensures proper display. Without this, the Greek text is likely to appear as garbage. The user can override the encoding, but how many users know that this is what they are supposed to do? How many even know how to do it?
The user can change the encoding on the View menu, selecting Encoding (or Character Encoding in Firefox) and then selecting Unicode (UTF-8). For an example of what erroneous encoding can do to your Greek display, check out Justin’s First Apology here: http://khazarzar.skeptik.net/books/justinus/apolog1g.htm
They selected a Cyrillic encoding and the result is obvious. If you change the encoding to UTF-8, the text suddenly becomes legible (provided you can read Greek to begin with, of course). Notice also how the German on that first page becomes legible as well.
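To see the effect without visiting the page, here is a small sketch that decodes the same UTF-8 bytes twice, once with the wrong label and once with the right one. I am assuming windows-1251 as the Cyrillic encoding (I have not checked which one that page actually declares), and the sample phrase is my own. It also assumes a runtime where TextDecoder knows that label, which current browsers and recent Node.js builds do.

```javascript
// A short Greek phrase ("ἐν ἀρχῇ"), written with escapes so that the
// example does not depend on how this file itself was saved.
const greek = '\u1F10\u03BD \u1F00\u03C1\u03C7\u1FC7';

// These are the bytes a correctly saved UTF-8 page would contain.
const utf8Bytes = new TextEncoder().encode(greek);

// Assumption: the page is wrongly labelled as windows-1251 (Cyrillic).
// Every byte is then read as a separate Cyrillic or symbol character.
const garbled = new TextDecoder('windows-1251').decode(utf8Bytes);

// Reading the very same bytes as UTF-8 gives the Greek back.
const restored = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(garbled);            // unreadable
console.log(restored === greek); // true
```

Nothing is wrong with the bytes in either case. Only the label differs, which is why switching the encoding in the browser fixes the display.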
They should have set the encoding in their web document and spared the quite possibly clueless user this rather cryptic task. It is good practice to start every web page with the following unless you have a very good reason not to.
For HTML, put this line at the very top of the head section of every document:
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
This ensures that the text is displayed correctly, no matter what language you are using. For server-side scripting it is much the same. Here is what it looks like in Perl:
print "Content-Type: text/html; charset=utf-8\n\n";
Make sure that you include the two newline characters or it will fail: the blank line they produce is what marks the end of the HTTP headers. For other scripting languages, check here: http://www.w3.org/International/O-HTTP-charset
Next, we will discuss the character sets, font files and Unicode. The official Greek Unicode charts can be found at http://www.unicode.org/charts/ and come in two parts, both within the 16-bit range. The first chart ranges from 0x370 to 0x3FF and covers the regular upper case and lower case letters as well as the characters with tonoi. The second chart ranges from 0x1F00 to 0x1FFF and covers the lower and upper case letters combined with their diacritical marks. The way they are laid out is pretty decent and helps when converting characters. Most of the well-known Greek fonts available for download cover these characters. At least one standard Windows font also covers the entire range (Tahoma).
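As a quick illustration of the two ranges, the following sketch reports which chart a character belongs to. The function name greekBlock is made up for this example, and the returned strings are the names the Unicode charts use for the two blocks.

```javascript
// Report which of the two Greek charts a character falls into,
// or null if it is in neither.
function greekBlock(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 0x0370 && cp <= 0x03FF) return 'Greek and Coptic';
  if (cp >= 0x1F00 && cp <= 0x1FFF) return 'Greek Extended';
  return null;
}

console.log(greekBlock('\u03B1')); // alpha: "Greek and Coptic"
console.log(greekBlock('\u1F10')); // epsilon with psili: "Greek Extended"
console.log(greekBlock('a'));      // Latin letter: null
```

Note that this only looks at precomposed characters. Text stored as a base letter followed by combining marks would report the base letter's chart.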
This ties into our third point neatly. Everybody has different tastes in Greek fonts. I, personally, like the Tahoma font because it is clean, crisp, widely available and looks good when displayed in a normal size. I find it fairly essential that users are allowed to customize the font choice if the site is heavily dependent on Greek characters. There is really no excuse not to do this since it is rather uncomplicated. I won’t talk much about server-side font selection since there are about a thousand ways of doing this, and if you know how to do server-side programming then you don’t need me to explain how to work the font selection. Much can be done client-side, however, using JavaScript and the Document Object Model.
The method I have chosen is to modify the global stylesheet, although there are some cross-browser issues. It is also rather fuzzy since the entries look like JSON but they really aren’t. This is a problem with most objects that didn’t originate from the JavaScript core: it looks like a duck, quacks like a duck, but try and treat it like a duck and you’ll be sad. I will probably write more on this issue at a future date, especially the problems with the Array object as returned from the DOM and other places. Anyway, this is all solvable as will be seen below.
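To make the duck problem concrete, here is one common case, sketched with a plain object standing in for a DOM collection so that it runs anywhere. The stand-in and the selector names in it are invented for the example, and Array.from is just one way of getting a real array out of such an object.

```javascript
// A stand-in for a DOM collection: numbered entries and a length,
// but none of the Array methods.
const arrayLike = { 0: '.greek', 1: '.latin', length: 2 };

console.log(Array.isArray(arrayLike)); // false
console.log(typeof arrayLike.map);     // "undefined"

// Copy the entries into a real array, after which everything works.
const selectors = Array.from(arrayLike);
const upper = selectors.map(function (s) { return s.toUpperCase(); });

console.log(Array.isArray(selectors)); // true
console.log(upper.join(' '));          // ".GREEK .LATIN"
```

Indexing and length work on both, which is why a plain for loop over the collection is usually the safest choice.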
When allowing the user to select the font you should also be kind enough to remember his selection and set it upon his next return. Let’s first present the complete but simple example of how to do all this.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<style type="text/css">
/* Mark all elements that display Greek with class=greek
* You can add whatever elements you want in addition to the font-family
*/
.greek
{
font-family: Tahoma;
}
</style>
<script type="text/javascript">
function setFont( fontName )
{
    var sheet = document.styleSheets[ 0 ];
    // Standard browsers expose cssRules; older Internet Explorer exposes rules.
    var theRules = sheet.cssRules || sheet.rules || [];
    // Find the .greek rule and change its font.
    for ( var n = 0; n < theRules.length; n++ )
    {
        if ( theRules[ n ].selectorText == '.greek' )
            theRules[ n ].style.fontFamily = fontName;
    }
    // Remember the choice for the next visit.
    setCookie ( 'userFontName', fontName );
}
function setCookie ( cName, cValue )
{
    var exdate = new Date();
    exdate.setDate ( exdate.getDate() + 365 ); // Set for one year.
    // The expires attribute expects a GMT/UTC date string.
    document.cookie = cName + '=' + encodeURIComponent ( cValue ) + ';expires=' + exdate.toUTCString();
}
function getCookie ( cName )
{
    if ( document.cookie.length > 0 ) // Are any cookies stored?
    {
        var start = document.cookie.indexOf ( cName + '=' );
        if ( start != -1 )
        {
            start = start + cName.length + 1;
            var end = document.cookie.indexOf ( ';', start );
            if ( end == -1 ) end = document.cookie.length;
            return ( decodeURIComponent ( document.cookie.substring ( start, end ) ) );
        }
    }
    return ( null );
}
function winLoad ()
{
    var ck = getCookie ( 'userFontName' );
    if ( ck )
    {
        setFont ( ck );
        // Make the SELECT show the remembered font.
        var fontSelect = document.getElementById ( 'fontSelect' );
        for ( var n = 0; n < fontSelect.options.length; n++ )
            if ( fontSelect.options[ n ].value == ck )
                fontSelect.selectedIndex = n;
    }
}
</script>
</head>
<body onload="winLoad();">
<div class=greek>
ἐπειδήπερ πολλοὶ ἐπεχείρησαν ἀνατάξασθαι διήγησιν περὶ τῶν πεπληροφορημένων…
</div>
<div>
Regular text here…
<select id="fontSelect" onChange="setFont ( this.options[ this.selectedIndex ].value );">
<option value=Tahoma>Tahoma
<option value=SPIonic>SPIonic
</select>
</div>
<div class=greek>
καθὼς παρέδοσαν ἡμῖν οἱ ἀπ’ ἀρχῆς αὐτόπται καὶ ὑπηρέται γενόμενοι τοῦ λόγου
</div>
</body>
</html>
If you want to try out this program, make sure that you save the document in a format that supports Unicode. Word or Wordpad will both do this; just pick Save As… and change the Save as type… If you see garbage on your screen, you saved it in a format that doesn’t support wide characters.
The body of the program is pretty simple. There is a DIV tag marked as containing Greek text (you could mix the Greek in with other languages as long as the font selected has those characters), then a SELECT which allows you to select a font, and then another section of Greek. You can add as many fonts in the SELECT as you like.
We have an onload event for this document. It gets the cookie (if it exists), changes the stylesheet and makes the SELECT start with the current font selection. The cookie is set for a year; simply change the 365 to some other number of days if you wish. The setFont function goes to the first stylesheet, finds the ‘greek’ class and sets the font. It also updates the cookie.
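One thing the simple cookie lookup does not guard against is another cookie whose name merely ends in the same text. Here is a slightly stricter sketch that splits the cookie string into pairs and compares whole names. The function name readCookie is made up, it takes the cookie string as a parameter so it can be tried outside a browser, and it assumes the value is plain ASCII or was written with encodeURIComponent.

```javascript
// Look up one cookie by its exact name in a "a=1; b=2" style string.
// Returns the decoded value, or null if the name is not present.
function readCookie(cookieString, name) {
  const pairs = cookieString.split(';');
  for (const pair of pairs) {
    const trimmed = pair.trim();
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // not a name=value pair
    if (trimmed.slice(0, eq) === name) {
      return decodeURIComponent(trimmed.slice(eq + 1));
    }
  }
  return null;
}

// The first cookie's name only ends in "userFontName", so it is skipped.
console.log(readCookie('xuserFontName=Arial; userFontName=SPIonic', 'userFontName')); // "SPIonic"
```

In a page you would call it as readCookie ( document.cookie, 'userFontName' ).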
That’s it. Nothing to it. Anyone is free to copy the above and use it as they see fit.
So now we can display the font properly, we know the character set layout, we can let the user select a font and remember it for future use. What’s left? The hardest part, as a matter of fact.
Conversion is an interesting topic. When I say conversion, I mean conversion between upper case, lower case, betacode, stripping diacriticals, HTML character entities and so on… It is entirely lame that I have to transliterate Greek into betacode on some sites in order to do a search when I have the Tavultesoft Keyman (which I highly recommend to everyone, it is excellent and free) installed.
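To give a taste of two of these conversions, here is a rough JavaScript sketch for stripping diacriticals and for producing HTML character entities. This is not my Perl program. The function names are invented for the example, and it leans on Unicode normalization, which is only one of several ways to go about it.

```javascript
// Strip diacriticals: decompose each letter into base letter plus
// combining marks, drop the marks, then recompose what is left.
function stripDiacritics(text) {
  return text
    .normalize('NFD')
    .replace(/[\u0300-\u036F]/g, '')
    .normalize('NFC');
}

// Replace every non-ASCII character with a numeric character entity.
function toEntities(text) {
  return Array.from(text, function (ch) {
    const cp = ch.codePointAt(0);
    return cp > 0x7F ? '&#x' + cp.toString(16).toUpperCase() + ';' : ch;
  }).join('');
}

// The first word of the Greek sample above, written with escapes.
const word = '\u1F10\u03C0\u03B5\u03B9\u03B4\u03AE\u03C0\u03B5\u03C1';
const plain = stripDiacritics(word);

console.log(plain);               // the same word without breathing or accent
console.log(plain.toUpperCase()); // upper case of the stripped word
console.log(toEntities('\u03B1b')); // "&#x3B1;b"
```

Stripping first and then changing case sidesteps the question of what to do with accents on capitals, which is a decision a real search form would have to make explicitly.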
I searched but don’t know of any conversion programs out there, so I ended up writing my own in Perl. I was going to post it as part of this entry but I am realizing that it is not yet quite ready for public consumption. If you need it in a hurry let me know; otherwise I will simply post a link to it here once it is finished, which won’t be long. Really. It essentially does all the conversions I mentioned above. It came in handy when trying to marry up the MorphGNT and XML version of Strong’s, which is a story in itself. I will relay that in one of my next entries. The MorphGNT is actually surprisingly accurate, more so than many other GNT sites and tools that I have seen. The Strong’s…? Not so much. That blog entry will also give me a chance to rant about the poor use of computers, the NA27, ridiculous pricing and some fairly pathetic approaches to the whole technology issue with regard to biblical studies.
For now, this was my first entry. I doubt anyone will read this far. If you did, then I hope I have been of some assistance. I have worked with this for a while now and have gathered some knowledge in this area, so any questions are welcome, since I realize that this was a rather short entry that left out a large number of details.
Julian