The Karma Project: Code Less, Teach More

June 29, 2009

i18n Issues

Filed under: News — bryanwb @ 6:42 am

I want to keep the directory and file structure for a Karma lesson as simple as possible. Ideally, the HTML markup, CSS, and JavaScript should not be locale-specific. All translated text should be in .po files. I have been looking around at different i18n strategies for web sites and they all appear to facilitate i18n only for text, not for audio or for right-to-left reading layouts. I haven’t looked carefully at Adobe Flex’s i18n mechanism. Perhaps it provides support for those aspects of i18n.

I have done some research on i18n support for websites and desktop software. I want to thank Sayamindu Dasgupta for being extremely helpful. He sent me many links but the best is perhaps the GNU manual on Gettext.

Various i18n frameworks handle i18n text quite well but they all seem to have 3 significant holes.

1. They lack native digit support. Most people don’t know this, but many countries use different characters for the digits 0-9.

For example, in Nepali script the digit 5 is ५. JavaScript lets you print ५ as a string, but there is no obvious way to do the following:

var myScore = 555;

$("#scoreBox").html(myScore);

where myScore is displayed using the Nepali digits.

2. They lack an easy way to support right-to-left reading layouts. It seems to me that most web frameworks require you to use different CSS files for different locales. This isn’t very elegant.

3. Sound — No frameworks have an easy mechanism to link to different sound files depending on locale. This doesn’t map seamlessly to the gettext paradigm as .po files would have to hold links to external sound files. It isn’t feasible to embed binary sound files directly into .po files.

Here are some ideas I have for the above problems. Please comment!


Supporting Native Digits

I hoped there would already be a simple JavaScript library for this, but they all seem to focus on changing the periods and commas according to locale, such as jquery-i18n. Sayamindu directed me to this feature of glibc, which does appear to convert the digits to the correct localized characters. I still haven’t dug into glibc to understand how it works its magic. In the meantime, I have a cruder idea for how to output native digits.

I could write a simple JavaScript function that parses the digits to be printed and adds each digit to the Unicode base for a given language. For example,

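Devanagari numerals (Nepali, Hindi) use the range U+0966 – U+096F for 0-9, and Arabic-Indic numerals use U+0660 – U+0669 (the extended forms used for Persian and Urdu are U+06F0 – U+06F9). I could simply add each digit to the base for the particular writing script and print out the result as a Unicode string.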

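function kL10nDigit(digit, lang) {
  // lookupBaseNumber(lang) returns the code point of the digit zero, e.g. 0x0966 for Nepali
  var baseNoLang = lookupBaseNumber(lang);
  // "\u" + number does not form an escape at run time, so build the character from its code point
  return String.fromCharCode(baseNoLang + Number(digit));
}

var myScore = 555;

// makeIntoList splits 555 into [5, 5, 5]
var listMyScore = makeIntoList(myScore);

// $.map passes (element, index) to its callback, so wrap kL10nDigit to supply the language
$("#scoreBox").html($.map(listMyScore, function (digit) {
  return kL10nDigit(digit, "ne");
}).join(""));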

The map function here returns the 555 as the Unicode string "\u096B\u096B\u096B", or ५५५, and the call to .html(…) sets the content of the score box to that value.
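The idea above can be sketched without jQuery as a plain function. This is only a minimal sketch: the name localizeDigits and the script keys ("deva", "arab", "latn") are made up for illustration, while the code points are the Unicode values for the digit zero in each script.

```javascript
// Code point of the digit zero in each script. The keys are hypothetical
// names chosen for this sketch; the values are the Unicode code points.
var ZERO_CODE_POINT = {
  deva: 0x0966, // Devanagari (Nepali, Hindi)
  arab: 0x0660, // Arabic-Indic
  latn: 0x0030  // Western digits, used as the fallback
};

// Replace every ASCII digit in the number with the matching digit of the
// given script. Other characters (minus sign, decimal point) are left alone.
function localizeDigits(number, script) {
  var zero = ZERO_CODE_POINT[script] || ZERO_CODE_POINT.latn;
  return String(number).replace(/[0-9]/g, function (digit) {
    return String.fromCharCode(zero + Number(digit));
  });
}

// Usage: localizeDigits(555, "deva") yields the three-character string ५५५
```

Because each script lays out its ten digits in one contiguous block, a single base value per script is enough; no per-digit lookup table is needed.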

Some Nepalis I have met argue that Nepali kids don’t really need to learn the Devanagari digits, and I have met others who feel very passionately that it is essential. This is just another example of how many people have very different but strongly held beliefs about what seems like a small detail to many of us. As a software developer, it isn’t my role to decide what is taught but to empower educators and communities to make those decisions for themselves.


Right-to-Left Layouts

Most websites seem to use separate CSS stylesheets when changing between left-to-right and right-to-left layouts. I would really like to avoid this if at all possible. Redundant stylesheets mean redundant code, which means codebase fragmentation and more room for errors to creep in. I would really like to label a set of HTML elements and be able to simply reverse their order according to locale.

For example:

The main CSS file — karma.css

.horizontal_elements { direction: ltr; }

The HTML — index.html

<div class="horizontal_elements">

<button id="btn1"></button> <button id="btn2"></button> <button id="btn3"></button>

</div>

For a right-to-left script, I want to reverse the order of the buttons but not the order of all the elements on the page. That would effectively turn the page upside down. Ideally, I could just change the direction property in .horizontal_elements to "rtl". Unfortunately that doesn’t work and I don’t know why.

I could write a simple JavaScript function that checks whether the locale is right-to-left and, if so, reverses the order of all elements within a div element having the class "horizontal_elements". This is feasible, but it would be much nicer to do it by changing the "direction" property for the <div> element.
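A rough sketch of that fallback follows. The function names and the list of right-to-left language codes are assumptions made for illustration (the list is deliberately partial), and the last function only makes sense in a browser.

```javascript
// Language codes treated as right-to-left (a partial, illustrative list).
var RTL_LANGUAGES = ["ar", "fa", "he", "ur"];

// True for codes such as "he", "he-IL" or "ar_EG".
function isRtl(lang) {
  var base = String(lang).split(/[-_]/)[0].toLowerCase();
  return RTL_LANGUAGES.indexOf(base) !== -1;
}

// Return the items in display order for the locale, without mutating the input.
function orderForLocale(items, lang) {
  var copy = items.slice();
  return isRtl(lang) ? copy.reverse() : copy;
}

// Browser only: reverse the children of every .horizontal_elements container.
function reverseHorizontalElements(lang) {
  if (!isRtl(lang)) { return; }
  var containers = document.querySelectorAll(".horizontal_elements");
  for (var i = 0; i < containers.length; i++) {
    var container = containers[i];
    var children = Array.prototype.slice.call(container.children);
    orderForLocale(children, lang).forEach(function (child) {
      container.appendChild(child); // appendChild moves an existing node
    });
  }
}
```

Calling reverseHorizontalElements("he") once after the page loads would leave the rest of the page untouched and only reorder the labelled groups.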

Sound

Sorry, don’t have time to write about that this morning! next blog post 😉


Miscellaneous

GNU gettext seems to assume that the first version of a piece of software will have an English language locale. Can we use the html element id as the msgid in the .pot file?
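On the JavaScript side, an id-keyed lookup might look something like the sketch below. Everything here is hypothetical: the catalog object stands in for strings loaded from a .po file whose msgid is the element id, and the ids (scoreLabel, nextButton, lessonTitle) are invented for the example.

```javascript
// Hypothetical Spanish catalog, keyed by HTML element id instead of by
// the English source string.
var catalogEs = {
  scoreLabel: "Puntuación",
  nextButton: "Siguiente"
};

// Build a map of element id -> text to display. Ids that are missing from
// the catalog keep their default text.
function translateById(defaults, catalog) {
  var result = {};
  Object.keys(defaults).forEach(function (id) {
    var hasTranslation = Object.prototype.hasOwnProperty.call(catalog, id);
    result[id] = hasTranslation ? catalog[id] : defaults[id];
  });
  return result;
}

// In a browser, each entry could then be applied with something like:
//   document.getElementById(id).textContent = result[id];
```

One consequence of this design is that any element whose id is absent from the catalog simply keeps the text already in the markup.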

What about text that shouldn’t be locale-specific? For example, portions of an English lesson that are meant to be in English regardless of the locale?

A key priority is adhering as closely as possible to the Model-View-Controller design pattern. Why? Because code that follows MVC is easy to read and understand, and the pattern lets teams easily divide up work between less technical designers and their programmer colleagues.

UPDATE #1:

Thanks to Guy Sheffer and Tzafrir Cohen for showing me how to use the "dir" attribute in <html dir="rtl"> to set the text direction for the document. That certainly works for buttons but doesn’t seem to work for <label> or <span> elements. Will have to investigate further.

UPDATE #2:

I have found that <label> and <span> elements do change direction properly if the text embedded in them is written in an RTL script, instead of the English I was using.


11 Comments »

  1. We are happy to help.
    I was actually just including in the OLPC Israel report that there are major bugs in Sugar regarding RTL. I have no idea how places with Arabic and Urdu speakers get along with them.

    Comment by GuySoft — June 29, 2009 @ 4:20 pm

    • Please do let me know when you publish the report. I would love to take a stab at the problem.

      Comment by Sayamindu — June 29, 2009 @ 9:23 pm

  2. Why not use two CSS files, one for the main design and one for directionality?
    This way if you change something you don’t have to change it in all CSS files.

    Comment by Ori Idan — June 29, 2009 @ 4:31 pm

    • I can use two css files as long as the one for directionality __only__ affects directionality. The idea in my little head is that folks like our team in Nepal –where we use a LTR script — can create learning activities that can be changed in very easy ways to accomplish RTL scripts like Hebrew and Arabic. Can you make a suggestion on how best to do this? your help is much appreciated.

      Comment by bryanwb — June 30, 2009 @ 12:55 am

      • The second CSS should be used only for directionality and should be included in the HTML after the standard CSS so it can change the directionality of some of the classes.

        Comment by Ori Idan — June 30, 2009 @ 5:37 am

      • Ori, could you send me a link to a good example?

        Thanks (תודה)

        Comment by bryanwb — June 30, 2009 @ 5:59 am

      • I am not sure I have an exact example but I think http://www.w3c.org.il can serve as an example. This is a website I built based on a given design, the given design was left to right and I had to do it right to left for hebrew so I added another CSS. However the CSS I added has changed many attributes of the classes so that is not the best example.

        Comment by Ori Idan — June 30, 2009 @ 6:58 am

  3. “GNU gettext seems to assume that the first version of a piece of software will have an English language locale. Can we use the html element id as the msgid in the .pot file?”

    Sure you can. However, there needs to be some way of telling the translator what the original string is. You can use the English strings as comments for the msgid block to do that.

    Comment by Sayamindu — June 29, 2009 @ 9:22 pm

    • I know they use .pot files in gallery2, and drupal. You can look at the code there.

      Comment by GuySoft — June 30, 2009 @ 10:17 am

      • PHP has built in support for GNU gettext.
        I think you have to enable it and put each string in a conversion function like: _(“string”)
        The underscore is an alias to the function.
        This is similar to the way it works in C.

        Comment by Ori Idan — June 30, 2009 @ 10:20 am

      • @Ori

        Karma is a simple framework to run interactive learning activities offline, so using gettext w/ php or another server-side scripting language isn’t an option.

        Loading the strings will have to be done by client-side javascript each time the page loads in the browser. I also don’t want to clutter the markup w/ stuff like % _(“The Score is”) that will be unreadable to graphic designers.

        Comment by bryanwb — June 30, 2009 @ 10:49 am

