The Karma Project: Code Less, Teach More

July 12, 2009

i18n Challenges for Client-side HTML and JS

Filed under: News — bryanwb @ 4:40 am

I have done a good bit of reading and a whole lot of learning for Karma lately but not much productive coding. The Central Problem: we need an i18n solution for Karma that meets the following requirements:

  1. Works entirely on the client side, not in server-side templates such as those used by Smarty, Django, and Ruby on Rails
  2. Supports embedding translatable strings in both javascript code and html markup
  3. Doesn’t violate the holy precept of Unobtrusive javascript, i.e. no javascript inline with the HTML
  4. Abstracts or hides the tricky parts of i18n from newbies so they don’t get confused when they are just trying to create something simple
  5. Integrates w/ the GNU gettext machinery so our translators can use Pootle at http://translate.sugarlabs.org

Oops! Forgot that Karma is also supposed to support L10n of images and audio. I haven’t even gotten to that problem yet.

I was really hoping that a software solution for this problem already existed and I would be spared the difficulty. Sadly, no. Virtually all of the i18n solutions for javascript and html run on the server side; that is, they inject localized strings into the html using inline php, python, or ruby code.

For example, PHP-based i18n for html might look like this (pardon my bad PHP):

<label id="Score"><?php echo _("Your score is"); ?></label>

The _("...") call is the standard invocation of the GNU gettext library (_ is the conventional short alias for gettext()). I would really prefer to have no code inline with the html. The beauty of web programming is that you don’t have to know anything about programming to get started with it. You can just start copying and pasting until you get so stuck that you have to learn at least a little programming to go further. A lot of professional programmers hate this phenomenon, but I laud it. It quickly rewards novices, and that is what Karma hopes to do. Littering the html with inline code will make it less understandable to newbies who don’t know what “i18n” stands for.

I have hit on a two-pronged solution.

  1. Within javascript code, translatable strings should be marked as _("..."), following the gettext convention.
  2. HTML text will be marked in a manner that does not involve inline code and does not render the markup invalid
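Rule #1 needs a runtime _() on the client side. Here is a minimal sketch of what one could look like, assuming the translations for the current locale have already been loaded into a plain object keyed by msgid. The catalog object and its Spanish strings are hypothetical, not part of Karma. As in gettext, a missing or empty translation falls back to the original string.

```javascript
// Hypothetical message catalog for one locale, keyed by msgid.
// In practice it would be generated from a translated .po file.
var catalog = {
    "Your score is": "Tu puntuación es",
    "Yes": "Sí"
};

// gettext-style lookup: return the translation if there is a non-empty one,
// otherwise fall back to the original (usually English) string.
function _(msgid) {
    var translated = Object.prototype.hasOwnProperty.call(catalog, msgid) ? catalog[msgid] : "";
    return translated || msgid;
}

console.log(_("Your score is")); // Tu puntuación es
console.log(_("Try again"));     // Try again (no translation yet)
```

The fallback means an untranslated activity still works in English, which matters while translators are catching up.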

Accomplishing #2 will be a bit complicated. Here is how I intend to do it:

Tools
The first job is to grab all the translatable strings from the html. I tried to do this with a python library such as BeautifulSoup or lxml, but after many hours I gave up in frustration. Both are extremely powerful tools, but they are written for sophisticated developers, and manipulating the DOM with either of them is quite difficult compared to using jQuery. lxml parses your html into a tree of ElementTree-style elements (BeautifulSoup builds a similar tree of its own Tag objects). These elements are generic rather than HTML-specific, so you don’t get the convenient DOM properties and methods that jQuery gives you.

lxml relies on XPath expressions, which are very powerful but much more complex to use than the CSS selector syntax of Sizzle, the CSS selector engine that jQuery uses. With lxml you can use XPath expressions to find the proper html elements, but then you still have the problem of accessing their attributes and inner html.

In contrast, jQuery operates on DOM elements, and you have full access to all of the properties of an HTML element.

Here is how easy it is to work w/ jQuery:

from http://git.sugarlabs.org/projects/karma/repos/mainline/blobs/i18n-experimental/yes_no/utils/envjs/kgettext.js#line9

This little snippet prints the inner html of every element matched by the selector passed to $(...), here every h1, h2, h3, title, label, and button tag:

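$("h1, h2, h3, title, label, button").each(function(){
    // msgctxt tells translators which element the string came from
    print('msgctxt "HTML Tag: ' + this.tagName + ' ID: ' + ($(this).attr('id') || '') + '"');
    // escape backslashes, double quotes and newlines so the entry stays valid
    print('msgid "' + $(this).html().replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n') + '"');
    print('msgstr ""\n');
});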

Short and sweet! I tried hard to do the same w/ lxml and failed after a couple hours of attempts.

But I don’t just need to translate inline html; I also need to translate attributes of particular html tags, such as the content attribute of meta tags (e.g. keywords and description) and the alt="" attribute of images.

So I wrote a little javascript function that grabs the attribute value for an arbitrary tag and attribute name.

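// Print a .po entry for attribute elemAttr of every element matching selector
var printAttr = function(selector, elemAttr) {
    $(selector).each(function(){
        var value = $(this).attr(elemAttr);
        if (!value) { return; }  // skip elements without the attribute
        print('msgctxt "HTML Tag: ' + this.tagName + ' ID: ' + ($(this).attr('id') || '') + '"');
        print('msgid "' + value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n') + '"');
        print('msgstr ""\n');
    });
};

// only meta tags whose content is human-readable text
printAttr('meta[name=keywords], meta[name=description]', 'content');
printAttr('img[alt]', 'alt');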
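One catch with printing entries straight from the .each() loops above: two elements without an id that contain the same text (say, two "Next" buttons) produce identical msgctxt/msgid pairs, and gettext tools such as msgfmt reject a catalog that defines the same message twice. One way around this, sketched here under the assumption that the loops first collect hypothetical {context, msgid} records instead of printing them, is to keep only the first occurrence of each pair. The uniqueEntries name is made up for illustration.

```javascript
// Keep only the first entry for each (context, msgid) pair, preserving order.
function uniqueEntries(entries) {
    var seen = {};
    var result = [];
    for (var i = 0; i < entries.length; i++) {
        // "\u0004" is the separator gettext itself uses between msgctxt and msgid
        var key = entries[i].context + "\u0004" + entries[i].msgid;
        if (!seen.hasOwnProperty(key)) {
            seen[key] = true;
            result.push(entries[i]);
        }
    }
    return result;
}

// Hypothetical records, as the .each() callbacks might collect them.
var entries = [
    { context: "HTML Tag: BUTTON ID: ", msgid: "Next" },
    { context: "HTML Tag: BUTTON ID: ", msgid: "Next" },
    { context: "HTML Tag: H1 ID: title", msgid: "Addition" }
];
console.log(uniqueEntries(entries).length); // 2
```

Giving every translatable element an id would also avoid most collisions, since the id ends up in msgctxt.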

Wow, that was easy! One small problem: I need to grab the translatable strings from an html file and write them to a .po file. In its most popular context, the web browser, javascript is sandboxed, so it has no access to the local filesystem. Reading and writing files on the local filesystem is impossible from the browser, for good security reasons. To do file I/O, I need to run javascript from the command line. There are some options for running javascript from the command line, but they are quite immature compared to the python, perl, ruby, and even php runtimes. You can compile Mozilla’s SpiderMonkey so that it has filesystem access, but it still isn’t easy to use.

Right now, Mozilla’s Rhino is the preferred way to run javascript from the command line. It is a javascript interpreter written in Java. As far as command-line interpreters go, Rhino is not bad, but it is not nearly as wonderful as the fantastic IPython shell. Rhino doesn’t seem to get much love or attention, but it works.

As mentioned earlier, jQuery operates on DOM elements; it doesn’t merely parse a file. This means we need to load the files that make up an html document and build its DOM in memory. Envjs does this for us. John Resig came up w/ envjs in order to automate testing, and his original blog post is still the definitive source of documentation. It isn’t easy to get started w/ envjs. There is only a small community around it, and it isn’t obvious which repository is the main line of development. It is here: http://github.com/thatcher/env-js/

I had to install the “sun-java-jdk” package on Jaunty in order to install the proper version of Java for envjs.

Here is how I use envjs and jquery to parse an html document:

   cd envjs/
   # use the rhino distribution that comes with envjs; it only works with Java 1.5
   java1.5 -jar rhino/js.jar
   js> load('dist/env.rhino.js');
   js> window.location = 'index.html';
   js> load('jquery.js');
   js> $('h1, h2, h3, button').each(function(){ print($(this).html()); });   // print the inner HTML of every element matching the selectors
   .  . . . . .

Alright! I can now wrap this up in a script, and it will send the output of my “print” statements to standard out. Uh oh, Rhino also prints INFO messages and warnings when I load jquery.js. I need some way to scrub the output, but it would be so much easier if I could just assemble one big string containing my text and write it to a file. In Rhino, that means calling a bunch of low-level java.io classes that I really don’t want to figure out or maintain. There is nothing wrong with Java, but I am already juggling bash, python, and javascript in my head. There isn’t room in there for another programming language.
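Assembling that big string is the easy part once the entries are collected. The sketch below assumes hypothetical {context, msgid} records (not what kgettext.js produces today) and builds the complete text of a .pot file, starting with the header entry (an empty msgid) in which gettext tools look for the file's charset. Only the final write to disk still needs a runtime with filesystem access.

```javascript
// Escape a string for a double-quoted PO field (backslashes first).
function poEscape(s) {
    return String(s).replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Build the full text of a .pot file from {context, msgid} records.
function buildPot(entries) {
    // Header entry: an empty msgid whose msgstr carries the metadata,
    // including the charset used to read the rest of the file.
    var parts = [
        'msgid ""\n' +
        'msgstr ""\n' +
        '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    ];
    for (var i = 0; i < entries.length; i++) {
        parts.push(
            'msgctxt "' + poEscape(entries[i].context) + '"\n' +
            'msgid "' + poEscape(entries[i].msgid) + '"\n' +
            'msgstr ""\n'  // left empty for translators
        );
    }
    // Entries are separated by a blank line.
    return parts.join("\n");
}

var pot = buildPot([{ context: "HTML Tag: H1 ID: title", msgid: "Addition" }]);
console.log(pot);
```

Because nothing is printed until the end, Rhino's INFO messages never get mixed into the catalog text.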

The solution may be narwhal, a server-side javascript platform that promises easy-peasy filesystem access and can even run shell commands from within javascript. Tom Robinson, one of the lead drafters of serverjs and the lead developer of narwhal, is helping me get started with it. I don’t know yet how easy or difficult it will be to load envjs within narwhal; that is a project for tomorrow. Let me take this opportunity to say that Tom has been very helpful in answering my questions on #serverjs. Someday soon I hope I can contribute something back that makes narwhal more useful.

Narwhal does have a lovely API for system access:

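var file = require("file");
var os = require("os");
file.write("path", "string");      // write a whole string in one call
var f = file.open("path", "w");    // or open a stream and write it piece by piece
f.write("string"); f.write("string");
f.close();
os.popen("command");                // run a shell command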

If narwhal proves easy to use, I may consider using it instead of setup.py or a Makefile for build and maintenance tasks.

We have hit the limit of what I have learned so far but I will brief you on what I plan next.

I will use the script kgettext.js to grab all of the translatable strings out of my html markup and store them in package_html.pot. Next, I will use xgettext to extract all of the strings marked with _("...") from my .js files into package_js.pot; jsgettext will help me with this. Subzero has already played with jsgettext and found that its use of the Prototype.js library conflicts with jQuery on the same page.
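To give a feel for what the javascript extraction step involves, here is a deliberately naive sketch: a regular expression that matches calls to _() whose only argument is a string literal. This is a simplification, not how xgettext or jsgettext actually work; it does not skip comments and it misses calls such as _("a" + "b"). The extractMsgids and unescapeJs names are hypothetical.

```javascript
// Match _("...") or _('...') with a single string-literal argument.
// \b keeps identifiers like my_("x") from matching; \1 is the opening quote.
var CALL_RE = /\b_\(\s*(["'])((?:\\.|(?!\1)[^\\])*)\1\s*\)/g;

// Undo the common javascript escapes inside a matched literal.
function unescapeJs(s) {
    return s.replace(/\\(.)/g, function (m, c) {
        return c === "n" ? "\n" : c === "t" ? "\t" : c;
    });
}

function extractMsgids(source) {
    var ids = [];
    var m;
    CALL_RE.lastIndex = 0;  // the regex is global, so reset its position first
    while ((m = CALL_RE.exec(source)) !== null) {
        ids.push(unescapeJs(m[2]));
    }
    return ids;
}

var src = 'alert(_("Your score is") + score);\n' +
          "$('#msg').text(_('Don\\'t give up!'));";
console.log(extractMsgids(src).join(" | ")); // Your score is | Don't give up!
```

A real extractor also records file names and line numbers as #: comments, which translators find useful.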

Then I will merge the two files with the gettext utility msgcat:

msgcat package_html.pot package_js.pot > package.pot

Hopefully this will go smoothly. As always, I will keep you updated.


4 Comments

  1. Have you looked at the TinyMCE rich text editor? It does its translations in JS. There might be something useful there.

    Comment by Dave Bauer — July 12, 2009 @ 10:08 pm

    • I haven’t. I will check it out. Thanks Dave

      Comment by bryanwb — July 13, 2009 @ 1:04 am

  2. I like the idea of using JS to handle translations, since it’s the same tool already used in the framework. Keeps the barrier to entry for hacking Karma itself low.

    About the jQuery/Prototype.js conflicts: if you use jQuery in compatibility mode (jQuery.noConflict()), and Prototype.js too if it has such a mode (I’m not sure it does), it shouldn’t happen. I think you should be doing that anyway, so that Karma developers can use their framework of choice.

    It might pay off to rewrite jsgettext to jQuery, though.

    Comment by Lucian — July 13, 2009 @ 12:35 pm

    • You’re right that it may pay off to rewrite jsgettext to use jquery; we may have to work on that later this fall.

      It is relatively easy to use js to grab the translatable strings. Grabbing the strings is the easy part, updating existing .pot files is the challenging part that makes me cringe a little bit.

      Comment by bryanwb — July 14, 2009 @ 8:01 am



Blog at WordPress.com.
