The Karma Project: Code Less, Teach More

July 16, 2009

Localizing Images and Audio Files

Filed under: News — bryanwb @ 8:55 am

I am posting an e-mail I wrote to the Sugar Developers list in order to bring my questions to the attention of even more people who are smarter than I am. And those sugar developers are pretty smart!

I need your thoughts on the matter of localizing audio files and image
files for Karma.

I have two strategies in mind:

1) Integrate w/ pootle
2) Not integrated w/ pootle

1) Integrate w/ pootle.

When I grab the translatable strings from my files I also grab the
filenames of the audio and image files and stick the value of the src=""
attribute into the .po as the msgid,
for example:

>> index.html
<img src="./images/house.png">

<audio src="./sounds/yes.ogg"> </audio>

>> karma.pot

msgid "./images/house.png"
msgstr ""

msgid "./sounds/yes.ogg"
msgstr ""

>> ne-NP.po

msgid "./images/house.png"
msgstr "./images/ne-NP_house.png"

msgid "./sounds/yes.ogg"
msgstr "./sounds/ne-NP_yes.ogg"

>> es-SP.po

msgid "./images/house.png"
msgstr "./images/es-SP_house.png"

msgid "./sounds/yes.ogg"
msgstr "./sounds/es-SP_yes.ogg"

Then to load the localized strings, I simply use a jQuery CSS selector
to write the translated strings into the markup at page load.

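The code is basically:

// Load the .po file after it has been converted to JSON: { msgid: msgstr, ... }
$.getJSON('pofile_converted_to.json', function (localeStrings) {
    $.each(localeStrings, function (msgid, msgstr) {
        if (!msgstr) { return; } // not translated yet, keep the original
        // images and audio: the msgid is the src attribute, so swap in the localized file
        $('img, audio').filter(function () {
            return $(this).attr('src') === msgid;
        }).attr('src', msgstr);
        // everything else: replace inline html that matches the msgid exactly
        // (:contains would also match every ancestor, including <body>)
        $('*').filter(function () {
            return $(this).html() === msgid;
        }).html(msgstr);
    });
});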

Using the .po file for everything that can be localized makes everything
nice and consistent.
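
The pofile_converted_to.json file has to come from somewhere. Here is a rough sketch of that conversion, assuming a .po file with straight quotes and one-line msgid/msgstr pairs like the examples above. The poToJson name is made up for illustration; the sketch skips the .po header entry and does not handle multi-line strings, msgctxt, plurals, or escape sequences, so a real build would more likely lean on an existing gettext tool.

```javascript
// Hypothetical helper: turn simple .po text into a { msgid: msgstr } object.
// Only single-line msgid/msgstr pairs are understood; other lines are ignored.
function poToJson(poText) {
    var result = {};
    var msgid = null;
    poText.split('\n').forEach(function (line) {
        var m = /^(msgid|msgstr)\s+"(.*)"\s*$/.exec(line);
        if (!m) { return; }            // comments, blank lines, anything else
        if (m[1] === 'msgid') {
            msgid = m[2];              // remember the key until its msgstr shows up
        } else if (msgid) {
            result[msgid] = m[2];      // an empty msgid (the .po header) is skipped
            msgid = null;
        }
    });
    return result;
}
```

JSON.stringify(poToJson(text)) then gives the string to save as the .json file.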

That said, the whole point of putting stuff in .po files is that you can
input the translated value directly into the .po with a text editor or
the pootle interface. Going to all this trouble when pootle doesn’t let
you upload audio and image files seems a bit of a waste.

2) Orrr, I could just check for images and sounds that are prefixed w/
the current locale at run-time and load them.

The problem with this is that it doesn’t give us any forward-compatibility
if we get pootle or find another solution to support crowd-sourcing
translation of audio and image assets.
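
A rough sketch of what strategy 2 could look like, assuming the locale-prefix naming from the .po examples above (ne-NP_house.png). Both helper names are hypothetical, and the fallback assumes the browser fires an error event on the element when the localized file is missing.

```javascript
// Hypothetical helper: build the locale-prefixed path for an asset,
// e.g. ./images/house.png + ne-NP gives ./images/ne-NP_house.png
function localizedPath(src, locale) {
    var slash = src.lastIndexOf('/');
    var dir = src.slice(0, slash + 1);   // empty string when there is no directory part
    var file = src.slice(slash + 1);
    return dir + locale + '_' + file;
}

// Hypothetical helper (browser only): try the localized asset and
// go back to the original src if it fails to load.
function localizeAsset(elem, locale) {
    var original = elem.getAttribute('src');
    elem.onerror = function () {
        elem.onerror = null;             // do not loop if the original is missing too
        elem.setAttribute('src', original);
    };
    elem.setAttribute('src', localizedPath(original, locale));
}
```

At page load this could be run over every image and sound, for example with $('img, audio').each(function () { localizeAsset(this, 'ne-NP'); });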

Thoughts and advice are most appreciated.


July 12, 2009

i18n Challenges for Client-side HTML and JS

Filed under: News — bryanwb @ 4:40 am

I have done a good bit of reading and a whole lot of learning for Karma lately but not much productive coding. The Central Problem: we need an i18n solution for Karma that meets the following requirements:

  1. Works entirely on the client-side, not in server-side templates such as those used by smarty, django, and ruby-on-rails
  2. Supports embedding translatable strings in both javascript code and in html markup
  3. Doesn’t violate the holy precept of Unobtrusive javascript, i.e. no javascript inline with the HTML
  4. Abstracts or hides the tricky parts of i18n from newbies so they don’t get confused when they are just trying to create something simple
  5. Integrates w/ the GNU gettext machinery so our translators can use pootle at http://translate.sugarlabs.org

Oops! Forgot that Karma is also supposed to support L10n of images and audio. I haven’t even gotten to that problem yet.

I was really hoping that a software solution for this problem already existed and I would be spared the difficulty. Sadly, no. Virtually all of the i18n solutions for javascript and html run on the server-side. That is, they inject localized strings into html using inline php, python, or ruby code.

For example, a php i18n for html might look like this, pardon my bad PHP

<label id="Score"><?php echo _("Your score is"); ?> </label>

The _(" ") is the standard invocation of the GNU gettext library. I really would prefer to have no code inline with the html. The beauty of web programming is that you don’t have to know anything about programming to get started with it. You can just start copying and pasting until you get so stuck that you have to learn at least a little bit of programming to go farther. A lot of professional programmers hate this phenomenon but I laud it. It quickly rewards novices and that is what Karma hopes to do. Littering the html with inline code will make it less understandable to newbies who don’t know what “i18n” stands for.

I have hit on a two-pronged solution.

  1. Within javascript code, translatable strings should be marked as _(" "), in accordance with the gettext convention.
  2. HTML text will be marked in a manner that does not involve inline code and does not render the markup invalid.
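
To make the first prong concrete, here is a minimal sketch of what a client-side _() could look like. The makeGettext name and the sample catalog (including the Spanish string) are made up for illustration; this is not jsgettext or any other existing library, just a plain lookup that falls back to the original string.

```javascript
// Hypothetical factory: returns a gettext-style _() bound to one catalog.
// The catalog is a plain { msgid: msgstr } object.
function makeGettext(catalog) {
    return function (msgid) {
        var hasEntry = Object.prototype.hasOwnProperty.call(catalog, msgid);
        // Fall back to the untranslated string when there is no (or an empty) msgstr
        return hasEntry && catalog[msgid] ? catalog[msgid] : msgid;
    };
}

// Illustrative catalog only
var _ = makeGettext({ 'Your score is': 'Tu puntuación es' });
```

Activity code can then write _("Your score is") everywhere, and the same marker is what an extraction tool looks for when it builds the .pot file.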

Accomplishing #2 will be a bit complicated. Here is how I intend to do it:

Tools
The first job is to grab all the translatable strings from the html using a python library such as BeautifulSoup or lxml. After many hours, I gave up in frustration. Both are extremely powerful tools but are written to be used by sophisticated developers. Manipulating the DOM with either of them is quite difficult compared to using jQuery. lxml parses your html into a tree of generic ElementTree-style elements, and BeautifulSoup builds a similar tree out of its own classes. These elements are not specific to html, so I found it awkward to get at each element’s attributes and inline html.

lxml relies on XPath expressions, which are very powerful but much more complex to use than the CSS selector syntax from sizzlejs, the css selector engine that jquery uses. With lxml you can use an XPath expression to get the proper html elements, but then you still have the problem of accessing the attributes and inline html.

In contrast, jQuery operates on DOM elements and you have full access to all of the properties of an HTML element.

Here is how easy it is to work w/ jQuery

from http://git.sugarlabs.org/projects/karma/repos/mainline/blobs/i18n-experimental/yes_no/utils/envjs/kgettext.js#line9

This little snippet grabs the inline html from every html tag that matches the selectors listed inside $(" , , , ")

$("h1, h2, h3, title, label, button").each(function(){
 	print("msgctxt \"HTML Tag: " + $(this)[0].tagName + " ID: " + $(this).attr('id') + "\"");
 	print("msgid \"" + $(this).html() + "\"\n");
}); 

Short and sweet! I tried hard to do the same w/ lxml and failed after a couple hours of attempts.
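
One detail the snippet skips: inline html often contains double quotes (any attribute inside a label, for instance), and an unescaped quote makes the .po entry invalid. Below is a rough sketch of an escaping step, not taken from kgettext.js. The poEscape and poEntry names are hypothetical, and the empty msgstr line is added here because a .pot entry needs one.

```javascript
// Hypothetical helper: escape a string so it is safe inside a .po "..." value.
// Backslashes go first so the escapes added afterwards are not doubled.
function poEscape(text) {
    return String(text)
        .replace(/\\/g, '\\\\')
        .replace(/"/g, '\\"')
        .replace(/\n/g, '\\n');
}

// Hypothetical helper: format one entry in the same shape the snippet prints,
// plus the empty msgstr that a .pot template entry needs.
function poEntry(tagName, id, html) {
    return 'msgctxt "HTML Tag: ' + poEscape(tagName) + ' ID: ' + poEscape(id) + '"\n' +
           'msgid "' + poEscape(html) + '"\n' +
           'msgstr ""\n';
}
```

Inside the .each() loop, print(poEntry($(this)[0].tagName, $(this).attr('id'), $(this).html())) would replace the two print calls.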

But I don’t just need to translate inline html. I also need to translate attributes of particular html tags, such as the content attribute of the keywords meta tag and the alt="" attribute for images.

So I wrote a little javascript function that grabs the attribute value for an arbitrary tag and attribute name.

var printAttr = function(selector, elemAttr) {
	// print one msgctxt/msgid pair for each element matching the selector
	$(selector).each(function(){
		print("msgctxt \"HTML Tag: " + $(this)[0].tagName + " ID: " + $(this).attr('id') + "\"");
		print("msgid \"" + $(this).attr(elemAttr) + "\"\n");
	});
};
	
printAttr('meta', 'content');
printAttr('img[alt]', 'alt');

Wow, that was easy! One small problem: I need to grab the translatable strings from an html file and write them to a .po file. In its most popular context, the web browser, javascript is sandboxed so it has no access to the local filesystem. I/O operations like reading and writing to the local filesystem are an impossibility from the browser for good security reasons. To do file I/O I need to run javascript from the command-line. There are some options for running javascript from the command-line but they are quite immature compared to the python, perl, ruby, even php runtimes. You can compile Mozilla’s SpiderMonkey so that it has filesystem access, but it still isn’t easy to use.

Right now, Mozilla’s rhino is the preferred method for running javascript from the command-line. It is a javascript interpreter written in Java. As far as command-line interpreters go, rhino is not bad but not nearly as wonderful as the fantastic IPython interpreter/command-line. Rhino doesn’t seem to get a lot of love or attention but it works.

As mentioned earlier, jQuery operates on DOM elements; it doesn’t merely parse a file. This means we need to load the files that make up an html document and render a DOM page in memory. Envjs does this for us. John Resig came up w/ envjs in order to automate testing. His original blog post is still the definitive source of documentation. It isn’t easy to get started w/ envjs. There is a small community around it and even the main line of development is well labeled. It is here http://github.com/thatcher/env-js/

I had to install the “sun-java-jdk” package on Jaunty in order to install the proper version of Java for envjs.

Here is how I use envjs and jquery to parse an html document:

   cd envjs/
   java1.5 -jar rhino/js.jar           // you should use the rhino distribution that comes with envjs; it only works with Java 1.5
   js> load('dist/env.rhino.js');
   js> window.location = 'index.html';
   js> load('jquery.js');
   js> print($('h1, h2, h3, button').html());   // print inline HTML depending on css selectors in $(". . .")
   .  . . . . .

Alright! I can now wrap this up in a script and it will print my “print” statements to standard out. Uh oh, rhino also prints INFO messages and warnings generated when I load jquery.js. I need some way to scrub the output, but it would be so much easier if I could just assemble a big string containing my text and then write it to a file. In rhino, you have to call a bunch of low-level java.io….. calls that I really don’t want to be bothered to figure out or maintain. There is nothing wrong with java, but I am already juggling bash, python, and javascript in my head. There isn’t space in there for another programming language.

The solution may be in narwhal, a server-side interpreter that promises to give me easy-peasy filesystem access and even be able to run shell commands from within javascript. Tom Robinson, one of the lead drafters of serverjs and lead developer of narwhal, is helping me get started with it. I don’t know how easy or difficult it will be to load envjs within narwhal. That is a project for tomorrow. Let me take this opportunity to say that Tom has been very helpful in answering my questions on #serverjs. Some day soon I hope I can contribute back something that makes narwhal more useful.

Narwhal does have a lovely API for system access:

file.write("path", "string")
f = file.open("path", "w");
f.write("string"); f.write("string");
f.write("string"); f.close()
os = require("os"); os.popen("command");

If narwhal proves easy to use, I may consider using it instead of setup.py or a Makefile for build and maintenance tasks.

We have hit the limit of what I have learned so far but I will brief you on what I plan next.

I will use the script kgettext.js to grab all of the translatable strings out of my html markup and store them in package_html.pot. Next, I will use xgettext to grab all of the strings marked with _(" . . . ") from my .js files into package_js.pot. jsgettext will help me with this. Subzero has already played with jsgettext and found that its use of the Prototype.js library conflicts with jQuery used on the same page.

I will then merge both files using the gettext utility msgcat, like this:

msgcat package_html.pot package_js.pot > package.pot

Hopefully, this will go smoothly. As always, I will keep you updated.

July 6, 2009

Automated Assessment is the Killer App

Filed under: News — bryanwb @ 7:14 am

This subject deserves an in-depth blog post, but I only have a few minutes to spare, so here is a perfunctory treatment.

Automated Assessment is the Killer App for Using Computers in Education.

It just is. This statement will likely repulse many of you interested in computers in education, and I ask those of you who feel that way to understand it in a context much different from your own.

I have noticed that a large portion of the people interested in using computers in education come from high-performance schooling environments that are very structured and closely monitored. They had teachers that graded their homework, conducted regular parent-teacher conferences, and completed regular status reports. In this context, more assessment isn’t really needed. However, this situation does not represent schools in the vast majority of the developing world nor a large chunk of the developed world. Homework isn’t graded, exams happen at the end of the year with nothing in the middle, and most teachers have so many students that it is nigh impossible for them to monitor each child’s progress.

Banish from your mind the idea that assessment === mindless instructionism. At a basic level, assessment is any means by which a teacher can monitor an individual student’s progress, i.e. what they have learned, what they haven’t learned, any deficiencies in their understanding, bad/good work habits, etc. A key aspect of any modern pedagogy is that the teacher must have a good sense of how each student is progressing.

The Magic Educational Formula Isn’t So Magical

I have listened to a lot of people talk about success stories of this or that education methodology or X country’s educational success and virtually all of them boil down to a simple formula I call The Universal Formula For Educational Success (TUFFES).

“Intelligent Adult, Passionate about Teaching” + “Manageable Number of Kids (usually less than 30)” === “Educational Success!”

On any of my trips to educational events, I heard stories like the following: “We paired an MIT student with a struggling inner city kid and the result was an awesome Carbon-Neutral robot!” “Teacher X in California led her 20 students in a series of constructionist projects and the kids had amazing results!” Examples like this reduce neatly to TUFFES.

Here in Nepal, most kids never have their homework graded nor their essays critiqued. They take one exam at the end of the year that determines whether they advance to the next grade. The average class size is around 50 children but it can be as high as 100. This scenario is typical of many countries. Achieving TUFFES is feasible within 30 years, but not the next 10. An important advantage that using laptops in the classroom offers is automated assessment. And by automated assessment, I mean a system that automatically collects basic information about students.

Frankly, I don’t get excited that Student X is doing amazingly well. I care about Students Y and Z that are way behind and in danger of failing out. I don’t care about the superlatives. I care about the masses, the average, the majority. It is easy to get carried away thinking about little geniuses in rural Nepal, but the real magic, the real social change will come from elevating the average level of education. To do that we need to ensure students read at grade level, grasp basic mathematics, etc.

By having children work with digital learning activities such as EPaath, we can easily collect basic information about each child’s academic progress. It isn’t feasible that a teacher w/ 100 children could grade each one’s homework, but they could periodically look at a simple bar graph describing for Student X what she is doing well and what she is having trouble with, and whether X is reading at grade level or below. The graph would show, for example, that student X’s reading level in Nepali is at Grade 2 when student X is already in Grade 6.

We haven’t implemented this feature in EPaath due to lack of resources, but it will be an extremely important part of Karma. I will write more about it in the weeks ahead.
