Open Questions for Internationalization and Localization

Introduction

Currently, edX has no engineering team or product owner for the internationalization (i18n) and localization (l10n) of the edX platform, the edx.org website, and associated products (Drupal, Insights, etc.). This document is a first pass at our outstanding i18n and l10n questions, primarily from a product perspective, that a full- or part-time globalization team would need to answer to help us become secure and stable in our translation support. These questions, and their eventual answers, will form the backbone of our globalization strategy.

Nothing about how we translate anything (even the platform) is easy. Even if we are handed a set of translated strings, if they are for a language we don't support yet, there is significant effort to ramp up, execute, and maintain a new language. If we are going to move toward deeper language support (e.g. translating the website and documentation), and/or more languages (for key markets or partners), and/or content translations (even if we act only as a broker), then we need a business plan for approaching the global market, we need to shore up what we have (e.g. our Hindi platform translation percentage is below 50% -- and that's a language we already "support"), and we'll need more technical and process infrastructure to make significant progress.

Before we move forward with additional support for languages - whether that is releasing new languages on edx.org or building new features that allow further localization - we need to define what "supporting" a language means, what our commitments and our partners' commitments to translation are, and what our target markets are. We need a long-term strategic plan that will enable sustainable, productive support of a small number of language groups. In addition to the commitment of a full- or part-time product owner in this area, we will need a dedicated engineering resource as well as input from our marketing and business teams to help us identify target markets and ensure we have a budget that meets our globalization needs.

Q1: edX Commitment

We need the full commitment of edX to a long-term globalization initiative. This will include staffing for a product owner and engineering team, marketing and business resources as needed, and a budget to cover translation-related expenses. A recommendation in this area is likely to come out of the Globalization working group currently meeting throughout Q3 (led by Johannes).

To be perfectly clear, right now, edX does not have a significant commitment to globalization. There is no official product owner, just Beth acting for the product team overall in addition to her full-time responsibilities. There is no engineering support. Sarina updates existing translations and handles Transifex, a 3 to 8 hour per week commitment that is neither budgeted nor counted in the team's official engineering commitment, and is primarily done on evenings and weekends because it has to be done.

Committing to internationalization will require money, and it will slow velocity. It will take time to do correctly. This is borne out through research into what other companies have done, and internal knowledge from people within edX who have been part of globalization initiatives at other companies.

Commercialization Impact (Mobile)

Poor localization of mobile applications (under-commitment, mistranslations, lack of context review) will lead to low ratings on the app store, which may be difficult to bounce back from.

UX Impact

UX velocity will be slowed by the numerous impacts that globalization has on UX. For example, we need to make sure all pages render correctly in both LTR (left-to-right) languages and RTL (right-to-left) languages such as Arabic. Further, we need to make sure testing is robust enough to account for languages with traditionally longer strings, such as German, and for scripts, such as Chinese, whose characters are taller and wider than Latin characters.
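
One way to exercise layouts against longer strings before real translations exist is to pad the English text, a technique often called pseudo-localization. The sketch below is only an illustration and not an existing edX tool: the function name, the 40% expansion factor, and the bracket markers are all assumptions chosen for the example.

```python
import math


def pseudo_localize(text, expansion=0.4):
    """Pad a UI string to mimic a longer translation (hypothetical helper).

    The brackets mark the true start and end of the string, so a truncated
    or clipped label is easy to spot during a visual review.
    The 0.4 expansion factor is an assumed value, not a measured one.
    """
    pad = math.ceil(len(text) * expansion)
    return "[" + text + "~" * pad + "]"


if __name__ == "__main__":
    for label in ("Log in", "Register", "Forgot your password?"):
        print(pseudo_localize(label))
```

Padding rather than replacing characters keeps any placeholders in the string intact, at the cost of not exercising non-Latin glyph heights or RTL rendering, which would still need their own checks.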

Doc Impact

Translation of documentation will take a while, since there are many hundreds of pages of documentation with many thousands of words. Doc team attention will be needed to make sure there are firm standards across all documentation, especially for terms that relate to our software, to guarantee consistent translation. Changing existing documentation will also have an impact: changes will break all the translated copies, so they will have to be more carefully considered (see Q3B below).

Engineering Teams Impact

Engineers at this point are generally used to the engineering requirements for internationalization, and for the most part are following guidelines correctly. However, bugs are still found each week. A push for globalization should insist that the engineering teams pay particular attention to proper internationalization rules while they are coding, before things are merged to master. When planning feature roll-outs, there will need to be coordination with the localization team to make sure that the new feature set makes it into a contextualization test harness (see Q2A, Q3C below).
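
As an illustration of the kind of internationalization rule meant here, the sketch below contrasts a string built by concatenation with a single translatable sentence that uses named placeholders. It uses Python's standard gettext module with a pass-through translator; the function names and the example sentence are hypothetical and are not taken from the edX codebase or its guidelines.

```python
import gettext

# NullTranslations returns the source string unchanged. A real deployment
# would load a compiled message catalog instead (assumption for this sketch).
_ = gettext.NullTranslations().gettext


def welcome_bad(username, course):
    # Anti-pattern: fragments are translated separately and glued together,
    # so translators cannot reorder the words to fit their grammar.
    return _("Welcome, ") + username + _(" to ") + course


def welcome_good(username, course):
    # One complete sentence with named placeholders: translators see the
    # whole string and can move {username} and {course} where needed.
    return _("Welcome to {course}, {username}!").format(
        username=username, course=course
    )


if __name__ == "__main__":
    print(welcome_bad("Ana", "Demo Course"))
    print(welcome_good("Ana", "Demo Course"))
```

Both functions produce readable English, which is why this class of bug tends to surface only once a translator or contextual reviewer sees the fragments out of context.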

Changes to existing features and strings will need to be carefully considered. Changing platform strings that are already translated into our target languages may break the user experience for non-English users. Everyone across the company needs to be acutely aware of this challenge, and plan requests accordingly. Capriciously changing strings or capitalization without a clear reason is no longer acceptable (see Q3B below).

 

Overall, the impact on the teams is significant and needs to be factored into story estimates and overall project planning. Globalization is not easy or free.

Q2: Process for Releasing New Languages

We need to figure out what the process for releasing new languages ought to be, and (with input from business) what those languages should be. We need a localization manager to manage this process full-time.

Q2A: Contextualization reviews?

Particularly, we need to figure out - for each new language released - how much of a contextualization review we want, what that review would entail, and who would perform the review. For example, a full contextualization review may have over 200 discrete steps to follow. It may be safer and more expedient to hire a contractor to do this rather than asking a partner to do so.

Further, determining the full feature set of edx.org, setting up one or more test courses that exercise all of it, and writing clear, step-by-step guidelines to help a contextual reviewer understand how to interact with every element would be a significant undertaking for the engineering team. For an example of this, consider describing all the states of the verified certificate workflow. Since there are a lot of different error messages you can get, we'd need to describe all possible paths through the workflow (use a valid credit card, don't use a valid credit card, forget your name, take your picture incorrectly, retake your ID photo, and so on). As Q3A will bring up, we should likely perform this contextualization review first and foremost with our already released and "supported" languages.

Q3: Defining long-term support and shoring up existing languages

A localization manager will work with business and marketing to define our goals around globalization and figure out our target markets and roll-out strategies. This manager will work with the engineering team to figure out a process for rolling out new features, getting those features translated, and making sure we have a contextualization review harness that incorporates all new features. The localization team will need direct engineering support to maintain and improve the current set of rickety, partially-automated translation infrastructure tools.

Q3A: What is our support plan for existing, "live" languages?

We currently have 5 languages live: Chinese, Spanish (Latin America), Portuguese (Brazil), Hindi, and French. Of these languages, only Portuguese has gone through a rigorous contextualization review, and only once, in May/June of 2014. For Portuguese, we paid a team from Local Concept to translate the platform and do a contextualization review. We drew up a Localization Smoke Test Plan for the team to follow. This was hastily done and may have missed features. The plan is certainly out of date for the current platform.

The remaining languages went through a contextualization review that consisted of an ad-hoc spreadsheet where we asked course teams to note any issues they had with translations over a 2-3 week period of using the platform. This is not thorough and likely missed many user-facing strings and situations that needed review. For example, users familiar with the edX platform are unlikely to test registering with a username that's already been taken or forgetting their password. However, these are frequently encountered situations for edX users and need to be contextually reviewed by a team.

So, before we proceed further, we need to figure out how to apply the contextualization plan (Q2A) to our currently-existing languages. 

Q3B: What is our SLA with our customers regarding translation turnaround time?

How do we deal with a living codebase? Our platform strings (text areas) change constantly. With a once-per-week release, there is a very high chance that changed strings are live on the platform before they have any translations. For example, the Destination edX team recently did an overhaul of the phrasing on the login and registration pages. Users viewing these pages in Chinese, who would have previously seen the entire login workflow in Chinese, saw the workflow only in English for about four weeks until the volunteer, open source translators on Transifex provided translations for all the new strings.

This is a sub-optimal user experience, and Chinese is one of the fastest-responding teams that we have. Other languages have a slower turnaround time - Portuguese took over 2 months to get the new strings fully translated.

How do we ensure translations are reasonably up to date (and how do we define that)? We need to figure out an SLA for the turnaround time for new or changed strings and define how we are going to meet that goal. We will also need to define how we will support contextual review here - we don't want to make teams do a full contextual review for only a few hundred changed strings. So we need to figure out a way to isolate ALL changed strings, locate them in the platform, and release a contextualization review plan to translation teams on a pre-determined regular basis.
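
The idea of isolating changed strings and locating them in the platform can be sketched as a comparison of two string catalogs, one per release. This is a minimal sketch under assumptions, not a description of the existing tooling: the catalog shape (source string mapped to a list of locations), the function names, and the sample strings are all hypothetical.

```python
def changed_strings(old_catalog, new_catalog):
    """Compare two releases' catalogs (source string -> list of locations).

    Returns the added strings with their locations, plus the removed strings.
    A reworded string shows up as one removal and one addition.
    """
    added = {s: locs for s, locs in new_catalog.items() if s not in old_catalog}
    removed = sorted(s for s in old_catalog if s not in new_catalog)
    return added, removed


def review_plan(added, translations):
    """Build checklist lines telling a reviewer where each new string lives."""
    lines = []
    for source in sorted(added):
        status = "translated" if translations.get(source) else "NEEDS TRANSLATION"
        for location in added[source]:
            lines.append(f"{location}: {source!r} [{status}]")
    return lines


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    old = {"Log in": ["login.html:12"], "Register": ["register.html:8"]}
    new = {
        "Sign in": ["login.html:12"],
        "Register": ["register.html:8"],
        "Forgot password?": ["login.html:30"],
    }
    spanish = {"Register": "Registrarse", "Sign in": "Iniciar sesión"}
    added, removed = changed_strings(old, new)
    print("\n".join(review_plan(added, spanish)))
    print("Removed:", removed)
```

Run on a regular schedule, output of this kind would give translation teams a short, targeted checklist instead of a full contextual review.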

Q3C: How do we deal with translation support for new, large features?

For most features or string changes, it is likely ok for there to be a small period where the translations are not yet supplied. However, for certain larger, long-awaited features (a good example might be Verified Certs), or hugely visible breaking changes (like the login/registration page changes), we may wish to have supported languages have translations available at the same time as launch.

This will require coordinating the effort of translation teams and engineering teams. We will need to make sure strings are available on Transifex in advance of a feature's release, and that translators can perform a contextualization review before the feature is live on the site. However, we cannot involve translation teams too early, as this would lead to wasted effort in both translation and contextualization review while the feature's strings change during development.

We need to define when we would want to provide this type of support, what the timelines would be, who from edX would coordinate with the translation teams (this would require a significant amount of work from a project manager or product owner to coordinate the engineering team's work with the translation teams), etc.

Q3D: What is our content strategy, and how does that inform which languages we should support?

We need to figure out what languages, from a business and marketing perspective, we want to support. We need to critically examine this because releasing new languages is not free. Releasing and maintaining languages live on edx.org represents a significant cost. We should think critically about dropping support for languages that aren't driving registrations or users to our site. We should focus on bringing in new languages only when they satisfy business objectives.

Q3E: Engineering support for automated tools

Currently, we have some automated tooling that works okay. No particular team is responsible for maintaining these tools, and as a result, reported bugs are not getting addressed. The tools are not particularly robust and do not automate a great deal of the workflow, so they require weekly manual intervention. We need to make sure there is engineering support to maintain and improve these tools going forward. A robust set of tools could allow a less technical localization manager to maintain a large part of the translation workflow.

Supplementary Materials

In September 2015, Sarina drew up a product plan for localization. This plan represents a potential way forward, but is dependent upon significant resources, particularly a localization engineering team, a product owner for the localization product, and a budget to pay translators. Currently, this product plan is unfunded and unimplemented.