Internationalization Using PHP and GetText
Author: Luis Argerich (salutia.com)
Publishing date: 14.12.2000 17:53
If you are a web developer using PHP, whether for small web
applications or for large corporate portals with millions of page
views, you may suddenly find that you need your site or application
translated into another language. When your company opens
www.company.com.br you need your application in Portuguese, then
you'll need it in Japanese, and so on. You don't have a
"translation" problem; you have an internationalization problem.
Translation alone means translating the strings in your code. You
may do it in a clever way or in a (very) painful way, scanning all
the code and translating strings by hand. This method tends to
produce errors (have you ever seen a site in Spanglish?) and it is
not reusable; it is just a patch. Internationalization defines a
strategy for building sites and applications that you can easily
translate into other languages. That strategy is the subject of
this article.
Enter GNU
GNU has a very good set of tools for producing internationalizable
applications, called gettext. It was developed with C and C++
applications in mind, but it is also very easy to use from other
languages such as PHP. Gettext is probably already installed on
your Unix station; type 'gettext' at the command line to see if
you have it. If not (rare), download the package from ftp.gnu.org
and install it. Once you have gettext, the next stage is to use it
from PHP.
Internationalizing PHP Scripts
To use gettext in your PHP scripts you'll have to modify all your
scripts, but this has to be done only once. The modifications are
not very difficult, and you'll probably want to build some utility
that scans the code and prompts 'want to modify this?' for each
string. If you don't want to do that, no problem at all. So what
do you have to do in your code? It is very simple: replace every
string literal with something like:
print(_("Hello world"));
Yes, you use the little-known PHP function "_" (underscore), which
is an alias for the longer-to-type "gettext" function. Every
string that you may want to translate some day has to be wrapped
this way. Once you get used to this and see the advantages, you'll
find that you always write your strings wrapped in the "_"
function.
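The scanning utility mentioned above could be sketched roughly as
follows. This is only an illustration, not part of gettext: the
function name findUnwrappedStrings() is hypothetical, and it relies
on PHP's tokenizer (token_get_all) to list plain string literals
that are not the direct argument of _() or gettext(). Double-quoted
strings containing variables are tokenized differently and are not
reported by this sketch.

```php
<?php
// Hypothetical helper (not part of gettext): report string literals
// that are not the direct argument of _() or gettext().
function findUnwrappedStrings(string $source): array
{
    // Drop whitespace and comments so neighbouring tokens are adjacent.
    $significant = [];
    foreach (token_get_all($source) as $token) {
        $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
        if (is_array($token) && in_array($token[0], $skip, true)) {
            continue;
        }
        $significant[] = $token;
    }

    $found = [];
    foreach ($significant as $i => $token) {
        if (!is_array($token) || $token[0] !== T_CONSTANT_ENCAPSED_STRING) {
            continue; // not a plain string literal
        }
        $prev = $significant[$i - 1] ?? null;
        $prevPrev = $significant[$i - 2] ?? null;
        // Wrapped means: <_ or gettext> "(" <literal>
        $wrapped = $prev === '('
            && is_array($prevPrev)
            && $prevPrev[0] === T_STRING
            && in_array($prevPrev[1], ['_', 'gettext'], true);
        if (!$wrapped) {
            $found[] = ['line' => $token[2], 'text' => $token[1]];
        }
    }
    return $found;
}

// Sample input: the first string is already wrapped, the second is not.
$sample = '<?php print(_("Hello world")); print("This is a test");';
foreach (findUnwrappedStrings($sample) as $hit) {
    echo "line {$hit['line']}: {$hit['text']}\n";
}
```

In a real tool you would read each script with file_get_contents()
and decide by hand which of the reported literals are user-visible
text; many of them (array keys, SQL fragments) should stay unwrapped.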
Extracting Strings from the Code
Gettext provides a utility called 'xgettext' that extracts the
strings from your scripts into a file called a "po" file. To
extract all the strings, run:
$ xgettext -a src/*.php
Read the GNU documentation for 'xgettext' for more options. After
running xgettext on your code you will have a 'messages.po' file,
which may look like this one:
# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR Free Software Foundation, Inc.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2000-12-08 19:15-0300\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
#: prueba.php:12
msgid "Hello world"
msgstr ""
#: prueba.php:12 prueba.php:13
msgid ""
msgstr ""
#: prueba.php:13
msgid "This is a test"
msgstr ""
This is the file that you pass to translators. They set the msgstr
of each entry to the proper translation of the string in the
target language. You can add comments (using # comment) to an
entry to provide context if needed, such as:
#: prueba.php:12
# This is displayed at the beginning of the script
msgid "Hello world"
msgstr ""
After this you have a "master" po file in which every msgstr is
"". Calling gettext on a string that has no translation simply
returns the msgid, so untranslated text still appears in the
original language. This is a good feature.
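Because an empty msgstr silently falls back to the msgid, it can be
useful to check which entries a translator has not filled in yet.
The following is a minimal sketch under stated assumptions: the
function name untranslatedMsgids() is hypothetical, the sample
translation is made up for illustration, and the parser only
understands simple msgid/msgstr pairs with continuation lines (no
plural forms, no unescaping).

```php
<?php
// Hypothetical helper: return the msgids whose msgstr is still "".
function untranslatedMsgids(string $po): array
{
    $missing = [];
    $msgid = null;
    $msgstr = null;
    $current = null; // field that continuation lines belong to

    // Record the entry collected so far, then reset for the next one.
    $flush = function () use (&$missing, &$msgid, &$msgstr) {
        // Entries with an empty msgid (such as the header) are skipped.
        if ($msgid !== null && $msgid !== '' && $msgstr === '') {
            $missing[] = $msgid;
        }
        $msgid = null;
        $msgstr = null;
    };

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^msgid\s+"(.*)"$/', $line, $m)) {
            $flush();
            $msgid = $m[1];
            $current = 'msgid';
        } elseif (preg_match('/^msgstr\s+"(.*)"$/', $line, $m)) {
            $msgstr = $m[1];
            $current = 'msgstr';
        } elseif ($current !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            // Continuation line: append to the field being read.
            if ($current === 'msgid') {
                $msgid .= $m[1];
            } else {
                $msgstr .= $m[1];
            }
        }
    }
    $flush();
    return $missing;
}

// Sample catalog (illustrative content only).
$sample = <<<'PO'
#: prueba.php:12
msgid "Hello world"
msgstr "Hola mundo"

#: prueba.php:13
msgid "This is a test"
msgstr ""
PO;
print_r(untranslatedMsgids($sample));
```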
Once translators have translated the 'messages.po' file, you'll
have several files for different languages. It is time to use them
from PHP.
Producing mo Files
To use gettext you have to produce a .mo file. This is done with
the 'msgfmt' command, once for each .po file:
$ msgfmt messages.po -o messages.mo
Setting up Directories
The best way to use gettext is to build a "locale" directory in
some branch of your code tree. Inside this directory you create
one subdirectory for each language, and inside each language's
directory you create an LC_MESSAGES directory, where you put the
.mo and .po files for that language. Example:
/src
/locale/en/LC_MESSAGES/messages.mo
/locale/en/LC_MESSAGES/messages.po
/locale/es/LC_MESSAGES/messages.mo
/locale/es/LC_MESSAGES/messages.po
See
http://lcweb.loc.gov/standards/iso639-2/bibcodes.html#op to find
the two-character codes for languages. It's nice to follow
standards...
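With one .po file per language directory, the msgfmt step has to be
repeated for every language. A rough sketch of how that could be
scripted from PHP is shown here. The function names msgfmtCommand()
and compileCatalogs() are hypothetical, and the code assumes that
'msgfmt' is on the PATH and that the directories follow the
locale/<language>/LC_MESSAGES layout.

```php
<?php
// Hypothetical helper: build the msgfmt command line for one .po
// file, writing the .mo file next to it.
function msgfmtCommand(string $poFile): string
{
    $moFile = preg_replace('/\.po$/', '.mo', $poFile);
    return 'msgfmt ' . escapeshellarg($poFile)
        . ' -o ' . escapeshellarg($moFile);
}

// Hypothetical helper: compile every <lang>/LC_MESSAGES/*.po file
// under $localeDir and return the files for which msgfmt failed.
function compileCatalogs(string $localeDir): array
{
    $failed = [];
    $poFiles = glob($localeDir . '/*/LC_MESSAGES/*.po') ?: [];
    foreach ($poFiles as $poFile) {
        $output = [];
        exec(msgfmtCommand($poFile), $output, $status);
        if ($status !== 0) {
            $failed[] = $poFile;
        }
    }
    return $failed;
}

// Example call, assuming the locale directory sits beside this script:
// $failed = compileCatalogs(__DIR__ . '/locale');
```

Running this after every round of translation keeps the .mo files in
step with the .po files, since PHP only reads the compiled .mo files.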
You'll probably start to like your "neat" language structure a
lot. Now you are ready to choose which language to use from PHP:
simply add this code to your main script:
// Set language to Spanish
putenv ("LC_ALL=es");
// Specify location of translation tables
bindtextdomain ("messages", "./locale");
// Choose domain
textdomain ("messages");
The important line is putenv("LC_ALL=es"), which tells gettext
which language to use. With this setting PHP will use your
/locale/es/LC_MESSAGES/messages.mo file to translate all the
strings. If you want a different language just change it:
putenv ("LC_ALL=en");
And "auto-magically" you have an English application.
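In practice the language usually comes from the visitor rather than
from a hard-coded value. The sketch below shows one possible way to
do that; the request parameter "lang" and the function name
chooseLanguage() are assumptions made for illustration. The value is
checked against a fixed list before it reaches putenv(), so arbitrary
input never ends up in the environment. Depending on the platform, a
call to setlocale() with a locale installed on the host may also be
needed; that is outside what this article describes.

```php
<?php
// Hypothetical helper: accept the requested language only if a
// catalog exists for it, otherwise fall back to the default.
function chooseLanguage(?string $requested, array $supported, string $default): string
{
    return in_array($requested, $supported, true) ? $requested : $default;
}

// "lang" is an assumed query parameter; the list mirrors the
// locale/en and locale/es directories.
$lang = chooseLanguage($_GET['lang'] ?? null, ['en', 'es'], 'en');

// Set the language for gettext
putenv("LC_ALL=$lang");

// The gettext functions exist only if PHP has the gettext extension.
if (function_exists('bindtextdomain')) {
    // Specify location of translation tables
    bindtextdomain('messages', './locale');
    // Choose domain
    textdomain('messages');
}
```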
Conclusions and Code
Gettext is a very good way to standardize your PHP code to
support internationalization. You modify or write your code only
once, and you can have your site or application in several
different languages with only a minor modification. One side
effect of this strategy is that you have to make sure that all
the strings are set in the content or logic layer, not in the
presentation layer. If you use XSLT to transform XML generated
from PHP, for example, you'll have to make sure that no strings
are added in the XSLT layer. I think this is quite good.
In inter.tar.gz you'll find a subtree that you can put wherever
you want inside your document tree. I ship examples in English and
Spanish; change the environment variable from "en" to "es",
browse "prueba.php" (the Spanish file), and see the changes. Edit
the .po files and rebuild the .mo files using msgfmt to play the
translator role. As Perl fans say, "there are many ways of doing
things..."; this is a neat one, try it.