
                       X H A R B O U R - Internationalization model

                               API and usage manual

                                Giancarlo Niccolai

                                  gian@niccolai.ws


$Id: hbi18n.txt,v 1.7 2005/01/20 08:27:57 brianhays Exp $


/*  $DOC$
 *  $FUNCNAME$
 *      Xharbour I18N
 *  $CATEGORY$
 *      Xharbour Enhancements
 *  $ONELINER$
 *      XHARBOUR - Internationalization model
 *  $DESCRIPTION$
 *
 *
 *      STATUS OF THE DOCUMENT
 *      ======================
 *
 *      This is a short draft explaining how the i18n system in xharbour works.
 *      It may contain errors or typos, and the document is a work in progress.
 *      The i18n system itself, however, is fairly complete and is NOT subject to
 *      radical changes: it can be used in a production environment.
 *
 *
 *      I18N CONCEPT
 *      ============
 *
 *
 *      What do we mean by internationalization?
 *      ----------------------------------------
 *
 *      By i18n we mean a subcomponent of the localization (l10n) problem. I18N is
 *      the ability to translate the user interface of a program into any
 *      language. Currently, i18n() only supports languages in western encodings,
 *      as right-to-left scripts and Unicode are NOT supported by
 *      XHARBOUR. But i18n() is ready to use these extensions as soon as they
 *      are available.
 *
 *      While i18n is the translation of the UI into many languages, l10n is the use
 *      of local national standards for dates, monetary values and mathematical
 *      symbols (many countries swap the meaning of the comma and the dot).
 *
 *      This document and the i18n system only handle the first problem; in the
 *      future we could have a standard interface for producing both
 *      internationalized and localized programs.
 *
 *
 *      How is this accomplished?
 *      -------------------------
 *
 *      How are we able to translate a program into any possible language without
 *      having to recompile it, or without having to put a comprehensive prebuilt
 *      table of translation somewhere?
 *
 *      It works like this: you write your program as usual, without
 *      worrying about targeting a specific language. You
 *      just write your application in plain English and wrap each
 *      static string in an i18n() function call, like this:
 *
 *      // Minimal i18n program
 *
 *      Procedure MAIN()
 *         ? i18n( "Hello world in your language!" )
 *      RETURN
 *
 *      When your program is complete, you use the xharbour support for i18n in
 *      these steps:
 *
 *      1) compile your project using the -j switch, obtaining a
 *         "Harbour international list" (or .hil) file as output.
 *
 *      2) use a program called hbdict (available under utils/) to translate
 *         every string you want translated into a given language; in this way
 *         you can create any number of "harbour international tables", or
 *         .hit files.
 *
 *      3) put the .hit files into a subdirectory available to your program,
 *         which must be called i18n/.
 *
 *      4) at startup, the xharbour program searches for a file named after the
 *         value of the LANG environment variable (plus .hit as extension) in the
 *         i18n/ directory under its start directory. If this file is found, your
 *         program is displayed in the language you have chosen via the LANG
 *         variable.
 *
 *      It is also possible to configure search paths and file names, and even to
 *      change language at any moment in the program.
 *
 *
 *
 *      USING XHARBOUR TO PRODUCE THE INTERNATIONAL LIST
 *      ================================================
 *
 *      When you have a reasonably complete program that you want to test in a
 *      foreign language, you must compile it with the "/j" flag to obtain the
 *      hil file, which contains every string that you put inside an i18n()
 *      call.
 *
 *      * NOTE: you don't have to translate all the strings of a program to test it.
 *        Actually, you don't have to do any translation at all to have it working.
 *        This system allows you to build translations only when you are fairly
 *        confident that the program has reached a stable point in its
 *        development. Also, this system allows foreign users to add their own
 *        translations (and then send them to you or share them with other users),
 *        so that you are freed from this part of development.
 *
 *      If used without any parameter, /j (or -j, depending on the system you
 *      are compiling on) will create an hil file named after your source, next
 *      to the generated code.
 *
 *      For example:
 *
 *      harbour -n -j myprog.prg
 *
 *      will generate myprog.c AND myprog.hil.
 *
 *      If the hil file is already present, the strings being internationalized
 *      in your program will be appended to it. This means that, if you
 *      don't regularly delete the hil file, it will grow endlessly: no merge
 *      mechanism is currently available, both to speed up compilation and to
 *      keep the harbour code as simple as possible.
 *
 *      If you give a parameter to the -j option, the hil output is directed
 *      to that file. Combined with the fact that hil files are never
 *      overwritten, but just "grown", an average build process for a project
 *      made of many prg files that is to be internationalized could be this:
 *
 *
 *      del myproject.hil
 *      harbour -n -jmyproject.hil main.prg
 *      (on success) harbour -n -jmyproject.hil func1.prg
 *      (on success) harbour -n -jmyproject.hil func2.prg
 *      ...
 *      link...
 *
 *
 *
 *      BUILDING A DICTIONARY USING HBDICT
 *      ==================================
 *
 *      Hbdict is a (currently RUDIMENTARY) utility that allows you to build
 *      dictionaries. It has two command line parameters: the first is the input
 *      file, which can be an hil that you want to translate or a pre-existing hit;
 *      the second is the output hit that you want to create.
 *
 *      The API used by hbdict is available in the harbour RTL, which makes it
 *      possible to build dictionary translators in a very simple fashion. The code
 *      that handles the tables is no more than 50 xharbour lines and, using hbdict
 *      as a template, fairly complete dictionary utilities can be built. Anyway,
 *      hbdict has everything an essential dictionary editor should have, and will
 *      have more in the future.
 *
 *      As for hit file names, you are invited to follow the international
 *      language naming scheme. It is a 5 character code: two lowercase letters
 *      indicating the main language, an underscore and two uppercase letters
 *      indicating the country variant of the language. For example:
 *
 *      British English: en_GB
 *      Italian: it_IT
 *      Swiss Italian: it_CH
 *      French: fr_FR
 *      American English: en_US
 *      Japanese: ja_JP  (JP is the country code, while ja is the language).
 *
 *      The "en_US" language code is currently used as "international", or
 *      "no translation". But this may change in the future, so that you can
 *      develop applications in e.g. Spanish and, AFTER you have built them,
 *      translate them to English.
 *
 *      Anyway, you can select any name you like; you just have to make sure that
 *      your LANG variable is set to that name (without the hit extension) for
 *      the language to be loaded automatically at startup, but we'll discuss this
 *      point later.
 *
 *      Now, if you launch for example:
 *
 *      hbdict file.hil it_IT.hit
 *
 *      you will see the hbdict interface. With the arrow keys you can browse all
 *      the strings that were found in i18n() function calls in your sources. Pressing
 *      enter, you can enter a new translation or change an existing one; the window
 *      in which the translations are made is a simple memoedit, so you can use the
 *      standard memoedit keys to edit.
 *
 *      Pressing "E" (or the key that is named in the window), you can alter the
 *      default title, language name and author of the dictionary. Pressing "S" saves
 *      your output hit table. I suggest doing this often.
 *
 *      To modify an existing hit file, you can pass it as both the input and the
 *      output file:
 *
 *      hbdict it_IT.hit it_IT.hit
 *
 *
 *
 *      IMPORTANT: How to modify existing hits AFTER a new hil compilation?
 *      -------------------------------------------------------------------
 *
 *      If you have modified your program after having created the hits, pass the
 *      newly compiled hil as input and the existing hit file as output: the new
 *      strings will be merged into the hit file without destroying the existing
 *      translations.
 *
 *
 *
 *
 *
 *
 *
 *      USING THE I18N API IN YOUR PROGRAMS
 *      ===================================
 *
 *      The I18N system is automatically initialized at xharbour startup. The
 *      standard process consists of searching for a file named after the "LANG"
 *      environment variable, plus the extension .hit, in the i18n/ subdirectory of
 *      the program startup directory. If no file can be found, i18n silently fails
 *      and the untranslated strings in the program are displayed.
 *
 *      NOTE: the LANG variable is usually defined on unix systems to enable their
 *            own i18n. So, this variable should be found in an average
 *            unix environment, and it will follow the naming scheme that we have
 *            proposed above. TODO: remove the '@' lang extension, as in
 *            it_IT@euro
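 *
 *      Until the '@' extension is stripped automatically, a program can
 *      normalize the value itself before loading the table. The following is
 *      only a sketch: LangFromEnv() is a hypothetical helper name, and also
 *      stripping a codeset suffix such as ".UTF-8" is an assumption that goes
 *      beyond what is described above.
 *
 *      FUNCTION LangFromEnv()
 *         LOCAL cLang := GetEnv( "LANG" )
 *         LOCAL nPos
 *
 *         // drop a modifier: "it_IT@euro" becomes "it_IT"
 *         IF ( nPos := At( "@", cLang ) ) > 0
 *            cLang := Left( cLang, nPos - 1 )
 *         ENDIF
 *
 *         // drop a codeset: "it_IT.UTF-8" becomes "it_IT" (assumption)
 *         IF ( nPos := At( ".", cLang ) ) > 0
 *            cLang := Left( cLang, nPos - 1 )
 *         ENDIF
 *
 *      RETURN cLang
 *
 *      When the result is not empty, it could then be passed to
 *      HB_I18NSetLanguage() at program startup.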
 *
 *      If you don't have access to the LANG variable, or if you have your own way
 *      to determine what language, or what .hit file, you want to load, you can
 *      simply add this code to your program:
 *
 *      Procedure MAIN()
 *
 *        HB_I18nSetPath( "C:\WhereAreHits\" )
 *        HB_I18nSetLanguage( "NameOfTheLanguageFileWithoutHITextension")
 *        ...
 *
 *      These functions also provide error diagnostics, so you can ask your user
 *      for non-default actions.
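 *
 *      As a sketch of how the result can be checked (the "it_IT" table name and
 *      the message text are only placeholders, and the default search path is
 *      assumed to be in use), a program might do:
 *
 *      Procedure MAIN()
 *
 *         IF .not. HB_I18NSetLanguage( "it_IT" )
 *            // loading failed: report where the table was searched
 *            ? "Cannot load it_IT.hit from " + HB_I18nGetPath()
 *         ENDIF
 *
 *         ? i18n( "Hello world in your language!" )
 *      RETURN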
 *
 *
 *      This simple example shows how the i18n system can be configured
 *      at runtime:
 *
 *
 *      ************************************************************
 *      * i18ntest.prg
 *
 *      #include "inkey.ch"
 *
 *      Procedure MAIN()
 *         LOCAL nChoice
 *         LOCAL aLanguages
 *         LOCAL aLangCodes := { "en_US", "it_IT", "fr_FR" }
 *
 *         SET COLOR TO W+/B
 *
 *         nChoice := 1
 *         DO WHILE nChoice < 4 .and. nChoice > 0
 *            aLanguages := { ;
 *               i18n( "International" ), ;
 *               i18n( "Italian" ), ;
 *               i18n( "French" ), ;
 *               i18n( "Quit" ) }
 *
 *            CLEAR SCREEN
 *            @2,10 SAY i18n( "X H A R B O U R - Internationalization test " )
 *            @4,10 SAY i18n( "Current language: " ) + HB_I18NGetLanguageName() +;
 *                        "(" +HB_I18NGetLanguage() +")"
 *            @6,10 SAY i18n( "This is a test with a plain string")
 *
 *            @12,10 SAY i18n( "Select Language: " )
 *            MakeBox( 12,40, 20, 55 )
 *            nChoice := Achoice(13, 41, 19, 54, aLanguages,,, ;
 *               Ascan( aLangCodes, { |x| x == HB_I18NGetLanguage() } ) )
 *
 *            IF nChoice > 0 .and. nChoice < 4
 *               HB_I18NSetLanguage( aLangCodes[ nChoice ] )
 *            ENDIF
 *         ENDDO
 *
 *
 *      RETURN
 *
 *      PROCEDURE MakeBox( nRow, nCol, nRowTo, nColTo )
 *        @nRow, nCol, nRowTo, nColTo ;
 *              BOX( Chr( 201 ) + Chr( 205 ) + Chr( 187 ) + Chr( 186 ) +;
 *              Chr( 188 ) + Chr( 205 ) + Chr( 200 ) + Chr( 186 ) + Space( 1 ) )
 *      RETURN
 *
 *      ---------------------------
 *
 *      This program displays some strings and then lets you select a language among
 *      the ones in the list. Notice that each element of the list must be wrapped in
 *      an i18n() call to be correctly translated. Notice also that we could have used
 *      any "language name" in the aLangCodes array: the important thing is that a
 *      .hit file with that name is present in the current i18n path, which is "i18n/"
 *      by default.
 *
 *      ---------------------------
 *
 *
 *
 *      HB_I18N api
 *      ===========
 *
 *      The I18n API is divided into three layers. The high level layer has the
 *      functions needed to load and use an hit file, and includes the i18n function.
 *
 *      The middle level is meant to manage hil and hit tables, and is available for
 *      programs like hbdict.
 *
 *      The low level layer is meant for developers, and is written in C. The core
 *      api is meant to be lightning fast, with much of the complexity being built
 *      into the hit table by the dictionary programs, so that I18N calls are resolved
 *      as fast as possible, usually much faster than can be noticed
 *      in an average GUI.
 *
 *      The choice of pre-compiled tables, instead of easy to edit ascii
 *      oriented dictionary files, has been made to allow
 *      applications to start with minimal overhead. This is very important
 *      nowadays for little service programs, like CGI. So, you can use i18n in
 *      programs that are loaded many times per second, without having to worry about
 *      the overhead of loading a translation file.
 *
 *
 *      High level api
 *      --------------
 *
 *      I18N( cString ) --> cTranslated
 *
 *         Searches cString in the currently loaded table, using a fast binary
 *         (dichotomic) search, thus reducing the number of comparisons needed to
 *         about log2 of the table size. E.g., if your program has 1000 strings
 *         to be translated (that is really a LOT for an average application),
 *         I18N will find a result in 10 steps (at worst).
 *
 *         If the string is not present, cString will be returned untranslated,
 *         without any new allocation or copy being made.
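 *
 *         The lookup can be pictured with the following xharbour sketch. It is
 *         NOT the real implementation (which is written in C): LookupSketch()
 *         is a hypothetical name, aEntries is assumed to be the sorted list of
 *         { international, translated } pairs of an hit table, and falling back
 *         to cString when the translation is empty is an assumption.
 *
 *         FUNCTION LookupSketch( aEntries, cString )
 *            LOCAL nLow := 1
 *            LOCAL nHigh := Len( aEntries )
 *            LOCAL nMid
 *
 *            DO WHILE nLow <= nHigh
 *               // halve the search range at every step
 *               nMid := Int( ( nLow + nHigh ) / 2 )
 *               IF aEntries[ nMid ][ 1 ] == cString
 *                  IF Len( aEntries[ nMid ][ 2 ] ) == 0
 *                     RETURN cString
 *                  ENDIF
 *                  RETURN aEntries[ nMid ][ 2 ]
 *               ELSEIF aEntries[ nMid ][ 1 ] < cString
 *                  nLow := nMid + 1
 *               ELSE
 *                  nHigh := nMid - 1
 *               ENDIF
 *            ENDDO
 *
 *         // not found: return the original string untranslated
 *         RETURN cString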
 *
 *      HB_I18nInitialized() --> lInitialized
 *
 *         Returns true if I18N has been initialized correctly at startup or
 *         with API functions. This can be useful to take default actions or
 *         to search for dictionary files in non-standard directories.
 *
 *      HB_I18nSetPath( cPath )
 *
 *         Sets the current language table search path to cPath. No test is
 *         made to see if cPath is a valid path. The path is relative
 *         if it does not begin with a drive specification (Windows) or with
 *         a slash; otherwise it is absolute. The default directory set at program
 *         startup is i18n/
 *
 *
 *      HB_I18nGetPath() --> cPath
 *
 *        Returns the currently selected path for searching i18n table files.
 *
 *
 *      HB_I18NSetLanguage( cLanguage ) --> lResult
 *
 *         Loads a language translation table into the current I18N() translation
 *         table. cLanguage is a filename (without the .hit extension) that must
 *         be searched in the current i18n search path (see HB_I18nGetPath()).
 *
 *         This function is called by the virtual machine before the program
 *         begins, using the value of the environment variable "LANG" as the
 *         cLanguage parameter if this variable is present, but it can
 *         also be called later by the program.
 *
 *      HB_I18NSetBaseLanguage( cLanguage, cName )
 *
 *         Sets cLanguage as the "untranslated" language. Suppose that
 *         you are writing a program in e.g. Italian and using i18n to
 *         translate it into English. The system must know that if your
 *         "language" becomes equal to "it_IT", be it through the LANG environment
 *         variable or set dynamically in the program,
 *         then you mean "do not translate me: I was written in it_IT".
 *
 *         The default at startup is "en_US", so if you write your
 *         program in American English, and use i18n to translate it into
 *         other languages, you don't have to use this function.
 *
 *         Since the default language is not read from a .hit file, but is
 *         hard-coded in the program, you must also set a name for the
 *         language. E.g.:
 *
 *             HB_I18NSetBaseLanguage( "it_CH", "Italiano Svizzero" )
 *
 *      HB_I18NGetLanguage() --> cLanguage
 *
 *         Returns the name of the table that has been loaded into I18N()
 *         translation table with HB_I18nSetLanguage(), or at startup.
 *
 *
 *      HB_I18NGetLanguageName() --> cLanguage
 *
 *         Returns the name of the currently loaded language, as it has been
 *         declared in the language header. Generally, this is a descriptive
 *         non-internationalized name of the language, like "Espanol",
 *         "Italiano", "English", "Francais", "Deutsch", "Nihongo" and
 *         so on.
 *
 *
 *      HB_I18NGetBaseLanguage() --> cLanguage
 *
 *         Returns the code (e.g. "it_CH") of the language in which the program
 *         has been written, as set with HB_I18NSetBaseLanguage().
 *
 *
 *      HB_I18NGetBaseLanguageName() --> cLanguage
 *
 *         Returns the name of the language that the program has been written in.
 *         Generally, this is a descriptive non-internationalized name of the
 *         language, like "Espanol", "Italiano", "English", "Francais",
 *         "Deutsch", "Nihongo" and so on.
 *
 *      -------- TODO: functions to access other language members, and
 *               functions to allow i18n lookups on more than one table:
 *               you could have some web service threads speaking English and
 *               others speaking French.
 *
 *
 *
 *
 *      Middle Level api
 *      ----------------
 *
 *      HB_I18NLoadTable( cFileName| nFileHandle ) --> aTable
 *
 *         This function loads a file containing a table and stores it in
 *         an xharbour array. If a filename is given, it does not follow the
 *         i18n() opening conventions: you have to pass a complete, valid
 *         relative or absolute filename to open the file.
 *
 *         On success, the returned array has two elements. The first element
 *         is a 6 element array holding the header of the table:
 *
 *         aHead[1] == Signature ( 4 characters ).
 *         aHead[2] == Author (50 characters)
 *         aHead[3] == language name ( non international local name, 50 chars).
 *         aHead[4] == language name in english (50 chars)
 *         aHead[5] == language code as xx_XX format, 5 chars.
 *         aHead[6] == number of table entries, or -1
 *
 *         The signature must be chr(3) + "HIL" or chr(3) + "HIT". Hil and Hit files
 *         are structurally identical, except for the fact that a valid
 *         hit file must:
 *         1) have its international entries sorted in ascending ascii code order.
 *         2) have the number of entries correctly set up
 *         3) never have duplicates among its international entries.
 *
 *         An application can know the type of the table loaded by looking at the
 *         last 3 characters of the signature. A file that lacks a correct
 *         signature, breaks one of the above rules (if it's a HIT file)
 *         or fails to load will be closed, and NIL will be returned.
 *
 *         The second element of the returned array is the translation table; it
 *         is an array of 2-item arrays, each of which holds the
 *         international (untranslated) string followed by the translated one:
 *
 *         aTable[1] == { "intString1", "TranslatedString1" }
 *         aTable[2] == { "intString2", "TranslatedString2" }
 *         ...
 *         aTable[N] == { "intStringN", "TranslatedStringN" }
 *
 *         If the file is an HIL, the strings can be in any order, and they can
 *         even be duplicated.
 *
 *         In an HIT, this is not allowed.
 *
 *         The translation of an entry that has not been translated yet is an
 *         empty string.
 *
 *         On failure, NIL is returned. If FError() is not 0, you can assume that
 *         a hardware failure happened; otherwise you can assume that the file is
 *         malformed.
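 *
 *         A minimal sketch of a tool using this function follows. The program
 *         name "tabinfo" and the messages are hypothetical, and the table file
 *         name is assumed to be passed on the command line:
 *
 *         PROCEDURE Main( cFile )
 *            LOCAL aTable, i
 *            LOCAL nMissing := 0
 *
 *            IF cFile == NIL
 *               ? "Usage: tabinfo <file.hil|file.hit>"
 *               RETURN
 *            ENDIF
 *
 *            aTable := HB_I18NLoadTable( cFile )
 *            IF aTable == NIL
 *               IF FError() != 0
 *                  ? "File error", FError()
 *               ELSE
 *                  ? "Malformed table file"
 *               ENDIF
 *               RETURN
 *            ENDIF
 *
 *            // aTable[ 1 ] is the header, aTable[ 2 ] the list of string pairs
 *            ? "Type:    ", Right( aTable[ 1 ][ 1 ], 3 )
 *            ? "Language:", AllTrim( aTable[ 1 ][ 3 ] )
 *
 *            // count the entries whose translation is still empty
 *            FOR i := 1 TO Len( aTable[ 2 ] )
 *               IF Len( aTable[ 2 ][ i ][ 2 ] ) == 0
 *                  nMissing++
 *               ENDIF
 *            NEXT
 *            ? "Entries: ", Len( aTable[ 2 ] ), " untranslated:", nMissing
 *         RETURN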
 *
 *
 *      HB_I18NSaveTable( cFileName| nFileHandle, aTable ) --> lSuccess
 *
 *         Saves a table stored in the aTable array. The first element of the
 *         array is the header, while the second element is the translation
 *         table. See HB_I18nLoadTable for details upon the table format.
 *
 *         If you are saving an HIT table after having modified an HIL,
 *         remember to change the header signature to chr(3) + "HIT", and
 *         to set the number of entries to Len( aTable[2] ). This is necessary
 *         for the application to load the HIT table faster on startup.
 *
 *
 *      HB_I18nSortTable( aTable ) --> aSorted
 *
 *         FUNCTION YET UNTESTED.
 *
 *         This function takes the two-entry table in aTable and sorts it in
 *         ascending ascii order of the first element, also removing duplicates
 *         from the table. In other words, it transforms a messy hil table into
 *         a table suitable to be saved in a HIT.
 *
 *         NOTE: currently I am doing this with the usual ASort(); this function
 *         would be more efficient, but since array sorting is only done by
 *         dictionary tools, efficiency should not be a concern.
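 *
 *         As a sketch of the ASort() based approach mentioned in the note, a
 *         HIL could be turned into a HIT as follows. HilToHit() is a
 *         hypothetical helper name, and keeping only the first of several
 *         duplicated entries is an assumption; the ordering produced by the
 *         "<" operator is also assumed to match the one expected by i18n().
 *
 *         FUNCTION HilToHit( cHilFile, cHitFile )
 *            LOCAL aTable := HB_I18NLoadTable( cHilFile )
 *            LOCAL aUnique := {}
 *            LOCAL i
 *
 *            IF aTable == NIL
 *               RETURN .F.
 *            ENDIF
 *
 *            // sort on the international (untranslated) string
 *            ASort( aTable[ 2 ],,, { |x, y| x[ 1 ] < y[ 1 ] } )
 *
 *            // after sorting, duplicates are adjacent: skip the repeated ones
 *            FOR i := 1 TO Len( aTable[ 2 ] )
 *               IF i > 1
 *                  IF aTable[ 2 ][ i ][ 1 ] == aTable[ 2 ][ i - 1 ][ 1 ]
 *                     LOOP
 *                  ENDIF
 *               ENDIF
 *               AAdd( aUnique, aTable[ 2 ][ i ] )
 *            NEXT
 *
 *            aTable[ 2 ] := aUnique
 *            aTable[ 1 ][ 1 ] := Chr( 3 ) + "HIT"     // header signature
 *            aTable[ 1 ][ 6 ] := Len( aUnique )       // number of entries
 *
 *         RETURN HB_I18NSaveTable( cHitFile, aTable )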
 *
 *
 *
 *      Low level API
 *      -------------
 *
 *      Not yet documented. Refer to the source/vm/hbi18n.c file to see what those
 *      functions do; generally they implement low level operations required by
 *      the higher level API, and they are all written in C.
 *
 *
 *
 *      Absolutization of path
 *      ----------------------
 *
 *      If you want to execute a small internationalized executable from a
 *      variable location (e.g. if you want to put it in the path and just call it),
 *      you can add a code slice like the one in this example to your program:
 *
 *          ...
 *          LOCAL cLang, cProgPath
 *
 *          IF .not. HB_I18nInitialized()
 *             cLang := GetEnv( "LANG" )
 *             IF .not. Empty(cLang)
 *                HB_FnameSplit( hb_argv(0), @cProgPath )
 *                // You can also put HIT files directly in cProgPath
 *                HB_I18nSetPath( cProgPath + HB_OSPathSeparator() + "i18n")
 *                HB_I18nSetLanguage( cLang )
 *             ENDIF
 *          ENDIF
 *
 *
 *
 *  $END$
 */

CONTRIBUTORS
============

Giancarlo Niccolai <gian@niccolai.ws>

...
... And whoever wants to join









