Fw: native encoding Font support in CJK PS printing

To: <locale@sensi.org>
Subject: Fw: native encoding Font support in CJK PS printing
From: "Alexander Voropay" <a.voropay@globalone.ru>
Date: Fri, 26 May 2000 16:43:09 +0400
Reply-To: locale@sensi.org
Sender: owner-locale@sensi.org
Hi!

 Sometimes it is useful to know the Japanese view of things... ;)
And yes, the newsgroup is not bad either.


-----Original Message-----
From: Masaki Katakai <katakai@sun.co.jp>
Newsgroups: netscape.public.mozilla.i18n
To: mozilla-i18n@mozilla.org <mozilla-i18n@mozilla.org>
Date: 25 May 2000 19:34
Subject: native encoding Font support in CJK PS printing


Hi,

It seems that my original mail could not be sent to this
mailing list due to the large attachment(?). I'm very sorry.
I have prepared URLs for the examples and screen shots.

I'm working on native PostScript font and native encoding
support for Mozilla CJK printing. The code change is not
finished yet, but I'd like to explain my approach here and hear
your comments and suggestions.

CJK printing in Mozilla on Unix platforms currently does not work
by just clicking the Print button. By default the output must be
post-processed with external filters and external TrueType fonts,
because it uses Unicode code points and does not use PostScript
fonts in a native encoding. For users who have a native PostScript
printer, or who have installed native PostScript fonts into
ghostscript, it is very inconvenient to configure this (download
the filter and large TrueType fonts, install the files, etc.) and
run the post-processing.

In Japan, Japanese PostScript printers are widely used, and the
ghostscript shipped with major Linux distributions already
contains native Japanese PostScript fonts. Most Japanese users
would not need the post-processing; they want to print just by
clicking the Print button. Native PostScript fonts are also
supported in Netscape 4.x, so Mozilla should support this feature
too.

Here is my approach. It is the same as in Netscape 4.x, except
that I am also trying to add Unicode CID font support. It should
not break the existing external filters and, of course, should not
break ASCII printing. It applies when the charset encoding of the
document is not "utf-8", for example "euc-jp" or "euc-kr".

   * Native PS fonts and their encodings are defined in prefs.js
     This takes the same approach as Netscape 4.x.

     user_pref("print.psnativeencoding.<charset>","<target_charset>");
     user_pref("print.psnativefont.<charset>","<fontname>");

     For example, the prefs for Japanese euc-jp can be defined as
     follows:

     user_pref("print.psnativeencoding.euc-jp","euc-jp");
     user_pref("print.psnativefont.euc-jp","Ryumin-Light-EUC-H");

   * Continue to use the unicodeshow interface with Unicode-based
     code points
     This means the approach will not break existing filters such
     as ttf2ps and wprint. Unicode code points can also be used
     with Unicode-based CID fonts, so I keep the code points as
     Unicode.

   * Unicode-based CID fonts can be supported
     Unicode-based CID fonts are expected to be popular for CJK,
     so Mozilla should support them. UniJIS CMaps are available
     for Japanese, UniKS for Korean, and UniCNS and UniGB for
     Chinese. When the printer or ghostscript does not have them,
     a Unicode-based CMap can easily be installed on the target.
     The Unicode code point can be used with the CMap.

     user_pref("print.psunicodefont.euc-jp","Ryumin-Light-UniJIS-UCS2-H");

   * Code conversion for native encoding in Mozilla
     Code conversion from Unicode to the native encoding (e.g.
     euc-jp) is performed in Mozilla, which defines a dictionary
     holding the conversion table in the output.

     /Unicode2NativeDict 0 dict def

     Unicode2NativeDict 12363 42155 put % 0x304b -> 0xa4ab

     The native code point can then be looked up in the dictionary
     by its Unicode code point at runtime:

     Unicode2NativeDict 12363 get % get 42155
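
As an illustration of the conversion step above, here is a minimal
Python sketch of how the Unicode2NativeDict entries could be
generated for a piece of text. It is not taken from the Mozilla
patch; the function name unicode2native_lines and the choice to skip
ASCII and unencodable characters are assumptions, and Python's
"euc_jp" codec stands in for Mozilla's own converter.

```python
def unicode2native_lines(text, encoding="euc_jp"):
    """Return PostScript 'put' lines mapping Unicode code points
    to native-encoding codes (hypothetical helper)."""
    lines = []
    seen = set()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or cp in seen:
            continue  # assumption: ASCII needs no entry; emit each char once
        seen.add(cp)
        try:
            raw = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # no native code; left to the undefined-glyph fallback
        native = int.from_bytes(raw, "big")  # e.g. b"\xa4\xab" -> 0xa4ab
        lines.append(
            f"Unicode2NativeDict {cp} {native} put % 0x{cp:04x} -> 0x{native:x}"
        )
    return lines

# unicode2native_lines("\u304b") gives
# ["Unicode2NativeDict 12363 42155 put % 0x304b -> 0xa4ab"]
```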

I'd like to explain how unicodeshow() draws the Japanese glyph
0x304b (EUC 0xa4ab). Mozilla will generate the following lines.

  ...
  /NativeFont     /Ryumin-Light-EUC-H     def
  ...
  Unicode2NativeDict 12363 42155 put % 0x304b -> 0xa4ab
  (\113\060) unicodeshow
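
In the example above, the string (\113\060) holds the two bytes
0x4b and 0x30 in octal, that is, the code point 0x304b with the low
byte first. A minimal Python sketch of that formatting, assuming
16-bit code points only and using a hypothetical function name:

```python
def ps_unicodeshow_operand(text):
    """Format text as an octal-escaped PostScript string,
    two bytes per character, low byte first (as in the example)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # assumption: this sketch handles only 16-bit code points
            raise ValueError("only 16-bit code points are handled")
        low, high = cp & 0xFF, cp >> 8
        parts.append(f"\\{low:03o}\\{high:03o}")
    return "(" + "".join(parts) + ")"

# ps_unicodeshow_operand("\u304b") gives the string (\113\060)
```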

  The output is processed by the PostScript engine (a printer or
  ghostscript) as follows:

  1. Check Unicodedict. If a rendering routine for the character
     is defined there, use that routine for rendering.
     The Unicode code point 0x304b is used. In this case,
     Unicodedict is not defined, so go to the next step.

  2. When a Unicode-based CID font (e.g.
     Ryumin-Light-UniJIS-UCS2-H) is defined and can be used in the
     PostScript VM (printer or ghostscript), use the CID font and
     the Unicode code point for rendering.
     Font availability can be checked at runtime. The Unicode
     code point 0x304b is used in show with the UCS2-based CID
     font.
     In this case, it is not defined, so go to the next step.

  3. When a native PostScript font (e.g. Ryumin-Light-EUC-H) is
     defined and can be used in the PostScript VM, use the native
     PostScript font and the native code point for rendering.
     The EUC code 0xa4ab for 0x304b is defined in
     Unicode2NativeDict. The code conversion has already been
     performed in Mozilla at print time, so it is easy to get the
     code from the dictionary and use it.

     Unicode2NativeDict 12363 get % get EUC 0xa4ab

     In this example, both native PostScript font and the code
     point are defined, so the character is rendered by

     <a4ab> show

     with Ryumin-Light-EUC-H.

  4. Draw a rectangle for an undefined character.
     If the character was not rendered in steps 1) - 3), we need
     to draw the undefined glyph. I've fixed the "han" problem:
     every undefined character will now be drawn as a rectangle.
     Please try the example files listed at the end of this mail
     on an English PostScript printer or English ghostscript; you
     will see rectangles.

  We need an extra preference to control the order of steps 1)
  and 3). For example, in the Japanese locale, Mozilla will first
  check whether the native PostScript font is usable before
  rendering with Unicodedict.

    user_pref("print.psfontorder.euc-jp", 1);
                // 0: check Unicodedict first
                // 1: check native PostScript font first
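
The fallback order in steps 1) - 4), together with the ordering
preference, can be modelled as a small decision function. The
Python sketch below is only an illustration: in the real output the
checks run inside the PostScript VM, all names here are
hypothetical, and reading psfontorder = 1 as "swap steps 1) and 3)"
(order 3, 2, 1) is my assumption.

```python
def choose_rendering(cp, unicodedict, unicode_font, native_font,
                     unicode2native, native_first=False):
    """Return (method, code) for one Unicode code point.

    unicodedict    -- set of code points with a rendering routine
    unicode_font   -- usable Unicode CID font name, or None
    native_font    -- usable native PostScript font name, or None
    unicode2native -- dict: Unicode code point -> native code
    native_first   -- models print.psfontorder.<charset> = 1
    """
    def by_unicodedict():                      # step 1
        return ("unicodedict", cp) if cp in unicodedict else None

    def by_unicode_font():                     # step 2
        return ("unicode_font", cp) if unicode_font else None

    def by_native_font():                      # step 3
        if native_font and cp in unicode2native:
            return ("native_font", unicode2native[cp])
        return None

    steps = [by_unicodedict, by_unicode_font, by_native_font]
    if native_first:
        steps.reverse()  # assumption: steps 1) and 3) trade places
    for step in steps:
        result = step()
        if result is not None:
            return result
    return ("rectangle", None)                 # step 4

# With no Unicodedict and no Unicode CID font, as in the example:
# choose_rendering(0x304b, set(), None, "Ryumin-Light-EUC-H",
#                  {0x304b: 0xa4ab}) gives ("native_font", 0xa4ab)
```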


With the changes above, Mozilla will be able to print Japanese
characters as well as Netscape 4.x does.


For the utf-8 charset, what do you think about the case below? We
need to consider carefully the case where the encoding of the
target document is "utf-8".

  1. A Japanese user starts Mozilla in the Japanese locale.

  2. A Japanese document (written in euc-jp, iso-2022-jp or
     shift-jis) can be printed properly.
     Yes, native PostScript fonts are now supported.

  3. A Japanese document written in utf-8 cannot be printed.
     The user needs to set up post-processing, which is
     inconvenient. Most users do not want to care about the
     encoding of the target document, because they started
     Mozilla in the Japanese locale.

  Is this acceptable? I don't think so. I believe we should
  support native PostScript fonts and native encodings even for
  "utf-8". Even if the charset is "utf-8", Mozilla is running in
  the Japanese locale, so can we assume the document is
  "Japanese"? I suppose the following lines could be defined in
  the ja-JP preferences. Note that these are only for ja-JP and
  take effect only when Mozilla is started in the Japanese locale.

    user_pref("print.psnativeencoding.utf-8","euc-jp");
    user_pref("print.psnativefont.utf-8","Ryumin-Light-EUC-H");
    user_pref("print.psunicodefont.utf-8","Ryumin-Light-UniJIS-UCS2-H");

  Similarly, the ko-KR preferences would be the following:

    user_pref("print.psnativeencoding.utf-8","euc-kr");
    user_pref("print.psnativefont.utf-8","Haeseo-KSC-EUC-H");
    user_pref("print.psunicodefont.utf-8","Haeseo-UniKS-UCS2-H");
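
To make the per-charset lookup concrete, here is a minimal Python
sketch that collects the three proposed prefs for a given document
charset. The pref names come from this mail; the function name, the
plain dict standing in for the prefs store, and returning None for
a missing pref are assumptions.

```python
def resolve_print_prefs(prefs, charset):
    """Collect the per-charset printing prefs; missing ones are None."""
    prefixes = {
        "native_encoding": "print.psnativeencoding.",
        "native_font": "print.psnativefont.",
        "unicode_font": "print.psunicodefont.",
    }
    return {name: prefs.get(prefix + charset)
            for name, prefix in prefixes.items()}

# Hypothetical ja-JP defaults, mirroring the utf-8 lines above.
JA_JP_PREFS = {
    "print.psnativeencoding.utf-8": "euc-jp",
    "print.psnativefont.utf-8": "Ryumin-Light-EUC-H",
    "print.psunicodefont.utf-8": "Ryumin-Light-UniJIS-UCS2-H",
}

ja_utf8 = resolve_print_prefs(JA_JP_PREFS, "utf-8")
# ja_utf8["native_font"] is "Ryumin-Light-EUC-H"
```

A charset with no prefs defined yields None for every entry, which
a caller could treat as "fall back to Unicodedict only".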

  It would be better if we could define the range of code points
  that are valid for the specified font. Characters within the
  range will be drawn with the specified fonts, and other
  characters will be rendered by Unicodedict if it is defined.

  user_pref("print.psunicoderange.utf-8","<from_code><to_code>,
                <from_code><to_code>,....");
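
The exact syntax of the range value is not fixed yet. Purely as an
illustration, the Python sketch below assumes hexadecimal code
points written as "from-to" pairs separated by commas, for example
"3040-309f,4e00-9fff"; that syntax and both function names are
hypothetical.

```python
def parse_unicode_ranges(value):
    """Parse 'from-to,from-to,...' (hex, assumed syntax) into tuples."""
    ranges = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and blanks
        start, end = part.split("-")
        ranges.append((int(start, 16), int(end, 16)))
    return ranges

def in_ranges(cp, ranges):
    """True if the code point falls inside any (start, end) range."""
    return any(start <= cp <= end for start, end in ranges)

ranges = parse_unicode_ranges("3040-309f,4e00-9fff")
# in_ranges(0x304b, ranges) is True; in_ranges(0x0041, ranges) is False
```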


Do you think this will work for your locale? Which PS fonts are
commonly available in PS printers and ghostscript in your locale?
Any comments and suggestions would be appreciated.

6 files for your reference.

   * http://village.infoweb.ne.jp/~katakai/mozilla/nsPostScriptObj.cpp.txt
     Temporary patch for gfx/src/ps/nsPostScriptObj.cpp. Note that
     the code change isn't finished: the psunicoderange feature
     isn't implemented, and the charset name and font name are
     hardcoded. It's just for reference.

     Btw, does anyone know how to get the charset name (e.g.
     euc-jp or utf-8) of the target document in gfx/src/ps/
     or gfx/src/gtk/nsDeviceContextGTK.cpp?

   * http://village.infoweb.ne.jp/~katakai/mozilla/yahoo_ja.ps
     output for www.yahoo.co.jp

   * http://village.infoweb.ne.jp/~katakai/mozilla/yahoo_ja.jpg
     snapshot of ghostscript with japanese font

   * http://village.infoweb.ne.jp/~katakai/mozilla/yahoo_ja_undef.jpg
     snapshot of ghostscript without japanese font. Rectangles
     are now drawn instead of "han".

   * http://village.infoweb.ne.jp/~katakai/mozilla/yahoo_ko.ps
     output for kr.yahoo.com. I assumed Haeseo-KSC-EUC-H as the
     Korean font.

   * http://village.infoweb.ne.jp/~katakai/mozilla/yahoo_ko.jpg
     snapshot of ghostscript with korean font

Thanks,
Masaki