CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing

Ken Lunde

Language: English

Pages: 900

ISBN: 0596514476

Format: PDF / Kindle (mobi) / ePub

First published a decade ago, CJKV Information Processing quickly became the unsurpassed source of information on processing text in Chinese, Japanese, Korean, and Vietnamese. It has now been thoroughly updated to provide web and application developers with the latest techniques and tools for disseminating information directly to audiences in East Asia. This second edition reflects the considerable impact that Unicode, XML, OpenType, and newer operating systems such as Windows XP, Vista, Mac OS X, and Linux have had on East Asian text processing in recent years.

Written by its original author, Ken Lunde, a Senior Computer Scientist in CJKV Type Development at Adobe Systems, this book will help you:

  • Learn about CJKV writing systems and scripts, and their transliteration methods
  • Explore trends and developments in character sets and encodings, particularly Unicode
  • Examine the world of typography, specifically how CJKV text is laid out on a page
  • Learn information-processing techniques, such as code conversion algorithms and how to apply them using different programming languages
  • Process CJKV text using different platforms, text editors, and word processors
  • Become more informed about CJKV dictionaries, dictionary software, and machine translation software and services
  • Manage CJKV content and presentation when publishing in print or for the Web

Internationalizing and localizing applications is paramount in today's global market -- especially for audiences in East Asia, the fastest-growing segment of the computing world. CJKV Information Processing will help you understand how to develop web and other applications effectively in a field that many find difficult to master.

hyôjun chôsakai). This is the name of the governing body that establishes JIS standards and publishes manuals through JSA. The committee that develops and writes each JIS manual is composed of people from Japanese industry who have a deep technical background in the topic to be covered by the JIS manual. Committee members are listed at the end of each JIS manual. JSA stands for Japanese Standards Association (日本規格協会 nihon kikaku kyôkai). This organization publishes the manuals for the JIS

2312-80! This hanzi is in both GB 7589-87 (22-51) and GB 8565.2-88 (15-93). The reason why the hanzi 囉 was included in GB/T 12345-90 is due to an error in the 1956 draft version of 简化字总表 (jiǎnhuàzì zǒngbiǎo; later corrected in the 1964 version) whereby the two hanzi 羅 and 囉 were mistakenly labeled as traditional forms of the simplified hanzi 罗 (34-62 in GB 2312-80); see Table 3-1 on page 68. Only the hanzi 羅 is the true traditional form of the simplified hanzi
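The row-cell notation in the excerpt above (for example, 34-62 for 罗 in GB 2312-80) can be turned into encoded bytes with a little arithmetic. The following is a minimal sketch, not taken from the book, and it assumes the EUC-CN encoding of GB 2312-80, in which 0xA0 is added to both the row and the cell. The function names are hypothetical, and Python's built-in "gb2312" codec is used only to decode the result.

```python
def row_cell_to_euc_cn(row: int, cell: int) -> bytes:
    """Convert a GB 2312-80 row-cell pair to EUC-CN bytes (assumed encoding)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1-94")
    # EUC-CN sets the high bit: each byte is 0xA0 plus the row or cell value.
    return bytes([0xA0 + row, 0xA0 + cell])


def row_cell_to_char(row: int, cell: int) -> str:
    """Decode a row-cell pair to its character using Python's gb2312 codec."""
    return row_cell_to_euc_cn(row, cell).decode("gb2312")


if __name__ == "__main__":
    encoded = row_cell_to_euc_cn(34, 62)
    print(encoded.hex())             # c2de
    print(row_cell_to_char(34, 62))  # 罗
```

Row-cell pairs that belong to other standards mentioned in the excerpt, such as GB 7589-87 or GB 8565.2-88, are not covered by this codec, so the sketch applies to GB 2312-80 only.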

(GB5) These 103 additional hanzi occupy all of row 88 (94 hanzi) and the first part of row 89 (9 hanzi). An oddball character set standard in this regard is GB 8565.2-88—it is sometimes referred to as GB8. The hanzi in GB 7589-87 and 7590-87 (this also applies, of course, to their traditional analogs, specifically GB/T 13131-9X and GB/T 13132-9X) are ordered by radical, then total number of strokes, and begin allocating characters at row 16. GB 7589-87 was established on December 1, 1987, and

and so on. As can be seen from our example, the aim of this book is highly practical. The author has a very full grasp of the real needs of such diverse users as software developers, lexicographers, and language learners, and provides detailed information for each need with great clarity and precision. I am fully confident that this book shall become an invaluable source of information to everyone interested in CJKV information processing. Jack Halpern (春遍雀來) Editor in Chief, CJK Dictionary

for Version 2.0, 28,301 for Version 1.0, and 34,168 for Version 1.1). The character space for Unicode is set in a 256×256 matrix. Table 3-76 details, by row, how many characters are currently assigned to Unicode (Version 2.1).

* This distinction is somewhat meaningless now that UTF-16 is part of both standards.

Table 3-76: The Unicode Version 2.1 Character Set

Row       Characters
0–16      2,499
17        240
18–29     0
30–31     479
32–39     1,378
40–47     0
48–51     842
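The 256×256 matrix described above can be illustrated with a short sketch. It is not from the book, and it assumes that "row" means the high byte and "cell" the low byte of a 16-bit code point. The function name is hypothetical.

```python
def bmp_row_cell(code_point: int) -> tuple[int, int]:
    """Split a 16-bit code point into (row, cell) within a 256x256 matrix."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("code point must fit in 16 bits")
    # The high byte selects the row; the low byte selects the cell in that row.
    return code_point >> 8, code_point & 0xFF


if __name__ == "__main__":
    print(bmp_row_cell(0x1100))  # (17, 0)
    print(bmp_row_cell(0x4E00))  # (78, 0)
```

Under this assumption, U+1100 falls in row 17 and U+4E00 in row 78, which is how the row numbers in Table 3-76 relate to code point values.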
