Chinese Orthodox Translation Project 中華正教翻譯計劃

Converting a Chinese Unicode Word document to a Chinese web page
The following steps convert a Unicode-encoded Microsoft Word document, which may contain a mix of Simplified, Traditional or variant characters, into a web page that displays standard Traditional or Simplified characters.
  1. First, save the Word document in RTF format.
  2. Then open the Universal Code Converter included with the NJStar Communicator evaluation software (http://njstar.com/).
For Traditional Chinese
  1. Select both input and output code as Unicode RTF and check Traditional Chinese under Options.
  2. Click Convert Files.
  3. This writes the converted file, under the same file name, into the CONV_URT subfolder.
  4. Open the converted RTF file in Microsoft Word.
  5. Save it as Web Page, Filtered.
  6. Open the filtered HTML file with Internet Explorer.
  7. Save as Web Page, HTML only (*.htm;*.html) with the same file name, with Encoding: Chinese Traditional (Big5).
  8. Then use View Source to open the HTML page in Notepad.
  9. Search for &# (an ampersand followed by a pound sign). The search should find no HTML entities, which means the conversion from Unicode to Big5 was successful.
  10. If HTML entities are present, copy the entity, e.g. &#21035; (displayed as 别), into the Decimal NCRs field of the Unicode Code Converter page at http://people.w3.org/rishida/scripts/uniview/conversion and press Tab to show its hexadecimal code point, 522B.
  11. Paste the hexadecimal code point, e.g. 522B, into the Unihan Search database at http://www.unicode.org/charts/unihan.html and click Lookup.
  12. When the results page shows up, scroll to the bottom to see whether the character has a traditional variant.
  13. If there is no variant, leave the HTML entity alone.
  14. If there is a traditional variant, click on it and check whether it has a corresponding Big5 code point. If it does, copy the character, e.g. 別, from the Your Browser field into your RTF document and reconvert.
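
Steps 9 and 10 can also be done with a short script instead of searching by hand in Notepad. The following is only a minimal Python sketch, not part of the original procedure: the function name find_decimal_ncrs and the sample string are made up for illustration, and it only looks for decimal entities of the form &#NNNNN;.

```python
import re

# Matches decimal numeric character references such as &#21035;
NCR_PATTERN = re.compile(r"&#(\d+);")

def find_decimal_ncrs(html_text):
    """Return one (entity, character, hex code point) tuple per distinct entity."""
    found = {}
    for match in NCR_PATTERN.finditer(html_text):
        entity = match.group(0)
        code_point = int(match.group(1))
        # Keep only the first occurrence of each entity, in document order.
        found.setdefault(entity, (entity, chr(code_point), f"{code_point:04X}"))
    return list(found.values())

# Hypothetical sample; in practice, pass in the text of the saved HTML file.
sample = "<p>&#21035;人</p>"
for entity, char, hex_code in find_decimal_ncrs(sample):
    print(entity, char, hex_code)
```

An empty result corresponds to the "no matches" outcome in step 9. Each hexadecimal value in a non-empty result is what you would paste into the Unihan lookup in step 11.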
For Simplified Chinese
  1. Select both input and output code as Unicode RTF and check Simplified Chinese under Options.
  2. Click Convert Files.
  3. This writes the converted file, under the same file name, into the CONV_URT subfolder.
  4. Open the converted RTF file in Microsoft Word.
  5. Save it as Web Page, Filtered.
  6. Open the filtered HTML file with Internet Explorer.
  7. Save as Web Page, HTML only (*.htm;*.html) with the same file name, with Encoding: Chinese Simplified (gb2312).
  8. Then use View Source to open the HTML page in Notepad.
  9. Search for &# (an ampersand followed by a pound sign). The search should find no HTML entities, which means the conversion from Unicode to gb2312 was successful.
  10. If HTML entities are present, copy the entity, e.g. &#21029; (displayed as 別), into the Decimal NCRs field of the Unicode Code Converter page at http://people.w3.org/rishida/scripts/uniview/conversion and press Tab to show its hexadecimal code point, 5225.
  11. Paste the hexadecimal code point, e.g. 5225, into the Unihan Search database at http://www.unicode.org/charts/unihan.html and click Lookup.
  12. When the results page shows up, scroll to the bottom to see whether the character has a simplified variant.
  13. If there is no variant, leave the HTML entity alone.
  14. If there is a simplified variant, click on it and check whether it has a corresponding gb2312 code point. If it does, copy the character, e.g. 别, from the Your Browser field into your RTF document and reconvert.
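
The check in step 14, whether a candidate character has a code point in the target encoding, can be roughly approximated locally. This is a hedged sketch that relies on Python's built-in big5 and gb2312 codecs, whose coverage may differ slightly from what Internet Explorer uses. The helper name can_encode is an illustrative assumption and not part of the original procedure.

```python
def can_encode(char, encoding):
    """Return True if char can be represented in the given encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The example pair from the steps above: U+5225 (traditional) and U+522B (simplified).
for char in ("別", "别"):
    for encoding in ("big5", "gb2312"):
        print(f"U+{ord(char):04X} {char} {encoding}: {can_encode(char, encoding)}")
```

A character that reports False for the target encoding would remain as an HTML entity after saving, so it is the one to replace with its variant before reconverting.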

You may also need to manually modify the document to use the appropriate punctuation marks and numbers depending on whether it's Traditional or Simplified.