Extension B CJKV Characters in Unicode 3.2

There are 42,711 characters in Extension B, and a recent font called sursong.ttf can display around 36,000 of them. The bulk of these are rare characters which have not found their way into everyday use in computer encodings; they can be found in the Kangxi Dictionary of 1716 and in other CJKV sources.
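For readers who want to check the size of the block, here is a minimal Python sketch. It assumes the Unicode 3.1/3.2 boundaries of Extension B, U+20000 to U+2A6D6. Later Unicode versions added a few code points at the end of the block, so these constants are specific to those versions. The names `EXT_B_FIRST`, `EXT_B_LAST`, `ext_b_count` and `in_ext_b` are illustrative only.

```python
# Extension B boundaries as encoded in Unicode 3.1 and 3.2 (assumption:
# later Unicode versions extended the block, so this range is version-specific).
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6D6


def ext_b_count(first: int = EXT_B_FIRST, last: int = EXT_B_LAST) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


def in_ext_b(ch: str) -> bool:
    """True if the single character ch lies in the Unicode 3.2 Ext B range."""
    return EXT_B_FIRST <= ord(ch) <= EXT_B_LAST


print(ext_b_count())           # 42711
print(in_ext_b("\U00020000"))  # True: first Ext B ideograph
print(in_ext_b("\u4e2d"))      # False: U+4E2D is in the Basic Multilingual Plane
```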

After installation, the sursong.ttf font is referenced by the font face name "SimSun (Founder Extended)". The characters in Extension B lie outside the Basic Multilingual Plane, in Plane 2 (U+20000 to U+2A6D6). In UTF-16 each of them is stored as a pair of 16-bit code units taken from the 'surrogates range' (U+D800 to U+DFFF), which is why they are often described as lying in that range. Internet Explorer 6.x can process these surrogate pairs, and with the SimSun (Founder Extended) font installed it can display most of the CJKV characters in Ext B.
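To make the 'surrogates range' concrete, the sketch below shows the standard UTF-16 calculation that splits a Plane 2 code point into a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF), and then combines them again. It is a generic illustration of UTF-16, written in Python for convenience, and not a description of how Internet Explorer works internally; the function names are hypothetical.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
    high/low surrogate pair using the standard UTF-16 algorithm."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000             # a 20-bit value
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x20000)
print(f"{high:04X} {low:04X}")  # D840 DC00
```

Python's own codec gives the same result: `"\U00020000".encode("utf-16-be")` yields the bytes D8 40 DC 00.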

Windows XP's Character Map can access the characters in SimSun-18030, which contains the characters in the Extension A range. However, when viewing SimSun (Founder Extended), Character Map cannot reach the Extension B characters to copy and paste them, so a method for viewing, selecting, copying and pasting is needed for those wishing to access the Ext B characters.

I've created a small program which works like an HTML version of Character Map, dedicated to Ext B characters only. Please note that it is for Windows machines with a 32-bit architecture. You will also need IE 6.x or a similar web browser which supports and handles surrogate range data. The program may not run on Windows 95 or Windows 3.x systems, or on other operating system platforms.

It automatically generates HTML pages which contain all the Unicode 3.2 CJK characters in Extension B. Once the HTML pages have been created on your computer, you can view the characters in your web browser and use its select, copy and paste functions.

A batch file accompanies the executable program, and both are zipped into one file called u32-extb.zip. The download size is 22.6 kilobytes. Once it is downloaded, clicking on the file will open WinZip, which will extract the files into the current directory. The archive contains two files, u32-extb.exe and u32-run.bat, which are 62.8 kilobytes and 49 bytes respectively. You will need the sursong.ttf font installed first to view the characters.

To create the files, click on the batch file u32-run.bat. It makes the program first create a data file and then use that data to make the 168 HTML files, which present the Ext B characters in 16x16 grids with their associated hex and decimal codes. While this runs, a file called filename.txt stores the name of the file currently being created, and it is rewritten for each file made. Once u32-run.bat finishes running the u32-extb.exe program, it deletes filename.txt and the data file, d-h-u31-b.txt. If you wish to keep these files for personal inspection, run the program u32-extb.exe directly by clicking on it instead.
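The program itself is a Windows executable whose source is not shown here. As a rough illustration of the kind of page it produces, the following Python sketch builds one 16x16 grid of Ext B characters with their hex and decimal codes, writing each character as an HTML numeric character reference such as `&#x20000;`. The function names, the output file names and the inline style are assumptions for illustration, not the format actually used by u32-extb.exe.

```python
from pathlib import Path

FONT = "SimSun (Founder Extended)"          # font face name given in the text
EXT_B_FIRST, EXT_B_LAST = 0x20000, 0x2A6D6  # Extension B in Unicode 3.1/3.2


def grid_page(start: int, last: int = EXT_B_LAST) -> str:
    """Return one HTML page: a 16x16 grid of code points from `start`, each
    cell showing the character, its hex code and its decimal code.
    Cells past `last` are left empty. Hypothetical layout."""
    rows = []
    for r in range(16):
        cells = []
        for c in range(16):
            cp = start + r * 16 + c
            if cp > last:
                cells.append("<td></td>")
                continue
            # The numeric character reference keeps the page source pure
            # ASCII; the browser decodes it to the actual Ext B character.
            cells.append(
                f'<td><span class="ch">&#x{cp:X};</span><br>{cp:X}<br>{cp}</td>'
            )
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>Ext B U+{start:X}</title>"
        f"<style>.ch {{ font-family: '{FONT}'; font-size: 24px; }}</style>"
        '</head><body><table border="1">\n'
        + "\n".join(rows)
        + "\n</table></body></html>\n"
    )


def write_grid_pages(outdir: str) -> list[str]:
    """Write one page per 256 code points; the file names are hypothetical."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    for start in range(EXT_B_FIRST, EXT_B_LAST + 1, 256):
        name = f"extb-{start:X}.html"
        (out / name).write_text(grid_page(start), encoding="utf-8")
        names.append(name)
    return names
```

Calling `write_grid_pages("extb")` would write 167 grid pages into a folder named `extb`; an index page linking them would be a separate step.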

The 168 files include one index.html file, which lists all the other HTML files created from the data file as hyperlinks. The total size of all the HTML files together is approximately 4.0 megabytes. If you keep the d-h-u31-b.txt file, add another 918 kilobytes, so you need at least 5 megabytes of free disk space.
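The file count follows from simple arithmetic, assuming 256 characters per 16x16 page over the Unicode 3.1/3.2 range U+20000 to U+2A6D6: the characters need 167 grid pages, and the index page brings the total to 168. The Python sketch below checks this and builds a simple index of hyperlinks; the file names it links to are hypothetical and need not match those generated by u32-extb.exe.

```python
import math

FIRST, LAST = 0x20000, 0x2A6D6  # Extension B in Unicode 3.1/3.2
CELLS_PER_PAGE = 16 * 16        # one 16x16 grid per page


def grid_page_starts(first: int = FIRST, last: int = LAST) -> list[int]:
    """First code point of each grid page needed to cover first..last."""
    return list(range(first, last + 1, CELLS_PER_PAGE))


def index_page(starts: list[int], last: int = LAST) -> str:
    """HTML index with one hyperlink per grid page (hypothetical file names)."""
    items = "\n".join(
        f'<li><a href="extb-{s:X}.html">U+{s:X} to '
        f"U+{min(s + CELLS_PER_PAGE - 1, last):X}</a></li>"
        for s in starts
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>\n"


starts = grid_page_starts()
print(math.ceil((LAST - FIRST + 1) / CELLS_PER_PAGE))  # 167 grid pages
print(len(starts) + 1)                                 # 168 files with index.html
```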


The program is provided as-is, and I make no monetary charge for its download. That is, THIS DOWNLOAD IS FREE and the program is distributed as FREEWARE. By downloading the file, from this site or elsewhere, you agree to hold the author, web site host and distributor free of all consequences of running the batch and executable files that create the HTML pages for displaying Unicode 3.2 Extension B characters on your computer. You may distribute it, but you must post a link to this page to tell users what the program does.

The author of the program is Dylan W.H. Sung. Please email me via this link to report errors, or with your comments on this program and its usefulness to you.


© Dylan W.H. Sung 2005

This page was created on Saturday 12th February 2005.

With thanks to Harlan Messinger on sci.lang, who asked intelligent questions to discern the purpose of the program so that I could write this page more clearly.