For further information about the DOC project, type "DOC" and "Dialects of China" into the search engine http://www.google.com and also look for the file called "DOCUSE7.DOC" which gives information about a previous version of the project, together with more tantalising info about Sino-Xenic extensions too.
UPDATE (Thursday 10th May 2018): Unfortunately, neocities does not allow zip files, so the links below to the various zip files (here and elsewhere) will not work.
Download: http://www.dylanwhs.ukgateway.net/download/doc-ipa.zip (455 kb)
Requirements: Unzip to get doc-ipa.htm (5062 kb), and view it in a web browser. You must have Big5 Traditional Chinese fonts, and Lucida Sans Unicode to display IPA characters.
The Dialects of China data were put into digital form as Dialects On Computer (DOC) under William Wang at Berkeley University, from the work called
汉语方音字汇 Hanyu Fangyin Zihui
北京大学中国语言文学系 Beijing Daxue Zhongguo Yuyan Wenxue Xi
语言学教研室 Yuyanxue Jiaoyanshi
文字改革出版社 Wenzi Gaige Chubanshe
first published in 1962, which consists of the pronunciations in 17 dialects of 2700 characters, together with the Middle Chinese (MC) phonological categories of each character. The second edition of the same title, published in 1989 (第二版, ISBN 7-80029-000-X), contains three extra dialects, Hefei 合肥, Yangjiang 阳江, and JianOu 建瓯, which are not contained in this edition of DOC.
The MC information comes from the rhyming dictionary Guangyun (GY 廣韻) of 1008 AD, which keeps the rimes of QieYun (QY 切韻) of 601 AD with a few additions and defines the Middle Chinese pronunciation of this stage in the development of Chinese.
The DOC compilers themselves have added an extra dialect, Shanghai, and also included reconstructed pronunciations from Zhongyuan Yinyun (ZY) 中原音韵 by Zhou Deqing 周德清. The latter is as important as the MC data because ZY marks the beginning of early Mandarin with the total loss of Ru tone (入聲) characters.
The following lists the order in which the entries for each character are given:
- Fanqie (反切) spelling
- 北 BJ Beijing
- 濟 JN Jinan
- 西 XA Xi'an
- 太 TY Taiyuan
- 漢 WH Wuhan
- 成 CD Chengdu
- 揚 YZ Yangzhou
- 蘇 SZ Suzhou
- 溫 WZ Wenzhou
- 長 CS Changsha
- 雙 SF Shuangfeng
- 南 NC Nanchang
- 梅 MX Meixian
- 廣 GZ Guangzhou
- 廈 XM Xiamen
- 潮 CZ Chaozhou
- 福 FZ Fuzhou
- 上 SH Shanghai
- 中 ZY Zhongyuan Yinyun
An example of Fanqie spelling information given in DOC is
一 臻開三入質影0001 0707
The colouration in the converted HTML version will have all entries in blue.
The structure of the Fanqie spelling is as follows
[Char][space][She Rime][Kai/He][Deng][MC tone][GY Rime][GY Initial][Telegraph Code][CODE]
- [Char] 一: BIG5 Chinese character header
- [space]: space between this and the following characters
- [She Rime] 臻: She 攝 is a broad rime group, of which there are sixteen
- [Kai/He] 開: Kai 開 / He 合 are the two distinctions, "open" and "closed", of lip rounding
- [Deng] 三: Deng 等 is split into four categories, 1 一, 2 二, 3 三, 4 四, referring to some property of the vowel
- [MC tone] 入: Ping 平, Shang 上, Qu 去, Ru 入 are the four Middle Chinese tone categories
- [GY Rime] 質: Guangyun contains 206 rimes, each represented by a character from a particular tone
- [GY Initial] 影: Guangyun contains many initial characters, and they are dependent on the rime also
- [Telegraph Code] 0001: a four-digit code that defines a character uniquely in this early Chinese code for information exchange via the telegraph, and still used for character input of names
- [CODE] 0707: the first three digits refer to the page number in Hanyu Fangyin Zihui (first edition) and the last digit to the character number on that page
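The header layout above can be sketched as a small parser. This is only an illustration under stated assumptions: the regular expression, the function name `parse_mc_header`, and the dictionary keys are my own and are not part of DOC. The six phonological characters are treated as optional because some headers carry only the two codes.

```python
import re
from typing import Optional

# Hypothetical pattern for an MC header such as "一 臻開三入質影0001 0707".
# The phonological block is optional: some headers hold only the two codes.
MC_HEADER = re.compile(
    r"^(?P<char>\S)\s+"
    r"(?:(?P<she>\S)(?P<kaihe>[開合])(?P<deng>[一二三四])"
    r"(?P<tone>[平上去入])(?P<rime>\S)(?P<initial>\S))?"
    r"(?P<telegraph>\d{4})\s+(?P<code>\d{4})\s*$"
)


def parse_mc_header(line: str) -> Optional[dict]:
    """Split an MC header into named fields, or return None if it does not fit."""
    m = MC_HEADER.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    # CODE: first three digits are the page, last digit the entry on that page.
    fields["page"] = int(fields["code"][:3])
    fields["entry"] = int(fields["code"][3])
    return fields
```

For the example header, `parse_mc_header("一 臻開三入質影0001 0707")` gives page 70, entry 7, with 臻 as the She and 影 as the initial. Dialect entries do not match the pattern and return `None`.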
An example of a dialect entry in DOC is
黏 廈 1 1 2 [l i a m24] 文
The structure of the entries is as follows:
- [char] 黏: BIG5 Chinese character
- [dialect] 廈: as above
- [variant] 1: 0, 1, 2, ...; 0 if there are no variant readings, 1, 2 define different readings (see literary)
- [tone] 1: 1, 2, 3, 4 representing the ping, shang, qu, ru tone classes
- [yin/yang] 2: 1, 2 representing yin and yang respectively; blank if there is no split, 3 if there is further splitting, and "." when the tone class does not have yin/yang registers
- [initial] l: initial consonant, blank if zero initial
- [medial] (blank in this example): medial, blank if no medial
- [nucleus] i: main vowel, blank if a syllabic consonant is the rime
- [offglide] a: usually u, though not always; blank if none
- [nasalisation] (blank in this example): indicated by z
- [ending] m: final consonant ending, m, n, ng, p, t, k, ? (glottal stop) or other
- [tone contour] 24: the tone contour is the pitch at which the syllable is pronounced; originally these were not included in the docmas9.txt file
- [literary] 文: 文 indicates that this is a literary reading of the character in this dialect
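As a rough illustration of the entry layout just described, the sketch below splits one dialect line into its outer fields. The pattern, the name `parse_dialect_entry`, and the returned keys are assumptions of mine. Because blank segment slots leave no trace in the text, the code returns the segments as a plain list and does not try to assign them to the medial, nucleus, offglide, and ending positions.

```python
import re
from typing import Optional

# Dialect header characters, taken from the list of dialects earlier on this page.
DIALECTS = {
    "北": "Beijing", "濟": "Jinan", "西": "Xi'an", "太": "Taiyuan",
    "漢": "Wuhan", "成": "Chengdu", "揚": "Yangzhou", "蘇": "Suzhou",
    "溫": "Wenzhou", "長": "Changsha", "雙": "Shuangfeng", "南": "Nanchang",
    "梅": "Meixian", "廣": "Guangzhou", "廈": "Xiamen", "潮": "Chaozhou",
    "福": "Fuzhou", "上": "Shanghai", "中": "Zhongyuan Yinyun",
}

# Hypothetical pattern: char, dialect, variant, tone, optional yin/yang,
# bracketed segments with an optional trailing tone contour, optional 文.
ENTRY = re.compile(
    r"^(?P<char>\S)\s+(?P<dialect>\S)\s+(?P<variant>\d)\s+"
    r"(?P<tone>[1-4X])\s+(?:(?P<yinyang>[123.H])\s+)?"
    r"\[\s*(?P<ipa>.*?)\s*(?P<contour>\d*)\s*\]"
    r"\s*(?P<literary>文)?\s*$"
)


def parse_dialect_entry(line: str) -> Optional[dict]:
    """Return the fields of one dialect entry, or None if the line does not fit."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["dialect_name"] = DIALECTS.get(entry["dialect"])
    entry["segments"] = entry.pop("ipa").split()   # e.g. ["l", "i", "a", "m"]
    entry["literary"] = entry["literary"] is not None
    return entry
```

Applied to the example line, this yields the Xiamen dialect, variant 1, tone 1, yin/yang 2, segments l, i, a, m, contour 24, and the literary flag set.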
I have taken the liberty to insert the brackets "[" and "]" as this is the usual IPA convention to indicate a sequence of phonemes. To further highlight the pronunciation, it is written in red, and by colouring the tone contour blue, this contrast makes it easier to read.
The tone contours are taken from my copy of Hanyu Fangyin Zihui. However, it does not contain the Shanghai dialect. The tone values for Shanghai are taken from Shanghai Fangyan Cidian 上海方言詞典, ISBN 7-5343-3122-6. One must also caution that, since these tone contours come from a different source, they may differ from the data gathered by the DOC compilers themselves. On inspection of the pronunciation of characters found in the DOC document, there are subtle differences in vowels between their data and that of Shanghai Fangyan Cidian. I advise caution in using the data given.
The tones for Zhongyuan Yinyun are the only ones which I don't have tone contours for. This isn't surprising, since the dialect of Zhou Deqing 700 years ago is now dead. I've made the correspondence with DOC notation, where the tone classes are 1 = Ping, 2 = Shang, 3 = Qu, 4 = Ru, and for the yin-yang splitting "1" = Yin, "2" = Yang, and "3" = other splitting. Thus the notation is P, S, Q, R, in combination with 1, 2, 3.
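The correspondence for the Zhongyuan Yinyun tones can be written out as a tiny lookup. This is a sketch only: the function name `zy_tone_label` is hypothetical, and returning the bare letter when there is no split is my assumption about how an unsplit tone class would be shown.

```python
# Tone class number -> letter, as described above.
TONE_LETTER = {1: "P", 2: "S", 3: "Q", 4: "R"}


def zy_tone_label(tone_class: int, split=None) -> str:
    """Build a label such as "P1" from a tone class (1-4) and a split (1-3)."""
    letter = TONE_LETTER[tone_class]
    if split in (1, 2, 3):          # 1 = Yin, 2 = Yang, 3 = other splitting
        return f"{letter}{split}"
    return letter                   # assumption: no split gives the letter alone
```

For example, tone class 1 with split 2 (Yang Ping) becomes `P2`.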
One final comment about the conversion of IPA symbols: there are two specific characters found in DOCIPA.TTF which are not found in Unicode, and near substitutes are given. These are the palatalised nasal and the front high mid rounded apical vowel. They are replaced by ɲ and Ч respectively.
There are a number of characters which do not have specific codes for the tone and yin-yang split. These are denoted by X and H respectively. The following are the characters I've found which have these properties.
三 潮 0 X H [ s a z ]
修 成 0 X H [ ɕ i ə u ]
傷 廈 2 X H [ s i u z ]
僵 蘇 0 X H [ ʨ i a ŋ ]
儉 成 0 X H [ ʨ i a m ]
儉 揚 0 X H [ ʨ ɪ z ]
儉 長 0 X H [ ʨ i e z ]
儉 長 0 X H [ ʨ i e z ]
刻 西 0 X H [ kʰ e i ]
刻 西 0 X H [ kʰ e i ]
勁 揚 0 X H [ ʨ i ]
卡 潮 0 X H [ kʰ a ]
廁 西 0 X H [ ʦʰ e i ]
只 雙 2 X H [ t o ]
右 蘇 0 X H [ i ø y ]
吹 蘇 2 X H [ ʦʰ ɥ ]
啞 廈 2 X H [ ɪ k ]
囪 雙 2 X H [ ʦʰ ə n ]
坡 蘇 0 X H [ pʰ u ]
奸 揚 0 X H [ ʨ i ɛ z ]
姜 蘇 0 X H [ ʨ i a ŋ ]
姦 揚 0 X H [ ʨ i ɛ z ]
寫 長 0 X H [ s i e ]
寬 太 0 X H [ kʰ u æ z ]
展 雙 0 X H [ ʨ ɪ z ]
岳 南 0 X H [ ŋ ɔ k ]
崩 成 0 X H [ p ə n ]
幫 廈 2 X H [ p ŋ ]
幼 蘇 0 X H [ i ø y ]
引 揚 0 X H [ i z ]
忽 雙 2 X H [ ʦʰ ə n ]
扯 溫 0 X H [ ʦʰ i ɛ ]
找 雙 0 X H [ ʦ ə ]
披 廈 0 X H [ pʰ i ]
搓 廈 1 X H [ ʦʰ o ] 文
擇 雙 0 X H [ ʨʰ i ɛ ]
救 蘇 0 X H [ ʨ i ø y ]
斑 揚 0 X H [ p ɛ z ]
斟 廈 0 X H [ ʦ i m ]
書 蘇 0 X H [ s ɥ ]
朋 長 2 X H [ p ə n ]
朋 雙 1 X H [ b a ŋ ] 文
朋 雙 2 X H [ b ə n ]
染 長 0 X H [ y e z ]
柚 蘇 0 X H [ i ø y ]
案 長 0 X H [ ŋ a n ]
梅 廈 1 X H [ m u i z ] 文
棚 長 1 X H [ p o ŋ ] 文
棚 長 2 X H [ p ə n ]
檢 成 0 X H [ ʨ i a n ]
淒 南 0 X H [ ʨʰ i ]
炕 廈 1 X H [ kʰ ɔ ŋ ] 文
炕 廈 1 X H [ kʰ ɔ ŋ ] 文
爭 廣 0 X H [ ʧ a: ŋ ]
班 揚 0 X H [ p ɛ z ]
畝 蘇 1 X H [ m ø y ] 文
留 太 0 X H [ l i o u ]
疆 蘇 0 X H [ ʨ i a ŋ ]
盡 揚 0 X H [ ʨ i z ]
研 太 0 X H [ i ɛ ]
祐 蘇 0 X H [ i ø y ]
種 雙 0 X H [ ʨ i n ]
秤 蘇 0 X H [ ʦʰ ə n ]
紗 雙 0 X H [ s o ]
纓 廈 2 X H [ i a z ]
置 西 0 X H [ tʂ ʅ ]
羞 成 0 X H [ ɕ i ə u ]
聰 雙 2 X H [ ʦʰ ə n ]
脂 廈 0 X H [ ʦ i ]
臧 南 0 X H [ ʦ ɔ ŋ ]
舅 蘇 0 X H [ ʥ i ø y ]
與 成 0 X H [ y ]
艱 廈 0 X H [ k a n ]
蔥 雙 2 X H [ ʦʰ ə n ]
蒙 揚 1 X H [ m ɔ u ŋ ]
蒙 揚 2 X H [ m ɔ u ŋ ] 文
薑 蘇 0 X H [ ʨ i a ŋ ]
襟 蘇 0 X H [ ʨ i n ]
襟 南 0 X H [ ʨ i n ]
親 廣 0 X H [ ʧʰ a n ]
評 南 0 X H [ pʰ i n ]
贓 南 0 X H [ ʦ ɔ ŋ ]
跳 南 0 X H [ tʰ i ɛ u ]
車 梅 0 X H [ ʦʰ a ] (In my own Hakka dialect from Hong Kong, this is in the yin ping tone; therefore this character ought to be [ ʦʰ a 44 ] according to the Meixian tones.)
轉 雙 0 X H [ t u i z ]
近 揚 0 X H [ ʨ i z ]
釉 蘇 0 X H [ i ø y ]
間 潮 0 X H [ ʦʰ o i z ]
隱 揚 0 X H [ i z ]
雕 太 0 X H [ t i a u ]
雹 成 0 X H [ p a u ]
頒 揚 0 X H [ p ɛ z ]
顆 西 0 X H [ kʰ u o ]
髒 南 0 X H [ ʦ ɔ ŋ ]
There is one truncated entry in my version of DOCIPA. It concerns the MC entry 帶 蟹開一去代泰端1601 10, where the final CODE should be 1052. The original document also mistakenly included the character 代 (as found after comparing with HYFYZH 2nd Ed). The corrected line should read
帶 蟹開一去泰端1601 1052
Moreover, there are "MC entries" without any MC phonological information. This also occurs in HYFYZH 2nd Ed and is not a mistake. Examples include:
炸 3498 0058
卡 0595 0094
拼 2210 2098
棍 2760 2184
綁 4834 2226
擋 2346 2245
槓 2850 2307
晃 2270 2396
碰 4314 2415
另 0659 2557
哄 0758 2709
The CODE found in the first edition, which refers to the page number and entry number, differs greatly from that of the second edition, since the second edition has been reordered and new entries have been added. HYFYZH 2nd Edition has 370 pages, each with a list of 8 characters, forming 2960 entries, whereas in docmas9.txt there are only 2713 MC entry headers.
In order to investigate the entries, I extracted the MC headers and sorted them according to CODE, and they were indeed similar to the HYFYZH 2nd Ed entries. The following is a sample result, in which an extra final column of four-digit numbers, representing the page number (first 3 digits) and entry number (last digit) of the character in the 2nd Edition, is set against the docmas9.txt information.
巴 假開二平麻幫1572 0011 0011
疤 假開二平麻幫4002 0011 0012
八 山開二入黠幫0360 0012 0013
拔 山開二入黠並2149 0013 0014
把 假開二上馬幫2116 0014 0015
壩 假開二去禡幫1100 0015 0017
霸 假開二去禡幫6011 0015 0021
爸 假開二去禡幫3640 0016 0016
罷 蟹開二上蟹並5007 0017 0022
爬 假開二平麻並3632 0018 0023
怕 假開二去禡滂1830 0019 0024
媽 假開二平麻明1265 0021 0025
罵 假開二去禡明5006 0024 0031
發 山合三入月非4099 0025 0032
乏 咸合三入乏奉0040 0026 0033
罰 山合三入月奉5000 0027 0035
法 咸合三入乏非3127 0028 0036
搭 咸開一入合端2290 0031 0038
答 咸開一入合端4594 0032 0041
打 梗開二上梗端2092 0034 0043
大 果開一去箇定2192 0035 0044
他 果開一平歌透0100 0036 0045
塌 咸開一入盍透1042 0037 0046
塔 咸開一入盍透1044 0038 0047
拿 假開二平麻泥2169 0041 0053
納 咸開一入合泥4780 0042 0054
拉 咸開一入合來2139 0043 0055
臘 咸開一入盍來5248 0044 0056
紮 山開二入黠莊4796 0046
擦 山開一入曷清2361 0048 0063
渣 假開二平麻莊3257 0051 0065
扎 咸開二入洽莊2089 0052 0061
炸 咸開三入洽崇3498 0054 0068
札 山開二入黠莊2610 0055 0071
炸 3498 0058 0075
榨 假開二去禡莊2834 0059 0074
乍 假開二去禡崇0038 0061
叉 假開二平麻初0643 0062 0076
差 假開二平麻初1567 0063 0077
插 咸開二入洽初2252 0064 0078
茶 假開二平麻澄5420 0065 0081
搽 假開二平麻澄2258 0066
查 假開二平麻崇2686 0067 0082
察 山開二入黠初1390 0068 0083
岔 假開二去禡初1479 0069 0085
差 假開二去禡初1567 0071 0084
沙 假開二平麻生3097 0072 0086
紗 假開二平麻生4784 0073 0087
殺 山開二入黠生3010 0074 0088
傻 假合二上馬生0247 0075 0092
加 假開二平麻見0502 0076 0095
嘉 假開二平麻見0857 0077 0096
家 假開二平麻見1367 0078 0097
佳 蟹開二平佳見0163 0079 0098
夾 咸開二入洽見1140 0081 0101
裌 咸開二入洽見6005 0082
假 假開二上馬見0250 0083 0103
The gaps in the final column indicate that the corresponding character is not in the vicinity of the current page. I looked back two or three pages, and again forward two or three pages; if it was not there, I left a blank. If I finish coding the list, I may release it as a separate file in future, but currently it remains unfinished. When using HYFYZH 2nd Ed to compare with docmas9.txt (or doc-ipa.htm), one must remember that the former is in simplified characters whilst the electronic version uses traditional characters. This has added to the problem of finding the characters in the book.
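The extraction and sorting step described above could be sketched roughly as follows. This is not the procedure I actually used, only an illustration: the pattern and the name `mc_headers_sorted_by_code` are hypothetical, and the function works on any iterable of text lines so that nothing is assumed about the encoding of docmas9.txt. Headers whose CODE is not four digits, such as the truncated 帶 entry, are simply skipped.

```python
import re

# Hypothetical pattern: one character, optional MC information,
# a four-digit telegraph code, then the four-digit CODE at the end.
HEADER = re.compile(r"^(\S)\s+\S*?(\d{4})\s+(\d{4})$")


def mc_headers_sorted_by_code(lines):
    """Keep only the MC header lines and return them ordered by CODE."""
    headers = []
    for raw in lines:
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            headers.append((m.group(3), line))
    # CODE is zero-padded, so sorting it as a string follows page/entry order.
    # The sort is stable, so headers sharing a CODE keep their file order.
    headers.sort(key=lambda pair: pair[0])
    return [line for _, line in headers]
```

Dialect entries end in "]" or 文 rather than a four-digit number, so they fall through the filter and only the headers remain.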
In the previous version of doc-ipa, I made a mistake in the Taiyuan tones 3 (Qu) and 42 (YangRu), which has been corrected. This showed up when I found pronunciations without tone numbers against them. If you find more discrepancies of this kind, you can email me below, with the subject "SHW-NALYD doc-ipa.htm". The current version of doc-ipa is now smaller, more space characters having been eliminated, and stands at 5062 kb.
With thanks to Kobo-Daishi, PLLA for correcting the comment on CODE.
© Dylan W.H. Sung 2004
This page was created on Saturday 17th January 2004
and last updated on Friday 23rd January 2004.