The Hakka character pronunciations in ZDIC come from 9 different sources and, to compound the problem, the tone numbering is not straightforward. Here, we take a selection of characters (一二三四五六七八九零十百千萬億加減乘除), list their entries in ZDIC, and sort them according to their tone numbering. We shall find 3 basic systems in use. In the second half of this essay, we delve into the origins of these character listings and show precisely who we should credit for making them available in the first place.
Character | 客英字典 | 梅县腔 | 东莞腔 | 宝安腔 | 沙头角腔 | 客语拼音字汇 | 台湾四县腔 | 海陆丰腔 | 陆丰腔 | Tone | |
加 ga1 | ga1 | ga1 | ga1 | ga1 | ga1 | ga1 | ga1 | ga1 | ga1 | Yin Ping | |
三 sam1 | sam1 sam5 | sam1 sam5 | sam1 | sam1 sam3 | sam1 | sam1 | sam1 sam3 | sam1 sam3 | sam1 | ||
千 cien1 | cien1 | cien1 | cen1 | cen1 | cien1 | qian1 | cien1 | cien1 | tsian1 | ||
乘 sin2 | shin2 shin5 | shin1 shin5 | sin2 | sin1 sin2 sin3 sin5 | sin5 | sin2 sin4 sin4 | sen1 sen2 sen5 | shin1 shin2 shin5 | shin5 | Yang Ping | |
除 cu2 | chu2 | chu2 | ciu2 | cu2 | cu2 | cu2 | cu2 | chu2 | chu3 | ||
零 lang2 | lang2 len2 len1 | lan3 lang2 lin2 | lin2 | lang2 | lang2 | lang2 | lang2 len2 | lang2 len2 | lang3 | ||
減 gam3 | gam3 | gam3 | gam3 | gam3 | gam3 | gam3 | gam3 | gam3 | Shang Sheng | ||
五 ng3 | ng3 | ng3 | ng3 | ng3 | ng3 | ng3 | ng3 | ng3 | ng3 | ||
九 giu3 | giu3 | giu3 | giu3 | giu3 | kieu3 | giu3 | giu3 | giu3 | giu3 | ||
四 si4 | si5 | si5 | si5 | si5 | si5 | xi4 | si5 | si5 | si5 | Qu Sheng | Yin Qu |
億 yi4, yit6 | ji5 jit7 | ji5 | jit8 | jit8 ji5 | yi4 | ji5 jit8 | ri5 rit8 | jit8 | |||
二 ngi4 | ngi5 | ngi5 ng5 | ngi5 | ngi5 | gni5 | ngi4 | ngi5 | ngi6 | gni6 | Yang Qu | |
萬 van4 | van5 | wan5 | man3 | man3 wan3 | man5 wan5 | man4 van4 | wan5 | wan6 | wan6 | ||
一 yit5 | jit7 | jit7 | jit7 | jit7 | jit7 | yid5 | jit7 | rit7 | jit7 | Yin Ru | |
六 liuk5 | liuk7 | liuk7 | luk7 | luk7 | luk7 | liug5 lug5 | liuk7 | liuk7 | liuk7 | ||
七 cit5 | cit7 | cit7 | cit7 | cit7 | qid5 | cit7 | cit7 | cit7 | |||
八 bat5 | bat7 | bat7 | bat7 | bat7 | bat7 | bad5 | bat7 | bat7 | bat7 | ||
百 bak5 | bak7 | bak7 | bak7 | bak7 | bak7 | bag5 | bak7 | bak7 | bak7 | ||
十 sip6 | ship8 | ship8 | sip8 | sip8 | sip8 | sib6 | siip8 | ship8 | ship8 | Yang Ru | |
億 yi4, yit6 | ji5 jit7 | ji5 | jit8 | jit8 ji5 | yi4 | ji5 jit8 | ri5 rit8 | jit8 |
This can be summarised as follows:
(1) 客英字典 (2) 东莞腔 (5) 梅县腔 (6) 台湾四县腔 (8) 宝安腔 (9) 沙头角腔 | 1. 陰平 | 2. 陽平 | 3. 上聲 | 5. 去聲 | 7. 陰入 | 8. 陽入 | |
(7) 客语拼音字汇 | 1. 陰平 | 2. 陽平 | 3. 上聲 | 4. 去聲 | 5. 陰入 | 6. 陽入 | |
(3) 海陆丰腔 (4) 陆丰腔 | 1. 陰平 | 2. 陽平 | 3. 上聲 | 5. 陰去 | 6. 陽去 | 7. 陰入 | 8. 陽入 |
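The summary above can be written down as a small lookup, which makes it easier to see that jit7 and yid5 name the same tone category in different sources. This is only a sketch based on the three systems listed here; the names SYSTEM_BY_SOURCE and tone_category are made up for illustration, and it assumes the tone is always the final digit of the reading.

```python
# The three tone-numbering systems from the summary table.
SIX_TONE_5_7_8 = {1: "陰平", 2: "陽平", 3: "上聲", 5: "去聲", 7: "陰入", 8: "陽入"}
SIX_TONE_1_TO_6 = {1: "陰平", 2: "陽平", 3: "上聲", 4: "去聲", 5: "陰入", 6: "陽入"}
SEVEN_TONE = {1: "陰平", 2: "陽平", 3: "上聲", 5: "陰去", 6: "陽去", 7: "陰入", 8: "陽入"}

# Which system each ZDIC source uses, as grouped in the summary.
SYSTEM_BY_SOURCE = {
    "客英字典": SIX_TONE_5_7_8,
    "东莞腔": SIX_TONE_5_7_8,
    "梅县腔": SIX_TONE_5_7_8,
    "台湾四县腔": SIX_TONE_5_7_8,
    "宝安腔": SIX_TONE_5_7_8,
    "沙头角腔": SIX_TONE_5_7_8,
    "客语拼音字汇": SIX_TONE_1_TO_6,
    "海陆丰腔": SEVEN_TONE,
    "陆丰腔": SEVEN_TONE,
}


def tone_category(source, reading):
    """Return the tone category for a reading such as 'jit7'.

    Returns None when the tone number is not part of that source's
    system (assumption: the tone is the last character of the reading).
    """
    tone = int(reading[-1])
    return SYSTEM_BY_SOURCE[source].get(tone)
```

For example, tone_category("客英字典", "jit7") and tone_category("客语拼音字汇", "yid5") both give 陰入. A few readings in the table carry numbers outside their source's system, and for those the function simply returns None.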
ZDIC is arguably one of the more popular online Chinese character resources. At one time its sister site, longwiki.net (which stopped functioning around 2010), accepted user input which eventually found its way onto ZDIC. ZDIC's popularity is down to the convenience of having so much information collected in one place. Its character data includes dialect readings for Cantonese, Hakka, Chaozhou and Mandarin, as well as historical reconstructions of pronunciations from the Old Chinese, Middle Chinese and Early Mandarin stages of Chinese, and Sino-Xenic data like the Rimes in Mongol Script and other Chinese character rime dictionaries. The reconstructions of Middle Chinese by many past and current scholars make ZDIC a convenient first port of call for those who do not have the physical sources to hand: 高本漢 Bernhard Karlgren, 王力 Li Wang, 李榮 Rong Li, 邵榮芬 Rongfen Shao, 鄭張尚芳 Shangfang ZhengZhang, 潘悟雲 Wuyun Pan and 蒲立本 Edwin G. Pulleyblank, as well as amateur contributions such as 古韻羅馬字 from the Polyhedron distribution, and 有女羅馬字 from a different perspective.
As the Polyhedron distribution was open source, much of the other dialect data must also have come from similar sources. We note that there are other sites which offer the same data, for example Ctext. Indeed, the Hakka dialect data used in ZDIC comes directly from Thomas Chin's dictionary website at Chinese Language. Chin's dictionary has been online since 1994, and he made his entire database available for users to download, from which it is easy to verify that the data in ZDIC is a direct copy, disguised only by the fact that the data field names have been changed into Chinese.
Specimen data from Thomas Chin's CCDICT 4.3.0 (2003)
一 0 1 0 1 [Sathewkok] jit7 [Lau Chunfat] yid5 [Hailu] rit7 [Bao'an] jit7 [MacIver] jit7 [Siyan] jit7 [Meixian] jit7 [Dongguan] jit7 [Lufeng] jit7 jat1 yi1 yi4 yi2 il [1] one; unit [2] once [3] as soon as; once [4] one principle [5] single; alone [6] a; an; the [7] each; per; every time [8] another [9] whole; all; throughout; complete [10] union; uniformity; uniform [v] unify; unite [11] so; such; to such extent [12] same; together [13] a little
Specimen data from Thomas Chin's CCDICT 5.0.0 (2005)
U+4E00.0 fCNS11643 1-4421
U+4E00.0 fBig5 A440
U+4E00.0 fGB 0-523B
U+4E00.0 fR/S 1.0
U+4E00.0 fTotalStrokes 1
U+4E00.0 fCangjie M
U+4E00.0 fHakka jit7 [1,2,4,5,6,8,9] yid5 [7] rit7 [3]
U+4E00.0 fCantonese jat1
U+4E00.0 fMandarin yi2 yi4 yi1
U+4E00.0 fEnglish [1] one; unit [2] once [3] as soon as; once [4] one principle [5] single; alone [6] a; an; the [7] each; per; every time [8] another [9] whole; all; throughout; complete [10] union; uniformity; uniform [v] unify; unite [11] so; such; to such extent [12] same; together [13] a little
Specimen data from Thomas Chin's CCDICT 5.1.1 (2006)
U+4E00.0 fUTF8 一
U+4E00.0 fCNS11643 1-4421
U+4E00.0 fBig5 A440
U+4E00.0 fGB 0-523B
U+4E00.0 fR/S 1.0
U+4E00.0 fTotalStrokes 1
U+4E00.0 fCangjie M
U+4E00.0 fFourCorner 10000
U+4E00.0 fHakka jit7; rit7
U+4E00.0 fCantonese jat1
U+4E00.0 fMandarin yi1
U+4E00.0 fEnglish [1] one; unit [2] whole; all; throughout; complete [3] one principle [4] once; as soon as [5] single; alone [6] slightly; a little [7] a; an; the [8] each; per; every time [9] unify; unite [10] same; together [11] union; uniformity; uniform [12] so; such; to such extent [13] another [14] once [15] a little [16] a Chinese family name
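One of the specimens above gives the Hakka readings as jit7 [1,2,4,5,6,8,9] yid5 [7] rit7 [3]. The bracketed indices appear to line up with the source numbering (1) to (9) used in the tone summary earlier: yid5 falls on 7 (客语拼音字汇) and rit7 on 3 (海陆丰腔). Taking that correspondence as an assumption rather than a documented part of the CCDICT format, a rough sketch for expanding such a field into one reading list per source might look like this. The function name is made up for illustration.

```python
import re

# Source numbering taken from the tone summary in this essay.
# That CCDICT's bracketed indices follow the same numbering is an assumption.
SOURCES = {
    1: "客英字典", 2: "东莞腔", 3: "海陆丰腔", 4: "陆丰腔", 5: "梅县腔",
    6: "台湾四县腔", 7: "客语拼音字汇", 8: "宝安腔", 9: "沙头角腔",
}


def expand_indexed_hakka(field):
    """Expand 'jit7 [1,2] yid5 [7]' into {source name: [readings]}."""
    readings = {}
    # Each group is some readings followed by a bracketed list of indices.
    for syllables, indices in re.findall(r"([^\[\]]+)\[([\d,\s]+)\]", field):
        for index in indices.split(","):
            name = SOURCES[int(index)]
            readings.setdefault(name, []).extend(syllables.split())
    return readings


specimen = "jit7 [1,2,4,5,6,8,9] yid5 [7] rit7 [3]"
by_source = expand_indexed_hakka(specimen)
```

Expanded this way, the indexed field carries the same information as the source-labelled specimen: nine sources, each with its own reading for 一.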
From the above successive builds, we see that Thomas Chin modified his listing to remove the source labels in later builds, but in the earliest specimen above, CCDICT 4.3.0, we see a direct correspondence with the data that ZDIC uses in its own implementation. It is probably this build of the database, or a similar one, from which the ZDIC data was obtained.
Let's compare the entry for the Chinese character 一 in our specimen data with the one currently available in ZDIC:
Thomas Chin's Hakka Data | ZDIC data today | Notes | ||
[Sathewkok] | jit7 | [沙头角腔] | jit7 | 沙头角 ShaTouJiao/沙頭角 ShaTauKok or Sathewkok is a town in the north east corner of Hong Kong |
[Lau Chunfat] | yid5 | [客语拼音字汇] | yid5 | Lau Chunfat (Liu Zinfat) 劉鎮發/刘镇发 is the author of 客語拼音字匯/客语拼音字汇 Hakka Pinyim Dictionary (Hag4 Ngi1 Pin1 Yim1 Su4 Fui4) |
[Hailu] | rit7 | [海陆丰腔] | rit7 | 海陸/海陆 Hailu/HoiLiuk refers to Hakka dialects in Taiwan that come from 海豐/海丰 HoiFung and 陸豐/陆丰 LukFung districts in south eastern Guangdong Province |
[Bao'an] | jit7 | [宝安腔] | jit7 | 寶安/宝安 BaoAn/Bau-On covers Hong Kong and most of modern day Shenzhen |
[MacIver] | jit7 | [客英字典] | jit7 | Donald MacIver and Manfred McKenzie, compilers of [客英字典] A Chinese – English Dictionary, Hakka-Dialect As Spoken in Kwang-tung Province. |
[Siyan] | jit7 | [台湾四县腔] | jit7 | 四縣/四县 Siyan are a group of dialects in Taiwan that correspond to the dialects of north eastern Guangdong |
[Meixian] | jit7 | [梅县腔] | jit7 | 梅縣/梅县 Meixian/Moiyan/Moiyen dialect is the paragon dialect of Hakka |
[Dongguan] | jit7 | [东莞腔] | jit7 | 東莞/东莞 Dongguan/DungGon is an area that overlaps Shenzhen and is very similar to the BaoAn dialect |
[Lufeng] | jit7 | [陆丰腔] | jit7 | 陸豐/陆丰 Lufeng/LukFung is a district in south eastern Guangdong Province. |
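The label correspondence in the table can be applied mechanically. As a minimal sketch, assuming we pass in only the Hakka portion of Chin's entry (the bracketed labels and their readings, without the Cantonese, Mandarin and English that follow), the snippet below swaps his romanised labels for ZDIC's headings. The names CHIN_TO_ZDIC and relabel are invented for this illustration.

```python
import re

# Chin's source labels and the ZDIC headings they correspond to,
# as listed in the comparison table.
CHIN_TO_ZDIC = {
    "Sathewkok": "沙头角腔",
    "Lau Chunfat": "客语拼音字汇",
    "Hailu": "海陆丰腔",
    "Bao'an": "宝安腔",
    "MacIver": "客英字典",
    "Siyan": "台湾四县腔",
    "Meixian": "梅县腔",
    "Dongguan": "东莞腔",
    "Lufeng": "陆丰腔",
}


def relabel(hakka_part):
    """Turn '[MacIver] jit7 [Hailu] rit7' into {ZDIC heading: [readings]}."""
    # A label in square brackets, then everything up to the next bracket.
    pairs = re.findall(r"\[([^\]]+)\]\s+([^\[]+)", hakka_part)
    return {CHIN_TO_ZDIC[label]: text.split() for label, text in pairs}


chin_entry = (
    "[Sathewkok] jit7 [Lau Chunfat] yid5 [Hailu] rit7 [Bao'an] jit7 "
    "[MacIver] jit7 [Siyan] jit7 [Meixian] jit7 [Dongguan] jit7 [Lufeng] jit7"
)
zdic_style = relabel(chin_entry)
```

The readings themselves pass through untouched; only the labels change, which is exactly the relationship between the two columns of the table.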
There is no difference between the two, other than that the romanised source labels have been translated into simplified Chinese characters.
Where do these readings come from, and are they exactly the same as in the original sources from which they were copied into the database? The answer is no. The tone numbering itself is one reason why the body of data in Chin's Hakka database is his own work. If you have read the other parts of my blog, you'll know that I own many Hakka resources, among them Henry Henne's published works on Sathewkok Hakka, MacIver's Hakka Dictionary, Bennet M. Lindauer's translation of Simon Hartwich Schaank's The Loeh-Foeng Dialect (Lufeng), Phang TetHsiu's Hakka Pronunciation of Chinese Character Dictionary, and Chunfat Lau's Hakka Pinyim Dictionary. Apart from the last, these sources use diacritics to indicate tones. For our character 一, more often than not they write yit or yid rather than jit as in Chin's data. It is safe to say that, with the exception of Lau's transcription, Chin did not copy wholesale but actively converted the readings he found in his sources into his own romanised format. So it is fair to say that the Hakka dialect database used in ZDIC is Chin's own original work.