Hakka dialect data on ZDIC, and its origins

The Hakka character pronunciations on ZDIC come from 9 different sources and, to compound the problem, the tone numbering is not straightforward. Here, we take a selection of characters (一二三四五六七八九零十百千萬億加減乘除), list their entries in ZDIC, and sort them according to their tone numbering. We shall find 3 basic systems in use. In the second half of this essay, we delve into the origins of these character listings and show precisely who should be credited for making them available in the first place.

Character 客英字典 梅县腔 东莞腔 宝安腔 沙头角腔 客语拼音字汇 台湾四县腔 海陆丰腔 陆丰腔 Tone
加 ga1 ga1 ga1 ga1 ga1 ga1 ga1 ga1 ga1 ga1 Yin Ping
三 sam1 sam1 sam5 sam1 sam5 sam1 sam1 sam3 sam1 sam1 sam1 sam3 sam1 sam3 sam1
千 cien1 cien1 cien1 cen1 cen1 cien1 qian1 cien1 cien1 tsian1
乘 sin2 shin2 shin5 shin1 shin5 sin2 sin1 sin2 sin3 sin5 sin5 sin2 sin4 sin4 sen1 sen2 sen5 shin1 shin2 shin5 shin5 Yang Ping
除 cu2 chu2 chu2 ciu2 cu2 cu2 cu2 cu2 chu2 chu3
零 lang2 lang2 len2 len1 lan3 lang2 lin2 lin2 lang2 lang2 lang2 lang2 len2 lang2 len2 lang3
減 gam3 gam3 gam3 gam3 gam3 gam3 gam3 gam3 gam3 Shang Sheng
五 ng3 ng3 ng3 ng3 ng3 ng3 ng3 ng3 ng3 ng3
九 giu3 giu3 giu3 giu3 giu3 kieu3 giu3 giu3 giu3 giu3
四 si4 si5 si5 si5 si5 si5 xi4 si5 si5 si5 Qu Sheng / Yin Qu
億 yi4, yit6 ji5 jit7 ji5 jit8 jit8 ji5 yi4 ji5 jit8 ri5 rit8 jit8
二 ngi4 ngi5 ngi5 ng5 ngi5 ngi5 gni5 ngi4 ngi5 ngi6 gni6 Yang Qu
萬 van4 van5 wan5 man3 man3 wan3 man5 wan5 man4 van4 wan5 wan6 wan6
一 yit5 jit7 jit7 jit7 jit7 jit7 yid5 jit7 rit7 jit7 Yin Ru
六 liuk5 liuk7 liuk7 luk7 luk7 luk7 liug5 lug5 liuk7 liuk7 liuk7
七 cit5 cit7 cit7 cit7 cit7 qid5 cit7 cit7 cit7
八 bat5 bat7 bat7 bat7 bat7 bat7 bad5 bat7 bat7 bat7
百 bak5 bak7 bak7 bak7 bak7 bak7 bag5 bak7 bak7 bak7
十 sip6 ship8 ship8 sip8 sip8 sip8 sib6 siip8 ship8 ship8 Yang Ru
億 yi4, yit6 ji5 jit7 ji5 jit8 jit8 ji5 yi4 ji5 jit8 ri5 rit8 jit8

This can be summarised as follows:

(1) 客英字典, (2) 东莞腔, (5) 梅县腔, (6) 台湾四县腔, (8) 宝安腔, (9) 沙头角腔: 1. 陰平 2. 陽平 3. 上聲 5. 去聲 7. 陰入 8. 陽入
(7) 客语拼音字汇: 1. 陰平 2. 陽平 3. 上聲 4. 去聲 5. 陰入 6. 陽入
(3) 海陆丰腔, (4) 陆丰腔: 1. 陰平 2. 陽平 3. 上聲 5. 陰去 6. 陽去 7. 陰入 8. 陽入
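The three numbering schemes can be written down as lookup tables, which makes it easy to tell which tone category a given reading belongs to. The sketch below is only an illustration: the names TONE_SYSTEMS, SOURCE_SYSTEM and tone_category and the scheme labels "A", "B" and "C" are made up for this example, and it assumes that every reading is a lowercase romanised syllable followed by a single tone digit.

```python
import re

# Tone number -> tone category, one table per numbering scheme in the summary.
# The labels "A", "B", "C" are hypothetical names used only in this sketch.
TONE_SYSTEMS = {
    "A": {1: "陰平", 2: "陽平", 3: "上聲", 5: "去聲", 7: "陰入", 8: "陽入"},
    "B": {1: "陰平", 2: "陽平", 3: "上聲", 4: "去聲", 5: "陰入", 6: "陽入"},
    "C": {1: "陰平", 2: "陽平", 3: "上聲", 5: "陰去", 6: "陽去",
          7: "陰入", 8: "陽入"},
}

# Which scheme each ZDIC source label uses, following the summary.
SOURCE_SYSTEM = {
    "客英字典": "A", "东莞腔": "A", "梅县腔": "A",
    "台湾四县腔": "A", "宝安腔": "A", "沙头角腔": "A",
    "客语拼音字汇": "B",
    "海陆丰腔": "C", "陆丰腔": "C",
}


def tone_category(reading, source):
    """Return the tone category of a reading such as 'jit7' for a source.

    Returns None when the tone number is not part of that source's scheme.
    """
    match = re.fullmatch(r"([a-z]+)([1-8])", reading)
    if match is None:
        raise ValueError(f"unexpected reading format: {reading!r}")
    number = int(match.group(2))
    return TONE_SYSTEMS[SOURCE_SYSTEM[source]].get(number)


# The same category hides behind different digits in different sources.
print(tone_category("yid5", "客语拼音字汇"))
print(tone_category("jit7", "梅县腔"))
```

Both calls describe the character 一, and both resolve to 陰入 even though one source writes the tone as 5 and the other as 7. Only the tone digit is interpreted here; the differences in spelling between sources (yid against jit) are left untouched.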

About

ZDIC is arguably one of the more popular online Chinese character resources on the internet. At one time its sister site, longwiki.net (which stopped functioning around 2010), accepted user input which eventually found its way onto ZDIC. ZDIC owes its popularity to the convenience of having so much information collected in one place. Its character data includes dialect readings from Cantonese, Hakka, Chaozhou and Mandarin, as well as historical reconstructions of pronunciation from the Old Chinese, Middle Chinese and Early Mandarin stages of the language, and Sino-Xenic data like the Rimes in Mongol Script and other Chinese character rime dictionaries. The reconstructions of Middle Chinese from many past and current scholars make ZDIC a convenient first port of call for those who do not have the printed sources to hand. They include those of 高本漢 Bernhard Karlgren, 王力 Li Wang, 李榮 Rong Li, 邵榮芬 Rongfen Shao, 鄭張尚芳 Shangfang ZhengZhang, 潘悟雲 Wuyun Pan and 蒲立本 Edwin G. Pulleyblank, as well as amateur contributions such as 古韻羅馬字 from the Polyhedron distribution, and 有女羅馬字 from a different perspective.

As the Polyhedron distribution was open source, much of the other dialect data must also have come from similar sources. We note that other sites, such as Ctext, offer the same data. Indeed, the Hakka dialect data used in ZDIC comes directly from Thomas Chin's dictionary website, Chinese Language. Chin's dictionary has been online since 1994, and he made his entire database available for users to download. From that download it is easy to verify that the data in ZDIC is a direct copy, disguised only by the fact that the data field labels have been changed into Chinese.

Specimen Data from Thomas Chin's CCDICT database builds

Specimen data from Thomas Chin's CCDICT 4.3.0 (2003)

一	0	1	0	1	[Sathewkok] jit7 [Lau Chunfat] yid5 [Hailu] rit7 [Bao'an] jit7 [MacIver] jit7 [Siyan] jit7 [Meixian] jit7 [Dongguan] jit7 [Lufeng] jit7	jat1	yi1 yi4 yi2		il	[1] one; unit [2] once [3] as soon as; once [4] one principle [5] single; alone [6] a; an; the [7] each; per; every time [8] another [9] whole; all; throughout; complete [10] union; uniformity; uniform [v] unify; unite [11] so; such; to such extent [12] same; together [13] a little
Specimen data from Thomas Chin's CCDICT 5.0.0 (2005)
U+4E00.0	fCNS11643	1-4421
U+4E00.0	fBig5	A440
U+4E00.0	fGB	0-523B
U+4E00.0	fR/S	1.0
U+4E00.0	fTotalStrokes	1
U+4E00.0	fCangjie	M
U+4E00.0	fHakka	jit7 [1,2,4,5,6,8,9] yid5 [7] rit7 [3]
U+4E00.0	fCantonese	jat1
U+4E00.0	fMandarin	yi2 yi4 yi1
U+4E00.0	fEnglish	[1] one; unit [2] once [3] as soon as; once [4] one principle [5] single; alone [6] a; an; the [7] each; per; every time [8] another [9] whole; all; throughout; complete [10] union; uniformity; uniform [v] unify; unite [11] so; such; to such extent [12] same; together [13] a little
Specimen data from Thomas Chin's CCDICT 5.1.1 (2006)
U+4E00.0	fUTF8	一
U+4E00.0	fCNS11643	1-4421
U+4E00.0	fBig5	A440
U+4E00.0	fGB	0-523B
U+4E00.0	fR/S	 1.0
U+4E00.0	fTotalStrokes	 1
U+4E00.0	fCangjie	M
U+4E00.0	fFourCorner	10000
U+4E00.0	fHakka	  jit7; rit7
U+4E00.0	fCantonese	jat1
U+4E00.0	fMandarin	yi1
U+4E00.0	fEnglish	[1] one; unit [2] whole; all; throughout; complete [3] one principle [4] once; as soon as [5] single; alone [6] slightly; a little [7] a; an; the [8] each; per; every time [9] unify; unite [10] same; together [11] union; uniformity; uniform [12] so; such; to such extent [13] another [14] once [15] a little [16] a Chinese family name

From the successive builds above, we see that Thomas Chin modified his listing to remove the source labels in later builds, but the earliest specimen, CCDICT 4.3.0, corresponds directly to the data that ZDIC uses in its own implementation. The ZDIC data was probably obtained from this build of the database, or a similar one.
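The 5.0.0 specimen still carries the source information, only in compressed form: each reading is followed by a bracketed list of numbers. The sketch below expands such a line back into one reading per source. It rests on an inference rather than on anything Chin documented here: the numbers are assumed to be the same source numbers (1) to (9) used in the tone summary earlier, which does reproduce the 4.3.0 entry for 一. The names SOURCES and expand_hakka_field are hypothetical.

```python
import re

# ASSUMPTION: the bracketed numbers in the 5.0.0 fHakka field are source
# numbers in the same order as the tone summary (1 = MacIver ... 9 = Sathewkok).
SOURCES = {
    1: "MacIver", 2: "Dongguan", 3: "Hailu", 4: "Lufeng", 5: "Meixian",
    6: "Siyan", 7: "Lau Chunfat", 8: "Bao'an", 9: "Sathewkok",
}


def expand_hakka_field(line):
    """Split a tab-separated fHakka record into (code point, {source: readings})."""
    code_point, field, value = line.rstrip("\n").split("\t", 2)
    if field != "fHakka":
        raise ValueError(f"not an fHakka record: {field!r}")
    by_source = {}
    # Each item looks like: reading [n,n,...]
    for reading, numbers in re.findall(r"(\S+)\s*\[([\d,\s]+)\]", value):
        for number in numbers.split(","):
            by_source.setdefault(SOURCES[int(number)], []).append(reading)
    return code_point, by_source


record = "U+4E00.0\tfHakka\tjit7 [1,2,4,5,6,8,9] yid5 [7] rit7 [3]"
code_point, readings = expand_hakka_field(record)
for source, values in sorted(readings.items()):
    print(code_point, source, " ".join(values))
```

Under that assumption the expanded record gives yid5 for Lau Chunfat, rit7 for Hailu and jit7 for the other seven sources, which is exactly what the 4.3.0 line spells out with named labels.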

Let's compare the entry for the Chinese character 一 in our specimen data with the one currently available in ZDIC:

  • CCDICT 4.3.0 [Sathewkok] jit7 [Lau Chunfat] yid5 [Hailu] rit7 [Bao'an] jit7 [MacIver] jit7 [Siyan] jit7 [Meixian] jit7 [Dongguan] jit7 [Lufeng] jit7
  • ZDIC online [沙头角腔] jit7 [客语拼音字汇] yid5 [海陆丰腔] rit7 [宝安腔] jit7 [客英字典] jit7 [台湾四县腔] jit7 [梅县腔] jit7 [东莞腔] jit7 [陆丰腔] jit7

    Thomas Chin's Hakka Data ZDIC data today Notes
    [Sathewkok] jit7 [沙头角腔] jit7 沙头角 ShaTouJiao/沙頭角 ShaTauKok or Sathewkok is a town in the north east corner of Hong Kong
    [Lau Chunfat] yid5 [客语拼音字汇] yid5 Lau Chunfat (Liu Zinfat) 劉鎮發/刘镇发 is the author of 客語拼音字彙/客语拼音字汇 Hakka Pinyim Dictionary (Hag4 Ngi1 Pin1 Yim1 Su4 Fui4)
    [Hailu] rit7 [海陆丰腔] rit7 海陸/海陆 Hailu/HoiLiuk refers to Hakka dialects in Taiwan that come from 海豐/海丰 HoiFung and 陸豐/陆丰 LukFung districts in south eastern Guangdong Province
    [Bao'an] jit7 [宝安腔] jit7 寶安/宝安 BaoAn/Bau-On covers Hong Kong and most of modern day Shenzhen
    [MacIver] jit7 [客英字典] jit7 Donald MacIver and Manfred McKenzie of [客英字典] A Chinese – English Dictionary, Hakka-Dialect As Spoken in Kwantung Province.
    [Siyan] jit7 [台湾四县腔] jit7 四縣/四县 Siyan are a group of dialects in Taiwan that correspond to the dialects of north eastern Guangdong
    [Meixian] jit7 [梅县腔] jit7 梅縣/梅县 Meixian/Moiyan/Moiyen dialect is the paragon dialect of Hakka
    [Dongguan] jit7 [东莞腔] jit7 東莞/东莞 Dongguan/DungGon is an area that overlaps Shenzhen and is very similar to the BaoAn dialect
    [Lufeng] jit7 [陆丰腔] jit7 陸豐/陆丰 Lufeng/LukFung is a district in south eastern Guangdong Province.

    The two are identical, except that the romanised source labels have been translated into simplified Chinese characters.
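The comparison can also be checked mechanically. The sketch below parses both entries for 一 as quoted above, translates Chin's romanised labels with the correspondences from the table, and tests whether the result matches the ZDIC entry. The names LABELS and parse_entry are hypothetical helpers written for this illustration, and the two strings are simply copied from the specimens above.

```python
import re

# Romanised label in CCDICT 4.3.0 -> simplified Chinese label on ZDIC,
# taken from the correspondence table above.
LABELS = {
    "Sathewkok": "沙头角腔", "Lau Chunfat": "客语拼音字汇",
    "Hailu": "海陆丰腔", "Bao'an": "宝安腔", "MacIver": "客英字典",
    "Siyan": "台湾四县腔", "Meixian": "梅县腔", "Dongguan": "东莞腔",
    "Lufeng": "陆丰腔",
}


def parse_entry(text):
    """Return (label, readings) pairs in their original order."""
    pairs = re.findall(r"\[([^\]]+)\]\s*([^\[]*)", text)
    return [(label, readings.split()) for label, readings in pairs]


ccdict = ("[Sathewkok] jit7 [Lau Chunfat] yid5 [Hailu] rit7 [Bao'an] jit7 "
          "[MacIver] jit7 [Siyan] jit7 [Meixian] jit7 [Dongguan] jit7 "
          "[Lufeng] jit7")
zdic = ("[沙头角腔] jit7 [客语拼音字汇] yid5 [海陆丰腔] rit7 [宝安腔] jit7 "
        "[客英字典] jit7 [台湾四县腔] jit7 [梅县腔] jit7 [东莞腔] jit7 "
        "[陆丰腔] jit7")

# Translate Chin's labels, then compare readings and ordering in one go.
translated = [(LABELS[label], readings)
              for label, readings in parse_entry(ccdict)]
print(translated == parse_entry(zdic))  # True
```

Because the lists are compared in order, the check confirms not only that the readings agree but also that ZDIC lists the sources in the same sequence as the 4.3.0 record.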

    Is Chin's data his work or copied from book sources?

    Where do these readings come from, and are they exactly the same as in the original sources from which they were entered into the database? The answer to the second question is no. The tone numbering alone is one reason to regard the body of data in Chin's Hakka database as his own work. If you have read the other parts of my blog, you'll know that I own many Hakka resources, among them Henry Henne's published works on Sathewkok Hakka, MacIver's Hakka Dictionary, Bennet M. Lindauer's translation of Simon Hartwich Schaank's The Loeh-Foeng Dialect (Lufeng), Phang TetHsiu's Hakka Pronunciation of Chinese Character Dictionary, and Chunfat Lau's Hakka Pinyim Dictionary. Apart from the last of these, they all use diacritics to indicate tones. For our character 一, these sources more often than not write yit or yid rather than jit as in Chin's data. It is safe to say that, except in the case of Lau's transcription, Chin did not copy wholesale but actively converted the readings he found in his sources to his own romanised format. So it is fair to say that the Hakka dialect database used in ZDIC is Chin's own original work.

    Acknowledgement

    I'd like to thank Thomas Chin for his online dictionary, which made it possible for people across the globe to access Hakka pronunciation data at a time when resources were scarce. ZDIC may have taken the data without crediting Chin for his work, but some of us know, and want you to know that fact too.



    © Dylan W.H. Sung 2019

    This page was created on Sunday 31st March 2019
