CEDICT.DOC - README file for CEDICT - Chinese-English Dictionary
23 January 2003


Sometime in 2000 the web page of the originator of the CEDICT project,
Paul Denisowski, disappeared from the web and his e-mail addresses
stopped working.  To continue the work, I have taken the last versions
of the dictionary and have started correcting and adding to them with
the same intent as Paul, to provide a freely available, searchable
Chinese/English dictionary.  Below is the original README file from

The objective of the CEDICT project is to create an online, downloadable
(as opposed to searchable-only) public-domain Chinese-English dictionary.
For the most part, the project is modelled on Jim Breen's highly successful
EDICT (Japanese-English dictionary) project and is intended to be a
collaborative effort, with users providing entries and corrections to
the main file.  For specific limitations regarding its use, please see
the CEDICT license included in Appendix A of this document.

Since the project was only started in October 1997, it's probably a bit
of a stretch to say that it has a "history".  Actually, the project
in part grew out of my (very limited) involvement in the EDICT project,
which gave me some very useful practice in compiling dictionary entries.
After beginning the PhD program in Linguistics at UNC-CH, I had decided
to dust off my (very dusty) Chinese to help in my academic research
of Asian languages (particularly Vietnamese).  To this end I began
to keep a list of words I encountered in my reading, and this list
formed the original (c. 500 entry) CEDICT, which I first posted to my
UNC web site (http://www.unc.edu/~pauld) sometime in November 1997.  Since
then the list has grown at a fairly steady pace.  In April 1998, the CEDICT
project was moved to a new location (www.mindspring.com/~paul_denisowski).
In May 1998, CEDICT began to be mirrored at the Monash Nihongo ftp archive
(ftp.cc.monash.edu.au/pub/nihongo/) thanks to the very generous effort of
Jim Breen.  Now it is hosted at http://www.mandarintools.com/cedict.html

Although CEDICT started out as a one-person project, contributions from
the Internet community have become the major source of new entries.
Contributors thus far to the project are (in chronological order):

   Ocrat, Mike Wright, Wenke Wei, Sharlene Liu, Richard Warmington,
   Erik Peterson, Derek Chadwick, Dave Hiebeler, Steve Swales, Carl Hoffman

(Please let me know if I've left someone off the list)

Another important contributor is the creator of NJSTAR, Hongbo Ni
(hongbo@njstar.com.au : http://www.njstar.com.au) who provided 
the tools needed to generate the NJSTAR dictionary index files.  
Although I originally used a home-brew Perl script to do this, 
the newer method is a _substantial_ and much appreciated improvement 
over my own meager efforts.

Last, but certainly not least, of the CEDICT contributors is Jim Breen,
who not only provided the inspiration for CEDICT through the highly
successful EDICT project, but who has also very graciously allowed
CEDICT to be mirrored at the Monash Nihongo ftp site.

Contribution Guidelines


Contributions are always warmly welcome.


The CEDICT format is:
     CHINESE [pinyin] /English definition 1/English definition 2/.../
Please send entries in Big5 encoding.
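
As a rough illustration of the format above, a line could be split into
its parts with a short Python sketch like the one below.  The function
name parse_entry, the regular expression, and the sample entry are
illustrative assumptions and are not part of the CEDICT distribution.
The sketch also assumes the line has already been decoded from Big5
into a Python string and that the Chinese headword contains no spaces.

```python
import re

# Assumed pattern for: CHINESE [pinyin] /definition 1/definition 2/.../
ENTRY_RE = re.compile(
    r"^(?P<hanzi>\S+)\s+\[(?P<pinyin>[^\]]*)\]\s+/(?P<defs>.*)/\s*$"
)


def parse_entry(line):
    """Split one CEDICT-style line into (hanzi, syllables, definitions).

    Returns None when the line does not match the documented format.
    """
    match = ENTRY_RE.match(line)
    if match is None:
        return None
    syllables = match.group("pinyin").split()
    # Definitions are separated by slashes; drop any empty pieces.
    definitions = [d for d in match.group("defs").split("/") if d]
    return match.group("hanzi"), syllables, definitions


# Hypothetical sample entry, used only to exercise the parser.
print(parse_entry("中國 [zhong1 guo2] /China/Middle Kingdom/"))
```

When reading a whole file, opening it with open(path, encoding="big5")
should give decoded lines that can be passed to parse_entry one by one.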

Pinyin tones should be indicated by numbers 1-5, as follows:
   1=level tone, 2=rising tone, 3=falling-rising tone, 4=falling tone, 5=neutral tone

Please indicate the neutral tone (5) (e.g. xue2 sheng5 instead of xue2 sheng).

I've also been separating pinyin syllables with a space for
readability, e.g. [zhong1 guo2] instead of [zhong1guo2].

Avoid using square brackets for anything but pinyin.  Use parentheses
instead: e.g. /Beijing (capital of mainland China)/ and not /Beijing
[capital of mainland China]/.

Avoid using hyphens (-) in the pinyin.

I'm also trying to keep all the pinyin lowercase, even for proper nouns.
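
The pinyin conventions above (tone numbers 1-5 on every syllable,
syllables separated by spaces, no hyphens, all lowercase) can be checked
mechanically before sending in a contribution.  The Python sketch below
is one possible way to do so; the name check_pinyin and the syllable
pattern are assumptions made for illustration, and the pattern only
checks the shape of a syllable, not whether it is valid Mandarin.  It
accepts "u:" inside a syllable, following the u: notation mentioned in
the revision history.

```python
import re

# Assumed shape of one syllable: lowercase letters (or "u:") plus a tone digit.
SYLLABLE_RE = re.compile(r"^(?:u:|[a-z])+[1-5]$")


def check_pinyin(pinyin):
    """Return a list of problems found in a pinyin string (without brackets)."""
    problems = []
    if "-" in pinyin:
        problems.append("contains a hyphen")
    if pinyin != pinyin.lower():
        problems.append("contains uppercase letters")
    for syllable in pinyin.split():
        # Lowercase first so capitalisation is only reported once, above.
        if not SYLLABLE_RE.match(syllable.lower()):
            problems.append(
                "not a single syllable ending in a tone number 1-5: " + syllable
            )
    return problems


for sample in ["zhong1 guo2", "xue2 sheng", "zhong1guo2", "Bei3 jing1"]:
    print(sample, "->", check_pinyin(sample))
```

An empty list means the string follows the conventions as far as this
simple check can tell; tone accuracy still has to be verified by hand.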

Please mail all contributions to cedict@chinesetools.com


In order to preserve the public-domain/freeware aspect of this
dictionary, please do not send in entries from copyrighted sources,
especially from electronic media.  This also includes web sites with
copyrighted material.  If in doubt, please ask the copyright owner.

Please make sure to check your contributions against the latest
version of CEDICT.  Editing for duplicates has become a more serious
issue as the dictionary grows in size.  If you want to submit
additional definitions for existing entries, please submit ONLY the
new definition in the normal CEDICT format (I have a script that will
merge them).
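
To give an idea of what merging additional definitions involves, here is
a minimal Python sketch.  It is not the script mentioned above; the
function name merge_definitions, the in-memory dictionary keyed by
(hanzi, pinyin), and the sample data are all assumptions made for
illustration.

```python
def merge_definitions(dictionary, hanzi, pinyin, new_defs):
    """Add new_defs to the entry for (hanzi, pinyin), skipping duplicates.

    dictionary maps (hanzi, pinyin) to a list of English definitions.
    A missing entry is created, so brand-new words are handled as well.
    """
    existing = dictionary.setdefault((hanzi, pinyin), [])
    for definition in new_defs:
        if definition not in existing:
            existing.append(definition)
    return existing


# Hypothetical data: one existing entry and one submitted addition.
cedict = {("中國", "zhong1 guo2"): ["China"]}
merge_definitions(cedict, "中國", "zhong1 guo2", ["China", "Middle Kingdom"])
print(cedict)
```

Keying on both the characters and the pinyin keeps readings of the same
characters with different pronunciations as separate entries.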

Please try to observe the CEDICT format.  Again, I don't mind writing
scripts to clean up contributions that don't meet the above format,
but it's impractical to do this except for very large files.

Please be careful about pinyin tones, especially when it comes to
hanzi that change tone (such as "bu" - not, or "yi" - one) and final
hanzi (as in "xue2 sheng5").  Since I'm not a native speaker I have to
rely on the goodwill of others to correct pinyin/hanzi errors.

Although I've gotten some suggestions about splitting CEDICT up into
specialized dictionaries (especially for proper names and technical terms,
as EDICT has done), for the time being, there's no need to separate
entries by subject matter.

Revision History (since 16 December 1997)

16 December 1997
   1337 entries 
   Contributions/Corrections by Paul Denisowski
17 December 1997
   1677 entries
   Contributions/Corrections by Paul Denisowski, Ocrat 
28 December 1997
   2077 entries
   Contributions/Corrections by Paul Denisowski
2 January 1998
   2115 entries
   Contributions/Corrections by Paul Denisowski
   NJSTAR dictionary format version now supplied using tools
   provided by NJSTAR's creator, Hongbo Ni 
9 January 1998
   2751 entries
   Corrections by Mike Wright
   Contributions/Corrections by Paul Denisowski
20 January 1998
   3252 entries
   Corrections by Ocrat
   Contributions/Corrections by Paul Denisowski
01 February 1998
   3947 entries
   Contributions/Corrections by Paul Denisowski
14 March 1998
   4570 entries
   Corrections by Wenke Wei and Sharlene Liu 
   Contributions/Corrections by Paul Denisowski
29 March 1998
   7419 entries
   VOA vocabulary files contributed by Ocrat (c. 4500 entries)
   Contributions/Corrections by Paul Denisowski
06 April 1998
   Project moves to new website:  www.mindspring.com/~paul_denisowski/cedict.html
   7720 entries
   Contributions/Corrections by Richard Warmington, Paul Denisowski
11 April 1998
   7850 entries
   Contributions/Corrections by Paul Denisowski
24 April 1998
   8447 entries
   Contributions/Corrections by Erik Peterson, Paul Denisowski
4 May 1998
   11349 entries
   Over 3000 entries contributed by Derek Chadwick
   Contributions/Corrections by Paul Denisowski
11 May 1998
   11564 entries
   Contributions/Corrections by Richard Warmington, Paul Denisowski
   Jim Breen adds CEDICT to Monash Nihongo ftp archive:
01 June 1998
   12132 entries
   Contributions/Corrections by Ocrat, Paul Denisowski
05 July 1998
   12221 entries
   Contributions/Corrections by Dave Hiebeler, Steve Swales, Carl Hoffman, Paul Denisowski
1 September 1998
   16830 entries
   Contributions/Corrections by Dave Hiebeler, Erik Peterson, Paul Denisowski
1 November 1998
   23510 entries
   Merged in public-domain cchelp file (Author: Stephen G. Simpson simpson@math.psu.edu)
2 September 2000
   23484 entries
   Fixed formatting errors, regularized the way u: is represented,
   added a 5 to indicate light tone on pinyin with no tone number,
   reordered dictionary by pinyin, and removed some duplicate entries.
5 January 2001
   23481 entries
   Made corrections and removed duplicate entries suggested by Richard
26 June 2002
   23483 entries
   Ran the dictionary through a spell checker and fixed dozens of
   English definition spelling errors.
9 January 2003
  B5:  Unique Words: 22947, Definitions: 23519
   Added in contributions from Sebastien Bruggeman and others.  Fixed
   spelling and pinyin errors. Removed duplicate entries.
23 January 2003
  GB   Words: 21544, Defs: 22346
  Big5 Words: 23820, Defs: 24400
  Added in some wordlists.  Fixed some errors in creation of GB
28 January 2003
  GB   Words: 23243, Defs: 24063
  Big5 Words: 25541, Defs: 26120
  Added in a huge wordlist from Ron Grenier (Thanks Ron!)
07 February 2003
  GB   Words: 23316, Defs: 24149
  Big5 Words: 25616, Defs: 26206
  Added in Chinese titles for various family relations.
27 February 2003
  GB   Words: 23451, Defs: 24285
  Big5 Words: 25746, Defs: 26343
  Added in fixes and new words from Eric Goodell (Thanks Eric!)
28 April 2003
  GB   Words: 23494, Defs: 24328
  Big5 Words: 25789, Defs: 26387
  Various corrections and new words from the news.
30 May 2003
  GB   Words: 23512, Defs: 24345
  Big5 Words: 25807, Defs: 26404
  Various corrections and new words from the news.



Copyright (C) 1997, 1998 Paul Andrew Denisowski

This  licence  statement  and  copyright  notice   applies   to   the   CEDICT 
Chinese/English   Dictionary   file,   the   associated  documentation  file 
CEDICT.DOC, and any data files which are derived from them. 


Permission is granted to make and distribute verbatim copies of  these  files 
provided  this copyright notice and permission notice is distributed with all 
copies.  Any distribution of the files must take place  without  a  financial 
return, except a charge to cover the cost of the distribution medium. 

Permission is granted to make and distribute extracts or subsets of the CEDICT 
file under the same conditions applying to verbatim copies. 

Permission  is  granted  to  translate the English elements of the CEDICT file 
into other languages, and to make and distribute copies of those translations 
under the same conditions applying to verbatim copies. 


These files may be freely  used  by  individuals,  and  may  be  accessed  by 
software belonging to, or operated by, such individuals. 

The files, extracts from the files, and translations of the files must not be 
sold  as  part  of  any  commercial  software  package,   nor  must  they  be 
incorporated in any published dictionary or other  printed  document  without 
the specific permission of the copyright holder. 


Copyright  over  the  documents  covered  by  this statement is held by 
Paul Denisowski.