MandarinIPAOperator — IPA for Mandarin

cjklib.reading.operator.MandarinIPAOperator is an implementation of a transcription of Standard Mandarin into the International Phonetic Alphabet (IPA).



Features:

  • tones can be marked with tone numbers (1-4), tone contour numbers (e.g. 214), IPA tone bar characters or IPA diacritics,
  • support for the low third tone (half third tone) with tone contour 21,
  • four levels of the neutral tone for varying stress depending on the preceding syllable, and
  • splitting of syllables into onset and rhyme using the method getOnsetRhyme().


Tones in IPA can be expressed using different schemes. The following schemes are implemented here:

  • Numbers, regular tone numbers from 1 to 5 for the first to the fifth tone (qingsheng),
  • ChaoDigits, numbers displaying the levels of tone contours, e.g. 214 for the regular third tone,
  • IPAToneBar, IPA modifying tone bar characters, e.g. ɕi˨˩˦,
  • Diacritics, diacritical marks and finally
  • None, no support for tone marks
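
To make the relationship between the ChaoDigits and IPAToneBar schemes concrete, the following sketch converts a contour written in digits into tone bar characters. It is a standalone illustration and not cjklib code: the table CHAO_DIGIT_TO_TONE_BAR and the function chao_digits_to_tone_bars() are hypothetical names introduced here.

```python
# Chao tone digits run from 1 (lowest pitch) to 5 (highest pitch);
# each digit corresponds to one IPA tone bar character.
# NOTE: hypothetical helper for illustration, not part of cjklib.
CHAO_DIGIT_TO_TONE_BAR = {
    "1": "\u02e9",  # ˩ extra-low
    "2": "\u02e8",  # ˨ low
    "3": "\u02e7",  # ˧ mid
    "4": "\u02e6",  # ˦ high
    "5": "\u02e5",  # ˥ extra-high
}


def chao_digits_to_tone_bars(contour):
    """Convert a contour such as '214' into IPA tone bar characters."""
    try:
        return "".join(CHAO_DIGIT_TO_TONE_BAR[digit] for digit in contour)
    except KeyError as err:
        raise ValueError("not a Chao digit: %r" % err.args[0])


# Regular third tone (contour 214) attached to the plain syllable 'ɕi'.
third_tone_syllable = "ɕi" + chao_digits_to_tone_bars("214")  # 'ɕi˨˩˦'
```

The reverse direction works the same way with the table inverted, since each digit maps to exactly one tone bar character.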

Unlike other operators for Mandarin, this operator distinguishes six different tonal occurrences. The third tone is affected by tone sandhi, so two different tone contours exist for it. getTonalEntity() and splitEntityTone() therefore work with string representations of the tones as defined in TONES. The same behaviour as in other operators for Mandarin can be achieved by using the first character of the given string:

>>> from cjklib.reading import operator
>>> ipaOp = operator.MandarinIPAOperator(toneMarkType='ipaToneBar')
>>> syllable, toneName = ipaOp.splitEntityTone(u'mən˧˥')
>>> tone = int(toneName[0])

The implemented schemes render tone information differently. Mapping between them may lose information, so a full back-transformation cannot be guaranteed.
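
The loss of information can be sketched with the two third-tone contours mentioned above. The example is standalone and does not use cjklib: the table CONTOUR_TO_NUMBER and the helper candidates_by_number() are hypothetical, and only the contours 214 and 21 are taken from this page; the remaining values are the usual textbook contours and are included as an assumption.

```python
from collections import defaultdict

# Hypothetical table for illustration: tone contour (Chao digits) -> tone number.
CONTOUR_TO_NUMBER = {
    "55": 1,
    "35": 2,
    "214": 3,  # regular third tone
    "21": 3,   # low (half) third tone
    "51": 4,
}


def candidates_by_number(contour_to_number):
    """Group contours by the tone number they collapse to."""
    grouped = defaultdict(list)
    for contour, number in contour_to_number.items():
        grouped[number].append(contour)
    return {number: sorted(contours) for number, contours in grouped.items()}


BACK = candidates_by_number(CONTOUR_TO_NUMBER)

# Tone number 3 has two possible contours, so converting a contour to a
# number and back cannot tell which of the two was originally meant.
ambiguous = {number for number, contours in BACK.items() if len(contours) > 1}
```

Going from contours to numbers is always possible in this sketch, but the reverse step needs extra context (such as the following syllable) to pick between the candidates.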


  • Yuen Ren Chao: A Grammar of Spoken Chinese. University of California Press, Berkeley, 1968, ISBN 0-520-00219-9.


class cjklib.reading.operator.MandarinIPAOperator(**options)

Bases: cjklib.reading.operator.TonalIPAOperator

Provides an operator on strings in Mandarin Chinese written in the International Phonetic Alphabet (IPA).

Parameters:

  • options – extra options
  • dbConnectInst – instance of a DatabaseConnector; if none is given, default settings will be assumed.
  • toneMarkType – type of tone marks, one out of 'numbers', 'superscriptNumbers', 'chaoDigits', 'superscriptChaoDigits', 'ipaToneBar', 'diacritics', 'none'
  • missingToneMark – if set to 'noinfo', no tone information will be deduced when no tone mark is found (the tone takes on the value None); if set to 'ignore', such an entity will not be valid. Either behaviour only becomes effective if the chosen 'toneMarkType' makes no use of empty tone marks.
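
The two missingToneMark settings can be pictured with a small standalone sketch. It is not the cjklib implementation: split_tone() and the local InvalidEntityError class are hypothetical stand-ins, the sketch assumes a tone bar scheme in which no tone is written as an empty mark, and it returns the raw tone mark rather than a tone name from TONES. Treating 'ignore' as raising an error is one reading of "will not be valid" and is an assumption as well.

```python
class InvalidEntityError(ValueError):
    """Local stand-in for the library's exception of the same name."""


# IPA tone bar characters, from extra-high (˥) to extra-low (˩).
TONE_BARS = "\u02e5\u02e6\u02e7\u02e8\u02e9"


def split_tone(entity, missing_tone_mark="noinfo"):
    """Split an IPA syllable into (plain syllable, tone mark).

    Hypothetical helper: with 'noinfo' a missing mark yields None,
    with 'ignore' the entity is rejected as invalid.
    """
    plain = entity.rstrip(TONE_BARS)
    mark = entity[len(plain):]
    if mark:
        return plain, mark
    if missing_tone_mark == "noinfo":
        return plain, None
    if missing_tone_mark == "ignore":
        raise InvalidEntityError("no tone mark found in %r" % entity)
    raise ValueError("unknown missingToneMark value: %r" % missing_tone_mark)


with_mark = split_tone("mən\u02e7\u02e5")          # tone mark present
without_mark = split_tone("mən", "noinfo")         # tone is None
```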

getOnsetRhyme(plainSyllable)

Splits the given plain syllable into onset (initial) and rhyme (final).

Parameter: plainSyllable (str) – syllable in IPA without tone marks
Return type: tuple of str
Returns: tuple of syllable onset and rhyme
Raises InvalidEntityError: if the entity is invalid (e.g. syllable nucleus or tone invalid).
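
The idea behind an onset/rhyme split can be sketched without the library as a longest-prefix match against a list of initials. The names ONSETS and split_onset_rhyme() are hypothetical, and the onset inventory below is a common IPA analysis of Mandarin initials assumed for illustration; cjklib's own tables and validity checks may differ.

```python
# Assumed inventory of Mandarin initials in IPA (illustration only).
ONSETS = (
    "tɕʰ", "tʂʰ", "tsʰ", "tɕ", "tʂ", "ts", "pʰ", "tʰ", "kʰ",
    "p", "m", "f", "t", "n", "l", "k", "x", "ɕ", "ʂ", "ʐ", "s",
)


def split_onset_rhyme(plain_syllable):
    """Split a toneless IPA syllable into (onset, rhyme).

    Longer onsets are tried first so that 'tɕʰ' wins over 'tɕ' and 't'.
    A syllable without a known onset gets an empty onset.
    """
    for onset in sorted(ONSETS, key=len, reverse=True):
        if plain_syllable.startswith(onset):
            rhyme = plain_syllable[len(onset):]
            if not rhyme:
                raise ValueError("syllable %r has no rhyme" % plain_syllable)
            return onset, rhyme
    return "", plain_syllable


onset, rhyme = split_onset_rhyme("mən")  # ('m', 'ən')
```

Unlike this sketch, the method documented above also validates the entity and raises InvalidEntityError for syllables it does not recognise.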
getPlainReadingEntities(*args, **kwargs)

Gets the list of plain entities supported by this reading. These entities will carry no tone mark.

Return type: set of str
Returns: set of supported syllables
