Expansion - unihan_etl.expansion

Functions that expand the compact details packed inside field values.

Notes

re.compile() calls are kept inside the expand functions for three reasons:

  1. readability

  2. module-level function bytecode is cached in Python

  3. the most recently used compiled regexes are cached by the re module
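
The third point can be checked with a minimal sketch. It assumes CPython, where a repeated re.compile() call with the same pattern returns the cached pattern object. The helper name split_sources is hypothetical and not part of unihan_etl.

```python
import re


def split_sources(value: str) -> list[str]:
    """Hypothetical helper that compiles its pattern inside the function body."""
    # The call appears to recompile on every invocation, but the re module
    # keeps recently compiled patterns in an internal cache.
    pattern = re.compile(r"[A-Z]")
    return pattern.findall(value)


print(split_sources("GHJ"))

# On CPython, compiling the same pattern twice yields the same cached object.
print(re.compile(r"[A-Z]") is re.compile(r"[A-Z]"))
```

The pattern stays next to the code that uses it, and the cache keeps the cost of repeated calls low.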

unihan_etl.expansion.N_DIACRITICS = 'ńňǹ'
data

Diacritics from kHanyuPinlu.

unihan_etl.expansion.expand_kDefinition(value)
function

Expand kDefinition field.

Parameters:

value (str)

Return type:

list[str]

class unihan_etl.expansion.kMandarinDict

Bases: TypedDict

unihan_etl.expansion.expand_kMandarin(value)
function

Expand kMandarin field.

Parameters:

value (list[str])

Return type:

kMandarinDict

class unihan_etl.expansion.kTotalStrokesDict

Bases: TypedDict

unihan_etl.expansion.expand_kTotalStrokes(value)
function

Expand kTotalStrokes field.

Parameters:

value (list[str])

Return type:

kTotalStrokesDict

class unihan_etl.expansion.kAlternateTotalStrokesDict

Bases: TypedDict

kAlternateTotalStrokes mapping.

unihan_etl.expansion.is_valid_kAlternateTotalStrokes_irg_source(value)
function

Return True, narrowing the type, if value is a valid kAlternateTotalStrokes IRG source.

Parameters:

value (Any)

Return type:

TypeGuard[kAlternateTotalStrokesLiteral]

unihan_etl.expansion.expand_kAlternateTotalStrokes(value)
function

Expand kAlternateTotalStrokes field.

Examples

>>> expand_kAlternateTotalStrokes(['3:J'])
[{'strokes': 3, 'sources': ['J']}]
>>> expand_kAlternateTotalStrokes(['12:JK'])
[{'strokes': 12, 'sources': ['J', 'K']}]
>>> expand_kAlternateTotalStrokes(['-'])
[{'strokes': None, 'sources': ['-']}]
Parameters:

value (list[str])

Return type:

list[kAlternateTotalStrokesDict]
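
The examples above suggest a "strokes:sources" layout with "-" as a special value. A minimal sketch under that assumption follows. The function name sketch_expand_alternate_total_strokes is hypothetical, and this is not the library's implementation.

```python
def sketch_expand_alternate_total_strokes(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    expanded = []
    for entry in value:
        if entry == "-":
            # Assumption: a lone dash carries no stroke count.
            expanded.append({"strokes": None, "sources": ["-"]})
            continue
        strokes, _, sources = entry.partition(":")
        # Each source is a single letter, so the string splits per character.
        expanded.append({"strokes": int(strokes), "sources": list(sources)})
    return expanded


print(sketch_expand_alternate_total_strokes(["12:JK", "-"]))
```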

unihan_etl.expansion.expand_kUnihanCore2020(value)
function

Expand kUnihanCore2020 field.

Examples

>>> expand_kUnihanCore2020('GHJ')
['G', 'H', 'J']
Parameters:

value (str)

Return type:

list[str]

class unihan_etl.expansion.kLocationDict

Bases: TypedDict

kLocation mapping.

unihan_etl.expansion.expand_kHanYu(value)
function

Expand kHanYu field.

Parameters:

value (list[str])

Return type:

list[kLocationDict]

unihan_etl.expansion.expand_kIRGHanyuDaZidian(value)
function

Expand kIRGHanyuDaZidian field.

Parameters:

value (list[str])

Return type:

list[kLocationDict]

class unihan_etl.expansion.kTGHZ2013LocationDict

Bases: TypedDict

kTGHZ2013 location mapping.

class unihan_etl.expansion.kTGHZ2013Dict

Bases: TypedDict

kTGHZ2013 mapping.

unihan_etl.expansion.expand_kTGHZ2013(value)
function

Expand kTGHZ2013 field.

Examples

>>> expand_kTGHZ2013(['097.110,097.120:fēng'])
[{'reading': 'fēng', 'locations': [{'page': 97, 'position': 11, 'entry_type': 0},
{'page': 97, 'position': 12, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['482.140:zhòu'])
[{'reading': 'zhòu', 'locations': [{'page': 482, 'position': 14, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['256.090:mò', '379.160:wàn'])
[{'reading': 'mò', 'locations': [{'page': 256, 'position': 9, 'entry_type': 0}]},
 {'reading': 'wàn', 'locations': [{'page': 379, 'position': 16, 'entry_type': 0}]}]
Parameters:

value (list[str])

Return type:

list[kTGHZ2013Dict]
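
Judging by the examples, each entry is a comma-separated list of locations, a colon, and a reading, and each location is a three-digit page, a dot, a two-digit position, and a one-digit entry type. A minimal sketch under those assumptions follows. The name sketch_expand_tghz2013 is hypothetical, and the regex is an illustration, not the one used by unihan_etl.

```python
import re


def sketch_expand_tghz2013(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    # Assumed location layout: PPP.NNT (page, position, entry type).
    pattern = re.compile(r"(?P<page>\d{3})\.(?P<position>\d{2})(?P<entry_type>\d)")
    expanded = []
    for entry in value:
        locations, _, reading = entry.partition(":")
        parsed = []
        for location in locations.split(","):
            match = pattern.fullmatch(location)
            if match is None:
                continue  # the sketch skips anything outside the assumed layout
            parsed.append({key: int(digits) for key, digits in match.groupdict().items()})
        expanded.append({"reading": reading, "locations": parsed})
    return expanded


print(sketch_expand_tghz2013(["482.140:zhòu"]))
```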

class unihan_etl.expansion.kSMSZD2003IndexDict

Bases: TypedDict

kSMSZD2003Index location mapping.

unihan_etl.expansion.expand_kSMSZD2003Index(value)
function

Expand kSMSZD2003Index Soengmou San Zidin (商務新字典) field.

Examples

>>> expand_kSMSZD2003Index(['26.07'])
[{'page': 26, 'position': 7}]
>>> expand_kSMSZD2003Index(['769.05', '15.17', '291.20', '493.13'])
[{'page': 769, 'position': 5},
{'page': 15, 'position': 17},
{'page': 291, 'position': 20},
{'page': 493, 'position': 13}]

Bibliography

Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.

Parameters:

value (list[str])

Return type:

list[kSMSZD2003IndexDict]
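
The examples read as "page.position" pairs. A minimal sketch, assuming that layout holds for every entry, follows. The name sketch_expand_smszd2003_index is hypothetical.

```python
def sketch_expand_smszd2003_index(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    expanded = []
    for entry in value:
        page, _, position = entry.partition(".")
        # int() drops the zero padding, so "07" becomes 7.
        expanded.append({"page": int(page), "position": int(position)})
    return expanded


print(sketch_expand_smszd2003_index(["769.05", "15.17"]))
```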

class unihan_etl.expansion.kSMSZD2003ReadingsDict

Bases: TypedDict

kSMSZD2003Readings mapping.

unihan_etl.expansion.expand_kSMSZD2003Readings(value)
function

Expand kSMSZD2003Readings Soengmou San Zidin (商務新字典) field.

Examples

>>> expand_kSMSZD2003Readings(['tà粵taat3'])
[{'mandarin': ['tà'], 'cantonese': ['taat3']}]
>>> expand_kSMSZD2003Readings(['ma粵maa1,maa3', 'má粵maa1', 'mǎ粵maa1'])
[{'mandarin': ['ma'], 'cantonese': ['maa1', 'maa3']},
{'mandarin': ['má'], 'cantonese': ['maa1']},
{'mandarin': ['mǎ'], 'cantonese': ['maa1']}]

Bibliography

Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.

Parameters:

value (list[str])

Return type:

list[kSMSZD2003ReadingsDict]
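
In the examples, the character 粵 separates the Mandarin readings from the Cantonese ones, and commas separate multiple readings on either side. A minimal sketch under that assumption follows. The name sketch_expand_smszd2003_readings is hypothetical.

```python
def sketch_expand_smszd2003_readings(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    expanded = []
    for entry in value:
        # Assumption: exactly one 粵 separator per entry.
        mandarin, _, cantonese = entry.partition("粵")
        expanded.append(
            {"mandarin": mandarin.split(","), "cantonese": cantonese.split(",")}
        )
    return expanded


print(sketch_expand_smszd2003_readings(["ma粵maa1,maa3"]))
```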

class unihan_etl.expansion.kHanyuPinyinPreDict

Bases: TypedDict

kHanyuPinyin predicate mapping.

class unihan_etl.expansion.kHanyuPinyinDict

Bases: TypedDict

kHanyuPinyin mapping.

unihan_etl.expansion.expand_kHanyuPinyin(value)
function

Expand kHanyuPinyin field.

Parameters:

value (list[str])

Return type:

list[kHanyuPinyinDict]

class unihan_etl.expansion.kXHC1983LocationDict

Bases: TypedDict

kXHC1983 location mapping.

class unihan_etl.expansion.kXHC1983Dict

Bases: TypedDict

kXHC1983 mapping.

class unihan_etl.expansion.kXHC1983PreDict

Bases: TypedDict

kXHC1983 predicate mapping.

unihan_etl.expansion.expand_kXHC1983(value)
function

Expand kXHC1983 field.

Parameters:

value (list[str])

Return type:

list[kXHC1983Dict]

class unihan_etl.expansion.kCheungBauerDict

Bases: TypedDict

kCheungBauer mapping.

unihan_etl.expansion.expand_kCheungBauer(value)
function

Expand kCheungBauer field.

Parameters:

value (list[str])

Return type:

list[kCheungBauerDict]

class unihan_etl.expansion.kRSAdobe_Japan1_6Dict

Bases: TypedDict

unihan_etl.expansion.expand_kRSAdobe_Japan1_6(value)
function

Expand kRSAdobe_Japan1_6 field.

Parameters:

value (list[str])

Return type:

list[kRSAdobe_Japan1_6Dict]

class unihan_etl.expansion.kCihaiTDict

Bases: TypedDict

kCihaiT mapping.

unihan_etl.expansion.expand_kCihaiT(value)
function

Expand kCihaiT field.

Parameters:

value (list[str])

Return type:

list[kCihaiTDict]

class unihan_etl.expansion.kIICoreDict

Bases: TypedDict

kIICore mapping.

unihan_etl.expansion.expand_kIICore(value)
function

Expand kIICore field.

Parameters:

value (list[str])

Return type:

list[kIICoreDict]

class unihan_etl.expansion.kDaeJaweonDict

Bases: TypedDict

kDaeJaweon mapping.

unihan_etl.expansion.expand_kDaeJaweon(value)
function

Expand kDaeJaweon field.

Parameters:

value (str)

Return type:

kDaeJaweonDict

unihan_etl.expansion.expand_kIRGKangXi(value)
function

Expand kIRGKangXi field.

Parameters:

value (list[str])

Return type:

list[kDaeJaweonDict]

unihan_etl.expansion.expand_kIRGDaeJaweon(value)
function

Expand kIRGDaeJaweon field.

Parameters:

value (list[str])

Return type:

list[kDaeJaweonDict]

class unihan_etl.expansion.kFennDict

Bases: TypedDict

kFenn mapping.

unihan_etl.expansion.expand_kFenn(value)
function

Expand kFenn field.

Parameters:

value (list[str])

Return type:

list[kFennDict]

class unihan_etl.expansion.kHanyuPinluDict

Bases: TypedDict

kHanyuPinlu mapping.

unihan_etl.expansion.expand_kHanyuPinlu(value)
function

Expand kHanyuPinlu field.

Parameters:

value (list[str])

Return type:

list[kHanyuPinluDict]

class unihan_etl.expansion.LocationDict

Bases: TypedDict

Location mapping.

class unihan_etl.expansion.kHDZRadBreakDict

Bases: TypedDict

kHDZRadBreak mapping.

unihan_etl.expansion.expand_kHDZRadBreak(value)
function

Expand kHDZRadBreak field.

Parameters:

value (str)

Return type:

kHDZRadBreakDict

class unihan_etl.expansion.kSBGYDict

Bases: TypedDict

kSBGY mapping.

unihan_etl.expansion.expand_kSBGY(value)
function

Expand kSBGY field.

Parameters:

value (list[str])

Return type:

list[kSBGYDict]

class unihan_etl.expansion.kRSSimplifiedType

Bases: Enum

Whether ideograph is a simplified form of a radical.

“The radical is indicated by a number in the range 1-214, followed by an optional single apostrophe (U+0027 ' APOSTROPHE) or, double apostrophe (''), or triple apostrophe (''') suffix. A single apostrophe after the radical indicates a Chinese simplified version of the given radical. Two apostrophes after the radical indicates a non-Chinese simplified version of the given radical. Three apostrophes after the radical indicates a second non-Chinese simplified version of the given radical.”

Source: https://www.unicode.org/reports/tr38/tr38-36.html#kRSUnicode

class unihan_etl.expansion.kRSGenericDict

Bases: TypedDict

kRSGeneric mapping.

unihan_etl.expansion.get_krs_simplified_type(val)
function

Detect the type of simplified radical, if any.

Examples

>>> get_krs_simplified_type('')
False
>>> get_krs_simplified_type("'")
<kRSSimplifiedType.Chinese: 'Chinese'>
>>> get_krs_simplified_type("''")
<kRSSimplifiedType.NonChinese: 'NonChinese'>
>>> get_krs_simplified_type("'''")
<kRSSimplifiedType.SecondNonChinese: 'SecondNonChinese'>
Parameters:

val (str)

Return type:

kRSSimplifiedType | Literal[False]

unihan_etl.expansion._expand_kRSGeneric(value)
function

Expand kRSGeneric field.

Examples

>>> _expand_kRSGeneric(['5.10', "213''.0"])
[{'radical': 5, 'strokes': 10, 'simplified': False},
{'radical': 213, 'strokes': 0, 'simplified':
    <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])
[{'radical': 120, 'strokes': 3, 'simplified':
    <kRSSimplifiedType.Chinese: 'Chinese'>}]
Parameters:

value (list[str])

Return type:

list[kRSGenericDict]
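
The apostrophe rule quoted under kRSSimplifiedType and the radical.strokes layout in the examples can be combined into one minimal sketch. SketchSimplifiedType, sketch_simplified_type, and sketch_expand_rs are hypothetical stand-ins, and the regex is an assumption rather than the pattern used by unihan_etl.

```python
import re
from enum import Enum


class SketchSimplifiedType(Enum):
    """Hypothetical stand-in for kRSSimplifiedType."""

    Chinese = "Chinese"
    NonChinese = "NonChinese"
    SecondNonChinese = "SecondNonChinese"


# Number of apostrophes after the radical mapped to the simplified type.
BY_APOSTROPHE_COUNT = {
    1: SketchSimplifiedType.Chinese,
    2: SketchSimplifiedType.NonChinese,
    3: SketchSimplifiedType.SecondNonChinese,
}


def sketch_simplified_type(marker: str):
    """Return the simplified type, or False when there is no known marker."""
    return BY_APOSTROPHE_COUNT.get(len(marker), False)


def sketch_expand_rs(value: list[str]) -> list[dict]:
    """Parse entries such as "5.10" or "213''.0" (assumed layout)."""
    pattern = re.compile(r"(?P<radical>\d+)(?P<marker>'{0,3})\.(?P<strokes>-?\d+)")
    expanded = []
    for entry in value:
        match = pattern.fullmatch(entry)
        if match is None:
            continue  # the sketch skips anything outside the assumed layout
        expanded.append(
            {
                "radical": int(match["radical"]),
                "strokes": int(match["strokes"]),
                "simplified": sketch_simplified_type(match["marker"]),
            }
        )
    return expanded


print(sketch_expand_rs(["5.10", "213''.0"]))
```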

unihan_etl.expansion.expand_kRSUnicode(value)
function

Expand kRSUnicode field. The examples call _expand_kRSGeneric, whose docstring this entry shares.

Examples

>>> _expand_kRSGeneric(['5.10', "213''.0"])
[{'radical': 5, 'strokes': 10, 'simplified': False},
{'radical': 213, 'strokes': 0, 'simplified':
    <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])
[{'radical': 120, 'strokes': 3, 'simplified':
    <kRSSimplifiedType.Chinese: 'Chinese'>}]
Parameters:

value (list[str])

Return type:

list[kRSGenericDict]

class unihan_etl.expansion.SourceLocationDict

Bases: TypedDict

Source location mapping.

unihan_etl.expansion._expand_kIRG_GenericSource(value)
function

Expand kIRG_GenericSource field.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict
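
Both examples split at the first hyphen into a source prefix and a location. A minimal sketch under that assumption follows. The name sketch_expand_source is hypothetical, and values without a hyphen are not covered by the examples.

```python
def sketch_expand_source(value: str) -> dict:
    """Hypothetical re-creation of the documented examples."""
    # partition() splits at the first hyphen only and keeps the location
    # as a string, so its leading zeros survive.
    source, _, location = value.partition("-")
    return {"source": source, "location": location}


print(sketch_expand_source("JMJ-056876"))
```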

unihan_etl.expansion.expand_kIRG_GSource(value)
function

Expand kIRG_GSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_HSource(value)
function

Expand kIRG_HSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_JSource(value)
function

Expand kIRG_JSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_KPSource(value)
function

Expand kIRG_KPSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_KSource(value)
function

Expand kIRG_KSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_MSource(value)
function

Expand kIRG_MSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_SSource(value)
function

Expand kIRG_SSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_TSource(value)
function

Expand kIRG_TSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_USource(value)
function

Expand kIRG_USource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_UKSource(value)
function

Expand kIRG_UKSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

unihan_etl.expansion.expand_kIRG_VSource(value)
function

Expand kIRG_VSource field. The examples call _expand_kIRG_GenericSource, whose docstring this entry shares.

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
Parameters:

value (str)

Return type:

SourceLocationDict

class unihan_etl.expansion.kGSRDict

Bases: TypedDict

kGSR mapping.

unihan_etl.expansion.expand_kGSR(value)
function

Expand kGSR field.

Parameters:

value (list[str])

Return type:

list[kGSRDict]

class unihan_etl.expansion.kCheungBauerIndexDict

Bases: TypedDict

kCheungBauerIndex mapping.

unihan_etl.expansion.expand_kCheungBauerIndex(value)
function

Expand kCheungBauerIndex field.

Parameters:

value (list[str])

Return type:

list[str | kCheungBauerIndexDict]

unihan_etl.expansion.expand_kFennIndex(value)
function

Expand kFennIndex field. This entry shares its docstring and return type with expand_kCheungBauerIndex.

Parameters:

value (list[str])

Return type:

list[str | kCheungBauerIndexDict]

class unihan_etl.expansion.kStrangeDict

Bases: TypedDict

kStrange mapping.

unihan_etl.expansion.is_valid_kstrange_property(value)
function

Return True, narrowing the type, if value is a valid kStrange property type.

Parameters:

value (Any)

Return type:

TypeGuard[kStrangeLiteral]

unihan_etl.expansion.expand_kStrange(value)
function

Expand kStrange field.

Examples

>>> expand_kStrange(['B:U+310D', 'I:U+5DDB'])
[{'property_type': 'B', 'characters': ['U+310D']},
{'property_type': 'I', 'characters': ['U+5DDB']}]
>>> expand_kStrange(['K:U+30A6:U+30C4:U+30DB'])
[{'property_type': 'K', 'characters': ['U+30A6', 'U+30C4', 'U+30DB']}]
>>> expand_kStrange(['U'])
[{'property_type': 'U', 'characters': []}]
Parameters:

value (list[str])

Return type:

list[kStrangeDict]
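
The examples show a property letter followed by zero or more colon-separated code points. A minimal sketch under that assumption follows. The name sketch_expand_strange is hypothetical, and the sketch does not validate the property letter the way is_valid_kstrange_property presumably does.

```python
def sketch_expand_strange(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    expanded = []
    for entry in value:
        # The first token is the property type; the rest are code points.
        # A bare letter such as "U" leaves the list of characters empty.
        property_type, *characters = entry.split(":")
        expanded.append({"property_type": property_type, "characters": characters})
    return expanded


print(sketch_expand_strange(["K:U+30A6:U+30C4:U+30DB", "U"]))
```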

class unihan_etl.expansion.kMojiJohoVariationDict

Bases: TypedDict

Variation sequence of Moji Jōhō Kiban entry.

class unihan_etl.expansion.kMojiJohoDict

Bases: TypedDict

kMojiJoho mapping.

unihan_etl.expansion.expand_kMojiJoho(value)
function

Expand kMojiJoho (Moji Jōhō Kiban) field.

Examples

>>> expand_kMojiJoho('MJ000004')
{'serial_number': 'MJ000004', 'variants': []}
>>> expand_kMojiJoho('MJ000022 MJ000023:E0101 MJ000022:E0103')
{'serial_number': 'MJ000022', 'variants':
    [{'serial_number': 'MJ000023', 'variation_sequence': 'E0101',
    'standard': False},
    {'serial_number': 'MJ000022', 'variation_sequence': 'E0103',
    'standard': True}]}

See also

Assume U+342A kMojiJoho MJ000022 MJ000023:E0101 MJ000022:E0103:

Database

Parameters:

value (str)

Return type:

kMojiJohoDict
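
In the examples, the first space-separated token is the default serial number and every later token is a serial:variation-sequence pair. The sketch below also assumes that 'standard' is True exactly when a variant's serial number equals the default one. That rule is inferred from the single example above and is not confirmed by this page. The name sketch_expand_moji_joho is hypothetical.

```python
def sketch_expand_moji_joho(value: str) -> dict:
    """Hypothetical re-creation of the documented examples."""
    default_serial, *rest = value.split(" ")
    variants = []
    for item in rest:
        serial_number, _, variation_sequence = item.partition(":")
        variants.append(
            {
                "serial_number": serial_number,
                "variation_sequence": variation_sequence,
                # Assumption: a variant is standard when it matches the default.
                "standard": serial_number == default_serial,
            }
        )
    return {"serial_number": default_serial, "variants": variants}


print(sketch_expand_moji_joho("MJ000022 MJ000023:E0101 MJ000022:E0103"))
```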

class unihan_etl.expansion.kFanqieDict

Bases: TypedDict

kFanqie mapping.

unihan_etl.expansion.expand_kFanqie(value)
function

Expand kFanqie field.

Examples

>>> expand_kFanqie(['德紅'])
[{'initial': '德', 'final': '紅'}]
>>> expand_kFanqie(['蘇彫', '先鳥'])
[{'initial': '蘇', 'final': '彫'}, {'initial': '先', 'final': '鳥'}]
Parameters:

value (list[str])

Return type:

list[kFanqieDict]
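
Each example entry is a two-character string whose first character is the initial and whose second is the final. A minimal sketch, assuming every entry has exactly two characters, follows. The name sketch_expand_fanqie is hypothetical.

```python
def sketch_expand_fanqie(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    # Assumption: entry[0] is the initial speller, entry[1] the final speller.
    return [{"initial": entry[0], "final": entry[1]} for entry in value]


print(sketch_expand_fanqie(["蘇彫", "先鳥"]))
```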

class unihan_etl.expansion.kZhuangDict

Bases: TypedDict

kZhuang mapping.

unihan_etl.expansion.expand_kZhuang(value)
function

Expand kZhuang field.

Examples

>>> expand_kZhuang(['naengh'])
[{'reading': 'naengh', 'non_standard': False}]
>>> expand_kZhuang(['fa*'])
[{'reading': 'fa', 'non_standard': True}]
Parameters:

value (list[str])

Return type:

list[kZhuangDict]
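
The examples indicate that a trailing asterisk marks a non-standard reading and is removed from the reading itself. A minimal sketch under that assumption follows. The name sketch_expand_zhuang is hypothetical.

```python
def sketch_expand_zhuang(value: list[str]) -> list[dict]:
    """Hypothetical re-creation of the documented examples."""
    return [
        {
            # Drop the trailing marker from the reading.
            "reading": entry.rstrip("*"),
            # Record whether the marker was present.
            "non_standard": entry.endswith("*"),
        }
        for entry in value
    ]


print(sketch_expand_zhuang(["naengh", "fa*"]))
```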

unihan_etl.expansion.expand_field(field, fvalue)
function

Return structured value of information in UNIHAN field.

Parameters:
  • field (str) – field name

  • fvalue (str) – value of field

Returns:

expanded field information per UNIHAN’s documentation

Return type:

list or dict
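
This page does not say how expand_field selects an expander, so the following is only a sketch of one possible dispatch: a registry keyed by field name. SKETCH_EXPANDERS, sketch_expand_unihan_core, and sketch_expand_field are hypothetical names, and returning unknown fields unchanged is an assumption.

```python
from typing import Any, Callable


def sketch_expand_unihan_core(value: str) -> list[str]:
    """Hypothetical expander mirroring the kUnihanCore2020 example."""
    return list(value)


# Hypothetical registry: field name mapped to the function that expands it.
SKETCH_EXPANDERS: dict[str, Callable[[Any], Any]] = {
    "kUnihanCore2020": sketch_expand_unihan_core,
}


def sketch_expand_field(field: str, fvalue: Any) -> Any:
    """Dispatch to the expander registered for the field, if there is one."""
    expander = SKETCH_EXPANDERS.get(field)
    if expander is None:
        # Assumption: fields without an expander pass through unchanged.
        return fvalue
    return expander(fvalue)


print(sketch_expand_field("kUnihanCore2020", "GHJ"))
```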