Expansion - unihan_etl.expansion

Functions to expand the compacted details inside field values.

Notes

re.compile() calls are kept inside the expand functions for three reasons:

  1. readability

  2. module-level function bytecode is cached in python

  3. the last used compiled regexes are cached
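
The points above can be illustrated with a minimal sketch. The function name and pattern below are hypothetical and are not taken from unihan_etl; the sketch only shows a pattern compiled inside the function that uses it, relying on the re module's cache of recently compiled patterns to keep repeated calls cheap.

```python
import re


def parse_page_position(value: str) -> dict[str, int]:
    """Hypothetical expander: parse 'page.position' strings such as '26.07'."""
    # Compiling here keeps the pattern next to the code that uses it.
    # The re module caches recently compiled patterns, so repeated calls
    # do not recompile the expression from scratch.
    pattern = re.compile(r"(?P<page>\d+)\.(?P<position>\d+)")
    match = pattern.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected value: {value!r}")
    return {key: int(number) for key, number in match.groupdict().items()}


print(parse_page_position("26.07"))
```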

unihan_etl.expansion.N_DIACRITICS = 'ńňǹ'

diacritics from kHanyuPinlu

unihan_etl.expansion.expand_kDefinition(value)[source]

Expand kDefinition field.

Return type:

list[str]

Parameters:

value (str)
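
As a rough illustration of the kind of expansion involved, the sketch below splits a definition string on semicolons, assuming that major definitions in kDefinition are separated that way. The helper name and the sample string are hypothetical, and this is not necessarily how expand_kDefinition is implemented.

```python
def split_definition(value: str) -> list[str]:
    """Hypothetical helper: split a kDefinition-style string on semicolons."""
    # Assumption: semicolons separate major definitions; empty parts are dropped.
    return [part.strip() for part in value.split(";") if part.strip()]


# Illustrative input only, not a value copied from the Unihan database.
print(split_definition("first sense; second sense, with a comma"))
```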

class unihan_etl.expansion.kMandarinDict[source]

Bases: TypedDict

unihan_etl.expansion.expand_kMandarin(value)[source]

Expand kMandarin field.

Return type:

kMandarinDict

Parameters:

value (list[str])

class unihan_etl.expansion.kTotalStrokesDict[source]

Bases: TypedDict

unihan_etl.expansion.expand_kTotalStrokes(value)[source]

Expand kTotalStrokes field.

Return type:

kTotalStrokesDict

Parameters:

value (list[str])

class unihan_etl.expansion.kAlternateTotalStrokesDict[source]

Bases: TypedDict

kAlternateTotalStrokes mapping.

sources: list[Literal['-', 'B', 'H', 'J', 'K', 'M', 'P', 'S', 'U', 'V']]
strokes: Optional[int]
unihan_etl.expansion.is_valid_kAlternateTotalStrokes_irg_source(value)[source]

Return True, narrowing the type, if value is a valid kAlternateTotalStrokes source.

Return type:

TypeGuard[kAlternateTotalStrokesLiteral]

Parameters:

value (Any)

unihan_etl.expansion.expand_kAlternateTotalStrokes(value)[source]

Expand kAlternateTotalStrokes field.

Return type:

list[kAlternateTotalStrokesDict]

Parameters:

value (list[str])

Examples

>>> expand_kAlternateTotalStrokes(['3:J'])
[{'strokes': 3, 'sources': ['J']}]
>>> expand_kAlternateTotalStrokes(['12:JK'])
[{'strokes': 12, 'sources': ['J', 'K']}]
>>> expand_kAlternateTotalStrokes(['-'])
[{'strokes': None, 'sources': ['-']}]
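
The documented outputs above can be reproduced with a short sketch. The function below is a hypothetical re-implementation written only from those examples; the real expand_kAlternateTotalStrokes may validate sources or handle further cases.

```python
def parse_alternate_total_strokes(value: list[str]) -> list[dict]:
    """Hypothetical parser for values such as '12:JK' or '-'."""
    expanded = []
    for item in value:
        if item == "-":
            # Per the documented example, a lone hyphen yields no stroke count.
            expanded.append({"strokes": None, "sources": ["-"]})
            continue
        strokes, _, sources = item.partition(":")
        # Each source is a single letter, so the string splits per character.
        expanded.append({"strokes": int(strokes), "sources": list(sources)})
    return expanded


print(parse_alternate_total_strokes(["12:JK"]))
```
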
unihan_etl.expansion.expand_kUnihanCore2020(value)[source]

Expand kUnihanCore2020 field.

Return type:

list[str]

Parameters:

value (str)

Examples

>>> expand_kUnihanCore2020('GHJ')
['G', 'H', 'J']
class unihan_etl.expansion.kLocationDict[source]

Bases: TypedDict

kLocation mapping.

volume: int
page: int
character: int
virtual: int
unihan_etl.expansion.expand_kHanYu(value)[source]

Expand kHanYu field.

Return type:

list[kLocationDict]

Parameters:

value (list[str])
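
A minimal sketch of how one location string could map onto the kLocationDict keys, assuming the layout of one volume digit, four page digits, a period, two character digits and one trailing virtual digit. The function name is hypothetical and the sketch is not the library's implementation.

```python
import re


def parse_location(value: str) -> dict[str, int]:
    """Hypothetical parser for a location such as '10019.020'."""
    # Assumed layout: volume (1 digit), page (4), '.', character (2), virtual (1).
    pattern = re.compile(
        r"(?P<volume>\d)(?P<page>\d{4})\.(?P<character>\d{2})(?P<virtual>\d)"
    )
    match = pattern.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected location: {value!r}")
    return {key: int(number) for key, number in match.groupdict().items()}


print(parse_location("10019.020"))
```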

unihan_etl.expansion.expand_kIRGHanyuDaZidian(value)[source]

Expand kIRGHanyuDaZidian field.

Return type:

list[kLocationDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kTGHZ2013LocationDict[source]

Bases: TypedDict

kTGHZ2013 location mapping.

page: int
position: int
entry_type: int
class unihan_etl.expansion.kTGHZ2013Dict[source]

Bases: TypedDict

kTGHZ2013 mapping.

reading: str
locations: Sequence[kTGHZ2013LocationDict]
unihan_etl.expansion.expand_kTGHZ2013(value)[source]

Expand kTGHZ2013 field.

Return type:

list[kTGHZ2013Dict]

Parameters:

value (list[str])

Examples

>>> expand_kTGHZ2013(['097.110,097.120:fēng'])
[{'reading': 'fēng', 'locations': [{'page': 97, 'position': 11, 'entry_type': 0},
{'page': 97, 'position': 12, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['482.140:zhòu'])  
[{'reading': 'zhòu', 'locations': [{'page': 482, 'position': 14, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['256.090:mò', '379.160:wàn'])
[{'reading': 'mò', 'locations': [{'page': 256, 'position': 9, 'entry_type': 0}]},
 {'reading': 'wàn', 'locations': [{'page': 379, 'position': 16, 'entry_type': 0}]}]
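
The examples above suggest that each location is a page, a period, a two-digit position and a final entry-type digit, with the reading after the colon. The sketch below is a hypothetical parser written under that assumption, not the library's own code.

```python
def parse_tghz2013(value: list[str]) -> list[dict]:
    """Hypothetical parser for values such as '097.110,097.120:fēng'."""
    expanded = []
    for item in value:
        raw_locations, _, reading = item.partition(":")
        locations = []
        for raw in raw_locations.split(","):
            page, _, tail = raw.partition(".")
            locations.append(
                {
                    "page": int(page),
                    # Assumption: first two digits are the position,
                    # the last digit is the entry type.
                    "position": int(tail[:2]),
                    "entry_type": int(tail[2]),
                }
            )
        expanded.append({"reading": reading, "locations": locations})
    return expanded


print(parse_tghz2013(["482.140:zhòu"]))
```
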
class unihan_etl.expansion.kSMSZD2003IndexDict[source]

Bases: TypedDict

kSMSZD2003Index location mapping.

page: int
position: int
unihan_etl.expansion.expand_kSMSZD2003Index(value)[source]

Expand kSMSZD2003Index Soengmou San Zidin (商務新字典) field.

Return type:

list[kSMSZD2003IndexDict]

Parameters:

value (list[str])

Examples

>>> expand_kSMSZD2003Index(['26.07'])
[{'page': 26, 'position': 7}]
>>> expand_kSMSZD2003Index(['769.05', '15.17', '291.20', '493.13'])
[{'page': 769, 'position': 5},
{'page': 15, 'position': 17},
{'page': 291, 'position': 20},
{'page': 493, 'position': 13}]

Bibliography

Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.

class unihan_etl.expansion.kSMSZD2003ReadingsDict[source]

Bases: TypedDict

kSMSZD2003Readings mapping.

mandarin: list[str]
cantonese: list[str]
unihan_etl.expansion.expand_kSMSZD2003Readings(value)[source]

Expand kSMSZD2003Readings Soengmou San Zidin (商務新字典) field.

Return type:

list[kSMSZD2003ReadingsDict]

Parameters:

value (list[str])

Examples

>>> expand_kSMSZD2003Readings(['tà粵taat3'])
[{'mandarin': ['tà'], 'cantonese': ['taat3']}]
>>> expand_kSMSZD2003Readings(['ma粵maa1,maa3', 'má粵maa1', 'mǎ粵maa1'])
[{'mandarin': ['ma'], 'cantonese': ['maa1', 'maa3']},
{'mandarin': ['má'], 'cantonese': ['maa1']},
{'mandarin': ['mǎ'], 'cantonese': ['maa1']}]
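
Judging from the examples above, the character 粵 separates the Mandarin readings from the Cantonese ones, and commas separate alternatives on either side. The following hypothetical sketch reproduces the documented outputs under that assumption.

```python
def parse_smszd2003_readings(value: list[str]) -> list[dict[str, list[str]]]:
    """Hypothetical parser for values such as 'ma粵maa1,maa3'."""
    expanded = []
    for item in value:
        # Text before 粵 is Mandarin, text after it is Cantonese.
        mandarin, _, cantonese = item.partition("粵")
        expanded.append(
            {"mandarin": mandarin.split(","), "cantonese": cantonese.split(",")}
        )
    return expanded


print(parse_smszd2003_readings(["ma粵maa1,maa3"]))
```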

Bibliography

Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.

class unihan_etl.expansion.kHanyuPinyinPreDict[source]

Bases: TypedDict

kHanyuPinyin predicate mapping.

locations: Sequence[Union[str, kLocationDict]]
readings: list[str]
class unihan_etl.expansion.kHanyuPinyinDict[source]

Bases: TypedDict

kHanyuPinyin mapping.

locations: kLocationDict
readings: list[str]
unihan_etl.expansion.expand_kHanyuPinyin(value)[source]

Expand kHanyuPinyin field.

Return type:

list[kHanyuPinyinDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kXHC1983LocationDict[source]

Bases: TypedDict

kXHC1983 location mapping.

page: int
character: int
entry: Optional[int]
substituted: bool
class unihan_etl.expansion.kXHC1983Dict[source]

Bases: TypedDict

kXHC1983 mapping.

locations: kXHC1983LocationDict
reading: str
class unihan_etl.expansion.kXHC1983PreDict[source]

Bases: TypedDict

kXHC1983 predicate mapping.

locations: Union[list[str], kXHC1983LocationDict]
reading: str
unihan_etl.expansion.expand_kXHC1983(value)[source]

Expand kXHC1983 field.

Return type:

list[kXHC1983Dict]

Parameters:

value (list[str])

class unihan_etl.expansion.kCheungBauerDict[source]

Bases: TypedDict

kCheungBauer mapping.

radical: int
strokes: int
cangjie: Optional[str]
readings: list[str]
unihan_etl.expansion.expand_kCheungBauer(value)[source]

Expand kCheungBauer field.

Return type:

list[kCheungBauerDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kRSAdobe_Japan1_6Dict[source]

Bases: TypedDict

type: str
cid: int
radical: int
strokes: int
unihan_etl.expansion.expand_kRSAdobe_Japan1_6(value)[source]

Expand kRSAdobe_Japan1_6 field.

Return type:

list[kRSAdobe_Japan1_6Dict]

Parameters:

value (list[str])

class unihan_etl.expansion.kCihaiTDict[source]

Bases: TypedDict

kCihaiT mapping.

page: int
row: int
character: int
unihan_etl.expansion.expand_kCihaiT(value)[source]

Expand kCihaiT field.

Return type:

list[kCihaiTDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kIICoreDict[source]

Bases: TypedDict

kIICore mapping.

priority: str
sources: list[str]
unihan_etl.expansion.expand_kIICore(value)[source]

Expand kIICore field.

Return type:

list[kIICoreDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kDaeJaweonDict[source]

Bases: TypedDict

kDaeJaweon mapping.

page: int
character: int
virtual: int
unihan_etl.expansion.expand_kDaeJaweon(value)[source]

Expand kDaeJaweon field.

Return type:

kDaeJaweonDict

Parameters:

value (str)

unihan_etl.expansion.expand_kIRGKangXi(value)[source]

Expand kIRGKangXi field.

Return type:

list[kDaeJaweonDict]

Parameters:

value (list[str])

unihan_etl.expansion.expand_kIRGDaeJaweon(value)[source]

Expand kIRGDaeJaweon field.

Return type:

list[kDaeJaweonDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kFennDict[source]

Bases: TypedDict

kFenn mapping.

phonetic: str
frequency: str
unihan_etl.expansion.expand_kFenn(value)[source]

Expand kFenn field.

Return type:

list[kFennDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kHanyuPinluDict[source]

Bases: TypedDict

kHanyuPinlu mapping.

phonetic: str
frequency: int
unihan_etl.expansion.expand_kHanyuPinlu(value)[source]

Expand kHanyuPinlu field.

Return type:

list[kHanyuPinluDict]

Parameters:

value (list[str])
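
A minimal sketch of filling the kHanyuPinluDict keys, assuming each item has the form reading(frequency). The function name and the sample inputs are hypothetical and are not copied from the Unihan database.

```python
import re


def parse_hanyu_pinlu(value: list[str]) -> list[dict]:
    """Hypothetical parser for items shaped like 'reading(frequency)'."""
    pattern = re.compile(r"(?P<phonetic>[^()]+)\((?P<frequency>\d+)\)")
    expanded = []
    for item in value:
        match = pattern.fullmatch(item)
        if match is None:
            raise ValueError(f"unexpected item: {item!r}")
        expanded.append(
            {"phonetic": match["phonetic"], "frequency": int(match["frequency"])}
        )
    return expanded


# Illustrative inputs only.
print(parse_hanyu_pinlu(["shang(12308)", "shang5(392)"]))
```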

class unihan_etl.expansion.LocationDict[source]

Bases: TypedDict

Location mapping.

volume: int
page: int
character: int
virtual: int
class unihan_etl.expansion.kHDZRadBreakDict[source]

Bases: TypedDict

kHDZRadBreak mapping.

radical: str
ucn: str
location: LocationDict
unihan_etl.expansion.expand_kHDZRadBreak(value)[source]

Expand kHDZRadBreak field.

Return type:

kHDZRadBreakDict

Parameters:

value (str)

class unihan_etl.expansion.kSBGYDict[source]

Bases: TypedDict

kSBGY mapping.

page: int
character: int
unihan_etl.expansion.expand_kSBGY(value)[source]

Expand kSBGY field.

Return type:

list[kSBGYDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kRSSimplifiedType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: Enum

Whether ideograph is a simplified form of a radical.

“The radical is indicated by a number in the range 1-214, followed by an optional single apostrophe (U+0027 ' APOSTROPHE) or, double apostrophe (''), or triple apostrophe (''') suffix. A single apostrophe after the radical indicates a Chinese simplified version of the given radical. Two apostrophes after the radical indicates a non-Chinese simplified version of the given radical. Three apostrophes after the radical indicates a second non-Chinese simplified version of the given radical.” Source: https://www.unicode.org/reports/tr38/tr38-36.html#kRSUnicode

Chinese = 'Chinese'
NonChinese = 'NonChinese'
SecondNonChinese = 'SecondNonChinese'
class unihan_etl.expansion.kRSGenericDict[source]

Bases: TypedDict

kRSGeneric mapping.

radical: int
strokes: int
simplified: Union[kRSSimplifiedType, Literal[False]]
unihan_etl.expansion.get_krs_simplified_type(val)[source]

Detect type of simplified radical, if one at all.

Return type:

Union[kRSSimplifiedType, Literal[False]]

Parameters:

val (str)

Examples

>>> get_krs_simplified_type('')
False
>>> get_krs_simplified_type("'")
<kRSSimplifiedType.Chinese: 'Chinese'>
>>> get_krs_simplified_type("''")
<kRSSimplifiedType.NonChinese: 'NonChinese'>
>>> get_krs_simplified_type("'''")
<kRSSimplifiedType.SecondNonChinese: 'SecondNonChinese'>
unihan_etl.expansion._expand_kRSGeneric(value)[source]

Expand kRSGeneric field.

Return type:

list[kRSGenericDict]

Parameters:

value (list[str])

Examples

>>> _expand_kRSGeneric(['5.10', "213''.0"])  
[{'radical': 5, 'strokes': 10, 'simplified': False},
{'radical': 213, 'strokes': 0, 'simplified':
    <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])  
[{'radical': 120, 'strokes': 3, 'simplified':
    <kRSSimplifiedType.Chinese: 'Chinese'>}]
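
The radical-stroke format described above can be sketched with a single regular expression. The enum and function below are hypothetical stand-ins written from the documented examples; they are not the library's own definitions.

```python
import re
from enum import Enum


class SimplifiedType(Enum):
    """Hypothetical stand-in for kRSSimplifiedType."""

    Chinese = "Chinese"
    NonChinese = "NonChinese"
    SecondNonChinese = "SecondNonChinese"


def parse_radical_stroke(value: str) -> dict:
    """Hypothetical parser for values such as "213''.0"."""
    pattern = re.compile(r"(?P<radical>\d+)(?P<marks>'{0,3})\.(?P<strokes>-?\d+)")
    match = pattern.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected value: {value!r}")
    # The number of apostrophes selects the simplified type; none means False.
    by_count = {
        1: SimplifiedType.Chinese,
        2: SimplifiedType.NonChinese,
        3: SimplifiedType.SecondNonChinese,
    }
    return {
        "radical": int(match["radical"]),
        "strokes": int(match["strokes"]),
        "simplified": by_count.get(len(match["marks"]), False),
    }


print(parse_radical_stroke("120'.3"))
```
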
unihan_etl.expansion.expand_kRSUnicode(value)[source]

Expand kRSUnicode field, using the same parsing as _expand_kRSGeneric.

Return type:

list[kRSGenericDict]

Parameters:

value (list[str])

Examples

>>> _expand_kRSGeneric(['5.10', "213''.0"])  
[{'radical': 5, 'strokes': 10, 'simplified': False},
{'radical': 213, 'strokes': 0, 'simplified':
    <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])  
[{'radical': 120, 'strokes': 3, 'simplified':
    <kRSSimplifiedType.Chinese: 'Chinese'>}]
class unihan_etl.expansion.SourceLocationDict[source]

Bases: TypedDict

Source location mapping.

source: str
location: Optional[str]
unihan_etl.expansion._expand_kIRG_GenericSource(value)[source]

Expand kIRG_GenericSource field.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
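
The documented outputs above are consistent with splitting on the first hyphen. The sketch below is a hypothetical re-implementation; returning None when no hyphen is present is an assumption based on the Optional[str] type of location.

```python
from typing import Optional


def parse_irg_source(value: str) -> dict[str, Optional[str]]:
    """Hypothetical parser for values such as 'JMJ-056876'."""
    source, separator, location = value.partition("-")
    # Assumption: a value without a hyphen has no location part.
    return {"source": source, "location": location if separator else None}


print(parse_irg_source("JMJ-056876"))
```
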
unihan_etl.expansion.expand_kIRG_GSource(value)[source]

Expand kIRG_GSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_HSource(value)[source]

Expand kIRG_HSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_JSource(value)[source]

Expand kIRG_JSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_KPSource(value)[source]

Expand kIRG_KPSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_KSource(value)[source]

Expand kIRG_KSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_MSource(value)[source]

Expand kIRG_MSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_SSource(value)[source]

Expand kIRG_SSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_TSource(value)[source]

Expand kIRG_TSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_USource(value)[source]

Expand kIRG_USource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_UKSource(value)[source]

Expand kIRG_UKSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
unihan_etl.expansion.expand_kIRG_VSource(value)[source]

Expand kIRG_VSource field, using the same parsing as _expand_kIRG_GenericSource.

Return type:

SourceLocationDict

Parameters:

value (str)

Examples

>>> _expand_kIRG_GenericSource('JMJ-056876')  
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')  
{'source': 'SAT', 'location': '02570'}
class unihan_etl.expansion.kGSRDict[source]

Bases: TypedDict

kGSR mapping.

set: int
letter: str
apostrophe: bool
unihan_etl.expansion.expand_kGSR(value)[source]

Expand kGSR field.

Return type:

list[kGSRDict]

Parameters:

value (list[str])

class unihan_etl.expansion.kCheungBauerIndexDict[source]

Bases: TypedDict

kCheungBauerIndex mapping.

page: int
character: int
unihan_etl.expansion.expand_kCheungBauerIndex(value)[source]

Expand kCheungBauerIndex field.

Return type:

list[Union[str, kCheungBauerIndexDict]]

Parameters:

value (list[str])

unihan_etl.expansion.expand_kFennIndex(value)[source]

Expand kFennIndex field, using the same parsing as expand_kCheungBauerIndex.

Return type:

list[Union[str, kCheungBauerIndexDict]]

Parameters:

value (list[str])

class unihan_etl.expansion.kStrangeDict[source]

Bases: TypedDict

kStrange mapping.

property_type: Literal['A', 'B', 'C', 'F', 'H', 'I', 'K', 'M', 'O', 'R', 'S', 'U']
characters: Sequence[str]
unihan_etl.expansion.is_valid_kstrange_property(value)[source]

Return True, narrowing the type, if value is a valid kStrange property type.

Return type:

TypeGuard[kStrangeLiteral]

Parameters:

value (Any)

unihan_etl.expansion.expand_kStrange(value)[source]

Expand kStrange field.

Return type:

list[kStrangeDict]

Parameters:

value (list[str])

Examples

>>> expand_kStrange(['B:U+310D', 'I:U+5DDB'])
[{'property_type': 'B', 'characters': ['U+310D']},
{'property_type': 'I', 'characters': ['U+5DDB']}]
>>> expand_kStrange(['K:U+30A6:U+30C4:U+30DB'])  
[{'property_type': 'K', 'characters': ['U+30A6', 'U+30C4', 'U+30DB']}]
>>> expand_kStrange(['U'])  
[{'property_type': 'U', 'characters': []}]
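
The examples above are consistent with splitting each item on colons: the first part is the property type and any remaining parts are characters. The sketch below is a hypothetical re-implementation and omits the property-type validation that is_valid_kstrange_property provides.

```python
def parse_strange(value: list[str]) -> list[dict]:
    """Hypothetical parser for values such as 'K:U+30A6:U+30C4:U+30DB'."""
    expanded = []
    for item in value:
        # The first field is the property type; the rest, if any, are characters.
        property_type, *characters = item.split(":")
        expanded.append({"property_type": property_type, "characters": characters})
    return expanded


print(parse_strange(["K:U+30A6:U+30C4:U+30DB"]))
```
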
class unihan_etl.expansion.kMojiJohoVariationDict[source]

Bases: TypedDict

Variation sequence of Moji Jōhō Kiban entry.

serial_number: str
variation_sequence: str
standard: bool
class unihan_etl.expansion.kMojiJohoDict[source]

Bases: TypedDict

kMojiJoho mapping.

serial_number: str
variants: list[kMojiJohoVariationDict]
unihan_etl.expansion.expand_kMojiJoho(value)[source]

Expand kMojiJoho (Moji Jōhō Kiban) field.

Return type:

kMojiJohoDict

Parameters:

value (str)

Examples

>>> expand_kMojiJoho('MJ000004')
{'serial_number': 'MJ000004', 'variants': []}
>>> expand_kMojiJoho('MJ000022 MJ000023:E0101 MJ000022:E0103')
{'serial_number': 'MJ000022', 'variants':
    [{'serial_number': 'MJ000023', 'variation_sequence': 'E0101',
    'standard': False},
    {'serial_number': 'MJ000022', 'variation_sequence': 'E0103',
    'standard': True}]}

See also

Assume the Unihan entry U+342A kMojiJoho MJ000022 MJ000023:E0101 MJ000022:E0103 (see the Moji Jōhō Kiban database).
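
The kMojiJoho examples above can be reproduced with a short sketch: the first space-separated token is the default serial number, and each following token pairs a serial number with a variation sequence. Treating a variant as standard when its serial number equals the default one is inferred from the documented output; the function name is hypothetical.

```python
def parse_moji_joho(value: str) -> dict:
    """Hypothetical parser for 'MJ000022 MJ000023:E0101 MJ000022:E0103'."""
    default_serial, *rest = value.split(" ")
    variants = []
    for item in rest:
        serial_number, _, variation_sequence = item.partition(":")
        variants.append(
            {
                "serial_number": serial_number,
                "variation_sequence": variation_sequence,
                # Inferred rule: standard when it matches the default serial.
                "standard": serial_number == default_serial,
            }
        )
    return {"serial_number": default_serial, "variants": variants}


print(parse_moji_joho("MJ000022 MJ000023:E0101 MJ000022:E0103"))
```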

class unihan_etl.expansion.kFanqieDict[source]

Bases: TypedDict

kFanqie mapping.

initial: str
final: str
unihan_etl.expansion.expand_kFanqie(value)[source]

Expand kFanqie field.

Return type:

list[kFanqieDict]

Parameters:

value (list[str])

Examples

>>> expand_kFanqie(['德紅'])
[{'initial': '德', 'final': '紅'}]
>>> expand_kFanqie(['蘇彫', '先鳥'])
[{'initial': '蘇', 'final': '彫'}, {'initial': '先', 'final': '鳥'}]
class unihan_etl.expansion.kZhuangDict[source]

Bases: TypedDict

kZhuang mapping.

reading: str
non_standard: bool
unihan_etl.expansion.expand_kZhuang(value)[source]

Expand kZhuang field.

Return type:

list[kZhuangDict]

Parameters:

value (list[str])

Examples

>>> expand_kZhuang(['naengh'])
[{'reading': 'naengh', 'non_standard': False}]
>>> expand_kZhuang(['fa*'])
[{'reading': 'fa', 'non_standard': True}]
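
In the examples above, a trailing asterisk marks a non-standard reading and is removed from the reading itself. A hypothetical sketch of that rule:

```python
def parse_zhuang(value: list[str]) -> list[dict]:
    """Hypothetical parser for values such as 'naengh' or 'fa*'."""
    return [
        # A trailing '*' flags the reading as non-standard and is stripped.
        {"reading": item.rstrip("*"), "non_standard": item.endswith("*")}
        for item in value
    ]


print(parse_zhuang(["naengh", "fa*"]))
```
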
unihan_etl.expansion.expand_field(field, fvalue)[source]

Return the structured (expanded) value of a UNIHAN field.

Return type:

Any

Parameters:
  • field (str) – field name

  • fvalue (str) – value of field

Returns:

expanded field information per UNIHAN’s documentation

Return type:

list or dict
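
One way to picture the dispatch from a field name to its expander is a lookup table. The registry, the function names and the pass-through behavior for unknown fields below are all assumptions made for illustration; the real expand_field may resolve expanders differently.

```python
from typing import Any, Callable


def expand_fanqie(value: list[str]) -> list[dict[str, str]]:
    """Hypothetical expander modeled on the documented kFanqie output."""
    return [{"initial": item[0], "final": item[1]} for item in value]


# Hypothetical registry mapping field names to expander functions.
EXPANDERS: dict[str, Callable[[Any], Any]] = {"kFanqie": expand_fanqie}


def expand(field: str, fvalue: Any) -> Any:
    """Hypothetical dispatcher: expand fvalue if the field has an expander."""
    expander = EXPANDERS.get(field)
    # Assumption: fields without a registered expander are returned unchanged.
    return expander(fvalue) if expander else fvalue


print(expand("kFanqie", ["德紅"]))
```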