Expansion - unihan_etl.expansion
Functions to expand compacted details inside field values.
Notes
re.compile() operations are kept inside the expand functions for these reasons:
- readability
- module-level function bytecode is cached in Python
- the most recently used compiled regexes are cached by the re module
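The convention in these notes can be sketched as follows. The helper `parse_reading_count` and its input format are hypothetical and not part of unihan_etl; the sketch only shows a pattern compiled inside the function that uses it, relying on the re module's internal cache of recently compiled patterns.

```python
import re


def parse_reading_count(value: str) -> dict[str, object]:
    """Hypothetical helper: split 'reading(count)' into its two parts."""
    # Compiled here, next to the code that uses it. The re module keeps
    # recently compiled patterns in an internal cache, so repeated calls
    # normally reuse the compiled pattern.
    pattern = re.compile(r"(?P<reading>[^()]+)\((?P<count>\d+)\)")
    match = pattern.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected value: {value!r}")
    return {"reading": match["reading"], "count": int(match["count"])}
```

Keeping the pattern local trades a cache lookup per call for having the regex visible right where the parsing happens.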
- unihan_etl.expansion.N_DIACRITICS = 'ńňǹ'
Diacritics from kHanyuPinlu.
- unihan_etl.expansion.expand_kMandarin(value)
Expand kMandarin field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kTotalStrokes(value)
Expand kTotalStrokes field.
- Return type:
- Parameters:
- class unihan_etl.expansion.kAlternateTotalStrokesDict
Bases:
TypedDict
kAlternateTotalStrokes mapping.
- unihan_etl.expansion.is_valid_kAlternateTotalStrokes_irg_source(value)
Return True and upcast if valid kAlternateTotalStrokes source.
- Return type:
TypeGuard[kAlternateTotalStrokesLiteral]
- Parameters:
value (Any)
- unihan_etl.expansion.expand_kAlternateTotalStrokes(value)
Expand kAlternateTotalStrokes field.
- Return type:
- Parameters:
Examples
>>> expand_kAlternateTotalStrokes(['3:J'])
[{'strokes': 3, 'sources': ['J']}]
>>> expand_kAlternateTotalStrokes(['12:JK'])
[{'strokes': 12, 'sources': ['J', 'K']}]
>>> expand_kAlternateTotalStrokes(['-'])
[{'strokes': None, 'sources': ['-']}]
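The examples above suggest a simple "strokes:sources" layout. The following is a minimal sketch that reproduces those three outputs; `AlternateTotalStrokes` and `sketch_expand_alternate_total_strokes` are hypothetical names, and this is not necessarily how unihan_etl implements the function.

```python
from typing import Optional, TypedDict


class AlternateTotalStrokes(TypedDict):
    """Hypothetical stand-in for kAlternateTotalStrokesDict."""

    strokes: Optional[int]
    sources: list[str]


def sketch_expand_alternate_total_strokes(
    value: list[str],
) -> list[AlternateTotalStrokes]:
    expanded: list[AlternateTotalStrokes] = []
    for item in value:
        if item == "-":
            # A lone hyphen carries no stroke count in the examples above.
            expanded.append({"strokes": None, "sources": ["-"]})
            continue
        strokes, sources = item.split(":", 1)
        # Assumes each source is a single letter, so the string is split
        # character by character.
        expanded.append({"strokes": int(strokes), "sources": list(sources)})
    return expanded
```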
- unihan_etl.expansion.expand_kUnihanCore2020(value)
Expand kUnihanCore2020 field.
Examples
>>> expand_kUnihanCore2020('GHJ')
['G', 'H', 'J']
- unihan_etl.expansion.expand_kIRGHanyuDaZidian(value)
Expand kIRGHanyuDaZidian field.
- Return type:
- Parameters:
- class unihan_etl.expansion.kTGHZ2013LocationDict
Bases:
TypedDict
kTGHZ2013 location mapping.
- class unihan_etl.expansion.kTGHZ2013Dict
Bases:
TypedDict
kTGHZ2013 mapping.
- locations: Sequence[kTGHZ2013LocationDict]
- unihan_etl.expansion.expand_kTGHZ2013(value)
Expand kTGHZ2013 field.
- Return type:
- Parameters:
Examples
>>> expand_kTGHZ2013(['097.110,097.120:fēng'])
[{'reading': 'fēng', 'locations': [{'page': 97, 'position': 11, 'entry_type': 0}, {'page': 97, 'position': 12, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['482.140:zhòu'])
[{'reading': 'zhòu', 'locations': [{'page': 482, 'position': 14, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['256.090:mò', '379.160:wàn'])
[{'reading': 'mò', 'locations': [{'page': 256, 'position': 9, 'entry_type': 0}]}, {'reading': 'wàn', 'locations': [{'page': 379, 'position': 16, 'entry_type': 0}]}]
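Judging from the examples above, each value is a comma-separated list of locations, then a colon, then the reading. The sketch below assumes every location has the shape "PPP.NNE" (three-digit page, two-digit position, one-digit entry type); that layout is inferred from the examples, and `sketch_expand_tghz2013` is a hypothetical name rather than the library's implementation.

```python
import re


def sketch_expand_tghz2013(value: list[str]) -> list[dict[str, object]]:
    # Assumed layout: page (3 digits) "." position (2 digits) entry type (1 digit).
    location_pattern = re.compile(
        r"(?P<page>\d{3})\.(?P<position>\d{2})(?P<entry_type>\d)"
    )
    expanded: list[dict[str, object]] = []
    for item in value:
        locations, reading = item.split(":", 1)
        parsed = []
        for location in locations.split(","):
            match = location_pattern.fullmatch(location)
            if match is None:
                raise ValueError(f"unexpected location: {location!r}")
            # Convert every captured group to an integer.
            parsed.append({k: int(v) for k, v in match.groupdict().items()})
        expanded.append({"reading": reading, "locations": parsed})
    return expanded
```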
- class unihan_etl.expansion.kSMSZD2003IndexDict
Bases:
TypedDict
kSMSZD2003Index location mapping.
- unihan_etl.expansion.expand_kSMSZD2003Index(value)
Expand kSMSZD2003Index Soengmou San Zidin (商務新字典) field.
- Return type:
- Parameters:
Examples
>>> expand_kSMSZD2003Index(['26.07'])
[{'page': 26, 'position': 7}]
>>> expand_kSMSZD2003Index(['769.05', '15.17', '291.20', '493.13'])
[{'page': 769, 'position': 5}, {'page': 15, 'position': 17}, {'page': 291, 'position': 20}, {'page': 493, 'position': 13}]
Bibliography
Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.
- class unihan_etl.expansion.kSMSZD2003ReadingsDict
Bases:
TypedDict
kSMSZD2003Readings location mapping.
- unihan_etl.expansion.expand_kSMSZD2003Readings(value)
Expand kSMSZD2003Readings Soengmou San Zidin (商務新字典) field.
- Return type:
- Parameters:
Examples
>>> expand_kSMSZD2003Readings(['tà粵taat3'])
[{'mandarin': ['tà'], 'cantonese': ['taat3']}]
>>> expand_kSMSZD2003Readings(['ma粵maa1,maa3', 'má粵maa1', 'mǎ粵maa1'])
[{'mandarin': ['ma'], 'cantonese': ['maa1', 'maa3']}, {'mandarin': ['má'], 'cantonese': ['maa1']}, {'mandarin': ['mǎ'], 'cantonese': ['maa1']}]
Bibliography
Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.
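In the kSMSZD2003Readings examples above, the character 粵 separates the Mandarin readings from the Cantonese ones, and commas separate multiple readings. A minimal sketch under that assumption (`sketch_expand_smszd2003_readings` is a hypothetical name, and treating the Mandarin side as comma-separated is a guess, since the examples only show commas on the Cantonese side):

```python
def sketch_expand_smszd2003_readings(
    value: list[str],
) -> list[dict[str, list[str]]]:
    expanded: list[dict[str, list[str]]] = []
    for item in value:
        # Mandarin readings come before 粵, Cantonese readings after it.
        mandarin, cantonese = item.split("粵", 1)
        expanded.append(
            {
                "mandarin": mandarin.split(","),
                "cantonese": cantonese.split(","),
            }
        )
    return expanded
```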
- class unihan_etl.expansion.kHanyuPinyinPreDict
Bases:
TypedDict
kHanyuPinyin predicate mapping.
- locations: Sequence[Union[str, kLocationDict]]
- class unihan_etl.expansion.kHanyuPinyinDict
Bases:
TypedDict
kHanyuPinyin mapping.
- locations: kLocationDict
- unihan_etl.expansion.expand_kHanyuPinyin(value)
Expand kHanyuPinyin field.
- Return type:
- Parameters:
- class unihan_etl.expansion.kXHC1983LocationDict
Bases:
TypedDict
kXHC1983 location mapping.
- class unihan_etl.expansion.kXHC1983Dict
Bases:
TypedDict
kXHC1983 mapping.
- locations: kXHC1983LocationDict
- class unihan_etl.expansion.kXHC1983PreDict
Bases:
TypedDict
kXHC1983 predicate mapping.
- locations: Union[list[str], kXHC1983LocationDict]
- unihan_etl.expansion.expand_kXHC1983(value)
Expand kXHC1983 field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kCheungBauer(value)
Expand kCheungBauer field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kRSAdobe_Japan1_6(value)
Expand kRSAdobe_Japan1_6 field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kDaeJaweon(value)
Expand kDaeJaweon field.
- Return type:
- Parameters:
value (str)
- unihan_etl.expansion.expand_kIRGKangXi(value)
Expand kIRGKangXi field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kIRGDaeJaweon(value)
Expand kIRGDaeJaweon field.
- Return type:
- Parameters:
- unihan_etl.expansion.expand_kHanyuPinlu(value)
Expand kHanyuPinlu field.
- Return type:
- Parameters:
- class unihan_etl.expansion.kHDZRadBreakDict
Bases:
TypedDict
kHDZRadBreak mapping.
- location: LocationDict
- unihan_etl.expansion.expand_kHDZRadBreak(value)
Expand kHDZRadBreak field.
- Return type:
- Parameters:
value (str)
- class unihan_etl.expansion.kRSSimplifiedType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Whether ideograph is a simplified form of a radical.
“The radical is indicated by a number in the range 1-214, followed by an optional single apostrophe (U+0027 ' APOSTROPHE) or, double apostrophe (''), or triple apostrophe (''') suffix. A single apostrophe after the radical indicates a Chinese simplified version of the given radical. Two apostrophes after the radical indicates a non-Chinese simplified version of the given radical. Three apostrophes after the radical indicates a second non-Chinese simplified version of the given radical.” Source: https://www.unicode.org/reports/tr38/tr38-36.html#kRSUnicode
- Chinese = 'Chinese'
- NonChinese = 'NonChinese'
- SecondNonChinese = 'SecondNonChinese'
- class unihan_etl.expansion.kRSGenericDict
Bases:
TypedDict
kRSGeneric mapping.
- simplified: Union[kRSSimplifiedType, Literal[False]]
- unihan_etl.expansion.get_krs_simplified_type(val)
Detect the type of simplified radical, if any.
- Return type:
Union[kRSSimplifiedType, Literal[False]]
- Parameters:
val (str)
Examples
>>> get_krs_simplified_type('')
False
>>> get_krs_simplified_type("'")
<kRSSimplifiedType.Chinese: 'Chinese'>
>>> get_krs_simplified_type("''")
<kRSSimplifiedType.NonChinese: 'NonChinese'>
>>> get_krs_simplified_type("'''")
<kRSSimplifiedType.SecondNonChinese: 'SecondNonChinese'>
- unihan_etl.expansion._expand_kRSGeneric(value)
Expand kRSGeneric field.
- Return type:
- Parameters:
Examples
>>> _expand_kRSGeneric(['5.10', "213''.0"])
[{'radical': 5, 'strokes': 10, 'simplified': False}, {'radical': 213, 'strokes': 0, 'simplified': <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])
[{'radical': 120, 'strokes': 3, 'simplified': <kRSSimplifiedType.Chinese: 'Chinese'>}]
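The radical-stroke format quoted from UAX #38 (radical number, up to three apostrophes, a dot, then a stroke count) can be parsed roughly as follows. This is a sketch only: `SimplifiedType` and `sketch_expand_rs` are hypothetical stand-ins for kRSSimplifiedType and _expand_kRSGeneric, and allowing a leading minus sign on the stroke count is an assumption.

```python
import re
from enum import Enum


class SimplifiedType(Enum):
    """Hypothetical stand-in for kRSSimplifiedType."""

    Chinese = "Chinese"
    NonChinese = "NonChinese"
    SecondNonChinese = "SecondNonChinese"


# Number of apostrophes after the radical -> kind of simplified radical.
_BY_APOSTROPHE_COUNT = {
    1: SimplifiedType.Chinese,
    2: SimplifiedType.NonChinese,
    3: SimplifiedType.SecondNonChinese,
}


def sketch_expand_rs(value: list[str]) -> list[dict[str, object]]:
    pattern = re.compile(r"(?P<radical>\d+)(?P<marks>'{0,3})\.(?P<strokes>-?\d+)")
    expanded: list[dict[str, object]] = []
    for item in value:
        match = pattern.fullmatch(item)
        if match is None:
            raise ValueError(f"unexpected radical-stroke value: {item!r}")
        expanded.append(
            {
                "radical": int(match["radical"]),
                "strokes": int(match["strokes"]),
                # No apostrophes means the radical is not simplified.
                "simplified": _BY_APOSTROPHE_COUNT.get(len(match["marks"]), False),
            }
        )
    return expanded
```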
- unihan_etl.expansion.expand_kRSUnicode(value)
Expand kRSGeneric field.
- Return type:
- Parameters:
Examples
>>> _expand_kRSGeneric(['5.10', "213''.0"])
[{'radical': 5, 'strokes': 10, 'simplified': False}, {'radical': 213, 'strokes': 0, 'simplified': <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])
[{'radical': 120, 'strokes': 3, 'simplified': <kRSSimplifiedType.Chinese: 'Chinese'>}]
- unihan_etl.expansion._expand_kIRG_GenericSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
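Both examples above split the value at the hyphen into a source prefix and a location. A minimal sketch of that behaviour, with `sketch_expand_irg_source` as a hypothetical name; splitting only at the first hyphen is an assumption:

```python
def sketch_expand_irg_source(value: str) -> dict[str, str]:
    # Split once, so any later hyphen stays inside the location part.
    source, location = value.split("-", 1)
    return {"source": source, "location": location}
```

The location is kept as a string, which preserves leading zeros such as those in '056876'.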
- unihan_etl.expansion.expand_kIRG_GSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_HSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_JSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_KPSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_KSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_MSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_SSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_TSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_USource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_UKSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.expand_kIRG_VSource(value)
Expand kIRG_GenericSource field.
- Return type:
- Parameters:
value (str)
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- unihan_etl.expansion.is_valid_kstrange_property(value)
Return True and upcast if valid kStrange property type.
- Return type:
TypeGuard[kStrangeLiteral]
- Parameters:
value (Any)
- unihan_etl.expansion.expand_kStrange(value)
Expand kStrange field.
- Return type:
- Parameters:
Examples
>>> expand_kStrange(['B:U+310D', 'I:U+5DDB'])
[{'property_type': 'B', 'characters': ['U+310D']}, {'property_type': 'I', 'characters': ['U+5DDB']}]
>>> expand_kStrange(['K:U+30A6:U+30C4:U+30DB'])
[{'property_type': 'K', 'characters': ['U+30A6', 'U+30C4', 'U+30DB']}]
>>> expand_kStrange(['U'])
[{'property_type': 'U', 'characters': []}]
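In the examples above, the first colon-separated part is the property type and any remaining parts are code points. A minimal sketch of that split, with `sketch_expand_strange` as a hypothetical name; it does not validate the property letter, which the library may do through is_valid_kstrange_property:

```python
def sketch_expand_strange(value: list[str]) -> list[dict[str, object]]:
    expanded: list[dict[str, object]] = []
    for item in value:
        # The first part is the property type; the rest (possibly none)
        # are code points.
        property_type, *characters = item.split(":")
        expanded.append({"property_type": property_type, "characters": characters})
    return expanded
```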
- class unihan_etl.expansion.kMojiJohoVariationDict
Bases:
TypedDict
Variation sequence of Moji Jōhō Kiban entry.
- class unihan_etl.expansion.kMojiJohoDict
Bases:
TypedDict
kMojiJoho mapping.
- variants: list[kMojiJohoVariationDict]
- unihan_etl.expansion.expand_kMojiJoho(value)
Expand kMojiJoho (Moji Jōhō Kiban) field.
- Return type:
- Parameters:
value (str)
Examples
>>> expand_kMojiJoho('MJ000004')
{'serial_number': 'MJ000004', 'variants': []}
>>> expand_kMojiJoho('MJ000022 MJ000023:E0101 MJ000022:E0103')
{'serial_number': 'MJ000022', 'variants': [{'serial_number': 'MJ000023', 'variation_sequence': 'E0101', 'standard': False}, {'serial_number': 'MJ000022', 'variation_sequence': 'E0103', 'standard': True}]}
See also
Assume
U+342A kMojiJoho MJ000022 MJ000023:E0101 MJ000022:E0103:
Database
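The kMojiJoho examples above can be reproduced with a short sketch. It assumes the first space-separated token is the default serial number and each later token is "serial:variation_sequence". The rule that `standard` is True when a variant shares the default serial number is inferred from the single example, and `sketch_expand_moji_joho` is a hypothetical name.

```python
def sketch_expand_moji_joho(value: str) -> dict[str, object]:
    default_serial, *variant_tokens = value.split()
    variants = []
    for token in variant_tokens:
        serial_number, variation_sequence = token.split(":", 1)
        variants.append(
            {
                "serial_number": serial_number,
                "variation_sequence": variation_sequence,
                # Inferred rule: a variant is "standard" when it carries
                # the same serial number as the default entry.
                "standard": serial_number == default_serial,
            }
        )
    return {"serial_number": default_serial, "variants": variants}
```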
- unihan_etl.expansion.expand_kFanqie(value)
Expand kFanqie field.
- Return type:
- Parameters:
Examples
>>> expand_kFanqie(['德紅'])
[{'initial': '德', 'final': '紅'}]
>>> expand_kFanqie(['蘇彫', '先鳥'])
[{'initial': '蘇', 'final': '彫'}, {'initial': '先', 'final': '鳥'}]
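Each fanqie spelling in the examples above is a pair of characters: the first supplies the initial and the second the final. A minimal sketch under that assumption, with `sketch_expand_fanqie` as a hypothetical name:

```python
def sketch_expand_fanqie(value: list[str]) -> list[dict[str, str]]:
    expanded: list[dict[str, str]] = []
    for item in value:
        if len(item) != 2:
            raise ValueError(f"expected exactly two characters: {item!r}")
        # Python strings index by code point, so characters outside the
        # Basic Multilingual Plane still count as one character each.
        expanded.append({"initial": item[0], "final": item[1]})
    return expanded
```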