Expansion - unihan_etl.expansion
Functions that expand compacted details inside UNIHAN field values.
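Notes
re.compile() calls are placed inside the expand functions rather than at module level, because:
- keeping each pattern next to the code that uses it aids readability
- Python caches the bytecode of module-level functions
- the re module caches recently compiled patterns, so recompiling on each call is cheap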
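A minimal sketch of the caching point above, assuming CPython: calling re.compile() again with the same pattern and flags returns the cached pattern object instead of recompiling, so compiling inside a function body costs only a cache lookup after the first call. The helper split_reading is hypothetical and not part of unihan_etl.

```python
import re


def split_reading(entry: str) -> tuple[str, str]:
    # Hypothetical helper: the pattern is compiled inside the function body.
    # After the first call, re.compile() hands back the cached pattern object.
    pattern = re.compile(r"(?P<left>[^:]+):(?P<right>.+)")
    match = pattern.match(entry)
    if match is None:
        raise ValueError(f"unexpected format: {entry!r}")
    return match.group("left"), match.group("right")


# CPython's re module caches compiled patterns keyed by (type, pattern, flags).
print(re.compile(r"\d+") is re.compile(r"\d+"))  # True while the pattern is cached
print(split_reading("12:JK"))  # ('12', 'JK')
```

The cache is bounded, so identity holds only while the pattern has not been evicted; correctness never depends on it, only speed.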
-
unihan_etl.expansion.N_DIACRITICS = 'ńňǹ'
The letter n with tone diacritics (ń, ň, ǹ), as found in kHanyuPinlu readings.
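The three characters are precomposed code points (n plus a combining tone mark), so an ASCII class such as [a-z] does not match them. A small sketch, assuming the constant holds the precomposed forms U+0144, U+0148 and U+01F9; the reading "ńg" is only an illustrative input.

```python
import re
import unicodedata

N_DIACRITICS = "\u0144\u0148\u01f9"  # ń, ň, ǹ as precomposed code points

for char in N_DIACRITICS:
    # NFD splits each character into "n" followed by a combining mark.
    decomposed = unicodedata.normalize("NFD", char)
    print(char, [unicodedata.name(c) for c in decomposed])

# A plain ASCII class misses the accented n; adding the constant fixes that.
print(re.fullmatch(r"[a-z]+", "\u0144g"))  # None
print(re.fullmatch(f"[a-z{N_DIACRITICS}]+", "\u0144g") is not None)  # True
```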
-
unihan_etl.expansion.expand_kDefinition(value)
Expand kDefinition field.
-
unihan_etl.expansion.expand_kMandarin(value)
Expand kMandarin field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kTotalStrokes(value)
Expand kTotalStrokes field.
- Parameters:
- Return type:
-
class unihan_etl.expansion.kAlternateTotalStrokesDict
Bases: TypedDict
kAlternateTotalStrokes mapping.
-
unihan_etl.expansion.is_valid_kAlternateTotalStrokes_irg_source(value)
Return True if value is a valid kAlternateTotalStrokes source, narrowing its type for type checkers.
-
unihan_etl.expansion.expand_kAlternateTotalStrokes(value)
Expand kAlternateTotalStrokes field.
Examples
>>> expand_kAlternateTotalStrokes(['3:J'])
[{'strokes': 3, 'sources': ['J']}]
>>> expand_kAlternateTotalStrokes(['12:JK'])
[{'strokes': 12, 'sources': ['J', 'K']}]
>>> expand_kAlternateTotalStrokes(['-'])
[{'strokes': None, 'sources': ['-']}]
- Parameters:
- Return type:
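A minimal re-implementation sketch inferred from the examples above, not the library's code: each entry has the form strokes:sources, every character of sources is one IRG source letter, and a bare "-" yields strokes None. The names AlternateTotalStrokes and expand_alternate_total_strokes are hypothetical, and source-letter validation is omitted.

```python
from typing import Optional, TypedDict


class AlternateTotalStrokes(TypedDict):
    strokes: Optional[int]
    sources: list[str]


def expand_alternate_total_strokes(value: list[str]) -> list[AlternateTotalStrokes]:
    """Hypothetical sketch: split 'strokes:sources' entries."""
    expanded: list[AlternateTotalStrokes] = []
    for entry in value:
        if entry == "-":
            # A bare hyphen carries no stroke count.
            expanded.append({"strokes": None, "sources": ["-"]})
            continue
        strokes, _, sources = entry.partition(":")
        # Each character of the suffix is one IRG source letter, e.g. "JK".
        expanded.append({"strokes": int(strokes), "sources": list(sources)})
    return expanded


print(expand_alternate_total_strokes(["12:JK"]))
# [{'strokes': 12, 'sources': ['J', 'K']}]
```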
-
unihan_etl.expansion.expand_kUnihanCore2020(value)
Expand kUnihanCore2020 field.
Examples
>>> expand_kUnihanCore2020('GHJ')
['G', 'H', 'J']
-
unihan_etl.expansion.expand_kHanYu(value)
Expand kHanYu field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kIRGHanyuDaZidian(value)
Expand kIRGHanyuDaZidian field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kTGHZ2013(value)
Expand kTGHZ2013 field.
Examples
>>> expand_kTGHZ2013(['097.110,097.120:fēng'])
[{'reading': 'fēng', 'locations': [{'page': 97, 'position': 11, 'entry_type': 0}, {'page': 97, 'position': 12, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['482.140:zhòu'])
[{'reading': 'zhòu', 'locations': [{'page': 482, 'position': 14, 'entry_type': 0}]}]
>>> expand_kTGHZ2013(['256.090:mò', '379.160:wàn'])
[{'reading': 'mò', 'locations': [{'page': 256, 'position': 9, 'entry_type': 0}]}, {'reading': 'wàn', 'locations': [{'page': 379, 'position': 16, 'entry_type': 0}]}]
- Parameters:
- Return type:
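A sketch of the location format as inferred from the examples above (not taken from the library source): each location is page.NNx, where NN is the two-digit position and the trailing digit x is the entry type; several locations are comma-separated before the colon and the reading. The function name expand_tghz2013 is hypothetical.

```python
def expand_tghz2013(value: list[str]) -> list[dict]:
    """Hypothetical sketch: 'page.NNx[,page.NNx...]:reading' entries."""
    expanded = []
    for entry in value:
        locations_part, _, reading = entry.partition(":")
        locations = []
        for location in locations_part.split(","):
            page, _, digits = location.partition(".")
            locations.append(
                {
                    "page": int(page),
                    "position": int(digits[:2]),  # first two digits, e.g. "11"
                    "entry_type": int(digits[2:]),  # trailing digit, e.g. "0"
                }
            )
        expanded.append({"reading": reading, "locations": locations})
    return expanded


print(expand_tghz2013(["482.140:zhòu"]))
# [{'reading': 'zhòu', 'locations': [{'page': 482, 'position': 14, 'entry_type': 0}]}]
```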
-
unihan_etl.expansion.expand_kSMSZD2003Index(value)
Expand kSMSZD2003Index (Soengmou San Zidin, 商務新字典) field.
Examples
>>> expand_kSMSZD2003Index(['26.07'])
[{'page': 26, 'position': 7}]
>>> expand_kSMSZD2003Index(['769.05', '15.17', '291.20', '493.13'])
[{'page': 769, 'position': 5}, {'page': 15, 'position': 17}, {'page': 291, 'position': 20}, {'page': 493, 'position': 13}]
Bibliography
Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.
- Parameters:
- Return type:
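A minimal sketch of the page.position split shown in the examples above; the function name expand_smszd2003_index is hypothetical. Converting with int() is what turns zero-padded positions such as "07" into 7.

```python
def expand_smszd2003_index(value: list[str]) -> list[dict]:
    """Hypothetical sketch: 'page.position' entries."""
    expanded = []
    for entry in value:
        page, _, position = entry.partition(".")
        # int() drops the zero padding: "07" -> 7.
        expanded.append({"page": int(page), "position": int(position)})
    return expanded


print(expand_smszd2003_index(["26.07"]))  # [{'page': 26, 'position': 7}]
```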
-
class unihan_etl.expansion.kSMSZD2003ReadingsDict
Bases: TypedDict
kSMSZD2003Readings mapping (Mandarin and Cantonese readings).
-
unihan_etl.expansion.expand_kSMSZD2003Readings(value)
Expand kSMSZD2003Readings (Soengmou San Zidin, 商務新字典) field.
Examples
>>> expand_kSMSZD2003Readings(['tà粵taat3'])
[{'mandarin': ['tà'], 'cantonese': ['taat3']}]
>>> expand_kSMSZD2003Readings(['ma粵maa1,maa3', 'má粵maa1', 'mǎ粵maa1'])
[{'mandarin': ['ma'], 'cantonese': ['maa1', 'maa3']}, {'mandarin': ['má'], 'cantonese': ['maa1']}, {'mandarin': ['mǎ'], 'cantonese': ['maa1']}]
Bibliography
Wong Gongsang 黃港生, ed. Shangwu Xin Zidian / Soengmou San Zidin 商務新字典 (New Commercial Press Character Dictionary). Hong Kong: 商務印書館(香港)有限公司 (Commercial Press [Hong Kong], Ltd.), 2003. ISBN 962-07-0140-2.
- Parameters:
- Return type:
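A sketch inferred from the examples above: the character 粵 separates the Mandarin part of each entry from the comma-separated Cantonese readings. Splitting the Mandarin part on commas as well is an assumption, since the examples show only one Mandarin reading per entry; the function name expand_smszd2003_readings is hypothetical.

```python
def expand_smszd2003_readings(value: list[str]) -> list[dict]:
    """Hypothetical sketch: 'mandarin粵cantonese[,cantonese...]' entries."""
    expanded = []
    for entry in value:
        # "粵" separates the Mandarin part from the Cantonese part.
        mandarin, _, cantonese = entry.partition("粵")
        expanded.append(
            {
                # Assumption: Mandarin readings are comma-separated like Cantonese ones.
                "mandarin": mandarin.split(","),
                "cantonese": cantonese.split(","),
            }
        )
    return expanded


print(expand_smszd2003_readings(["ma粵maa1,maa3"]))
# [{'mandarin': ['ma'], 'cantonese': ['maa1', 'maa3']}]
```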
-
unihan_etl.expansion.expand_kHanyuPinyin(value)
Expand kHanyuPinyin field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kXHC1983(value)
Expand kXHC1983 field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kCheungBauer(value)
Expand kCheungBauer field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kRSAdobe_Japan1_6(value)
Expand kRSAdobe_Japan1_6 field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kCihaiT(value)
Expand kCihaiT field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kIICore(value)
Expand kIICore field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kDaeJaweon(value)
Expand kDaeJaweon field.
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRGKangXi(value)
Expand kIRGKangXi field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kIRGDaeJaweon(value)
Expand kIRGDaeJaweon field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kFenn(value)
Expand kFenn field.
-
unihan_etl.expansion.expand_kHanyuPinlu(value)
Expand kHanyuPinlu field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kHDZRadBreak(value)
Expand kHDZRadBreak field.
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kSBGY(value)
Expand kSBGY field.
-
class unihan_etl.expansion.kRSSimplifiedType
Bases: Enum
Whether the radical in a radical-stroke value is a simplified form, and if so, which kind.
“The radical is indicated by a number in the range 1-214, followed by an optional single apostrophe (U+0027 ' APOSTROPHE) or, double apostrophe (''), or triple apostrophe (''') suffix. A single apostrophe after the radical indicates a Chinese simplified version of the given radical. Two apostrophes after the radical indicates a non-Chinese simplified version of the given radical. Three apostrophes after the radical indicates a second non-Chinese simplified version of the given radical.”
Source: https://www.unicode.org/reports/tr38/tr38-36.html#kRSUnicode
-
unihan_etl.expansion.get_krs_simplified_type(val)
Detect the type of simplified radical, if any.
Examples
>>> get_krs_simplified_type('')
False
>>> get_krs_simplified_type("'")
<kRSSimplifiedType.Chinese: 'Chinese'>
>>> get_krs_simplified_type("''")
<kRSSimplifiedType.NonChinese: 'NonChinese'>
>>> get_krs_simplified_type("'''")
<kRSSimplifiedType.SecondNonChinese: 'SecondNonChinese'>
- Parameters:
val (str)
- Return type:
kRSSimplifiedType | Literal[False]
-
unihan_etl.expansion._expand_kRSGeneric(value)
Expand a generic radical-stroke (kRS*) field.
Examples
>>> _expand_kRSGeneric(['5.10', "213''.0"])
[{'radical': 5, 'strokes': 10, 'simplified': False}, {'radical': 213, 'strokes': 0, 'simplified': <kRSSimplifiedType.NonChinese: 'NonChinese'>}]
>>> _expand_kRSGeneric(["120'.3"])
[{'radical': 120, 'strokes': 3, 'simplified': <kRSSimplifiedType.Chinese: 'Chinese'>}]
- Parameters:
- Return type:
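A sketch of how the apostrophe suffix described in the UAX #38 quotation could be mapped to a simplified-radical kind and combined with the radical.strokes split, reproducing the structure of the examples above. RSSimplifiedType, simplified_type and expand_radical_strokes are hypothetical stand-ins, not the library's kRSSimplifiedType, get_krs_simplified_type() or _expand_kRSGeneric().

```python
import enum


class RSSimplifiedType(enum.Enum):
    Chinese = "Chinese"
    NonChinese = "NonChinese"
    SecondNonChinese = "SecondNonChinese"


# Apostrophe suffix -> kind of simplified radical.
SUFFIXES = {
    "'": RSSimplifiedType.Chinese,
    "''": RSSimplifiedType.NonChinese,
    "'''": RSSimplifiedType.SecondNonChinese,
}


def simplified_type(suffix: str):
    # An empty (or unrecognized) suffix is reported as False: not simplified.
    return SUFFIXES.get(suffix, False)


def expand_radical_strokes(value: list[str]) -> list[dict]:
    """Hypothetical sketch: 'radical[']*.strokes' entries."""
    expanded = []
    for entry in value:
        radical_part, _, strokes = entry.partition(".")
        radical = radical_part.rstrip("'")
        suffix = radical_part[len(radical):]  # the trailing apostrophes, if any
        expanded.append(
            {
                "radical": int(radical),
                "strokes": int(strokes),
                "simplified": simplified_type(suffix),
            }
        )
    return expanded


print(expand_radical_strokes(["120'.3"]))
# [{'radical': 120, 'strokes': 3, 'simplified': <RSSimplifiedType.Chinese: 'Chinese'>}]
```

Returning False rather than None for "not simplified" mirrors the kRSSimplifiedType | Literal[False] return type documented above.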
-
unihan_etl.expansion.expand_kRSUnicode(value)
Expand kRSUnicode field, which uses the radical-stroke format handled by _expand_kRSGeneric() (see its examples).
- Parameters:
- Return type:
-
unihan_etl.expansion._expand_kIRG_GenericSource(value)
Expand a kIRG_*Source field value into its source and location.
Examples
>>> _expand_kIRG_GenericSource('JMJ-056876')
{'source': 'JMJ', 'location': '056876'}
>>> _expand_kIRG_GenericSource('SAT-02570')
{'source': 'SAT', 'location': '02570'}
- Parameters:
value (str)
- Return type:
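A minimal sketch of the source-location split shown in the examples above, splitting on the first hyphen only. The function name expand_irg_source is hypothetical. Keeping the location as a string preserves leading zeros such as "056876".

```python
def expand_irg_source(value: str) -> dict:
    """Hypothetical sketch: 'SOURCE-location' values."""
    # Split on the first hyphen only; the location stays a string so that
    # zero-padded values such as "056876" are kept intact.
    source, _, location = value.partition("-")
    return {"source": source, "location": location}


print(expand_irg_source("JMJ-056876"))  # {'source': 'JMJ', 'location': '056876'}
```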
-
unihan_etl.expansion.expand_kIRG_GSource(value)
Expand kIRG_GSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_HSource(value)
Expand kIRG_HSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_JSource(value)
Expand kIRG_JSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_KPSource(value)
Expand kIRG_KPSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_KSource(value)
Expand kIRG_KSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_MSource(value)
Expand kIRG_MSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_SSource(value)
Expand kIRG_SSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_TSource(value)
Expand kIRG_TSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_USource(value)
Expand kIRG_USource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_UKSource(value)
Expand kIRG_UKSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kIRG_VSource(value)
Expand kIRG_VSource field, which uses the source-location format handled by _expand_kIRG_GenericSource() (see its examples).
- Parameters:
value (str)
- Return type:
-
unihan_etl.expansion.expand_kGSR(value)
Expand kGSR field.
-
unihan_etl.expansion.expand_kCheungBauerIndex(value)
Expand kCheungBauerIndex field.
- Parameters:
- Return type:
-
unihan_etl.expansion.expand_kFennIndex(value)
Expand kFennIndex field.
- Parameters:
- Return type:
-
unihan_etl.expansion.is_valid_kstrange_property(value)
Return True if value is a valid kStrange property type, narrowing its type for type checkers.
-
unihan_etl.expansion.expand_kStrange(value)
Expand kStrange field.
Examples
>>> expand_kStrange(['B:U+310D', 'I:U+5DDB'])
[{'property_type': 'B', 'characters': ['U+310D']}, {'property_type': 'I', 'characters': ['U+5DDB']}]
>>> expand_kStrange(['K:U+30A6:U+30C4:U+30DB'])
[{'property_type': 'K', 'characters': ['U+30A6', 'U+30C4', 'U+30DB']}]
>>> expand_kStrange(['U'])
[{'property_type': 'U', 'characters': []}]
- Parameters:
- Return type:
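A minimal sketch of the colon-separated layout shown in the examples above: the first field is the property type and any remaining fields are code points, so a bare "U" yields an empty character list. The function name expand_strange is hypothetical, and property-type validation is omitted.

```python
def expand_strange(value: list[str]) -> list[dict]:
    """Hypothetical sketch: 'TYPE[:U+XXXX...]' entries."""
    expanded = []
    for entry in value:
        # The first field is the property type; the rest are code points.
        property_type, *characters = entry.split(":")
        expanded.append({"property_type": property_type, "characters": characters})
    return expanded


print(expand_strange(["U"]))  # [{'property_type': 'U', 'characters': []}]
```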
-
class unihan_etl.expansion.kMojiJohoVariationDict
Bases: TypedDict
Variation sequence of a Moji Jōhō Kiban entry.
-
unihan_etl.expansion.expand_kMojiJoho(value)
Expand kMojiJoho (Moji Jōhō Kiban) field.
Examples
>>> expand_kMojiJoho('MJ000004')
{'serial_number': 'MJ000004', 'variants': []}
>>> expand_kMojiJoho('MJ000022 MJ000023:E0101 MJ000022:E0103')
{'serial_number': 'MJ000022', 'variants': [{'serial_number': 'MJ000023', 'variation_sequence': 'E0101', 'standard': False}, {'serial_number': 'MJ000022', 'variation_sequence': 'E0103', 'standard': True}]}
See also
Assume U+342A kMojiJoho MJ000022 MJ000023:E0101 MJ000022:E0103:
Database
- Parameters:
value (str)
- Return type:
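A sketch inferred only from the two examples above, not from the library source: the first space-separated token is the primary MJ serial number, and each further token is serial:variation_sequence. The rule used for standard (True when a variant reuses the primary serial number) is an assumption drawn from a single example. The function name expand_moji_joho is hypothetical.

```python
def expand_moji_joho(value: str) -> dict:
    """Hypothetical sketch of kMojiJoho expansion."""
    serial_number, *tokens = value.split()
    variants = []
    for token in tokens:
        variant_serial, _, sequence = token.partition(":")
        variants.append(
            {
                "serial_number": variant_serial,
                "variation_sequence": sequence,
                # Assumption: a variant is "standard" when it reuses the primary serial.
                "standard": variant_serial == serial_number,
            }
        )
    return {"serial_number": serial_number, "variants": variants}


print(expand_moji_joho("MJ000004"))  # {'serial_number': 'MJ000004', 'variants': []}
```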
-
unihan_etl.expansion.expand_kFanqie(value)
Expand kFanqie field.
Examples
>>> expand_kFanqie(['德紅'])
[{'initial': '德', 'final': '紅'}]
>>> expand_kFanqie(['蘇彫', '先鳥'])
[{'initial': '蘇', 'final': '彫'}, {'initial': '先', 'final': '鳥'}]
- Parameters:
- Return type:
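In a fanqie spelling the first character supplies the initial and the second the final, which is the split the examples above show. A minimal sketch, assuming every value is exactly two characters (tuple unpacking raises ValueError otherwise); the function name expand_fanqie is hypothetical.

```python
def expand_fanqie(value: list[str]) -> list[dict]:
    """Hypothetical sketch: two-character fanqie spellings."""
    expanded = []
    for entry in value:
        # Assumes exactly two characters: the initial speller, then the final speller.
        initial, final = entry
        expanded.append({"initial": initial, "final": final})
    return expanded


print(expand_fanqie(["德紅"]))  # [{'initial': '德', 'final': '紅'}]
```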
-
unihan_etl.expansion.expand_kZhuang(value)
Expand kZhuang field.
Examples
>>> expand_kZhuang(['naengh'])
[{'reading': 'naengh', 'non_standard': False}]
>>> expand_kZhuang(['fa*'])
[{'reading': 'fa', 'non_standard': True}]
- Parameters:
- Return type:
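A minimal sketch of the marker handling shown in the examples above: a trailing asterisk flags a non-standard reading and is stripped from the reading itself. The function name expand_zhuang is hypothetical.

```python
def expand_zhuang(value: list[str]) -> list[dict]:
    """Hypothetical sketch: readings with an optional trailing '*'."""
    return [
        # A trailing "*" marks a non-standard reading; strip it from the text.
        {"reading": entry.rstrip("*"), "non_standard": entry.endswith("*")}
        for entry in value
    ]


print(expand_zhuang(["fa*"]))  # [{'reading': 'fa', 'non_standard': True}]
```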
-
unihan_etl.expansion.expand_field(field, fvalue)
Return the structured (expanded) value of a UNIHAN field.