Utilities - unihan_etl.util
Utilities for parsing UNIHAN’s data and structures.
- unihan_etl.util.ucn_to_unicode(ucn)
Return a Python Unicode string from a UCN.
Converts a Unicode Universal Character Number (e.g. "U+4E00" or "4E00") to the corresponding Python Unicode string ('\u4e00').
>>> ucn_to_unicode("U+4E00")
'一'
>>> ucn_to_unicode("4E00")
'一'
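The conversion described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation; the helper name `ucn_to_char` is hypothetical.

```python
def ucn_to_char(ucn: str) -> str:
    """Hypothetical helper: turn "U+4E00" or "4E00" into the character it names."""
    # Drop the optional "U+" prefix, then parse the remaining hex digits.
    hex_digits = ucn[2:] if ucn.upper().startswith("U+") else ucn
    return chr(int(hex_digits, 16))


print(ucn_to_char("U+4E00"))  # 一
print(ucn_to_char("4E00"))    # 一
```

Parsing the hex digits with `int(..., 16)` and passing the result to `chr` avoids building and evaluating an escape sequence by hand.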
- unihan_etl.util.ucnstring_to_python(ucn_string)
Return a Unicode UCN (e.g. "U+4E00") as its native Python character ('\u4e00'). As the example shows, the result is returned as UTF-8 encoded bytes.
>>> ucnstring_to_python("U+4E00")
b'\xe4\xb8\x80'
- unihan_etl.util.ucnstring_to_unicode(ucn_string)
Return a UCN string as a Unicode string.
>>> ucnstring_to_unicode('U+4E00')
'一'
>>> ucnstring_to_unicode('U+4E01')
'丁'
>>> ucnstring_to_unicode('U+0030')
'0'
>>> ucnstring_to_unicode('U+0031')
'1'
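The outputs shown for `ucnstring_to_python` and `ucnstring_to_unicode` are consistent with one being the UTF-8 encoding of the other. The sketch below illustrates that relationship using a regular-expression substitution. The helper names `replace_ucns` and `replace_ucns_bytes` and the 4-to-6-digit pattern are assumptions made for this example; the real functions may be implemented differently.

```python
import re

# Matches "U+" followed by 4 to 6 hex digits (an assumption for this sketch).
UCN_PATTERN = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def replace_ucns(text: str) -> str:
    """Hypothetical helper: replace every UCN in text with its character."""
    return UCN_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


def replace_ucns_bytes(text: str) -> bytes:
    """Hypothetical helper: same replacement, returned as UTF-8 bytes."""
    return replace_ucns(text).encode("utf-8")


print(replace_ucns("U+4E00"))        # 一
print(replace_ucns_bytes("U+4E00"))  # b'\xe4\xb8\x80'
```

Decoding the bytes result with `.decode("utf-8")` gives back the same string that `replace_ucns` returns.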
- unihan_etl.util._dl_progress(count, block_size, total_size, out=sys.stdout)
Write download progress to out.
Adapted from https://github.com/okfn/dpm-old/blob/master/dpm/util.py (MIT License).
Modified for testing; see http://stackoverflow.com/a/4220278
>>> _dl_progress(0, 1, 10)
Total size: 10b
>>> _dl_progress(0, 100, 942_200)
Total size: 942Kb
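The `(count, block_size, total_size)` signature has the same shape as the `reporthook` callback accepted by `urllib.request.urlretrieve`, and an `out` parameter lets tests capture the output instead of writing to stdout. Below is a minimal sketch of a callback in that style. The name `show_progress` and its percent-only display are assumptions for illustration; the output format of `_dl_progress` itself, as shown above, is different.

```python
import io
import sys


def show_progress(count, block_size, total_size, out=sys.stdout):
    """Hypothetical reporthook-style callback: write percent complete to out."""
    if total_size <= 0:
        return  # total size unknown, nothing meaningful to report
    percent = min(100, count * block_size * 100 // total_size)
    out.write(f"\r{percent}%")
    out.flush()


# Capture the output in memory rather than printing to the terminal.
buffer = io.StringIO()
show_progress(5, 100, 1000, out=buffer)
print(repr(buffer.getvalue()))  # '\r50%'
```

A callback like this could be passed as `reporthook=show_progress` to `urllib.request.urlretrieve`.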