Utilities - unihan_etl.util
Utilities for parsing UNIHAN’s data and structures.
- unihan_etl.util.ucn_to_unicode(ucn)
Return a Python Unicode value from a UCN.
Converts a Unicode Universal Character Number (e.g. "U+4E00" or "4E00") to Python Unicode (u'\u4e00').
>>> ucn_to_unicode("U+4E00")
'\u4e00'
>>> ucn_to_unicode("4E00")
'\u4e00'
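The conversion described for ucn_to_unicode can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation; the name `ucn_to_char` is hypothetical and the sketch assumes the input is either bare hexadecimal digits or those digits prefixed with "U+".

```python
def ucn_to_char(ucn: str) -> str:
    """Hypothetical stand-in for ucn_to_unicode (not unihan_etl's code)."""
    # Drop an optional "U+" prefix, leaving only the hexadecimal code point.
    hex_digits = ucn[2:] if ucn.upper().startswith("U+") else ucn
    # Parse the hex digits as an integer and map the code point to a character.
    return chr(int(hex_digits, 16))


print(ucn_to_char("U+4E00"))            # 一
print(ucn_to_char("4E00") == "\u4e00")  # True
```

Both spellings yield the same one-character string, because '\u4e00' and '一' are the same Python value.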
- unihan_etl.util.ucnstring_to_python(ucn_string)
Return the string with each Unicode UCN (e.g. "U+4E00") converted to native Python Unicode (u'\u4e00'). As the example shows, the result is returned as bytes.
>>> ucnstring_to_python("U+4E00")
b'\xe4\xb8\x80'
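Replacing UCNs embedded in a longer string could look roughly like the following. This is a sketch under assumptions, not the library's code: the helper `replace_ucns` and the pattern `UCN_PATTERN` are hypothetical names, and the pattern assumes each UCN is "U+" followed by four to six hexadecimal digits.

```python
import re

# Assumed UCN shape: "U+" followed by 4 to 6 hexadecimal digits.
UCN_PATTERN = re.compile(r"U\+([0-9a-fA-F]{4,6})")


def replace_ucns(text: str) -> str:
    """Hypothetical helper: swap every UCN in text for its character."""
    return UCN_PATTERN.sub(lambda match: chr(int(match.group(1), 16)), text)


print(replace_ucns("U+4E00 and U+4E01"))      # 一 and 丁
print(replace_ucns("U+4E00").encode("utf-8"))  # b'\xe4\xb8\x80'
```

Encoding the result as UTF-8 gives the three bytes E4 B8 80 for U+4E00, which is how a bytes return value for that character would be displayed.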
- unihan_etl.util.ucnstring_to_unicode(ucn_string)
Return ucnstring as Unicode.
>>> ucnstring_to_unicode('U+4E00')
'一'
>>> ucnstring_to_unicode('U+4E01')
'丁'
>>> ucnstring_to_unicode('U+0030')
'0'
>>> ucnstring_to_unicode('U+0031')
'1'
- unihan_etl.util._dl_progress(count, block_size, total_size, out=sys.stdout)
MIT License: https://github.com/okfn/dpm-old/blob/master/dpm/util.py
Modification for testing: http://stackoverflow.com/a/4220278
>>> _dl_progress(0, 1, 10)
Total size: 10b
>>> _dl_progress(0, 100, 942_200)
Total size: 942Kb
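The (count, block_size, total_size) arguments of _dl_progress match the reporthook signature of urllib.request.urlretrieve. Assuming that is the intended use, a callback of this shape might be written as below. The function `report_progress` and its output format are hypothetical and do not reproduce what _dl_progress prints. Accepting `out` as a parameter is what makes such a callback testable, since a StringIO buffer can stand in for stdout.

```python
import io
import sys


def report_progress(count, block_size, total_size, out=sys.stdout):
    """Hypothetical reporthook-style callback (not unihan_etl's code)."""
    if total_size <= 0:
        return  # total size unknown, nothing meaningful to report
    # The final block may overshoot, so cap at the total size.
    downloaded = min(count * block_size, total_size)
    percent = downloaded * 100 // total_size
    out.write(f"\r{percent:3d}% ({downloaded}/{total_size} bytes)")
    out.flush()


# Capture the output in memory instead of writing to the terminal.
buffer = io.StringIO()
report_progress(5, 100, 1000, out=buffer)
print(repr(buffer.getvalue()))  # '\r 50% (500/1000 bytes)'
```

A callback like this could be passed as `reporthook=report_progress` to urllib.request.urlretrieve, which calls it with the block count, block size, and total size as the download proceeds.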