unihan-etl export

Export UNIHAN data to CSV, JSON, or YAML format.

Command

Download, process, and export the Unicode Han character database (UNIHAN).

Usage

usage: unihan-etl export [-h] [-s SOURCE] [-z ZIP_PATH] [-d DESTINATION]
                         [-w WORK_DIR] [-F {json,csv,yaml}] [--no-expand]
                         [--no-prune] [--no-cache] [-f [FIELDS ...]]
                         [-i [INPUT_FILES ...]]

Examples

$ unihan-etl export
$ unihan-etl export -F json
$ unihan-etl export -F json -f kDefinition kMandarin
$ unihan-etl export -d /tmp/unihan.csv

Format

$ unihan-etl export -F csv
$ unihan-etl export -F json --no-expand
$ unihan-etl export -F yaml --no-prune

Options

-s, --source

URL or path of zipfile. Default: http://www.unicode.org/Public/UNIDATA/Unihan.zip

-z, --zip-path

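Path the zipfile is downloaded to. Default: /home/runner/.cache/unihan_etl/downloads/Unihan.zip (the /home/runner prefix, here and in the defaults below, comes from the machine that generated this help text; on your system the default resolves under your own user directories)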

-d, --destination

Output file. Default: /home/runner/.local/share/unihan_etl/unihan.{json,csv,yaml}

-w, --work-dir

Working directory for extraction. Default: /home/runner/.cache/unihan_etl/downloads

-F, --format

Output format. Default: csv

Choices: json, csv, yaml

--no-expand

Don't expand values to lists in multi-value UNIHAN fields. Expansion is on by default; this flag turns it off. Doesn't apply to CSVs.

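
As an illustration of what "expanding" a multi-value field means, the sketch below splits a space-delimited value into a list. It is not unihan-etl's implementation, whose expansion is field-specific. The helper `expand_record` and the `MULTI_VALUE_FIELDS` set are hypothetical names, and the sample row is illustrative only.

```python
from typing import Dict, List, Union

# Hypothetical: fields treated as multi-value for this illustration only.
MULTI_VALUE_FIELDS = {"kJapaneseOn", "kJapaneseKun"}


def expand_record(record: Dict[str, str]) -> Dict[str, Union[str, List[str]]]:
    """Split space-delimited multi-value fields into lists; leave others as-is."""
    return {
        key: value.split() if key in MULTI_VALUE_FIELDS else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    # Illustrative sample row, assumed to mirror an unexpanded export record.
    row = {"ucn": "U+4E00", "kJapaneseOn": "ICHI ITSU"}
    print(expand_record(row))
```

With --no-expand, a JSON or YAML export would keep such a value as the single string rather than a list.
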
--no-prune

Don't prune fields with empty keys. Pruning is on by default; this flag turns it off. Doesn't apply to CSVs.

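
The idea of pruning can be sketched as dropping entries that carry no data from each record. This is a minimal illustration, assuming "empty" means `None`, an empty string, or an empty container; unihan-etl's own rule may differ, and `prune_empty` is a hypothetical helper name.

```python
from typing import Any, Dict


def prune_empty(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without entries whose value is empty."""
    # Assumption: None, "", [] and {} all count as "empty".
    return {k: v for k, v in record.items() if v not in (None, "", [], {})}


if __name__ == "__main__":
    # Illustrative sample row, not real export output.
    row = {"ucn": "U+4E00", "kDefinition": "one", "kCowles": "", "kFenn": []}
    print(prune_empty(row))
```
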
--no-cache

Don't cache the UNIHAN zip file or CSV outputs. Caching is on by default; this flag turns it off.

-f, --fields

Fields to use in export. Separated by spaces. All fields used by default. Fields: kAccountingNumeric, kAlternateTotalStrokes, kBigFive, kCCCII, kCNS1986, kCNS1992, kCangjie, kCantonese, kCheungBauer, kCheungBauerIndex, kCihaiT, kCompatibilityVariant, kCowles, kDaeJaweon, kDefinition, kEACC, kFanqie, kFenn, kFennIndex, kFourCornerCode, kGB0, kGB1, kGB3, kGB5, kGB8, kGSR, kGradeLevel, kHDZRadBreak, kHKGlyph, kHanYu, kHangul, kHanyuPinlu, kHanyuPinyin, kIBMJapan, kIICore, kIRGDaeJaweon, kIRGHanyuDaZidian, kIRGKangXi, kIRG_GSource, kIRG_HSource, kIRG_JSource, kIRG_KPSource, kIRG_KSource, kIRG_MSource, kIRG_SSource, kIRG_TSource, kIRG_UKSource, kIRG_USource, kIRG_VSource, kJIS0213, kJapanese, kJapaneseKun, kJapaneseOn, kJinmeiyoKanji, kJis0, kJis1, kJoyoKanji, kKangXi, kKarlgren, kKorean, kKoreanEducationHanja, kKoreanName, kLau, kMainlandTelegraph, kMandarin, kMatthews, kMeyerWempe, kMojiJoho, kMorohashi, kNelson, kOtherNumeric, kPhonetic, kPrimaryNumeric, kPseudoGB1, kRSAdobe_Japan1_6, kRSUnicode, kSBGY, kSMSZD2003Index, kSMSZD2003Readings, kSemanticVariant, kSimplifiedVariant, kSpecializedSemanticVariant, kSpoofingVariant, kStrange, kTGH, kTGHZ2013, kTaiwanTelegraph, kTang, kTayNumeric, kTotalStrokes, kTraditionalVariant, kUnihanCore2020, kVietnamese, kVietnameseNumeric, kXHC1983, kXerox, kZVariant, kZhuang, kZhuangNumeric

-i, --input-files

Files inside zip to pull data from. Separated by spaces. All files used by default. Files: Unihan_DictionaryIndices.txt, Unihan_DictionaryLikeData.txt, Unihan_IRGSources.txt, Unihan_NumericValues.txt, Unihan_OtherMappings.txt, Unihan_RadicalStrokeCounts.txt, Unihan_Readings.txt, Unihan_Variants.txt


Examples

Export all UNIHAN data to JSON:

$ unihan-etl export -F json

Export specific fields:

$ unihan-etl export -F json -f kDefinition kMandarin
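
When the export is driven from a script, the same flags can be assembled programmatically. The following is a minimal sketch, assuming `unihan-etl` is installed and on PATH. `build_export_command` and `run_export` are hypothetical helpers, not part of unihan-etl; only the `-F`, `-f`, and `-d` flags documented above are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_export_command(
    fmt: str = "csv",
    fields: Optional[Sequence[str]] = None,
    destination: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `unihan-etl export` (hypothetical helper)."""
    if fmt not in {"json", "csv", "yaml"}:  # choices listed under -F
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["unihan-etl", "export", "-F", fmt]
    if destination:
        cmd += ["-d", destination]
    if fields:
        # -f takes space-separated names, so each field is its own argv entry.
        cmd += ["-f", *fields]
    return cmd


def run_export(cmd: List[str]) -> None:
    """Run the export; the first run downloads Unihan.zip unless it is cached."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_export(cmd) to execute it.
    cmd = build_export_command("json", ["kDefinition", "kMandarin"])
    print(" ".join(cmd))
```

Placing `-f` last keeps its variable-length argument list from swallowing other options.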