unihan-etl
Download, search, and export Unicode’s UNIHAN CJK character dataset. unihan-etl normalizes the raw Unicode data files into clean JSON, CSV, or YAML.
unihan-etl handles the data pipeline. For SQLAlchemy models, see unihan-db. For end-user character lookups, see cihai.
Quickstart
Install and run your first export.
CLI Reference
Every command, flag, and option.
API Reference
Core modules, types, and pytest plugin.
Topics
About UNIHAN, FAQ, and data format details.
Contributing
Development setup, code style, and release process.
Install
$ uv tool install unihan-etl
$ pip install unihan-etl
At a glance
Fetch raw UNIHAN data from unicode.org.
$ unihan-etl download
Look up a character across all fields.
$ unihan-etl search 好
Export the full dataset to JSON (also supports CSV, YAML).
$ unihan-etl export -F json
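To give a rough sense of what "normalizing" the raw files involves, here is a minimal Python sketch. It is not unihan-etl's actual implementation. It only assumes the raw UNIHAN text layout of one `U+XXXX<TAB>field<TAB>value` entry per line, with `#` marking comments. The function name `normalize_unihan_lines`, the `char` and `ucn` record keys, and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def normalize_unihan_lines(lines):
    """Group raw 'U+XXXX<TAB>field<TAB>value' lines into one dict per character.

    Hypothetical helper for illustration only, not part of unihan-etl.
    """
    records = defaultdict(dict)
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines found in the raw files.
        if not line or line.startswith("#"):
            continue
        ucn, field, value = line.split("\t", 2)
        char = chr(int(ucn[2:], 16))  # "U+597D" -> the character itself
        records[char]["char"] = char
        records[char]["ucn"] = ucn
        records[char][field] = value
    return list(records.values())


# Sample lines in the raw format (values are illustrative).
sample = [
    "# Unihan_Readings.txt (excerpt)",
    "U+597D\tkDefinition\tgood, excellent, fine; well",
    "U+597D\tkMandarin\thǎo",
]
records = normalize_unihan_lines(sample)
print(records)
```

Each character ends up as a single record keyed by field name, which is the kind of shape that serializes naturally to JSON, CSV, or YAML.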