unihan-etl

Download, search, and export Unicode’s UNIHAN CJK character dataset. Normalizes raw Unicode data files into clean JSON, CSV, or YAML.

unihan-etl handles the data pipeline. For SQLAlchemy models, see unihan-db. For end-user character lookups, see cihai.

Quickstart

Install and run your first export.

CLI Reference

Every command, flag, and option.

API Reference

Core modules, types, and pytest plugin.

Topics

About UNIHAN, FAQ, and data format details.

Contributing

Development setup, code style, and release process.

Install

With uv:

$ uv tool install unihan-etl

Or with pip:

$ pip install unihan-etl

At a glance

Fetch raw UNIHAN data from unicode.org.

$ unihan-etl download

Look up a character across all fields.

$ unihan-etl search <character>

Export the full dataset to JSON (also supports CSV, YAML).

$ unihan-etl export -F json
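
Once an export exists, the JSON can be read from Python with the standard library alone. The sketch below is only an illustration under stated assumptions: it assumes the exported file is a JSON array of per-character objects, each carrying a `char` key. The helper names `load_records` and `index_by_char` are hypothetical and not part of unihan-etl, and the sample records are made-up, simplified stand-ins rather than real export output. Check the layout of your own export before relying on it.

```python
import json
from pathlib import Path


def load_records(path):
    """Load an exported JSON file (assumed: a list of per-character objects)."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def index_by_char(records, key="char"):
    """Map each character to its record; records lacking the key are skipped."""
    return {rec[key]: rec for rec in records if key in rec}


# Hypothetical, simplified sample mirroring the assumed layout.
# These values are illustrative only, not real UNIHAN export output.
sample = [
    {"char": "好", "kDefinition": "good"},
    {"char": "一", "kDefinition": "one"},
]

lookup = index_by_char(sample)
print(lookup["好"]["kDefinition"])
```

To use it on a real export, pass the path of the file written by `unihan-etl export -F json` to `load_records` and feed the result to `index_by_char`. Building the dictionary once keeps repeated lookups cheap compared with scanning the list each time.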