Changelog

To install the unreleased unihan-etl version, see developmental releases.

pip:

$ pip install --user --upgrade --pre unihan-etl

pipx:

$ pipx install --suffix=@next unihan-etl --pip-args '\--pre' --force
// Usage: unihan-etl@next

unihan-etl 0.38.x (unreleased)

unihan-etl 0.37.0 (2024-12-21)

Maintenance release: No bug fixes or new features.

Breaking changes

  • Drop Python 3.8 (#332)

    The minimum version of Python in this and future releases is Python 3.9.

    Python 3.8 reached end-of-life status on October 7th, 2024 (see PEP 569).

Development

  • Aggressive automated lint fixes via ruff (#332)

    via ruff v0.8.4, all automated lint fixes, including unsafe and previews were applied for Python 3.9:

    ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; ruff format .
    

unihan-etl 0.36.0 (2024-11-26)

Maintenance release: No bug fixes or new features.

Breaking changes

Project and package management: poetry to uv (#329)

uv is the new package and project manager for the project, replacing Poetry.

Build system: poetry to hatchling (#329)

Build system moved from poetry to hatchling.

unihan-etl 0.35.0 (2024-11-25)

Revision 37 updates (#330)

These changes align with Unicode Technical Report #38’s 37th revision and are part of ongoing improvements to Unihan data handling.

Add kFanqie and kZhuang

Adds support for the kFanqie and kZhuang fields.

See also:

Support kRSUnicode for apostrophes

Unihan_IRGSources: Updated kRSUnicode for apostrophes.

See also:

Removals

  • kFrequency: Removed from Unihan_DictionaryLikeData, constants, and datapackage.json.

See also:

Tests

  • Added tests for simplified expansions to ensure correctness of kFanqie and kZhuang.

Documentation

  • Automatically linkify links that were previously only text.

Development

unihan-etl 0.34.0 (2024-03-24)

Development

  • Aggressive automated lint fixes via ruff (#317)

    via ruff v0.3.4, all automated lint fixes, including unsafe and previews were applied:

    ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; ruff format .
    

    Branches were treated with:

    git rebase \
        --strategy-option=theirs \
        --exec 'poetry run ruff check --select ALL . --fix --unsafe-fixes --preview --show-fixes; poetry run ruff format .; git add src tests; git commit --amend --no-edit' \
        origin/master
    
  • poetry: 1.7.1 -> 1.8.1

    See also: https://github.com/python-poetry/poetry/blob/1.8.1/CHANGELOG.md

  • ruff 0.2.2 -> 0.3.0 (#316)

    Related formattings. Update CI to use ruff check . instead of ruff ..

    See also: https://github.com/astral-sh/ruff/blob/v0.3.0/CHANGELOG.md

unihan-etl 0.33.1 (2024-02-09)

Maintenance release: No bug fixes or new features.

Documentation

  • README: Rewrite introduction, note updated UNIHAN compatibility information.

  • Link to UNIHAN release in v0.31.0’s changelog notes.

unihan-etl 0.33.0 (2024-02-09)

Maintenance release: No bug fixes or new features.

Documentation

  • CsvLexer: Fix quoted items (#314)

Development

unihan-etl 0.32.0 (2024-02-05)

Documentation

  • Highlighting for CSV and TSV examples (#253)

Improvements

  • Typing fixes and additional doctest for kTGH2013 (#312)

Development

  • Added types-pygments package (#253)

  • Added some manual type stubs for pygmentsLexer (#253)

  • pytest-watcher: Silent *.py.*py reruns (#312)

unihan-etl 0.31.0 (2024-02-04)

Breaking: UNIHAN upgrades (#305)

Bump UNIHAN compatibility from 11.0.0 to 15.1.0 (released 2023-09-01, revision 35).

Removed fields

New fields

Development

  • Quiet pytest tracebacks (#310)

  • Relax pytest plugin assertions in regards to zip / export file size (#310)

unihan-etl 0.30.1 (2023-12-10)

Bug fix

  • Expansions: Fix loading of double apostrophe values via kRSUnicode via kRSGeneric (#304)

unihan-etl 0.30.0post0 (2023-11-26)

CI

  • Move CodeQL from advanced configuration file to GitHub’s default

Documentation

  • Typo fixes

unihan-etl 0.30.0 (2023-11-26)

Maintenance only, no bug fixes, or new features

Development

  • ci: Add pydocstyle rule to ruff (#303)

Documentation

  • Add docstrings to functions, methods, classes, and packages (#303)

unihan-etl 0.29.0 (2023-11-19)

Maintenance only, no bug fixes, or new features

Packaging

  • Move pytest configuration to pyproject.toml (#299)

  • Add Python 3.12 to trove classifiers

  • Per Poetry’s docs on managing dependencies and poetry check, we had it wrong: Instead of using extras, we should create these:

    [tool.poetry.group.group-name.dependencies]
    dev-dependency = "1.0.0"
    

    Which we now do.

Development

unihan-etl 0.28.1 (2023-09-02)

Bug fix

  • SPACE_DELIMITED_LIST_FIELDS: Fix for field name kAccountingNumeric found during automated sweep for typos.

Development

  • Typo fixes

    typos --format brief --write-changes
    

    One of these typos was for kAccountingNumeric in SPACE_DELIMITED_LIST_FIELDS.

  • ruff: Remove ERA / eradicate plugin

    This rule had too many false positives to trust. Other ruff rules have been beneficial.

unihan-etl 0.28.0 (2023-07-22)

Breaking: pytest fixtures now prefixed with unihan_ (#296)

  • All pytest plugin fixtures are now prefixed unihan_, e.g.:

    • quick_unihan_path -> unihan_quick_path

    • quick_unihan_options -> unihan_quick_options

    • quick_unihan_packager -> unihan_quick_packager

    • ensure_quick_unihan -> unihan_ensure_quick

    • mock_zip -> unihan_mock_zip

    • columns -> unihan_quick_columns

  • TestPackager fixture has been removed

    This fixture was made redundant by unihan_quick_* and unihan_full_* fixtures

Bug fixes (#296)

  • pytest plugin (unihan_zshrc): Fix skipif condition to run if shell uses zsh(1)

unihan-etl 0.27.0 (2023-07-18)

Breaking: pytest fixtures renamed, data moved (#294)

  • “quick” fixtures:

    • Data has been moved from tests/fixtures to src/unihan_etl/data_files/quick

    • Fixtures prefixed by sample_ in the name have been renamed to quick_

  • “quick” and “full” fixtures: Fixed ability to access data files from outside unihan_etl package

Development

  • ruff: Code quality tweaks (#295)

unihan-etl 0.26.0 (2023-07-09)

Features

  • pytest plugin: Add cached fixtures for UNIHAN (#291)

    After initial download of UNIHAN.zip, an 11 second testrun on unihan-etl’s test can go down to 1.5 seconds - eliminating redownloading and extraction.

unihan-etl 0.25.2 (2023-07-08)

Bug Fixes

  • pytest plugin: Revert fix of zshrc fixture’s skipif condition (#293)

    It was fine as-is.

unihan-etl 0.25.1 (2023-07-08)

Rolled back

Bug Fixes

  • pytest plugin: Fix zshrc fixture’s skipif condition (#292)

unihan-etl 0.25.0 (2023-07-01)

Maintenance only, no bug fixes, or new features

Internal changes

  • ruff: Add additional linters, apply code fixes automatically and by hand (#290)

  • Typings: Extract LogLevel and UnihanFormats

unihan-etl 0.24.0 (2023-06-24)

Maintenance only, no bug fixes, or new features

Internal changes

  • zhon: 1.1.5 -> 2.0.0 (#289, fixes #282)

    Release notes

    Fixes pytest warning related to regular expressions.

unihan-etl 0.23.0 (2023-06-24)

Maintenance only, no bug fixes, or new features

Internal changes

  • unihan_etl._internal.app_dirs improvements (#287)

    • Breaking: app_dirs moved

      • Before 0.23.x: unihan_etl.app_dirs

      • After 0.23.x: unihan_etl._internal.app_dirs

    • New feature: Override directories on a one-off basis

    • New feature: Template replacement of variables replacing environmental variables via os.path.expandvars() + os.path.expanduser()

    • doctests: See the above in action thanks to doctests

    • Dedicated tests via pytest

Documentation

  • API docs (#288):

    • Limit depth of table of contents to one

    • Fix section heading

    • Fix comment in AppDirs

unihan-etl 0.22.1 (2023-06-18)

Bug fixes

  • Fix for destination of files not replacing file extension correctly (#285)

unihan-etl 0.22.0 (2023-06-17)

Breaking changes

unihan_etl.process -> unihan_etl.core (#284)

This module has been renamed.

Configuration (#280)

Before 0.22.x, unihan_etl’s configuration was done through a dict object.

0.22.0 and after settings are configurable via a dataclasses.dataclass object: unihan_etl.options.Options

Documentation

  • Add doctest support (#274)

  • Stub out initial pytest plugin (#274)

  • Split API docs into multiple files (#283)

  • Fix make start in docs/Makefile by fixing argument positions (#283)

unihan-etl 0.21.1 (2023-06-18)

Bug fixes

  • Fix for destination of files not replacing file extension correctly (#286)

unihan-etl 0.21.0 (2023-06-12)

Maintenance only, no bug fixes or features

Internal improvements

unihan-etl 0.20.0 (2023-06-11)

Maintenance only, no bug fixes or features

Breaking changes

Internal improvements

  • Typings:

    • Import typing as a namespace, e.g. import typing as t (#276)

    • Use typing for typing.TypedDict and typing.Literal (#276)

    • Use typing_extensions’ TypeAlias for repeated types, such in test_expansions (#276)

unihan-etl 0.19.1 (2023-05-28)

Maintenance only, no bug fixes or features

Development

  • Add back black for formatting

    This is still necessary to accompany ruff, until it replaces black.

unihan-etl 0.19.0 (2023-05-27)

Maintenance only, no bug fixes or features

Internal improvements

  • Move formatting, import sorting, and linting to ruff.

    This rust-based checker has dramatically improved performance. Linting and formatting can be done almost instantly.

    This change replaces black, isort, flake8 and flake8 plugins.

  • poetry: 1.4.0 -> 1.5.0

    See also: https://github.com/python-poetry/poetry/releases/tag/1.5.0

  • pytest: Fix invalid escape sequence warning from zhon

Development

  • merge_dict: Improve typing of generic params (#271)

unihan-etl 0.18.1 (2022-10-01)

Packaging

  • Add PyYAML dependency

Infrastructure

  • CI speedups (#267)

    • Split out release to separate job so the PyPI Upload docker image isn’t pulled on normal runs

    • Clean up CodeQL

  • Bump poetry 1.1.x to 1.2.x

Packaging

  • Move .coveragerc -> pyproject.toml (#268)

unihan-etl 0.18.0 (2022-09-11)

Development

Documentation

unihan-etl 0.17.2 (2022-08-21)

Documentation

  • Add vendorized, updated fork of sphinxcontrib-issuetracker, via #261.

  • Remove sphinx-issues package

unihan-etl 0.17.1 (2022-08-21)

Follow ups to #257.

Fixes

  • merged_dict(): Fix merging edgecase where destination key was missing

  • download(): Fix edgecase when “downloading” file from local path

unihan-etl 0.17.0 (2022-08-21)

Features

  • mypy --strict annotations, via #257

unihan-etl 0.16.0 (2022-08-20)

Features

  • New option: --no-cache

    Disregard cached .zip / extracted files, via #259.

Development

  • Add python 3.8 and 3.9 to CI

    This is to make way for strict type annotations, as the typings and generic behavior vary dramatically between 3.7 - 3.11.

unihan-etl 0.15.0 (2022-08-29)

Breaking changes

  • Python 2 compatibility module and imports removed. Python 2.x was officially dropped in 0.12.0 (2021-06-15) via #258

unihan-etl 0.14.0 (2022-08-16)

Improvements

  • load_data: Accept list of pathlib.Path in addition to list of str

Compatibility

  • Add Python 3.10 (#248)

  • Dropped Python 3.6 (#248)

Development

Infrastructure updates for static type checking and doctest examples.

  • Update poetry to 1.1

    • CI: Use poetry 1.1.12 and install-poetry.py installer (#237 + #248)

    • Relock poetry.lock at 1.1 (w/ 1.1.7’s fix)

  • Run pyupgrade for python 3.7

  • Tests: Move from tmpdir -> tmp_path

  • Initial doctests support added, via #255

  • Initial mypy validation, via #255

  • CI (tests, docs): Improve caching of python dependencies via action/setup-python’s v3/4’s new poetry caching, via #255

  • CI (docs): Skip if no PUBLISH condition triggered, via #255

Documentation

unihan-etl 0.13.0 (2021-06-16)

  • #236: Convert to markdown

unihan-etl 0.12.0 (2021-06-15)

  • Update black to 21.6b0

  • Update trove classifiers to 3.9

  • #235: Drop python 2.7, 3.5. Remove python 2 modesets and __future__

unihan-etl 0.11.0 (2020-08-09)

  • #230 Move packaging / publishing to poetry

  • #229 Self host docs

  • #229 Add metadata / icons / etc. for doc site

  • #229 Move travis -> github actions

  • #229 Overhaul Makefiles

unihan-etl 0.10.4 (2020-08-05)

  • Update CHANGES headings to produce working links

  • Relax appdirs version constraint

  • #228 Move from Pipfile to poetry

unihan-etl 0.10.3 (2019-08-18)

  • Fix flicker in download progress bar

unihan-etl 0.10.2 (2019-08-17)

  • Add project_urls to setup.py

  • Use plain reStructuredText for CHANGES

  • Use collections that’s compatible with python 2 and 3

  • PEP8 tweaks

unihan-etl 0.10.1 (2017-09-08)

  • Add code links in API

  • Add __version__ to unihan_etl

unihan-etl 0.10.0 (2017-08-29)

  • #91 New fields from UNIHAN Revision 25.

    • kJinmeiyoKanji

    • kJoyoKanji

    • kKoreanEducationHanja

    • kKoreanName

    • kTGH

    UNIHAN Revision 25 was released 2018-05-18 and issued for Unicode 11.0:

  • Add tests and example corpus for kCCCII

  • Add configuration / make tests for isort, flake8

  • Switch tmuxp config to use pipenv

  • Add Pipfile

  • Add make sync_pipfile task to sync requirements/.txt* files with *Pipfile*

  • Update and sync Pipfile

  • Developer package updates (linting / docs / testing)

    • isort 4.2.15 to 4.3.4

    • flake8 3.3.0 to 3.5.0

    • vulture 0.14 to 0.27

    • sphinx 1.6.2 to 1.7.6

    • alagitpull 0.0.12 to 0.0.21

    • releases 1.3.1 to 1.6.1

    • sphinx-argparse 0.2.1 to 1.6.2

    • pytest 3.1.2 to 3.6.4

  • Move documentation over to numpy-style

  • Add sphinxcontrib-napoleon 0.6.1

  • Update LICENSE New BSD to MIT

  • All future commits and contributions are licensed to the cihai software foundation. This includes commits by Tony Narlock (creator).

unihan-etl 0.9.5 (2017-06-26)

  • Enhance support for locations on kHDZRadBreak fields.

unihan-etl 0.9.4 (2017-06-05)

  • Fix kIRG_GSource without location

  • Fix kFenn output

  • Fix kHanyuPinlu support output for n diacritics

unihan-etl 0.9.3 (2017-05-31)

  • Add expansion for kIRGKangXi

unihan-etl 0.9.2 (2017-05-31)

  • Normalize Radical-Stroke expansion for kRSUnicode

  • Migrate more fields to regular expressions

  • Normalize character field for kDaeJaweon, kHanyuPinyin, and kCheungBauer, kFennIndex, kCheungBauerIndex, kIICore, kIRGHanyuDaZidian

unihan-etl 0.9.1 (2017-05-27)

  • Support for expanding kGSR

  • Convert some field expansions to use regexes

unihan-etl 0.9.0 (2017-05-26)

  • Fix bug where destination file was made into directory on first run

  • Rename from unihan-tabular to unihan-etl

  • Support for expanding multi-value fields

  • Support for pruning empty fields

  • Improve help dialog

  • Added a page about UNIHAN and the project to documentation

  • Split constant values into their own module

  • Split functionality for expanding unstructured values into its own module

unihan-etl 0.8.1 (2017-05-20)

  • Update to add kJa and adjust source file of kCompatibilityVariant per Unicode 8.0.0.

unihan-etl 0.8.0 (2017-05-17)

  • Support for configuring logging via options and CLI

  • Convert all print statements to use logger

unihan-etl 0.7.4 (2017-05-14)

  • Allow for local / file system sources for Unihan.zip

  • Only extract zip if unextracted

unihan-etl 0.7.3 (2017-05-13)

  • Update package classifiers

unihan-etl 0.7.2 (2017-05-13)

  • Add back datapackage

unihan-etl 0.7.1 (2017-05-12)

  • Fix python 2 CSV output

  • Default to CSV output

unihan-etl 0.7.0 (2017-05-12)

  • Move unicodecsv module to dependency package

  • Support for XDG directory specification

  • Support for custom destination output, including replacing template variable {ext}

unihan-etl 0.6.3 (2017-05-11)

  • Move about.py to module level

unihan-etl 0.6.2 (2017-05-11)

  • Fix python package import

unihan-etl 0.6.1 (2017-05-10)

  • Fix readme bug on pypi

unihan-etl 0.6.0 (2017-05-10)

  • Support for exporting in YAML and JSON

  • More internal factoring and simplification

  • Return data as list

unihan-etl 0.5.1 (2017-05-08)

  • Drop python 3.3 an 3.4 support

unihan-etl 0.5.0 (2017-05-08)

  • Rename from cihaidata_unihan unihan_tabular

  • Drop datapackages in favor of a universal JSON, YAML and CSV export.

  • Only use UnicodeWriter in Python 2, fixes issue with python would encode b in front of values

unihan-etl 0.4.2 (2017-05-07)

  • Rename scripts/ to cihaidata_unihan/

unihan-etl 0.4.1 (2017-05-07)

  • Enable invoking tool via $ cihaidata_unihan

unihan-etl 0.4.0 (2017-05-07)

  • Major internal refactor and simplification

  • Convert to pytest assert statements

  • Convert full test suite to pytest functions and fixtures

  • Get CLI documentation up again

  • Improve test coverage

  • Lint code, remove unused imports

  • Switch license BSD -> MIT

unihan-etl 0.3.0 (2017-04-17)

  • Rebooted

  • Modernize Makefile in docs

  • Add Makefile to main project

  • Modernize package metadata to use about.py

  • Update requirements to use requirements/ folder for base, testing and doc dependencies.

  • Update sphinx theme to alabaster with new logo.

  • Update travis to use coverall

  • Update links on README to use https

  • Update travis to test up to python 3.6

  • Add support for pypy (why not)

  • Lock base dependencies

  • Add dev dependencies for isort, vulture and flake8