Changelog

To install the unreleased unihan-etl version, see developmental releases.

uv:

$ uv add unihan-etl --prerelease allow
$ uvx --from 'unihan-etl' --prerelease allow unihan-etl

These commands install or run the latest available pre-release.

pip:

$ pip install --user --upgrade --pre unihan-etl

pipx:

$ pipx install --suffix=@next unihan-etl --pip-args '\--pre' --force

Then run unihan-etl@next.

unihan-etl 0.42.x (unreleased)

unihan-etl 0.42.x continues the documentation-platform refresh started after 0.41.0. The release moves the API and pytest-plugin pages onto shared gp-sphinx components, restores first-paint theme handling, and keeps the generated CLI/API reference linked to concrete docs targets such as pytest plugin, API Reference, and CLI Reference.

What’s new

Shared gp-sphinx documentation platform (#351, #354, #355, #356)

The docs site now relies on the published gp-sphinx package stack instead of carrying local copies of the Furo theme customizations, argparse renderers, font loader, SPA navigation script, and related Sphinx extensions. That keeps unihan-etl aligned with the shared gp-libs documentation platform while preserving project-specific pieces such as the CSV/TSV lexers in docs/conf.py.

The API reference adopts the gp-sphinx autodoc styling layer: signatures render as structured API cards, Python-object cross-reference roles work in MyST pages, and generated CLI pages use the packaged argparse domain rather than repo-local extensions. The latest dependency refresh also moves theme asset building to sphinx-vite-builder, so source builds fail with clearer setup guidance and wheels carry the static assets they need.

Generated pytest plugin reference (#352, #353)

The pytest plugin page now uses the shared doc-pytest-plugin directive to present fixtures in the same shape as other gp-libs projects. The page keeps unihan-etl’s project-specific guidance for explicit fixture setup, while the generated table gives users a faster path to the quick/full UNIHAN dataset fixtures.

Fixture return types now link to real API targets instead of rendering as plain text. The unihan_etl.pytest_plugin.UnihanTestOptions alias is available at runtime and documented on the plugin page, and the type aliases used by fixture summaries are included in the Sphinx inventory.

Fixes

Theme preference no longer flashes on first paint (#357)

Light-theme loads no longer briefly flash the default dark styling before the user’s preference is applied. docs/conf.py now chains the gp-sphinx setup() callback before registering unihan-etl’s own missing-reference handler, restoring the inline theme-prevention script plus the shared spa-nav.js, copybutton bridge, and MyST lexer hooks.

Development

gp-sphinx and sibling docs packages were bumped through the 0.0.1a17 workspace releases, alongside routine uv, just, and development dependency refreshes.

unihan-etl 0.41.0 (2026-03-21)

unihan-etl 0.41.0 improves the generated CLI documentation and makes the docs site feel more polished in daily use. Argument definitions gained stable anchors and headerlinks, help text renders more safely, and the frontend picked up self-hosted fonts, reduced layout shift, and smoother internal navigation.

Documentation

Linkable CLI argument definitions (#347)

The CLI Reference and per-command pages now give each generated argument definition its own anchor. Users can link directly to option definitions such as unihan-etl export --format, and headerlinks appear on hover so those URLs are easy to copy from the rendered page.

More robust help text rendering (#347)

Generated help output now handles glob-like patterns and underscore-heavy metavars without creating spurious reStructuredText warnings. The argparse renderer also uses semantic definition-list markup for argument metadata, making the CLI pages easier to scan and easier for Sphinx to index.

Faster, steadier documentation pages (#349)

The docs frontend now self-hosts IBM Plex fonts, preloads critical weights, and uses fallback font metrics to avoid text reflow while fonts load. Image and badge dimensions are stabilized to reduce layout shift, and internal links use a lightweight SPA-style navigation script with progressive view transitions.

unihan-etl 0.40.0 (2026-01-24)

unihan-etl 0.40.0 is the CLI redesign release. It splits the single command into explicit subcommands for exporting, downloading, field discovery, source-file discovery, and character lookup, while leaving the Python API centered on unihan_etl.core.Packager and unihan_etl.options.Options.

Breaking changes

Bare unihan-etl now shows help (#344)

Running unihan-etl with no arguments now shows the command overview instead of starting the export workflow. Scripts that relied on the old implicit behavior should call unihan-etl export explicitly.

$ unihan-etl

Now use:

$ unihan-etl export

See Migration notes for the upgrade guidance and command mapping.

What’s new

Subcommands for each user workflow (#344)

The CLI now exposes separate pages and command handlers for unihan-etl export, unihan-etl download, unihan-etl fields, unihan-etl files, and unihan-etl search. This makes common workflows discoverable from help output instead of requiring users to remember which flags change the old all-in-one command’s behavior.

Field and file discovery commands support table, JSON, and NDJSON output for both human inspection and tool-friendly consumption. Character search accepts a character, UCN, or hexadecimal codepoint and can limit output to selected fields.
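
As a rough illustration of the three accepted search inputs, the sketch below normalizes a literal character, a UCN such as U+597D, or a bare hexadecimal codepoint to a single character. The helper name `to_char` is hypothetical and is not taken from unihan-etl's code; the real command may validate input differently.

```python
def to_char(query: str) -> str:
    """Normalize a character, UCN (U+XXXX), or hex codepoint to one character.

    Hypothetical helper for illustration; not unihan-etl's implementation.
    """
    text = query.strip()
    if len(text) == 1:
        # Already a single character, so return it unchanged.
        return text
    if text.upper().startswith("U+"):
        # UCN form: drop the "U+" prefix and parse the rest as hexadecimal.
        return chr(int(text[2:], 16))
    # Otherwise treat the input as a bare hexadecimal codepoint.
    return chr(int(text, 16))
```

One design consequence of this ordering: a single hex digit such as `a` is read as the character itself, not as codepoint U+000A.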

Modern help and path output (#344)

Help output uses colorized examples where the running Python supports argparse theming, and generated examples use the visually distinct character for search documentation. CLI output that mentions local paths collapses the user’s home directory to ~, avoiding accidental disclosure of full home paths in copied output.
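
The home-directory collapsing described above can be sketched with `pathlib`. This is a minimal illustration under assumed names (`collapse_home` is hypothetical), not the code the CLI actually uses.

```python
from pathlib import PurePath, PurePosixPath


def collapse_home(path: PurePath, home: PurePath) -> str:
    """Return ``path`` for display with a leading ``home`` replaced by "~"."""
    try:
        relative = path.relative_to(home)
    except ValueError:
        # The path is outside the home directory, so show it unchanged.
        return str(path)
    if not relative.parts:
        # The path is the home directory itself.
        return "~"
    return "~/" + relative.as_posix()


# Example with POSIX-style paths; the directory names are made up.
print(collapse_home(PurePosixPath("/home/user/.cache/unihan"), PurePosixPath("/home/user")))
```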

Documentation

Dedicated CLI reference pages (#344)

The CLI reference moved from one page to a command-oriented structure under CLI Reference. Each command page includes generated argparse output plus short examples for the workflow it represents.

Custom argparse documentation extensions (#344)

The release introduced a local Sphinx argparse renderer, example transformer, CLI usage lexer, argparse lexer, and argparse cross-reference roles. These extensions powered the new CLI pages until they were later moved into the shared gp-sphinx package stack.

Development

The CLI work added focused tests for command dispatch, output formats, path masking, formatter behavior, and generated Sphinx documentation.

unihan-etl 0.39.1 (2026-01-24)

unihan-etl 0.39.1 is a release-infrastructure update. It moves publishing to PyPI Trusted Publisher so releases can authenticate through GitHub Actions OIDC rather than long-lived upload tokens.

Development

PyPI Trusted Publisher (#339)

Release publishing now uses PyPI Trusted Publisher from CI. This reduces credential management risk for maintainers without changing the package’s runtime behavior.

unihan-etl 0.39.0 (2026-01-11)

unihan-etl 0.39.0 updates the bundled UNIHAN field model for Unicode 16.0 and 17.0 changes. It adds the new Tày numeric field, accepts the new symmetric kStrange property type, and removes fields that Unicode no longer publishes.

What’s new

Unicode 16.0 and 17.0 field updates (#343)

kTayNumeric is now included in the UNIHAN manifest and in the space-delimited list handling for Unicode 17.0 data. kStrange also accepts the Y property type introduced for symmetric ideographs in Unicode 16.0.

The deprecated kGB7 and kJa fields were removed from the manifest and quick fixture data to match Unicode 17.0. See the Unicode TR38 modification notes for the upstream field history.

Documentation

Docs deployment moved to AWS OIDC authentication, matching the token-free publishing approach used for PyPI.

Development

Justfile task runner (#340)

The repository moved from make targets to just recipes for routine development and docs tasks. Project docs and CI now call the same just entry points used locally.

unihan-etl 0.38.0 (2025-11-01)

unihan-etl 0.38.0 raises the supported Python floor and adds Python 3.14 to the trove classifiers and CI matrix. The package now targets Python 3.10 and later.

Breaking changes

Python 3.9 support dropped (#338)

Python 3.10 is now the minimum supported version. Python 3.9 reached end-of-life in October 2025, so the project also updated typing, Ruff, mypy, and CI settings for Python 3.10+.

Development

Python 3.14 support (#337)

Python 3.14 was added to trove classifiers and the CI test/docs matrix.

Deferred annotations (#333)

All Python modules now start with from __future__ import annotations, and Ruff’s modern-annotation rules are enabled. This keeps annotations cheaper at runtime and prepares the codebase for newer typing syntax.

unihan-etl 0.37.0 (2024-12-21)

unihan-etl 0.37.0 is a maintenance release focused on Python-version support and automated lint cleanup.

Breaking changes

Python 3.8 support dropped (#332)

Python 3.9 became the minimum supported version after Python 3.8 reached end-of-life on October 7, 2024.

Development

Python 3.9 lint modernization (#332)

Ruff automated fixes were applied against the Python 3.9 target, including preview and unsafe fixes where they matched the project’s style goals.

unihan-etl 0.36.0 (2024-11-26)

unihan-etl 0.36.0 changes the project management and build backend stack. The release moves local development to uv and package builds to hatchling.

Breaking changes

Project management moved from Poetry to uv (#329)

uv replaced Poetry for dependency management, locking, and local environment workflows.

Build backend moved from Poetry to hatchling (#329)

The package build backend moved to hatchling. Packaging metadata and lock files were updated for the new build path.

unihan-etl 0.35.0 (2024-11-25)

unihan-etl 0.35.0 aligns the parser, expansion logic, datapackage metadata, and quick fixture data with Unicode Technical Report #38 revision 37. The release adds new reading fields, updates radical-stroke handling for apostrophe forms, and removes a field that was dropped upstream.

Breaking changes

kFrequency removed (#330)

kFrequency was removed from Unihan_DictionaryLikeData, constants, and datapackage metadata to match Unicode’s field removal.

What’s new

kFanqie and kZhuang support (#330)

The UNIHAN manifest, datapackage metadata, quick fixture data, and expansion rules now include kFanqie and kZhuang. Structured exports can expand both fields instead of leaving them as opaque strings.

kRSUnicode apostrophe forms (#330)

kRSUnicode handling was updated for apostrophe-bearing radical-stroke values from Unihan_IRGSources. The change fixes parsing for forms discussed in Unicode TR38 revision 37 and the related upstream issue trail.

Documentation

Plain-text links in docs are now automatically linkified.

Development

Tests were added for simplified field expansions covering kFanqie and kZhuang, and the development stack picked up Poetry 1.8.2 plus Ruff 0.4.2 style updates.

unihan-etl 0.34.0 (2024-03-24)

unihan-etl 0.34.0 is a maintenance release for linting and developer tooling.

Development

Aggressive Ruff cleanup (#317)

Ruff 0.3.4 was applied across the codebase with automated fixes, preview rules, and unsafe fixes where accepted by the project. CI also switched to the newer ruff check . invocation.

Dependency updates (#316)

Ruff moved from 0.2.2 to 0.3.0 and Poetry moved from 1.7.1 to 1.8.1.

unihan-etl 0.33.1 (2024-02-09)

unihan-etl 0.33.1 is a documentation-only maintenance release.

Documentation

The README introduction was rewritten and the UNIHAN compatibility notes were refreshed. The 0.31.0 changelog entry also gained a link to the relevant UNIHAN release context.

unihan-etl 0.33.0 (2024-02-09)

unihan-etl 0.33.0 is a maintenance release for documentation rendering and lint policy.

Documentation

Quoted CSV example highlighting (#314)

The docs CSV lexer now handles quoted items correctly, improving rendered CSV examples.

Development

Stricter Ruff rule set (#313)

Ruff gained additional flake8-compatible rule families, including comma, builtins, and error-message checks. The resulting fixes remove more style drift before review.

unihan-etl 0.32.0 (2024-02-05)

unihan-etl 0.32.0 improves rendered data examples and adds a small typing/test pass around UNIHAN expansion data.

What’s new

CSV and TSV example highlighting (#253)

Documentation examples for CSV and TSV output now use custom Pygments lexers, making table-shaped output easier to read in the docs.

kTGHZ2013 typing and examples (#312)

The kTGHZ2013 test data and typing were tightened, including an example with multiple items and a doctest that documents the expected shape.

Development

The release added types-pygments, local Pygments stubs used at the time, and pytest-watcher configuration to avoid noisy reruns on generated *.py.*py files.

unihan-etl 0.31.0 (2024-02-04)

unihan-etl 0.31.0 is the major UNIHAN compatibility refresh from Unicode 11.0-era data to Unicode 15.1. It adds fields introduced across Unicode 13.0 through 15.1, removes fields no longer published upstream, and relaxes tests that previously assumed older fixture sizes.

Breaking changes

UNIHAN compatibility moved to 15.1 (#309)

The manifest and expansion support now match the Unicode 15.1 UNIHAN field set. Removed fields include kHKSCS, kIRGDaiKanwaZiten, kKPS0, kKPS1, kKSC0, kKSC1, kRSKangXi, kRSJapanese, kRSKanWa, kRSKorean, and the private kDefaultSortKey.

What’s new

New fields from Unicode 13.0 through 15.1 (#309)

Newly supported fields include kIRG_SSource, kIRG_UKSource, kSpoofingVariant, kTGHZ2013, kUnihanCore2020, kStrange, kAlternateTotalStrokes, kJapanese, kMojiJoho, kSMSZD2003Index, kSMSZD2003Readings, kVietnameseNumeric, and kZhuangNumeric.

Development

Pytest tracebacks were quieted, and pytest plugin fixture assertions were relaxed so field removals no longer fail tests only because fixture zip or export sizes changed.

unihan-etl 0.30.1 (2023-12-10)

unihan-etl 0.30.1 fixes a radical-stroke parsing edge case.

Fixes

Double apostrophes in kRSUnicode values (#304)

The expansion logic now loads double-apostrophe values correctly when kRSUnicode data is parsed through kRSGeneric handling.

unihan-etl 0.30.0post0 (2023-11-26)

unihan-etl 0.30.0post0 is a CI and documentation follow-up to 0.30.0.

Documentation

Docstrings for unihan_etl.core.Packager and related download helpers were cleaned up.

Development

CodeQL moved from an advanced local configuration file to GitHub’s default Python setup.

unihan-etl 0.30.0 (2023-11-26)

unihan-etl 0.30.0 is a documentation-quality release. It enables NumPy-style docstring checks and fills in public module, class, function, and test-helper docstrings across the codebase.

Documentation

NumPy-style docstrings (#303)

Ruff’s pydocstyle rules now enforce the project’s NumPy docstring convention, and existing modules were updated to satisfy that standard.

Development

The development lint suite gained pydocstyle coverage through Ruff.

unihan-etl 0.29.0 (2023-11-19)

unihan-etl 0.29.0 is a packaging and formatting maintenance release.

Development

Pytest configuration moved to pyproject.toml (#299)

Pytest settings now live with the rest of the project configuration in pyproject.toml.

Formatting moved from Black to Ruff (#302)

ruff format replaced Black as the formatter, keeping Black-compatible formatting while removing a separate formatter dependency.

Poetry dependency groups corrected (#302)

Development dependencies moved from extras to Poetry dependency groups, matching Poetry’s documented model. Python 3.12 was also added to trove classifiers, and GitHub Actions dependencies were refreshed to quiet CI warnings.

unihan-etl 0.28.1 (2023-09-02)

unihan-etl 0.28.1 fixes a field-name typo found during a repository-wide typo sweep.

Fixes

kAccountingNumeric list handling (#296)

SPACE_DELIMITED_LIST_FIELDS now spells kAccountingNumeric correctly, so the field is recognized by the expansion configuration.

Development

The typo sweep used typos, and the Ruff ERA/eradicate rule was removed because its false positives were too noisy for this codebase.

unihan-etl 0.28.0 (2023-07-22)

unihan-etl 0.28.0 finishes the pytest fixture naming cleanup started in the previous release. Fixtures now consistently use the unihan_ prefix, making the plugin safer to enable in downstream test suites with their own generic fixture names.

Breaking changes

Pytest fixtures now use the unihan_ prefix (#296)

The quick dataset fixtures were renamed from names such as quick_unihan_path, quick_unihan_options, and quick_unihan_packager to names such as unihan_quick_path, unihan_quick_options, and unihan_quick_packager. Helper fixtures followed the same convention: for example, ensure_quick_unihan became unihan_ensure_quick.

The old TestPackager fixture was removed because the quick and full dataset fixtures cover the same test setup more explicitly.

Fixes

The pytest plugin’s zshrc skip condition was corrected so zsh-specific tests run under the intended shell conditions.

unihan-etl 0.27.0 (2023-07-18)

unihan-etl 0.27.0 moves quick fixture data into the package and starts the fixture-name cleanup.

Breaking changes

Quick fixture data moved into package data (#294)

The quick dataset moved from tests/fixtures to src/unihan_etl/data_files/quick, making the files available when the package is installed rather than only inside the source checkout.

Sample fixture names became quick fixture names (#294)

Fixtures with sample_ in the name were renamed to use quick_, matching the quick/full dataset language used by the plugin.

Development

Ruff quality rules were tightened and the resulting fixes were applied.

unihan-etl 0.26.0 (2023-07-09)

unihan-etl 0.26.0 introduces the pytest plugin’s cached UNIHAN datasets.

What’s new

Cached quick and full UNIHAN fixtures (#291)

The pytest plugin can now download UNIHAN.zip once and reuse the cached archive and extracted files across test runs. This avoids repeated download/extraction work for suites that need real UNIHAN data.

unihan-etl 0.25.2 (2023-07-08)

unihan-etl 0.25.2 rolls back the previous zsh fixture condition change.

Fixes

The zshrc fixture condition from 0.25.1 was reverted because the original behavior was correct.

unihan-etl 0.25.1 (2023-07-08)

unihan-etl 0.25.1 was rolled back by 0.25.2.

Fixes

This release attempted to fix the pytest plugin’s zshrc skip condition.

unihan-etl 0.25.0 (2023-07-01)

unihan-etl 0.25.0 is a maintenance release for linting and type aliases.

Development

Ruff gained additional lint rules, and automatic plus manual fixes were applied. The release also extracted reusable typing names for log levels and export formats.

unihan-etl 0.24.0 (2023-06-24)

unihan-etl 0.24.0 is a dependency maintenance release.

Dependencies

zhon 2.0 (#289)

zhon was updated from 1.1.5 to 2.0.0, resolving pytest warnings related to regular expressions. See the zhon 2.0.0 release notes.

unihan-etl 0.23.0 (2023-06-24)

unihan-etl 0.23.0 improves the internal app-directory helper used for cache and config paths.

Breaking changes

app_dirs moved under _internal (#287)

The app-directory helper moved from unihan_etl.app_dirs to unihan_etl._internal.app_dirs. This module is documented under App directories - unihan_etl._internal.app_dirs because it is not part of the primary public API surface.

What’s new

Configurable app directories (#287)

App directory values can now be overridden on a one-off basis, and template values can expand environment variables and user-home references through the standard os.path expansion behavior.
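
The expansion behavior mentioned here comes from the standard library. A minimal sketch, assuming a hypothetical `expand_template` helper and a made-up environment variable name, looks like this:

```python
import os


def expand_template(value: str) -> str:
    """Expand environment variables, then a leading ``~``, in a path template."""
    return os.path.expanduser(os.path.expandvars(value))


# UNIHAN_DEMO_HOME is a made-up variable used only for this example.
os.environ["UNIHAN_DEMO_HOME"] = "/tmp/demo"
print(expand_template("$UNIHAN_DEMO_HOME/cache"))
```

Note that `os.path.expandvars` leaves references to undefined variables unchanged instead of raising an error.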

Documentation

The internal app-directory docs gained doctest examples, and the API table-of-contents depth and section heading were cleaned up.

unihan-etl 0.22.1 (2023-06-18)

unihan-etl 0.22.1 fixes export destination path handling for the 0.22.x line.

Fixes

The configured destination path now replaces the output file extension correctly when export format changes.
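
The intended behavior can be pictured with `pathlib`'s `with_suffix`. The helper below is a hypothetical sketch of the idea, not the project's actual fix:

```python
from pathlib import PurePath, PurePosixPath


def with_format_extension(destination: PurePath, export_format: str) -> PurePath:
    """Return ``destination`` with its extension swapped to the export format."""
    return destination.with_suffix("." + export_format)


# A destination configured for CSV follows a switch to JSON output.
print(with_format_extension(PurePosixPath("/data/unihan.csv"), "json"))
```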

unihan-etl 0.22.0 (2023-06-17)

unihan-etl 0.22.0 reorganizes the public API around clearer module names and a typed options object. It also adds the first project doctests and initial pytest plugin documentation.

Breaking changes

Processing module renamed to core (#284)

unihan_etl.process was renamed to unihan_etl.core, which is where unihan_etl.core.Packager now lives.

Configuration moved to Options (#280)

Configuration moved from dictionary-style settings to the unihan_etl.options.Options dataclass. This gives both library users and the CLI a typed configuration object.

Documentation

Doctest examples were added for README and utility behavior, initial pytest plugin docs landed, the API docs split into multiple pages, and the docs Makefile watch target was fixed.

unihan-etl 0.21.1 (2023-06-18)

unihan-etl 0.21.1 backports the destination path fix for the 0.21.x line.

Fixes

The configured destination path now replaces the output file extension correctly when export format changes.

unihan-etl 0.21.0 (2023-06-12)

unihan-etl 0.21.0 is an internal modernization release for path and callback typing.

Development

Internal file locations now use pathlib.Path, and download callback types are expressed through typed protocols.
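
For readers unfamiliar with callback protocols, the sketch below shows the general technique. The `ProgressCallback` name and `report_all` driver are hypothetical; the three-argument shape is modeled on the `reporthook` accepted by `urllib.request.urlretrieve`, and the project's own protocol may differ.

```python
import typing as t


class ProgressCallback(t.Protocol):
    """Callable shape for a download progress hook (hypothetical name)."""

    def __call__(self, count: int, block_size: int, total_size: int) -> None: ...


def report_all(callback: ProgressCallback, blocks: int, block_size: int) -> None:
    """Invoke ``callback`` once per block, as a downloader might."""
    total = blocks * block_size
    for count in range(1, blocks + 1):
        callback(count, block_size, total)


# Any callable with a matching signature satisfies the protocol.
seen: list[tuple[int, int, int]] = []
report_all(lambda count, size, total: seen.append((count, size, total)), 2, 512)
```

A protocol lets a type checker verify the callback's parameters without requiring callers to subclass anything.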

unihan-etl 0.20.0 (2023-06-11)

unihan-etl 0.20.0 drops Python 3.7 and simplifies typing support around the standard library.

Breaking changes

Python 3.7 support dropped (#272)

Python 3.7 was dropped ahead of its June 27, 2023 end-of-life date. The project can now use standard-library TypedDict and Protocol support from Python 3.8+ rather than relying on compatibility shims.

Development

Imports now use the typing as t namespace style, and repeated type shapes were extracted into aliases where that made expansion tests and options code clearer.

unihan-etl 0.19.1 (2023-05-28)

unihan-etl 0.19.1 restores Black while the project continues its Ruff migration.

Development

Black was added back because Ruff had not yet fully replaced it for this project’s formatting needs.

unihan-etl 0.19.0 (2023-05-27)

unihan-etl 0.19.0 starts the move to Ruff-based linting and formatting.

Development

Ruff replaced the earlier Black, isort, flake8, and flake8-plugin stack for faster linting and style feedback. Poetry moved from 1.4.0 to 1.5.0, a pytest warning from zhon was addressed, and merge_dict typing was tightened.

unihan-etl 0.18.1 (2022-10-01)

unihan-etl 0.18.1 fixes packaging and CI maintenance issues after the source-layout migration.

Dependencies

PyYAML was added as a dependency.

Development

CI was split so release-only PyPI upload dependencies do not run on every test job, CodeQL was cleaned up, Poetry moved to 1.2.x, and coverage settings moved from .coveragerc into pyproject.toml.

unihan-etl 0.18.0 (2022-09-11)

unihan-etl 0.18.0 moves the package to a src/ layout and adopts gp-libs documentation helpers.

Documentation

The changelog and issue references render through gp-libs linkification helpers, autodoc table-of-contents rendering was fixed, and docs doctests began running through the gp-libs docutils doctest tooling.

Development

The source tree moved to src/, and linting gained flake8-bugbear plus flake8-comprehensions.

unihan-etl 0.17.2 (2022-08-21)

unihan-etl 0.17.2 is a documentation-linking maintenance release.

Documentation

The docs gained an updated vendored issue-linking helper, replacing the older sphinx-issues package.

unihan-etl 0.17.1 (2022-08-21)

unihan-etl 0.17.1 follows up on the strict typing work from 0.17.0.

Fixes

merge_dict() now handles the edge case where the destination key is missing, and download() handles local file paths correctly when the “download” source is already on disk.
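
To illustrate the missing-key edge case, here is a minimal recursive merge. `merge_nested` is a hypothetical stand-in written for this note and should not be read as the body of the library's `merge_dict()`.

```python
import typing as t


def merge_nested(base: dict[str, t.Any], additional: dict[str, t.Any]) -> dict[str, t.Any]:
    """Recursively merge ``additional`` into ``base`` and return ``base``."""
    for key, value in additional.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides hold a mapping for this key, so merge one level down.
            merge_nested(base[key], value)
        else:
            # Also covers the case where ``key`` is missing from ``base``.
            base[key] = value
    return base


merged = merge_nested({"a": {"x": 1}}, {"a": {"y": 2}, "b": {"z": 3}})
```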

unihan-etl 0.17.0 (2022-08-21)

unihan-etl 0.17.0 enables strict mypy validation.

Development

The codebase was annotated enough to run mypy with --strict, improving confidence in the ETL and expansion paths.

unihan-etl 0.16.0 (2022-08-20)

unihan-etl 0.16.0 adds cache-busting support and expands CI coverage for the typing work that followed.

What’s new

--no-cache (#259)

The CLI can now ignore cached zip and extracted files with --no-cache, forcing a fresh fetch/extract cycle.

Development

Python 3.8 and 3.9 were added to CI to prepare for stricter typing across supported Python versions.

unihan-etl 0.15.0 (2022-08-20)

unihan-etl 0.15.0 removes the remaining Python 2 compatibility layer.

Breaking changes

Python 2 compatibility modules and imports were removed. Python 2 had already been officially dropped in 0.12.0.

unihan-etl 0.14.0 (2022-08-16)

unihan-etl 0.14.0 modernizes supported Python versions and starts the current doctest/type-checking direction.

Breaking changes

Python 3.6 support was dropped.

What’s new

load_data now accepts lists of pathlib.Path objects in addition to lists of strings.

Documentation

The docs moved to the Furo theme, added a Quickstart page, and linked to cihai developer documentation.

Development

Python 3.10 support was added, Poetry 1.1 became the development baseline, tests moved from tmpdir to tmp_path, pyupgrade ran for Python 3.7+, and initial mypy plus doctest validation landed.

unihan-etl 0.13.0 (2021-06-16)

unihan-etl 0.13.0 converts the documentation source format.

Documentation

Documentation moved to Markdown.

unihan-etl 0.12.0 (2021-06-15)

unihan-etl 0.12.0 drops legacy Python support and refreshes packaging metadata.

Breaking changes

Support for Python 2.7 and Python 3.5 was removed, along with Python 2 modesets and __future__ compatibility imports.

Development

Black moved to 21.6b0 and trove classifiers were updated for Python 3.9.

unihan-etl 0.11.0 (2020-08-09)

unihan-etl 0.11.0 is a packaging and documentation infrastructure release.

Development

Packaging and publishing moved to Poetry, docs became self-hosted with additional metadata and icons, CI moved from Travis to GitHub Actions, and Makefiles were overhauled.

unihan-etl 0.10.4 (2020-08-05)

unihan-etl 0.10.4 fixes changelog links and packaging constraints.

Fixes

Changelog headings now produce working links, and the appdirs version constraint was relaxed.

Development

The project moved from Pipfile-based packaging to Poetry.

unihan-etl 0.10.3 (2019-08-18)

unihan-etl 0.10.3 fixes a visible CLI polish issue.

Fixes

The download progress bar no longer flickers during updates.

unihan-etl 0.10.2 (2019-08-17)

unihan-etl 0.10.2 is a packaging and compatibility maintenance release.

Development

The package gained project_urls, CHANGES moved back to plain reStructuredText for packaging compatibility, collection imports were made Python 2/3 compatible, and PEP 8 cleanups landed.

unihan-etl 0.10.1 (2017-09-08)

unihan-etl 0.10.1 improves package metadata and API docs.

Documentation

API docs gained code links.

Development

__version__ was added to unihan_etl.

unihan-etl 0.10.0 (2017-08-29)

unihan-etl 0.10.0 adds Unicode 11-era UNIHAN fields and refreshes the project infrastructure used to build, test, and document the package.

What’s new

UNIHAN Revision 25 fields (#91)

Support was added for fields from UNIHAN Revision 25, including kJinmeiyoKanji, kJoyoKanji, kKoreanEducationHanja, kKoreanName, and kTGH.

Documentation

Documentation moved to NumPy-style docstrings.

Development

The release added tests and example corpus data for kCCCII, configured isort and flake8, added Pipfile-based development workflows, updated developer dependencies, added sphinxcontrib-napoleon, and changed the license from BSD to MIT for future cihai software foundation contributions.

unihan-etl 0.9.5 (2017-06-26)

unihan-etl 0.9.5 improves structured expansion support for dictionary locations.

Fixes

Location parsing for kHDZRadBreak fields was improved.

unihan-etl 0.9.4 (2017-06-05)

unihan-etl 0.9.4 fixes several field-expansion edge cases.

Fixes

kIRG_GSource values without a location now expand correctly, kFenn output was fixed, and kHanyuPinlu support handles n diacritics correctly.

unihan-etl 0.9.3 (2017-05-31)

unihan-etl 0.9.3 expands support for IRG source fields.

What’s new

Expansion support was added for kIRGKangXi.

unihan-etl 0.9.2 (2017-05-31)

unihan-etl 0.9.2 normalizes more structured expansion outputs.

Fixes

Radical-stroke expansion for kRSUnicode was normalized, more field expansions moved to regular expressions, and character fields were normalized for dictionary and index fields including kDaeJaweon, kHanyuPinyin, kCheungBauer, kFennIndex, kCheungBauerIndex, kIICore, and kIRGHanyuDaZidian.

unihan-etl 0.9.1 (2017-05-27)

unihan-etl 0.9.1 adds another structured expansion and refactors parsing internals.

What’s new

Expansion support was added for kGSR, and several field expansions moved to regex-based parsing.

unihan-etl 0.9.0 (2017-05-26)

unihan-etl 0.9.0 is the rename and ETL feature release. The project moved from unihan-tabular to unihan-etl and gained the core export-shaping options that still define the tool.

What’s new

Multi-value expansion and empty-field pruning

Exports can now expand multi-value UNIHAN fields into structured values and prune empty fields from output.
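
A simplified picture of the two options, assuming whitespace-delimited values and made-up field names (`kFieldA`, `kFieldB`), might look like the following. The real expansion rules are field-specific and considerably richer than a plain split.

```python
def shape_record(record: dict[str, str], list_fields: set[str]) -> dict[str, object]:
    """Expand listed fields into lists and drop fields with empty values."""
    shaped: dict[str, object] = {}
    for field, value in record.items():
        if not value:
            # Prune empty fields from the output.
            continue
        # Expand multi-value fields; leave all other values as strings.
        shaped[field] = value.split() if field in list_fields else value
    return shaped


row = {"char": "好", "kFieldA": "one two", "kFieldB": ""}
shaped_row = shape_record(row, {"kFieldA"})
```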

Fixes

The first-run destination bug that created a directory instead of an output file was fixed.

Documentation

The docs gained a page explaining UNIHAN and the project.

Development

Constants and expansion behavior were split into dedicated modules.

unihan-etl 0.8.1 (2017-05-20)

unihan-etl 0.8.1 updates field metadata for Unicode 8.0.0.

What’s new

kJa was added and kCompatibilityVariant source-file metadata was adjusted for Unicode 8.0.0.

unihan-etl 0.8.0 (2017-05-17)

unihan-etl 0.8.0 replaces direct printing with library logging.

What’s new

Logging can now be configured through options and the CLI.

Development

Internal diagnostics moved from print() calls to loggers.
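
The general pattern is shown below with a hypothetical logger name and message. The point of the sketch is that the library only emits records, while the caller decides the level and destination.

```python
import io
import logging

logger = logging.getLogger("demo_etl")  # hypothetical logger name


def export(destination: str) -> None:
    """Log progress through a logger where a print() call might once have been."""
    logger.info("Exporting data to %s", destination)


# The caller configures where records go and which levels are shown.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
export("unihan.csv")
```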

unihan-etl 0.7.4 (2017-05-14)

unihan-etl 0.7.4 improves local development and offline workflows.

What’s new

UNIHAN zip sources can now be local filesystem paths, and already-extracted archives are not extracted again unnecessarily.

unihan-etl 0.7.3 (2017-05-13)

unihan-etl 0.7.3 is a package metadata refresh.

Development

Package classifiers were updated.

unihan-etl 0.7.2 (2017-05-13)

unihan-etl 0.7.2 restores datapackage output metadata.

Fixes

The datapackage file was added back.

unihan-etl 0.7.1 (2017-05-12)

unihan-etl 0.7.1 fixes CSV behavior and defaults.

Fixes

CSV output works correctly on Python 2 again, and CSV became the default export format.

unihan-etl 0.7.0 (2017-05-12)

unihan-etl 0.7.0 improves output location handling and platform directory support.

What’s new

The project now depends on unicodecsv, supports XDG directory conventions, and allows custom destination templates using {ext} replacement.
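
The {ext} placeholder works like an ordinary `str.format` field. A minimal sketch, with a hypothetical helper name and a made-up path:

```python
def render_destination(template: str, export_format: str) -> str:
    """Fill the ``{ext}`` placeholder in a destination template."""
    return template.format(ext=export_format)


print(render_destination("/tmp/unihan.{ext}", "csv"))
```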

unihan-etl 0.6.3 (2017-05-11)

unihan-etl 0.6.3 fixes package metadata placement.

Fixes

The __about__.py metadata file moved to the module level.

unihan-etl 0.6.2 (2017-05-11)

unihan-etl 0.6.2 fixes a package import issue.

Fixes

Python package imports work again.

unihan-etl 0.6.1 (2017-05-10)

unihan-etl 0.6.1 fixes the PyPI README rendering.

Fixes

The README renders correctly on PyPI.

unihan-etl 0.6.0 (2017-05-10)

unihan-etl 0.6.0 adds structured export formats.

What’s new

Exports can now be written as YAML and JSON in addition to CSV, and the library can return data as a list for Python callers.

Development

Internals were factored and simplified around the new export paths.

unihan-etl 0.5.1 (2017-05-08)

unihan-etl 0.5.1 drops old Python 3 minors.

Breaking changes

Support for Python 3.3 and Python 3.4 was dropped.

unihan-etl 0.5.0 (2017-05-08)

unihan-etl 0.5.0 is the rename to the unihan-tabular package and the move away from datapackage-first exports.

Breaking changes

The package was renamed from cihaidata_unihan to unihan_tabular.

What’s new

Exports now use a universal JSON, YAML, and CSV model rather than datapackages as the primary output. Python 2 CSV output only uses UnicodeWriter where necessary, avoiding byte-prefix artifacts in values.

unihan-etl 0.4.2 (2017-05-07)

unihan-etl 0.4.2 continues the early package rename work.

Development

The scripts/ directory was renamed to cihaidata_unihan/.

unihan-etl 0.4.1 (2017-05-07)

unihan-etl 0.4.1 restores the command-line entry point for the early package name.

Fixes

The tool can be invoked as cihaidata_unihan.

unihan-etl 0.4.0 (2017-05-07)

unihan-etl 0.4.0 is an early test and packaging cleanup release.

Development

The release refactored the internals extensively, converted tests to pytest functions and fixtures, restored the CLI documentation, improved test coverage, removed unused imports, and switched the license from BSD to MIT.

unihan-etl 0.3.0 (2017-04-17)

unihan-etl 0.3.0 is the project reboot release.

What’s new

The reboot modernized the root and docs Makefiles, refreshed package metadata, split requirements into base/test/docs groups, updated docs styling, moved project links to HTTPS, added Travis coverage, tested up to Python 3.6, added PyPy coverage, locked base dependencies, and added development dependencies for isort, vulture, and flake8.