@stvoutsin
Contributor

Description

This draft PR shows one possible implementation of VOTable 1.6 UTF-8 character encoding support in the Char converter. VOTable 1.6 changed the semantics of the char datatype: arraysize now specifies a length in bytes (not a character count), and UTF-8 encoding is explicitly supported.
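
To illustrate why the byte-length semantics matter, here is a small Python sketch (the helper name `utf8_byte_length` is hypothetical and not part of astropy): a string's character count and its UTF-8 byte length diverge as soon as it contains non-ASCII characters, and under the VOTable 1.6 semantics described above it is the byte length that has to fit within arraysize.

```python
def utf8_byte_length(value: str) -> int:
    """Return the number of bytes ``value`` occupies when encoded as UTF-8.

    Hypothetical helper for illustration only: under VOTable 1.6 semantics
    this is the quantity compared against a char field's arraysize.
    """
    return len(value.encode("utf-8"))


# Character count vs. UTF-8 byte length.
print(len("abc"), utf8_byte_length("abc"))    # 3 3  (ASCII: 1 byte each)
print(len("café"), utf8_byte_length("café"))  # 4 5  ("é" takes 2 bytes)
print(len("Ω"), utf8_byte_length("Ω"))        # 1 2
```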

I'm opening this early for visibility and iteration, and I'm looking for feedback on the design decisions and the current implementation.

Changes

  • Modified the Char converter to detect VOTable 1.6+ via the version_1_6_or_later config flag
  • Used NumPy object arrays (format="O") for UTF-8 fields
  • Updated the binary I/O methods (_binparse_fixed, _binoutput_fixed)
  • Fixed config passing through converter initialization
  • Added UTF-8-aware truncation in _binoutput_fixed to prevent incomplete multi-byte sequences
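
The UTF-8-aware truncation and fixed-width round trip described in the list above can be sketched in plain Python and NumPy as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function names (`truncate_utf8`, `pack_fixed`, `unpack_fixed`, `parse_column`) are hypothetical, and fixed-width fields are assumed to be NUL-padded to exactly arraysize bytes. The object array mirrors the format="O" choice, since NumPy's fixed-width "U" dtype is sized in characters rather than bytes.

```python
import numpy as np


def truncate_utf8(value: str, max_bytes: int) -> bytes:
    """Encode ``value`` as UTF-8, keeping at most ``max_bytes`` bytes
    without splitting a multi-byte character."""
    encoded = value.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    cut = max_bytes
    # UTF-8 continuation bytes match 10xxxxxx. If the first dropped byte is
    # one, the cut splits a character, so back up to that character's start.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]


def pack_fixed(value: str, arraysize: int) -> bytes:
    """Produce exactly ``arraysize`` bytes: truncated UTF-8 plus NUL padding
    (the padding convention is an assumption of this sketch)."""
    data = truncate_utf8(value, arraysize)
    return data + b"\x00" * (arraysize - len(data))


def unpack_fixed(raw: bytes) -> str:
    """Decode a fixed-width field, ignoring everything from the first NUL."""
    end = raw.find(b"\x00")
    if end != -1:
        raw = raw[:end]
    return raw.decode("utf-8")


def parse_column(fields: list[bytes]) -> np.ndarray:
    """Decode fixed-width fields into a NumPy object array of Python str,
    analogous to using format="O" for UTF-8 char columns."""
    out = np.empty(len(fields), dtype=object)
    for i, raw in enumerate(fields):
        out[i] = unpack_fixed(raw)
    return out


# Usage: "é" needs 2 bytes, so only "caf" fits in 4 bytes.
print(pack_fixed("café", 4))  # b'caf\x00'
col = parse_column([pack_fixed("café", 5), pack_fixed("Ωmega", 4)])
print(list(col))  # ['café', 'Ωme']
```

Backing up over continuation bytes guarantees the cut lands on a character boundary, so bytes written this way always decode cleanly on the way back in.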

Fixes #18515

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

@github-actions
Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If not, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If not, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@github-actions
Contributor

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@pllim added this to the v8.0.0 milestone Nov 25, 2025
@stvoutsin changed the title from "votable: Add VOTable 1.6 UTF-8 char encoding and fix UnicodeChar binary handling" to "votable: Add support for VOTable 1.6 UTF-8 char encoding" Nov 25, 2025
@stvoutsin force-pushed the u/stvoutsin/unicode-char-18515 branch 2 times, most recently from 4e118f2 to d8e0bf3 November 25, 2025 16:54
@stvoutsin force-pushed the u/stvoutsin/unicode-char-18515 branch from d8e0bf3 to 99abec6 November 25, 2025 16:55
@bsipocz
Member

bsipocz commented Dec 24, 2025

Thank you @stvoutsin, I totally missed this!

I'm adding an xref to ivoa-std/VOTable#71, and also adding the keep-open label: my understanding is that we use this PR as a reference implementation but don't merge it until the upstream decision has been made, or maybe even until it goes through the TCG.

@pllim added the "Upstream Action Required" label (was: "Upstream Fix Required") Dec 24, 2025

Development

Successfully merging this pull request may close these issues.

VOTable export should default to char datatype instead of unicodeChar for string columns

3 participants