@chelsea-lin
Contributor

@chelsea-lin chelsea-lin commented Mar 5, 2025

We initially implemented a local pandas extension type (db_dtypes.JSONDtype) for handling JSON data. The Arrow project subsequently introduced a native JSON data type in pyarrow after v19.0, and we have opted to adopt that native type as our primary solution (see go/bf-json2 for the internal design document). To stay compatible with older pyarrow versions, we have been using a custom Arrow extension type (db_dtypes.JSONArrowType) as a fallback. This change switches JSON_DTYPE from the pandas extension to that custom Arrow extension, as a stepping stone toward fully adopting the native pyarrow JSON type.

Release-As: 1.40.0

  • Fixes internal issue 401054811
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@chelsea-lin chelsea-lin requested review from a team as code owners March 5, 2025 23:04
@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Mar 5, 2025
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Mar 5, 2025
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_jsonarrowtypeonly branch from 7f8d18c to 5dcbdc0 Compare March 6, 2025 00:06
@GarrettWu GarrettWu removed their assignment Mar 6, 2025
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_jsonarrowtypeonly branch from 5dcbdc0 to 94ef33b Compare March 6, 2025 06:12
@tswast
Collaborator

tswast commented Mar 6, 2025

Getting some test failures in presubmit:

FAILED tests/system/small/test_dataframe.py::test_df_drop_duplicates_w_json[first]
FAILED tests/system/small/test_dataframe.py::test_df_drop_duplicates_w_json[last]
FAILED tests/system/small/test_dataframe.py::test_df_drop_duplicates_w_json[False]
3 failed, 2715 passed, 16 skipped, 43 xfailed, 2 xpassed, 418 warnings in 1083.21s (0:18:03)

Also, could we add a Release-As: footer to the final commit message so that this doesn't trigger the 2.0 release? See: https://github.com/googleapis/release-please/blob/main/README.md#how-do-i-change-the-version-number

@chelsea-lin chelsea-lin force-pushed the main_chelsealin_jsonarrowtypeonly branch from 94ef33b to 5a2baa6 Compare March 6, 2025 23:55
@chelsea-lin chelsea-lin force-pushed the main_chelsealin_jsonarrowtypeonly branch from 65770f3 to a2edcbf Compare March 7, 2025 00:08
Collaborator

@tswast tswast left a comment

Thanks! One suggestion, but otherwise looks good.

  GEO_DTYPE = gpd.array.GeometryDtype()
  # JSON
- JSON_DTYPE = db_dtypes.JSONDtype()
+ JSON_DTYPE = pd.ArrowDtype(db_dtypes.JSONArrowType())
Collaborator

Should we switch to pd.ArrowDtype(pyarrow.json_(pyarrow.string())) if pyarrow.json_ is available?

Also, it would be good to align with OBJ_REF_DTYPE by creating a JSON_ARROW_TYPE variable that can be reused here and elsewhere.
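
A rough sketch of that suggestion, not the code in this PR: it picks the native type when `pyarrow.json_` exists and falls back to the custom extension otherwise. The helper name `select_json_arrow_type` and the stand-in objects are hypothetical; the stand-ins only let the sketch run without pyarrow or db_dtypes installed.

```python
from types import SimpleNamespace


def select_json_arrow_type(pa_module, fallback_factory):
    """Return the native Arrow JSON type if pa_module provides json_.

    pa_module: the imported pyarrow module (or an object shaped like it).
    fallback_factory: zero-argument callable that builds the fallback
        extension type, e.g. db_dtypes.JSONArrowType.
    """
    json_factory = getattr(pa_module, "json_", None)
    if json_factory is not None:
        # Newer pyarrow: build the native JSON type with string storage.
        return json_factory(pa_module.string())
    # Older pyarrow: keep using the custom Arrow extension type.
    return fallback_factory()


# Hypothetical stand-ins for a newer and an older pyarrow module.
newer_pyarrow = SimpleNamespace(
    json_=lambda storage: ("native_json", storage),
    string=lambda: "string",
)
older_pyarrow = SimpleNamespace(string=lambda: "string")

native = select_json_arrow_type(newer_pyarrow, lambda: "custom_extension")
fallback = select_json_arrow_type(older_pyarrow, lambda: "custom_extension")

# Assumed wiring in the dtypes module (requires pandas, pyarrow, db_dtypes):
# JSON_ARROW_TYPE = select_json_arrow_type(pyarrow, db_dtypes.JSONArrowType)
# JSON_DTYPE = pd.ArrowDtype(JSON_ARROW_TYPE)
```

Passing the module in as an argument keeps the version check in one place and makes both branches easy to exercise in a unit test.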

Contributor Author

I will follow up with b/401055693. Thanks for reviewing!

@tswast tswast enabled auto-merge (squash) March 10, 2025 21:16
@tswast tswast merged commit e720f41 into main Mar 11, 2025
22 of 23 checks passed
@tswast tswast deleted the main_chelsealin_jsonarrowtypeonly branch March 11, 2025 19:13