Skip to content

Conversation

@tswast
Copy link
Collaborator

@tswast tswast commented May 16, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue b/329460931 🦕

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels May 16, 2025
@tswast
Copy link
Collaborator Author

tswast commented May 16, 2025

  • Missed Index.rename. Will mark as ready once that's up.

Edit. Done!

@tswast tswast marked this pull request as ready for review May 16, 2025 22:16
@tswast tswast requested review from a team as code owners May 16, 2025 22:16
@tswast tswast requested a review from TrevorBergeron May 16, 2025 22:16
@tswast
Copy link
Collaborator Author

tswast commented May 19, 2025

Re: failures

____________________________ test_loc_list_nameless ____________________________
[gw2] linux -- Python 3.9.20 /tmpfs/src/github/python-bigquery-dataframes/.nox/system-3-9/bin/python

scalars_df_index =           bool_col                                          bytes_col  \
rowindex                                     ....999999+00:00  
8                         T             <NA>                              <NA>  

[9 rows x 13 columns]
scalars_pandas_df_index =           bool_col  ...                     timestamp_col
rowindex            ...                                  
0 ... ...  2038-01-19 03:14:17.999999+00:00
8            False  ...                              <NA>

[9 rows x 13 columns]

    def test_loc_list_nameless(scalars_df_index, scalars_pandas_df_index):
        index_list = [0, 0, 0, 5, 4, 7]
    
        bf_series = scalars_df_index.string_col.rename(None)
        bf_result = bf_series.loc[index_list]
    
        pd_series = scalars_pandas_df_index.string_col.rename(None)
        pd_result = pd_series.loc[index_list]
    
>       pd.testing.assert_series_equal(
            bf_result.to_pandas(),
            pd_result,
        )
E       AssertionError: Series.index are different
E       
E       Attribute "names" are different
E       [left]:  [None]
E       [right]: ['rowindex']

I think I need to call .copy() before I do anything inplace.

Edit: this hypothesis seems to be incorrect. I don't add any system tests for inplace, only unit tests. Instead, it was the problem identified with tuples.

)

def rename(self, name: Union[str, Sequence[str]]) -> Index:
names = [name] if isinstance(name, str) else list(name)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change to Label (Hashable) is what's breaking some MultiIndex tests. Need a better way to check if something is already a sequence / iterable and not a string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 45f9232

I think we want to support integer names, too, so instead I specifically exclude tuple from the check for hashable objects.

TrevorBergeron
TrevorBergeron previously approved these changes May 19, 2025
@tswast
Copy link
Collaborator Author

tswast commented May 19, 2025

e2e failures:

FAILED tests/system/small/ml/test_linear_model.py::test_linear_reg_model_global_explain
FAILED tests/system/small/ml/test_preprocessing.py::test_label_encoder_series_default_params
FAILED tests/system/small/ml/test_preprocessing.py::test_one_hot_encoder_save_load
FAILED tests/system/small/ml/test_preprocessing.py::test_standard_scaler_normalizes
FAILED tests/system/small/ml/test_preprocessing.py::test_label_encoder_params
FAILED tests/system/small/ml/test_preprocessing.py::test_standard_scaler_normalizeds_fit_transform
FAILED tests/system/small/ml/test_llm.py::test_llm_gemini_pro_score_params[gemini-1.5-pro-002]

All of these seem to be timeout related with BQML integration. I don't think it should have been caused by my changes.

Presubmits are still running, but the 3.9 system tests passed.

@tswast tswast enabled auto-merge (squash) May 19, 2025 23:21
@tswast tswast changed the title feat: support inplace=True in rename and rename_axis feat: support inplace=True in rename and rename_axis May 19, 2025
@tswast tswast merged commit 734cc65 into main May 30, 2025
18 of 24 checks passed
@tswast tswast deleted the b329460931-rename-inplace branch May 30, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants