feat: add bigframes.bigquery.ml methods
#2300
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Return the created model object using `read_gbq_model`.
bigframes/core/sql/ml.py (outdated)
```python
    training_data: Optional[str] = None,
    custom_holiday: Optional[str] = None,
) -> str:
    """Encode the CREATE MODEL statement."""
```
@google-labs-jules Add a link to https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create for reference.
Done. Added the link.
bigframes/core/sql/ml.py (outdated)
```python
    output_schema: Optional[Mapping[str, str]] = None,
    connection_name: Optional[str] = None,
    options: Optional[Mapping[str, Union[str, int, float, bool, list]]] = None,
    query_statement: Optional[str] = None,
```
Remove `query_statement`. Instead, if `training_data` is specified and `custom_holiday` is not, use `training_data` the way `query_statement` is used now.
Done. Removed query_statement and updated the logic to use training_data as the main query if custom_holiday is not present.
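The resulting `AS` clause logic can be sketched roughly as below. This is a minimal sketch, not the code merged in this PR: the helper name `as_clause` is hypothetical, and the two-part form `AS (training_data AS (...), custom_holiday AS (...))` is assumed from the BigQuery ML `CREATE MODEL` syntax reference for time-series models with custom holidays.

```python
from typing import Optional


def as_clause(training_data: Optional[str], custom_holiday: Optional[str] = None) -> str:
    """Render the AS clause of a CREATE MODEL statement (hypothetical helper)."""
    if training_data is None:
        if custom_holiday is not None:
            # A holiday query only makes sense alongside training data.
            raise ValueError("custom_holiday requires training_data")
        # No AS clause at all, e.g. for models created without training data.
        return ""
    if custom_holiday is None:
        # training_data plays the role the removed query_statement used to.
        return f"AS ({training_data})"
    # Two-part form used when custom holidays are supplied.
    return (
        "AS (\n"
        f"  training_data AS ({training_data}),\n"
        f"  custom_holiday AS ({custom_holiday})\n"
        ")"
    )
```

Returning an empty string when there is no training data is a design assumption of this sketch, intended for models (such as remote models) that are created without an `AS` clause.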
bigframes/bigquery/_operations/ml.py (outdated)
```python
    # Helper to convert DataFrame to SQL string
    def _to_sql(df_or_sql: Union[dataframe.DataFrame, str]) -> str:
        if isinstance(df_or_sql, str):
            return df_or_sql
        # It's a DataFrame
        sql, _, _ = df_or_sql._to_sql_query(include_index=True)
        return sql
```
Move this outside of the function body.
Done. Moved _to_sql outside create_model.
bigframes/bigquery/_operations/ml.py (outdated)
```python
    if isinstance(df_or_sql, str):
        return df_or_sql
    # It's a DataFrame
    sql, _, _ = df_or_sql._to_sql_query(include_index=True)
```
Don't include the index in the training data.
Done. Updated _to_sql to use include_index=False.
bigframes/bigquery/_operations/ml.py (outdated)
```python
    output_schema: Optional[Mapping[str, str]] = None,
    connection_name: Optional[str] = None,
    options: Optional[Mapping[str, Union[str, int, float, bool, list]]] = None,
    query: Optional[Union[dataframe.DataFrame, str]] = None,
```
Remove the query argument.
Done. Removed the query argument from create_model.
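The `options` parameter in the excerpt above accepts `str`, `int`, `float`, `bool`, or `list` values, which have to be rendered as GoogleSQL literals inside `OPTIONS(...)`. One possible rendering is sketched below. The names `format_option_value` and `options_clause` are hypothetical, and the string escaping handles only backslashes and single quotes.

```python
from typing import Mapping, Union

OptionValue = Union[str, int, float, bool, list]


def format_option_value(value: OptionValue) -> str:
    """Render a Python value as a GoogleSQL literal (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        # Assumes finite numbers; repr(float("inf")) would not be valid SQL.
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, list):
        # Lists become array literals, rendering each element recursively.
        return "[" + ", ".join(format_option_value(v) for v in value) + "]"
    raise TypeError(f"Unsupported option type: {type(value).__name__}")


def options_clause(options: Mapping[str, OptionValue]) -> str:
    """Render an OPTIONS(...) clause in the mapping's iteration order."""
    if not options:
        return ""
    rendered = ",\n  ".join(
        f"{key} = {format_option_value(val)}" for key, val in options.items()
    )
    return f"OPTIONS(\n  {rendered}\n)"
```

Without the `bool` check coming first, booleans would be handled by the numeric branch instead of being rendered as `TRUE`/`FALSE`.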
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Return the created model object using `read_gbq_model`.
- Remove `query` argument, simplify SQL generation logic.
@google-labs-jules Lint and Mypy are currently failing. Please run
I have addressed the linting and mypy errors. I ran
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Return the created model object using `read_gbq_model`.
- Remove `query` argument, simplify SQL generation logic.
- Fix linting and mypy errors.
@google-labs-jules I still see these failures: Looks like you might be missing an import.
I have run
bigframes.bigquery.create_model method
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Return the created model object using `read_gbq_model`.
- Remove `query` argument, simplify SQL generation logic.
- Fix linting and mypy errors.
- Add docstrings.
This reverts commit b809f81.
- Refactor `bigframes.core.sql` to a package.
- Add `bigframes.core.sql.ml` for DDL generation.
- Add `bigframes.bigquery.ml` module with `create_model` function.
- Add unit tests for SQL generation.
- Use `_start_query_ml_ddl` for execution.
- Return the created model object using `read_gbq_model`.
- Remove `query` argument, simplify SQL generation logic.
- Fix linting and mypy errors.
- Add docstrings.
The notebook failure is in an unrelated notebook; it looks like flakiness introduced by our use of Anywidget. Filed b/465768150 for investigation.

The e2e failures look like they might be caused by a pandas 3.0 prerelease. I'll tackle that in a separate PR.
PR created by the Librarian CLI to initialize a release. Merging this PR will automatically trigger a release.

Librarian Version: v0.7.0
Language Image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:c8612d3fffb3f6a32353b2d1abd16b61e87811866f7ec9d65b59b02eb452a620

<details><summary>bigframes: 2.31.0</summary>

## [2.31.0](v2.30.0...v2.31.0) (2025-12-10)

### Features

* add `bigframes.bigquery.ml` methods (#2300) ([719b278](719b278c))
* add 'weekday' property to DatetimeMethod (#2304) ([fafd7c7](fafd7c73))

### Bug Fixes

* cache DataFrames to temp tables in bigframes.bigquery.ml methods to avoid time travel (#2318) ([d993831](d9938319))

### Reverts

* DataFrame display uses IPython's `_repr_mimebundle_` (#2316) ([e4e3ec8](e4e3ec85))

</details>
This PR adds support for the `CREATE MODEL` statement in BigQuery ML via `bigframes.bigquery.ml.create_model`. It includes DDL generation logic that handles various clauses, such as `TRANSFORM` and `OPTIONS`, as well as remote models and different training-data input formats.

It also refactors `bigframes.core.sql` into a package to support the new submodule.

PR created automatically by Jules for task 3846335972146851433 started by @tswast
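The clause handling described above can be pictured with the following sketch, which is an assumption-laden illustration rather than the PR's implementation. Clause order (`TRANSFORM`, `INPUT`/`OUTPUT`, `REMOTE WITH CONNECTION`, `OPTIONS`, `AS`) is assumed from the BigQuery ML `CREATE MODEL` syntax reference. The function name `create_model_ddl` and its `replace`, `if_not_exists`, and `input_schema` parameters are hypothetical, and `options` values are assumed to be already-rendered SQL literals.

```python
from typing import Mapping, Optional, Sequence


def create_model_ddl(
    model_name: str,
    *,
    replace: bool = False,
    if_not_exists: bool = False,
    transform: Optional[Sequence[str]] = None,
    input_schema: Optional[Mapping[str, str]] = None,
    output_schema: Optional[Mapping[str, str]] = None,
    connection_name: Optional[str] = None,
    options: Optional[Mapping[str, str]] = None,
    training_data: Optional[str] = None,
) -> str:
    """Assemble a CREATE MODEL statement (hypothetical sketch).

    Option values are assumed to be pre-rendered SQL literals.
    """
    if replace and if_not_exists:
        raise ValueError("replace and if_not_exists are mutually exclusive")
    if replace:
        head = "CREATE OR REPLACE MODEL"
    elif if_not_exists:
        head = "CREATE MODEL IF NOT EXISTS"
    else:
        head = "CREATE MODEL"
    # Backtick-quote the full model path as a single identifier.
    parts = [f"{head} `{model_name}`"]
    if transform:
        parts.append(f"TRANSFORM ({', '.join(transform)})")
    if input_schema:
        cols = ", ".join(f"{name} {typ}" for name, typ in input_schema.items())
        parts.append(f"INPUT ({cols})")
    if output_schema:
        cols = ", ".join(f"{name} {typ}" for name, typ in output_schema.items())
        parts.append(f"OUTPUT ({cols})")
    if connection_name:
        # Remote models reference a BigQuery connection instead of training data.
        parts.append(f"REMOTE WITH CONNECTION `{connection_name}`")
    if options:
        opts = ", ".join(f"{key} = {val}" for key, val in options.items())
        parts.append(f"OPTIONS({opts})")
    if training_data:
        parts.append(f"AS ({training_data})")
    return "\n".join(parts)
```

Because each clause is appended only when its argument is set, one function can cover both trained models (with `AS`) and remote models (with `REMOTE WITH CONNECTION`).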