
Commit 67d076d

Document inference w/ preprocessing (#520)
Co-authored-by: Montana Low <montana.low@gmail.com>
1 parent 8220b53 commit 67d076d

2 files changed, 10 additions, 1 deletion

pgml-docs/docs/user_guides/training/preprocessing.md

Lines changed: 9 additions & 1 deletion
@@ -29,7 +29,7 @@ There are 3 steps to preprocessing data:
These preprocessing steps may be specified on a per-column basis to the [train()](/user_guides/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.

```postgresql title="pgml.train()"
- select pgml.train(
+ SELECT pgml.train(
project_name => 'preprocessed_model',
task => 'classification',
relation_name => 'weather_data',
@@ -52,6 +52,14 @@ In some cases, it may make sense to use multiple steps for a single column. For
!!! note
TEXT is used in this document to also refer to VARCHAR and CHAR(N) types.

+ ## Predicting with Preprocessors
+
+ A model that has been trained with preprocessors should use a Postgres tuple for prediction, rather than a `FLOAT4[]`. Tuples may contain multiple different types (like `TEXT` and `BIGINT`), while an `ARRAY` may contain only a single type. You can use parentheses around values to create a Postgres tuple.
+
+ ```postgresql title="pgml.predict()"
+ SELECT pgml.predict('preprocessed_model', ('jan', 'nimbus', 0.5, 7));
+ ```
+
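A minimal sketch in plain PostgreSQL (no PostgresML calls) of the array-versus-tuple distinction described in the new section. The composite type `weather_features` and its field names are hypothetical, chosen only to show that each field of a row value keeps its own type; they are not taken from the `weather_data` table used for training.

```postgresql
-- An ARRAY needs a single element type. This line is left commented out because it
-- is expected to fail: Postgres tries to coerce 'jan' to the numeric type of 0.5 and 7.
-- SELECT ARRAY['jan', 'nimbus', 0.5, 7];

-- Parentheses around the values build a row value (tuple) instead.
SELECT pg_typeof(('jan', 'nimbus', 0.5, 7));  -- returns: record

-- Each field keeps its own type. The type and field names below are hypothetical.
CREATE TYPE weather_features AS (month TEXT, clouds TEXT, humidity REAL, wind_speed BIGINT);
SELECT (('jan', 'nimbus', 0.5, 7)::weather_features).humidity;
```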
## Categorical encodings
Encoding categorical variables is an O(N log(M)) operation, where N is the number of rows and M is the number of distinct categories.

pgml-docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -127,6 +127,7 @@ nav:
- Training:
- Training Overview: user_guides/training/overview.md
- Algorithm Selection: user_guides/training/algorithm_selection.md
+ - Preprocessing Data: user_guides/training/preprocessing.md
- Hyperparameter Search: user_guides/training/hyperparameter_search.md
- Joint Optimization: user_guides/training/joint_optimization.md
- Predictions:
