diff --git a/pgml-cms/docs/SUMMARY.md b/pgml-cms/docs/SUMMARY.md
index 84e656fcb..bfc9ef6a1 100644
--- a/pgml-cms/docs/SUMMARY.md
+++ b/pgml-cms/docs/SUMMARY.md
@@ -36,7 +36,7 @@
   * [pgml.tune()](introduction/apis/sql-extensions/pgml.tune.md)
 * [Client SDKs](introduction/apis/client-sdks/README.md)
   * [Overview](introduction/apis/client-sdks/getting-started.md)
-  * [Collections](../../pgml-docs/docs/guides/sdks/collections.md)
+  * [Collections](introduction/apis/client-sdks/collections.md)
   * [Pipelines](introduction/apis/client-sdks/pipelines.md)
   * [Search](introduction/apis/client-sdks/search.md)
   * [Tutorials](introduction/apis/client-sdks/tutorials/README.md)
diff --git a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md
index e24dabf05..e5c52f793 100644
--- a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md
+++ b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.deploy.md
@@ -26,11 +26,11 @@ pgml.deploy(
 
 There are 3 different deployment strategies available:
 
-| Strategy      | Description                                                                                                           |
-| ------------- | --------------------------------------------------------------------------------------------------------------------- |
-| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics.                      |
-| `best_score`  | The model that achieved the best key metric score is immediately deployed.                                            |
-| `rollback`    | The model that was last deployed for this project is immediately redeployed, overriding the currently deployed model. |
+| Strategy      | Description                                                                                      |
+| ------------- | ------------------------------------------------------------------------------------------------ |
+| `most_recent` | The most recently trained model for this project is immediately deployed, regardless of metrics. |
+| `best_score`  | The model that achieved the best key metric score is immediately deployed.                       |
+| `rollback`    | The model that was deployed prior to the current one is redeployed.                              |
 
 The default deployment behavior allows any algorithm to qualify. It's automatically used during training, but can be manually executed as well:
 
@@ -40,11 +40,12 @@ The default deployment behavior allows any algorithm to qualifica
 
 #### SQL
 
-SELECT * FROM pgml.deploy(
- 'Handwritten Digit Image Classifier',
+```sql
+SELECT * FROM pgml.deploy(
+ 'Handwritten Digit Image Classifier',
strategy => 'best_score'
);
-
+```
#### Output
@@ -121,3 +122,22 @@ SELECT * FROM pgml.deploy(
Handwritten Digit Image Classifier | rollback | xgboost
(1 row)
```
+
+### Specific Model IDs
+
+If you need to deploy a specific model rather than the one selected by `most_recent` or `best_score`, you may deploy it by id. Model ids can be found in the `pgml.models` table.
+
+#### SQL
+
+```sql
+SELECT * FROM pgml.deploy(12);
+```
+
+#### Output
+
+```sql
+ project | strategy | algorithm
+------------------------------------+----------+-----------
+ Handwritten Digit Image Classifier | specific | xgboost
+(1 row)
+```
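+
+To find the id of the model you want, you can query the `pgml.models` table directly. A minimal sketch (the `id` column is documented above; the `algorithm` column name is assumed from the deploy output):
+
+```sql
+-- List candidate models and their algorithms; pick the id to pass to pgml.deploy()
+SELECT id, algorithm
+FROM pgml.models;
+```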
diff --git a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md
index 8d4aeb222..3362c99bd 100644
--- a/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md
+++ b/pgml-cms/docs/introduction/apis/sql-extensions/pgml.train/data-pre-processing.md
@@ -25,11 +25,11 @@ In this example:
There are 3 steps to preprocessing data:
-* [Encoding](data-pre-processing.md#ordinal-encoding) categorical values into quantitative values
-* [Imputing](data-pre-processing.md#imputing-missing-values) NULL values to some quantitative value
-* [Scaling](data-pre-processing.md#scaling-values) quantitative values across all variables to similar ranges
+* [Encoding](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#categorical-encodings) categorical values into quantitative values
+* [Imputing](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#imputing-missing-values) NULL values to some quantitative value
+* [Scaling](../../../../../../pgml-dashboard/content/docs/training/preprocessing.md#scaling-values) quantitative values across all variables to similar ranges
-These preprocessing steps may be specified on a per-column basis to the [train()](./) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.
+These preprocessing steps may be specified on a per-column basis to the [train()](../../../../../../docs/training/overview/) function. By default, PostgresML does minimal preprocessing on training data, and will raise an error during analysis if NULL values are encountered without a preprocessor. All types other than `TEXT` are treated as quantitative variables and cast to floating point representations before passing them to the underlying algorithm implementations.
```sql
SELECT pgml.train(
diff --git a/pgml-cms/docs/resources/developer-docs/contributing.md b/pgml-cms/docs/resources/developer-docs/contributing.md
index 38688dc26..3648acbe3 100644
--- a/pgml-cms/docs/resources/developer-docs/contributing.md
+++ b/pgml-cms/docs/resources/developer-docs/contributing.md
@@ -67,7 +67,7 @@ Once there, you can initialize `pgrx` and get going:
#### Pgrx command line and environments
```commandline
-cargo install cargo-pgrx --version "0.9.8" --locked && \
+cargo install cargo-pgrx --version "0.11.2" --locked && \
cargo pgrx init # This will take a few minutes
```
diff --git a/pgml-cms/docs/resources/developer-docs/installation.md b/pgml-cms/docs/resources/developer-docs/installation.md
index 990cec5a8..119080bf2 100644
--- a/pgml-cms/docs/resources/developer-docs/installation.md
+++ b/pgml-cms/docs/resources/developer-docs/installation.md
@@ -36,7 +36,7 @@ brew bundle
PostgresML is written in Rust, so you'll need to install the latest compiler from [rust-lang.org](https://rust-lang.org). Additionally, we use the Rust PostgreSQL extension framework `pgrx`, which requires some initialization steps:
```bash
-cargo install cargo-pgrx --version 0.9.8 && \
+cargo install cargo-pgrx --version 0.11.2 && \
cargo pgrx init
```
@@ -63,8 +63,7 @@ To install the necessary Python packages into a virtual environment, use the `vi
```bash
virtualenv pgml-venv && \
source pgml-venv/bin/activate && \
-pip install -r requirements.txt && \
-pip install -r requirements-xformers.txt --no-dependencies
+pip install -r requirements.txt
```
{% endtab %}
@@ -146,7 +145,7 @@ pgml_test=# SELECT pgml.version();
We like and use pgvector a lot, as documented in our blog posts and examples, to store and search embeddings. You can install pgvector from source pretty easily:
```bash
-git clone --branch v0.4.4 https://github.com/pgvector/pgvector && \
+git clone --branch v0.5.0 https://github.com/pgvector/pgvector && \
cd pgvector && \
echo "trusted = true" >> vector.control && \
make && \
@@ -288,7 +287,7 @@ We use the `pgrx` Postgres Rust extension framework, which comes with its own in
```bash
cd pgml-extension && \
-cargo install cargo-pgrx --version 0.9.8 && \
+cargo install cargo-pgrx --version 0.11.2 && \
cargo pgrx init
```
diff --git a/pgml-docs/docs/guides/sdks/collections.md b/pgml-docs/docs/guides/sdks/collections.md
deleted file mode 100644
index 2ebc415d5..000000000
--- a/pgml-docs/docs/guides/sdks/collections.md
+++ /dev/null
@@ -1,349 +0,0 @@
-# Collections
-
-Collections are the organizational building blocks of the SDK. They manage all documents and related chunks, embeddings, tsvectors, and pipelines.
-
-## Creating Collections
-
-By default, collections will read and write to the database specified by `DATABASE_URL` environment variable.
-
-### **Default `DATABASE_URL`**
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = pgml.newCollection("test_collection")
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-```
-{% endtab %}
-{% endtabs %}
-
-### **Custom DATABASE\_URL**
-
-Create a Collection that reads from a different database than that set by the environment variable `DATABASE_URL`.
-
-{% tabs %}
-{% tab title="Javascript" %}
-```javascript
-const collection = pgml.newCollection("test_collection", CUSTOM_DATABASE_URL)
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection", CUSTOM_DATABASE_URL)
-```
-{% endtab %}
-{% endtabs %}
-
-## Upserting Documents
-
-Documents are dictionaries with two required keys: `id` and `text`. All other keys/value pairs are stored as metadata for the document.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const documents = [
- {
- id: "Document One",
- text: "document one contents...",
- random_key: "this will be metadata for the document",
- },
- {
- id: "Document Two",
- text: "document two contents...",
- random_key: "this will be metadata for the document",
- },
-];
-await collection.upsert_documents(documents);
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-documents = [
- {
- "id": "Document 1",
- "text": "Here are the contents of Document 1",
- "random_key": "this will be metadata for the document"
- },
- {
- "id": "Document 2",
- "text": "Here are the contents of Document 2",
- "random_key": "this will be metadata for the document"
- }
-]
-collection = Collection("test_collection")
-await collection.upsert_documents(documents)
-```
-{% endtab %}
-{% endtabs %}
-
-Document metadata can be replaced by upserting the document without the `text` key.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const documents = [
- {
- id: "Document One",
- random_key: "this will be NEW metadata for the document",
- },
- {
- id: "Document Two",
- random_key: "this will be NEW metadata for the document",
- },
-];
-await collection.upsert_documents(documents);
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-documents = [
- {
- "id": "Document 1",
- "random_key": "this will be NEW metadata for the document"
- },
- {
- "id": "Document 2",
- "random_key": "this will be NEW metadata for the document"
- }
-]
-collection = Collection("test_collection")
-await collection.upsert_documents(documents)
-```
-{% endtab %}
-{% endtabs %}
-
-Document metadata can be merged with new metadata by upserting the document without the `text` key and specifying the merge option.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const documents = [
- {
- id: "Document One",
- text: "document one contents...",
- },
- {
- id: "Document Two",
- text: "document two contents...",
- },
-];
-await collection.upsert_documents(documents, {
- metdata: {
- merge: true
- }
-});
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-documents = [
- {
- "id": "Document 1",
- "random_key": "this will be NEW merged metadata for the document"
- },
- {
- "id": "Document 2",
- "random_key": "this will be NEW merged metadata for the document"
- }
-]
-collection = Collection("test_collection")
-await collection.upsert_documents(documents, {
- "metadata": {
- "merge": True
- }
-})
-```
-{% endtab %}
-{% endtabs %}
-
-## Getting Documents
-
-Documents can be retrieved using the `get_documents` method on the collection object.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = Collection("test_collection")
-const documents = await collection.get_documents({limit: 100 })
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({ "limit": 100 })
-```
-{% endtab %}
-{% endtabs %}
-
-### Paginating Documents
-
-The SDK supports limit-offset pagination and keyset pagination.
-
-#### Limit-Offset Pagination
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = pgml.newCollection("test_collection")
-const documents = await collection.get_documents({ limit: 100, offset: 10 })
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({ "limit": 100, "offset": 10 })
-```
-{% endtab %}
-{% endtabs %}
-
-#### Keyset Pagination
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = Collection("test_collection")
-const documents = await collection.get_documents({ limit: 100, last_row_id: 10 })
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({ "limit": 100, "last_row_id": 10 })
-```
-{% endtab %}
-{% endtabs %}
-
-The `last_row_id` can be taken from the `row_id` field in the returned document's dictionary.
-
-### Filtering Documents
-
-Metadata and full text filtering are supported just like they are in vector recall.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = pgml.newCollection("test_collection")
-const documents = await collection.get_documents({
- limit: 100,
- offset: 10,
- filter: {
- metadata: {
- id: {
- $eq: 1
- }
- },
- full_text_search: {
- configuration: "english",
- text: "Some full text query"
- }
- }
-})
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({
- "limit": 100,
- "offset": 10,
- "filter": {
- "metadata": {
- "id": {
- "$eq": 1
- }
- },
- "full_text_search": {
- "configuration": "english",
- "text": "Some full text query"
- }
- }
-})
-```
-{% endtab %}
-{% endtabs %}
-
-### Sorting Documents
-
-Documents can be sorted on any metadata key. Note that this does not currently work well with Keyset based pagination. If paginating and sorting, use Limit-Offset based pagination.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = pgml.newCollection("test_collection")
-const documents = await collection.get_documents({
- limit: 100,
- offset: 10,
- order_by: {
- id: "desc"
- }
-})
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-collection = Collection("test_collection")
-documents = await collection.get_documents({
- "limit": 100,
- "offset": 10,
- "order_by": {
- "id": "desc"
- }
-})
-```
-{% endtab %}
-{% endtabs %}
-
-### Deleting Documents
-
-Documents can be deleted with the `delete_documents` method on the collection object.
-
-Metadata and full text filtering are supported just like they are in vector recall.
-
-{% tabs %}
-{% tab title="JavaScript" %}
-```javascript
-const collection = pgml.newCollection("test_collection")
-const documents = await collection.delete_documents({
- metadata: {
- id: {
- $eq: 1
- }
- },
- full_text_search: {
- configuration: "english",
- text: "Some full text query"
- }
-})
-```
-{% endtab %}
-
-{% tab title="Python" %}
-```python
-documents = await collection.delete_documents({
- "metadata": {
- "id": {
- "$eq": 1
- }
- },
- "full_text_search": {
- "configuration": "english",
- "text": "Some full text query"
- }
-})
-```
-{% endtab %}
-{% endtabs %}