This document provides a technical overview of the Feast feature store system, covering its core architecture, key components, and main workflows. It introduces the primary code entities and their interactions to help developers understand how Feast operates as a unified system for managing and serving machine learning features.
For detailed information about specific subsystems, see the dedicated pages of this wiki.
Sources: README.md26-38 sdk/python/feast/feature_store.py100-109
Feast (Feature Store) is an open-source feature store for machine learning that manages the lifecycle of features from definition to serving. At its core, Feast provides infrastructure to:
- Define features in code using Python objects (`Entity`, `FeatureView`, `DataSource`)
- Store feature metadata in a `Registry`
- Serve historical features from an `OfflineStore` for training
- Serve fresh features from an `OnlineStore` for low-latency inference
- Access features through the `FeatureStore` class or dedicated feature servers

The system is designed to be modular and extensible, supporting multiple storage backends, data sources, and deployment patterns through a plugin architecture.
Sources: README.md29-38 sdk/python/feast/feature_store.py100-115
The following diagram illustrates the primary code entities in Feast and their relationships:
Sources: sdk/python/feast/feature_store.py100-203 sdk/python/feast/repo_config.py193-256 sdk/python/feast/infra/provider.py49-63 sdk/python/feast/infra/passthrough_provider.py58-130
The FeatureStore class (sdk/python/feast/feature_store.py100-189) is the primary interface for all Feast operations. It acts as a facade that:
- Loads configuration from feature_store.yaml via `RepoConfig`
- Instantiates the `Registry` implementation based on configuration
- Creates a `Provider` instance to orchestrate infrastructure operations

Key attributes:

- `config`: A `RepoConfig` instance containing all configuration
- `repo_path`: Path to the feature repository
- `_registry`: The registry implementation (file, SQL, or remote)
- `_provider`: The provider implementation (typically `PassthroughProvider`)

Sources: sdk/python/feast/feature_store.py100-203
The RepoConfig class (sdk/python/feast/repo_config.py193-557) defines the structure of feature_store.yaml and handles parsing, validation, and resolution of the configured registry, provider, and store implementations.
The configuration uses Pydantic models with validation to ensure correctness at load time. The feature_store.yaml file is typically located at the repository root and loaded automatically when instantiating a FeatureStore.
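A minimal feature_store.yaml for a local setup might look like the following (a sketch; the exact keys available depend on the configured stores):

```yaml
project: my_project                  # Namespace for all registered objects
registry: data/registry.db           # Where feature metadata is stored
provider: local                      # Provider implementation
online_store:
  type: sqlite                       # Online store plugin
  path: data/online_store.db
offline_store:
  type: file                         # Offline store plugin (local files)
entity_key_serialization_version: 2
```

Because validation happens at load time, misconfigured keys fail when the FeatureStore is instantiated rather than at serving time.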
Sources: sdk/python/feast/repo_config.py193-557
The registry stores all metadata about feature definitions, including entities, feature views, data sources, and feature services. The abstract BaseRegistry interface defines the contract, with multiple implementations:
| Registry Type | Class | Storage Backend |
|---|---|---|
| File | Registry | Local filesystem or cloud object storage (S3, GCS) |
| SQL | SqlRegistry | PostgreSQL, MySQL, or other SQL databases |
| Snowflake | SnowflakeRegistry | Snowflake table |
| Remote | RemoteRegistry | Remote registry server via gRPC |
The registry provides methods like:
- `apply_entity()`, `apply_feature_view()` - Register objects
- `get_entity()`, `get_feature_view()` - Retrieve objects
- `list_entities()`, `list_feature_views()` - List all objects
- `refresh()` - Update cached registry state

For production deployments, the SQL registry is recommended for its transactional guarantees and concurrent-write safety.
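As an illustration of this contract (not Feast's actual classes), a minimal in-memory registry with the same method shapes could look like:

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRegistry:
    """Toy registry mirroring the BaseRegistry method names (illustrative only)."""
    _entities: dict = field(default_factory=dict)
    _feature_views: dict = field(default_factory=dict)

    def apply_entity(self, entity):
        # Register (or overwrite) an entity by name.
        self._entities[entity["name"]] = entity

    def apply_feature_view(self, fv):
        self._feature_views[fv["name"]] = fv

    def get_entity(self, name):
        return self._entities[name]

    def list_feature_views(self):
        return list(self._feature_views.values())


reg = InMemoryRegistry()
reg.apply_entity({"name": "driver", "join_keys": ["driver_id"]})
reg.apply_feature_view({"name": "driver_stats", "entities": ["driver"]})
print(reg.get_entity("driver")["join_keys"])  # ['driver_id']
print(len(reg.list_feature_views()))          # 1
```

The real implementations persist this state to a file, SQL database, or remote server instead of a dict, but expose the same apply/get/list surface.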
Sources: sdk/python/feast/feature_store.py155-177 sdk/python/feast/repo_config.py39-44 docs/how-to-guides/running-feast-in-production.md34-37
The Provider interface (sdk/python/feast/infra/provider.py49-105) abstracts infrastructure operations. The primary implementation is PassthroughProvider (sdk/python/feast/infra/passthrough_provider.py58-130), which delegates directly to the configured offline store, online store, and compute engine.
The provider pattern allows Feast to support cloud-specific optimizations (GCP, AWS, Azure) while maintaining a consistent interface.
Sources: sdk/python/feast/infra/provider.py49-105 sdk/python/feast/infra/passthrough_provider.py58-175
The following diagram shows the relationships between feature definition objects:
Sources: sdk/python/feast/entity.py sdk/python/feast/feature_view.py sdk/python/feast/on_demand_feature_view.py sdk/python/feast/data_source.py sdk/python/feast/feature_service.py
An Entity defines a join key used to retrieve features. It specifies:
- `name`: Unique identifier
- `join_keys`: List of column names used for joining (typically one, e.g., `["driver_id"]`)
- `value_type`: Data type of the join key
- `description`: Human-readable description

Entities are referenced by feature views to establish relationships between features and the objects they describe.
Sources: README.md145-147 docs/getting-started/quickstart.md145-147
A FeatureView (sdk/python/feast/feature_view.py) is the core abstraction for defining features. It includes:
- `name`: Unique identifier
- `entities`: List of entity references
- `schema`: List of `Field` objects defining feature names and types
- `source`: The `DataSource` containing raw data
- `ttl`: Time-to-live for features in the online store
- `online`: Boolean indicating whether to materialize to the online store

Subclasses include:

- `StreamFeatureView`: For real-time streaming features
- `BatchFeatureView`: For features with time-window aggregations

Sources: README.md162-182 docs/getting-started/quickstart.md162-182
An OnDemandFeatureView (sdk/python/feast/on_demand_feature_view.py) defines features computed on-the-fly during retrieval using:
- `sources`: Dictionary mapping names to `FeatureView` or `RequestSource` objects
- `schema`: Output feature schema
- `transformation`: Python function or Pandas UDF for computing features
- `write_to_online_store`: Boolean to materialize computed features

On-demand transformations enable feature engineering logic to be version-controlled and reused across training and serving.
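The core of an on-demand feature view is a plain transformation function. Stripped of Feast's decorators and typing, the logic has roughly this shape (the feature names `conv_rate` and `val_to_add` are hypothetical):

```python
def transform(inputs: dict) -> dict:
    """Compute derived features from input feature values at request time.

    In Feast this would be a Pandas UDF or native Python transformation
    attached to an OnDemandFeatureView; here it is a bare function sketch.
    """
    # One precomputed feature plus one request-time value -> one new feature.
    conv_rate = inputs["conv_rate"]
    val_to_add = inputs["val_to_add"]
    return {"conv_rate_plus_val": conv_rate + val_to_add}


row = {"conv_rate": 0.5, "val_to_add": 2}
print(transform(row))  # {'conv_rate_plus_val': 2.5}
```

Because the same function runs during training-set generation and online retrieval, the transformation cannot skew between the two paths.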
Sources: README.md184-214 docs/getting-started/quickstart.md184-214
DataSource (sdk/python/feast/data_source.py) is an abstract base class representing where raw feature data resides. Implementations include:
| DataSource | Module | Description |
|---|---|---|
| FileSource | file.py | Parquet/CSV files (local or cloud storage) |
| BigQuerySource | bigquery_source.py | Google BigQuery tables |
| SnowflakeSource | snowflake_source.py | Snowflake tables |
| RedshiftSource | redshift_source.py | AWS Redshift tables |
| KafkaSource | kafka_source.py | Kafka topics (streaming) |
| KinesisSource | kinesis_source.py | AWS Kinesis streams |
| PushSource | push_source.py | Push API for real-time ingestion |
Each data source specifies:
- `timestamp_field`: Column containing event timestamps
- `created_timestamp_column`: Optional column for data versioning
- `field_mapping`: Optional mapping to rename columns

Sources: sdk/python/feast/data_source.py README.md152-157
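Conceptually, `field_mapping` behaves like a column rename applied when reading from the source, mapping source column names to feature names (a sketch, not Feast's implementation):

```python
def apply_field_mapping(row: dict, field_mapping: dict) -> dict:
    """Rename source columns to the names expected by the feature view.

    field_mapping maps source column -> feature name; unmapped columns
    pass through unchanged (illustrative sketch only).
    """
    return {field_mapping.get(col, col): val for col, val in row.items()}


source_row = {"event_ts": "2024-01-01T00:00:00", "rate": 0.9}
print(apply_field_mapping(source_row, {"rate": "conv_rate"}))
# {'event_ts': '2024-01-01T00:00:00', 'conv_rate': 0.9}
```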
A FeatureService (sdk/python/feast/feature_service.py) groups features for a specific model or use case:
- `name`: Unique identifier
- `features`: List of feature references (e.g., "driver_hourly_stats:conv_rate")
- `description`: Purpose of the service
- `tags`: Key-value metadata

Feature services enable versioning of feature sets and facilitate model-centric feature tracking.
Sources: README.md216-231 docs/getting-started/quickstart.md216-231
The registration workflow validates and stores feature definitions in the registry:
Key Files:
- The `feast apply` command invokes `apply_total()` (sdk/python/feast/repo_operations.py399-416)
- `parse_repo()` (sdk/python/feast/repo_operations.py114-220) discovers Python objects
- `FeatureStore.apply()` (sdk/python/feast/feature_store.py821-1005) performs validation and registration
- `PassthroughProvider.update_infra()` (sdk/python/feast/infra/passthrough_provider.py142-174) creates online store tables
Sources: sdk/python/feast/repo_operations.py399-416 sdk/python/feast/feature_store.py821-1005 sdk/python/feast/infra/passthrough_provider.py142-174
Materialization copies features from the offline store to the online store for low-latency serving:
Key Components:
- `FeatureStore.materialize()` (sdk/python/feast/feature_store.py1289-1376)
- `Provider.materialize_single_feature_view()` (sdk/python/feast/infra/provider.py222-246)
- Compute engines: `LocalComputeEngine` (in-process, default), `LambdaComputeEngine` (AWS serverless), `SnowflakeComputeEngine` (uses Snowflake compute)
- `OfflineStore.pull_latest_from_table_or_query()` gets the most recent feature values
- `OnlineStore.online_write_batch()` writes features indexed by entity keys

The materialization process reads the latest feature values from the offline store and writes them to the online store, indexed by entity key.
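The two store calls above amount to "keep the latest row per entity, then upsert by entity key". A stdlib-only sketch of that logic (not Feast's implementation; the `driver_id`/`conv_rate` fields are hypothetical):

```python
from datetime import datetime


def pull_latest(rows, join_key):
    """Keep only the most recent row per entity, mirroring the intent of
    pull_latest_from_table_or_query (illustrative)."""
    latest = {}
    for row in rows:
        key = row[join_key]
        if key not in latest or row["event_ts"] > latest[key]["event_ts"]:
            latest[key] = row
    return list(latest.values())


def online_write_batch(store, rows, join_key):
    """Upsert rows into a store keyed by entity, like online_write_batch."""
    for row in rows:
        store[row[join_key]] = row


rows = [
    {"driver_id": 1, "conv_rate": 0.1, "event_ts": datetime(2024, 1, 1)},
    {"driver_id": 1, "conv_rate": 0.3, "event_ts": datetime(2024, 1, 2)},
    {"driver_id": 2, "conv_rate": 0.7, "event_ts": datetime(2024, 1, 1)},
]
store = {}
online_write_batch(store, pull_latest(rows, "driver_id"), "driver_id")
print(store[1]["conv_rate"])  # 0.3 -- only the newest value per entity survives
```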
Sources: sdk/python/feast/feature_store.py1289-1376 sdk/python/feast/infra/passthrough_provider.py311-404 docs/getting-started/concepts/data-ingestion.md
Retrieves point-in-time correct features for model training:
Key Methods:
- `FeatureStore.get_historical_features()` (sdk/python/feast/feature_store.py1738-1871)

`get_historical_features()` is lazy: no query executes until `.to_df()` or `.to_arrow()` is called on the result. The point-in-time join logic is delegated to the configured offline store.
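The essence of the point-in-time join: for each entity row, pick the newest feature value whose timestamp is at or before the row's event timestamp and within the TTL. A simplified stdlib sketch, not the SQL the offline stores actually generate (feature names are hypothetical):

```python
from datetime import datetime, timedelta


def point_in_time_join(entity_rows, feature_rows, join_key, ttl):
    """Attach to each entity row the latest feature value observed at or
    before its event_ts and no older than event_ts - ttl (sketch)."""
    joined = []
    for er in entity_rows:
        best = None
        for fr in feature_rows:
            if fr[join_key] != er[join_key]:
                continue
            # Candidate must not be from the future, and must be within TTL.
            if er["event_ts"] - ttl <= fr["event_ts"] <= er["event_ts"]:
                if best is None or fr["event_ts"] > best["event_ts"]:
                    best = fr
        joined.append({**er, "conv_rate": best["conv_rate"] if best else None})
    return joined


features = [
    {"driver_id": 1, "conv_rate": 0.1, "event_ts": datetime(2024, 1, 1)},
    {"driver_id": 1, "conv_rate": 0.5, "event_ts": datetime(2024, 1, 3)},
]
entities = [{"driver_id": 1, "event_ts": datetime(2024, 1, 2)}]
result = point_in_time_join(entities, features, "driver_id", timedelta(days=7))
print(result[0]["conv_rate"])  # 0.1 -- the Jan 3 value is excluded (future leak)
```

Excluding the Jan 3 value is the point: using it would leak future information into the training set.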
The offline store builds the join query and returns a `RetrievalJob` that can be executed to get results.

Sources: sdk/python/feast/feature_store.py1738-1871 sdk/python/feast/infra/offline_stores/bigquery.py235-340 sdk/python/feast/infra/offline_stores/offline_utils.py168-350
Retrieves features from the online store for real-time inference:
Key Components:
- `FeatureStore.get_online_features()` (sdk/python/feast/feature_store.py1469-1580)

Within `get_online_features()`, `serialize_entity_key()` converts entity values to a consistent binary format, which is then used to look up feature rows in the online store and assemble the response.
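For online lookups to work, entity keys must serialize identically at write time and read time. The idea can be sketched with the stdlib (Feast's real format is protobuf-based; this byte layout is made up for illustration):

```python
import struct


def serialize_entity_key(join_keys_and_values) -> bytes:
    """Produce deterministic bytes for a set of (join_key, value) pairs.

    Sorting by join key makes the encoding independent of input order,
    mirroring the goal of Feast's serialize_entity_key (format invented).
    """
    out = b""
    for key, value in sorted(join_keys_and_values):
        key_b = key.encode()
        if isinstance(value, int):
            val_b = struct.pack("<q", value)  # fixed 8-byte little-endian int
        else:
            val_b = str(value).encode()
        # Length-prefix each component so the encoding is unambiguous.
        out += struct.pack("<I", len(key_b)) + key_b
        out += struct.pack("<I", len(val_b)) + val_b
    return out


a = serialize_entity_key([("driver_id", 1001)])
b = serialize_entity_key([("driver_id", 1001)])
print(a == b)  # True: the same entity always yields the same lookup key
```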
Sources: sdk/python/feast/feature_store.py1469-1580 sdk/python/feast/infra/online_stores/sqlite.py178-289 sdk/python/feast/infra/passthrough_provider.py239-258
The codebase follows a standard Python package structure:
feast/
├── sdk/python/feast/ # Core Python SDK
│ ├── feature_store.py # Main FeatureStore class
│ ├── repo_config.py # Configuration
│ ├── infra/ # Infrastructure implementations
│ │ ├── provider.py # Provider interface
│ │ ├── offline_stores/ # Offline store plugins
│ │ ├── online_stores/ # Online store plugins
│ │ ├── compute_engines/ # Compute engine plugins
│ │ └── registry/ # Registry implementations
│ ├── entity.py # Entity definition
│ ├── feature_view.py # FeatureView definition
│ └── data_source.py # DataSource base class
├── protos/ # Protobuf definitions
├── infra/ # Infrastructure-related code
│ ├── feast-operator/ # Kubernetes operator
│ └── charts/ # Helm charts
├── docs/ # Documentation
└── examples/ # Example repositories
Sources: sdk/python/feast/
The Feast CLI provides commands for all major workflows:
| Command | Function | Implementation |
|---|---|---|
| `feast init` | Bootstrap new repository | sdk/python/feast/repo_operations.py448-503 |
| `feast apply` | Register feature definitions | sdk/python/feast/repo_operations.py399-416 |
| `feast materialize` | Load features to online store | CLI wrapper around `FeatureStore.materialize()` |
| `feast materialize-incremental` | Incremental materialization | CLI wrapper with automatic date range |
| `feast ui` | Launch Web UI | Starts React development server |
| `feast serve` | Launch feature server | Starts FastAPI server for HTTP/gRPC serving |
CLI commands are implemented in sdk/python/feast/cli.py and delegate to functions in sdk/python/feast/repo_operations.py.
Sources: docs/reference/feast-cli-commands.md1-16 sdk/python/feast/repo_operations.py399-503
A typical Feast deployment uses a feature repository: a directory containing:
- `feature_store.yaml`: Configuration file
- `*.py`: Python files with feature definitions
- `data/`: Optional local data files
- `.feastignore`: Files to exclude from discovery

The `parse_repo()` function (sdk/python/feast/repo_operations.py114-220) scans the repository to discover feature objects:

- Reads `.feastignore` to determine exclusion patterns
- Imports the `.py` files in the repository
- Collects `Entity`, `FeatureView`, `DataSource`, and `FeatureService` instances
- Returns a `RepoContents` object with all discovered objects

This pattern enables version control of feature definitions alongside model code.
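The discovery step can be approximated with the stdlib: walk the repository for .py files, skipping anything matched by ignore patterns (a simplified stand-in for parse_repo() and .feastignore handling, not Feast's code):

```python
import fnmatch
import tempfile
from pathlib import Path


def discover_py_files(repo_path, ignore_patterns):
    """Return repo-relative .py files not matched by any ignore pattern."""
    repo = Path(repo_path)
    found = []
    for path in sorted(repo.rglob("*.py")):
        rel = path.relative_to(repo).as_posix()
        if any(fnmatch.fnmatch(rel, pat) for pat in ignore_patterns):
            continue  # excluded, like a .feastignore entry
        found.append(rel)
    return found


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "features.py").write_text("# feature definitions")
    Path(repo, "scratch").mkdir()
    Path(repo, "scratch", "wip.py").write_text("# excluded by pattern")
    files = discover_py_files(repo, ["scratch/*"])
print(files)  # ['features.py']
```

The real implementation then imports each surviving module and inspects its globals for Feast objects.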
Sources: sdk/python/feast/repo_operations.py114-220 docs/getting-started/quickstart.md76-103
Feast uses a comprehensive type system to ensure consistency across different storage backends:
Key Files:

- sdk/python/feast/value_type.py
- sdk/python/feast/types.py
- sdk/python/feast/type_map.py
- protos/feast/types/Value.proto
Conversion Functions:
- `python_type_to_feast_value_type()`: Infers Feast types from Python types
- `python_values_to_proto_values()`: Serializes Python values to protobuf
- `feast_value_type_to_python_type()`: Deserializes protobuf to Python
- `bq_to_feast_value_type()`, `pa_to_feast_value_type()`: Database-specific conversions

The type system ensures that feature values stay consistent as they move between Python, protobuf, and each storage backend.
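The inference step can be pictured as a lookup from Python types to Feast value-type names. The mapping below is a small illustrative subset, not Feast's full table:

```python
from datetime import datetime

# Illustrative subset of a Python-type -> Feast ValueType name mapping.
PYTHON_TO_FEAST = {
    int: "INT64",
    float: "DOUBLE",
    str: "STRING",
    bytes: "BYTES",
    bool: "BOOL",
    datetime: "UNIX_TIMESTAMP",
}


def python_type_to_value_type(value):
    """Infer a Feast-style value type name from a Python value (sketch).

    Note: type(True) is bool, so booleans do not fall through to int.
    """
    return PYTHON_TO_FEAST[type(value)]


print(python_type_to_value_type(3))      # INT64
print(python_type_to_value_type("abc"))  # STRING
```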
Sources: sdk/python/feast/value_type.py sdk/python/feast/types.py sdk/python/feast/type_map.py protos/feast/types/Value.proto
Feast is designed to be extensible through a plugin architecture:
To add a custom offline store:
1. Subclass `OfflineStore` (sdk/python/feast/infra/offline_stores/offline_store.py)
2. Implement `get_historical_features()`: Point-in-time join logic
3. Implement `pull_latest_from_table_or_query()`: Latest value retrieval for materialization
4. Define a configuration class extending `FeastConfigBaseModel`
5. Register the store in `OFFLINE_STORE_CLASS_FOR_TYPE` or use the full class path in feature_store.yaml

Example offline stores: BigQuery (sdk/python/feast/infra/offline_stores/bigquery.py), Snowflake, Redshift, DuckDB, Spark.
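Resolving "the full class path in feature_store.yaml" follows the standard import-by-string pattern, which can be sketched as (a generic sketch, not Feast's exact loader):

```python
import importlib


def get_class_from_type(class_path: str):
    """Load a class from a dotted path such as 'mypkg.stores.MyOfflineStore'.

    This is how a pluggable system can turn a string from a config file
    into a concrete class object.
    """
    module_path, class_name = class_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrate with a stdlib class standing in for a custom store class.
cls = get_class_from_type("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```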
Sources: sdk/python/feast/infra/offline_stores/offline_store.py docs/how-to-guides/customizing-feast/adding-a-new-offline-store.md
To add a custom online store:
1. Subclass `OnlineStore` (sdk/python/feast/infra/online_stores/online_store.py)
2. Implement `online_write_batch()`: Write features
3. Implement `online_read()`: Read features by entity keys
4. Implement `update()`: Create/update tables
5. Register the store in `ONLINE_STORE_CLASS_FOR_TYPE` or specify the full class path

Example online stores: Redis, DynamoDB, SQLite (sdk/python/feast/infra/online_stores/sqlite.py), PostgreSQL, Cassandra.
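Since sqlite3 ships with Python, the three methods can be sketched end to end. The schema and signatures here are deliberately simplified and are not Feast's actual OnlineStore interface:

```python
import sqlite3


class ToyOnlineStore:
    """Minimal key-value online store over sqlite3 (illustrative only)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

    def update(self, table: str):
        # Create/update the table backing a feature view.
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(entity_key TEXT PRIMARY KEY, feature_blob TEXT)"
        )

    def online_write_batch(self, table, rows):
        # Upsert features indexed by serialized entity key.
        self.conn.executemany(
            f"INSERT OR REPLACE INTO {table} VALUES (?, ?)", rows
        )

    def online_read(self, table, entity_key):
        # Read the feature blob for one entity key, or None if absent.
        cur = self.conn.execute(
            f"SELECT feature_blob FROM {table} WHERE entity_key = ?",
            (entity_key,),
        )
        row = cur.fetchone()
        return row[0] if row else None


store = ToyOnlineStore()
store.update("driver_stats")
store.online_write_batch("driver_stats", [("driver:1001", '{"conv_rate": 0.5}')])
print(store.online_read("driver_stats", "driver:1001"))  # {"conv_rate": 0.5}
```

Real implementations additionally handle value serialization, TTL-based expiry, and batching, but the write/read/update contract is the same shape.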
Sources: sdk/python/feast/infra/online_stores/online_store.py sdk/python/feast/infra/online_stores/sqlite.py docs/how-to-guides/customizing-feast/adding-support-for-a-new-online-store.md
The Provider interface (sdk/python/feast/infra/provider.py49-105) can be extended for cloud-specific optimizations. Most users can use PassthroughProvider, but custom providers enable cloud-specific behavior such as custom materialization orchestration or additional infrastructure management.
Sources: sdk/python/feast/infra/provider.py49-105 docs/how-to-guides/customizing-feast/creating-a-custom-provider.md
The ComputeEngine interface enables different materialization strategies:
- `LocalComputeEngine`: In-process (default)
- `LambdaComputeEngine`: AWS Lambda for serverless materialization
- `SnowflakeComputeEngine`: Uses Snowflake compute resources
- `BytewaxComputeEngine`: Kubernetes-based streaming

Custom compute engines can optimize for specific infrastructure requirements.
Sources: sdk/python/feast/infra/passthrough_provider.py92-129 sdk/python/feast/repo_config.py46-53
Feast provides a complete feature store system built around a few key principles: declarative feature definitions, pluggable storage and compute backends, and a single entry point through the `FeatureStore` class.

The system is designed to work with existing data infrastructure while providing a consistent interface for feature management across the ML lifecycle.
Sources: README.md29-38 sdk/python/feast/feature_store.py100-203 docs/getting-started/quickstart.md1-60