Manage AML environments from conda YAML files #168
sudivate merged 38 commits into microsoft:master
Conversation
bump pip requirements versions (microsoft#104)
* Move all 3 conda files to a single dir
* Do not use conda-merge
* Pin package versions
| # Create a run configuration environment
| conda_deps_file = "diabetes_regression/training_dependencies.yml"
| conda_deps = CondaDependencies(conda_deps_file)
| run_config = RunConfiguration(conda_dependencies=conda_deps)
Can you leverage the environments similar to diabetes_regression_build_train_pipeline.py
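One way to act on that suggestion, sketched with the azureml-core SDK (the conda file path here is an assumption for illustration, not taken from this PR; requires azureml-core installed):

```python
from azureml.core import Environment
from azureml.core.runconfig import RunConfiguration

# Build an Environment from the conda specification file
# (path assumed for illustration).
environment = Environment.from_conda_specification(
    name="diabetes_regression_sklearn",
    file_path="diabetes_regression/conda_dependencies.yml",
)

# Attach the environment to the run configuration instead of
# passing raw CondaDependencies.
run_config = RunConfiguration()
run_config.environment = environment
```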
ml_service/pipelines/diabetes_regression_build_train_pipeline_with_r.py
| @@ -1,16 +0,0 @@
| computeType: AKS
| autoScaler:
Why is this AKS config deleted ?
Environment definition is moved to Python script.
The AML Python SDK does not provide a way to define an environment in a YAML file, only in code. (I have checked: the YAML parsing functionality is implemented directly in the Az CLI ML extension. It would be a good thing to propose adding it to the SDK!)
Please change this to CLI and config files instead of SDK to deploy model. We want to demonstrate the ease of CLI to deploy models
We would still not have YML files; we would need an SDK script to generate the environment and put its name in a JSON file, and only then could we call the CLI. We would essentially remove one line from the existing script, but incur a delay while the extension gets installed for every build. Having a program generate input for a CLI does not seem like a meaningful option for users.
Ideally the service itself should support such declarative environment versioning, rather than the imperative (additive) version behavior available today which is hard to use with DevOps.
Linting & unit tests fixed.
IMO it's overcomplicated.
dtzar left a comment
Upon a closer look I agree that changing up the deployment aspects makes things more complicated. Furthermore, it gets away from the title of the PR, "Manage AML environments from conda YAML files". I suggest you remove the deployment modifications for now and stick to environments for this PR. Let's work together on a plan for the desired deployment modifications.
| @@ -0,0 +1,39 @@
| {
|   "name": "diabetes_regression_sklearn",
|   "version": null,
In Environment.load_from_directory(). Added comment to script for clarity fe35f83
azureml_environment.json is not pointing to any conda specification file
That is the file as generated by az ml environment scaffold.
I assume your concern is about the null values in this section of the file:
"python": {
"userManagedDependencies": false,
"interpreterPath": "python",
"condaDependenciesFile": null,
"baseCondaEnvironment": null
},
If you look at the AML SDK source code, load_from_directory() actually fills the conda_dependencies from a file named conda_dependencies.yml, if one exists in the passed directory:
def load_from_directory(path):
    """Load an environment definition from the files in a directory.

    :param path: Path to the source directory.
    :type path: str
    """
    definition_path = os.path.join(path, _DEFINITION_FILE_NAME)
    if not os.path.isfile(definition_path):
        raise FileNotFoundError(definition_path)

    with open(definition_path, "r") as definition:
        environment_dict = json.load(definition)
    env = Environment._deserialize_and_add_to_object(environment_dict)

    conda_file_path = os.path.join(path, _CONDA_DEPENDENCIES_FILE_NAME)
    if os.path.isfile(conda_file_path):
        env.python = env.python or PythonSection(_skip_defaults=True)
        env.python.conda_dependencies = CondaDependencies(conda_file_path)

    base_dockerfile_path = os.path.join(path, _BASE_DOCKERFILE_FILE_NAME)
    if os.path.isfile(base_dockerfile_path):
        with open(base_dockerfile_path, "r") as base_dockerfile:
            env.docker = env.docker or DockerSection(_skip_defaults=True)
            env.docker.base_dockerfile = base_dockerfile.read()
I've tested that this works as expected: when I uncomment r-essentials in the conda file and run ml_service/pipelines/diabetes_regression_build_train_pipeline_with_r.py, training succeeds, as shown by R output in the logs.
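For reference, a minimal sketch of driving that code path from user code (the directory name is an assumption for illustration; this requires azureml-core and a workspace config file):

```python
from azureml.core import Environment, Workspace

# Directory layout mirroring `az ml environment scaffold` output (name assumed):
#   diabetes_regression_sklearn/azureml_environment.json
#   diabetes_regression_sklearn/conda_dependencies.yml  <- picked up automatically
environment = Environment.load_from_directory(path="diabetes_regression_sklearn")

# Register (or re-register) the environment in the workspace.
workspace = Workspace.from_config()
environment.register(workspace=workspace)
```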
Builds on top of PR #158
Overview:
Detailed description:
Leverage Azure ML environments in order to avoid re-creating container images for the training and scoring containers (which are usually identical from one run to the next).
For DevOps scenarios, it's important to have reproducible outcomes, so environments should be automatically updated whenever the Conda YAML file is modified. It's also important that the pipeline can run in multiple branches in parallel without interference. To enable both, the environment-provisioning function automatically creates an environment name from a base name with a checksum of the Conda YAML file appended. If that environment already exists, it is retrieved; otherwise it is created from the Conda file.
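The naming scheme described above can be sketched with only the standard library (the function name and hash choice are illustrative, not taken from this PR; the actual get-or-create against the workspace would use the AML SDK separately):

```python
import hashlib


def environment_name(base_name: str, conda_yaml_path: str) -> str:
    """Derive a deterministic environment name: a base name plus a checksum
    of the conda YAML file. A modified file yields a new name, while
    parallel branches with identical files share the same environment."""
    with open(conda_yaml_path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return f"{base_name}_{digest}"
```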
As the environment name is now variable, the inference config is now dynamic. While we could have generated an inference config JSON file to feed the AMLDeploy task, it was simpler to deploy the service using the Python SDK instead. This has the side benefit of not requiring installation of the Az CLI ML extension (77 MB, as it packs its own Python libraries, including the AML SDK). The AMLDeploy task checks for the presence of the Az CLI ML extension and installs it if absent, which causes a slight delay; the task is just a wrapper around the Az CLI ML commands, and its only potential benefit is in graphical DevOps pipelines.
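A hedged sketch of the SDK-based deployment this describes (model name, service name, entry script, and deployment target are illustrative assumptions, and the `<checksum>` suffix stands in for the generated value; requires azureml-core and a live workspace):

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

workspace = Workspace.from_config()
model = Model(workspace, name="diabetes_regression_model")  # name assumed

# The inference config references the dynamically named environment,
# which a static AMLDeploy config file could not express.
environment = Environment.get(workspace, name="diabetes_regression_sklearn_<checksum>")
inference_config = InferenceConfig(entry_script="score.py", environment=environment)

service = Model.deploy(
    workspace=workspace,
    name="diabetes-regression-service",
    models=[model],
    inference_config=inference_config,
    deployment_config=AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1),
)
service.wait_for_deployment(show_output=True)
```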