Adding Perrone example for building surrogate #832

Merged

Neeratyoy merged 6 commits into develop from transfer_learning_example

Oct 18, 2019

Contributor

Neeratyoy commented Oct 15, 2019

What does this PR implement/fix? Explain your changes.

Shows an example of how Perrone et al. use previous OpenML runs to build a surrogate model.


Adding Perrone example for building surrogate

2796b9a

Contributor Author

Neeratyoy commented Oct 15, 2019

@mfeurer this might need some more textual description, but I presume that anyone going through this example is already aware of the paper.

Neeratyoy requested a review from mfeurer

October 15, 2019 18:18

mfeurer requested changes


examples/40_paper/2018_neurips_perrone_example.py Outdated

+a tabular format that can be used to build models.
 """
+def fetch_evaluations(run_full=False, flow_type='svm', metric = 'area_under_roc_curve'):

Collaborator

mfeurer Oct 15, 2019

Could you please remove the whitespace around the equals for metric=...?

examples/40_paper/2018_neurips_perrone_example.py Outdated

     return eval_df, task_ids, flow_id


 def create_table_from_evaluations(eval_df, flow_type='svm', run_count=np.iinfo(np.int64).max,

Collaborator

mfeurer Oct 15, 2019

Could you please put each argument into its own line?
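A minimal sketch of the layout being requested, assuming the signature shown in the diff: one argument per line and no spaces around `=` in keyword defaults, as PEP 8 recommends. The function body here is a placeholder that only echoes its arguments; it is not the PR's implementation.

```python
import numpy as np


def create_table_from_evaluations(
    eval_df,
    flow_type='svm',
    run_count=np.iinfo(np.int64).max,
    metric='area_under_roc_curve',
    task_ids=None,
):
    # Placeholder body (assumption): the real function builds a table
    # from eval_df. Here the arguments are only echoed back so that the
    # reformatted signature can be checked in isolation.
    return {
        'flow_type': flow_type,
        'run_count': run_count,
        'metric': metric,
        'task_ids': task_ids,
    }


defaults = create_table_from_evaluations(eval_df=None)
print(defaults)
```

With this layout, adding or removing a parameter later touches a single line in the diff.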

examples/40_paper/2018_neurips_perrone_example.py Outdated



 def create_table_from_evaluations(eval_df, flow_type='svm', run_count=np.iinfo(np.int64).max,
                                   metric = 'area_under_roc_curve', task_ids=None):

Collaborator

mfeurer Oct 15, 2019

Same as above.

examples/40_paper/2018_neurips_perrone_example.py Outdated

+    values : list
+    '''
+    if task_ids is not None:
+        eval_df = eval_df.loc[eval_df.task_id.isin(task_ids)]

Collaborator

mfeurer Oct 15, 2019

What about eval_df.query()? I usually find this easier to read where possible.
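For comparison, a small sketch of the two spellings on a toy frame. The column names and values are made up for illustration; only `task_id` and `task_ids` mirror the names in the diff.

```python
import pandas as pd

# Toy stand-in for the evaluations frame (values are illustrative only).
eval_df = pd.DataFrame({
    'task_id': [3, 3, 31, 37],
    'value': [0.91, 0.88, 0.75, 0.80],
})
task_ids = [3, 37]

# Boolean-mask version, equivalent to the line under review.
by_mask = eval_df.loc[eval_df['task_id'].isin(task_ids)]

# query() version; the '@' prefix refers to a local Python variable.
by_query = eval_df.query('task_id in @task_ids')

print(by_query)
```

Both selections return the same rows; `query()` keeps the filter condition in one readable expression.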

examples/40_paper/2018_neurips_perrone_example.py Outdated

+    '''
+    if task_ids is not None:
+        eval_df = eval_df.loc[eval_df.task_id.isin(task_ids)]
+    ncols = 4 if flow_type == 'svm' else 10  # ncols determine the number of hyperparameters

Collaborator

mfeurer Oct 15, 2019

That line is duplicated in the if/else statement below.


examples/40_paper/2018_neurips_perrone_example.py Outdated

+    # Replacing NaNs with fixed values outside the range of the parameters
+    # given in the supplement material of the paper
+    if flow_type == 'svm':
+        eval_table.kernel.fillna("None", inplace=True)

Collaborator

mfeurer Oct 15, 2019

Could you please avoid accessing the DataFrame columns as if they were attributes? This is likely to be removed from pandas in the future, and it confuses attributes of the DataFrame object with the content of the DataFrame.
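A short sketch of the bracket-indexing alternative, on made-up data. The second part illustrates the confusion mentioned above: a column whose name collides with a DataFrame method cannot be reached through attribute access at all.

```python
import pandas as pd

# Toy table; 'kernel' mirrors the column in the diff, values are invented.
eval_table = pd.DataFrame({
    'kernel': ['rbf', None, 'linear'],
    'cost': [1.0, 2.0, None],
})

# Bracket indexing with assignment back to the column, instead of
# eval_table.kernel.fillna("None", inplace=True).
eval_table['kernel'] = eval_table['kernel'].fillna('None')

# A column named like a DataFrame method shows why attribute access
# is ambiguous: clash.count is the method, not the column.
clash = pd.DataFrame({'count': [1, 2, 3]})
is_method = callable(clash.count)
column = clash['count']

print(eval_table)
print(is_method)
```

Assigning the result back also avoids relying on in-place mutation of a selected column.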

examples/40_paper/2018_neurips_perrone_example.py Outdated

+    eval_table = impute_missing_values(eval_table, flow_type)
+    # Encode categorical variables as one-hot vectors
+    enc = OneHotEncoder(handle_unknown='ignore')
+    enc.fit(eval_table.kernel.to_numpy().reshape(-1, 1))

Collaborator

mfeurer Oct 15, 2019

Same as above.

examples/40_paper/2018_neurips_perrone_example.py Outdated



 def preprocess(eval_table, flow_type='svm'):
     eval_table = impute_missing_values(eval_table, flow_type)

Collaborator

mfeurer Oct 15, 2019

Could you please construct a scikit-learn pipeline? That could then easily be used to predict for new hyperparameter settings by calling a single function.
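One way such a pipeline could look, as a sketch under assumptions rather than the code that was eventually merged: the toy data, the column names `cost` and `gamma`, the fill value of -1.0, and the choice of `RandomForestRegressor` as the surrogate are all illustrative. Imputation and one-hot encoding are bundled with the model so that a single `predict` call handles a new hyperparameter setting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy hyperparameter table and scores (invented for illustration).
X = pd.DataFrame({
    'cost': [0.1, 1.0, 10.0, 100.0, 1.0, 10.0],
    'gamma': [0.01, np.nan, 0.1, 1.0, np.nan, 0.5],
    'kernel': ['rbf', 'linear', 'rbf', 'rbf', 'linear', 'rbf'],
})
y = np.array([0.70, 0.82, 0.88, 0.65, 0.80, 0.85])

# Impute numeric columns with a constant outside the assumed parameter
# range, and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ('num', SimpleImputer(strategy='constant', fill_value=-1.0),
     ['cost', 'gamma']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['kernel']),
])

# Preprocessing and surrogate model in one estimator.
surrogate = Pipeline([
    ('preprocess', preprocess),
    ('model', RandomForestRegressor(n_estimators=10, random_state=0)),
])
surrogate.fit(X, y)

# Predicting for a new setting is a single call on the raw table.
new_config = pd.DataFrame({
    'cost': [5.0],
    'gamma': [np.nan],
    'kernel': ['linear'],
})
prediction = surrogate.predict(new_config)
print(prediction)
```

Because the fitted encoder and imputer live inside the pipeline, new configurations do not need to be preprocessed by hand before prediction.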

examples/40_paper/2018_neurips_perrone_example.py Outdated

+eval_df, task_ids, flow_id = fetch_evaluations(run_full=False)
+X, y = create_table_from_evaluations(eval_df, run_count=1000)
+X = preprocess(X)

Collaborator

mfeurer Oct 15, 2019

Could you please add a print statement here so that the user sees the data format?

Collaborator

mfeurer Oct 15, 2019

Also, could you please relate more to the paper? Is this the metadata for the meta-tasks in the paper?


Intermediate changes; pipeline additions remain

1a3f456

codecov-io commented Oct 16, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@29a023c).
The diff coverage is n/a.

@@            Coverage Diff             @@
##             develop     #832   +/-   ##
==========================================
  Coverage           ?   89.78%           
==========================================
  Files              ?       36           
  Lines              ?     5118           
  Branches           ?        0           
==========================================
  Hits               ?     4595           
  Misses             ?      523           
  Partials           ?        0

Last update 29a023c...9ca9d87.

Neeratyoy added 2 commits

October 17, 2019 16:29


Finishing the whole example design

cfba39d


Making pandas related changes suggested by Matthias

9ca9d87

Neeratyoy requested a review from mfeurer

October 17, 2019 14:45

mfeurer added 2 commits

October 17, 2019 19:56


minor reformatting

cd3ba29


add a print statement

f6a2a95

mfeurer approved these changes


Collaborator

mfeurer commented Oct 17, 2019

@Neeratyoy could you please have a look at my changes? Apparently, the flake8 checks only fail on the push, not on the PR check, so we should be fine. Please feel free to merge if you agree with my changes.

Neeratyoy merged commit 56fa7f9 into develop

Neeratyoy deleted the transfer_learning_example branch

October 18, 2019 09:23
