Conversation
tcare
commented
Apr 2, 2020
- Project name is now additionally restricted to letters and underscores
- Fix an issue where we try to load a non-existant dataset from sklearn after bootstrapping
- Added some general error handling
- Standardized arguments to short and long forms & updated README
- General code cleanup
d8a4eee to
6c9cfa4
Compare
| try: | ||
| from sklearn.datasets import load_diabetes | ||
| except ImportError as e: | ||
| print("Project has already been bootstrapped, you must provide your own data.") # NOQA: E501 |
There was a problem hiding this comment.
I wouldn't be so strong in this message. We don't know the reason of the error for sure. We just guess. I would go with something like "Failed to load diabetes dataset, perhaps the project has already ..."
The thing is that it will still rename load_diabetes into load_we_dont_call_his_name which introduces a buggy code (that we handle with this try-except). Perhaps it would make sense to move out all this load_diabetes dataset creation to a separate module (imported in this file) and exclude that module/file from "files" in replace_project_name.
There was a problem hiding this comment.
Yes, separating dataset creation is the right way to go. A quick hotfix is to call replaceprojectname on the training script to rename the specific import.
There was a problem hiding this comment.
I thought ImportError would be enough of a limited scope to avoid any weirdness. However you made me realize that this would actually be the first time that we encounter sklearn in the flow, so this could hide a dependency error. At the very least, we need to check that sklearn exists and the loading function doesn't.
Re: moving it out, if we're going to do the non-hacky fix, I feel that this should be generic enough that you don't need to rely on having a predefined dataset to export and rather can use a csv.
|
I decided to take the middle ground. CSV creation from the diabetes data has been factored out, but when the project bootstraps the CSV loading will fail with a message that they need to provide a CSV. This way, we avoid a situation where we silently use diabetes data (happened to me :)) and still make it easy to bring a CSV to use. |
|
lgtm |