Imagine that a used car dealership chain has asked us to help them modernize their infrastructure, improve their business processes, and increase their profitability. The car dealership has provided us with an anonymized sample of used car data. For this code challenge, your goal is to investigate these data and come up with at least one data science service that we could offer to the dealership.
The purpose of this code challenge is to assess:
- Your ability to translate business problems into data science questions.
- Your use of the Python data science stack to work with and manipulate data.
- Your application of data science tools and methods to provide solutions to business problems.
- Your ability to communicate your work to different audiences.
Your task is to come up with at least one data science service or offering to pitch to the client.
For example, your data science solution might improve business operations, increase profitability, or reduce the time it takes to sell a vehicle.
To complete this task you should:
- Use a Python and a Jupyter notebook for your analysis and presentation.
- Load the anonymized sample of used car data into a Pandas dataframe and perform any necessary pre-processing steps.
- Conduct an exploratory data analysis and look for interesting patterns and stories. Be sure to support your findings with plots, figures, and graphics using Matplotlib, Seaborn, or another data visualization package of your choice.
- Identify a business problem that where data science may offer a solution. Implement data science methods and/or build a prototype to solve this business problem.
- Create a short slide deck that communicates your findings and service or offering to the client.
As you work, consider how you might productionize your data science solution or offering if the used car dealership contracted us to perform this work. Keep in mind that you do not need to actually productionize your work for this code challenge, but be prepared to talk about how you might implement your data science service or offering.
Keep note of any challenges or blockers you encounter as you work and be ready to discuss your development work flow, including things you tried and lessons you've learned as you completed the code challenge.
Finally, be creative and have fun! 😃
Please submit the following deliverables to complete this code challenge:
-
A well-documented Jupyter notebook.
This notebook should include your Python code for pre-processing, manipulating, and exploring the data provided for this challenge. Use Pandas for this task. Please also include your code for developing any machine learning models, figures and graphics, if applicable.
-
Documentation that sufficiently describes your development workflow.
This documentation should include the requirements and libraries you used to create your data science service or offering. Using your documentation, we should be able to execute the code in your Jupyter notebook and reproduce your output.
-
A short slide deck that communicates your work to the client.
Your slide deck may be delivered in Microsoft PowerPoint, Google Slides, or other software of your choice. This presentation should clearly define your data science question and how it relates to the business problem you identified. The slide deck should also showcase your service or offering and answer your data science question.
Please limit your presentation to no more than 5 slides.
Please archive all of your deliverables into a single compressed file and save the file using the file naming convention shown below. Please substitute your first name for FIRST_NAME, last name for LAST_NAME, and the year (YY), month (MM), and date (DD) when you submitted your code challenge.
FIRST_NAME-LAST_NAME-YYMMDD-data-science-code-challenge
Please submit your solution within 1 week (7 days) of receiving the code challenge. If you require more time, drop us a line at DataScienceU@u.group. Let us know why you need more time, and how much more time you'll need to complete the challenge.
Please send your completed code challenge to DataScienceU@u.group.
The data you will use for this code challenge is a historical snapshot of vehicles that have been sold by a used car dealership across multiple stores.
| Column Title | Column Description |
|---|---|
| VIN | Vehicle Identification Number (Unique vehicle ID) |
| CarYear | Model year of vehicle |
| Color | Color of vehicle |
| VehBody | Body type |
| EngineType | Engine size in cylinders |
| Make | Model name |
| Miles | Odometer reading at time of inventory |
| SaleType | (R) for resale, (N) for new vehicle |
| Odometer | Accuracy of odometer reading |
| Brand | Vehicle make |
| VehType | Passenger, Truck, or Motorcycle |
| LocationNum | ID number of store that acquired vehicle |
| CarType | Vehicle segment type |
| IsDomestic | 0 = Foreign make, 1 = American make |
| EngineLiters | Engine displacement in Liters |
| Country | Country of manufacture |
| FuelType | Fuel type of vehicle |
| SeatingCapacity | Seating capacity of vehicle |
| Transmission | Transmission type of vehicle |
| Status | Status of vehicle |
| SaleLoc | ID number of dealership that sold vehicle |
| ZIP | Zip code of dealership that sold vehicle |
| SellVal | Final sale price of vehicle |
| PurchVal | Purchase price of vehicle |
| ServiceVal | Cost to service the vehicle (on-site repairs) |
| ReconVal | Cost to recondition the vehicle (off site repairs) |
| LotTime | Days vehicle spent on lot before sale |
Please send all questions about the code challenge to DataScienceU@u.group.