Decision Tree Analysis
Explain the use of classification analysis in data science.
Sprockets Corporation designs high-end, specialty machine parts for a variety of industries. You have been hired by Sprockets to assist with its data analysis needs. Sprockets Corporation management would like more insight into its sales data. You have been directed to perform a decision tree analysis of the sale price and the related product features to provide a different view of the predictive power of these variables. Use two separate tools for your analysis: the Python programming language with the docclass.py module from “Programming Collective Intelligence”, and the R programming language with the rpart function.
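Because PRICEEACH is a numeric price, a tree that predicts it is a regression tree: each split is chosen so that prices within each branch are as similar as possible. A minimal sketch of that single step, assuming rows are plain Python dicts keyed by the column names above and using variance as the impurity measure. `best_split` is a hypothetical helper for illustration, not part of rpart or docclass.py, and the rows are made-up toy values.

```python
from statistics import pvariance


def best_split(rows, feature, target):
    """Find the threshold on a numeric `feature` whose two-way split
    (feature < threshold vs. feature >= threshold) gives the lowest
    weighted variance of the numeric `target` in the two child nodes."""
    values = sorted({row[feature] for row in rows})
    parent = pvariance([row[target] for row in rows])
    best = (None, parent)  # (threshold, weighted child variance); None = no useful split
    for threshold in values[1:]:  # skip the minimum so both children are non-empty
        left = [r[target] for r in rows if r[feature] < threshold]
        right = [r[target] for r in rows if r[feature] >= threshold]
        weighted = (len(left) * pvariance(left) + len(right) * pvariance(right)) / len(rows)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up toy rows for illustration only (not Sprockets data).
rows = [
    {"QUANTITYORDERED": 20, "PRICEEACH": 50.0},
    {"QUANTITYORDERED": 25, "PRICEEACH": 52.0},
    {"QUANTITYORDERED": 40, "PRICEEACH": 90.0},
    {"QUANTITYORDERED": 45, "PRICEEACH": 94.0},
]
print(best_split(rows, "QUANTITYORDERED", "PRICEEACH"))  # (40, 2.5)
```

In this toy data, splitting at QUANTITYORDERED >= 40 drops the price variance from 422.75 in the parent to a weighted 2.5 in the children; a feature whose best split removes most of the variance is the kind of variable with strong predictive power for price.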
John Sprocket, CEO, has asked you to prepare a presentation for the leadership team showcasing your decision tree models built with the R programming language and the rpart function, and with the Python programming language and the Python-based module. Also include an executive summary containing all source code, results, and supplemental information the leadership team needs.
- With the R rpart function, generate a decision tree model using the sale price, PRICEEACH, as the dependent variable, and consider SALES, QUANTITYORDERED, and PRICEEACH as features.
- With the Python-based docclass.py module and the Python language, generate a decision tree using PRICEEACH as the dependent variable, adding CITY, COUNTRY, DEALSIZE, and CUSTOMER to the existing features SALES, QUANTITYORDERED, and PRICEEACH. Note that the docclass.py module accepts string values as input for the decision tree.
- In both libraries, prune the tree appropriately so that it supports a concise description that can lead to actionable results.
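The mechanics behind the Python requirement (mixed numeric and string features, then pruning) can be sketched with the standard library alone. This is a hypothetical sketch, not docclass.py's API: `matches`, `sse`, `build_tree`, `prune`, and `predict` are illustrative names. It assumes rows are dicts keyed by column name, treats int or float values as numeric (split on `>=`) and anything else, such as CITY or DEALSIZE strings, as categorical (split on `==`), scores splits by the reduction in the sum of squared errors of PRICEEACH (the same criterion rpart's anova method uses), and prunes bottom-up by collapsing any split whose gain is below a chosen `min_gain`. PRICEEACH is passed only as `target`; also listing it in `features` would let the tree split on the value it is trying to predict.

```python
from statistics import mean, pvariance


def matches(row, feature, value):
    """Numeric split values test >= value; string values test == value."""
    if isinstance(value, (int, float)):
        return row[feature] >= value
    return row[feature] == value


def sse(rows, target):
    """Sum of squared deviations of `target` from the node mean."""
    return pvariance([r[target] for r in rows]) * len(rows) if rows else 0.0


def build_tree(rows, features, target, min_rows=2):
    """Grow a regression tree by greedily taking the split with the largest SSE reduction."""
    node_sse = sse(rows, target)
    best = None  # (gain, feature, value, true_rows, false_rows)
    if len(rows) >= min_rows:
        for f in features:
            for value in sorted({r[f] for r in rows}):
                true_rows = [r for r in rows if matches(r, f, value)]
                false_rows = [r for r in rows if not matches(r, f, value)]
                if not true_rows or not false_rows:
                    continue  # this split separates nothing
                gain = node_sse - sse(true_rows, target) - sse(false_rows, target)
                if best is None or gain > best[0]:
                    best = (gain, f, value, true_rows, false_rows)
    if best is None or best[0] <= 0:
        return {"leaf": mean(r[target] for r in rows), "rows": rows}
    gain, f, value, true_rows, false_rows = best
    return {"feature": f, "value": value, "gain": gain,
            "true": build_tree(true_rows, features, target, min_rows),
            "false": build_tree(false_rows, features, target, min_rows)}


def prune(tree, target, min_gain):
    """Bottom-up: collapse a split into one leaf when both children are leaves
    and the split reduced SSE by less than `min_gain`. Returns a new tree."""
    if "leaf" in tree:
        return tree
    true_b = prune(tree["true"], target, min_gain)
    false_b = prune(tree["false"], target, min_gain)
    if "leaf" in true_b and "leaf" in false_b and tree["gain"] < min_gain:
        rows = true_b["rows"] + false_b["rows"]
        return {"leaf": mean(r[target] for r in rows), "rows": rows}
    return {**tree, "true": true_b, "false": false_b}


def predict(tree, row):
    """Follow the splits from the root to a leaf and return that leaf's mean price."""
    while "leaf" not in tree:
        tree = tree["true"] if matches(row, tree["feature"], tree["value"]) else tree["false"]
    return tree["leaf"]


# Made-up toy rows for illustration only (not Sprockets data).
rows = [
    {"QUANTITYORDERED": 20, "DEALSIZE": "Small", "PRICEEACH": 50.0},
    {"QUANTITYORDERED": 45, "DEALSIZE": "Small", "PRICEEACH": 52.0},
    {"QUANTITYORDERED": 22, "DEALSIZE": "Large", "PRICEEACH": 90.0},
    {"QUANTITYORDERED": 41, "DEALSIZE": "Large", "PRICEEACH": 94.0},
]
tree = build_tree(rows, ["QUANTITYORDERED", "DEALSIZE"], "PRICEEACH")
pruned = prune(tree, "PRICEEACH", min_gain=10.0)
print(predict(tree, {"QUANTITYORDERED": 30, "DEALSIZE": "Large"}))    # 90.0
print(predict(pruned, {"QUANTITYORDERED": 30, "DEALSIZE": "Large"}))  # 92.0
```

Here the unpruned tree splits first on DEALSIZE == "Large" and then memorizes each row by quantity; with `min_gain=10.0` the pruned tree keeps only the DEALSIZE split, which is the kind of one-line, actionable rule the leadership summary needs. The `min_gain` value is an arbitrary choice for the toy data and would need tuning on real data.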
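On the R side, rpart prunes by cost complexity, controlled by the complexity parameter `cp`. As described in the rpart documentation, a subtree $T$ with $|T|$ terminal nodes is scored against the root-only tree $T_1$, and for a numeric response such as PRICEEACH ($y_i$) the risk $R$ is the within-leaf sum of squares:

```latex
R_{cp}(T) = R(T) + cp \cdot |T| \cdot R(T_1),
\qquad
R(T) = \sum_{\ell \in \mathrm{leaves}(T)} \sum_{i \in \ell} \left( y_i - \bar{y}_\ell \right)^2

% One extra split raises |T| by 1, so the larger tree T' scores better only if
R_{cp}(T') < R_{cp}(T) \iff R(T) - R(T') > cp \cdot R(T_1)
```

So a split is kept only if it reduces the overall lack of fit by more than the fraction `cp`; with anova splitting, the overall $R^2$ must increase by at least `cp`. In practice one typically inspects the table printed by `printcp()` and passes a chosen value to `prune(fit, cp = ...)`; picking the `cp` with the lowest cross-validated error is a common convention rather than a fixed rule.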
Please see the attached scoring guide; the example data sets are linked below.