PMML (Predictive Model Markup Language) is an XML-based language for defining predictive models. It was developed by the Data Mining Group (DMG), a vendor-led consortium that develops data mining standards, to provide a vendor-independent way of specifying models, so that proprietary formats do not get in the way when models are exchanged between applications. Whichever application you use to build a predictive model, the model can be encoded in PMML and uploaded into ADAPA® for execution.
ADAPA® is a PMML 3.2 consumer. In addition to the example model files below, we provide a set of data files that contain the model input as well as the expected results. These files are used for model validation: ADAPA executes the model on the given input and compares the computed results against the expected output.
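Because PMML is plain XML, a model file can be inspected with any XML parser. The following is a minimal sketch, not part of ADAPA: it assumes the PMML 3.2 namespace `http://www.dmg.org/PMML-3_2`, the inline document is a heavily trimmed, hypothetical fragment rather than one of the example files, and `summarize_pmml` is a helper name introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML-like document for illustration only.
# It is not schema-complete and is not one of the example files on this page.
SAMPLE_PMML = """
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header description="illustrative fragment"/>
  <DataDictionary numberOfFields="1">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
  </DataDictionary>
  <NeuralNetwork modelName="demo" functionName="classification"/>
</PMML>
""".strip()

# Top-level elements that describe the document rather than a model.
NON_MODEL_ELEMENTS = {
    "Header",
    "MiningBuildTask",
    "DataDictionary",
    "TransformationDictionary",
    "Extension",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(xml_text):
    """Return the PMML version, declared data fields, and model element names."""
    root = ET.fromstring(xml_text)
    fields = [
        element.get("name")
        for element in root.iter()
        if local_name(element.tag) == "DataField"
    ]
    models = [
        local_name(child.tag)
        for child in root
        if local_name(child.tag) not in NON_MODEL_ELEMENTS
    ]
    return {"version": root.get("version"), "fields": fields, "models": models}


summary = summarize_pmml(SAMPLE_PMML)
print(summary)
```

The same function could be pointed at the text of a downloaded model file to check which model element it contains before uploading it.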
To experiment with the examples, please follow these steps:
- Save a model file and its data file to your local computer
- Upload the model file in the ADAPA Demo
- Validate the model by executing the corresponding data file
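The validation step amounts to scoring each input record and comparing the computed result with the expected one. The sketch below illustrates that idea only and does not reflect how ADAPA is implemented: the CSV layout, the `expected` column name, the tolerance, and the `toy_score` function standing in for a real scoring engine are all assumptions made for this example.

```python
import csv
import io
import math


def results_match(expected, computed, rel_tol=1e-6):
    """Compare numerically when both values parse as numbers, else as text."""
    try:
        return math.isclose(
            float(expected), float(computed), rel_tol=rel_tol, abs_tol=1e-9
        )
    except ValueError:
        return str(expected).strip() == str(computed).strip()


def validate(csv_text, score, expected_column):
    """Score every row and return (row_number, expected, computed) mismatches."""
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        expected = row.pop(expected_column)  # remaining columns are model inputs
        computed = score(row)
        if not results_match(expected, computed):
            mismatches.append((row_number, expected, computed))
    return mismatches


# Hypothetical data file: one input column and one expected-result column.
DATA = "x,expected\n1,2.0\n2,4.0\n3,6.5\n"


def toy_score(inputs):
    """Stand-in for a scoring engine: doubles the input value."""
    return 2 * float(inputs["x"])


mismatches = validate(DATA, toy_score, "expected")
print(mismatches)
```

In this toy data the third record is deliberately inconsistent with the scorer, so it is the one reported as a mismatch.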
The examples listed below are based on publicly available datasets.
| Model Type | PMML Model File | Dataset |
|---|---|---|
| Neural Network for binary classification | Audit_NN.xml | Audit_NN.csv |
| Support Vector Machine for binary classification | Audit_SVM.xml | Audit_SVM.csv |
| Neural Network | ElNino_NN.xml | ElNino_NN.csv |
| Neural Network for multi-class classification | Iris_NN.xml | Iris_NN.csv |
| Support Vector Machine for multi-class classification | Iris_SVM.xml | Iris_SVM.csv |
| Multinomial Logistic Regression (General Regression) | Iris_MLR.xml | Iris_MLR.csv |
| Generalized Linear Model (General Regression) | Shuttle_GZLM.xml | Shuttle_GZLM.csv |
Acknowledgements:
The DMG publishes a list of PMML sample models which inspired our collection of PMML 3.2 examples presented here.
For more information on the Iris and El Nino datasets, please refer to: Asuncion, A. & Newman, D.J. (2007). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science.
The audit dataset is available through the R rattle package. For more information on rattle, please refer to http://rattle.togaware.com.
The shuttle O-ring data records the number of O-ring failures on each shuttle flight preceding the Challenger disaster. The cause of the explosion was determined to be an O-ring failure in the right solid rocket booster.
The Challenger disaster has become a case study in the possible consequences of poor data analysis.
