## Validation results

This is a validation report for the model "Carcinogenicity prediction with Ensemble of Classifier Chains".

## General information

The model was validated with a 10-times repeated 10-fold cross-validation.
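As a rough illustration of this validation scheme, the sketch below generates the train/test index splits for a 10-times repeated 10-fold cross-validation using only the Python standard library. The helper name `repeated_kfold_indices`, the seeding and the plain (non-stratified) shuffling are assumptions made for this example; the report does not state how the folds were actually drawn.

```python
import random

def repeated_kfold_indices(n_compounds, n_folds=10, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: shuffling and seeding are illustrative assumptions.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_compounds))
        rng.shuffle(indices)  # a fresh random partition for every repetition
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]  # every n_folds-th shuffled index
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield repeat, fold, train_idx, test_idx

# 1508 = active + inactive + missing compounds of the first endpoint in this report
splits = list(repeated_kfold_indices(1508))
print(len(splits))  # 10 repetitions x 10 folds
```

Within one repetition every compound lands in exactly one test fold, so each compound is predicted once per repetition and ten times overall.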

### Performance measures

measure | full-name | synonyms | description | details |
---|---|---|---|---|
accuracy | | | correct predictions / all predictions | |
auc | area under (the roc) curve | | probability that the classifier ranks a compound with class active higher than a compound with class inactive | to compute auc, the predictions are ranked according to the confidences given by the classifier for each prediction, i.e. first the compounds with high confidence for class active, then the compounds the classifier is unsure about, then the compounds with high confidence for class inactive |
sensitivity | | recall, true positive rate | correctly predicted active compounds / all compounds that are really active | |
specificity | | true negative rate | correctly predicted inactive compounds / all compounds that are really inactive | |
ppv | positive predictive value | precision, selectivity | correctly predicted active compounds / all compounds that are predicted as active | ppv is the probability that an active prediction is correct |
npv | negative predictive value | | correctly predicted inactive compounds / all compounds that are predicted as inactive | npv is the probability that an inactive prediction is correct |
subset-accuracy | | | number of test compounds with all endpoints predicted correctly / number of all test compounds | |
inside-ad | | | number of test compounds inside the applicability domain / number of all test compounds | |
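The single-endpoint measures defined above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the report: the function names, the boolean encoding (`True` for active), the 0.5 decision threshold and the use of a probability-like score for class active as the ranking criterion are all assumptions.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def binary_measures(actual, predicted):
    """Confusion-matrix based measures; True means active, False inactive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
        "npv": safe_div(tn, tn + fn),
    }

def auc(actual, scores):
    """Probability that a random active compound is scored higher than a
    random inactive one; ties count as one half."""
    active = [s for a, s in zip(actual, scores) if a]
    inactive = [s for a, s in zip(actual, scores) if not a]
    if not active or not inactive:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in active for n in inactive)
    return wins / (len(active) * len(inactive))

# Toy data: three active and three inactive compounds (invented values).
actual = [True, True, True, False, False, False]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # assumed score for class active
predicted = [s >= 0.5 for s in scores]     # assumed decision threshold

measures = binary_measures(actual, predicted)
print(measures)
print(auc(actual, scores))
```

The NaN return for an empty denominator mirrors the situation where a measure is undefined, for example auc when only one class is present among the predictions.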

### Probability that a prediction is correct

When applying the model to an unseen compound, the performance measures ppv and npv give a probability estimate that the prediction is correct. The confidence of the prediction is taken into account to make the probability estimate more accurate. Therefore, ppv and npv have been computed for different confidence levels.
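A minimal sketch of how ppv and npv could be computed per confidence level is shown below. The thresholds follow the levels used in this report (high >66%, medium >33%, low otherwise); the function names, the record layout `(actual, predicted, confidence)` and the handling of values exactly on a threshold are assumptions for illustration.

```python
from collections import defaultdict

def confidence_level(confidence):
    """Map a confidence in [0, 1] to the three levels used in this report."""
    if confidence > 0.66:
        return "high"
    if confidence > 0.33:
        return "medium"
    return "low"

def ppv_npv_by_confidence(records):
    """records: iterable of (actual, predicted, confidence); True means active."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for actual, predicted, confidence in records:
        # "t"/"f": prediction correct or not; "p"/"n": predicted active or inactive
        key = ("t" if actual == predicted else "f") + ("p" if predicted else "n")
        counts[confidence_level(confidence)][key] += 1
    result = {}
    for level, c in counts.items():
        n_pos = c["tp"] + c["fp"]  # all compounds predicted as active
        n_neg = c["tn"] + c["fn"]  # all compounds predicted as inactive
        result[level] = {
            "ppv": c["tp"] / n_pos if n_pos else float("nan"),
            "npv": c["tn"] / n_neg if n_neg else float("nan"),
        }
    return result

# Invented toy predictions, not data from the report.
records = [
    (True, True, 0.90), (False, True, 0.80), (True, True, 0.70), (False, False, 0.95),
    (True, True, 0.50), (False, True, 0.40), (True, False, 0.45), (False, False, 0.50),
    (True, False, 0.10), (False, False, 0.20),
]
by_level = ppv_npv_by_confidence(records)
for level in ("high", "medium", "low"):
    print(level, by_level[level])
```

For a new compound predicted as active with high confidence, the ppv of the high-confidence group would then serve as the probability estimate that the prediction is correct.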

## Average performance over all endpoints

The average measures have been computed as the mean of all single-endpoint measures, with each endpoint weighted equally; these are so-called 'macro' measures. The exception is subset-accuracy, which is computed using all endpoints. The table below reports fractions between 0 and 1, whereas the single-endpoint tables report percentages.

accuracy | auc | sensitivity | specificity | ppv | npv | subset-accuracy | inside-ad |
---|---|---|---|---|---|---|---|
0.666 | 0.749 | 0.642 | 0.704 | 0.689 | 0.663 | 0.482 | 0.978 |
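The difference between a macro-averaged measure and subset-accuracy can be sketched as follows. The function names and toy values are invented for illustration, and skipping missing endpoint values (`None`) when judging a compound is an assumption; the report does not say how missing values enter subset-accuracy.

```python
def macro_average(per_endpoint_values):
    """Unweighted mean of one measure over all endpoints."""
    return sum(per_endpoint_values) / len(per_endpoint_values)

def subset_accuracy(actual_rows, predicted_rows):
    """Fraction of compounds for which every endpoint is predicted correctly.

    Each row holds one value per endpoint; None marks a missing actual value
    and is skipped (assumption).
    """
    n_correct = 0
    for actual, predicted in zip(actual_rows, predicted_rows):
        if all(a == p for a, p in zip(actual, predicted) if a is not None):
            n_correct += 1
    return n_correct / len(actual_rows)

# Toy example: three endpoints, three compounds.
endpoint_accuracies = [0.70, 0.60, 0.80]
actual_rows = [
    [True, False, None],
    [True, True, False],
    [False, False, True],
]
predicted_rows = [
    [True, False, True],    # all known endpoints correct
    [True, False, False],   # second endpoint wrong
    [False, False, True],   # all endpoints correct
]

macro_acc = macro_average(endpoint_accuracies)
subset_acc = subset_accuracy(actual_rows, predicted_rows)
print(macro_acc, subset_acc)
```

Because a single wrong endpoint makes the whole compound count as incorrect, subset-accuracy is typically lower than the macro-averaged accuracy, as in the table above.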

## Single endpoint validation

### activityoutcome-cpdbas-singlecellcall

In the training dataset, the endpoint activityoutcome-cpdbas-singlecellcall is active for 706 compounds, inactive for 799 and missing for 3. On average per cross-validation, 178.4 of the 1505 non-missing compounds were predicted with high confidence (>66%), 583.9 with medium confidence (>33%) and 739.1 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 67.402 | 73.566 | 67.451 | 67.454 | 70.233 | 64.609 | 99.761 |
predictions with high confidence (>66%) | 87.977 | 88.485 | 91.082 | 79.139 | 92.849 | 72.982 | 99.823 |
predictions with medium confidence (>33%) | 72.305 | 75.345 | 72.051 | 72.683 | 72.383 | 72.294 | 99.583 |
predictions with low confidence (<33%) | 58.671 | 61.616 | 55.289 | 62.185 | 59.579 | 58.037 | 99.876 |
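A quick arithmetic check on the numbers above, offered as an observation rather than something the report states: the three confidence counts add up to roughly the number of non-missing compounds multiplied by the inside-ad fraction, which suggests that the counts cover only compounds inside the applicability domain. The variable names are chosen for this example.

```python
# Values copied from the activityoutcome-cpdbas-singlecellcall section.
non_missing = 1505
inside_ad_percent = 99.761  # "all predictions" row
counts = {"high": 178.4, "medium": 583.9, "low": 739.1}

total_counted = sum(counts.values())
expected_inside_ad = non_missing * inside_ad_percent / 100

# Share of each confidence level among the counted predictions.
shares = {level: n / total_counted for level, n in counts.items()}

print(round(total_counted, 1), round(expected_inside_ad, 1))
for level, share in shares.items():
    print(level, round(share, 3))
```

Read this way, roughly half of the predictions for this endpoint fall into the low-confidence group, where the measures in the table are weakest.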

### activityoutcome-cpdbas-rat

In the training dataset, the endpoint activityoutcome-cpdbas-rat is active for 618 compounds, inactive for 580 and missing for 310. On average per cross-validation, 104.2 of the 1198 non-missing compounds were predicted with high confidence (>66%), 434.1 with medium confidence (>33%) and 655.1 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 65.1 | 71.201 | 57.297 | 72.574 | 66.383 | 64.376 | 99.62 |
predictions with high confidence (>66%) | 86.862 | 87.887 | 86.906 | 86.995 | 95.067 | 70.62 | 99.84 |
predictions with medium confidence (>33%) | 73.387 | 73.694 | 58.148 | 85.023 | 74.579 | 72.99 | 99.559 |
predictions with low confidence (<33%) | 56.231 | 58.765 | 49.951 | 62.288 | 55.274 | 57.171 | 99.622 |

### activityoutcome-cpdbas-multicellcall

In the training dataset, the endpoint activityoutcome-cpdbas-multicellcall is active for 540 compounds, inactive for 580 and missing for 388. On average per cross-validation, 148.5 of the 1120 non-missing compounds were predicted with high confidence (>66%), 436.5 with medium confidence (>33%) and 532 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 69.864 | 76.561 | 68.528 | 71.339 | 71.972 | 67.887 | 99.729 |
predictions with high confidence (>66%) | 92.002 | 91.705 | 92.904 | 87.033 | 96.556 | 80.873 | 99.443 |
predictions with medium confidence (>33%) | 74.421 | 76.628 | 70.708 | 77.836 | 71.197 | 77.245 | 99.568 |
predictions with low confidence (<33%) | 60.028 | 62.977 | 57.075 | 62.76 | 61.767 | 58.294 | 99.945 |

### activityoutcome-cpdbas-mouse

In the training dataset, the endpoint activityoutcome-cpdbas-mouse is active for 536 compounds, inactive for 444 and missing for 528. On average per cross-validation, 82 of the 980 non-missing compounds were predicted with high confidence (>66%), 391 with medium confidence (>33%) and 502.5 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 66.229 | 72.283 | 56.275 | 74.677 | 64.901 | 67.175 | 99.537 |
predictions with high confidence (>66%) | 83.969 | 79.041 | 65.662 | 94.669 | 85.824 | 83.778 | 98.448 |
predictions with medium confidence (>33%) | 74.84 | 76.736 | 61.986 | 84.43 | 74.844 | 74.937 | 99.327 |
predictions with low confidence (<33%) | 56.637 | 59.42 | 51.686 | 61.487 | 56.902 | 56.328 | 99.905 |

### activityoutcome-cpdbas-mutagenicity

In the training dataset, the endpoint activityoutcome-cpdbas-mutagenicity is active for 448 compounds, inactive for 402 and missing for 658. On average per cross-validation, 137.3 of the 850 non-missing compounds were predicted with high confidence (>66%), 354.5 with medium confidence (>33%) and 353.8 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 76.449 | 84.075 | 69.099 | 83.384 | 78.92 | 74.813 | 99.486 |
predictions with high confidence (>66%) | 91.873 | 92.386 | 91.326 | 92.762 | 94.773 | 88.528 | 100 |
predictions with medium confidence (>33%) | 84.265 | 86.059 | 75.37 | 91.044 | 85.886 | 83.237 | 99.101 |
predictions with low confidence (<33%) | 62.508 | 68.374 | 53.402 | 71.832 | 63.969 | 62.001 | 99.663 |

### activityoutcome-cpdbas-hamster

In the training dataset, the endpoint activityoutcome-cpdbas-hamster is active for 41 compounds, inactive for 45 and missing for 1422. On average per cross-validation, 18.5 of the 86 non-missing compounds were predicted with high confidence (>66%), 27.2 with medium confidence (>33%) and 35.2 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 69.654 | 80.676 | 75.129 | 66.219 | 72.624 | 69.611 | 94.305 |
predictions with high confidence (>66%) | 91.661 | 86.581 | 94.977 | 80.68 | 92.009 | 92.636 | 95.582 |
predictions with medium confidence (>33%) | 79.007 | 85.645 | 84.062 | 75.476 | 80.38 | 81.439 | 95.842 |
predictions with low confidence (<33%) | 50.348 | 52.161 | 44.646 | 55.102 | 46.631 | 55.019 | 91.912 |

### activityoutcome-cpdbas-dog-primates

In the training dataset, the endpoint activityoutcome-cpdbas-dog-primates is active for 17 compounds, inactive for 15 and missing for 1476. On average per cross-validation, 0.2 of the 32 non-missing compounds were predicted with high confidence (>66%), 5.4 with medium confidence (>33%) and 24 with low confidence (<33%).

model confidence | accuracy | auc | sensitivity | specificity | ppv | npv | inside-ad |
---|---|---|---|---|---|---|---|
all predictions (ignoring confidence) | 50.966 | 59.935 | 53.437 | 52.244 | 54.182 | 52.286 | 92.222 |
predictions with high confidence (>66%) | 0 | ? | 0 | 0 | 0 | 0 | 100 |
predictions with medium confidence (>33%) | 64.035 | 75 | 71.154 | 51.852 | 69.753 | 53.125 | 95 |
predictions with low confidence (<33%) | 46.784 | 58.153 | 49.775 | 50 | 48.214 | 51.157 | 91.879 |