PyCM is a multi-class confusion matrix library written in Python. It accepts either input data vectors or a direct matrix and serves as a tool for post-classification model evaluation, supporting most class-level and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, aimed mainly at data scientists who need a broad array of metrics for predictive models and accurate evaluation of a wide variety of classifiers.
Installation
- PyCM 2.4 is the last version to support Python 2.7 & Python 3.4
- Plotting capability requires Matplotlib (>= 3.0.0) or Seaborn (>= 0.9.1)
Source code
- Download Version 3.2 or Latest Source
- Run
pip install -r requirements.txt
or
pip3 install -r requirements.txt
(Need root access)
- Run
python3 setup.py install
or
python setup.py install
(Need root access)
PyPI
- Check Python Packaging User Guide
- Run
pip install pycm==3.2
or
pip3 install pycm==3.2
(Need root access)
Conda
- Check Conda Managing Package
- Run
conda install -c sepandhaghighi pycm
(Need root access)
Easy install
- Run
easy_install --upgrade pycm
(Need root access)
MATLAB
- Download and install MATLAB (>=8.5, 64/32 bit)
- Download and install Python3.x (>=3.5, 64/32 bit)
  - Select Add to PATH option
  - Select Install pip option
- Run
pip install pycm
or
pip3 install pycm
(Need root access)
- Configure Python interpreter
>> pyversion PYTHON_EXECUTABLE_FULL_PATH
- Visit MATLAB Examples
Docker
- Run
docker pull sepandhaghighi/pycm
(Need root access)
- Configuration :
  - Ubuntu 16.04
  - Python 3.6
Usage
From vector
>>> from pycm import *
>>> y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2] # or y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
>>> y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2] # or y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
>>> cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred) # Create CM From Data
>>> cm.classes
[0, 1, 2]
>>> cm.table
{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
>>> print(cm)
Predict 0       1       2
Actual
0       3       0       0
1       0       1       2
2       2       1       3

Overall Statistics :

95% CI                  (0.30439,0.86228)
ACC Macro               0.72222
ARI                     0.09206
AUNP                    0.66667
AUNU                    0.69444
Bangdiwala B            0.37255
Bennett S               0.375
CBA                     0.47778
CSI                     0.17778
Chi-Squared             6.6
Chi-Squared DF          4
Conditional Entropy     0.95915
Cramer V                0.5244
Cross Entropy           1.59352
F1 Macro                0.56515
F1 Micro                0.58333
FNR Macro               0.38889
FNR Micro               0.41667
FPR Macro               0.22222
FPR Micro               0.20833
Gwet AC1                0.38931
Hamming Loss            0.41667
Joint Entropy           2.45915
KL Divergence           0.09352
Kappa                   0.35484
Kappa 95% CI            (-0.07708,0.78675)
Kappa No Prevalence     0.16667
Kappa Standard Error    0.22036
Kappa Unbiased          0.34426
Krippendorff Alpha      0.37158
Lambda A                0.16667
Lambda B                0.42857
Mutual Information      0.52421
NIR                     0.5
Overall ACC             0.58333
Overall CEN             0.46381
Overall J               (1.225,0.40833)
Overall MCC             0.36667
Overall MCEN            0.51894
Overall RACC            0.35417
Overall RACCU           0.36458
P-Value                 0.38721
PPV Macro               0.56667
PPV Micro               0.58333
Pearson C               0.59568
Phi-Squared             0.55
RCI                     0.34947
RR                      4.0
Reference Entropy       1.5
Response Entropy        1.48336
SOA1(Landis & Koch)     Fair
SOA2(Fleiss)            Poor
SOA3(Altman)            Fair
SOA4(Cicchetti)         Poor
SOA5(Cramer)            Relatively Strong
SOA6(Matthews)          Weak
Scott PI                0.34426
Standard Error          0.14232
TNR Macro               0.77778
TNR Micro               0.79167
TPR Macro               0.61111
TPR Micro               0.58333
Zero-one Loss           5

Class Statistics :

Classes                                                     0             1             2
ACC(Accuracy)                                               0.83333       0.75          0.58333
AGF(Adjusted F-score)                                       0.9136        0.53995       0.5516
AGM(Adjusted geometric mean)                                0.83729       0.692         0.60712
AM(Difference between automatic and manual classification)  2             -1            -1
AUC(Area under the ROC curve)                               0.88889       0.61111       0.58333
AUCI(AUC value interpretation)                              Very Good     Fair          Poor
AUPR(Area under the PR curve)                               0.8           0.41667       0.55
BCD(Bray-Curtis dissimilarity)                              0.08333       0.04167       0.04167
BM(Informedness or bookmaker informedness)                  0.77778       0.22222       0.16667
CEN(Confusion entropy)                                      0.25          0.49658       0.60442
DOR(Diagnostic odds ratio)                                  None          4.0           2.0
DP(Discriminant power)                                      None          0.33193       0.16597
DPI(Discriminant power interpretation)                      None          Poor          Poor
ERR(Error rate)                                             0.16667       0.25          0.41667
F0.5(F0.5 score)                                            0.65217       0.45455       0.57692
F1(F1 score - harmonic mean of precision and sensitivity)   0.75          0.4           0.54545
F2(F2 score)                                                0.88235       0.35714       0.51724
FDR(False discovery rate)                                   0.4           0.5           0.4
FN(False negative/miss/type 2 error)                        0             2             3
FNR(Miss rate or false negative rate)                       0.0           0.66667       0.5
FOR(False omission rate)                                    0.0           0.2           0.42857
FP(False positive/type 1 error/false alarm)                 2             1             2
FPR(Fall-out or false positive rate)                        0.22222       0.11111       0.33333
G(G-measure geometric mean of precision and sensitivity)    0.7746        0.40825       0.54772
GI(Gini index)                                              0.77778       0.22222       0.16667
GM(G-mean geometric mean of specificity and sensitivity)    0.88192       0.54433       0.57735
IBA(Index of balanced accuracy)                             0.95062       0.13169       0.27778
ICSI(Individual classification success index)               0.6           -0.16667      0.1
IS(Information score)                                       1.26303       1.0           0.26303
J(Jaccard index)                                            0.6           0.25          0.375
LS(Lift score)                                              2.4           2.0           1.2
MCC(Matthews correlation coefficient)                       0.68313       0.2582        0.16903
MCCI(Matthews correlation coefficient interpretation)       Moderate      Negligible    Negligible
MCEN(Modified confusion entropy)                            0.26439       0.5           0.6875
MK(Markedness)                                              0.6           0.3           0.17143
N(Condition negative)                                       9             9             6
NLR(Negative likelihood ratio)                              0.0           0.75          0.75
NLRI(Negative likelihood ratio interpretation)              Good          Negligible    Negligible
NPV(Negative predictive value)                              1.0           0.8           0.57143
OC(Overlap coefficient)                                     1.0           0.5           0.6
OOC(Otsuka-Ochiai coefficient)                              0.7746        0.40825       0.54772
OP(Optimized precision)                                     0.70833       0.29545       0.44048
P(Condition positive or support)                            3             3             6
PLR(Positive likelihood ratio)                              4.5           3.0           1.5
PLRI(Positive likelihood ratio interpretation)              Poor          Poor          Poor
POP(Population)                                             12            12            12
PPV(Precision or positive predictive value)                 0.6           0.5           0.6
PRE(Prevalence)                                             0.25          0.25          0.5
Q(Yule Q - coefficient of colligation)                      None          0.6           0.33333
QI(Yule Q interpretation)                                   None          Moderate      Weak
RACC(Random accuracy)                                       0.10417       0.04167       0.20833
RACCU(Random accuracy unbiased)                             0.11111       0.0434        0.21007
TN(True negative/correct rejection)                         7             8             4
TNR(Specificity or true negative rate)                      0.77778       0.88889       0.66667
TON(Test outcome negative)                                  7             10            7
TOP(Test outcome positive)                                  5             2             5
TP(True positive/hit)                                       3             1             3
TPR(Sensitivity, recall, hit rate, or true positive rate)   1.0           0.33333       0.5
Y(Youden index)                                             0.77778       0.22222       0.16667
dInd(Distance index)                                        0.22222       0.67586       0.60093
sInd(Similarity index)                                      0.84287       0.52209       0.57508

>>> cm.print_matrix()
Predict 0       1       2
Actual
0       3       0       0
1       0       1       2
2       2       1       3

>>> cm.print_normalized_matrix()
Predict 0         1         2
Actual
0       1.0       0.0       0.0
1       0.0       0.33333   0.66667
2       0.33333   0.16667   0.5

>>> cm.print_matrix(one_vs_all=True,class_name=0) # One-Vs-All, new in version 1.4
Predict 0       ~
Actual
0       3       0
~       2       7

>>> cm = ConfusionMatrix(y_actu, y_pred, classes=[1,0,2]) # classes, new in version 3.2
>>> cm.print_matrix()
Predict 1       0       2
Actual
1       1       0       2
0       0       3       0
2       1       2       3

>>> cm = ConfusionMatrix(y_actu, y_pred, classes=[1,0,4]) # classes, new in version 3.2
>>> cm.print_matrix()
Predict 1       0       4
Actual
1       1       0       0
0       0       3       0
4       0       0       0
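To make the relationship between the two input vectors and `cm.table` concrete, here is a minimal pure-Python sketch of the counting step. It is an illustration only, not PyCM's implementation; the helper name `confusion_table` is hypothetical, and dropping samples whose labels are not in `classes` is an assumption inferred from the `classes=[1,0,4]` output above.

```python
from collections import Counter


def confusion_table(actual, predicted, classes=None):
    """Return a nested dict: table[actual_class][predicted_class] = count.

    Hypothetical helper for illustration; samples whose labels are not
    listed in `classes` are ignored (an assumption, see the lead-in).
    """
    if classes is None:
        classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # (actual, predicted) -> count
    return {a: {p: counts[(a, p)] for p in classes} for a in classes}


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

table = confusion_table(y_actu, y_pred)
reordered = confusion_table(y_actu, y_pred, classes=[1, 0, 4])
print(table)
print(reordered)
```

The first result reproduces the `cm.table` value shown above, and the second reproduces the rows printed for `classes=[1,0,4]`.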
Direct CM
>>> from pycm import *
>>> cm2 = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2":2}, "Class2": {"Class1": 0, "Class2": 5}}) # Create CM Directly
>>> cm2
pycm.ConfusionMatrix(classes: ['Class1', 'Class2'])
>>> print(cm2)
Predict   Class1    Class2
Actual
Class1    1         2
Class2    0         5

Overall Statistics :

95% CI                  (0.44994,1.05006)
ACC Macro               0.75
ARI                     0.17241
AUNP                    0.66667
AUNU                    0.66667
Bangdiwala B            0.68421
Bennett S               0.5
CBA                     0.52381
CSI                     0.52381
Chi-Squared             1.90476
Chi-Squared DF          1
Conditional Entropy     0.34436
Cramer V                0.48795
Cross Entropy           1.2454
F1 Macro                0.66667
F1 Micro                0.75
FNR Macro               0.33333
FNR Micro               0.25
FPR Macro               0.33333
FPR Micro               0.25
Gwet AC1                0.6
Hamming Loss            0.25
Joint Entropy           1.29879
KL Divergence           0.29097
Kappa                   0.38462
Kappa 95% CI            (-0.354,1.12323)
Kappa No Prevalence     0.5
Kappa Standard Error    0.37684
Kappa Unbiased          0.33333
Krippendorff Alpha      0.375
Lambda A                0.33333
Lambda B                0.0
Mutual Information      0.1992
NIR                     0.625
Overall ACC             0.75
Overall CEN             0.44812
Overall J               (1.04762,0.52381)
Overall MCC             0.48795
Overall MCEN            0.29904
Overall RACC            0.59375
Overall RACCU           0.625
P-Value                 0.36974
PPV Macro               0.85714
PPV Micro               0.75
Pearson C               0.43853
Phi-Squared             0.2381
RCI                     0.20871
RR                      4.0
Reference Entropy       0.95443
Response Entropy        0.54356
SOA1(Landis & Koch)     Fair
SOA2(Fleiss)            Poor
SOA3(Altman)            Fair
SOA4(Cicchetti)         Poor
SOA5(Cramer)            Relatively Strong
SOA6(Matthews)          Weak
Scott PI                0.33333
Standard Error          0.15309
TNR Macro               0.66667
TNR Micro               0.75
TPR Macro               0.66667
TPR Micro               0.75
Zero-one Loss           2

Class Statistics :

Classes                                                     Class1        Class2
ACC(Accuracy)                                               0.75          0.75
AGF(Adjusted F-score)                                       0.53979       0.81325
AGM(Adjusted geometric mean)                                0.73991       0.5108
AM(Difference between automatic and manual classification)  -2            2
AUC(Area under the ROC curve)                               0.66667       0.66667
AUCI(AUC value interpretation)                              Fair          Fair
AUPR(Area under the PR curve)                               0.66667       0.85714
BCD(Bray-Curtis dissimilarity)                              0.125         0.125
BM(Informedness or bookmaker informedness)                  0.33333       0.33333
CEN(Confusion entropy)                                      0.5           0.43083
DOR(Diagnostic odds ratio)                                  None          None
DP(Discriminant power)                                      None          None
DPI(Discriminant power interpretation)                      None          None
ERR(Error rate)                                             0.25          0.25
F0.5(F0.5 score)                                            0.71429       0.75758
F1(F1 score - harmonic mean of precision and sensitivity)   0.5           0.83333
F2(F2 score)                                                0.38462       0.92593
FDR(False discovery rate)                                   0.0           0.28571
FN(False negative/miss/type 2 error)                        2             0
FNR(Miss rate or false negative rate)                       0.66667       0.0
FOR(False omission rate)                                    0.28571       0.0
FP(False positive/type 1 error/false alarm)                 0             2
FPR(Fall-out or false positive rate)                        0.0           0.66667
G(G-measure geometric mean of precision and sensitivity)    0.57735       0.84515
GI(Gini index)                                              0.33333       0.33333
GM(G-mean geometric mean of specificity and sensitivity)    0.57735       0.57735
IBA(Index of balanced accuracy)                             0.11111       0.55556
ICSI(Individual classification success index)               0.33333       0.71429
IS(Information score)                                       1.41504       0.19265
J(Jaccard index)                                            0.33333       0.71429
LS(Lift score)                                              2.66667       1.14286
MCC(Matthews correlation coefficient)                       0.48795       0.48795
MCCI(Matthews correlation coefficient interpretation)       Weak          Weak
MCEN(Modified confusion entropy)                            0.38998       0.51639
MK(Markedness)                                              0.71429       0.71429
N(Condition negative)                                       5             3
NLR(Negative likelihood ratio)                              0.66667       0.0
NLRI(Negative likelihood ratio interpretation)              Negligible    Good
NPV(Negative predictive value)                              0.71429       1.0
OC(Overlap coefficient)                                     1.0           1.0
OOC(Otsuka-Ochiai coefficient)                              0.57735       0.84515
OP(Optimized precision)                                     0.25          0.25
P(Condition positive or support)                            3             5
PLR(Positive likelihood ratio)                              None          1.5
PLRI(Positive likelihood ratio interpretation)              None          Poor
POP(Population)                                             8             8
PPV(Precision or positive predictive value)                 1.0           0.71429
PRE(Prevalence)                                             0.375         0.625
Q(Yule Q - coefficient of colligation)                      None          None
QI(Yule Q interpretation)                                   None          None
RACC(Random accuracy)                                       0.04688       0.54688
RACCU(Random accuracy unbiased)                             0.0625        0.5625
TN(True negative/correct rejection)                         5             1
TNR(Specificity or true negative rate)                      1.0           0.33333
TON(Test outcome negative)                                  7             1
TOP(Test outcome positive)                                  1             7
TP(True positive/hit)                                       1             5
TPR(Sensitivity, recall, hit rate, or true positive rate)   0.33333       1.0
Y(Youden index)                                             0.33333       0.33333
dInd(Distance index)                                        0.66667       0.66667
sInd(Similarity index)                                      0.5286        0.5286

>>> cm2.stat(summary=True)
Overall Statistics :

ACC Macro               0.75
F1 Macro                0.66667
FPR Macro               0.33333
Kappa                   0.38462
Overall ACC             0.75
PPV Macro               0.85714
SOA1(Landis & Koch)     Fair
TPR Macro               0.66667
Zero-one Loss           2

Class Statistics :

Classes                                                     Class1        Class2
ACC(Accuracy)                                               0.75          0.75
AUC(Area under the ROC curve)                               0.66667       0.66667
AUCI(AUC value interpretation)                              Fair          Fair
F1(F1 score - harmonic mean of precision and sensitivity)   0.5           0.83333
FN(False negative/miss/type 2 error)                        2             0
FP(False positive/type 1 error/false alarm)                 0             2
FPR(Fall-out or false positive rate)                        0.0           0.66667
N(Condition negative)                                       5             3
P(Condition positive or support)                            3             5
POP(Population)                                             8             8
PPV(Precision or positive predictive value)                 1.0           0.71429
TN(True negative/correct rejection)                         5             1
TON(Test outcome negative)                                  7             1
TOP(Test outcome positive)                                  1             7
TP(True positive/hit)                                       1             5
TPR(Sensitivity, recall, hit rate, or true positive rate)   0.33333       1.0

>>> cm3 = ConfusionMatrix(matrix={"Class1": {"Class1": 1, "Class2":0}, "Class2": {"Class1": 2, "Class2": 5}},transpose=True) # Transpose Matrix
>>> cm3.print_matrix()
Predict   Class1    Class2
Actual
Class1    1         2
Class2    0         5
- matrix() and normalized_matrix() were renamed to print_matrix() and print_normalized_matrix() in version 1.5
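As a cross-check on a few of the class statistics listed for the direct matrix, the sketch below derives TP, FN, FP and TN from a nested-dict matrix and then computes TPR, PPV and F1 using their standard definitions. The helper names are hypothetical and this is not PyCM's own code. Returning `None` on a zero denominator is an assumption that mirrors the `None` entries in the output above.

```python
def class_counts(table, cls):
    """Return (TP, FN, FP, TN) for one class of a nested-dict matrix."""
    tp = table[cls][cls]
    fn = sum(table[cls].values()) - tp              # rest of the actual row
    fp = sum(row[cls] for row in table.values()) - tp  # rest of the column
    total = sum(sum(row.values()) for row in table.values())
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def safe_div(num, den, digits=5):
    """Rounded ratio, or None when the denominator is zero (assumption)."""
    return round(num / den, digits) if den else None


table = {"Class1": {"Class1": 1, "Class2": 2},
         "Class2": {"Class1": 0, "Class2": 5}}

stats = {}
for cls in table:
    tp, fn, fp, tn = class_counts(table, cls)
    stats[cls] = {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "TPR": safe_div(tp, tp + fn),               # sensitivity / recall
        "PPV": safe_div(tp, tp + fp),               # precision
        "F1": safe_div(2 * tp, 2 * tp + fp + fn),   # harmonic mean of both
    }
    print(cls, stats[cls])
```

The values agree with the TP, FN, FP, TN, TPR, PPV and F1 rows printed for `cm2`.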
Activation threshold
threshold was added in version 0.9 for real-valued predictions.
For more information visit Example3
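The idea behind an activation threshold can be sketched without PyCM: a function turns each real-valued prediction into a class label before the matrix is built. The function `activation` and the sample scores below are hypothetical, and applying the function element by element to the prediction vector is an assumption about how the `threshold` argument is used; Example3 remains the authoritative reference.

```python
def activation(scores):
    """Hypothetical activation: index of the largest per-class score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up real-valued predictions: one list of per-class scores per sample.
y_prob = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]

# Map every real-valued prediction to a discrete class label.
y_pred = list(map(activation, y_prob))
print(y_pred)

# With PyCM installed, the same function would be passed as, for example:
#   cm = ConfusionMatrix(y_actu, y_prob, threshold=activation)
```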
Load from file
file was added in version 0.9.5 to load a saved confusion matrix in the .obj format generated by the save_obj method.
For more information visit Example4
Sample weights
sample_weight was added in version 1.2.
For more information visit Example5
Transpose
transpose was added in version 1.2 to transpose the input matrix (only in Direct CM mode).
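What transposition means for a nested-dict matrix can be shown in a few lines. This is a sketch with a hypothetical helper, not PyCM's implementation: the outer key is treated as the actual class and the inner key as the predicted class, and the two roles are swapped.

```python
def transpose_table(table):
    """Swap the actual (outer) and predicted (inner) roles of a matrix."""
    classes = list(table)
    return {a: {p: table[p][a] for p in classes} for a in classes}


raw = {"Class1": {"Class1": 1, "Class2": 0},
       "Class2": {"Class1": 2, "Class2": 5}}

transposed = transpose_table(raw)
print(transposed)
```

The result matches the matrix printed for `cm3` in the Direct CM section, where the same input was passed with `transpose=True`.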
Relabel
relabel method was added in version 1.5 to change the class names of a ConfusionMatrix.
>>> cm.relabel(mapping={0:"L1",1:"L2",2:"L3"})
>>> cm
pycm.ConfusionMatrix(classes: ['L1', 'L2', 'L3'])
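For readers who keep their matrices as plain nested dicts, the effect of relabeling can be sketched as a key rename on both levels. The helper `relabel_table` is hypothetical, and the validation rules (every class mapped, no duplicate new names) are assumptions rather than a description of PyCM's checks.

```python
def relabel_table(table, mapping):
    """Rename classes on both the actual and predicted axes."""
    if set(mapping) != set(table):
        raise ValueError("mapping must cover every class exactly once")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("new class names must be unique")
    return {mapping[a]: {mapping[p]: n for p, n in row.items()}
            for a, row in table.items()}


table = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}
relabeled = relabel_table(table, {0: "L1", 1: "L2", 2: "L3"})
print(list(relabeled))
print(relabeled["L2"])
```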
Position
position method was added in version 2.8 to find the indexes of the observations in predict_vector that produced each TP, TN, FP, and FN.
>>> cm.position()
{0: {'FN': [], 'FP': [0, 7], 'TP': [1, 4, 9], 'TN': [2, 3, 5, 6, 8, 10, 11]}, 1: {'FN': [5, 10], 'FP': [3], 'TP': [6], 'TN': [0, 1, 2, 4, 7, 8, 9, 11]}, 2: {'FN': [0, 3, 7], 'FP': [5, 10], 'TP': [2, 8, 11], 'TN': [1, 4, 6, 9]}}
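The classification of each observation index can be reproduced with a short one-vs-rest loop. This is a sketch under the usual TP/TN/FP/FN definitions with a hypothetical helper name, not PyCM's source; it uses the vectors from the "From vector" example.

```python
def positions(actual, predicted, classes=None):
    """Map each class to the sample indexes counted as TP, TN, FP and FN."""
    if classes is None:
        classes = sorted(set(actual) | set(predicted))
    result = {c: {"TP": [], "TN": [], "FP": [], "FN": []} for c in classes}
    for index, (a, p) in enumerate(zip(actual, predicted)):
        for c in classes:
            if a == c and p == c:
                key = "TP"   # class c, predicted as c
            elif a == c:
                key = "FN"   # class c, predicted as something else
            elif p == c:
                key = "FP"   # another class, predicted as c
            else:
                key = "TN"   # neither actual nor predicted is c
            result[c][key].append(index)
    return result


y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
pos = positions(y_actu, y_pred)
print(pos[0])
```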
To array
to_array method was added in version 2.9 to return the confusion matrix as a NumPy array. This is helpful for applying further operations to the confusion matrix, such as aggregation, normalization, and combination.
>>> cm.to_array()
array([[3, 0, 0],
       [0, 1, 2],
       [2, 1, 3]])
>>> cm.to_array(normalized=True)
array([[1.     , 0.     , 0.     ],
       [0.     , 0.33333, 0.66667],
       [0.33333, 0.16667, 0.5    ]])
>>> cm.to_array(normalized=True,one_vs_all=True, class_name="L1")
array([[1.     , 0.     ],
       [0.22222, 0.77778]])
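The normalized and one-vs-all arrays above follow from simple row arithmetic, sketched here with plain lists so that no NumPy install is needed. The helper names are hypothetical, rounding to five digits is chosen to match the printed output, and returning zeros for an all-zero row is an assumption.

```python
def normalize_rows(matrix, digits=5):
    """Divide every row by its sum so that each actual class sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append(
            [round(v / total, digits) if total else 0.0 for v in row])
    return normalized


def one_vs_all(matrix, index):
    """Collapse a square matrix to [[TP, FN], [FP, TN]] for one class."""
    total = sum(map(sum, matrix))
    tp = matrix[index][index]
    fn = sum(matrix[index]) - tp
    fp = sum(row[index] for row in matrix) - tp
    return [[tp, fn], [fp, total - tp - fn - fp]]


matrix = [[3, 0, 0], [0, 1, 2], [2, 1, 3]]
normalized = normalize_rows(matrix)
binary = one_vs_all(matrix, 0)
binary_normalized = normalize_rows(binary)
print(normalized)
print(binary, binary_normalized)
```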
Combine
combine method was added in version 3.0 to merge two confusion matrices. This option is useful in mini-batch learning.
>>> cm_combined = cm2.combine(cm3)
>>> cm_combined.print_matrix()
Predict   Class1    Class2
Actual
Class1    2         4
Class2    0         10
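Merging two matrices amounts to adding their cells, which is why it suits mini-batch evaluation: each batch yields a small matrix and the running total is their sum. The sketch below uses a hypothetical helper on nested dicts; treating classes that appear in only one matrix as zero-filled is an assumption, not documented PyCM behavior.

```python
def combine_tables(first, second):
    """Cell-wise sum of two nested-dict confusion matrices."""
    classes = list(first)
    for c in second:              # keep first-seen order, add new classes
        if c not in first:
            classes.append(c)
    return {
        a: {p: first.get(a, {}).get(p, 0) + second.get(a, {}).get(p, 0)
            for p in classes}
        for a in classes
    }


batch1 = {"Class1": {"Class1": 1, "Class2": 2},
          "Class2": {"Class1": 0, "Class2": 5}}
batch2 = {"Class1": {"Class1": 1, "Class2": 2},
          "Class2": {"Class1": 0, "Class2": 5}}

combined = combine_tables(batch1, batch2)
print(combined)
```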
Plot
plot method was added in version 3.0 to plot a confusion matrix using Matplotlib or Seaborn.
>>> cm.plot()
>>> from matplotlib import pyplot as plt
>>> cm.plot(cmap=plt.cm.Greens,number_label=True,plot_lib="matplotlib")
>>> cm.plot(cmap=plt.cm.Reds,normalized=True,number_label=True,plot_lib="seaborn")
Online help
online_help function was added in version 1.1 to open the definition of each statistic in a web browser.
>>> from pycm import online_help
>>> online_help("J")
>>> online_help("SOA1(Landis & Koch)")
>>> online_help(2)
- The list of items is available by calling online_help() (without argument)
- If the PyCM website is not available, set alt_link = True (new in version 2.4)
Parameter recommender
This option was added in version 1.9 to recommend the most relevant parameters given the characteristics of the input dataset. The suggested parameters are selected according to characteristics of the input such as being balanced or imbalanced and binary or multi-class. All suggestions fall into three main groups: imbalanced dataset, binary classification for a balanced dataset, and multi-class classification for a balanced dataset. The recommendation lists were gathered from the paper that introduced each parameter and the capabilities claimed in that paper.
>>> cm.imbalance
False
>>> cm.binary
False
>>> cm.recommended_list
['MCC', 'TPR Micro', 'ACC', 'PPV Macro', 'BCD', 'Overall MCC', 'Hamming Loss', 'TPR Macro', 'Zero-one Loss', 'ERR', 'PPV Micro', 'Overall ACC']
Compare
Version 2.0 introduced a method for comparing several confusion matrices. This option combines several overall and class-based benchmarks. Each benchmark rates the performance of the classification algorithm from good to poor and gives it a numeric score. The scores for good and poor performance are 1 and 0, respectively.
After that, two scores are calculated for each confusion matrix: overall and class-based. The overall score is the average of the scores of six overall benchmarks: Landis & Koch, Fleiss, Altman, Cicchetti, Cramer, and Matthews. In the same manner, the class-based score is the average of the scores of six class-based benchmarks: Positive Likelihood Ratio Interpretation, Negative Likelihood Ratio Interpretation, Discriminant Power Interpretation, AUC value Interpretation, Matthews Correlation Coefficient Interpretation, and Yule's Q Interpretation. Note that if one of the benchmarks returns None for one of the classes, that benchmark is excluded from the averaging. If the user sets weights for the classes, the average over the class-based benchmark scores becomes a weighted average.
If the user sets the by_class boolean input to True, the best confusion matrix is the one with the maximum class-based score. Otherwise, a confusion matrix is reported as the best only if it obtains the maximum of both the overall and class-based scores; in any other case, the Compare object does not select a best confusion matrix.
>>> cm2 = ConfusionMatrix(matrix={0:{0:2,1:50,2:6},1:{0:5,1:50,2:3},2:{0:1,1:7,2:50}})
>>> cm3 = ConfusionMatrix(matrix={0:{0:50,1:2,2:6},1:{0:50,1:5,2:3},2:{0:1,1:55,2:2}})
>>> cp = Compare({"cm2":cm2,"cm3":cm3})
>>> print(cp)
Best : cm2

Rank  Name   Class-Score       Overall-Score
1     cm2    9.05              2.55
2     cm3    6.05              1.98333

>>> cp.best
pycm.ConfusionMatrix(classes: [0, 1, 2])
>>> cp.sorted
['cm2', 'cm3']
>>> cp.best_name
'cm2'
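The selection rule described above (with by_class the class-based score decides; otherwise a matrix must lead on both scores) can be written out as a small function. This sketch only mirrors the rule as stated in the text: the function name and the score tuples are hypothetical, the scores are taken as given rather than computed, and tie handling is left unspecified because the text does not cover it.

```python
def select_best(scores, by_class=False):
    """Pick the best matrix name from {name: (class_score, overall_score)}.

    Returns None when no single matrix leads on both scores and
    by_class is False.
    """
    best_by_class = max(scores, key=lambda name: scores[name][0])
    if by_class:
        return best_by_class
    best_overall = max(scores, key=lambda name: scores[name][1])
    return best_by_class if best_by_class == best_overall else None


# Scores copied from the printed comparison above.
printed = {"cm2": (9.05, 2.55), "cm3": (6.05, 1.98333)}
print(select_best(printed))

# Made-up scores where the two criteria disagree.
mixed = {"a": (2.0, 1.0), "b": (1.0, 2.0)}
print(select_best(mixed), select_best(mixed, by_class=True))
```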
Acceptable data types
ConfusionMatrix
- actual_vector : python list or numpy array of any stringable objects
- predict_vector : python list or numpy array of any stringable objects
- matrix : dict
- digit : int
- threshold : FunctionType (function or lambda)
- file : File object
- sample_weight : python list or numpy array of numbers
- transpose : bool
- classes : python list
- Run help(ConfusionMatrix) for ConfusionMatrix object details
Compare
- cm_dict : python dict of ConfusionMatrix object (str : ConfusionMatrix)
- by_class : bool
- weight : python dict of class weights (class_name : float)
- digit : int
- Run help(Compare) for Compare object details
For more information visit here
Try PyCM in your browser!
PyCM can be used online in interactive Jupyter Notebooks via the Binder service. Try it out now!
- Check Examples in the Document folder
Issues & bug reports
Just file an issue and describe the problem. We'll check it ASAP! Alternatively, send an email to info@pycm.ir.
- Please complete the issue template