Naive Bayes

Input

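Two datasets, one for training and one for testing. In both, the first row lists the sample IDs and each following row holds one gene (Ensembl gene ID) with one numeric value per sample. The training dataset has an additional class row, directly below the ID row, that gives the class label of each sample.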

Example training dataset:

id S01 S02 S03 S04 S05 S06 S07 S08 S09 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 S21 S22 S23 S24 S25 S26 S27 S28 S29 S30 S31 S32 S33 S34 S35 S36 S37 S38
class Type_L Type_L Type_L Type_L Type_CL Type_L Type_L Type_L Type_L Type_B Type_CL Type_L Type_L Type_B Type_B Type_B Type_B Type_L Type_B Type_B Type_B Type_L Type_N Type_N Type_N Type_L Type_L Type_L Type_CL Type_L Type_L Type_B Type_L Type_CL Type_B Type_L Type_B Type_L
ENSG00000000419 5.32 5.3 6.64 5.6 6.61 5.24 6.66 6.62 6.41 5.19 6.16 7.29 4.78 6.14 5.66 5.8 6.2 4.69 6.5 5.49 7.29 5.23 5.63 4.82 5.64 6.07 4.79 6.25 5.19 5.8 4.63 5.98 6.19 4.97 4.15 5.82 6.4 4.67
ENSG00000001036 6.11 4.8 4.47 4.99 6.94 5 5.11 5.42 4.58 4.74 6.03 5.47 4.32 6.16 5.89 5.37 6.01 4.72 5.17 5.96 6.35 5.02 5.31 6.37 5.24 6.29 5.81 4.16 6.56 5.91 5.19 4 5.48 4.14 5.9 5.51 5.14 6.22
ENSG00000001084 5.78 4.01 5.14 5.11 3.63 5.93 4.11 3.68 3.23 5.01 5.81 4.5 5.17 5.61 6.19 5.02 6.79 5.97 4.77 5.85 4.78 4.27 6.8 6.25 4.5 5.08 5.21 4.24 4.12 6 4.16 3.19 4.29 3.95 5.26 6.63 4.38 6.38

Example testing dataset:

id T01 T02 T03 T04 T05 T06 T07 T08 T09 T10 T11 T12 T13 T14
ENSG00000000419 6.06 5.25 5.15 5.36 5.22 5.75 4.79 6.01 6.39 7.71 5.7 5.85 5.34 5.07
ENSG00000001036 4.47 5.39 5.28 5.82 5.75 6.96 6.09 5.07 5.3 5.76 5.4 4.27 5.03 5.68
ENSG00000001084 4.47 6 5.77 5.76 5.76 4.64 4.07 4.52 4.16 3.24 4.35 2.49 5.89 4.37
ENSG00000001497 5.27 4.27 6.58 5.54 5.57 6.4 4.91 4.79 4.95 5.83 4.89 5.33 4.23 6.03
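
The layouts above can be read with a short parser. The following Python sketch is only an illustration and not part of the tool: the function name parse_matrix and the has_class_row flag are hypothetical, and it assumes whitespace-delimited fields as in the examples. The excerpt uses the first three samples and first two genes of the example training dataset.

```python
def parse_matrix(text, has_class_row):
    """Parse a genes-by-samples table into one feature vector per sample."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    sample_ids = rows[0][1:]      # first row: "id" followed by the sample IDs
    labels = None
    body = rows[1:]
    if has_class_row:
        labels = rows[1][1:]      # second row: "class" followed by the labels
        body = rows[2:]
    gene_ids = [r[0] for r in body]
    # Transpose: each sample gets a list of values ordered like gene_ids.
    features = {
        sid: [float(r[i + 1]) for r in body]
        for i, sid in enumerate(sample_ids)
    }
    return sample_ids, gene_ids, labels, features


# Excerpt of the example training dataset (first three samples, two genes).
train_text = """
id S01 S02 S03
class Type_L Type_L Type_L
ENSG00000000419 5.32 5.3 6.64
ENSG00000001036 6.11 4.8 4.47
"""

sample_ids, gene_ids, labels, features = parse_matrix(train_text, has_class_row=True)
print(sample_ids, labels, features["S01"])
```

For a testing dataset, the same function would be called with has_class_row=False, in which case labels is returned as None.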

Code

R Code to run Naive Bayes
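
The R script itself is not reproduced in this section. To illustrate the kind of computation a naive Bayes classifier performs on numeric data, here is a minimal Python sketch of Gaussian naive Bayes. It is not the tool's implementation. It assumes one normal distribution per class and feature, class priors taken from training frequencies, and population variance with a small floor (var_floor) so that a class with a single sample does not produce a zero variance. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Estimate the prior and per-feature mean/variance for every class."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for cls, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(samples), means, variances)
    return model


def predict_proba(model, x):
    """Return the posterior probability of every class for one sample."""
    log_post = {}
    for cls, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            # Log density of a normal distribution with mean m and variance s2.
            lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[cls] = lp
    # Normalise in log space to avoid numerical underflow.
    top = max(log_post.values())
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def predict(model, x):
    """Return the class with the highest posterior probability."""
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)


# Hypothetical toy data: two well-separated classes, two features.
train_x = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.2]]
train_y = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(train_x, train_y)
print(predict(model, [1.1, 2.0]), predict(model, [5.1, 6.0]))
```

Working with log probabilities matters once there are many features: multiplying thousands of small densities directly would underflow to zero.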

Output

Result For TestSet

Sample PredictedClass
T01 Type_B
T02 Type_B
T03 Type_B
T04 Type_B
T05 Type_B
T06 Type_B
T07 Type_CL
T08 Type_L
T09 Type_L
T10 Type_L
T11 Type_L
T12 Type_L
T13 Type_L
T14 Type_L

Class Probabilities For TestSet

Sample Type_B Type_CL Type_L Type_N
T01 1 0 0 0
T02 1 0 0 0
T03 1 0 0 0
T04 1 0 0 0
T05 1 0 0 0
T06 1 0 0 0
T07 0 1 0 0
T08 0 0 1 0
T09 0 0 1 0
T10 0 0 1 0
T11 0 0 1 0
T12 0 0 1 0
T13 0 0 1 0
T14 0 0 1 0
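
The predicted class of a sample is the class with the highest probability in its row. A minimal Python sketch of that step, using three rows copied from the table above (the helper name predicted_class is hypothetical):

```python
classes = ["Type_B", "Type_CL", "Type_L", "Type_N"]

# Rows copied from the class probability table for T01, T07 and T08.
probabilities = {
    "T01": [1, 0, 0, 0],
    "T07": [0, 1, 0, 0],
    "T08": [0, 0, 1, 0],
}


def predicted_class(row, classes):
    """Return the class label whose probability is largest in this row."""
    best = max(range(len(row)), key=row.__getitem__)
    return classes[best]


predicted = {sample: predicted_class(row, classes) for sample, row in probabilities.items()}
print(predicted)
```

The result agrees with the PredictedClass column for these three samples.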

Predict_stat

Overall accuracy for the training set: 0.8684211

Confusion matrix:

True\Predicted Type_B Type_CL Type_L Type_N
Type_B 11 0 0 0
Type_CL 0 4 0 0
Type_L 0 0 20 0
Type_N 1 0 0 3
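
Overall accuracy is conventionally the number of correctly classified samples (the diagonal of the confusion matrix) divided by the total number of samples. A minimal Python sketch of that calculation follows. The function name and the 2x2 counts are hypothetical and are not taken from the output above.

```python
def overall_accuracy(matrix):
    """Return the sum of the diagonal divided by the sum of all cells."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical two-class confusion matrix: rows are true, columns predicted.
toy_matrix = [
    [8, 2],
    [1, 9],
]
accuracy = overall_accuracy(toy_matrix)
print(accuracy)
```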