SMOTE: Synthetic Minority Over-sampling Technique. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. Journal of Artificial Intelligence Research, 16:321–357, 2002.
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
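The synthetic over-sampling the abstract describes works by interpolation: each new minority example is placed on the line segment between an existing minority sample and one of its k nearest minority-class neighbors, so the minority region is broadened rather than merely duplicated. Below is a minimal NumPy sketch of that synthesis step, assuming continuous features only (the paper's handling of nominal features is omitted); the function name and parameters are illustrative, and the sketch draws source samples at random rather than generating a fixed number of synthetic points per sample as the paper does.

    import numpy as np

    def smote_sketch(X_min, n_synthetic, k=5, seed=None):
        # X_min: (n, d) array of minority-class samples, continuous features.
        # Assumes len(X_min) > k so each point has k distinct neighbors.
        rng = np.random.default_rng(seed)
        n = len(X_min)
        # Pairwise Euclidean distances among minority samples only.
        dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
        np.fill_diagonal(dists, np.inf)           # a point is not its own neighbor
        neighbors = np.argsort(dists, axis=1)[:, :k]
        synthetic = np.empty((n_synthetic, X_min.shape[1]))
        for i in range(n_synthetic):
            j = rng.integers(n)                   # pick a random minority sample
            nb = X_min[rng.choice(neighbors[j])]  # one of its k nearest neighbors
            gap = rng.random()                    # interpolation factor in [0, 1]
            # New point lies on the segment between the sample and its neighbor.
            synthetic[i] = X_min[j] + gap * (nb - X_min[j])
        return synthetic

For practical use, a maintained implementation of the algorithm and its variants is available as imblearn.over_sampling.SMOTE in the imbalanced-learn library.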
@article{chawlaSMOTESyntheticMinority2002,
  title = {{{SMOTE}}: {{Synthetic}} Minority over-Sampling Technique},
  volume = {16},
  issn = {1076-9757},
  doi = {10.1613/jair.953},
  abstract = {An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.},
  journaltitle = {Journal of Artificial Intelligence Research},
  date = {2002},
  pages = {321--357},
  author = {Chawla, Nitesh V. and Bowyer, Kevin W. and Hall, Lawrence O. and Kegelmeyer, W. Philip},
}
