Detection of Anomalies in Large Datasets Using an Active Learning Scheme Based on Dirichlet Distributions. Pichara, K., Soto, A., & Araneda, A. In Advances in Artificial Intelligence, Iberamia-08, LNCS 5290, pages 163-172, 2008.

Today, the detection of anomalous records is a highly valuable application in the analysis of current huge datasets. In this paper we propose a new algorithm that, with the help of a human expert, efficiently explores a dataset with the goal of detecting relevant anomalous records. Under this scheme the computer selectively asks the expert for data labeling, looking for relevant semantic feedback in order to improve its knowledge about what characterizes a relevant anomaly. Our rationale is that while computers can process huge amounts of low-level data, an expert has high-level semantic knowledge to efficiently lead the search. We build upon our previous work based on Bayesian networks that provides an initial set of potential anomalies. In this paper, we augment this approach with an active learning scheme based on the clustering properties of Dirichlet distributions. We test the performance of our algorithm using synthetic and real datasets. Our results indicate that, under noisy data and anomalies presenting regular patterns, our approach significantly reduces the rate of false positives, while decreasing the time to reach the relevant anomalies.
@InProceedings{ pichara:etal:2008,
author = {K. Pichara and A. Soto and A. Araneda},
title = {Detection of Anomalies in Large Datasets Using an Active
Learning Scheme Based on Dirichlet Distributions},
booktitle = {Advances in Artificial Intelligence, Iberamia-08, LNCS
5290},
pages = {163-172},
year = {2008},
abstract = {Today, the detection of anomalous records is a highly
valuable application in the analysis of current huge
datasets. In this paper we propose a new algorithm that,
with the help of a human expert, efficiently explores a
dataset with the goal of detecting relevant anomalous
records. Under this scheme the computer selectively asks
the expert for data labeling, looking for relevant semantic
feedback in order to improve its knowledge about what
characterizes a relevant anomaly. Our rationale is that
while computers can process huge amounts of low-level data,
an expert has high-level semantic knowledge to efficiently
lead the search. We build upon our previous work based on
Bayesian networks that provides an initial set of
potential anomalies. In this paper, we augment this
approach with an active learning scheme based on the
clustering properties of Dirichlet distributions. We test
the performance of our algorithm using synthetic and real
datasets. Our results indicate that, under noisy data and
anomalies presenting regular patterns, our approach
significantly reduces the rate of false positives, while
decreasing the time to reach the relevant anomalies.},
url = {http://saturno.ing.puc.cl/media/papers_alvaro/ActiveLearning.pdf}
}
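The abstract attributes the active learning scheme to "the clustering properties of Dirichlet distributions" but does not spell out the mechanism. As a generic illustration only, one well-known form of that property is the Chinese restaurant process view of a Dirichlet process prior: a new record tends to join clusters that are already large ("rich get richer"), while the concentration parameter `alpha` controls how often a new cluster is opened. The sketch below computes those assignment probabilities; the function name `crp_assignment_probs` and the use of the Chinese restaurant process are assumptions for illustration, not the authors' algorithm.

```python
from typing import List


def crp_assignment_probs(cluster_sizes: List[int], alpha: float) -> List[float]:
    """Hypothetical helper: Chinese restaurant process predictive probabilities.

    After n records have been grouped into clusters of sizes n_k, the next
    record joins cluster k with probability n_k / (n + alpha) and opens a
    new cluster with probability alpha / (n + alpha).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if any(size <= 0 for size in cluster_sizes):
        raise ValueError("cluster sizes must be positive")
    n = sum(cluster_sizes)
    denom = n + alpha
    # Existing clusters are weighted by their current size.
    probs = [size / denom for size in cluster_sizes]
    # Last entry: probability that the next record starts a new cluster.
    probs.append(alpha / denom)
    return probs


if __name__ == "__main__":
    # Three existing clusters of sizes 6, 3 and 1, with alpha = 1.0.
    print(crp_assignment_probs([6, 3, 1], alpha=1.0))
```

With cluster sizes 6, 3 and 1 and `alpha = 1.0`, the largest cluster receives probability 6/11 while a new cluster receives 1/11, which is the self-reinforcing grouping behavior that makes Dirichlet-based priors attractive for summarizing many candidate anomalies into a few groups an expert could label.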