Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. Bach, S. H.; Rodriguez, D.; Liu, Y.; Luo, C.; Shao, H.; Xia, C.; Sen, S.; Ratner, A.; Hancock, B.; Alborzi, H.; Kuchhal, R.; Ré, C.; and Malkin, R.
Labeling training data is one of the most costly bottlenecks in developing or modifying machine learning-based applications. We survey how resources from across an organization can be used as weak supervision sources for three classification tasks at Google, in order to bring development time and cost down by an order of magnitude. We build on the Snorkel framework, extending it as a new system, Snorkel DryBell, which integrates with Google's distributed production systems and enables engineers to develop and execute weak supervision strategies over millions of examples in less than thirty minutes. We find that Snorkel DryBell creates classifiers of comparable quality to ones trained using up to tens of thousands of hand-labeled examples, in part by leveraging organizational resources not servable in production which contribute an average 52% performance improvement to the weakly supervised classifiers.
@article{bachSnorkelDryBellCase2018,
  archivePrefix = {arXiv},
  eprinttype = {arxiv},
  eprint = {1812.00417},
  primaryClass = {cs, stat},
  title = {Snorkel {{DryBell}}: {{A Case Study}} in {{Deploying Weak Supervision}} at {{Industrial Scale}}},
  url = {http://arxiv.org/abs/1812.00417},
  shorttitle = {Snorkel {{DryBell}}},
  abstract = {Labeling training data is one of the most costly bottlenecks in developing or modifying machine learning-based applications. We survey how resources from across an organization can be used as weak supervision sources for three classification tasks at Google, in order to bring development time and cost down by an order of magnitude. We build on the Snorkel framework, extending it as a new system, Snorkel DryBell, which integrates with Google's distributed production systems and enables engineers to develop and execute weak supervision strategies over millions of examples in less than thirty minutes. We find that Snorkel DryBell creates classifiers of comparable quality to ones trained using up to tens of thousands of hand-labeled examples, in part by leveraging organizational resources not servable in production which contribute an average 52\% performance improvement to the weakly supervised classifiers.},
  urldate = {2019-04-16},
  date = {2018-12-02},
  keywords = {Statistics - Machine Learning,Computer Science - Machine Learning},
  author = {Bach, Stephen H. and Rodriguez, Daniel and Liu, Yintao and Luo, Chong and Shao, Haidong and Xia, Cassandra and Sen, Souvik and Ratner, Alexander and Hancock, Braden and Alborzi, Houman and Kuchhal, Rahul and Ré, Christopher and Malkin, Rob},
}