Datasheets for Datasets. Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. arXiv:1803.09010 [cs], December, 2021. 🏷️ /unread, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Databases
The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.
@article{gebru2021,
	title = {Datasheets for {Datasets}},
	shorttitle = {Datasheets for Datasets},
	url = {http://arxiv.org/abs/1803.09010},
	abstract = {The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics industry, every component, no matter how simple or complex, is accompanied with a datasheet that describes its operating characteristics, test results, recommended uses, and other information. By analogy, we propose that every dataset be accompanied with a datasheet that documents its motivation, composition, collection process, recommended uses, and so on. Datasheets for datasets will facilitate better communication between dataset creators and dataset consumers, and encourage the machine learning community to prioritize transparency and accountability.},
	language = {en},
	urldate = {2022-01-24},
	journal = {arXiv:1803.09010 [cs]},
	author = {Gebru, Timnit and Morgenstern, Jamie and Vecchione, Briana and Vaughan, Jennifer Wortman and Wallach, Hanna and Daumé III, Hal and Crawford, Kate},
	month = dec,
	year = {2021},
	note = {🏷️ /unread, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Computer Science - Databases},
	keywords = {/unread, Computer Science - Artificial Intelligence, Computer Science - Databases, Computer Science - Machine Learning},
}
