An information-theoretic approach to normal forms for relational and XML data. Arenas, M. & Libkin, L. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '03, pages 15–26, New York, NY, USA, June, 2003. Association for Computing Machinery.
Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
@inproceedings{arenas_information-theoretic_2003,
	address = {New York, NY, USA},
	series = {{PODS} '03},
	title = {An information-theoretic approach to normal forms for relational and {XML} data},
	isbn = {978-1-58113-670-8},
	url = {https://dl.acm.org/doi/10.1145/773153.773155},
	doi = {10.1145/773153.773155},
	abstract = {Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.},
	urldate = {2025-02-11},
	booktitle = {Proceedings of the twenty-second {ACM} {SIGMOD}-{SIGACT}-{SIGART} symposium on {Principles} of database systems},
	publisher = {Association for Computing Machinery},
	author = {Arenas, Marcelo and Libkin, Leonid},
	month = jun,
	year = {2003},
	pages = {15--26},
}
