An information-theoretic approach to normal forms for relational and XML data. Arenas, M. & Libkin, L. In Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, PODS '03, pages 15–26, New York, NY, USA, June 2003. Association for Computing Machinery.
Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.
@inproceedings{arenas_information-theoretic_2003,
address = {New York, NY, USA},
series = {{PODS} '03},
title = {An information-theoretic approach to normal forms for relational and {XML} data},
isbn = {978-1-58113-670-8},
url = {https://dl.acm.org/doi/10.1145/773153.773155},
doi = {10.1145/773153.773155},
abstract = {Normalization as a way of producing good database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While in the relational world the criteria for being well-designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models. Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.},
urldate = {2025-02-11},
booktitle = {Proceedings of the twenty-second {ACM} {SIGMOD}-{SIGACT}-{SIGART} symposium on {Principles} of database systems},
publisher = {Association for Computing Machinery},
author = {Arenas, Marcelo and Libkin, Leonid},
month = jun,
year = {2003},
pages = {15--26},
}