Identifying gene clusters by discovering common intervals in indeterminate strings

Identifying gene clusters by discovering common intervals in indeterminate strings. Doerr, D., Stoye, J., Böcker, S., & Jahn, K. BMC Genomics, 15(Suppl 6):S2, 2014. Proc. of RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG 2014)

Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments between genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Erroneous gene family assignments emerging in this process are amplified in subsequent gene order analyses and thus may degrade gene cluster prediction. In this work we present a new dynamic model and efficient computational approaches for gene cluster prediction, suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure between the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes. Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods due to wrong or deficient gene family assignments.

@Article{doerr14identifying,
  author    = {Daniel Doerr and Jens Stoye and Sebastian B{\"o}cker and Katharina Jahn},
  title     = {Identifying gene clusters by discovering common intervals in indeterminate strings},
  journal   = {BMC Genomics},
  year      = {2014},
  volume    = {15},
  number    = {Suppl 6},
  pages     = {S2},
  note      = {Proc. of \emph{RECOMB Satellite Workshop on Comparative Genomics} (RECOMB-CG 2014)},
  abstract  = {Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments between genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Erroneous gene family assignments emerging in this process are amplified in subsequent gene order analyses and thus may degrade gene cluster prediction. In this work we present a new dynamic model and efficient computational approaches for gene cluster prediction, suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure between the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes. Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods due to wrong or deficient gene family assignments.},
  booktitle = {Proc. of},
  doi       = {10.1186/1471-2164-15-S6-S2},
  keywords  = {jena; gene clusters; common intervals;},
  owner     = {Sebastian},
  pmid      = {25571793},
  timestamp = {2014.07.30},
  url       = {http://www.biomedcentral.com/1471-2164/15/S6/S2},
}
