From Legacy Documents to XML: A Conversion Framework. Chanod, J.P., Chidlovskii, B., Déjean, H., Fambon, O., Fuselier, J.E., Jacquin, T., & Meunier, J. In pages 92-103.
We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, aimed at automating the conversion of layout annotations in layout-oriented formats like PDF, PS and HTML to semantic-oriented annotations. A toolkit of different components covers complementary techniques: logical document analysis and semantic annotation with machine learning methods. We use a real conversion project as a driving example to illustrate the different techniques implemented in the project.
@inproceedings{ cha05,
  crossref = {ecdl2005},
  author = {Jean-Pierre Chanod and Boris Chidlovskii and Hervé Déjean and Olivier Fambon and Jérôme Fuselier and Thierry Jacquin and Jean-Luc Meunier},
  title = {From Legacy Documents to XML: A Conversion Framework},
  pages = {92--103},
  uri = {http://www.springerlink.com/link.asp?id=5xnqptg4hrdqmy3g},
  abstract = {We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, aimed at automating the conversion of layout annotations in layout-oriented formats like PDF, PS and HTML to semantic-oriented annotations. A toolkit of different components covers complementary techniques: logical document analysis and semantic annotation with machine learning methods. We use a real conversion project as a driving example to illustrate the different techniques implemented in the project.}
}
