SCIMAT: Dataset of Problems in Science and Mathematics. Chatakonda, S. K., Kollepara, N., & Kumar, P. In Srirama, S. N., Lin, J. C., Bhatnagar, R., Agarwal, S., & Reddy, P. K., editors, Big Data Analytics, Lecture Notes in Computer Science, pages 211–226, Cham, 2021. Springer International Publishing.
Datasets play an important role in driving innovation in algorithms and architectures for supervised deep learning tasks. Numerous datasets exist for images, language translation, etc. One of the interesting challenge problems for deep learning is to solve high school problems in mathematics and sciences. To this end, a comprehensive set of dataset containing hundreds of millions of samples, and the generation modules is required that can propel research for these problems. In this paper, a large set of datasets covering mathematics and science problems is proposed, and the dataset generation codes are proposed. Test results on the proposed datasets for character-to-character transformer architecture show promising results with test accuracy above 95%, however, for some datasets it shows test accuracy of below 30%. Dataset will be available at: www.github.com/misterpawan/scimat2.
@inproceedings{chatakonda_scimat_2021,
	address = {Cham},
	series = {Lecture {Notes} in {Computer} {Science}},
	title = {{SCIMAT}: {Dataset} of {Problems} in {Science} and {Mathematics}},
	isbn = {978-3-030-93620-4},
	shorttitle = {{SCIMAT}},
	doi = {10.1007/978-3-030-93620-4_16},
	abstract = {Datasets play an important role in driving innovation in algorithms and architectures for supervised deep learning tasks. Numerous datasets exist for images, language translation, etc. One of the interesting challenge problems for deep learning is to solve high school problems in mathematics and sciences. To this end, a comprehensive set of dataset containing hundreds of millions of samples, and the generation modules is required that can propel research for these problems. In this paper, a large set of datasets covering mathematics and science problems is proposed, and the dataset generation codes are proposed. Test results on the proposed datasets for character-to-character transformer architecture show promising results with test accuracy above 95\%, however, for some datasets it shows test accuracy of below 30\%. Dataset will be available at: www.github.com/misterpawan/scimat2.},
	language = {en},
	booktitle = {Big {Data} {Analytics}},
	publisher = {Springer International Publishing},
	author = {Chatakonda, Snehith Kumar and Kollepara, Neeraj and Kumar, Pawan},
	editor = {Srirama, Satish Narayana and Lin, Jerry Chun-Wei and Bhatnagar, Raj and Agarwal, Sonali and Reddy, P. Krishna},
	year = {2021},
	keywords = {Mathematics, Question-Answering, Science, Transformer, education},
	pages = {211--226},
}