Stocks Clustering Based on Textual Embeddings for Price Forecasting

de Oliveira, A. D. C. M., Pinto, P. F. A., & Colcher, S. In Cerri, R. & Prati, R. C., editors, Intelligent Systems, Lecture Notes in Computer Science, pages 665–678, Cham, 2020. Springer International Publishing.

Forecasting stock market prices is a hard task, mainly because the market environment is highly dynamic, intrinsically complex, and chaotic. Traditional economic theories suggest that trying to forecast short-term stock price movements is a wasted effort because the market is influenced by several external events and its behavior approximates a random walk. Recent studies that address stock market forecasting usually create a specific prediction model for the price behavior of a single stock. In this work we propose a technique to predict price movements based on sets of similar stocks. Our goal is to build a model that identifies whether a price tends toward bullishness or bearishness in the near future, considering stock information from similar sets based on two sources of information: historical stock data and Google Trends news. First, the proposed approach identifies sets of similar stocks and then builds a forecasting model based on an LSTM (long short-term memory) network for these sets. More specifically, two experiments were conducted: (1) using the K-Means algorithm to identify similar stock sets and then using an LSTM neural network to forecast stock price movements for these sets; (2) using the DBSCAN (density-based spatial clustering of applications with noise) algorithm to identify similar stock sets and then using the same LSTM neural network to forecast stock price movements. The study was conducted over 51 stocks of the Brazilian stock market. The results show that using an algorithm to identify stock clusters yields an improvement of approximately 7% in accuracy and F1-score and 8% in recall and precision when compared to models for a single stock.
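The clustering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the ticker names and two-dimensional "embedding" vectors below are made-up toy values, the `kmeans` and `cluster_stocks` helpers are hypothetical, and seeding the centroids with the first k points is an assumption made only to keep the run deterministic. The LSTM forecasting stage and the DBSCAN variant are not shown.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def cluster_stocks(embeddings, k):
    """Group tickers by the cluster label of their embedding vector."""
    tickers = list(embeddings)
    labels = kmeans([embeddings[t] for t in tickers], k)
    clusters = {}
    for ticker, label in zip(tickers, labels):
        clusters.setdefault(label, []).append(ticker)
    return clusters


# Hypothetical tickers with toy 2-D vectors standing in for textual embeddings.
embeddings = {
    "STOCK_A": (0.10, 0.20),
    "STOCK_C": (0.90, 0.80),
    "STOCK_B": (0.15, 0.25),
    "STOCK_D": (0.85, 0.95),
}

clusters = cluster_stocks(embeddings, k=2)
for label, members in sorted(clusters.items()):
    print(label, sorted(members))
```

In a pipeline like the one the abstract outlines, each resulting group would then feed one forecasting model trained on the pooled data of its member stocks, rather than one model per stock.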

@inproceedings{de_oliveira_stocks_2020,
	location = {Cham},
	title = {Stocks Clustering Based on Textual Embeddings for Price Forecasting},
	isbn = {978-3-030-61380-8},
	doi = {10.1007/978-3-030-61380-8_45},
	series = {Lecture Notes in Computer Science},
	abstract = {Forecasting stock market prices is a hard task. The main reason for that is due to the fact that its environment is highly dynamic, intrinsically complex, and chaotic. Traditional economic theories suggest that trying to forecast short-term stock price movements is a wasted effort because the market is influenced by several external events and its behavior approximates a random walk. Recent studies that address the problem of stock market forecasting usually create specific prediction models for the price behavior of a single stock. In this work we propose a technique to predict price movements based on similar stock sets. Our goal is to build a model to identify whether the price tends to bullishness or bearishness in the near future, considering stock information from similar sets based on two sources of information: historical stock data and Google Trends news. Firstly, the proposed study applies a method to identify similar stock sets and then creates a forecasting model based on a {LSTM} (long short-term memory) for these sets. More specifically, two experiments were conducted: (1) using the K-Means algorithm to identify similar stock sets and then using a {LSTM} neural network to forecast stock price movements for these stock sets; (2) using the {DBSCAN} (Density-based spatial clustering) algorithm to identify similar stock sets and then using the same {LSTM} neural network to forecast stock price movements. The study was conducted over 51 stocks of the Brazilian stock market. The results show that the use of an algorithm to identify stock clusters yields an improvement of approximately 7\% in accuracy and f1-score and 8\% in recall and precision when compared to models for a single stock.},
	pages = {665--678},
	booktitle = {Intelligent Systems},
	publisher = {Springer International Publishing},
	author = {de Oliveira, André D. C. M. and Pinto, Pedro F. A. and Colcher, Sergio},
	editor = {Cerri, Ricardo and Prati, Ronaldo C.},
	year = {2020},
	langid = {english},
	keywords = {Forecasting time series, Machine learning, Stock market},
}

