We are investigating whether social listening and sentiment analysis can help predict stock movements. We focus on the 100 most ‘popular’ stocks, excluding indices and ETFs, to ensure an adequate base for the social listening. Our definition of ‘popular’ is given here: https://banks.org/most-popular-stocks/
You will be provided with stock data sets (adjusted close, intraday high and low, and volume) and social listening data sets for the same stocks (mentions, net sentiment, positive sentiment, and negative sentiment). The stock data was extracted with Thomson Reuters Datastream and the social listening data with Talkwalker.
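As a rough illustration of how the two data sets might be combined, the sketch below pairs each day's sentiment with the following trading day's log return. The column names (`date`, `adj_close`, `mentions`, `net_sentiment`) and the sample values are assumptions made up for the example; the actual Datastream and Talkwalker exports may be laid out differently.

```python
import math

# Hypothetical rows; the real exports may use different column names and formats.
prices = [
    {"date": "2021-01-04", "adj_close": 100.0},
    {"date": "2021-01-05", "adj_close": 102.0},
    {"date": "2021-01-06", "adj_close": 101.0},
]
sentiment = [
    {"date": "2021-01-04", "mentions": 120, "net_sentiment": 0.10},
    {"date": "2021-01-05", "mentions": 300, "net_sentiment": 0.35},
    {"date": "2021-01-06", "mentions": 180, "net_sentiment": -0.05},
]


def align_with_next_day_return(prices, sentiment):
    """Pair each day's sentiment with the next trading day's log return."""
    sent_by_date = {row["date"]: row for row in sentiment}
    ordered = sorted(prices, key=lambda row: row["date"])  # ISO dates sort correctly as text
    panel = []
    for today, tomorrow in zip(ordered, ordered[1:]):
        s = sent_by_date.get(today["date"])
        if s is None:
            continue  # no social listening data for this trading day
        panel.append({
            "date": today["date"],
            "mentions": s["mentions"],
            "net_sentiment": s["net_sentiment"],
            "next_log_return": math.log(tomorrow["adj_close"] / today["adj_close"]),
        })
    return panel


panel = align_with_next_day_return(prices, sentiment)
```

Using the adjusted close keeps the returns comparable across dividends and splits. The last trading day drops out because it has no following day to compute a return from.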
Your task is to analyze the data using relevant statistical models and tests. I think vector autoregression (VAR) and/or machine learning (e.g. a long short-term memory (LSTM) RNN) would be a good fit, but I may be missing other suitable forms of analysis.
Below are the sections I imagine you will provide. They are not fixed and are meant as inspiration; if you find other tests more appropriate, please go ahead:
1: Descriptions of measurements and tests (OLS?, VAR?, LSTM RNN?)
2: Data selection
3: Data analysis and some econometric considerations (Descriptive statistics, Autocorrelation, Heteroscedasticity, Multicollinearity, Outliers, Sample selection bias?, OLS?, Stationarity / Dickey-Fuller, VAR?, AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), LSTM RNN?, Predictive power of the model)
4: Conclusion and suggestions for further research
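To make the VAR, AIC and BIC items above more concrete, here is a minimal sketch of fitting a VAR by OLS and comparing lag lengths with both criteria. It uses synthetic data in which sentiment leads returns by one day; that relationship is an assumption built into the example, not a finding. The function names `fit_var` and `select_lag` are hypothetical, and in practice a library such as statsmodels would probably be used instead, after checking stationarity (e.g. with a Dickey-Fuller test).

```python
import numpy as np


def fit_var(data, p, start):
    """OLS fit of a VAR(p) on rows start..T-1; returns coefficients and residual covariance."""
    T, _ = data.shape
    Y = data[start:]
    # Regressors: an intercept plus lags 1..p of every series.
    X = np.hstack([np.ones((T - start, 1))] +
                  [data[start - lag:T - lag] for lag in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    sigma = resid.T @ resid / (T - start)
    return coef, sigma


def select_lag(data, max_lag):
    """AIC and BIC for VAR(1)..VAR(max_lag), all fitted on the same sample."""
    T, k = data.shape
    n = T - max_lag
    scores = {}
    for p in range(1, max_lag + 1):
        _, sigma = fit_var(data, p, start=max_lag)
        n_params = k * (k * p + 1)  # coefficients across all k equations
        logdet = np.linalg.slogdet(sigma)[1]
        scores[p] = {"aic": logdet + 2 * n_params / n,
                     "bic": logdet + np.log(n) * n_params / n}
    return scores


# Synthetic example: today's return depends on yesterday's sentiment.
rng = np.random.default_rng(0)
T = 500
sent, ret = np.zeros(T), np.zeros(T)
for t in range(1, T):
    sent[t] = 0.6 * sent[t - 1] + rng.normal()
    ret[t] = 0.5 * sent[t - 1] + rng.normal()
data = np.column_stack([ret, sent])  # column 0: return, column 1: sentiment

scores = select_lag(data, max_lag=4)
best_bic = min(scores, key=lambda p: scores[p]["bic"])
coef, _ = fit_var(data, best_bic, start=best_bic)
# coef[2, 0] is the estimated effect of lag-1 sentiment on the return.
```

Fitting every candidate lag on the same sample keeps the criteria comparable. BIC penalizes extra lags more heavily than AIC, so the two can disagree on the preferred lag length.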
For now, here is just the stock data set and the current bibliography for reference: https://we.tl/t-5pHX0HcmrV
The social listening data can be sent tomorrow. It’s still being compiled.