After many months of hard work, I now have a dataset of thousands of Tweets covering what my colleagues in a research class at San Jose State University determined to be the most significant dates of the Egyptian Revolution that led to the fall of Hosni Mubarak.
These include protests against police following the beating and death of Khaled Said, the most significant days of the Tahrir Square occupation, protests against the Supreme Council of the Armed Forces, and the protests against the murder of Coptic Christians.
Up until now, I’ve been doing this in my spare time, and even gathering the data has been a challenge, especially because of the time limit imposed on Twitter searches. At the moment, my task is to get all the messages translated into English by people who understand Arabic colloquialisms and internet culture, and then analyze several dimensions of the data:
- Demographics (based on self-reported data)
- Sentiment analysis
- Most retweeted messages and users, and most used phrases
- Frequency of tweets based on real-world events
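For the retweet and frequency counts, something like the following might be a starting point. This is only a rough sketch: the record layout (`user`, `text`, `created_at`) and the sample tweets are made up for illustration, and it assumes retweets show up as old-style "RT @user: ..." text, which may not hold for every message in the real dataset.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical records; the real dataset's field names and formats may differ.
tweets = [
    {"user": "alice", "text": "RT @bob: Tahrir is full", "created_at": "2011-01-25 14:02:00"},
    {"user": "carol", "text": "RT @bob: Tahrir is full", "created_at": "2011-01-25 15:10:00"},
    {"user": "bob", "text": "Tahrir is full", "created_at": "2011-01-25 13:55:00"},
    {"user": "dave", "text": "RT @alice: stay safe", "created_at": "2011-01-28 09:30:00"},
]

# Assumption: a retweet starts with "RT @username", optionally followed by a colon.
RT_PATTERN = re.compile(r"^RT @(\w+):?\s*(.*)", re.DOTALL)


def retweet_counts(tweets):
    """Count how often each user and each message body was retweeted."""
    users = Counter()
    messages = Counter()
    for tweet in tweets:
        match = RT_PATTERN.match(tweet["text"])
        if match:
            users[match.group(1).lower()] += 1
            messages[match.group(2).strip()] += 1
    return users, messages


def tweets_per_day(tweets):
    """Count tweets per calendar day, keyed by an ISO date string."""
    days = Counter()
    for tweet in tweets:
        stamp = datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S")
        days[stamp.date().isoformat()] += 1
    return days


users, messages = retweet_counts(tweets)
per_day = tweets_per_day(tweets)
print(users.most_common(5))
print(sorted(per_day.items()))
```

The per-day counts could then be lined up by hand against a timeline of real-world events.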
Here’s the technology I’ve used so far:
- PhantomJS (for gathering tweets)
- Python (for processing the tweets into a human-readable list of messages and metadata)
- Django-nonrel and MongoDB (for storing all the Tweets for later analysis and building a web interface for translation, as well as displaying research results)
None of this is breaking new ground; it’s all “Lego problems”. My issue is that I’m a self-taught programmer trying to do all of this on a tight deadline (there’s an academic paper due March 31, and we’ll be presenting the research in early May). Certain things that I understand in theory, like sentiment analysis and “phrase” frequency as opposed to word frequency, will be very difficult to do in practice.
I also have very limited computing resources: my own computer and a 256 MB VPS. No funding is available from San Jose State for these resources, mainly because this is the first time anyone in the Journalism and Mass Communications program has attempted this kind of research, and anybody with access to money wouldn’t understand what it is we’re trying to do.
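To make the "phrase" versus word frequency distinction concrete: one common approach is to count n-grams (runs of n consecutive words) instead of single words. Here is a minimal sketch of that idea. The function names and sample messages are made up for illustration, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so it matches Arabic letters too.
    # Diacritics, hashtags, and URLs may need extra handling.
    return re.findall(r"\w+", text.lower())


def ngram_counts(texts, n=2):
    """Count every run of n consecutive words across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Made-up sample messages, not from the dataset.
sample = [
    "Down with the regime",
    "down with the police",
    "The regime must go",
]

bigrams = ngram_counts(sample, n=2)
trigrams = ngram_counts(sample, n=3)
print(bigrams.most_common(3))
print(trigrams.most_common(3))
```

With `n=1` this reduces to plain word frequency, so the same function could cover both cases. It keeps only one counter in memory per run, which should matter on a small VPS.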
So, in summary: Help me, Internet! I’m a journalism and social science student with only so-so programming chops, attempting to do what I think is very significant research with limited spare time, and so many of you are so much smarter than me that I feel like it’s stupid not to ask for help.
If you can pitch in time, resources, or even random tips on how to do this stuff, send an email here: egypt-social-media-research-sjsu@googlegroups.com.
The work I’ve done so far isn’t on GitHub yet, mainly because A) it’s embarrassingly ugly and B) my methods for getting the data probably violate some TOS somewhere, but if people show some interest I’ll get everything up on GitHub ASAP. You can find me on GitHub here.
Thanks for reading this far, and stay classy, Internet.