Darren's Work Pages

Darren Cook <darren@dcook.org>

Web Mining/Social Mining (Facebook, Twitter, mixi, etc.)

Keywords: data mining, social media, data scientist, data analyst, web mining, screenscraping, scraping, mining, twitter, facebook, mixi, sentiment, Firefox, Internet Explorer, Selenium, PHP, R

Mining and scraping web sites (both those that provide an API and those that make scraping hard), in various world languages. Also text mining of social network sites: for instance, measuring sentiment across a large number of Twitter tweets (again, handling more than just English). Likewise for news web sites, in a large number of world languages (English, Japanese, Chinese, German, French, Russian, Korean, Hindi and more).
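As a rough illustration of what "measuring sentiment" over many tweets can involve, here is a minimal Python sketch of a lexicon-based approach: score each tweet by summing word polarities, classify it as positive, negative or neutral, then aggregate. It is an assumption for illustration, not necessarily how the work described above is done: the tiny LEXICON, the function names and the scoring rule are all hypothetical, and the simple word tokenizer only suits languages that put spaces between words (Japanese or Chinese text would need a word segmenter first).

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> polarity score. A real system would use a
# curated per-language lexicon, or a trained classifier, instead.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "happy": 1,
    "bad": -1, "awful": -2, "hate": -2, "sad": -1,
}

TOKEN_RE = re.compile(r"\w+")


def tweet_score(text: str) -> int:
    """Sum the lexicon scores of the words in one tweet (unknown words score 0)."""
    # Word-character tokenization only works for space-delimited languages.
    return sum(LEXICON.get(tok, 0) for tok in TOKEN_RE.findall(text.lower()))


def aggregate_sentiment(tweets):
    """Classify each tweet, then return the class counts and a net score in [-1, 1]."""
    counts = Counter()
    for text in tweets:
        score = tweet_score(text)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    total = sum(counts.values())
    # Net score: share of positive tweets minus share of negative tweets.
    net = (counts["positive"] - counts["negative"]) / total if total else 0.0
    return counts, net


if __name__ == "__main__":
    sample = [
        "I love the new phone, great battery",
        "Awful service today",
        "Just landed in Tokyo",
    ]
    counts, net = aggregate_sentiment(sample)
    print(dict(counts), net)
```

Aggregating per-tweet labels rather than summing raw scores keeps one very emphatic tweet from outweighing many mildly worded ones; which choice is better depends on the question being asked.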

Automation tasks such as posting the same announcement to Facebook, Twitter and a blog, to save a user repetitive and boring work.

(I've also made some minor contributions to the open source Phirehose project, a PHP library for receiving streaming data from Twitter. I have also contributed some demos to the twitteroauth library.)

Tools of the trade? Beyond the APIs provided by Facebook, Twitter, etc., I also make direct HTTP calls (typically from PHP or R). I also use Selenium, which lets a script control a real browser, so that screen-scraping can cope with all the complexities of modern web sites, such as JavaScript, CSS, AJAX, cookies and logins.
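For pages that do not need JavaScript, a direct call can be as simple as an HTTP GET followed by an HTML parser. The sketch below uses only the Python standard library (the tools named above are PHP and R, so this is a stand-in, not the actual code); the `<h2 class="headline">` markup, the User-Agent string and the function names are hypothetical. Pages that build their content with JavaScript/AJAX, or that sit behind a login, are where a Selenium-driven browser is needed instead.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadlineParser(HTMLParser):
    """Collect the text of <h2 class="headline"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_headline = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes,
        # and the class attribute may contain several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self._in_headline = True
            self.headlines.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        # Text from nested tags (e.g. a link inside the heading) is kept too.
        if self._in_headline:
            self.headlines[-1] += data


def extract_headlines(html: str):
    """Return the stripped text of every headline found in an HTML string."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [h.strip() for h in parser.headlines]


def fetch(url: str) -> str:
    """Plain HTTP GET; only suitable for pages that don't rely on JavaScript."""
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would be along the lines of `extract_headlines(fetch("https://example.com/news"))`, with the URL again a placeholder. Keeping fetching and parsing in separate functions means the parser can be tested on saved HTML without touching the network.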


Work Top Page   *   Personal Home Page   *   Email me at: darren@dcook.org   *   PGP Public Key

Last updated: 3rd Oct 2011, © Copyright Darren Cook, 2011.