This page lists a sample of projects I've worked on in the past few years. If you have questions about any of these please don't hesitate to get in touch.
Machine Translation/Text Data Mining
Keywords: semantic network, translation, multilingual, thesaurus, data mining, Japanese, Chinese, German, English, Arabic, Wikipedia, context, xml, very large xml files, PHP, C++
I have had a strong interest in machine translation research, on and off, for over fifteen years. A combination of recent developments makes me think a dramatic jump in quality will soon be possible: fast hardware, lots of memory and freely available multilingual databases such as Wikipedia. The open source movement also gives me hope: imagine how quickly a system could learn if a million people checked in translation corrections every single day.
I have a working prototype doing translations between English, Japanese, Chinese and German (with Arabic coming soon), with a very high level of quality on the test documents. The engine is designed to be easy to adapt to a specific domain. If you are interested please get in touch.
In 2005 I started the MLSN project, an open source multi-lingual semantic network. It started out because I needed a Japanese thesaurus, but it quickly expanded into a more ambitious project to store all the relations between words in multiple languages. It has recently had a major upgrade:
- Chinese and German added (to the existing Japanese and English)
- Much cleaner user-interface
- 50x quicker searches
At the current time, dictionary sizes are still quite small, so please be forgiving. Quick links to other online dictionaries in all the supported languages are provided.
My machine translation research ties in closely with my intelligent search work, and with my go project. In a sound bite: all three are about understanding context.
|
Intelligent Search/Keyword Suggestion/Multilingual Search Engine
Keywords: semantic network, search engine, Namazu, Chasen, keyword, advertising, synonyms, quick, intelligent, spam classification, open source intelligence, SQL, Japanese, Chinese, German, English, Arabic
Intelligent search is a key interest of mine; it ties in closely with my interests in machine translation and computer go, and artificial intelligence generally.
Most current search technology relies on an exact match. As a simple example, if you search for "woodland" you do not get pages about "forests". MLSN, in conjunction with other technology I have developed, can be used to improve search engine results by discovering the meaning behind the words: both the words in the search query and the words on the pages being indexed.
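As a rough illustration of the "woodland"/"forests" example (not the actual MLSN code), query expansion can be sketched as below. The `SYNONYMS` table and both function names are hypothetical stand-ins; a real system would look the relations up in a semantic network rather than a hard-coded dictionary.

```python
# Hypothetical synonym table standing in for a semantic network lookup.
SYNONYMS = {
    "woodland": {"forest", "forests", "woods"},
    "forest": {"woodland", "forests", "woods"},
}

def expand_query(query):
    """Return every query word plus any synonyms known for it."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms

def matches(query, page_text):
    """True if any expanded query term appears as a word on the page."""
    page_words = set(page_text.lower().split())
    return bool(expand_query(query) & page_words)
```

With this table, `matches("woodland", "Walking in ancient forests")` succeeds even though the page never contains the word "woodland".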
The same techniques that apply to searching the web, and searching within a web site, can also apply to realms as diverse as spam detection, open source intelligence, advertiser keyword suggestion, and data validation.
My work allows a high level of intelligence and usefulness not just in English, but (currently) also in Japanese, Chinese and German. Arabic will be added later in 2007. If you would like to learn more please get in touch.
I have used these intelligent search techniques, the open source MLSN database, Namazu (an open source search engine tailored for Japanese), chasen (a Japanese morphological analyzer) and other tools and data sources, in a number of commercial projects, including some for very large sites, and including dynamic database-backed web sites (where the SQL DB is directly used to build the index, not just the web site's static html content).
|
Flash Charting
Keywords: flash, swf, charts, financial charts, PHP, server/client, statistics, dcflash, actionscript, xml sockets, marketing, data visualization
I have an actionscript charting library (called "dcflash"). You can see some demos of it. (I have some more sophisticated demos not released publicly. If you would like to see them please contact me directly.) This library has generated a lot of interest and parts are gradually being released as open source: late 2006 saw a first release on sourceforge: http://dcflash.sf.net/.
Most applications so far have been in financial charting (described in its own section). The charts have also been used in marketing applications (see the pie chart demo) and in data visualization (try the "graph" links in the MLSN results page).
For connections to the back-end live data source I have typically used XML sockets. I have found these efficient and flexible, and therefore useful. I have also used Flash Remoting, but it has more overhead and is also less flexible. Incidentally my fclib library contains a free PHP xml socket library that works on all of linux, windows and mac, and has been useful for a number of quite diverse projects.
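For readers unfamiliar with Flash XML sockets: each message on the wire is a piece of text terminated by a single zero byte, so a server has to split its receive buffer on that byte. The sketch below shows only that framing idea; the function names are hypothetical and this is not code from fclib.

```python
def frame_message(xml_text):
    """Encode one message the way an XML socket peer expects: text plus a zero byte."""
    return xml_text.encode("utf-8") + b"\x00"

def split_messages(buffer):
    """Split a receive buffer into complete messages and the unfinished remainder."""
    *complete, remainder = buffer.split(b"\x00")
    return complete, remainder
```

The remainder is kept and prepended to the next chunk read from the socket, since a message may arrive split across several reads.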
|
Financial Charting
Keywords: flash, swf, charts, finance, financial charts, PHP, server/client, real-time prices, historic data, statistics, technical analysis, dcflash, actionscript, xml sockets
I have worked on a number of financial charting projects, using my flash charting library (see separate entry, and some demos of it).
As one example I developed a sophisticated charting tool for a financial industry start-up. The chart styles include candlestick, bar, area, line, point and figure, and market profile. Analysis tools include RSI, stochastics, Bollinger bands and volume. This flash application was interactive and it was possible to draw trendlines, Gann lines, and Fibonacci lines. A live news feed was also included.
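To give a flavour of one of these analysis tools, here is a minimal sketch of Bollinger bands: a moving average with an upper and lower band set a multiple of the standard deviation away. The function name and the defaults (a 20-period window, 2 standard deviations, population standard deviation) are conventional choices assumed here, not details taken from the project.

```python
from statistics import mean, pstdev

def bollinger_bands(closes, window=20, k=2.0):
    """Return (lower, middle, upper) for each full window of closing prices."""
    bands = []
    for end in range(window, len(closes) + 1):
        recent = closes[end - window:end]
        middle = mean(recent)             # simple moving average
        spread = k * pstdev(recent)       # k population standard deviations
        bands.append((middle - spread, middle, middle + spread))
    return bands
```

The first band corresponds to the first `window` prices, so the result is `window - 1` entries shorter than the input.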
In addition to the front-end I wrote the PHP back-end server for this same project. It had the following primary tasks:
- Handle requests from hundreds of simultaneous clients, getting data from the database (Microsoft SQL Server), pre-processing results then sending the data to the client. Flash XML sockets were used for communication.
- Receive real-time data from four different exchanges and simultaneously update both the historic data and push the latest data to clients that are watching those symbols.
- Import huge quantities of historical data, from raw tick data.
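The second task above, storing each tick while pushing it to whoever is watching that symbol, can be sketched as a small in-memory router. This is only an illustration under assumptions: the real server was written in PHP against an SQL database, and the class and method names here are hypothetical.

```python
from collections import defaultdict

class TickRouter:
    """Keeps history per symbol and works out which clients need each new tick."""

    def __init__(self):
        self.watchers = defaultdict(set)   # symbol -> ids of clients watching it
        self.history = defaultdict(list)   # symbol -> ticks stored so far

    def watch(self, client_id, symbol):
        self.watchers[symbol].add(client_id)

    def unwatch(self, client_id, symbol):
        self.watchers[symbol].discard(client_id)

    def on_tick(self, symbol, price, volume):
        """Record the tick and return {client_id: tick} for every watcher."""
        tick = (price, volume)
        self.history[symbol].append(tick)
        return {client: tick for client in self.watchers.get(symbol, ())}
```

In a real server the returned mapping would be turned into socket writes; returning it keeps the sketch free of networking code.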
The entire system was very portable. The back-end ran on either Windows using Microsoft SQL Server, or on linux using MySQL. The front-end ran on all Flash 6 platforms: Windows, Mac, Linux.
Back-End Development Thoughts: One of the challenges was coping with the enormous amounts of data the markets can generate: not just historical data, but also the live ticks when the markets get busy. I was shocked to discover Microsoft SQL Server ran more slowly than MySQL on Linux despite being on a considerably higher spec machine. Another challenge was the differences in the way different exchanges distribute their data (not just syntax: the concepts of things like volume vary).
|
Automatic Flash Map Generation
Keywords: ming, swf, flash, map, exhibition hall, C++, flirt, gif, png, swfmill, SQL
For a large exhibition web site I used the ming library in a C++ program to automatically generate over 500 maps.
In the first year the data came from csv files, after that from an SQL database. The maps were interactive and had clickable regions that took users directly to the relevant part of the web site. Map generation was relatively quick, allowing them to be generated frequently during web site development and then, after launch, each time the database was updated.
The application also required generating a static image file from each map. The first year we used a commercial application called swf2video, which worked okay but was windows only and required manual intervention, which was too time-consuming. After that I made a simple utility using flirt (an open source swf display library) to convert the maps to png.
Comments: I found ming hard to use for anything except simple swfs; in particular it was very easy to leak memory or give it a bad pointer that would crash it. If developing such an application from scratch now I would more than likely use swfmill.
|
AJAX
Keywords: ajax, javascript, web applications, flash, client, server, Prototype
I have used ajax techniques to enhance some forms and applications (unfortunately, at the time of writing, they are all behind firewalls). The applications work portably across browsers and Windows/Mac/Linux.
I have used javascript for many years, long before AJAX was the buzzword of the day, and have used very similar client/server techniques in flash applications. I have used the Prototype javascript library. I have also written my own connectivity functions when that suits the job better.
Compared to Flash, what I like about AJAX is that front-end forms are easier to write and smoother (Flash is weak on forms and user interface controls, compared to HTML). Advantages of Flash are that it can do not just data-pull but data-push communication (vital in financial industry applications for instance), and animation and special effects are much easier and smoother.
|
Computer Go
Keywords: igo, baduk, weiqi, search, trees, games, AI, patterns, hashing, indexing, machine learning, algorithms
This is a long-term project (started in 1993 and still far from finished!) to make a program that can play the game of go at the level of a very strong human player. Current programs are not capable of anything more than mediocre amateur level.
This project is a test-bed for my Artificial Intelligence research, in particular data mining, pattern recognition, two-dimensional pattern hashing/indexing and various search algorithms. The necessity for understanding context (in order to make useful patterns) also gives it much in common with my other AI interests of machine translation and intelligent search.
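The text does not say which hashing scheme is used. As one common approach to hashing board positions, here is a sketch of Zobrist hashing on a 9x9 board: every (point, colour) pair gets a random 64-bit number, and a position's hash is the XOR of the numbers for its stones, which makes incremental updates cheap. All names and the fixed seed are assumptions for illustration.

```python
import random

SIZE = 9
EMPTY, BLACK, WHITE = 0, 1, 2

# One random 64-bit number per (row, column, colour); seeded so runs are repeatable.
_rng = random.Random(12345)
ZOBRIST = [[[_rng.getrandbits(64) for _ in range(3)]
            for _ in range(SIZE)] for _ in range(SIZE)]

def full_hash(board):
    """Hash a whole position by XOR-ing the numbers of all occupied points."""
    h = 0
    for row in range(SIZE):
        for col in range(SIZE):
            if board[row][col] != EMPTY:
                h ^= ZOBRIST[row][col][board[row][col]]
    return h

def place_stone(board, h, row, col, colour):
    """Place a stone on an empty point and return the incrementally updated hash."""
    board[row][col] = colour
    return h ^ ZOBRIST[row][col][colour]
```

Because XOR is its own inverse, removing a captured stone uses the same operation as placing it.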
For the past six or seven years I have been working on a specific subset of the computer go problem: solving endgames on 9x9 boards. Concentrating on endgames allows me to get better feedback. Concentrating on 9x9 stops the search tree getting too big. However this still turns out to be a very difficult problem, and current programs are as weak at the endgame as at any other stage of the game. In fact I am hopeful that when I eventually crack it I will have most of the elements needed for a strong player in any stage of the game on any size board. This is because the computer's core weakness in go is life and death, which is just beneath the surface throughout the endgame: if you do not know if a move threatens to live or kill then you cannot know if it is sente or gote.
For more please visit Darren's Computer Go Pages (has articles and open source downloads).
|
Web Site/Flash Activity Tracking
Keywords: tracking, demographics, html, web, flash, analysis, data mining, user experience, marketing, site optimization
For one web site I developed a system that tracks the user from entry to the site, then through the various pages in the site. It ties this information to the demographic information in the database when the user is known; it does not require a login however and can track complete user sessions for users who log in halfway through the session or do not log in at all. It used PHP sessions, which means it works transparently whether the user has cookies enabled or disabled, and also works for mobile phones.
The second half of the system then imports the data regularly and automatically produces daily, weekly and monthly reports on user activity.
Web analysis software such as WebTrends claims to do similar analysis from web server logs, but this system is more accurate (for instance WebTrends uses IP addresses, and has to guess when a session starts and ends) and can also tie the information to a back-end user database (e.g. allowing us to compare the most common paths through the site for users of different gender or in different age groups, etc.).
I have also developed a flash movie that tracked the user's actions and reported them back to the server, in a format that integrates with this same web site user tracking system. This was taken as far as a proof-of-concept, but was not used on any real sites.
Note: this kind of detailed tracking system tends to be overkill - most clients are just not prepared to do anything with the in-depth analysis it can provide.
|
Computer Languages
Keywords: C, C++, STL, Boost, PHP, SQL, HTML, XML, Javascript, Actionscript 2, Actionscript 3, MySQL, Postgresql, Microsoft SQL server, Java, Ruby, PERL, Python, Bash, Actionscript 1
Main languages: C/C++ (including STL and Boost libraries), PHP, SQL, HTML, XML, Javascript, Actionscript 2 (i.e. Flash 8 and earlier) and Actionscript 3 (Flash 9). I have also done projects in: Java, Ruby, PERL, Python, Bash, Actionscript 1 and have studied Eiffel, Lisp, Prolog and others.
I find PHP the most suitable language for most projects, and C++ for larger, more complex projects or where speed or memory usage is an issue. I have used C++'s Standard Template Library (STL) extensively, and am familiar with most of the Boost libraries.
Wherever possible I write code that will compile and run on both UNIX and Windows.
Most of my SQL experience has been with MySQL, Postgresql and Microsoft SQL server; I am aware of the differences between these and other databases such as Oracle and DB2. Wherever possible I write vendor-neutral SQL.
(For human languages, English is my native tongue, my Japanese is strong, and I understand basic Chinese and German: my reading/writing is notably better than my speaking/listening).
|
Real-Time Credit Card Processing
Keywords: ecommerce, credit card, Veritrans, billing, encryption, gpg
I have worked on a couple of sites that do live credit card billing, meaning the card is authorized and billed automatically and the customer given a success or failure message immediately. For both sites the credit card processing company was Veritrans. My open source fclib library includes functions to assist in interfacing PHP with Veritrans's perl MDK.
For one client, a major drinks company, I created an online order form with real-time credit card verification and charging. A daibiki (cash-on-delivery) option was also offered.
For the other client, a telecommunication company, the needs were more complex. In addition to initial billing we take a credit authorization. This is then used to bill the customer for usage at regular intervals. In addition, the credit card number is recorded in the database. To ensure this is secure the data is encrypted using public-key encryption. gpg is used to do the encryption and decryption. gpg (or PGP) is also used to encrypt customer information when sending order emails, ensuring privacy.
|
CMS/Smarty/QuickForm/XML
Keywords: CMS, content management system, php, XML, Smarty, Pear, Quickform, UTF8, i18n
I have developed a few PHP-based CMSes for various clients. In most cases part of the requirements was that an off-the-shelf CMS would not be flexible enough. Input data usually came from an SQL DB, though in one case the data was both read and written from a very complex XML file (the same file was used for the print version of the data: a prestigious scientific journal). In all cases support for multiple Asian languages - not just Japanese - was a requirement so UTF8 was used.
Smarty was used as the template engine. I have found this easy to use, powerful and easy to extend. Designer resistance seemed common, but no better alternative was ever suggested.
Pear::QuickForm was used to quickly build edit forms on one system. On non-trivial forms this introduced as much complexity as it saved, so I am not convinced of its usefulness. However when I recently tried it again on a slightly simpler project I found a mistake in the examples that come with the documentation. Once that mistake was avoided it went more smoothly.
|
Flash Video Kiosk
Keywords: Flash 8, Actionscript 2, XML, embedded fonts, projector, Screenweaver, mtasc, swfmill, video, flash memory leak workaround, socket server, php, legacy application, SJIS, UTF8
I have recently worked on a generic video kiosk. It was tied to a music playlist coming from a legacy application. On the server side I wrote a socket server (in PHP) to interface with this legacy application and then be an XML socket server for the front-end clients, which used flash 8. Part of the job of the back-end was to convert the Japanese text from Shift-JIS to UTF-8 encoding. Both back and front ends ran on Windows.
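The encoding conversion step is simple in principle: decode the legacy bytes, then re-encode them. A minimal sketch follows; the original back-end was PHP, the function name is hypothetical, and `shift_jis` is an assumption (a legacy Windows application may really be producing the closely related `cp932`).

```python
def sjis_to_utf8(raw, source_encoding="shift_jis"):
    """Re-encode Shift-JIS bytes as UTF-8, replacing undecodable bytes with U+FFFD."""
    text = raw.decode(source_encoding, errors="replace")
    return text.encode("utf-8")
```

Using `errors="replace"` keeps one bad byte in a playlist entry from stopping the whole feed, at the cost of a visible replacement character.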
On the same project I worked on the actionscript for the front-end flash client. This was primarily for handling parsing of data files and screen transitions, driven by the playlist coming from the back-end. But in fact much of the time spent on the project was fighting flaws in flash: memory leaks, irrational behaviour when loading embedded fonts, and CPU load when running full-screen video.
I used Actionscript 2 for this project, using mtasc and swfmill, on a linux machine. Testing was done partly on the target Windows machine, and partly using the Flash 8 Windows player under wine on linux.
I also used Screenweaver, which is a tool to turn a swf file into a Windows exe; it is more useful than the projector that flash can make as you can add custom actionscript commands to control window size and visibility. The primary use of Screenweaver in this project was to allow the flash window to be invisible initially while the fonts loaded and it connected, and to position itself on a 2nd monitor (a wide-screen TV).
|
Complex CSV Imports into SQL databases
Keywords: csv, SQL, XML, tab separated, data filtering, data mining, Japanese, PHP, C++
I have created many scripts for clients needing to get data (normally in CSV, tab-separated or XML format) into an SQL database (typically for use in web sites, email campaigns, machine translation, keyword suggestion, etc.). These scripts handled fixing dirty data, merge-and-purge (e.g. duplicate email addresses), multiple files in different formats and so on.
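As an illustration of the merge-and-purge step, the sketch below drops rows whose email address is blank or has already been seen, after normalizing case and whitespace. The column name `email` and the function name are assumptions; real client data usually needs more cleaning rules than this.

```python
import csv
import io

def purge_duplicates(csv_text, key="email"):
    """Return rows from CSV text, keeping only the first row per normalized address."""
    seen = set()
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        address = (row.get(key) or "").strip().lower()
        if not address or address in seen:
            continue                      # skip blank or repeated addresses
        seen.add(address)
        row[key] = address                # store the cleaned-up value
        kept.append(row)
    return kept
```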
Handling different Japanese encodings, and coping with mojibake (garbled or illegal characters), is a common requirement.
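One simple way to cope with files of unknown encoding is trial decoding: try each candidate strictly and take the first that decodes cleanly. The sketch below assumes three candidates in a fixed order and uses hypothetical names. It is a heuristic only, since short byte strings can be valid in more than one encoding.

```python
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")   # assumed candidate list and order

def decode_japanese(raw):
    """Return (encoding, text) for the first candidate that decodes without error."""
    for encoding in CANDIDATES:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched: keep going with replacement characters rather than fail.
    return None, raw.decode("utf-8", errors="replace")
```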
The simpler scripts have been written in PHP; when dealing with large amounts of data, highly optimized C++ has been used. I have written scripts for everything from a handful of email addresses in a text file to processing multi-gigabyte XML files.
Frequently this import stage is combined with data filtering, helping to reduce very large files (such as the Wikipedia xml dumps) to manageable slices.
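The usual trick for filtering an XML file too large to load into memory is to stream it and discard each record once it has been examined. A minimal sketch, assuming a simplified dump layout of `<page>` elements each containing a `<title>`, with no XML namespace (real Wikipedia dumps do declare one); the function name is hypothetical:

```python
import io
import xml.etree.ElementTree as ET

def titles_matching(xml_stream, keyword):
    """Stream through <page> elements and collect titles containing the keyword."""
    matches = []
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title", default="")
            if keyword.lower() in title.lower():
                matches.append(title)
            elem.clear()                  # free the page's children once examined
    return matches
```

It accepts any binary file object, so the same function works on `open("dump.xml", "rb")` or on an in-memory `io.BytesIO` for testing.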
|
Web Site Forms/Data Encryption
Keywords: forms, HTML, SQL, csv, administration, phpMyAdmin, gpg, PGP, encryption
I have written many web site forms that take user data and add or update an entry in a back-end database, and/or write data to CSV. I have also written numerous scripts to rotate log files daily, and to do something with the rotated files (typically email it to the client).
I sometimes also create an admin interface; this can frequently end up a much bigger project than the initial form. On MySQL projects I find much of the administration interface can be done by phpMyAdmin, which is friendly enough even for a non-programmer to use effectively.
I have used gpg and pgp to encrypt data that is sent out, and have a solid understanding of public key encryption.
|
Mail Broadcast System
Keywords: email, HTML, mobile, keitai, tracking, demographic analysis, data mining, marketing, PHP, Java
I designed and implemented a system that takes demographic information on a user from a back-end database and creates and sends a custom email (HTML or text; mobile phone email also supported). It has mostly been used to send Japanese emails.
It tracks email bounces, links (and views in the case of HTML), and produces reports cross-referencing demographics with link clicks and other demographics. It allows the creation of custom reports, data screens and campaign targeting based on not just demographics but also user activity in previous campaigns. It is commonly integrated with the back-end databases for web sites.
The system is called PEP and was developed for and is owned by FlyingColor Group. Most of the system development was in PHP, but there has also been some work in java. The system has been used for a large number of campaigns, promotions and newsletters over the past six years.
|
Batch Video Processing
Keywords: video, ffmpeg, still photos
A personal project to take 30+ hours of home video from down the years and automatically make versions at different resolutions and quality levels, as well as take still snapshots.
I used ffmpeg for this, along with a PHP script.
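A script for this kind of job mostly builds ffmpeg command lines in a loop. The sketch below only constructs the argument lists and does not run anything. The original script was PHP and its exact options are not given in the text; the options used here (`-vf scale`, `-b:v`, `-ss`, `-frames:v`) are ones found in current ffmpeg builds, and the function names are hypothetical.

```python
def transcode_command(source, target, height, video_bitrate):
    """Arguments to re-encode a video at a given height and video bitrate."""
    return ["ffmpeg", "-i", source,
            "-vf", f"scale=-2:{height}",   # keep aspect ratio, even width
            "-b:v", video_bitrate,
            target]

def snapshot_command(source, target, at_seconds):
    """Arguments to grab a single still frame at the given time offset."""
    return ["ffmpeg", "-ss", str(at_seconds), "-i", source,
            "-frames:v", "1", target]
```

Each list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with file names containing spaces.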
It worked well and by choosing a high compression rate I was able to squeeze all the video, and stills, on to a single DVD.
|
PHP Open Source Library (FCLIB)
Keywords: PHP, Japanese, UTF8, web forms, CMS, Veritrans, SQL, XML, csv
This is an open-source library of PHP functions that I've developed over a number of years. The first versions were based on library code I took with me when I joined FlyingColor, and it expanded from there. It is not organized around any particular functionality and is more a mixed bag of functions that have been useful on numerous projects. It is fairly well-documented: all documentation is in the source file, javadoc style.
It can be downloaded at either http://www.dcook.org/software/fclib/ or http://www.flyingcolor.com/fclib/.
It is a very casual open source project: I do not actively promote it or try to attract other programmers, though of course I am open to receiving patches.
|
Japanese Site Screen Scraper
Keywords: PHP, screen scraping, POST, curl, Internet Explorer, COM, Mozilla
A project to submit information to two Japanese sites then extract the key information each returns and combine it into a single English page.
This was complicated by the fact that the two sites in question used different encodings (one used Shift JIS, the other EUC). This project was done in PHP4, with multibyte extensions and the curl module (for POST-ing the form submission).
Thoughts: If doing this today, and a Windows platform was an option, I would use PHP's COM extension for control of Internet Explorer (see the September 05 issue of PHP Architect). I am also investigating similar control of Mozilla but have yet to find something easy. Chickenfoot and Greasemonkey look interesting but are browser plug-ins, so require manual intervention to operate.
|
Web Services and SOAP
Keywords: SOAP, gateway, PHP, Python, Java
I experimented with SOAP as an easy way of making scalable, distributed systems. By using SOAP for the communication between major components it allows them to be moved to a different machine as load increases. In addition SOAP's language-independence means a component can be easily replaced with one written in a different language (maybe for speed, developer familiarity or access to a library not available in the main language of your project).
If you are interested, I wrote an article (now three years old), Pass The SOAP, that shows example SOAP client code in PHP, Python and Java. Example code for the SOAP server is also included.
Since then I have also used Flash Remoting which enables easy interfacing of Flash with web service servers. Commercial gateways are available for Java, .NET and Cold Fusion as well as open source gateways for PHP and Perl. (Update: However I generally prefer Flash XML sockets for server/client communication.)
|
Javascript For Complex User Interfaces
Keywords: Javascript, browser, DHTML, OOP, user interface, Internet Explorer, Mozilla
Modern browsers finally support the complete CSS (style sheet) specification, allowing designers to make their web sites interactive on the client side. I have done a number of such projects, customizing menus to increase ease of use and making effective use of screen space (or "enhanced utilization of screen real estate encouraging streamlined user interactions" if you are in marketing). I have also helped out troubleshooting complex javascript.
Thoughts: Javascript (equivalent to Flash's Actionscript 1) is an interesting language. Objects are very flexible and dynamic: the extreme opposite of C++'s strong typing. The lack of error reporting becomes an issue on larger projects, but dynamic objects also make solving some problems easier, once you get used to it.
Incidentally my dcflash open source library has a statistics actionscript 1 sub-library which deliberately works equally well as javascript.
|
Open Source Software
Keywords: mlsn, fclib, dcflash, library, semantic network, PHP, actionscript, charting, statistics, javascript
I am a keen supporter of open source software, and make small contributions to numerous projects. I also maintain the following projects:
- FCLIB: a mixed bag of PHP classes and functions
- MLSN: Multi-Lingual Semantic Network (or more modestly, a Japanese thesaurus)
- dcflash: An actionscript utility library, with focus on chart drawing and statistical analysis
Incidentally I always use an MIT or BSD license for my open source projects, believing in the importance of freedom and the natural willingness of users to submit improvements, without the strong-arming of the GPL "virus" and similar licenses. I used to not really care, but over the years having to reinvent libraries that were unusable in a project solely because they are GPL has persuaded me.
|
Server/Network Administration
Keywords: web server, cluster, load balancer, mail server, SQL database, database replication, router, firewall, high availability, cisco, linux, redhat, fedora, Windows, Sun, BSD
I have set up and maintained numerous Linux servers running busy production web sites, as well as DNS, database, mail and other servers, usually running some version of Redhat or Fedora Core. I have also worked with Sun and BSD machines in the past and have maintained a Cisco 2600 router/firewall for a couple of years. In addition I have set up Linux machines as firewalls and routers when the budget didn't stretch to dedicated hardware. In the distant past I have worked with Windows web servers.
I have set up LAN monitoring and alerts (to email and mobile phone), and have optimized machines and server software for large loads. I have a solid understanding of load balancers, server clusters and database replication. I have studied for the Cisco exams, and have a good understanding of router setup and routing protocols (I have not actually taken the Cisco exams).
Note: I am no longer actively chasing server administration work, preferring to focus on software design and coding. My intention with this section is just to show I have practical experience with the internet at all levels.
|
Mobile Development/Brew/Flash Lite
Keywords: mobile, brew, flash lite, flash lite 2, docomo, vodafone, jphone, softbank, kddi, ezweb, Japanese mobile
I have worked on a range of mobile projects, over a number of years, primarily for the Japanese market. I have also worked on mailing list distribution to mobile phones, and have a deep understanding of many of the issues in that.
I have also worked on Brew and Flash Lite (Flash Lite 1 and Flash Lite 2) projects.
|