Professional projects

Whereas I abhorred computers during my formative years in the seventies, in the eighties computers and I found out that we could bear each other’s company. Since about 1983 I have been kept alive as a software developer by four employers, two commercial and two academic. I dare say that my background in physics, which brought a good portion of naïveté and an at times amateurish (Latin amator, “lover”) approach to software projects, helped me propose and implement solutions that others didn’t believe were within reach.

AMEV insurance company

Name recogniser
In my first job as a programmer I developed a program that took as input a text string containing the name of one or more persons, the name of a company, of a foundation or a club, or a combination of these. This little piece of free text could also contain one or more persons’ initials and titles.
This project gave me a taste of Language Technology. At AMEV we used PL/1 as the corporate programming language, although the actuaries who lived two floors higher up used Basic. PL/1 is fine, especially compared to its closest competitor: COBOL. Did you know that you can define a string with a negative length in PL/1? Excellent for eating the last characters in a string: just concatenate your string with the negative-length string. It reminded me of antiparticles in physics. Except that concatenating a string and an anti-string doesn’t release a devastating amount of energy.
The program grew in a very fertile cooperation with a systems analyst who supplied the test data and pointed out cases on which the program could be sharpened. Together we kept pushing the quality to a higher level, resulting in a bigger reduction of manual work than foreseen.

Utrecht University, Faculty of Humanities

Celeste
[Figure: Celeste with two Latin texts]
A program for collating two texts word by word. It jumps over text that has no counterpart in the other text and can jump backwards if a text fragment seems to have been swapped with another.
The program runs under DOS, but in a graphical mode: CGA, EGA, VGA or Hercules. It is almost my first C program and reflects my being new to DOS. For example, I had heard rumours of a program on one of the University’s computer centre’s computers that could access the Brown Corpus, and that program was “multitasking”, so I thought multitasking was the right thing. So Celeste multitasks. You can, for example, read the “help” text while the program is working its way through the texts in the background. Or you can move a cursor (it has two cursors!) to a position on the screen and read the small fragments of text found by using the cursor’s x and y coordinates as offsets in the two documents. The most glorious aspect of the program, though, is that you can peek at what the program is thinking, because it sprinkles little stars over the screen at places where a match of the two texts might be considered, and removes them again where such considerations are discarded. The name of the program is derived from this attention-heightening effect.
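To give an impression of what word-by-word collation involves, here is a minimal sketch in Python. It is not Celeste's algorithm: it leans on the standard library's difflib.SequenceMatcher, which skips over words that have no counterpart in the other text but, unlike Celeste, does not detect swapped fragments. The function name collate and the sample texts are made up for the illustration.

```python
from difflib import SequenceMatcher


def collate(text_a, text_b):
    """Align two texts word by word.

    Returns a list of (tag, words_a, words_b) triples, where tag is
    'equal', 'replace', 'delete' or 'insert' (difflib's own labels).
    """
    words_a = text_a.split()
    words_b = text_b.split()
    # autojunk=False: do not let difflib ignore very frequent words.
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    return [
        (tag, words_a[i1:i2], words_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


if __name__ == "__main__":
    # Hypothetical sample input, chosen only for the illustration.
    first = "arma virumque cano Troiae qui primus"
    second = "arma virumque Troiae qui primus ab oris"
    for tag, left, right in collate(first, second):
        print(tag, left, right)
```

Every 'equal' triple corresponds to a place where Celeste would have left a star on the screen; the other triples are the stretches it had to jump over.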
Iconclass browser
Iconclass is a subject-specific classification system that only existed as a series of books. I developed the backend software for accessing the classification tree. It remained in use for many years after I set the last semicolon on Friday 30 November 1990, the day before I moved to Denmark.

CRI, Computer Resources International (Denmark)

SIMPR
[Figure: Part of CGP++ design]
Structured Information Management: Processing and Retrieval.
I rewrote Fred Karlsson’s Constraint Grammar Parser in C++. My version is called the “academic version”, which seems flattering, but isn’t. I have read that the “production version”, written sometime later by Pasi Tapanainen, is several times faster. So either this guy has cut corners somewhere or he has simply outsmarted me. I’m afraid the latter is the case. But I am quite happy with the fact that my version is a few times faster than the original version, which was in LISP.
KAVAS-2
Knowledge Acquisition, Visualization and Assessment System.
My introduction to Windows programming. I took care of integrating all project partners’ Windows programs into one application, so that the user had the feeling of interacting with just one single program with a multiple document interface.

CST, Copenhagen University

Scarrie
Scandinavian Proofreading Tools.
TransRouter
TransRouter is a management tool that helps translation managers decide on the best approach to carrying out their translation projects. For this project I developed the repetitiveness checker.
STAGING – Multimodal Communication in a Virtual Farm
[Figure: Staging flow diagram]
For this project I developed, among other things, the communication manager, which keeps track of the dialogue that is going on between the user and the virtual agent on the screen, the farmer. Staging supported speech, a touch screen and a data glove as inputs and showed a farm in a very simple graphical interface. The communication manager was mainly written in Bracmat, the programming language described above.
TQPro
Translation Quality for Professionals.
Once more I used Bracmat, this time to do a partial parse of a POS-tagged text in order to find constructs that are notoriously difficult to handle by machine translation software.
Lemmatizer
CST’s lemmatizer is an example of brute-force Language Technology. You don’t provide the lemmatizer with a list of lemmatization rules. Instead you let the lemmatizer deduce rules from a full form word list that maps full forms onto the corresponding lemma form (e.g. lasts -> last or children -> child).
The program was originally developed for Danish and could also be used for a number of other languages, such as Greek, English, Swedish, Norwegian and Icelandic. The algorithm only looked at the words’ endings, which was not good for a few languages, among them German and Dutch. Most German and Dutch verbs get a prefix or an infix ‘ge’ in the past participle, in addition to a suffix. In the Tvärsök 2 project, I developed a new training algorithm that handles prefixes, infixes and suffixes alike.
For most inflected languages for which plenty of training data are available, the results are quite good.
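The idea of deducing rules from a full form word list can be illustrated with a small Python sketch. It is a toy under stated assumptions, not CST's training algorithm: it looks at word endings only (like the original version, so it would not cope with the German and Dutch ‘ge’), and the names train and lemmatize as well as the three training pairs are invented for the example.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Deduce suffix rewrite rules from (full form, lemma) pairs.

    For every pair, each suffix of the full form that covers the part
    where the two words differ becomes a rule 'suffix -> replacement'.
    """
    rules = defaultdict(Counter)
    for full, lemma in pairs:
        # Length of the common prefix of full form and lemma.
        shared = 0
        while shared < min(len(full), len(lemma)) and full[shared] == lemma[shared]:
            shared += 1
        # Longer suffixes carry more context; keep all of them.
        for start in range(shared + 1):
            rules[full[start:]][lemma[start:]] += 1
    return rules


def lemmatize(word, rules):
    """Apply the rule with the longest suffix that matches the word."""
    for start in range(len(word) + 1):
        suffix = word[start:]
        if suffix in rules:
            replacement = rules[suffix].most_common(1)[0][0]
            return word[:start] + replacement
    return word  # no rule applies: leave the word unchanged


if __name__ == "__main__":
    # Hypothetical miniature training list.
    rules = train([("lasts", "last"), ("children", "child"), ("houses", "house")])
    print(lemmatize("masts", rules))          # mast
    print(lemmatize("grandchildren", rules))  # grandchild
```

Preferring the longest matching suffix is what lets the exception (children) win over the general rule (strip a final s) without anyone having written either rule by hand.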
MELFO
(Mobil e-Læring for Ordblinde – Mobile e-Learning for dyslexics)
At last a Language Technology project for a PDA platform. My task was the implementation of the software for accessing a bilingual term base.
MELFA
(Mobile E Learning For Africa)
An exciting Danish – South African initiative offering Mobile solutions for Literacy Training and Skills Development.
CLARIN-DK
Common Language Resources and Technology Infrastructure.
In this project, I am responsible for the ‘Tools module’. Most noteworthy is the automatic composition (based on the user’s wishes as to output) and execution of web-based tool chains.
CLARIN-DK allows integration of language tools as web services in the CLARIN-DK infrastructure, without the need to change the program code of the infrastructure for each additional tool. In other words, integration takes place by registration, not coding.
A unique feature is the way in which users choose workflows. The tools module takes the burden of editing workflows from the user’s shoulders and instead lets a user declare a goal, after which the tools module computes the workflow(s) that can produce the required result, given the input at hand.

[Figure: Automatic creation of workflows]

The aforementioned Bracmat is used for some parts of the tools module, most notably the part that computes workflows. Many of the facilities needed for workflow planning (tool profile matching, normalization, memoization of very complex data structures, backtracking, symbolic computing) are available in computer algebra systems. Indeed, workflow planning is quite similar to theorem proving.
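How a declared goal can be turned into workflows by backtracking is sketched below in Python rather than Bracmat. This is only an illustration of the principle, not the tools module itself: the Tool record, the plan function and the four registered tools are hypothetical, and the real module matches much richer tool profiles and memoizes intermediate results.

```python
from typing import NamedTuple


class Tool(NamedTuple):
    """A registered tool, described only by what it needs and what it gives."""
    name: str
    needs: str
    gives: str


def plan(tools, have, goal, seen=frozenset()):
    """Return every chain of tool names that turns `have` into `goal`.

    Plain depth-first search with backtracking; `seen` holds the data
    types already visited on the current path, to avoid going in circles.
    """
    if have == goal:
        return [[]]
    visited = seen | {have}
    chains = []
    for tool in tools:
        if tool.needs == have and tool.gives not in visited:
            for rest in plan(tools, tool.gives, goal, visited):
                chains.append([tool.name] + rest)
    return chains


if __name__ == "__main__":
    # Hypothetical registry: adding a tool means adding a line here,
    # not changing the planner ("registration, not coding").
    registry = [
        Tool("tokenizer", "text", "tokens"),
        Tool("pos-tagger", "tokens", "tagged"),
        Tool("lemmatizer", "tokens", "lemmas"),
        Tool("tag-aware-lemmatizer", "tagged", "lemmas"),
    ]
    for chain in plan(registry, have="text", goal="lemmas"):
        print(" -> ".join(chain))
```

The user only states the goal ("lemmas") and the input at hand ("text"); both possible workflows fall out of the search, much as proofs fall out of a theorem prover.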