Specifications
- The first evaluation campaign will concentrate on assessing text translation
algorithms in the tourism domain. Translation directions will be from Chinese,
Italian, Japanese, and Korean into English for the primary condition, and any
other direction for the secondary condition.
- Training data will consist of a fixed number of English sentences provided
with translations into the respective source language. Participants will be
allowed to use any additional monolingual resources, e.g. text corpora,
grammars, word lists, segmentation tools.
- Test data for the primary condition will consist of English sentences
taken from phrase-books not included in the training data. Test data for the
secondary condition will consist of manual translations of the English
sentences into all the considered source languages.
- The primary condition will be mandatory for all participants. Participants
will be invited to submit multiple runs for each condition, possibly
corresponding to different translation directions.
Evaluation Protocol
- Automatic scoring will be carried out with the NIST/BLEU software. In
particular, a server will be set up that will allow participants to score the
output of their systems remotely. For each translation direction, multiple
translations will be used as references.
- Subjective evaluation on the primary condition will be distributed across
the participant sites. Native English speakers will evaluate the output of
each system against one gold-standard reference. Evaluation will follow
guidelines similar to those applied by LDC in the NIST MT evaluation campaigns.
- While automatic evaluation will be applied to all submitted runs, subjective
evaluation will be applied to only one run per participant, namely the first
run submitted under the primary condition.
- Finally, participants are allowed to discuss their own results without
restriction. Disclosure of the results of other participants is not allowed
without their permission.
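
As an illustration of the automatic scoring step with multiple references, the following is a minimal Python sketch of corpus-level BLEU. It is not the NIST/BLEU software used in the campaign: the function names (ngrams, corpus_bleu), the whitespace tokenization, the maximum n-gram order of 4, and the choice of the closest reference length for the brevity penalty are assumptions made here for illustration, and the official scoring scripts may differ in any of these details.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with several references per segment (illustrative).

    hypotheses: list of token lists, one per segment.
    references: list of lists of token lists (all references of each segment).
    """
    matched = [0] * max_n  # clipped n-gram matches, per n-gram order
    total = [0] * max_n    # n-grams produced by the system, per n-gram order
    hyp_len = 0
    ref_len = 0

    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Assumption: use the reference length closest to the hypothesis
        # length, preferring the shorter one on ties.
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram count by its maximum count in any one reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    # Without smoothing, any n-gram order with zero matches gives a score of 0.
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


hypothesis = "the cat sat on the mat".split()
reference_a = "the cat is on the mat".split()
reference_b = "the cat sat on the mat".split()

single = corpus_bleu([hypothesis], [[reference_a]])
multi = corpus_bleu([hypothesis], [[reference_a, reference_b]])
```

In this sketch a hypothesis that is identical to any one of its references receives a score of 1.0, which shows why additional references can only help a system whose output is a valid but differently worded translation.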
Important dates
- Test set release: 02 June 2003
- Start of run submission: 09 June 2003
- End of run submission: 20 June 2003 12:00 GMT
- Subjective evaluation: 30 July 2003
- Test set release: 02 June 2003
- Start of run submission: 23 June 2003
- End of run submission: 04 July 2003 12:00 GMT
Steering Committee
H. Blanchon (CLIPS)
M. Federico (ITC-irst)
H. Nakaiwa (ATR)
S. Oh (ETRI)
A. Tribble (CMU/UKA)
C. Zong (NLPR)
Action Plan and Responsibilities
- Coordination of evaluation campaign: ITC-irst
- Maintenance of BTEC corpus (training/test data): ATR
- Translation and production of multiple references in Chinese, Italian,
Japanese, and Korean: NLPR, ITC-irst, ATR, ETRI
- Responsible for subjective evaluation: CMU/UKA
- Subjective evaluators: one native English speaker per participant
- Responsible for automatic evaluation: ATR
Registered participants (28 May, 2003)
- ITC-irst, Italy: Italian - English, Chinese - English
- ATR, Japan: Japanese - English
- ETRI, Korea: Korean - English
- NLPR, China: Chinese - English
- UKA, Germany: Chinese - English