LREC 2016 Workshop:

Translation Evaluation:
From Fragmented Tools and Data Sets
to an Integrated Ecosystem

24 May 2016, Portorož, Slovenia

 

About this Workshop

Current MT/HT evaluation approaches, both automatic and manual, are characterised by a high degree of fragmentation, heterogeneity and a lack of interoperability between tools and data sets. As a consequence, it is difficult to reproduce, interpret, and compare evaluation results. The main objective of this workshop is to bring together researchers working on MT and HT evaluation, including providers and users of tools and evaluation approaches (metrics and methodologies), as well as practitioners (translators, MT users, LSPs, etc.). Topics of interest include but are not limited to:

  • MT/HT evaluation methodologies (incl. scoring mechanisms, integrated metrics)
  • Benchmarks for MT evaluation
  • Data and annotation formats for the evaluation of MT/HT
  • Workbenches, tools, technologies for the evaluation of MT/HT (incl. specialised workflows)
  • Integration of MT, TM and terminology in industrial evaluation scenarios
  • Evaluation ecosystems
  • Annotation concepts such as MQM, DQF and their implementation in MT evaluation processes

Call for papers

This workshop will take an in-depth look at an area of ever-increasing importance: approaches, tools and data support for the evaluation of human translation (HT) and machine translation (MT), with a focus on MT. Two clear trends have emerged over the past several years. The first is the standardisation of evaluation in research through large shared tasks, in which system translations are compared to reference translations using automatic metrics and/or human ranking. The second focuses on achieving high-quality translations with the help of increasingly complex data sets that contain many levels of annotation based on sophisticated quality metrics, often organised in the context of smaller shared tasks. In industry, we also observe increased interest in workflows for high-quality outbound translation that combine Translation Memory (TM), Machine Translation and post-editing. In stark contrast to this trend towards quality translation (QT), with its inherent complexity and need for an overall approach, the data and tooling landscape remains heterogeneous, uncoordinated and not interoperable.
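
As an illustration of the first trend, the following minimal sketch shows reference-based automatic scoring of the kind used in large shared tasks. It assumes the third-party sacrebleu package is installed; the segments are invented for illustration and do not come from any actual evaluation campaign.

    # Reference-based automatic MT evaluation: score system outputs against
    # human reference translations with corpus-level BLEU.
    # (Illustrative sketch; assumes the sacrebleu package, example data is made up.)
    import sacrebleu

    # One hypothesis (system output) per source segment.
    hypotheses = [
        "The cat sat on the mat .",
        "There is a book on the table .",
    ]

    # One stream of reference translations, aligned with the hypotheses.
    references = [[
        "The cat sat on the mat .",
        "A book lies on the table .",
    ]]

    # Corpus BLEU combines clipped n-gram precision with a brevity penalty;
    # higher scores indicate closer agreement with the references.
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")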

The event will bring together MT and HT researchers, users and providers of tools, and users and providers of the manual and automatic evaluation methodologies currently used to evaluate HT and MT systems. The key objective of the workshop is to initiate a dialogue and to discuss whether the current approach, with its diverse and heterogeneous set of data, tools and evaluation methodologies, is adequate, or whether the community should instead collaborate towards building an integrated ecosystem that provides better and more sustainable access to data sets, evaluation workflows, approaches and metrics, as well as to supporting processes such as annotation and ranking.

The workshop is meant to stimulate a dialogue about the commonalities, similarities and differences of existing solutions in three areas: (1) tools, (2) methodologies and (3) data sets. A key question concerns the trade-off between the two possible directions: heterogeneous approaches offer high flexibility but little interoperability, while a homogeneous approach would offer less flexibility but higher interoperability. How much flexibility and interoperability does the MT/HT research community need? How much does it want?

Another question concerns the requirements for establishing evaluation methods as viable tools. Automatic metrics are used because they are fast and correlate sufficiently with human judgement. The manual metric used in WMT (5-way ranking) has been refined to obtain statistically significant results in a reliable and efficient way. However, concerns remain about the consistency of and agreement between judgements, the time needed to carry them out, and the reliability of sampling methods. Other methods (post-editing impact, understandability tests, (linguistic) error analyses) require even more effort before their viability can be established. For example, post-editing speed is arguably an intuitive evaluation metric, but inter-translator variability is too high for it to be used in a straightforward way.
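
To make the notion of a metric "correlating sufficiently with human judgement" concrete, here is a minimal sketch of a system-level correlation check. It assumes the scipy package; all scores are invented for illustration and do not reflect any real systems or campaigns.

    # System-level validation of an automatic metric: correlate its scores
    # with mean human judgements across a set of MT systems.
    # (Illustrative sketch; assumes scipy, all numbers are hypothetical.)
    from scipy.stats import pearsonr

    metric_scores = [22.4, 25.1, 27.8, 30.2, 31.0]  # automatic metric, one score per system
    human_scores = [3.1, 3.4, 3.3, 3.9, 4.2]        # mean human adequacy judgements

    r, p_value = pearsonr(metric_scores, human_scores)
    print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")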

 

Topics of interest include but are not limited to:

  • MT/HT evaluation methodologies (incl. scoring mechanisms, integrated metrics)
  • Benchmarks for MT evaluation
  • Data and annotation formats for the evaluation of MT/HT
  • Workbenches, tools, technologies for the evaluation of MT/HT (incl. specialised workflows)
  • Integration of MT, TM and terminology in industrial evaluation scenarios
  • Evaluation ecosystems
  • Annotation concepts such as MQM, DQF and their implementation in MT evaluation processes

We invite contributions on the topics mentioned above and any related topics of interest.

Important dates

  • Publication of the CFP: 10 December 2015
  • Submissions due: 18 February 2016, 12:00 GMT+1 (extended from 15 February 2016)
  • Notification of acceptance: 1 March 2016
  • Final version of accepted papers: 31 March 2016
  • Final programme and online proceedings: 15 April 2016
  • Workshop: 24 May 2016 (this event will be a full-day workshop)

Submission

Please submit your papers at https://www.softconf.com/lrec2016/MTEVAL/ before the deadline of 18 February 2016, 12:00 GMT+1 (extended from 15 February 2016). Accepted papers will be presented either orally or as posters. All accepted papers will be published in the workshop proceedings.

Papers should be formatted according to the stylesheet soon to be provided on the LREC 2016 website and should not exceed 8 pages, including references and appendices. Papers should be submitted in PDF format through the URL mentioned above.

When submitting a paper, authors will be asked to provide essential information about resources (in a broad sense, i.e., also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or that are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).

Programme

09:00   Welcome – introduction – context

Session 1: Tools, Methods and Resources for Research

09:10   Julia Ive, Aurélien Max, François Yvon and Philippe Ravaud:
        Diagnosing High-Quality Statistical Machine Translation Using Traces of Post-Edition Operations
09:30   Ondřej Bojar, Filip Děchtěrenko and Maria Zelenina:
        A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation
09:50   Anabela Barreiro, Francisco Raposo and Tiago Luís:
        CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units (short presentation)
10:00   Zijian Győző Yang, László Laki and Borbála Siklósi:
        HuQ: An English-Hungarian Corpus for Quality Estimation (short presentation)
10:10   Discussion of the papers presented in Session 1
10:30   Coffee break

Session 2: Shared Tasks

11:00   Ondřej Bojar, Christian Federmann, Barry Haddow, Philipp Koehn, Matt Post and Lucia Specia:
        Ten Years of WMT Evaluation Campaigns: Lessons Learnt
11:20   Luisa Bentivogli, Marcello Federico, Sebastian Stüker, Mauro Cettolo, Jan Niehues:
        The IWSLT Evaluation Campaign: Challenges, Achievements, Future Directions
11:40   Discussion of the papers presented in Session 2

Session 3: Evaluation Tools and Metrics (part A)

12:00   Katrin Marheinecke:
        Can Quality Metrics Become the Drivers of Machine Translation Uptake? An Industry Perspective
12:20   Kim Harris, Aljoscha Burchardt, Georg Rehm, Lucia Specia:
        Technology Landscape for Quality Evaluation: Combining the Needs of Research and Industry
12:40   Eleftherios Avramidis:
        Interoperability in MT Quality Estimation or wrapping useful stuff in various ways
13:00   Lunch break

Session 3: Evaluation Tools and Metrics (part B)

14:00   Arle Lommel:
        Blues for BLEU: Reconsidering the Validity of Reference-Based MT Evaluation
14:20   Roman Sudarikov, Martin Popel, Ondřej Bojar, Aljoscha Burchardt and Ondřej Klejch:
        Using MT-ComparEval
14:40   Michal Tyszkowski and Dorota Szaszko:
        CMT: Predictive Machine Translation Quality Evaluation Metric
15:00   Aljoscha Burchardt, Kim Harris, Georg Rehm and Hans Uszkoreit:
        Towards a Systematic and Human-Informed Paradigm for High-Quality Machine Translation
15:20   Discussion of the papers presented in Session 3 (parts A and B)
16:00   Coffee break
16:30   Summary – final discussion – next steps: towards an integrated ecosystem?
17:30   End of workshop

Proceedings

Proceedings of the LREC 2016 Workshop
“Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem”

24 May 2016 – Portorož, Slovenia

Edited by Georg Rehm, Aljoscha Burchardt, Ondrej Bojar, Christian Dugast, Marcello Federico, Josef van Genabith, Barry Haddow, Jan Hajic, Kim Harris, Philipp Koehn, Matteo Negri, Martin Popel, Lucia Specia, Marco Turchi, Hans Uszkoreit

Acknowledgments: This work has received funding from the EU’s Horizon 2020 research and innovation programme through the contracts CRACKER (grant agreement no.: 645357) and QT21 (grant agreement no.: 645452).

Organising Committee

This workshop is an initiative jointly organised by the EU projects CRACKER and QT21 with the support of the Cracking the Language Barrier federation.

Ondřej Bojar
Charles University in Prague, Czech Republic
Aljoscha Burchardt
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Christian Dugast
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Marcello Federico
Fondazione Bruno Kessler (FBK), Italy
Josef van Genabith
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Barry Haddow
University of Edinburgh, UK
Jan Hajič
Charles University in Prague, Czech Republic
Kim Harris
text&form, Germany
Philipp Koehn
Johns Hopkins University, USA, and University of Edinburgh, UK
Matteo Negri
Fondazione Bruno Kessler (FBK), Italy
Martin Popel
Charles University in Prague, Czech Republic
Georg Rehm
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Lucia Specia
University of Sheffield, UK
Marco Turchi
Fondazione Bruno Kessler (FBK), Italy
Hans Uszkoreit
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany

Programme Committee

Nora Aranberri
University of the Basque Country, Spain
Ondřej Bojar
Charles University in Prague, Czech Republic
Aljoscha Burchardt
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Christian Dugast
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Marcello Federico
Fondazione Bruno Kessler (FBK), Italy
Christian Federmann
Microsoft, USA
Rosa Gaudio
Higher Functions, Portugal
Josef van Genabith
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Barry Haddow
University of Edinburgh, UK
Jan Hajič
Charles University in Prague, Czech Republic
Kim Harris
text&form, Germany
Matthias Heyn
SDL, Belgium
Philipp Koehn
Johns Hopkins University, USA, and University of Edinburgh, UK
Christian Lieske
SAP, Germany
Lena Marg
Welocalize, UK
Katrin Marheinecke
text&form, Germany
Matteo Negri
Fondazione Bruno Kessler (FBK), Italy
Martin Popel
Charles University in Prague, Czech Republic
Jörg Porsiel
Volkswagen AG, Germany
Georg Rehm
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany
Rubén Rodriguez de la Fuente
PayPal, Spain
Lucia Specia
University of Sheffield, UK
Marco Turchi
Fondazione Bruno Kessler (FBK), Italy
Hans Uszkoreit
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany

Contact

Georg Rehm (DFKI, Germany), georg.rehm@dfki.de

LREC 2016

LREC 2016, the 10th International Conference on Language Resources and Evaluation, is organised by ELRA with the support of CNR-ILC and will be held at the Grand Hotel Bernardin Conference Center in Portorož, Slovenia, from 23 to 28 May 2016.

LREC 2016 Workshops and Tutorials will be held in the same location on 23, 24 and 28 May 2016.


 
