
LREC 2016 Workshop:
Translation Evaluation:
From Fragmented Tools and Data Sets
to an Integrated Ecosystem
24 May 2016, Portorož, Slovenia
About this Workshop
Current MT/HT evaluation approaches, both automatic and manual, are characterised by a high degree of fragmentation and heterogeneity and by a lack of interoperability between tools and data sets. As a consequence, it is difficult to reproduce, interpret, and compare evaluation results. The main objective of this workshop is to bring together researchers working on MT and HT evaluation, providers and users of evaluation tools and approaches (metrics and methodologies), and practitioners (translators, users of MT, LSPs, etc.). Topics of interest include but are not limited to:
- MT/HT evaluation methodologies (incl. scoring mechanisms, integrated metrics)
- Benchmarks for MT evaluation
- Data and annotation formats for the evaluation of MT/HT
- Workbenches, tools, technologies for the evaluation of MT/HT (incl. specialised workflows)
- Integration of MT/TM and terminology in industrial evaluation scenarios
- Evaluation ecosystems
- Annotation concepts such as MQM, DQF and their implementation in MT evaluation processes
Call for papers
This workshop will take an in-depth look at an area of ever-increasing importance: approaches, tools and data support for the evaluation of human translation (HT) and machine translation (MT), with a focus on MT. Two clear trends have emerged over the past several years. The first is the standardisation of evaluation in research through large shared tasks in which actual translations are compared to reference translations using automatic metrics and/or human ranking. The second focuses on achieving high-quality translations with the help of increasingly complex data sets that contain many levels of annotation based on sophisticated quality metrics, often organised in the context of smaller shared tasks. In industry, we also observe an increased interest in workflows for high-quality outbound translation that combine translation memory (TM), MT and post-editing. In stark contrast to this trend towards quality translation (QT) and the complexity it entails, the data and tooling landscapes remain heterogeneous, uncoordinated and not interoperable.
The event will bring together MT and HT researchers, users and providers of tools, and users and providers of the manual and automatic evaluation methodologies currently used to evaluate HT and MT systems. The key objective of the workshop is to initiate a dialogue and to discuss whether the current approach, which relies on a diverse and heterogeneous set of data, tools and evaluation methodologies, is adequate, or whether the community should instead collaborate on building an integrated ecosystem that provides better and more sustainable access to data sets, evaluation workflows, approaches and metrics, and supporting processes such as annotation and ranking.
The workshop is meant to stimulate a dialogue about the commonalities, similarities and differences of the existing solutions in three areas: (1) tools, (2) methodologies and (3) data sets. A key question concerns the trade-off between heterogeneous approaches, which offer high flexibility but little interoperability, and a homogeneous approach, which would provide less flexibility but higher interoperability. How much flexibility and interoperability does the MT/HT research community need? How much does it want?
Another question involves the requirements for establishing evaluation methods as viable tools. Automatic metrics are used because they are fast and correlate sufficiently well with human judgement. The manual metric used in WMT (5-way ranking) has been refined to obtain statistically significant results in a reliable and efficient way. However, concerns remain about the consistency of and agreement between judgements, the time needed to carry them out, and the reliability of sampling methods. Other methods (post-editing impact, understandability tests, (linguistic) error analyses) require even more effort to establish their viability. For example, post-editing speed is arguably an intuitive evaluation metric, but inter-translator variability is too high to make practical use of it in a straightforward way.
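To make these considerations concrete, the short Python sketch below (an illustration only, not part of the workshop materials; all scores and timings are invented) shows how the segment-level correlation between an automatic metric and human judgements, and the inter-translator spread in post-editing times, might be quantified.

from statistics import mean, stdev

# Hypothetical segment-level scores from an automatic metric (0-1 scale).
metric_scores = [0.61, 0.74, 0.52, 0.88, 0.45, 0.69, 0.80, 0.57]
# Hypothetical human adequacy judgements for the same segments (1-5 scale).
human_scores = [3.0, 4.0, 2.5, 4.5, 2.0, 3.5, 4.0, 3.0]

def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"Metric-human correlation: {pearson(metric_scores, human_scores):.3f}")

# Hypothetical post-editing times (seconds per segment) for three translators
# working on the same five segments; the spread across translators illustrates
# why raw post-editing speed is hard to use directly as an evaluation metric.
pe_times = {
    "translator_A": [42, 55, 38, 61, 47],
    "translator_B": [75, 90, 70, 98, 82],
    "translator_C": [51, 63, 49, 72, 58],
}
for name, times in pe_times.items():
    print(f"{name}: mean = {mean(times):.1f} s, stdev = {stdev(times):.1f} s")

In shared tasks such as the WMT metrics task, correlations of this kind are computed over much larger data sets and complemented by significance testing; the sketch only illustrates the basic computation.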
We invite contributions on the topics mentioned above and any related topics of interest.
Important dates
- Publication of the CFP: 10 December 2015
- Submissions due: 18 February 2016, 12:00 GMT+1 (extended from 15 February 2016)
- Notification of acceptance: 1 March 2016
- Final version of accepted papers: 31 March 2016
- Final programme and online proceedings: 15 April 2016
- Workshop: 24 May 2016 (this event will be a full-day workshop)
Submission
Please submit your papers at https://www.softconf.com/lrec2016/MTEVAL/ before the extended deadline of 18 February 2016, 12:00 GMT+1 (originally 15 February 2016). Accepted papers will be presented as oral presentations or as posters. All accepted papers will be published in the workshop proceedings.
Papers should be formatted according to the stylesheet soon to be provided on the LREC 2016 website and should not exceed 8 pages, including references and appendices. Papers should be submitted in PDF format through the URL mentioned above.
When submitting a paper, authors will be asked to provide essential information about the resources (in a broad sense, i.e., also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or that are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments (including evaluation experiments).
Programme
- Diagnosing High-Quality Statistical Machine Translation Using Traces of Post-Edition Operations
- A Pilot Eye-Tracking Study of WMT-Style Ranking Evaluation
- CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units (short presentation)
- HuQ: An English-Hungarian Corpus for Quality Estimation (short presentation)
- Ten Years of WMT Evaluation Campaigns: Lessons Learnt
- The IWSLT Evaluation Campaign: Challenges, Achievements, Future Directions
- Can Quality Metrics Become the Drivers of Machine Translation Uptake? An Industry Perspective
- Technology Landscape for Quality Evaluation: Combining the Needs of Research and Industry
- Interoperability in MT Quality Estimation or wrapping useful stuff in various ways
- Using MT-ComparEval
- CMT: Predictive Machine Translation Quality Evaluation Metric
- Towards a Systematic and Human-Informed Paradigm for High-Quality Machine Translation
Proceedings
“Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem”
24 May 2016 – Portorož, Slovenia
Edited by Georg Rehm, Aljoscha Burchardt, Ondrej Bojar, Christian Dugast, Marcello Federico, Josef van Genabith, Barry Haddow, Jan Hajic, Kim Harris, Philipp Koehn, Matteo Negri, Martin Popel, Lucia Specia, Marco Turchi, Hans Uszkoreit
Acknowledgments: This work has received funding from the EU’s Horizon 2020 research and innovation programme through the contracts CRACKER (grant agreement no. 645357) and QT21 (grant agreement no. 645452).
Organising Committee
This workshop is an initiative jointly organised by the EU projects CRACKER and QT21 with the support of the Cracking the Language Barrier federation.
Programme Committee
Contact
LREC 2016
LREC 2016, the 10th International Conference on Language Resources and Evaluation, is organised by ELRA with the support of CNR-ILC and will be held at the Grand Hotel Bernardin Conference Center in Portorož, Slovenia, from 23 to 28 May 2016.
LREC 2016 Workshops and Tutorials will be held in the same location on 23, 24 and 28 May 2016.