Available Data Sets

In early March 2017, we are still preparing the data sets and the exact configuration of the downstream applications; data and software releases to participants will be announced incrementally below as they become available.

March 13, 2017
‘Raw’ Parser Input Texts (Version 0.9)

Baseline Components

We have selected a ‘baseline’ stack of simple, yet state-of-the-art pre-processing tools for sentence splitting and tokenization, which will be made available to participants by early April. At the same time, we will distribute pre-processed variants of the parser inputs for candidate participants who would like to use these as additional points of reference.

Technical Infrastructure

The EPE 2017 task generalizes conventional notions of dependency representations somewhat and emphasizes a stand-off perspective rather than the token-centric representations frequently employed in parsing shared tasks. For these reasons, the task needs to define its own textual interchange format, one that can encode a broader range of morpho-syntactico-semantic analyses as dependency representations. This file format will be formally specified by March 27, 2017, and we will work with parser developers to create a collection of tools for converting from common file formats (for example CoNLL-U) that recover character stand-off pointers into the underlying text and accommodate generalized dependency graphs transcending rooted trees.
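Since the interchange format is not yet specified, the following is only a minimal sketch of the general idea behind recovering character stand-off pointers from a token-centric file: each CoNLL-U token's surface form is aligned, in order, against the raw input text. The function names `conllu_rows` and `recover_offsets` and the (word ID, start, end) output shape are hypothetical illustrations, not the EPE 2017 format or its conversion tools. The sketch assumes the tokenizer did not alter characters (for example by normalizing quotes) and that tokens are separated only by whitespace; words inside a multiword token share the span of its surface form.

```python
from typing import List, Optional, Tuple

# Hypothetical output shape (NOT the EPE 2017 interchange format):
# (CoNLL-U word ID, start, end), with `end` exclusive.
Span = Tuple[str, int, int]


def conllu_rows(conllu: str) -> List[Tuple[str, str]]:
    """Return (ID, FORM) pairs for word and multiword-token lines."""
    rows = []
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments carry no tokens
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "." in cols[0]:
            continue  # empty nodes (e.g. '8.1') have no surface string
        rows.append((cols[0], cols[1]))
    return rows


def recover_offsets(text: str, conllu: str) -> List[Span]:
    """Align CoNLL-U surface tokens with `text` to recover character spans.

    Assumes tokens occur in `text` in order, unmodified, separated only by
    whitespace; words inside a multiword token (e.g. '3-4') share its span.
    """
    spans: List[Span] = []
    cursor = 0
    mwt: Optional[Tuple[int, int, int]] = None  # (last word ID, start, end)
    for wid, form in conllu_rows(conllu):
        if mwt is not None and "-" not in wid:
            # Syntactic word inside a multiword token: reuse its surface span.
            last, start, end = mwt
            spans.append((wid, start, end))
            if int(wid) == last:
                mwt = None  # all words of this multiword token are done
            continue
        # Skip whitespace between tokens in the raw text.
        while cursor < len(text) and text[cursor].isspace():
            cursor += 1
        if not text.startswith(form, cursor):
            raise ValueError(f"cannot align {form!r} at offset {cursor}")
        start, end = cursor, cursor + len(form)
        cursor = end
        if "-" in wid:
            # Multiword-token range line: remember its span for its words.
            mwt = (int(wid.split("-")[1]), start, end)
        else:
            spans.append((wid, start, end))
    return spans


if __name__ == "__main__":
    rows = [
        "1 Il il PRON _ _ 2 nsubj _ _",
        "2 va aller VERB _ _ 0 root _ _",
        "3-4 au _ _ _ _ _ _ _ _",
        "3 à à ADP _ _ 5 case _ _",
        "4 le le DET _ _ 5 det _ _",
        "5 marché marché NOUN _ _ 2 obl _ SpaceAfter=No",
        "6 . . PUNCT _ _ 2 punct _ _",
    ]
    # CoNLL-U is tab-separated; build it from space-separated rows for brevity.
    conllu = "\n".join("\t".join(r.split(" ")) for r in rows)
    print(recover_offsets("Il va au marché.", conllu))
    # → [('1', 0, 2), ('2', 3, 5), ('3', 6, 8), ('4', 6, 8), ('5', 9, 15), ('6', 15, 16)]
```

Aligning against the raw text, rather than trusting the `SpaceAfter=No` annotations in the MISC column, keeps the offsets tied to the actual input string; the trade-off is that any character-level normalization by the tokenizer makes alignment fail loudly instead of silently producing wrong pointers.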

To enable participants to obtain empirical end-to-end results for the development data while preparing their system submissions, the downstream systems (including support for mostly automated re-training) will be provided to participants. Furthermore, the task organizers will aim to provide an automated upload, re-training, and evaluation interface, such that participants can obtain end-to-end feedback more easily (for at least some of the downstream systems).

