Available Resources

Since early March 2017, we have been preparing the data sets and the exact configurations of the downstream applications; data and software releases to participants are announced incrementally below as they become available.

April 18, 2017
Corrected Preprocessed Inputs (Version 1.3)
April 13, 2017
‘Raw’ and Preprocessed Inputs (Version 1.2)
April 9, 2017
Trial Version of Converter and Preprocessor
‘Raw’ Parser Input Texts (Version 1.1)
March 28, 2017
‘Raw’ Parser Input Texts (Version 1.0)
March 23, 2017
EPE Interchange Format Example
March 13, 2017
‘Raw’ Parser Input Texts (Version 0.9)

Technical Infrastructure

The EPE 2017 task generalizes conventional notions of dependency representations somewhat and emphasizes a stand-off perspective rather than token-centric representations (as have frequently been employed for parsing shared tasks). For these reasons, the task needs to define its own textual interchange format to cover a broader range of morpho-syntactico-semantic analyses expressed as dependency representations. This file format was formally specified in late March 2017, and we will work with parser developers to create a collection of conversion tools from other common formats (for example CoNLL-U), both to recover character stand-off pointers into the underlying text and to accommodate generalized dependency graphs transcending rooted trees.

A trial version of such a converter was released on April 9, 2017, and is available in binary form for 64-bit x86 Linux environments. The EPE sample file, for example, was produced by converting the native UDPipe parser output for the negation development data to the EPE interchange format as follows:

  ./logon/bin/epe --convert --raw negation/development/raw.txt \
    negation/development/udpipe.conllu /tmp/sample.epe
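
To illustrate what recovering character stand-off pointers involves, the following is a minimal sketch in Python and not the converter's actual implementation. It assumes that each token form occurs verbatim in the raw text and in the original order; the function name `align_tokens` and the sample sentence are hypothetical. A parser that normalizes token forms (for example, by rewriting quote characters) would need a more tolerant matching strategy.

```python
from typing import List, Tuple


def align_tokens(raw: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Return a (start, end) character span in `raw` for each token form.

    Assumption: every form appears verbatim in `raw`, in order; whitespace
    between tokens is skipped implicitly by searching forward from the
    end of the previous token.
    """
    spans = []
    cursor = 0
    for form in tokens:
        start = raw.find(form, cursor)
        if start < 0:
            raise ValueError(f"token {form!r} not found after offset {cursor}")
        end = start + len(form)
        spans.append((start, end))
        cursor = end
    return spans


if __name__ == "__main__":
    # Hypothetical example: the tokenizer has split "didn't" into two tokens.
    raw = "He didn't  leave."
    tokens = ["He", "did", "n't", "leave", "."]
    for form, (start, end) in zip(tokens, align_tokens(raw, tokens)):
        print(form, start, end, raw[start:end])
```

Because the spans index into the unmodified text, they stay valid regardless of how a parser chose to segment the input, which is the point of the stand-off perspective.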

To enable participants to obtain empirical end-to-end results for the development data while preparing their system submissions, the downstream systems (including support for mostly automated re-training) will be provided to participants. Furthermore, the task organizers will try to establish an automated upload, re-training, and evaluation interface, such that participants can obtain end-to-end feedback more easily (for at least some of the downstream systems).

Baseline Components

We have selected a ‘baseline’ stack of simple, yet state-of-the-art pre-processing tools for sentence splitting, tokenization, part-of-speech tagging, and lemmatization (Velldal et al. 2012; pp. 370–372). These have been available to participants since early April as part of the trial release of the format converter and text preprocessor. For example, one might use the following command to prepare the ‘raw’ development text for the negation analysis downstream application for parsing with a system that expects tokenized and morphologically analyzed inputs:

  ./logon/bin/epe --prepare negation/development/raw.txt /tmp/sample.tt

Starting with version 1.2 of the EPE 2017 parser inputs, these automatically pre-processed variants of the ‘raw’ texts are included in the data package. Candidate participants are welcome either to start from the ‘raw’ running texts or to use any part or all of the segmentation and morphological information provided in the pre-processed files.

Last updated: 2017-04-18 (09:04)