Registration of Intent

To make your interest in the EPE 2017 task known to the organizers, and to receive updates on data and infrastructure availability, please self-subscribe to the mailing list for (infrequent) EPE announcements.  The mailing list archives are publicly available.  We may ask for a mildly more formal registration of candidate participants in connection with the trial run in late April (see the task schedule and below; more information to come).

Access Information

The ‘raw’ parser inputs representing the training and development data for the various downstream applications have been available since mid-March. Please see the infrastructure overview for download links.

System Submissions

Participation in the shared task requires submitting parser outputs for the ‘raw’ texts that make up the evaluation data. To generalize over a broad variety of dependency representations and to provide a uniform interface to the various downstream applications, EPE 2017 defines its own interchange format for morpho-syntactico-semantic dependency graphs.  An example file (providing UD-like analyses for the development text from the negation application) demonstrates the required format for system submissions.  Unlike the venerable line of tab-separated (CoNLL-like) file formats, the EPE serialization of dependency representations is tokenization-agnostic: nodes can correspond to arbitrary, potentially overlapping or empty, sub-strings of the underlying document.  It also makes no hard-wired assumptions about the range of admissible annotations on nodes, naturally accommodates graphs that go beyond rooted trees (including different notions of ‘roots’ or top-level ‘heads’), and straightforwardly allows framework-specific extensions.

The EPE interchange format serializes a sequence of dependency graphs as a stream of JSON objects, following the newline-separated JSON Lines convention.  Each dependency graph has the top-level properties id (an integer) and nodes, the latter being an array of node objects. Each node, in turn, bears its own id, form (a string, the surface form), and start and end (integers, the character offsets delimiting the node's sub-string of the document).  Furthermore, nodes can have properties and edges, where the former is a JSON object representing an arbitrary attribute–value matrix, for example containing properties like pos, lemma, or more specific morpho-syntactic features.
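To make the structure described above concrete, the following is a minimal Python sketch that builds one graph and serializes it as JSON Lines, then reads it back. The helper names (make_node, dump_epe, load_epe) are hypothetical and not part of any EPE tooling, and the sketch assumes that end is exclusive (Python slice convention), which the format description above does not itself specify.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional


def make_node(node_id: int, form: str, start: int, end: int,
              properties: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build one node with the obligatory id, form, start and end fields."""
    node: Dict[str, Any] = {"id": node_id, "form": form,
                            "start": start, "end": end}
    if properties:
        # Open-ended attribute-value matrix (pos, lemma, features, ...).
        node["properties"] = properties
    return node


def dump_epe(graphs: Iterable[Dict[str, Any]]) -> str:
    """Serialize graphs as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(g, ensure_ascii=False) + "\n" for g in graphs)


def load_epe(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse one graph per non-blank line (e.g. the lines of an open file)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


# Assumption: start/end are offsets into this text, with end exclusive.
text = "Dogs bark."
graph = {"id": 1, "nodes": [
    make_node(1, "Dogs", 0, 4, {"pos": "NNS", "lemma": "dog"}),
    make_node(2, "bark", 5, 9, {"pos": "VBP", "lemma": "bark"}),
    make_node(3, ".", 9, 10, {"pos": ".", "lemma": "."}),
]}
serialized = dump_epe([graph])
```

Writing to disk is then just a matter of opening a file in text mode with UTF-8 encoding and writing the serialized string; reading back can pass the open file object directly to load_epe, since iterating over a file yields its lines.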

Graph structure in the EPE interchange format is encoded through the edges property on nodes, whose value is an array of edge objects, each with at least the following properties: label (a string, the dependency type) and target (an integer, the target node).  Thus, edges in the EPE encoding are directed from the head (or predicate) to the dependent (or argument).  Unlike nodes, edges carry no meaningful ordering information, i.e. the value of the edges property is interpreted as a set.  At the same time, encoding each edge as its own JSON object makes framework-specific extensions possible; for example, a future UD parser could output an additional boolean basic property, to distinguish so-called ‘basic’ and ‘enhanced’ dependencies.
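As an illustration of the edge direction and set semantics described above, here is a hedged Python sketch that converts a hypothetical CoNLL-like head-indexed analysis into EPE node objects. The row layout (1-based ids, head 0 for the root) and the function names rows_to_nodes and edge_set are assumptions made for this example only; in particular, the sketch simply drops the root's own dependency label, since the root has no incoming edge to carry it.

```python
from typing import Any, Dict, FrozenSet, List, Tuple

# Hypothetical CoNLL-like token row: (id, form, head, deprel),
# with 1-based ids and head 0 marking the root.
Row = Tuple[int, str, int, str]


def rows_to_nodes(rows: List[Row],
                  offsets: List[Tuple[int, int]]) -> List[Dict[str, Any]]:
    """Turn head-indexed rows into EPE node objects carrying outgoing edges."""
    nodes: Dict[int, Dict[str, Any]] = {}
    for (tid, form, _head, _label), (start, end) in zip(rows, offsets):
        nodes[tid] = {"id": tid, "form": form, "start": start, "end": end}
    for tid, _form, head, label in rows:
        if head == 0:
            continue  # the root receives no incoming edge in this sketch
        # The edge is stored on the head and points at the dependent.
        nodes[head].setdefault("edges", []).append(
            {"label": label, "target": tid})
    return [nodes[tid] for tid in sorted(nodes)]


def edge_set(node: Dict[str, Any]) -> FrozenSet[Tuple[str, int]]:
    """Edges form a set: compare them without regard to list order."""
    return frozenset((e["label"], e["target"]) for e in node.get("edges", []))


# "Dogs bark loudly." with (start, end) character offsets per token.
rows = [(1, "Dogs", 2, "nsubj"), (2, "bark", 0, "root"),
        (3, "loudly", 2, "advmod"), (4, ".", 2, "punct")]
nodes = rows_to_nodes(rows, [(0, 4), (5, 9), (10, 16), (16, 17)])
```

Comparing edges through edge_set rather than through the raw arrays reflects the point that two outputs listing the same edges in a different order encode the same graph.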

Finally, adopting the terminology of Kuhlmann & Oepen (2016), the EPE interchange format supports the optional designation of one or more ‘top’ nodes.  In classic syntactic dependency trees, these would correspond to a (unique and obligatory) root, while in the SDP semantic dependencies, for example, top nodes correspond to a semantic head or highest-scoping predicate and can have incoming edges.  In the JSON encoding, nodes can bear a boolean top property (where absence of the property is considered equivalent to a false value).
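The following Python sketch shows one way a consumer might read the optional top property, treating its absence as false. The function names are hypothetical, and the second graph is a small constructed example (not an actual SDP analysis) in which two nodes are tops and one of them also has an incoming edge, which the format permits.

```python
from typing import Any, Dict, List, Set

Graph = Dict[str, Any]


def top_nodes(graph: Graph) -> List[int]:
    """Ids of nodes flagged as top; a missing 'top' property counts as False."""
    return [node["id"] for node in graph["nodes"] if node.get("top", False)]


def nodes_with_incoming(graph: Graph) -> Set[int]:
    """Ids of all nodes that are the target of at least one edge."""
    return {edge["target"]
            for node in graph["nodes"] for edge in node.get("edges", [])}


def has_tree_style_top(graph: Graph) -> bool:
    """True if there is exactly one top node and it has no incoming edges,
    as for the root of a classic syntactic dependency tree."""
    tops = top_nodes(graph)
    return len(tops) == 1 and tops[0] not in nodes_with_incoming(graph)


tree = {"id": 1, "nodes": [
    {"id": 1, "form": "Dogs", "start": 0, "end": 4},
    {"id": 2, "form": "bark", "start": 5, "end": 9, "top": True,
     "edges": [{"label": "nsubj", "target": 1}]},
]}

# Constructed example: two tops, one of which (node 1) has an incoming edge.
graph_with_tops = {"id": 2, "nodes": [
    {"id": 1, "form": "Dogs", "start": 0, "end": 4, "top": True},
    {"id": 2, "form": "bark", "start": 5, "end": 9, "top": True,
     "edges": [{"label": "ARG1", "target": 1}]},
]}
```

Note that has_tree_style_top only checks the top designation; it does not verify that the whole graph is a tree.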

Pre-Evaluation Trial Run

To generalize the downstream applications to work with different types of dependency representations, the task co-organizers depend on the availability of the broadest possible range of parser outputs that are compatible with the EPE definition of dependency representations and packaged in the EPE interchange format.  To initiate a working relationship with parser developers and facilitate mutual feedback, the task schedule foresees a ‘trial run’ period in early to mid-April.  Candidate participants are asked to run the training and development data for all downstream applications (in total, about half a million whitespace-separated tokens) through their parsers, serialize the parsing results in the EPE interchange format, and make the parser outputs available to the task organizers.  In particular, please (a) parse all the ‘.txt’ files in the ‘training/’ and ‘development/’ sub-directories of our most recent parser input package (version 1.2); (b) put the parser outputs into a parallel directory structure, using parallel file names but replacing the ‘.txt’ suffix with ‘.epe’; and (c) package all the EPE files into a compressed archive and email us a download link.  Where parsers support interestingly different dependency outputs (e.g. propagating dependencies into coordinate structures, or pushing some lexical information directly onto dependency edges), multiple submissions will be very welcome.
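Steps (a) to (c) can be sketched in Python roughly as follows. This is only an outline under stated assumptions: parse stands for a hypothetical callable wrapping your own parser (raw text in, EPE JSON Lines out), the directory names in the usage are placeholders, and the choice of a gzip-compressed tar archive is ours, since any compressed archive should do.

```python
import tarfile
from pathlib import Path
from typing import Callable, List


def epe_path(txt: Path, in_root: Path, out_root: Path) -> Path:
    """Map in_root/<split>/.../name.txt to out_root/<split>/.../name.epe."""
    return (out_root / txt.relative_to(in_root)).with_suffix(".epe")


def run_all(in_root: Path, out_root: Path,
            parse: Callable[[str], str]) -> List[Path]:
    """Parse every .txt file under training/ and development/.

    `parse` is a hypothetical stand-in for your parser: it receives the raw
    text of one file and returns its EPE serialization (JSON Lines).
    """
    written: List[Path] = []
    for split in ("training", "development"):
        for txt in sorted((in_root / split).rglob("*.txt")):
            target = epe_path(txt, in_root, out_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(parse(txt.read_text(encoding="utf-8")),
                              encoding="utf-8")
            written.append(target)
    return written


def package(out_root: Path, archive: Path) -> None:
    """Bundle the parallel output tree into a gzip-compressed tar archive."""
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(out_root), arcname=out_root.name)
```

A typical invocation might call run_all on the unpacked input package and a fresh output directory, then package that output directory into something like an outputs.tgz file before uploading it and emailing the download link.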

Last updated: 2017-04-13 (23:04)