Genia event extraction (GE) task, 2013


"Linking text-mining and semantic web technology toward knowledge base construction"

  • This year's GE task explicitly aims to construct a knowledge base, linking text mining and semantic web technology.
    • Each participant's submission will be converted to RDF, stored in a SPARQL endpoint, and evaluated as a knowledge base.
  • The coreference resolution (CO) task is integrated into the GE task this year.

Domain

The GE task began in 2009 [ref], based on the Genia corpus and its event annotation [ref]. The Genia corpus is a collection of documents about NF-kB.

NF-kB is a protein complex that controls the transcription of DNA. ... It plays a key role in regulating the immune response to infection. ... Incorrect regulation of NF-kB has been linked to many diseases and disorders. [wikipedia/NF-kB]

We found the following animations helpful for getting an idea of the function of NF-kB:

Data

Although the GE task data sets will resemble those of previous GE tasks, the following modifications will be made:

  • The coreference resolution task will be integrated into the GE task. Accordingly, the data set will include coreference annotation.
  • The negation/speculation annotation will be re-evaluated and revised.
  • New annotations will be produced for recent papers, so that recent knowledge can be extracted.

Please refer to the sample data in PubAnnotation.

Format and evaluation

Apart from the BioNLP-ST annotation format, the GE task data set is provided in three formats as implemented in PubAnnotation.

  • The JSON format is machine-friendly, so participants do not need to implement their own reader or writer for the data files.
  • The RDF format enables connection to semantic web technology. Participants do not need to know the details of the format, as the conversion to RDF is performed automatically. In the end, each participant's submission will be stored in a SPARQL endpoint (knowledge base), and the evaluation will measure how well each submission can answer biological questions expressed in SPARQL, e.g., "show me the snippets talking about regulation of NF-kB phosphorylation".
    • Note 1: RDF conversion in PubAnnotation is experimental and will be provided for the BioNLP-ST GE task only.
    • Note 2: The current conversion is preliminary and incomplete. We aim to complete it before the release of the training data.
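
To make the knowledge-base style of evaluation more concrete, here is a minimal sketch in Python. It stores a few event annotations as subject-predicate-object triples and answers the example question with a nested pattern match, which is roughly what a SPARQL query would do against the endpoint. The predicate names (`type`, `text`, `theme`, `snippet`), the identifiers, the snippets, and the helper `regulation_of_phosphorylation` are all hypothetical; the RDF vocabulary actually produced by PubAnnotation is not described here and may look quite different.

```python
# Toy in-memory triple store. Every identifier, predicate, and snippet below
# is an illustrative assumption, not the actual PubAnnotation RDF vocabulary.
TRIPLES = {
    ("T1", "type", "Protein"),
    ("T1", "text", "NF-kB"),
    ("E1", "type", "Phosphorylation"),
    ("E1", "theme", "T1"),
    ("E1", "snippet", "phosphorylation of NF-kB"),
    ("E2", "type", "Regulation"),
    ("E2", "theme", "E1"),
    ("E2", "snippet", "regulates phosphorylation of NF-kB"),
    ("E3", "type", "Regulation"),
    ("E3", "theme", "T1"),
    ("E3", "snippet", "regulates NF-kB"),
}


def objects(subject, predicate):
    """Return every o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """Return every s such that (s, predicate, obj) is in the store."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def regulation_of_phosphorylation(protein_name):
    """Snippets of Regulation events whose theme is a Phosphorylation
    event whose own theme is a protein with the given name."""
    proteins = subjects("text", protein_name) & subjects("type", "Protein")
    snippets = []
    for reg in sorted(subjects("type", "Regulation")):
        for phos in objects(reg, "theme"):
            if "Phosphorylation" not in objects(phos, "type"):
                continue  # the theme is not a phosphorylation event
            if objects(phos, "theme") & proteins:
                snippets.extend(sorted(objects(reg, "snippet")))
    return snippets


if __name__ == "__main__":
    for snippet in regulation_of_phosphorylation("NF-kB"):
        print(snippet)
```

In this toy store, only event E2 matches: E3 regulates the protein directly rather than its phosphorylation, so it is skipped. A submission that fails to extract the nested event structure would therefore fail to answer the question, which is the kind of behaviour a knowledge-base evaluation is meant to expose.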

Participants can choose either the JSON or the BioNLP-ST annotation format to obtain the data files and to submit their results.
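
As a rough illustration of why a JSON representation saves participants from writing a custom reader, the sketch below loads an annotation document with Python's standard `json` module and recovers the annotated text spans. The document itself and its key names (`text`, `denotations`, `span`, `begin`, `end`, `obj`) are assumptions made for this example; please check the sample data for the actual layout.

```python
import json

# Hypothetical annotation document. The key names are assumptions made for
# illustration and may not match the JSON actually served by PubAnnotation.
SAMPLE = """
{
  "text": "NF-kB phosphorylation was inhibited.",
  "denotations": [
    {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
    {"id": "T2", "span": {"begin": 6, "end": 21}, "obj": "Phosphorylation"}
  ]
}
"""


def read_spans(json_text):
    """Return (id, type, covered text) for each annotated span."""
    doc = json.loads(json_text)
    text = doc["text"]
    spans = []
    for d in doc["denotations"]:
        begin, end = d["span"]["begin"], d["span"]["end"]
        spans.append((d["id"], d["obj"], text[begin:end]))
    return spans


if __name__ == "__main__":
    for ident, label, covered in read_spans(SAMPLE):
        print(ident, label, covered)
```

Writing a submission would be the reverse step: build the same kind of dictionary and serialise it with `json.dumps`.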

For details of the evaluation, please refer to the Evaluation page.

Sample Data

  • The sample annotation is available from PubAnnotation in three formats: table (TSV), JSON, and RDF.
  • The same data in BioNLP-ST annotation format is also available for download below.
  • Although the representation differs, the content is the same in all formats, except RDF, whose conversion is not yet complete.
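
For participants who prefer the BioNLP-ST annotation format, the following sketch shows one way to read entity (T) and event (E) lines. It assumes the tab-separated standoff layout used in earlier BioNLP-ST GE tasks; the three sample lines are made up for illustration, other line types (such as modifications) are simply skipped, and the function name `parse_standoff` is our own.

```python
def parse_standoff(lines):
    """Parse text-bound (T) and event (E) standoff lines.

    Assumed layouts:
      T<n> TAB <type> <begin> <end> TAB <text>
      E<n> TAB <type>:<trigger id> <role>:<argument id> ...
    Any other line type is ignored in this sketch.
    """
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ident, body = line.split("\t", 1)
        if ident.startswith("T"):
            head, text = body.split("\t", 1)
            etype, begin, end = head.split(" ")
            entities[ident] = (etype, int(begin), int(end), text)
        elif ident.startswith("E"):
            parts = body.split(" ")
            etype, trigger = parts[0].split(":")
            args = [tuple(p.split(":")) for p in parts[1:]]
            events[ident] = (etype, trigger, args)
    return entities, events


# Made-up sample lines, not taken from the task data.
SAMPLE_LINES = [
    "T1\tProtein 0 5\tNF-kB",
    "T2\tPhosphorylation 6 21\tphosphorylation",
    "E1\tPhosphorylation:T2 Theme:T1",
]

if __name__ == "__main__":
    entities, events = parse_standoff(SAMPLE_LINES)
    print(entities)
    print(events)
```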

Relevant Resources

Resources useful for performing the GE task will be provided through PubAnnotation.
Currently, the following are available:

Note: Batch downloading from PubAnnotation will become available soon.

Illustration of annotations for events and coreferences

The usefulness of coreference resolution for IE

References

Organizers

  • Jin-Dong Kim (DBCLS, task chair)
  • Yue Wang (DBCLS, coreference sub-task)
  • Sabine Bergler (UConcordia, negation sub-task)
  • Roser Morante (UAntwerp, negation sub-task)

History

Genia event extraction task

  • began in 2009 as the sole task of BioNLP-ST 2009.
    • was the first community-wide effort for fine-grained, structural information extraction.
    • evaluated the performance of IE from PubMed abstracts.
  • was extended to include PMC full text articles in BioNLP-ST 2011.
    • the protein coreference task (CO) was also organized in 2011 as a supporting task.
  • has been redesigned for 2013 to explicitly support knowledge base construction, based on semantic web technology.