Genia event extraction (GE) task, 2013
"Linking text-mining and semantic web technology toward knowledge base construction"
- This year, the GE task explicitly aims at knowledge base construction, linking text mining and semantic web technology.
- Each participant's submission will be converted into a SPARQL endpoint and evaluated as a knowledge base.
- The coreference resolution (CO) task is integrated into the GE task this year.
- The BioNLP-ST GE task is a fine-grained, structured information extraction task on biomedical documents.
Domain
The GE task began in 2009 [ref], based on the Genia corpus and its event annotation [ref]. The Genia corpus is a collection of documents about NF-kB.
NF-kB is a protein complex that controls the transcription of DNA. ... It plays a key role in regulating the immune response to infection. ... Incorrect regulation of NF-kB has been linked to many diseases and disorders. [wikipedia/NF-kB]
We found the following animations helpful for getting an idea of the function of NF-kB:
- Classical and Alternative NF-kappaB Pathways
- Immune Response, Toll Like Receptors (TLR) Pathway - IMGENEX
Data
While the data sets of the GE task will be similar in form to those of previous GE tasks, the following modifications will be made:
- The coreference resolution task will be integrated into the GE task. Accordingly, the data set will include coreference annotation.
- The negation/speculation annotation will be re-evaluated and revised.
- New annotation will be produced for recent papers, so that recent knowledge can be extracted.
Please refer to the sample data in PubAnnotation.
Format and evaluation
Apart from the BioNLP-ST annotation format, the GE task data set is provided in three formats, as implemented in PubAnnotation:
- The JSON format is machine-friendly: standard JSON libraries can read and write it, so participants do not need to implement their own reader or writer for the data files.
- The RDF format enables connection to semantic web technology. Participants do not need to know the details of the format, as the conversion to RDF will be made automatically. In the end, each participant's submission will be stored in a SPARQL endpoint (knowledge base), and evaluation will measure how well each submission can answer biological questions, e.g., "show me the snippets talking about regulation of NF-kB phosphorylation" (expressed in SPARQL). Note 1: RDF conversion in PubAnnotation is experimental and will be provided for the BioNLP-ST GE task only. Note 2: The current conversion is preliminary and incomplete; we aim to complete it before the release of the training data.
- The table (TSV) format is intended for human reading. While it looks similar to the BioNLP-ST annotation format, it is re-designed to be more compatible with statements in RDF triples. For backward compatibility, an automatic conversion between the BioNLP-ST annotation format and the PubAnnotation table format will be provided.
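As a rough illustration of why a triple-oriented layout fits RDF, the following Python sketch flattens one JSON annotation into (subject, predicate, object) rows. It assumes PubAnnotation-style keys (`text`, `denotations` with `span.begin`/`span.end` and `obj`, and `relations` with `subj`/`pred`/`obj`) and relation labels such as `themeOf`; the actual GE 2013 schema may differ. The sample sentence, the protein name `ProtA`, the function `to_triples`, and the predicates `hasType`/`hasText` are all hypothetical.

```python
import json

# Made-up example document in an assumed PubAnnotation-style JSON layout;
# "ProtA" is a hypothetical protein name used only for illustration.
SAMPLE = """
{
  "text": "ProtA regulates phosphorylation of NF-kB.",
  "denotations": [
    {"id": "T1", "span": {"begin": 0, "end": 5}, "obj": "Protein"},
    {"id": "T2", "span": {"begin": 35, "end": 40}, "obj": "Protein"},
    {"id": "E1", "span": {"begin": 6, "end": 15}, "obj": "Regulation"},
    {"id": "E2", "span": {"begin": 16, "end": 31}, "obj": "Phosphorylation"}
  ],
  "relations": [
    {"id": "R1", "subj": "T2", "pred": "themeOf", "obj": "E2"},
    {"id": "R2", "subj": "E2", "pred": "themeOf", "obj": "E1"},
    {"id": "R3", "subj": "T1", "pred": "causeOf", "obj": "E1"}
  ]
}
"""


def to_triples(doc):
    """Flatten one annotated document into (subject, predicate, object) rows.

    The predicate names "hasType" and "hasText" are hypothetical labels,
    not part of any official schema.
    """
    text = doc["text"]
    triples = []
    for d in doc.get("denotations", []):
        begin, end = d["span"]["begin"], d["span"]["end"]
        triples.append((d["id"], "hasType", d["obj"]))
        triples.append((d["id"], "hasText", text[begin:end]))
    for r in doc.get("relations", []):
        # Relations are already subject-predicate-object statements.
        triples.append((r["subj"], r["pred"], r["obj"]))
    return triples


if __name__ == "__main__":
    for s, p, o in to_triples(json.loads(SAMPLE)):
        print(f"{s}\t{p}\t{o}")  # one tab-separated row per statement
```

In this sketch each relation maps directly to one row, while each denotation needs two (its type and its covered text); that one-statement-per-row property is what makes such a table easy to turn into RDF.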
Participants can choose either the JSON or the BioNLP-ST annotation format to obtain the data files and to submit their results.
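For participants working with the BioNLP-ST annotation format, the following Python sketch reads text-bound (`T`) and event (`E`) lines and then answers a question shaped like the example in the RDF bullet ("regulation of NF-kB phosphorylation") by following nested `Theme` arguments. It assumes the tab-separated standoff layout of earlier BioNLP-ST releases, skips all other line types (e.g. modifications and equivalences), does not handle discontinuous spans, and is not the official evaluation; the sample annotations and all function names are hypothetical.

```python
from dataclasses import dataclass, field

# GE event types whose Theme can itself be an event.
REGULATION_TYPES = {"Regulation", "Positive_regulation", "Negative_regulation"}


@dataclass
class TextBound:
    id: str
    type: str
    begin: int
    end: int
    text: str


@dataclass
class Event:
    id: str
    type: str
    trigger: str                              # id of the trigger text-bound annotation
    args: list = field(default_factory=list)  # (role, argument id) pairs


def parse_standoff(lines):
    """Parse T (text-bound) and E (event) lines; all other lines are skipped."""
    entities, events = {}, {}
    for line in lines:
        if "\t" not in line:
            continue
        ann_id, rest = line.rstrip("\n").split("\t", 1)
        if ann_id.startswith("T"):
            type_and_span, text = rest.split("\t", 1)
            ann_type, begin, end = type_and_span.split()  # contiguous spans only
            entities[ann_id] = TextBound(ann_id, ann_type, int(begin), int(end), text)
        elif ann_id.startswith("E"):
            head, *arg_strs = rest.split()
            ann_type, trigger = head.split(":", 1)
            args = [tuple(a.split(":", 1)) for a in arg_strs]
            events[ann_id] = Event(ann_id, ann_type, trigger, args)
    return entities, events


def theme_ids(event):
    """Ids of Theme-like arguments (Theme, Theme2, ...)."""
    return [arg for role, arg in event.args if role.startswith("Theme")]


def is_phosphorylation_of(event, entities, protein_text):
    """True if event is a Phosphorylation whose Theme is the given protein."""
    if event is None or event.type != "Phosphorylation":
        return False
    return any(t in entities and entities[t].text == protein_text
               for t in theme_ids(event))


def regulations_of_phosphorylation(entities, events, protein_text):
    """Ids of regulation events whose Theme is a Phosphorylation of protein_text."""
    return [ev.id for ev in events.values()
            if ev.type in REGULATION_TYPES
            and any(is_phosphorylation_of(events.get(t), entities, protein_text)
                    for t in theme_ids(ev))]


# Made-up annotations for "ProtA regulates phosphorylation of NF-kB."
SAMPLE_A1 = ["T1\tProtein 0 5\tProtA", "T2\tProtein 35 40\tNF-kB"]
SAMPLE_A2 = [
    "T3\tRegulation 6 15\tregulates",
    "T4\tPhosphorylation 16 31\tphosphorylation",
    "E1\tRegulation:T3 Theme:E2 Cause:T1",
    "E2\tPhosphorylation:T4 Theme:T2",
]

if __name__ == "__main__":
    entities, events = parse_standoff(SAMPLE_A1 + SAMPLE_A2)
    print(regulations_of_phosphorylation(entities, events, "NF-kB"))  # ['E1']
```

Following nested `Theme` links this way mirrors the nested event structure that makes such a question answerable at all; against the submitted knowledge bases, the equivalent lookup would presumably be expressed as a SPARQL query rather than in Python.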
For details of the evaluation, please refer to the Evaluation page.
Sample Data
- The sample annotation is available from PubAnnotation in three formats: table (TSV), JSON, and RDF.
- The same data in the BioNLP-ST annotation format is also available for download below.
- Although the representations differ, the content is the same in all formats, except for RDF, which is not yet complete.
Relevant Resources
Resources useful for performing the GE task will be provided through PubAnnotation. Currently, the following are available:
- The Genia coreference annotation.
- Training and development data sets of BioNLP-ST 2009.
Note: Batch downloading from PubAnnotation will become available soon.
Illustration of annotations for events and coreferences
The usefulness of coreference resolution for IE
References
- Extracting Bio-molecular events from literature - the BioNLP'09 Shared Task, Computational Intelligence, 2011, 27(4).
- The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011, BMC Bioinformatics, 2012, 13(Suppl 11).
- Corpus annotation for mining biomedical events from literature, BMC Bioinformatics, 2008, 9(10).
Organizers
- Jin-Dong Kim (DBCLS, task chair)
- Yue Wang (DBCLS, coreference sub-task)
- Sabine Bergler (UConcordia, negation sub-task)
- Roser Morante (UAntwerp, negation sub-task)
History
Genia event extraction task
- began in 2009 as the sole task of BioNLP-ST 2009.
- was the first community-wide effort for fine-grained, structured information extraction.
- evaluated the performance of IE from PubMed abstracts.
- was extended to include PMC full-text articles in BioNLP-ST 2011.
- was accompanied in 2011 by the protein coreference (CO) task, organized as a supporting task.
- is re-designed for 2013 to explicitly support knowledge base construction based on semantic web technology.