PubAnnotation
This page is being improved; we would appreciate any comments.
PubAnnotation supports four types of annotation for modeling any semantic information expressed in text.
Category annotations (catanns)
A category annotation in PubAnnotation consists of a span and a category, and identifies the span as a reference to an object of that category. A span is specified by a pair of character offsets, begin and end, delimited by a colon (':'). Semantically, a category annotation represents an entity, identified by the category, in the context surrounding the span. It is also known as a text-bound entity.
[Figure 1]
Id. | span | category | text |
---|---|---|---|
T1 | 193:200 | Gene_expression | produce |
T2 | 201:223 | Protein_family | inflammatory cytokines |
T3 | 232:237 | Protein | IFN-γ |
T4 | 242:245 | Protein | TNF |
In the above example, four entities are identified: gene expression, protein family, and two proteins.
Typical annotations of this type include text-bound term annotation, named entity annotation, and so on. However, any annotation that can be created by selecting a span and assigning a category can be classified as a category annotation.
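To make the span convention concrete, here is a minimal sketch of a category annotation as a Python dataclass. The class `CatAnn` and its methods are hypothetical, written for this page only; they are not part of any PubAnnotation library. The sketch assumes the end offset is exclusive, which is what the Figure 1 rows suggest: `193:200` covers the 7-character word 'produce'.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatAnn:
    """Hypothetical category annotation: an id, a 'begin:end' span, a category."""
    id: str
    span: str
    category: str

    def offsets(self) -> tuple[int, int]:
        # The span is two character offsets delimited by a colon, e.g. "193:200".
        begin, end = (int(x) for x in self.span.split(":"))
        if begin > end:
            raise ValueError(f"{self.id}: begin {begin} is after end {end}")
        return begin, end

    def covered_text(self, document: str) -> str:
        # Assumes an exclusive end offset, so document[begin:end] is the span text.
        begin, end = self.offsets()
        return document[begin:end]


# Figure 1, with the surface text kept next to each annotation for checking.
figure1 = [
    (CatAnn("T1", "193:200", "Gene_expression"), "produce"),
    (CatAnn("T2", "201:223", "Protein_family"), "inflammatory cytokines"),
    (CatAnn("T3", "232:237", "Protein"), "IFN-γ"),
    (CatAnn("T4", "242:245", "Protein"), "TNF"),
]

for ann, text in figure1:
    begin, end = ann.offsets()
    # With an exclusive end offset, the span length equals the text length.
    assert end - begin == len(text), ann.id
    print(f"{ann.id} {ann.category:16} {text!r} ({end - begin} chars)")

# The source sentence is not reproduced on this page, so pad a synthetic string.
toy = "x" * 193 + "produce"
print(figure1[0][0].covered_text(toy))  # produce
```

Because offsets count characters rather than bytes, 'IFN-γ' spans 5 positions even though 'γ' takes two bytes in UTF-8.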
Relation annotations (relanns)
A relation annotation consists of a relation type and the two objects it relates to each other: a subject and an object.
[Figure 2]
Id. | subject | relation | object |
---|---|---|---|
R1 | T2 | themeOf | T1 |
R2 | T2 | coreferenceOf | T3 |
R3 | T2 | coreferenceOf | T4 |
- T2 ('inflammatory cytokines') is related to T1 ('produce') by the relationship themeOf, and
- T2 is also related to T3 and T4 by the relationship coreferenceOf.
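A relation annotation only refers to other annotations by id, so reading it means resolving both ends. The following is a minimal sketch of that lookup for Figure 2; the `RelAnn` class and the `describe` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelAnn:
    """Hypothetical relation annotation: subject --relation--> object, by id."""
    id: str
    subject: str
    relation: str
    object: str


# Surface text of the Figure 1 category annotations, keyed by id.
catann_text = {
    "T1": "produce",
    "T2": "inflammatory cytokines",
    "T3": "IFN-γ",
    "T4": "TNF",
}

# Figure 2.
figure2 = [
    RelAnn("R1", "T2", "themeOf", "T1"),
    RelAnn("R2", "T2", "coreferenceOf", "T3"),
    RelAnn("R3", "T2", "coreferenceOf", "T4"),
]


def describe(rel: RelAnn, texts: dict[str, str]) -> str:
    # Both ends must name existing annotations; a KeyError flags a dangling id.
    subj, obj = texts[rel.subject], texts[rel.object]
    return f"{rel.subject} ({subj!r}) {rel.relation} {rel.object} ({obj!r})"


for rel in figure2:
    print(describe(rel, catann_text))
```

The first line printed is `T2 ('inflammatory cytokines') themeOf T1 ('produce')`, matching the first bullet above.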
Instance annotations (insanns)
An instance annotation may be used when a text-bound entity needs to be 'instantiated' into more than one object.
[Figure 3]
[Instance annotations]
Id. | type | object |
---|---|---|
E1 | instanceOf | T1 |
E2 | instanceOf | T1 |
[Relation annotations]
Id. | subject | relation | object |
---|---|---|---|
R4 | T3 | themeOf | E1 |
R5 | T4 | themeOf | E2 |
The two annotations differ as follows:
- in Figure 2, a single gene expression event, whose theme is 'inflammatory cytokines', is identified and annotated, whereas
- in Figure 3, two gene expression events are identified and annotated:
  - one is related to IFN-γ, and
  - the other is related to TNF.
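The difference can be checked by counting gene expression events under each modeling. In this minimal sketch an instance has no category of its own and inherits the category of the annotation it instantiates. The names `InsAnn` and `gene_expression_events` are hypothetical, and the sketch assumes that in Figure 3 each themeOf relation points at one instance (R4 at E1, R5 at E2), so that each event gets its own theme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InsAnn:
    """Hypothetical instance annotation: a new object that instantiates another one."""
    id: str
    type: str
    object: str


# Categories of the Figure 1 category annotations, keyed by id.
catann_category = {
    "T1": "Gene_expression",
    "T2": "Protein_family",
    "T3": "Protein",
    "T4": "Protein",
}

# Relations as (subject, relation, object) triples.
figure2_relations = [("T2", "themeOf", "T1")]

# Figure 3: T1 is instantiated twice (assumed: each theme targets one instance).
figure3_instances = [InsAnn("E1", "instanceOf", "T1"), InsAnn("E2", "instanceOf", "T1")]
figure3_relations = [("T3", "themeOf", "E1"), ("T4", "themeOf", "E2")]


def gene_expression_events(relations, instances=()):
    """Return sorted (event id, theme id) pairs for Gene_expression events."""
    # Instances inherit the category of the annotation they instantiate.
    inherited = {
        ins.id: catann_category[ins.object]
        for ins in instances
        if ins.type == "instanceOf"
    }
    category = {**catann_category, **inherited}
    return sorted(
        (obj, subj)
        for subj, rel, obj in relations
        if rel == "themeOf" and category.get(obj) == "Gene_expression"
    )


print(gene_expression_events(figure2_relations))
print(gene_expression_events(figure3_relations, figure3_instances))
```

Figure 2 yields one event, `[('T1', 'T2')]`, while Figure 3 yields two, `[('E1', 'T3'), ('E2', 'T4')]`.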
Another alternative is shown below:
[Figure 4]
In this example, two category annotations are created for the term 'produce'.
Semantically, the annotations in Figures 3 and 4 are almost the same.
One slight difference is that Figure 3 follows the natural order of annotation (term annotation first, then relation annotation), so the type of the instances (events) depends on the category annotation.
The annotations shown in Figures 2, 3, and 4 are all valid, and their semantics are broadly similar. Choosing among them is a matter of modeling, not a question of one being right and the others wrong.
PubAnnotation supports all of these alternatives, leaving the choice to the user.
Note that the BioNLP-ST GE task adopts the modeling in Figure 3.
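The near-equivalence of Figures 3 and 4 can be illustrated by reducing both to the same facts. The Figure 4 rows are not shown on this page, so this minimal sketch assumes that each of the two category annotations on 'produce' receives one themeOf relation, just as the instances do in Figure 3; the ids `T5` and `T6` and the helper `theme_facts` are hypothetical.

```python
# Category annotations: id -> (span, category). Both models share T3 and T4.
proteins = {"T3": ("232:237", "Protein"), "T4": ("242:245", "Protein")}

# Figure 3 modeling: one category annotation on 'produce', instantiated twice.
fig3_catanns = {"T1": ("193:200", "Gene_expression"), **proteins}
fig3_instances = {"E1": "T1", "E2": "T1"}  # instance id -> instantiated annotation
fig3_relations = [("T3", "themeOf", "E1"), ("T4", "themeOf", "E2")]

# Figure 4 modeling (hypothetical ids T5, T6): two category annotations on 'produce'.
fig4_catanns = {
    "T5": ("193:200", "Gene_expression"),
    "T6": ("193:200", "Gene_expression"),
    **proteins,
}
fig4_relations = [("T3", "themeOf", "T5"), ("T4", "themeOf", "T6")]


def theme_facts(catanns, relations, instances=None):
    """Return sorted (event span, event category, theme span) triples."""
    instances = instances or {}
    facts = []
    for subj, rel, obj in relations:
        if rel != "themeOf":
            continue
        # Figure 3 reaches the event category through the instance;
        # Figure 4 points at a category annotation directly.
        event = instances.get(obj, obj)
        span, category = catanns[event]
        facts.append((span, category, catanns[subj][0]))
    return sorted(facts)


print(theme_facts(fig3_catanns, fig3_relations, fig3_instances))
print(theme_facts(fig4_catanns, fig4_relations))
```

Both calls print `[('193:200', 'Gene_expression', '232:237'), ('193:200', 'Gene_expression', '242:245')]`. The only structural difference is the extra lookup through `fig3_instances`, which is where the dependency of the instance type on the category annotation shows up.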
Modification annotations (modanns)
A modification annotation is used to represent a relation or instantiation that is negated or speculated.
[Figure 5]
[Category annotations]
Id. | span | category | text |
---|---|---|---|
T25 | 1793:1798 | Protein | Runx3 |
T66 | 1806:1815 | Gene_expression | expressed |
T26 | 1793:1798 | Protein | CD4 |
[Instance annotations]
Id. | type | object |
---|---|---|
E1 | instanceOf | T66 |
[Relation annotations]
Id. | type | subject | object |
---|---|---|---|
R19 | themeOf | T25 | E1 |
[Modification annotations]
Id. | type | object |
---|---|---|
M3 | Negation | E1 |
In the example above, the event "gene expression of Runx3" is negated in the text. This is represented as a negation (M3) of the instance (E1) of the gene expression event (T66), which is related to the protein 'Runx3' (T25) by the relationship 'themeOf' (R19).
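The chain from the modification back to the text can be followed mechanically. This minimal sketch uses plain Python dicts as a hypothetical in-memory encoding, not PubAnnotation's own serialization format, and uses E1 as the instance id throughout, assuming it is the instance that R19 and M3 refer to and that it is an instance of T66. The helper `explain` is hypothetical.

```python
# Figure 5 as plain dicts keyed by annotation id.
catanns = {
    "T25": ("1793:1798", "Protein", "Runx3"),
    "T66": ("1806:1815", "Gene_expression", "expressed"),
}
instances = {"E1": "T66"}                        # E1 is an instance of T66
relations = {"R19": ("T25", "themeOf", "E1")}    # subject, relation, object
modifications = {"M3": ("Negation", "E1")}       # kind, modified object


def explain(mod_id: str) -> str:
    kind, target = modifications[mod_id]
    # The modification targets an instance; follow it to its category annotation.
    _span, category, word = catanns[instances[target]]
    # Collect the surface text of every themeOf subject attached to that instance.
    themes = [catanns[subj][2] for subj, rel, obj in relations.values()
              if rel == "themeOf" and obj == target]
    return f"{kind} of {category} {target} ({word!r}) with theme(s) {themes}"


print(explain("M3"))
```

This prints `Negation of Gene_expression E1 ('expressed') with theme(s) ['Runx3']`.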
Figure 6 shows an alternative approach:
[Figure 6]
Here, the protein 'Runx3' (T25) is directly related to the gene expression event (T66), and the relationship is negated (M3).
Either approach is possible, although each requires slightly different semantics for its components. Again, PubAnnotation is neutral with respect to the approach, leaving the choice to the user.
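In the Figure 6 approach there is no instance, and the modification targets a relation instead. The rows of Figure 6 are not shown on this page, so this minimal sketch assumes M3 points at the themeOf relation between T25 and T66 and reuses the id R19 for it purely for illustration; the helper `negated_relations` is hypothetical.

```python
# Hypothetical encoding of Figure 6: the relation itself is negated.
catanns = {"T25": "Runx3", "T66": "expressed"}   # id -> surface text
relations = {"R19": ("T25", "themeOf", "T66")}   # subject, relation, object
modifications = {"M3": ("Negation", "R19")}      # kind, modified object


def negated_relations(kind: str = "Negation") -> list[tuple[str, str, str]]:
    # Resolve every modification of the given kind whose target is a relation.
    out = []
    for mod_kind, target in modifications.values():
        if mod_kind == kind and target in relations:
            subj, rel, obj = relations[target]
            out.append((catanns[subj], rel, catanns[obj]))
    return out


print(negated_relations())  # [('Runx3', 'themeOf', 'expressed')]
```

One way to read the "slightly different semantics": in Figure 5 the negation scopes over the event (the instance), while here it scopes over the themeOf link between the protein and the expression mention.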