Since the release of event-annotated corpora [1, 2], and owing to the BioNLP shared tasks held in 2009 and 2011, many event extraction tools for biological literature have become publicly available. While each tool is useful on its own, several obstacles prevent non-expert users from finding and using the best tools for their specific problems. First, such tools are not easy to use, especially when they need to be customized, e.g. to work with a particular named entity recognizer. Second, each tool has its own user interface, and it is often time-consuming to become accustomed to the various tools, especially when multiple systems need to be tested, e.g. for comparison. Interoperability and accessibility are therefore crucial to improving usability.
A similar situation arose with the BioCreative challenge [5, 6] and the BioCreative MetaServer. BioCreative has been particularly concerned with extracting protein-protein interactions (PPIs). In the BioCreative II.5 challenge [5, 6], participants provided PPI extraction tools as web services through the BioCreative MetaServer. By providing a unified interface to the input and output of the various PPI extraction tools, the MetaServer enabled easy access to those tools and demonstrated the need for a meta-level service that unifies access to multiple information systems. In the BioNLP '09 shared task on event extraction, participants presented tools that extract biological events with richer and more fine-grained information than in the BioCreative challenges. However, the shared task only required participants to submit static files of processed data for a given corpus; the event extraction tools themselves were not made available. To resolve this issue, our event extraction meta-service provides interactive event extraction services in the fine-grained BioNLP shared task style.
Roughly speaking, the goal of the BioNLP shared task is to extract biological events from literature, given the raw text and protein annotations. The shared task defines three file formats for this purpose: "txt", "a1" and "a2". A "txt" file contains the raw text of a biomedical paper, while the corresponding "a1" file contains the boundaries of the protein named entities annotated in that text. Participants were required to submit "a2" files, which define the extracted events and may refer to the protein annotations in the corresponding "a1" files. In the evaluation, the submitted "a2" files were compared with gold-standard "a2" files manually annotated by human curators.
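To make the formats concrete, the following Python sketch reads the two most common line types of the "a1"/"a2" standoff format: text-bound annotations ("T" lines: an id, then the type and character offsets, then the covered text, separated by tabs) and event annotations ("E" lines: an id, then `Type:TriggerId` followed by `Role:ArgumentId` pairs). It is a minimal illustration under simplifying assumptions: the class and function names are hypothetical, only contiguous spans are handled, and other line types, such as modification ("M") lines, are skipped.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """A text-bound annotation ("T" line): a protein or an event trigger."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event ("E" line): a typed trigger plus (role, argument id) pairs."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)


def parse_standoff(lines):
    """Parse T and E lines of a1/a2 content; other line types are ignored."""
    entities, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # Assumes a contiguous span: "Type start end"
            type_, start, end = fields[1].split()
            entities[ann_id] = TextBound(ann_id, type_, int(start), int(end), fields[2])
        elif ann_id.startswith("E"):
            # "Type:TriggerId Role:ArgId Role:ArgId ..."
            parts = fields[1].split()
            type_, trigger = parts[0].split(":")
            args = [tuple(p.split(":")) for p in parts[1:]]
            events[ann_id] = Event(ann_id, type_, trigger, args)
    return entities, events


# Example: a protein from the a1 file and one event from the a2 file.
text = "IL-2 expression"
a1 = "T1\tProtein 0 4\tIL-2\n"
a2 = "T2\tGene_expression 5 15\texpression\nE1\tGene_expression:T2 Theme:T1\n"
proteins, _ = parse_standoff(a1.splitlines())
triggers, events = parse_standoff(a2.splitlines())
entities = {**proteins, **triggers}
for ent in entities.values():
    # Offsets are character positions in the corresponding "txt" file.
    assert text[ent.start:ent.end] == ent.text
```

Because "a2" files refer to protein ids defined in the matching "a1" file, the sketch parses both and merges the two entity dictionaries before events are interpreted.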
Our services are interoperable with other UIMA/U-Compare services, which allows users to create customized workflows easily. UIMA (Unstructured Information Management Architecture) is a general interoperability framework for unstructured information. UIMA is an Apache open source project and is widely used in the NLP domain. A UIMA component can be either a local service or a web service, and both types can be freely mixed in a UIMA workflow.
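The idea of freely mixing local and web-service components can be illustrated with a small Python sketch. It does not use the actual UIMA API (the reference UIMA implementation is in Java, and its annotators exchange a CAS, or Common Analysis Structure); `Document`, `LocalProteinTagger`, `RemoteComponent`, `run_workflow` and `fake_event_service` are hypothetical names, and the web service is replaced by an injected function so the example runs offline. What it shows is that when every component exposes the same `process` interface over a shared document representation, a workflow is simply an ordered list of components, regardless of where each one runs.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class Document:
    """Hypothetical stand-in for a UIMA CAS: raw text plus typed annotations."""

    def __init__(self, text: str):
        self.text = text
        self.annotations: Dict[str, List[Tuple[int, int]]] = {}

    def add(self, type_name: str, start: int, end: int) -> None:
        self.annotations.setdefault(type_name, []).append((start, end))


class Component(Protocol):
    """The single interface shared by local and remote components."""

    def process(self, doc: Document) -> None: ...


class LocalProteinTagger:
    """Local component: a toy dictionary lookup standing in for a real tagger."""

    def __init__(self, names: List[str]):
        self.names = names

    def process(self, doc: Document) -> None:
        for name in self.names:
            start = doc.text.find(name)
            while start != -1:
                doc.add("Protein", start, start + len(name))
                start = doc.text.find(name, start + 1)


class RemoteComponent:
    """Web-service component: sends the document to a service and merges the
    annotations it returns. `transport` would normally perform an HTTP request;
    here it is injected so the sketch runs offline."""

    def __init__(self, transport: Callable[[dict], dict]):
        self.transport = transport

    def process(self, doc: Document) -> None:
        reply = self.transport({"text": doc.text, "annotations": doc.annotations})
        for type_name, spans in reply.get("annotations", {}).items():
            for start, end in spans:
                doc.add(type_name, start, end)


def run_workflow(components: List[Component], text: str) -> Document:
    """Run the components in order over one shared document."""
    doc = Document(text)
    for component in components:
        component.process(doc)
    return doc


def fake_event_service(request: dict) -> dict:
    """Offline stand-in for a remote event extractor: marks one trigger word."""
    i = request["text"].find("expression")
    if i == -1:
        return {}
    return {"annotations": {"Trigger": [(i, i + len("expression"))]}}


# A local and a (simulated) remote component mixed in one workflow.
doc = run_workflow(
    [LocalProteinTagger(["IL-2"]), RemoteComponent(fake_event_service)],
    "IL-2 expression",
)
```

Injecting the transport keeps the remote component testable; in a real deployment the same `process` method would wrap a network call, and the workflow code would not change.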
U-Compare provides a broad range of UIMA-compliant components, including BioNLP components such as protein taggers and annotated corpus readers. Compatibility between these components is guaranteed by shared data type definitions. U-Compare also provides a UIMA-compliant integrated NLP platform, which gives direct access to the U-Compare components; local components are automatically downloaded and executed on demand. A local component has the advantage of portability, although users must install the original tool if it is not implemented in Java. A web service component, on the other hand, may be limited in its computational capacity. The U-Compare platform allows workflows to be created easily from these components or from any third-party UIMA component. Additionally, U-Compare provides a comparison and evaluation feature implemented in a UIMA-compliant way, and presents the results of workflow runs both statistically and visually. All of these features are available without any programming.
We have integrated the bio-event meta-service described in this paper into the U-Compare platform. This integration could accelerate the development of text mining in bioinformatics. The most straightforward use of our system is to combine a few text mining tools and run the resulting pipeline on any text relevant to a specific biological use case. Our system makes such use dramatically easier than existing systems, since no programming is required.
While our ready-to-use services are useful in themselves, especially for end users of text mining, comparing the various bio-event services is critical when users need to develop a state-of-the-art application. For example, developers need a deeper analysis of the behaviour of the event extraction systems in order to select the most suitable of the available services. However, even the original developers cannot fully predict the behaviour of their services, because the services are black boxes and different input texts may cause unexpected behaviour. Users therefore need to compare service outputs on text from their specific domain of interest. Our system is the first to allow such comparison of event extraction services that output complex event structures. It does not merely calculate statistical scores but also helps users analyse the comparisons through visualization features.
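As an illustration of the statistical side of such a comparison, the sketch below treats one service's output as the reference and scores another against it with precision, recall and F1. Because annotation ids (T1, E1, ...) are assigned independently by each service, events are first resolved to their types and character spans, recursively for nested events, before being compared. This is a sketch under stated assumptions with hypothetical names (`resolve`, `event_keys`, `compare`) and a hypothetical tuple-based data layout; it uses strict exact matching, which is simpler than the official shared-task evaluation, and is not meant to reproduce the comparison component itself.

```python
def resolve(ann_id, entities, events):
    """Replace an annotation id by an id-independent, hashable key.

    entities: id -> (type, start, end)              (proteins and triggers)
    events:   id -> (type, trigger_id, [(role, argument_id), ...])
    """
    if ann_id in events:
        event_type, trigger_id, args = events[ann_id]
        _, start, end = entities[trigger_id]
        # Arguments may themselves be events, so resolve them recursively.
        resolved_args = frozenset(
            (role, resolve(arg_id, entities, events)) for role, arg_id in args
        )
        return (event_type, start, end, resolved_args)
    return entities[ann_id]


def event_keys(entities, events):
    """Set of id-independent keys for all events of one output."""
    return {resolve(event_id, entities, events) for event_id in events}


def compare(reference, predicted):
    """Strict-match precision, recall and F1 of `predicted` against `reference`.

    Each argument is an (entities, events) pair from one service or a gold file.
    """
    ref = event_keys(*reference)
    pred = event_keys(*predicted)
    matched = len(ref & pred)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# The same text processed by two services that number annotations differently.
service_a = (
    {"T1": ("Protein", 0, 4), "T2": ("Gene_expression", 5, 15)},
    {"E1": ("Gene_expression", "T2", [("Theme", "T1")])},
)
service_b = (
    {"T1": ("Protein", 0, 4), "T7": ("Gene_expression", 5, 15)},
    {"E3": ("Gene_expression", "T7", [("Theme", "T1")]),
     "E4": ("Positive_regulation", "T7", [("Theme", "E3")])},
)
precision, recall, f1 = compare(service_a, service_b)
```

Here the two services agree on the gene expression event despite using different ids, while the extra nested regulation event proposed by `service_b` lowers its precision relative to `service_a`.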
Furthermore, ensembles of services have great potential to improve on the performance of the individual services. It is generally known that an ensemble of text mining services can improve performance significantly. Our system allows end users to create such ensembles.
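The text does not specify how ensembles are formed; one simple, commonly used scheme is voting, sketched below under that assumption. Each service's output is represented as a set of normalized, hashable events (for instance, events whose ids have been resolved to types and character offsets), and an event is kept if at least `min_votes` services propose it. The function name `vote`, its default, and the toy event tuples are hypothetical.

```python
from collections import Counter


def vote(predictions, min_votes=None):
    """Return the events proposed by at least `min_votes` services.

    predictions: one iterable of hashable events per service.
    min_votes defaults to a strict majority of the services.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # set(p) ensures a service that repeats an event still casts only one vote.
    counts = Counter(event for p in predictions for event in set(p))
    return {event for event, n in counts.items() if n >= min_votes}


# Toy events: (type, trigger start, trigger end, theme protein span)
service_a = {("Gene_expression", 5, 15, (0, 4)), ("Binding", 20, 27, (0, 4))}
service_b = {("Gene_expression", 5, 15, (0, 4))}
service_c = {("Gene_expression", 5, 15, (0, 4)), ("Localization", 30, 40, (0, 4))}
ensemble = vote([service_a, service_b, service_c])
```

Raising `min_votes` keeps only events on which the services agree, which tends to favour precision; `min_votes=1` returns the union of all outputs, which tends to favour recall.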
All of the above use cases require a meta-service that provides compatible, interoperable and ready-to-use bio-event services. Because our system supports these uses, users can build text mining applications for their own purposes efficiently and effectively.