CHEMDNER-patents CPD subtask sample text data (Version 27th March 2015)
------------------------------------------------------------------------

This directory contains the sample set text for the CHEMDNER-patents CPD subtask.

1) chemdner_patents_sample_200.txt : Sample set

This file contains plain-text, UTF-8-encoded patent titles and abstracts in a
tab-separated format with the following three columns:

1- Patent identifier
2- Title of the patent
3- Abstract of the patent

In total, 200 patents are provided in this sample set (200 titles and 200 abstracts).
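A minimal Python sketch for loading the sample file, assuming one patent per line, exactly three tab-separated fields, and no quoting of field contents. The names `read_sample` and `PatentRecord` are illustrative and not part of the distribution.

```python
import csv
from typing import Iterable, NamedTuple


class PatentRecord(NamedTuple):
    patent_id: str  # column 1: patent identifier
    title: str      # column 2: title of the patent
    abstract: str   # column 3: abstract of the patent


def read_sample(lines: Iterable[str]) -> list[PatentRecord]:
    """Parse the three-column, tab-separated sample file.

    `lines` may be any iterable of text lines, such as a file opened with
    encoding="utf-8" and newline="".
    """
    records = []
    # QUOTE_NONE is an assumption: fields are treated as plain text, so
    # quote characters inside titles or abstracts are kept literally.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row!r}")
        records.append(PatentRecord(*row))
    return records
```

For example, `with open("chemdner_patents_sample_200.txt", encoding="utf-8", newline="") as f: records = read_sample(f)` should yield one record per patent. Raising on a wrong column count makes a stray tab inside a title or abstract visible instead of silently shifting fields.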



2) chemdner_cpd_gold_standard_sample.tsv

For the CPD (chemical passage detection) subtask, a text classification task, we will distribute patents (titles and abstracts) that have been manually classified into those that mention chemical entities and those that do not. The CPD annotations will consist of two tab-separated fields:

1- Article identifier *
2- Manual classification (1: contains chemical entities/positive hits, 0: does not contain chemical entities/negative hits)

* In this case, the article identifier is composed of the patent identifier
followed by a qualifier for the text type, separated by '_': T for the
title and A for the abstract.
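The identifier scheme can be handled with a small helper. This sketch assumes the gold standard file has exactly the two tab-separated columns listed above; `split_article_id`, `read_gold` and the identifier `EP1234567A1` are made up for illustration.

```python
import csv
from typing import Iterable

# Text-type qualifiers described above: T = title, A = abstract.
TEXT_TYPES = {"T": "title", "A": "abstract"}


def split_article_id(article_id: str) -> tuple[str, str]:
    """Split e.g. 'EP1234567A1_A' into ('EP1234567A1', 'abstract')."""
    # Split on the last '_' so the qualifier is always the final part.
    patent_id, sep, qualifier = article_id.rpartition("_")
    if not sep or qualifier not in TEXT_TYPES:
        raise ValueError(f"unexpected article identifier: {article_id!r}")
    return patent_id, TEXT_TYPES[qualifier]


def read_gold(lines: Iterable[str]) -> dict[str, int]:
    """Map each article identifier to its manual label (0 or 1)."""
    gold = {}
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) < 2:
            raise ValueError(f"expected 2 columns: {row!r}")
        article_id, label = row[0], int(row[1])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1: {row!r}")
        split_article_id(article_id)  # validate the identifier format
        gold[article_id] = label
    return gold
```

Splitting on the last '_' keeps the sketch working even if a patent identifier itself contained an underscore; that case is an assumption, not something this README specifies.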

Note: For this task, participants have to classify each patent title and each patent abstract
according to whether it mentions chemicals (label 1) or does not (label 0).
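Since every patent contributes two units to classify, a system's output can be organised as one (identifier, label) pair per title and per abstract. In the sketch below, `classify_patents` and especially `toy_classifier` are hypothetical: the suffix rule is a deliberately crude stand-in for a real chemical mention classifier, used only to show the input/output shape.

```python
import re
from typing import Callable, Iterable, Iterator

# Hypothetical, deliberately crude rule: label 1 if any word ends in one of
# a few common chemical-name suffixes. Not a real CPD system.
_CHEM_SUFFIX = re.compile(
    r"\b\w*(?:amine|amide|oxide|chloride|sulfate|acid)s?\b", re.IGNORECASE
)


def toy_classifier(text: str) -> int:
    return 1 if _CHEM_SUFFIX.search(text) else 0


def classify_patents(
    records: Iterable[tuple[str, str, str]],
    classify: Callable[[str], int] = toy_classifier,
) -> Iterator[tuple[str, int]]:
    """Yield (article_id, label) for the title and the abstract of each patent."""
    for patent_id, title, abstract in records:
        yield f"{patent_id}_T", classify(title)     # title unit
        yield f"{patent_id}_A", classify(abstract)  # abstract unit
```

Writing these pairs out in the same two-column layout as the gold standard is convenient for local comparison; the exact submission layout expected by the ACT option should be checked against the evaluation library documentation.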

The evaluation will be done using the BioCreative Evaluation script available at:

http://www.biocreative.org/resources/biocreative-ii5/evaluation-library/

In this case, the ACT (article classification) format option will be used.
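Official scores come from the BioCreative evaluation library. For a quick local sanity check against the sample gold standard, precision, recall and F1 on the positive class (label 1) can be computed as below. This is an informal sketch, not a reimplementation of the official script; it assumes that units without a prediction count as label 0 and ignores predictions for identifiers absent from the gold standard.

```python
def prf(gold: dict[str, int], predicted: dict[str, int]) -> tuple[float, float, float]:
    """Precision, recall and F1 for label 1, computed over the gold units."""
    tp = fp = fn = 0
    for article_id, gold_label in gold.items():
        pred = predicted.get(article_id, 0)  # assumption: missing -> label 0
        if pred == 1 and gold_label == 1:
            tp += 1  # chemical mention correctly detected
        elif pred == 1 and gold_label == 0:
            fp += 1  # false alarm
        elif pred == 0 and gold_label == 1:
            fn += 1  # missed chemical mention
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```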

