The IndiaPoliceEvents Corpus Covering the 2002 Gujarat Violence

A team of political and computational scientists recently trained students to annotate more than 21,000 sentences from 1,257 contemporaneous Times of India articles about the violence. For each sentence, annotators recorded whether police officers used force, killed someone, made arrests, failed to intervene, or took any other action; a sentence could receive more than one of these labels. The resulting dataset includes the raw annotations as well as final sentence- and document-level classifications.


Automated event extraction in social science applications often requires corpus-level evaluations: for example, aggregating text predictions across metadata and producing unbiased estimates of recall. We combine corpus-level evaluation requirements with a real-world, social science setting and introduce the INDIAPOLICEEVENTS corpus, which comprises all 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002.
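
As a rough illustration of what corpus-level evaluation can involve, the Python sketch below rolls sentence-level labels up to documents, counts positive documents per publication date, and computes document-level recall over every gold-positive document. The records are toy data, not drawn from the corpus, and the field layout, the function names, and the rule that a document is positive if any of its sentences is positive are all assumptions made for this example rather than the authors' procedure.

```python
from collections import defaultdict

# Hypothetical records: (doc_id, publication_date, gold_label, predicted_label).
# A label of 1 means the sentence reports the event class of interest
# (for example, police making arrests). Toy data, not from the corpus.
sentences = [
    ("doc1", "2002-03-01", 1, 1),
    ("doc1", "2002-03-01", 0, 0),
    ("doc2", "2002-03-01", 1, 0),
    ("doc2", "2002-03-01", 0, 0),
    ("doc3", "2002-03-02", 0, 1),
    ("doc3", "2002-03-02", 1, 1),
]


def to_document_level(records):
    """Assumed rule: a document is positive if any of its sentences is positive."""
    docs = {}
    for doc_id, date, gold, pred in records:
        entry = docs.setdefault(doc_id, {"date": date, "gold": 0, "pred": 0})
        entry["gold"] = max(entry["gold"], gold)
        entry["pred"] = max(entry["pred"], pred)
    return docs


def counts_by_date(docs, key):
    """Count positive documents per publication date for 'gold' or 'pred'."""
    counts = defaultdict(int)
    for entry in docs.values():
        counts[entry["date"]] += entry[key]
    return dict(counts)


def document_recall(docs):
    """Share of gold-positive documents that were also predicted positive."""
    gold_positive = [d for d in docs.values() if d["gold"] == 1]
    if not gold_positive:
        return None  # recall is undefined when there are no gold positives
    return sum(d["pred"] for d in gold_positive) / len(gold_positive)


docs = to_document_level(sentences)
gold_by_date = counts_by_date(docs, "gold")
pred_by_date = counts_by_date(docs, "pred")
recall = document_recall(docs)
```

Because the recall denominator here is every gold-positive document rather than a filtered subset, this kind of calculation only makes sense when all sentences in the collection have been annotated, which is the situation the abstract describes.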