                              ImCor 1.0 
                        =====================
                           November 20, 2005 
                   Matthew Johnson, mj293@cam.ac.uk
                       University of Cambridge 
                       
ImCor is a corpus that links images in the Corel Image database to text in
the SemCor sense-annotated text corpus.  The process by which it was created
is described in the paper available at
http://mi.eng.cam.ac.uk/~mj293/publications/barnard04word.pdf.  The files
in the archive come in two formats.

The first is an XML file.  Its structure is largely self-explanatory, but
the attributes it uses deserve some explanation.  First, the "pos"
attribute of each "word" element is the part-of-speech abbreviation as
found in SemCor and assigned by Brill's PoS tagger.  The abbreviations are:
CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential "there"
FW    Foreign word
IN    Preposition or subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNP   Proper noun, singular
NNPS  Proper noun, plural
NNS   Noun, plural
NP    Proper noun, singular
NPS   Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PP    Personal pronoun
PR    Pronoun
PRP   Pronoun
PRPS  Pronoun, plural
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    "to"
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WPS   Possessive wh-pronoun
WRB   Wh-adverb
The "wnsn" attribute stands for "WordNet sense number".  For more information 
on WordNet, visit http://wordnet.princeton.edu/.  If two or more sense numbers 
are equally appropriate for a word, they are separated by a semicolon (';').
Finally, the "lemma" attribute is the base form of the word (i.e. the word
as it appears in WordNet).  For example, the base form of "birds" is "bird".
Proper names are mapped to a general category, so the base form of "Greg"
is "person", and so forth.
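
As an illustration, the attributes described above could be read with
Python's standard library roughly as follows.  This is only a sketch: the
"word" element and its "pos", "wnsn" and "lemma" attributes come from this
README, but the enclosing "annotation" element, the sample contents and the
name read_words are assumptions made for the example, and the real files
may be nested differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample.  Only the "word" element and its "pos", "wnsn" and
# "lemma" attributes are documented; the rest is invented for illustration.
SAMPLE = """
<annotation>
  <word pos="NNS" wnsn="1" lemma="bird">birds</word>
  <word pos="NNP" wnsn="1" lemma="person">Greg</word>
  <word pos="NN" wnsn="1;2" lemma="bank">bank</word>
  <word pos="DT">the</word>
</annotation>
"""


def read_words(xml_text):
    """Return (lemma, pos, [sense numbers]) for every "word" element."""
    root = ET.fromstring(xml_text.strip())
    words = []
    # iter() finds "word" elements at any depth, so the exact nesting of
    # the real files does not need to be known.
    for elem in root.iter("word"):
        wnsn = elem.get("wnsn", "")
        # Equally appropriate senses are separated by ';'.  Sense numbers
        # are kept as strings rather than converted to integers.
        senses = [s.strip() for s in wnsn.split(";") if s.strip()]
        words.append((elem.get("lemma"), elem.get("pos"), senses))
    return words


for lemma, pos, senses in read_words(SAMPLE):
    print(lemma, pos, senses)
```

Words without a "wnsn" attribute come back with an empty sense list, and a
missing "lemma" comes back as None, so callers can filter on either.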

The second is a text file.  Each line follows the format:
<image number> <lemma>_<wnsn> ...
In cases where the same image has several annotations, 1,000,000 has been
added to the image number for each additional annotation (thus if image
50000 had 3 annotations, they would appear as 50000, 1050000, and 2050000).
Only singular and plural nouns are included in this file.
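
A line of the text file could be decoded along the following lines.  This
is a sketch under stated assumptions: fields are taken to be separated by
whitespace, original image numbers are taken to be below 1,000,000 (so the
offset can be undone with a division), and the sample line and the name
parse_line are invented for the example.

```python
OFFSET = 1_000_000  # added once for each additional annotation of an image


def parse_line(line):
    """Split one line into (image, annotation index, [(lemma, wnsn), ...])."""
    fields = line.split()
    number = int(fields[0])
    # Assumption: original image numbers are below 1,000,000, so the
    # quotient is the annotation index and the remainder is the image.
    index, image = divmod(number, OFFSET)
    senses = []
    for token in fields[1:]:
        # Split on the last underscore, since a lemma may itself contain one.
        lemma, _, wnsn = token.rpartition("_")
        senses.append((lemma, wnsn))
    return image, index, senses


# Hypothetical line: the third annotation of image 50000.
print(parse_line("2050000 bird_1 sky_1"))
# prints (50000, 2, [('bird', '1'), ('sky', '1')])
```

Grouping the results by the recovered image number collects all
annotations of the same image.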

Use of ImCor is governed by the GNU General Public License, included in
this archive.