Corpus de Português Escrito em Periódicos (CoPEP)

CoPEP - Corpus de Português Escrito em Periódicos (Corpus of Written Portuguese from Journals) (Kuhn and Ferreira, 2016) was compiled specifically for a lexicographic project to design an online corpus-driven dictionary of Portuguese for university students (PhD research of author X). Data for this dictionary should be representative of how expert writers from Brazil and Portugal use the language in academic writing across different areas of knowledge. CoPEP was built to meet this requirement.

CoPEP contains around 10,000 texts extracted from journals published in the Brazilian and Portuguese national collections of SciELO (Scientific Electronic Library Online). The texts are distributed among six Great Areas, which in turn are grouped into three Schools of Knowledge, and total over 40 million words. It is a synchronic corpus: the vast majority of its texts were published between 2000 and 2016 (only 2% are from the 1990s). The subcorpora for the two language varieties are almost exactly the same size and have a similar number of words per Great Area and per School, which makes the corpus evenly balanced.

Metadata have been carefully recorded to allow advanced corpus search options, e.g. by year of publication or by Great Area of Knowledge. In addition, interoperability with SciELO is available through the journals’ ISSNs, which were also retained as metadata.