ChroniclItaly

ChroniclItaly is a corpus of Italian language newspapers published in the United States between 1898 and 1920. Specifically, it gathers the digitized pages of seven Italian language newspapers, for a total of 4,810 issues published in five States: California, Massachusetts, Pennsylvania, Vermont, and West Virginia. The newspaper titles are: L’Italia, Cronaca sovversiva, La libera parola, The patriot, La ragione, La rassegna, and La sentinella del West Virginia. The pages were downloaded in txt format from Chronicling America (CA) (https://chroniclingamerica.loc.gov/), an Internet-based, searchable database, maintained by the Library of Congress, of U.S. newspapers published between 1789 and 1963, with descriptive information and digitized historic pages. CA is publicly available and access is not restricted in any way. The txt format allows for linguistic data processing. ChroniclItaly totals 16,624,571 words. The files are arranged in two ways, chronologically and by newspaper title, to allow for either diachronic or comparative analysis.
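Because the corpus is distributed as plain txt files, a word count per folder can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the folder layout (one sub-folder per newspaper title or per year), the UTF-8 encoding, and the tokenization rule are hypothetical and not documented by the data package, so totals obtained this way may differ from the figure reported above.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenization rule (not the corpus authors' method):
# a word is a run of letters, optionally joined by apostrophes (e.g. "L'Italia").
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def count_words(text: str) -> int:
    """Return the number of word tokens found in a string."""
    return len(WORD_RE.findall(text))


def words_per_folder(root: Path) -> Counter:
    """Sum the word counts of all .txt files under root,
    keyed by the name of each file's parent folder."""
    totals = Counter()
    for path in sorted(root.rglob("*.txt")):
        # errors="replace" keeps the loop going if a file has stray bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        totals[path.parent.name] += count_words(text)
    return totals
```

Calling words_per_folder(Path("ChroniclItaly")) on a local copy (the path is a placeholder) would give one total per sub-folder, which can then be compared across titles or tracked over time, depending on which of the two arrangements is used.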


Descriptive

Discipline
Humanities - Other humanities (6.5)
Version
1.0
Language
it - Italian
Tag(s)
Italian language newspapers, Ethnic press, Historical newspapers, Ethnic press in the United States

Administrative

Data Classification
Public

System

Persistent Identifier
DOI: 10.24416/UU01-T4YMOW
Publication Date
June 21, 2018
Last Modification
June 21, 2018, 10:03 GMT+0200

Rights

Creator
Lorella Viola
Person Identifier
ORCID: 0000-0001-9994-0841
Affiliation
Utrecht University
Contributor
Jaap Verheul
Affiliation
Utrecht University
License
Open Data Commons Attribution License (ODC-By) v1.0

Data Access

The data is open access. Use this link https://i-lab.public.data.uu.nl/vault-ocex/ChroniclItaly - Italian American newspapers corpus from 1898 to 1920[1529330521] to access this data package.