CORD-19 research data in different formats – Avro, Parquet, JSONL

We welcome researchers who are using data to fight the coronavirus. A rich set of data has been made available in the COVID-19 Open Research Dataset (CORD-19), provided by the Allen Institute for AI. We are helping researchers by making this data available in multiple formats. To learn more about the different data formats and their benefits, read our white paper, Introduction to Big Data Formats: Understanding Avro, Parquet, and ORC.

Avro Download: cord19_all_data_in_avro.tgz (517.4 MB)

Parquet Download: cord19_all_data_in_parquet.tgz (485.2 MB)

JSONL Download: cord19_all_data_in_jsonl.tgz (660.1 MB)
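
For readers who want to peek at the JSONL download without unpacking it to disk first, the following is a minimal Python sketch using only the standard library. It rests on assumptions this page does not confirm: that the archive contains files whose names end in ".jsonl", that each non-empty line is one UTF-8 encoded JSON object, and that the archive has already been downloaded locally. The function name iter_jsonl_records and its suffix parameter are hypothetical, chosen for illustration only.

```python
import json
import tarfile
from typing import Iterator


def iter_jsonl_records(tgz_path: str, suffix: str = ".jsonl") -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of each matching file.

    Assumption: files of interest inside the archive end with `suffix`.
    Adjust it after inspecting the real archive layout.
    """
    with tarfile.open(tgz_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories and files that do not look like JSONL.
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = archive.extractfile(member)
            if handle is None:
                continue
            # Read line by line so large files are never fully loaded.
            for raw_line in handle:
                line = raw_line.decode("utf-8").strip()
                if line:
                    yield json.loads(line)
```

With the archive saved in the working directory, calling next(iter_jsonl_records("cord19_all_data_in_jsonl.tgz")) should return the first record as a dictionary, provided the assumptions above hold. Streaming from the archive keeps memory use low, at the cost of rereading the compressed file on every pass.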

Would it be helpful for you to get data in any other format, such as ORC? Email us and let us know at