This page collects links to resources for academics and other investigators, including websites of our collaborators and academic partners, and datasets pertinent to the COVID-19 pandemic.

Contact info@socialmediaforpublichealth.org to add resources to this list.

Our Resources

Twitter Social Mobility Index
This site hosts the Twitter Social Mobility Index, a measure of social distancing and travel derived from Twitter data. We use public geolocated Twitter data to produce a metric on social mobility for the US.
http://socialmobility.covid19dataresources.org/index

Coronavirus Twitter Data
We distribute this Twitter data to support research efforts in the fight against COVID-19. The dataset contains tweet IDs, which you can use to download the original tweets directly from Twitter. We also include the date, keywords related to COVID-19, and the inferred geolocation.
http://twitterdata.covid19dataresources.org/
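
As a rough illustration of how a tweet-ID dataset like this might be consumed, the sketch below parses ID records and groups the IDs into batches for later download ("hydration") from Twitter. The tab-separated layout and the column names (tweet_id, date, keywords, location) are assumptions made for this example, not the dataset's documented format, and the batch size is a placeholder to be set to whatever limit the lookup endpoint you use allows.

```python
import csv
import io
from typing import Dict, Iterator, List

# Hypothetical sample rows; the real files may use other columns or delimiters.
SAMPLE = (
    "tweet_id\tdate\tkeywords\tlocation\n"
    "1240000000000000001\t2020-03-17\tcovid19\tUS-NY\n"
    "1240000000000000002\t2020-03-17\tcoronavirus\tUS-CA\n"
    "1240000000000000003\t2020-03-18\tcovid19\tUS-TX\n"
)


def read_records(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


records = read_records(SAMPLE)
tweet_ids = [row["tweet_id"] for row in records]  # keep IDs as strings
id_batches = list(batches(tweet_ids, size=2))
```

Tweet IDs are kept as strings here because they are 64-bit integers and can lose precision in tools that read them as floating-point numbers.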

Related Initiatives

The Pandemic Project
The study of people during COVID-19
https://utpsyc.org/covid19/

The Johns Hopkins Coronavirus Resource Center
Johns Hopkins experts in global public health, infectious disease, and emergency preparedness have been at the forefront of the international response to COVID-19.
https://coronavirus.jhu.edu

Crowdfight COVID-19
An initiative from the scientific community to put all available resources at the service of the fight against COVID-19
https://crowdfightcovid19.org

COVID Act Now
Created by a team of data scientists, engineers, and designers in partnership with epidemiologists, public health officials, and political leaders to help people understand how the COVID-19 pandemic will affect their region.
https://covidactnow.org

COVID-19 Social Science Research Tracker
This international list tracks new social science research about COVID-19, including published findings, pre-prints, projects underway, and projects at least at the proposal stage.
https://github.com/natematias/covid-19-social-science-research/

COVID-19 Open Research Dataset Challenge (CORD-19)
In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19).
https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Neural Covidex
Neural Covidex applies state-of-the-art neural network models and artificial intelligence (AI) techniques to answer questions using the COVID-19 Open Research Dataset (CORD-19) provided by the Allen Institute for AI (data release of April 10, 2020), which currently contains over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and coronavirus-related research, drawn from a variety of sources including PubMed, a curated list of articles from the WHO, and preprints from bioRxiv and medRxiv. We also support search on 100+ randomized controlled trials (published and ongoing) related to COVID-19 provided by Trialstreamer.
https://covidex.ai/

Awesome Coronavirus19 Dataset
A repository containing resources related to COVID-19.
https://github.com/bigheiniu/awesome-coronavirus19-dataset

CORD-19 Information Aggregator
This is a tool to browse answers the scientific literature may provide regarding various questions about the novel coronavirus and COVID-19.
http://phontron.com/misc/cord19_report/

Center for Informed Democracy & Social-cybersecurity (IDeaS): Coronavirus Misinformation
A list of identified misinformation regarding coronavirus.
https://www.cmu.edu/ideas-social-cybersecurity/research/coronavirus.html

Coronavirus Tech Handbook
The Coronavirus Tech Handbook provides a library for technologists, civic organisations, public and private institutions, researchers, educators and specialists of all kinds to collaborate on an agile and sophisticated response to the coronavirus outbreak and sequential impacts.
https://coronavirustechhandbook.com/home

COVIDSearch: Making Sense of [Lots of] Open Data Related to COVID-19
A TREC-style shared task with the goal of evaluating search algorithms and systems for helping scientists, clinicians, policy makers, and others manage the existing and rapidly growing corpus of scientific literature related to COVID-19.
https://dmice.ohsu.edu/hersh/COVIDSearch.html

Data Against COVID-19
A clearinghouse for matching requests for cleaning of COVID-19 datasets with volunteers willing to perform this cleaning.
https://www.data-against-covid.org/

Academic Data Science Alliance
The Academic Data Science Alliance is working with partners to pull together data and data science resources related to the COVID-19 pandemic. This is a living list of resources and we welcome additions, suggestions, and collaborations.
https://www.academicdatascience.org/covid

COVID-19 Data Collaboratives
A list of COVID-19 related data projects.
https://docs.google.com/document/d/1JWeD1AaIGKMPry_EN8GjIqwX4J4KLQIAqP09exZ-ENI/mobilebasic

Estimating COVID-19’s $R_t$ in Real-Time
A modified version of the method of Bettencourt & Ribeiro (2008) for estimating $R_t$ in real time using a Bayesian approach. While that paper estimates a static $R$ value, this version introduces a process model with Gaussian noise to estimate a time-varying $R_t$.
https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb
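
To make the idea concrete, here is a minimal sketch of a grid-based Bayesian filter in that spirit. It is not the notebook's code. It assumes a Poisson likelihood for each day's new cases with mean $k_{t-1} e^{\gamma (R_t - 1)}$, a Gaussian random walk for $R_t$, a 7-day serial interval, and a noise level chosen by hand; `GAMMA`, `SIGMA`, `R_GRID`, and `estimate_rt` are names invented for this example.

```python
import math

GAMMA = 1 / 7   # assumed: reciprocal of a 7-day serial interval
SIGMA = 0.25    # assumed: std. dev. of the Gaussian process noise
R_GRID = [i * 0.05 for i in range(121)]  # candidate R_t values from 0 to 6


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def poisson_logpmf(k, lam):
    """Log-probability of observing k events under Poisson(lam), lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


# TRANSITION[i][j] approximates P(R_t = R_GRID[j] | R_{t-1} = R_GRID[i]).
TRANSITION = [
    normalize([math.exp(-0.5 * ((r_new - r_old) / SIGMA) ** 2) for r_new in R_GRID])
    for r_old in R_GRID
]


def estimate_rt(new_cases):
    """Return the posterior-mean R_t for each day after the first.

    Assumes every daily count is positive, since the Poisson mean for
    day t is proportional to the count on day t-1.
    """
    n = len(R_GRID)
    posterior = [1.0 / n] * n  # flat initial belief over the grid
    means = []
    for prev, cur in zip(new_cases, new_cases[1:]):
        # Predict: let yesterday's posterior diffuse under the Gaussian noise.
        prior = [
            sum(posterior[i] * TRANSITION[i][j] for i in range(n))
            for j in range(n)
        ]
        # Update: weight each candidate R_t by the likelihood of today's count.
        log_like = [
            poisson_logpmf(cur, prev * math.exp(GAMMA * (r - 1))) for r in R_GRID
        ]
        peak = max(log_like)  # subtract the peak to avoid underflow
        posterior = normalize(
            [p * math.exp(ll - peak) for p, ll in zip(prior, log_like)]
        )
        means.append(sum(r * p for r, p in zip(R_GRID, posterior)))
    return means


growing = estimate_rt([1000, 1200, 1440, 1728, 2074])
shrinking = estimate_rt([2000, 1700, 1445, 1228, 1044])
```

With steadily growing counts the estimate settles above 1, and with steadily shrinking counts it settles below 1. A real analysis would also need to smooth noisy case counts and choose the noise level from the data rather than by hand.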

Poynter: The CoronaVirusFacts/DatosCoronaVirus Alliance Database
A database that gathers all of the falsehoods that have been detected by the CoronaVirusFacts/DatosCoronaVirus alliance. This database unites fact-checkers in more than 70 countries and includes articles published in at least 40 languages. 
https://www.poynter.org/ifcn-covid-19-misinformation/


Twitter Data

The following is a list of datasets containing social media and web data related to COVID-19.

COVID-19: The First Public Coronavirus Twitter Dataset
A multilingual coronavirus Twitter dataset starting on January 22, 2020.
https://arxiv.org/abs/2003.07372
https://medium.com/@isiminds/twitter-dataset-related-to-covid-19-coronavirus-released-b0610c718910
https://medium.com/@isiminds/twitter-covid-19-preliminary-geo-analysis-83f43fb4e0c3

GWU Libraries Dataverse: Coronavirus Tweet Ids
This dataset contains the tweet ids of 51,798,932 tweets related to Coronavirus or COVID-19. They were collected between March 3, 2020 and March 19, 2020 (midnight UTC-0) from the Twitter API using Social Feed Manager.
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/LW0BTB

TweetSets from GWU
Twitter datasets for research and archiving. Create your own Twitter dataset from existing datasets.
https://tweetsets.library.gwu.edu/

Crowdbreaks Twitter Dataset
The data relate to COVID-19 and vaccines and were collected through the Twitter filter stream API using a list of keywords and languages.
https://www.crowdbreaks.org/en/data_sharing

Covid-19 Twitter chatter dataset for scientific use
Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. Since our first release we have received additional data from our new collaborators, allowing this resource to grow to its current size. Dedicated data gathering started from March 11th yielding over 4 million tweets a day.
http://www.ipanacealab.org/covid19/

Corona Virus (COVID-19) Tweets Dataset
This dataset includes CSV files which contain the tweet IDs. The tweets have been collected by the LSTM model deployed here at sentiment.live. The model monitors the real-time Twitter feed for corona virus-related tweets, using filters: language “en”, and keywords “corona”, “coronavirus”, “covid”, “covid19” and variants of “sarscov2”. As per the Twitter Developer Policy, it is not possible for me to provide information other than the Tweet IDs (this dataset has been completely re-designed on March 20, 2020, to comply with data sharing policies set by Twitter).

https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
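
The language and keyword filter described above could look roughly like the following sketch. It is an illustration only, not the dataset author's code: the tweet is represented as a plain dictionary with assumed `lang` and `text` fields, keywords are matched as case-insensitive substrings (a simplification of how Twitter's streaming filter matches terms), and the pattern for "variants of sarscov2" is a guess.

```python
import re

# Keywords taken from the dataset description.
KEYWORDS = ("corona", "coronavirus", "covid", "covid19")

# Assumed variants: "sarscov2", "sars-cov-2", "sars cov 2", "sars_cov_2", ...
SARS_VARIANTS = re.compile(r"sars[\s\-_]?cov[\s\-_]?2", re.IGNORECASE)


def matches_filter(tweet: dict) -> bool:
    """Return True for English tweets that mention any tracked term."""
    if tweet.get("lang") != "en":
        return False
    text = tweet.get("text", "").lower()
    if any(keyword in text for keyword in KEYWORDS):
        return True
    return SARS_VARIANTS.search(text) is not None


sample = [
    {"lang": "en", "text": "New COVID19 cases reported today"},
    {"lang": "en", "text": "SARS-CoV-2 genome published"},
    {"lang": "es", "text": "Noticias sobre el coronavirus"},
    {"lang": "en", "text": "Nice weather this weekend"},
]
kept = [tweet for tweet in sample if matches_filter(tweet)]
```

Because "corona" is a substring of "coronavirus", the longer keyword is redundant under substring matching; it is kept here only to mirror the list in the description.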

COVID-19-TweetIDs
The repository contains an ongoing collection of tweet IDs associated with the novel coronavirus COVID-19 (SARS-CoV-2), which commenced on January 28, 2020.

https://github.com/echen102/COVID-19-TweetIDs

Coronavirus Data Resource Hub
We’ve collected the most up-to-date and trusted open data related to COVID-19. Scientists, analysts, researchers, entire businesses, and others all over the world are working together in data.world to track trends, find clues, and share insights. And now, so can you. Welcome to the largest, most diverse team seeking a global solution.
https://data.world/resources/coronavirus/


Data from Natural Language Processing

Knowledge Extraction to Assist Scientific Discovery from Corona Virus Literature
Knowledge extraction NLP algorithms applied to the CORD-19 dataset of coronavirus-related scientific papers.
http://blender.cs.illinois.edu/covid19/

COVID-QA
A collection of COVID-19 Q&A pairs and transformer baselines for evaluating question-answering models
Description: https://www.reddit.com/r/MachineLearning/comments/g2meuf/p_we_are_releasing_a_new_dataset_of_questions_and/
Link: https://www.kaggle.com/xhlulu/covidqa

CLEF2020 Twitter Data Task
Enabling Automatic Identification and Verification of Claims in Social Media:
The mission of the lab is to foster the development of technology that would enable the automatic verification of claims. Automated systems for claim identification and verification can be very useful as supportive technology for investigative journalism, as they could provide help and guidance, thus saving time.
https://sites.google.com/view/clef2020-checkthat/

The Coronavirus Corpus
The Coronavirus Corpus is designed to be the definitive record of the social, cultural, and economic impact of the coronavirus (COVID-19) in 2020 and beyond. Unlike resources like Google Trends (which just show what people are searching for), the corpus shows what people are actually saying in online newspapers and magazines in 20 different English-speaking countries.
https://www.english-corpora.org/corona/

NLP for COVID-19 Research Data
This website is dedicated to collecting and sharing available NLP resources for COVID-19, including publications, datasets, tools, vocabularies, and events. Our ultimate goal is to promote re-use of existing NLP resources to facilitate COVID-19 research.
http://www.covid19nlp.org/

Projects with Search Data

Tracking COVID-19 using online search
This work develops an unsupervised model for COVID-19 using search data
https://github.com/vlampos/covid-19-online-search

Coronavirus Google Searches Could Save Lives
An analysis of search data for COVID-19 based on location.
https://onezero.medium.com/google-needs-to-share-the-data-from-coronavirus-searches-62e6f60cc363

Text Data

COVID-19 Open Research Dataset (CORD-19)
Over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses
https://pages.semanticscholar.org/coronavirus-research

TREC-COVID: Building a Pandemic Retrieval Test Collection
Researchers, clinicians, and policy makers involved with the response to COVID-19 are constantly searching for reliable information on the virus and its impact. This presents a unique opportunity for the information retrieval (IR) and text processing communities to contribute to the response to this pandemic, as well as to study methods for quickly standing up information systems for similar future events. The results of the TREC-COVID Challenge will identify answers for some of today’s questions while building infrastructure to improve tomorrow’s search systems.
TREC-COVID will follow the TREC model for building IR test collections through community evaluations of search systems. The document set to be used in the challenge is the COVID-19 Open Research Dataset (CORD-19). This is a collection of biomedical literature articles that will be updated weekly. Accordingly, TREC-COVID will consist of a series of rounds, with each round using a later version of the document set and a larger set of COVID-related topics.
https://www.nist.gov/news-events/news/2020/04/nist-and-ostp-launch-effort-improve-search-engines-covid-19-research
https://ir.nist.gov/covidSubmit/

Social Mobility Data

Unacast Social Distancing Scoreboard
A tool to provide organizations fighting COVID-19 with an understanding of the efficacy of social distancing initiatives — currently seen as the most effective way of slowing the spread of the virus.
https://www.unacast.com/covid19/social-distancing-scoreboard

Google COVID-19 Community Mobility Reports
Community Mobility Reports aim to provide insights into what has changed in response to policies aimed at combating COVID-19.
https://www.google.com/covid19/mobility/

Other Data

Collection of COVID-19 Data APIs
Postman COVID-19 API Resource Center. During the present novel coronavirus (COVID-19) pandemic, those on the front lines—including health care professionals, researchers, and government experts—need quick, easy access to real-time critical data. This type of information exchange is what APIs do best, and as an API-first company, Postman is committed to providing whatever assistance we can in this area.
https://covid-19-apis.postman.com/

New York Times COVID-19 Dataset
An ongoing repository of data on coronavirus cases and deaths in the U.S.
https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
https://github.com/nytimes/covid-19-data

COVID-19 Dataset Clearinghouse
This is a repository for public data sets relating to the COVID-19 pandemic.
http://michaelnielsen.org/polymath1/index.php?title=COVID-19_dataset_clearinghouse
https://terrytao.wordpress.com/2020/03/25/polymath-proposal-clearinghouse-for-crowdsourcing-covid-19-data-and-data-cleaning-requests/

Facebook & Carnegie Mellon University COVID-19 Symptom Map
This map shows an estimated percentage of people with COVID-19 symptoms, not confirmed cases. Facebook uses aggregated public data from a survey conducted by Carnegie Mellon University Delphi Research Center. Facebook doesn’t receive, collect or store individual survey responses. This map is not intended for diagnostic or treatment purposes, or for guidance on any type of travel.
https://covid-survey.dataforgood.fb.com/

County-level Socioeconomic Data for Predictive Modeling of Epidemiological Effects
We aim to gather a machine readable dataset related to socioeconomic factors that may affect the spread and/or consequences of epidemiological outbreaks, particularly the novel coronavirus (COVID-19).
https://github.com/JieYingWu/COVID-19_US_County-level_Summaries

COVID-19 Epidemiological Data Repository by Johns Hopkins University Center for Systems Science & Engineering (JHU CSSE)
This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). It is also supported by the ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL).
https://github.com/CSSEGISandData/COVID-19

New York City COVID-19 Dataset
The data presented below reflect the most recent information collected about people who have tested positive for COVID-19 in NYC. We are discouraging people with mild to moderate symptoms from being tested at this time, so the data primarily represent people with more severe illness. All data included below are preliminary and subject to change. Unless otherwise noted, all of the below information was collected by the NYC Health Department. This page will be updated daily.
https://www1.nyc.gov/site/doh/covid/covid-19-data.page

Synthetic COVID-19 EHR Dataset from MITRE / Veterans Health Administration
NOTE: Synthetic data are most suitable for methods development & proof-of-concept analyses; avoid use for direct clinical & critical operational applications.

Datasets – CSV files (#1 Civilian Population; #2 Veteran Population): https://www.dropbox.com/sh/3xcsz3bzb7rjjwy/AADPY4gmouSKDa8XaC200g8za?dl=0
Guide to Synthetic Dataset: https://github.com/synthetichealth/synthea/wiki/Getting-Started
Methods Article: https://doi.org/10.1093/jamia/ocx079

European Centre for Disease Prevention and Control (ECDC) – COVID-19 Epidemiological Data
Data on the geographic distribution of COVID-19 cases worldwide
https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide

U.S. Hospital Capacity Estimates (Harvard Global Health Institute)
HGHI launches regionalized capacity estimates, counting every bed in every major hospital market to inform community response.
https://globalepidemics.org/2020/03/17/caring-for-covid-19-patients/

U.S. State COVID-19 Testing Data: The COVID Tracking Project
The COVID Tracking Project collects information from 50 US states, the District of Columbia, and 5 other US territories to provide the most comprehensive testing data we can collect for the novel coronavirus, SARS-CoV-2. We attempt to include positive and negative results, pending tests, and total people tested for each state or district currently reporting that data.
https://covidtracking.com/

Italy COVID-19 Data
COVID-19 Italia – Monitoraggio situazione
https://github.com/pcm-dpc/COVID-19

Definitive Healthcare U.S. hospital capacity data
(number of beds, ICU beds, ventilator capacity by state/county)
Version 1 (GitHub): https://github.com/rsowers-dhc/covid19
Version 2 (AWS): https://aws.amazon.com/marketplace/pp/USA-Hospital-Beds-COVID-19-Definitive-Healthcare/prodview-yivxd2owkloha

Amazon Web Services (AWS) Data Lake with Public COVID-19 Datasets
Includes several datasets from this list, stored, updated, and ready for analysis on AWS.
https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/

U.S. State-Specific Projections for Hospital Resource Utilization
Model by the Institute for Health Metrics and Evaluation / University of Washington. Data and projections can be downloaded.
http://www.healthdata.org/covid/

WHO COVID-19 Data – Cases & Deaths in China (by province) and other countries
Coronavirus COVID-19 cumulative cases and deaths by province for China and aggregated by country for the rest of the world.
https://data.humdata.org/dataset/coronavirus-covid-19-cases-data-for-china-and-the-rest-of-the-world

ACAPS COVID-19: Government Measures Dataset
The COVID-19 Government Measures Dataset puts together all the measures implemented by governments worldwide in response to the coronavirus pandemic. Data collection includes secondary data review. The researched information falls into five categories: social distancing, movement restrictions, public health measures, social and economic measures, and human rights implications. Each category is broken down into several types of measures. ACAPS consulted sources from governments, media, the United Nations, and other organisations.
https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset

World Bank Indicators of Interest to the COVID-19 Outbreak
A collection in the World Bank data catalog containing datasets that may be useful for analysis, response, or modelling.
https://data.humdata.org/dataset/world-bank-indicators-of-interest-to-the-covid-19-outbreak

GenBank COVID-19 Genetic Sequences
SARS-CoV-2 (Severe acute respiratory syndrome coronavirus 2) Sequences.
https://www.ncbi.nlm.nih.gov/genbank/sars-cov-2-seqs/

Nextstrain COVID-19 Genomics Database
Genomic epidemiology of novel coronavirus
https://nextstrain.org/ncov

India COVID-19 Tracker
A tracker for the spread of COVID-19 in India
https://www.covid19india.org/