Searching for the number of COVID-19 cases in NYC schools in my neighborhood shouldn’t be this difficult.
In a sea of dashboards and trackers following COVID-19's path, the lack of aggregated data for schools in the state of New York led me to create my own.
In creating this new dashboard, my goal is to help parents in the Empire State easily access the data and identify the neighborhoods and specific pre-K through Grade 12 schools most affected by the virus.
Drawing on my knowledge of Python and R, along with a keen interest in Data Science/Business Analytics, I was able to solve this ‘Data Problem’ and can now provide the public with an easy-to-navigate dashboard of aggregated statistics for New York State and New York City.
The (Data) Problem
As part of Governor Cuomo’s Executive Order signed on September 8th, which was designed to ensure the proper daily collection and reporting of COVID-19 data, New York State launched its “COVID-19 Report Card,” a dashboard designed to track “real-time COVID-19 infections and testing operations of every New York school and school district.”
The Report Card includes positive tests by region and individual data on each of the 6,300+ schools in the state of New York. However, data on individual schools must be looked up separately in each of the following six categories, each with its own search function:
- Public Schools
- Charter Schools
- Private Schools
- BOCES Programs
- Higher Education Institutions
- State Universities
After selecting a category on the Report Card, you can search for only one school at a time. As a result, seeing the impact of the virus on the school system in a particular zip code is cumbersome and time-consuming.
The Report Card also includes a summary page that breaks down the number of positive tests by ten broad geographical regions of New York State, by type of school (Public or Private/Charter), by students vs. teachers/staff, and by whether the tests were administered on-site or off-site.
For parents interested in tracking the virus at colleges and universities, several other sites provide detailed analyses of cases across the US, including an excellent tracker published by the New York Times, “Tracking the Coronavirus at U.S. Colleges and Universities.”
Unfortunately, until now there has been no equivalent for elementary, middle, and high schools in New York State.
This article and the new dashboard I have made available to the public focus on three of the categories listed above: Public, Charter, and Private schools in New York State, effectively capturing the roughly 3.3 million students (Pre-K through Grade 12), teachers, and staff at the state’s 6,300+ schools.
As of this writing, over 28,780¹ people, or 0.86%² of New York State’s school population, had tested positive for COVID-19 since September 8th, 2020, when tracking began. These figures include students, teachers, and other staff tested both on-site and off-site. By comparison, new cases across New York State’s overall population over the same period (i.e., from September 8th to December 14th) amounted to approximately 1.81%³ of that population.
The New York City School District, the largest school system in the United States, includes public and charter schools and has over 1.1 million students in more than 1,800 separate schools. Adding private schools (independent, religious, etc.) brings the total to over 2,600 schools with more than 1.3 million students. Across these schools, over 7,050¹ cases have been reported since the beginning of the school year, a rate of 0.53%². By comparison, new cases across all of New York City over the same period amounted to 1.56%³ of the city’s population.
While both the NYS and NYC school figures compare favorably with the corresponding statistics for the overall population, the rate of new cases in the school population continues to rise, as the following chart illustrates. Note that the week ending November 27th included only three days of data because of the Thanksgiving holiday.
Over the past seven days alone, cases within New York State’s school system increased by 5,981, an increase of roughly 26%.
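The school and population percentages above are defined differently (see the footnotes): the school rate is cumulative school cases since September 8th as a share of the school population, while the statewide figure is the change in cumulative cases over the same window as a share of the overall population. A minimal sketch of both calculations follows; the function names are hypothetical, and the inputs in the example are round placeholder numbers, not the Report Card or CDC data.

```python
def school_case_rate(cumulative_cases: int, school_population: int) -> float:
    """Cumulative school cases since Sept. 8 as a percent of the on-site
    and off-site school population (footnote 2's definition)."""
    if school_population <= 0:
        raise ValueError("school_population must be positive")
    return 100.0 * cumulative_cases / school_population


def incremental_population_rate(cases_at_start: int, cases_at_end: int,
                                population: int) -> float:
    """Change in cumulative cases between two dates as a percent of the
    overall population (footnote 3's definition)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * (cases_at_end - cases_at_start) / population


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(f"{school_case_rate(500, 100_000):.2f}%")                       # 0.50%
    print(f"{incremental_population_rate(1_000, 2_500, 100_000):.2f}%")  # 1.50%
```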
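The 26% figure appears to be the week’s new cases divided by the cumulative total seven days earlier. That reading is an assumption, but it reproduces the number: 5,981 / (28,780 − 5,981) ≈ 26.2%. A short sketch of the arithmetic, with a hypothetical helper name:

```python
def week_over_week_increase(cumulative_now: int, new_cases_this_week: int) -> float:
    """Percent increase in cumulative cases relative to the total seven days earlier."""
    cumulative_week_ago = cumulative_now - new_cases_this_week
    if cumulative_week_ago <= 0:
        raise ValueError("cumulative total a week earlier must be positive")
    return 100.0 * new_cases_this_week / cumulative_week_ago


# Figures quoted above: about 28,780 cumulative cases, 5,981 of them in the past week.
print(f"{week_over_week_increase(28_780, 5_981):.1f}%")  # 26.2%
```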
Steps to solving the data problem
First off, I should note that I am not a programmer by trade. As you can see from my LinkedIn profile, I have spent most of my career in the investment management industry as a fundamental equity analyst and portfolio manager, roles that involve spending a significant amount of time analyzing data. And while the spreadsheet has been the investment community’s primary tool for well over 30 years, Data Science and tools like Python and R are becoming an increasingly critical part of the process.
Creating the dashboard
The underlying data for each of the 6,300+ schools in New York State can be found on the state’s website (i.e., the Report Card). To analyze the aggregate data, I first needed to (legally) “scrape,” or extract, the available data from the site for all the individual public, charter, and private schools in the state using Python. The data was then cleaned, processed, matched with geo-coordinates, organized, and aggregated daily to explore trends across the different regions. As the last step, the data was exported into a curated dashboard I built with RStudio’s Shiny, so the public can now easily access and visualize the relevant information.
The underlying data from New York State is usually published each school-day morning with data from the end of the prior school day, and my dashboard is updated shortly after.
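The cleaning and daily aggregation steps described above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not necessarily how the code on GitHub does it: the field names (`school_name`, `zip_code`, `report_date`, `positive_cases`), the `SchoolRecord` class, and the ZIP-to-region lookup are all hypothetical, and the lookup stands in for the geo-coordinate matching step.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SchoolRecord:
    # Hypothetical schema; the Report Card's real fields may be named differently.
    school_name: str
    zip_code: str
    report_date: str  # ISO date string, e.g. "2020-12-14"
    positive_cases: int


def clean(raw_rows):
    """Normalize scraped rows and drop any row without a numeric case count."""
    records = []
    for row in raw_rows:
        cases = str(row.get("positive_cases", "")).strip()
        if not cases.isdecimal():
            continue  # missing or non-numeric count: skip the row
        records.append(
            SchoolRecord(
                school_name=row["school_name"].strip(),
                # Keep the five-digit ZIP (drops any ZIP+4 suffix).
                zip_code=row["zip_code"].strip()[:5].zfill(5),
                report_date=row["report_date"].strip(),
                positive_cases=int(cases),
            )
        )
    return records


def aggregate_by_region(records, zip_to_region):
    """Sum positive cases per (report_date, region).

    zip_to_region is a hypothetical ZIP-to-region lookup standing in for the
    geo-coordinate matching step; unmatched ZIP codes are grouped as "Unknown".
    """
    totals = defaultdict(int)
    for rec in records:
        region = zip_to_region.get(rec.zip_code, "Unknown")
        totals[(rec.report_date, region)] += rec.positive_cases
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not Report Card figures.
    rows = [
        {"school_name": " School A ", "zip_code": "10001",
         "report_date": "2020-12-14", "positive_cases": "3"},
        {"school_name": "School B", "zip_code": "10001-1234",
         "report_date": "2020-12-14", "positive_cases": "2"},
        {"school_name": "School C", "zip_code": "11201",
         "report_date": "2020-12-14", "positive_cases": ""},
    ]
    print(aggregate_by_region(clean(rows), {"10001": "Manhattan", "11201": "Brooklyn"}))
    # {('2020-12-14', 'Manhattan'): 5}
```

Keying the totals on (date, region) keeps each day’s snapshot separate, which is what makes it possible to chart trends over time rather than only the latest cumulative count.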
A more detailed description of the process and the underlying Python code can be found on my GitHub repository.
¹ Data as of Monday, Dec 14th, 2020, at 04:09 PM. Source: New York State COVID-19 Report Card.
² Rate defined as the number of cumulative cases since Sept. 8, 2020, as a % of the total on-site and off-site school population in a given region. Underlying data sourced from the New York State COVID-19 Report Card.
³ Source: CDC, using the change in cumulative cases between Sept. 8 and Dec. 14, 2020, as a percent of the overall population.