Posts for Tag: data

Wildfire smoke impacts solar panel generation

Posted In: Energy | Environment

On September 9th, 2020, across the entire San Francisco Bay Area, we had a crazy combination of wildfire smoke and low clouds that darkened the sky and turned everything orange. At 9am it looked like nighttime, and at noon it was so dark that it looked like dusk.

Here is a plot of 8+ years of solar panel generation from our panels. If you click on the legend, you can toggle whether that data is shown. Total generation for the day was only 93 watt-hours (as opposed to a summer median of 13,300 watt-hours, or 13.3 kWh) and peak power was only 32 watts (vs a median summer peak of 2,000 watts, or 2.0 kW).

The solar generation was even worse than the next worst day in winter (typically when it rains all day). Clicking on the legend will toggle whether certain seasons are shown and you can view how solar generation varies by season.

Here is a google image search of photos showing the crazy, apocalyptic scenes with the orange color.

Source and Tools:
Data on solar generation is downloaded from our solar panel inverter provider (Enphase) and cleaned with a Python script. Graph is made using the Plotly open source JavaScript library.

2020 Stock Market Drop Compared to other Bear Markets

Posted In: Economics

2020’s stock market drop was unprecedented for the speed of the drop and also the speed of the recovery

This graph shows the stock market drops from 2020 and other bear markets, normalized so that the peak is at 100% at day 0. This lets you compare the severity and duration of different bear markets, including the Great Depression (1929), the Dot Com Bust (2000), the Financial Crisis (2008) and other drops over 30%.
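The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the script behind the graph: the function name and the prices are made up, and it assumes the series starts on or before the peak.

```python
def normalize_to_peak(prices):
    """Rescale a price series so the peak is 100% and the peak day is day 0.

    `prices` is a list of daily closing prices that starts on or before the peak.
    Returns a list of (days_since_peak, percent_of_peak) tuples from the peak onward.
    """
    peak_index = max(range(len(prices)), key=lambda i: prices[i])
    peak_price = prices[peak_index]
    return [
        (i - peak_index, 100.0 * price / peak_price)
        for i, price in enumerate(prices)
        if i >= peak_index
    ]


# Made-up closing prices, for illustration only (not real S&P500 data).
example_prices = [3300.0, 3380.0, 3200.0, 2700.0, 2366.0, 2900.0]
curve = normalize_to_peak(example_prices)
max_drawdown = 100.0 - min(pct for _, pct in curve)
print(curve[0])                # (0, 100.0)
print(round(max_drawdown, 1))  # 30.0
```

Putting every bear market on the same "percent of peak" scale is what makes drops from very different price levels directly comparable.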

The coronavirus pandemic has significantly disrupted the global economy. Q2 GDP in the United States declined at an annualized rate of 32% and US unemployment reached 15% due to coronavirus-induced business shutdowns.

However, the stock market (represented by the S&P500 index), which dropped in late February and early March 2020, has somewhat surprisingly rebounded and reached a new all-time high in August 2020, even as unemployment and GDP output have continued to falter. There certainly seems to be a disconnect between the fundamentals of the economy and the stock market.

Will the recovery in the stock markets continue or will it begin to align more closely with the fundamentals of the economy?

There are many proposed reasons why this disconnect is happening. One is the Federal Reserve’s actions to increase liquidity and prop up the stock market. Another is the heavy weighting of tech in the S&P500 and the pandemic’s boost to many tech companies’ businesses (e.g. Amazon, Zoom, Apple). Whatever the reason, the question of whether the market can continue at this pace or will have a correction is important and one to watch.

Data for the S&P500 price is daily from 1950 onward but before 1950, the data I had available was on a monthly basis. I interpolated this monthly data to create daily data, so not all the data is 100% accurate for any given day before 1950. Data for 2020 will continue to be updated daily.
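As a rough sketch of how monthly values can be turned into daily ones, the snippet below uses straight-line (linear) interpolation between consecutive monthly points. Linear interpolation is an assumption here, since the interpolation method is not specified above, and the function name and values are made up for illustration.

```python
from datetime import date, timedelta


def interpolate_daily(monthly_points):
    """Linearly interpolate (date, price) points to one value per calendar day.

    Linear interpolation is an assumption; the actual method used is not stated.
    """
    daily = []
    for (d0, p0), (d1, p1) in zip(monthly_points, monthly_points[1:]):
        span = (d1 - d0).days
        for offset in range(span):
            fraction = offset / span
            daily.append((d0 + timedelta(days=offset), p0 + fraction * (p1 - p0)))
    daily.append(monthly_points[-1])  # include the final known point
    return daily


# Made-up monthly values, for illustration only.
monthly = [(date(1949, 1, 1), 15.0), (date(1949, 2, 1), 14.0), (date(1949, 3, 1), 14.7)]
daily = interpolate_daily(monthly)
print(len(daily))  # 60: 31 days in January, 28 in February, plus March 1
print(daily[0], daily[-1])
```

Interpolated values only pass through the known monthly points, so any peak or trough that fell between two monthly observations is smoothed away. That is why the pre-1950 curves are approximate on any given day.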

Source and Tools:
Data on historical S&P500 prices is from Yahoo! Finance and downloaded and cleaned with a Python script. Graph is made using the Plotly open source JavaScript library.

US Postal Service vs Private Delivery

Posted In: Government

The US Postal Service mail volume is enormous and can’t easily be replaced by private delivery services

The US Postal Service (USPS) has been getting a good deal of press recently because of Trump’s attacks on the security of mail-in voting and recent moves by political appointees to reduce the capability of the agency to deliver mail in a timely fashion. These changes reportedly include removing mail sorting equipment and changing overtime hours.

Some have suggested privatizing the postal service but currently the volume of mail and packages through private delivery services is far smaller than that carried by the federal agency.

Note that the USPS carries about 55 billion pieces of first class mail annually out of the reported 143 billion pieces of total mail.

Source and Tools:
Data on FedEx, UPS and Amazon deliveries is from this theverge.com article. Data for the USPS comes from usps.com. Graph is made using the Plotly open source JavaScript library.

How much will masks reduce coronavirus transmission rate R0?

Posted In: Health

It depends on their effectiveness and how many people wear them

R0 is the transmission rate which is defined as the average number of cases that are expected to be produced from a single case in an uninfected population. R0 is dependent on a number of different factors that include transmissibility of a disease (how infectious it is), the amount of social contact and the duration of social contact. We have learned that variants of the coronavirus (such as delta or omicron) can greatly influence the transmissibility of the disease.

A baseline level of social contact is related to the population density (how often you come into contact with other people) and social distancing (limiting gatherings, not going in to work or school, etc) will reduce the amount of social contact with different people. Given what we know about coronavirus and its transmission, the amount of “contact” can also be influenced by mask wearing. This interactive graph shows the effect of mask wearing and effectiveness on reducing R0 even further. Because the effectiveness of existing vaccines against Omicron is as yet unknown, this visualization does not take into account vaccines and their effectiveness in reducing R0, which is a very important limitation.

A very important caveat to this visualization: This visualization was initially created before COVID-19 vaccines were available and does not currently take their ability to prevent infection (and lower R0) into account because the effectiveness of each vaccine differs and the protection against infection wanes over time.

This graph is a work-in-progress so please feel free to provide suggestions and feedback on issues of scientific concepts as well as for improvements in conveying the concepts/ideas.

Methodology

R0 values for different regions and population densities are estimated from Youyang Gu’s machine learning model for spread in Feb and early-March (i.e. before social distancing and mask wearing).

Baseline R0,variant based on variant transmissibility – the R0 value ranges from an early estimate of 8 for Omicron to 5 for Delta and 2.5 for the original strain.

Population density factor (PDF) – this can increase or decrease the R0 value based on how much close contact you have. It ranges from about 2.4 in very high density places like New York City with lots of transit use where you are in close contact with other people for long periods of time to 0.8 in rural areas with much less contact. A value of 1 represents average US population density.

Social distancing factor (SDF) – this is simply a reduction on the baseline R0 based on the amount of social distancing, ranging from 100% (no social distancing) to 33% (high levels of social distancing). This is a reduction in the amount of time and number of people the average person is exposed to compared to baseline levels.

Mask effectiveness (Kmaskeff) – is defined as the percentage reduction in transmission of coronavirus that mask wearing can provide. An N95 mask is at least 95% effective at blocking most particles, but because it also reduces the speed at which your exhalation can travel outward (providing more time for droplets and aerosols to spread and diffuse to low concentration), an N95 can be much more than 95% effective in reducing coronavirus droplet and aerosol spread compared to the unmasked case. I’ve seen estimates for things like bandanas and homemade cloth masks having lower effectiveness, maybe around 50%, but I don’t know how scientifically they were estimated/calculated. How masks are worn can also affect the effectiveness parameter. For example, if an N95 mask does not fit tightly against the face and there are large gaps for air to flow, this will reduce the effectiveness of the mask. This parameter is shown on the x-axis.

Percent wearing masks (Kmaskfreq) – is simply the percentage of people wearing masks (varies from 0% to 100%). This parameter is shown on the y-axis.

The formula for the effective transmission value Reff is:

$R_\mathit{eff}=R_{0,variant} \times PDF \times SDF \times (1-K_{mask\mathit{eff}} \times K_{maskfreq})^2$

where $R_\mathit{eff}$ is the final average transmission value, $R_{0,variant}$ is the $R_0$ value based on the coronavirus variant type, PDF is the population density factor, SDF is the social distancing factor, $K_{mask\mathit{eff}}$ is the average mask effectiveness and $K_{maskfreq}$ is the percentage of people wearing masks. The squared parameter on the right side of the equation is essentially the average reduction in transmission that is likely due to mask usage and is from a preprint from Howard et al.
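To make the formula concrete, here is a small Python sketch that evaluates it directly. The function and argument names are made up for illustration (the calculations in the tool itself are done in javascript), and percentages are passed as fractions between 0 and 1.

```python
def effective_r(r0_variant, pdf, sdf, mask_eff, mask_freq):
    """R_eff = R0_variant * PDF * SDF * (1 - K_maskeff * K_maskfreq)^2.

    mask_eff and mask_freq are fractions between 0 and 1.
    sdf is also a fraction, where 1.0 means no social distancing.
    """
    for name, value in (("mask_eff", mask_eff), ("mask_freq", mask_freq)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    mask_factor = (1.0 - mask_eff * mask_freq) ** 2
    return r0_variant * pdf * sdf * mask_factor


# Baseline R0 of 2.5, average density (PDF = 1), no distancing (SDF = 1),
# and 50%-effective masks worn by 80% of people.
print(round(effective_r(2.5, 1.0, 1.0, 0.5, 0.8), 2))  # 0.9
```

In this example the mask term alone is (1 - 0.4)^2 = 0.36, which is enough to bring a baseline of 2.5 down to 0.9, just under the threshold of 1.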

As you move up and to the right of the graph, mask use and effectiveness become very high and the transmission of coronavirus declines significantly. If you hover over the graph (on a desktop) or click on the graph (on mobile) you will see a popup that shows the Reff value that results. The lower the Reff value is the better as it dramatically affects the rate of transmission. High numbers will lead to explosive exponential growth while values below 1.0 will eventually reduce coronavirus transmissions to near 0.

For example, at an R0 of 6 with no social distancing or mask usage, one initial case can lead to approximately 56,000 cases in only 30 days, whereas an Reff of 0.5 will lead to a total of only ~1 additional case in 30 days.
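These figures can be reproduced with a simple generation-by-generation sum. The sketch below assumes one new generation of cases every 5 days (so 6 generations in 30 days); that interval is an assumption chosen because it matches the numbers above, not something stated in the post, and the function name is hypothetical.

```python
def cumulative_cases(r, days=30, generation_days=5):
    """Total cases, including the initial one, after `days`.

    Assumes each case infects `r` others once per generation, with one
    generation every `generation_days` days (an assumed 5-day interval).
    """
    generations = days // generation_days
    return sum(r ** g for g in range(generations + 1))


print(cumulative_cases(6))        # 55987, i.e. roughly 56,000
print(cumulative_cases(0.5) - 1)  # 0.984375 additional cases, i.e. about 1
```

The sum is a geometric series, which is why values above 1 explode while values below 1 level off at a small total.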

I am not an epidemiologist so some of the linear relationships and assumptions may be incorrect. Please let me know if I got anything terribly wrong or if you have any questions or suggestions on how the tool works, is structured or presented.

Source and Tools:
The reduction in R0 due to mask effectiveness and usage is based on a model from a preprint from Howard et al. Baseline R0 values are from Youyang Gu’s machine learning model. Calculations are done in JavaScript and visualization is done with the open source Plotly JavaScript graphing library.

Number of Electoral Votes by State in the 59 US Presidential Elections

Posted In: Elections

How many electoral votes did each state have across two centuries of elections?

This animation shows the number of electoral votes each state had during each of the 59 presidential elections in US history between 1788 and 2020. It’s interesting to see the number of US states and their relative population sizes (in terms of electoral votes) over many different presidential elections. The population is counted every 10 years in the census, so an election that falls between two censuses will likely not see any change in the number of electoral votes unless something else happens (such as the addition of a new state to the country).

Instructions
You can use the slider to control the election year to focus on a specific election and toggle the animation by hitting the Start/Stop button. Hovering over each state will tell you the number of electoral votes and the percentage of the total number of electoral votes in that election.

In the elections during and immediately after the US Civil War, we also see some states whose electoral votes for president are not counted (shown in purple). Wyoming, the state with the lowest population in the US, has the highest number of electoral votes per person, while the three most populous states, California, Florida and Texas, have the fewest electoral votes per person. Wyoming has about four times as many electors per capita as these three states. That will be the subject of another map dataviz.
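The per-capita comparison is simple arithmetic, sketched below. The electoral vote counts (2020 election) and populations (approximate 2020 census figures) are filled in here for illustration and are not taken from the post or its dataset, so treat them as assumptions to check against the source data.

```python
# state: (electoral votes in the 2020 election, approximate 2020 census population)
# These figures are filled in for illustration and are not taken from the post.
states = {
    "Wyoming": (3, 576_851),
    "California": (55, 39_538_223),
    "Texas": (38, 29_145_505),
    "Florida": (29, 21_538_187),
}


def votes_per_million(votes, population):
    """Electoral votes per one million residents."""
    return votes / population * 1_000_000


by_state = {name: votes_per_million(*data) for name, data in states.items()}
for name, value in by_state.items():
    ratio = by_state["Wyoming"] / value
    print(f"{name}: {value:.2f} votes per million residents (Wyoming is {ratio:.1f}x)")
```

With these inputs Wyoming comes out a little under four times the per-capita figure of each of the three large states.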

Here is another map that resizes the US states (i.e. shrinks or grows them) based on their number of electoral votes so that each state’s electoral power is reflected in its size.

Sources and Tools:

Data on the number of electoral votes by state for each election is from Wikipedia. The visualization was created using JavaScript and the open source Leaflet JavaScript mapping library.

Bay Area Coronavirus Cases

Posted In: Health

Compare the Bay Area coronavirus cases with Los Angeles and the rest of California

I wanted to better understand the coronavirus situation in my home region, the Bay Area, and I hadn’t seen any good resources that compared what was happening here to other regions in California. So I decided to make this graph. This page will be updated daily so you can come back regularly to see how the situation is changing (and hopefully improving sometime soon).

The coronavirus lockdowns began in mid-March 2020 and things began opening up in late May, which corresponded to an uptick in coronavirus cases in the Bay Area and throughout California. While the cases in the Bay Area are increasing, it’s clear that there’s a big difference between the Bay Area and much of the rest of California. Los Angeles is currently leading the state with a large increase in the number of new cases in June as the economy tries to reopen restaurants, bars, gyms and other businesses.

You can toggle between coronavirus cases and deaths and look at the absolute numbers or on a per capita basis (per one million inhabitants). California has 39.5 million residents, while greater LA has 18.7 million residents and the Bay Area has 7.7 million residents. The daily data is shown as well as a five day moving average so you can get a better sense of the trends.
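The per capita and moving average calculations can be sketched as follows. The population figures are the ones quoted above; the daily case counts are made up, the function names are hypothetical, and the use of a trailing (rather than centered) five-day window is an assumption.

```python
POPULATION = {"California": 39_500_000, "Greater LA": 18_700_000, "Bay Area": 7_700_000}


def per_million(count, region):
    """Convert an absolute count to a count per one million inhabitants."""
    return count / POPULATION[region] * 1_000_000


def moving_average(values, window=5):
    """Trailing moving average; early days average over the values available so far."""
    averaged = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averaged.append(sum(chunk) / len(chunk))
    return averaged


# Made-up daily new-case counts, for illustration only.
daily_cases = [100, 120, 90, 150, 140, 160, 130]
smoothed = moving_average(daily_cases)
print(smoothed[4])                                     # 120.0
print(round(per_million(smoothed[4], "Bay Area"), 1))  # 15.6
```

Dividing by population is what makes a region of 7.7 million comparable to one of 18.7 million, and the moving average damps day-to-day reporting noise.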

The San Francisco Bay Area was among the first regions to impose restrictions on gatherings and encourage people to stay home to fight the virus. In late February, the city of San Francisco declared an emergency in preparation for the upcoming pandemic and by early March, it became clear that life would not continue on as before.

The Bay Area is defined as the nine-county region consisting of Alameda, Contra Costa, Marin, Napa, San Francisco, San Mateo, Santa Clara, Solano and Sonoma counties.

Greater Los Angeles is defined as the five-county region consisting of Los Angeles, Orange, Ventura, San Bernardino and Riverside counties.
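Given county-level data, the two regions defined above can be totaled with a simple membership check. This is a minimal sketch with made-up case counts and a simplified (county, cases) row layout, not the actual processing script.

```python
BAY_AREA = {"Alameda", "Contra Costa", "Marin", "Napa", "San Francisco",
            "San Mateo", "Santa Clara", "Solano", "Sonoma"}
GREATER_LA = {"Los Angeles", "Orange", "Ventura", "San Bernardino", "Riverside"}


def region_total(rows, counties):
    """Sum cases over (county, cases) rows whose county belongs to `counties`."""
    return sum(cases for county, cases in rows if county in counties)


# Made-up single-day case counts, for illustration only.
rows = [("Alameda", 40), ("Marin", 5), ("Los Angeles", 300), ("Orange", 80), ("Fresno", 25)]
print(region_total(rows, BAY_AREA))    # 45
print(region_total(rows, GREATER_LA))  # 380
```

Counties outside both regions (Fresno in this example) count toward the California total only.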

Data and Tools:
County-level data on coronavirus cases and deaths is from the New York Times GitHub repository. Data is processed in Python and JavaScript and graphed using the Plotly open source graphing library.