sat, 19-nov-2016, 15:50

Introduction

So far this winter we’ve gotten only 4.1 inches of snow, well below the normal 19.7 inches, and there is only 2 inches of snow on the ground. At this point last year we had 8 inches and I’d been biking and skiing on the trail to work for two weeks. In his North Pacific Temperature Update blog post, Richard James mentions that winters like this one, with a combined strongly positive Pacific Decadal Oscillation phase and strongly negative North Pacific Mode phase tend to be a “distinctly dry” pattern for interior Alaska. I don’t pretend to understand these large scale climate patterns, but I thought it would be interesting to look at snowfall and snow depth in years with very little mid-November snow. In other years like this one do we eventually get enough snow that the trails fill in and we can fully participate in winter sports like skiing, dog mushing, and fat biking?

Data

We will use daily data from the Global Historical Climate Data set for the Fairbanks International Airport station. Data prior to 1950 is excluded because of poor quality snowfall and snow depth data and because there’s a good chance that our climate has changed since then and patterns from that era aren’t a good model for the current climate in Alaska.

We will look at both snow depth and the cumulative winter snowfall.

Results

The following tables show the ten years with the lowest cumulative snowfall and snow depth values from 1950 to the present on November 18th.

Year	Cumulative Snowfall (inches)
1953	1.5
2016	4.1
1954	4.3
2014	6.0
2006	6.4
1962	7.5
1998	7.8
1960	8.5
1995	8.8
1979	10.2

Year	Snow depth (inches)
1953	1
1954	1
1962	1
2016	2
2014	2
1998	3
1964	3
1976	3
1971	3
2006	4

2016 has the second-lowest cumulative snowfall behind 1953 and is tied for second with 2014 for snow depth with 1953, 1954 and 1962 all having only 1 inch of snow on November 18th.

It also seems like recent years appear in these tables more frequently than would be expected. Grouping by decade and averaging cumulative snowfall and snow depth yields the pattern in the chart below. The error bars (not shown) are fairly large, so the differences between decades aren’t likely to be statistically significant, but there is a pattern of lower snowfall amounts in recent decades.

Decadal average cumulative snowfall and snow depth

Now let’s see what happened in those years with low snowfall and snow depth values in mid-November starting with cumulative snowfall. The following plot (and the subsequent snow depth plot) shows the data for the low-value years (and one very high snowfall year—1990), with each year’s data as a separate line. The smooth dark cyan line through the middle of each plot is the smoothed line through the values for all years; a sort of “average” snowfall and snow depth curve.

Cumulative snowfall, years with low snow on November 18

In all four mid-November low-snowfall years, the cumulative snowfall values remain below average throughout the winter, but snow did continue to fall as the season went on. Even the lowest winter year here, 2006–2007, still ended the winter with 15 inches of snow on the groud.

The following plot shows snow depth for the four years with the lowest snow depth on November 18th. The data is formatted the same as in the previous plot except we’ve jittered the values slightly to make the plot easier to read.

Snow depth, years with low snow on November 18

The pattern here is similar, but the snow depths get much closer to the average values. Snow depth for all four low snow years remain low throughout November, but start rising in December, dramatically in 1954 and 2014.

One of the highest snowfall years between 1950 and 2016 was 1990–1991 (shown on both plots). An impressive 32.8 inches of snow fell in eight days between December 21st and December 28th, accounting for the sharp increase in cumulative snowfall and snow depth shown on both plots. There are five years in the record where the cumulative total for the entire winter was lower than these eight days in 1990.

Conclusion

Despite the lack of snow on the ground to this point in the year, the record shows that we are still likely to get enough snow to fill in the trails. We may need to wait until mid to late December, but it’s even possible we’ll eventually reach the long term average depth before spring.

Appendix

Here’s the R code used to generate the statistics, tables and plots from this post:

library(tidyverse)
library(lubridate)
library(scales)
library(knitr)

noaa <- src_postgres(host="localhost", dbname="noaa")

snow <- tbl(noaa, build_sql(
   "WITH wdoy_data AS (
         SELECT dte, dte - interval '120 days' as wdte,
            tmin_c, tmax_c, (tmin_c+tmax_c)/2.0 AS tavg_c,
            prcp_mm, snow_mm, snwd_mm
         FROM ghcnd_pivot
         WHERE station_name = 'FAIRBANKS INTL AP'
         AND dte > '1950-09-01')
   SELECT dte, date_part('year', wdte) AS wyear, date_part('doy', wdte) AS wdoy,
         to_char(dte, 'Mon DD') AS mmdd,
         tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm
   FROM wdoy_data")) %>%
   mutate(wyear=as.integer(wyear),
            wdoy=as.integer(wdoy),
            snwd_mm=as.integer(snwd_mm)) %>%
   select(dte, wyear, wdoy, mmdd,
            tmin_c, tmax_c, tavg_c, prcp_mm, snow_mm, snwd_mm) %>% collect()

write_csv(snow, "pafa_data_with_wyear_post_1950.csv")
save(snow, file="pafa_data_with_wyear_post_1950.rdata")

cum_snow <- snow %>%
   mutate(snow_na=ifelse(is.na(snow_mm),1,0),
         snow_mm=ifelse(is.na(snow_mm),0,snow_mm)) %>%
   group_by(wyear) %>%
   mutate(snow_mm_cum=cumsum(snow_mm),
         snow_na=cumsum(snow_na)) %>%
   ungroup() %>%
   mutate(snow_in_cum=round(snow_mm_cum/25.4, 1),
         snwd_in=round(snwd_mm/25.4, 0))

nov_18_snow <- cum_snow %>%
   filter(mmdd=='Nov 18') %>%
   select(wyear, snow_in_cum, snwd_in) %>%
   arrange(snow_in_cum)

decadal_avg <- nov_18_snow %>%
   mutate(decade=as.integer(wyear/10)*10) %>%
   group_by(decade) %>%
   summarize(`Snow depth`=mean(snwd_in),
            snwd_sd=sd(snwd_in),
            `Cumulative Snowfall`=mean(snow_in_cum),
            snow_cum_sd=sd(snow_in_cum))

decadal_averages <- ggplot(decadal_avg %>%
                              gather(variable, value, -decade) %>%
                              filter(variable %in% c("Cumulative Snowfall",
                                                      "Snow depth")),
                           aes(x=as.factor(decade), y=value, fill=variable)) +
            theme_bw() +
            geom_bar(stat="identity", position="dodge") +
            scale_x_discrete(name="Decade", breaks=c(1950, 1960, 1970, 1980,
                                                   1990, 2000, 2010)) +
            scale_y_continuous(name="Inches", breaks=pretty_breaks(n=10)) +
            scale_fill_discrete(name="Measurement")

print(decadal_averages)

date_x_scale <- cum_snow %>%
   filter(grepl(' (01|15)', mmdd), wyear=='1994') %>%
   select(wdoy, mmdd)

cumulative_snowfall <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 2014, 2006, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snow_in_cum, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snow_in_cum),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Cumulative snowfall (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(cumulative_snowfall)

snow_depth <-
   ggplot(cum_snow %>% filter(wyear %in% c(1953, 1954, 1962, 2014, 1990),
                              wdoy>183,
                              wdoy<320),
            aes(x=wdoy, y=snwd_in, colour=as.factor(wyear))) +
   theme_bw() +
   geom_smooth(data=cum_snow %>% filter(wdoy>183, wdoy<320),
               aes(x=wdoy, y=snwd_in),
               size=0.5, colour="darkcyan",
               inherit.aes=FALSE,
               se=FALSE) +
   geom_line(position="jitter") +
   scale_x_continuous(name="",
                     breaks=date_x_scale$wdoy,
                     labels=date_x_scale$mmdd) +
   scale_y_continuous(name="Snow Depth (in)",
                     breaks=pretty_breaks(n=10)) +
   scale_color_discrete(name="Winter year")

print(snow_depth)

tags: R snowfall weather snow depth climate

Equinox Marathon Weather

tue, 13-sep-2016, 18:31

Introduction

Update: An update that includes 2016—2020 data is here.

Andrea and I are running the Equinox Marathon relay this Saturday with Norwegian dog musher Halvor Hoveid. He’s running the first leg, I’m running the second, and Andrea finishes the race. I ran the second leg as a training run a couple weeks ago and feel good about my physical conditioning, but the weather is always a concern this late in the fall, especially up on top of Ester Dome, where it can be dramatically different than the valley floor where the race starts and ends.

Andrea ran the full marathon in 2009—2012 and the relay in 2008 and 2013—2015. I ran the full marathon in 2013. There was snow on the trail when I ran it, making the out and back section slippery and treacherous, and the cold temperatures at the start meant my feet were frozen until I got off of the single-track, nine or ten miles into the course. In other years, rain turned the powerline section to sloppy mud, or cold temperatures and freezing rain up on the Dome made it unpleasant for runners and supporters.

In this post we will examine the available weather data, looking at the range of conditions we could experience this weekend. The current forecast from the National Weather Service is calling for mostly cloudy skies with highs in the 50s. Low temperatures the night before are predicted to be in the 40s, with rain in the forecast between now and then.

Methods

There is no long term climate data for Ester Dome, but there are several valley-level stations with data going back to the start of the race in 1963. The best data comes from the Fairbanks Airport station and includes daily temperature, precipitation, and snowfall for all years, and wind speed and direction since 1984. I also looked at the data from the College Observatory station (FAOA2) behind the GI on campus and the University Experimental Farm, also on campus, but neither of these stations have a complete record. The daily data is part of the Global Historical Climatology Network - Daily dataset.

I also have hourly data from 2008—2013 for both the Fairbanks Airport and a station located on Ester Dome that is no longer operational. We’ll use this to get a sense of what the possible temperatures on Ester Dome might have been based on the Fairbanks Airport data. Hourly data comes from the Meterological Assimilation Data Ingest System (MADIS).

The R code used for this post appears at the bottom, and all the data used is available from here.

Results

Ester Dome temperatures

Since there isn’t a long-running weather station on Ester Dome (at least not one that’s publicly available), we’ll use the September data from an hourly Ester Dome station that was operational until 2014. If we join the Fairbanks Airport station data with this data wherever the observations are within 30 minutes of each other, we can see the relationship between Ester Dome temperature and temperature at the Fairbanks Airport.

Here’s what that relationship looks like, including a linear regression line between the two. The shaded area in the lower left corner shows the region where the temperatures on Ester Dome are below freezing.

Ester Dome and Fairbanks Airport temperatures

And the regression:

##
## Call:
## lm(formula = ester_dome_temp_f ~ pafa_temp_f, data = pafa_fbsa)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.649 -3.618 -1.224  2.486 22.138
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.69737    0.77993  -3.458 0.000572 ***
## pafa_temp_f  0.94268    0.01696  55.567  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.048 on 803 degrees of freedom
## Multiple R-squared:  0.7936, Adjusted R-squared:  0.7934
## F-statistic:  3088 on 1 and 803 DF,  p-value: < 2.2e-16

The regression model is highly significant, as are both coefficients, and the relationship explains almost 80% of the variation in the data. According to the model, in the month of September, Ester Dome average temperature is almost three degrees colder than at the airport. And whenever temperature at the airport drops below 37 degrees, it’s probably below freezing on the Dome.

Race day weather

Temperatures at the airport on race day ranged from 19.9 °F in 1972 to 68 °F in 1969, and the range of average temperatures is 34.2 and 53 °F. Using our model of Ester Dome temperatures, we get an average range of 29.5 and 47 °F and an overall min / max of 16.1 / 61.4 °F. Generally speaking, in most years it will be below freezing on Ester Dome, but possibly before most of the runners get up there.

Precipitation (rain, sleet, or snow) has fallen on 15 out of 53 race days, or 28% of the time, and measurable snowfall has been recorded on four of those fifteen. The highest amount fell in 2014 with 0.36 inches of liquid precipitation (no snow was recorded and the temperatures were between 45 and 51 °F so it was almost certainly all rain, even on Ester Dome). More than a quarter of an inch of precipitation fell in three of the fifteen years (1990, 1992, and 2014), but most rainfall totals are much smaller.

Measurable snow fell at the airport in four years, or seven percent of the time: 4.1 inches in 1993, 2.1 inches in 1985, 1.2 inches in 1996 and 0.4 inches in 1992. But that’s at the airport station. Four of the 15 years where measurable precipitation fell at the airport, but no snow fell, had possible minimum temperatures on Ester Dome that were below freezing. It’s likely that some of the precipitation recorded at the airport in those years was coming down as snow up on Ester Dome. If so, that means snow may have fallen on eight race days, bringing the percentage up to fifteen percent.

Wind data from the airport has only been recorded since 1984, but from those years the average wind speed at the airport on race day is 4.9 miles per hour. Peak 2-minute winds during Equinox race day was 21 miles per hour in 2003. Unfortunately, no wind data is available for Ester Dome, but it’s likely to be higher than what is recorded at the airport. We do have wind speed data from the hourly Ester Dome station from 2008 through 2013, but the linear relationship between Ester Dome winds and winds at the Fairbanks airport only explain about a quarter of the variation in the data, and a look at the plot doesn’t give me much confidence in the relationship shown (see below).

Ester Dome and Fairbanks Airport wind speeds

Weather from the week prior

It’s also useful to look at the weather from the week before the race, since excessive pre-race rain or snow can make conditions on race day very different, even if the race day weather is pleasant. The year I ran the full marathon (2013), it had snowed the week before and much of the trail in the woods before the water stop near Henderson and all of the out and back were covered in snow.

The most dramatic example of this was 1992 where 23 inches of snow fell at the airport in the week prior to the race, with much higher totals up on the summit of Ester Dome. Measurable snow has been recorded at the airport in the week prior to six races, but all the weekly totals are under an inch except for the snow year of 1992.

Precipitation has fallen in 42 of 53 pre-race weeks (79% of the time). Three years have had more than an inch of precipitation prior to the race: 1.49 inches in 2015, 1.26 inches in 1992 (which fell as snow), and 1.05 inches in 2007. On average, just over two tenths of an inch of precipitation falls in the week before the race.

Summary

The following stacked plots shows the weather for all 53 runnings of the Equinox marathon. The top panel shows the range of temperatures on race day from the airport station (wide bars) and estimated on Ester Dome (thin lines below bars). The shaded area at the bottom shows where temperatures are below freezing. Dashed orange horizonal lines represent the average high and low temperature at the airport on race day; solid orange horizonal lines indicate estimated average high and low temperature on Ester Dome.

The middle panel shows race day liquid precipitation (rain, melted snow). Bars marked with an asterisk indicate years where snow was also recorded at the airport, but remember that four of the other years with liquid precipitation probably experienced snow on Ester Dome (1977, 1986, 1991, and 1994) because the temperatures were likely to be below freezing at elevation.

The bottom panel shows precipitation totals from the week prior to the race. Bars marked with an asterisk indicate weeks where snow was also recorded at the airport.

Here’s a table with most of the data from the analysis. Record values for each variable are in bold.

	Fairbanks Airport Station							Ester Dome (estimated)
	Race Day					Previous Week		Race Day
Date	min t	max t	wind	prcp	snow	prcp	snow	min t	max t
1963‑09‑21	32.0	54.0		0.00	0.0	0.01	0.0	27.5	48.2
1964‑09‑19	34.0	57.9		0.00	0.0	0.03	0.0	29.4	51.9
1965‑09‑25	37.9	60.1		0.00	0.0	0.80	0.0	33.0	54.0
1966‑09‑24	36.0	62.1		0.00	0.0	0.01	0.0	31.2	55.8
1967‑09‑23	35.1	57.9		0.00	0.0	0.00	0.0	30.4	51.9
1968‑09‑21	23.0	44.1		0.00	0.0	0.04	0.0	19.0	38.9
1969‑09‑20	35.1	68.0		0.00	0.0	0.00	0.0	30.4	61.4
1970‑09‑19	24.1	39.9		0.00	0.0	0.42	0.0	20.0	34.9
1971‑09‑18	35.1	55.9		0.00	0.0	0.14	0.0	30.4	50.0
1972‑09‑23	19.9	42.1		0.00	0.0	0.01	0.2	16.1	38.0
1973‑09‑22	30.0	44.1		0.00	0.0	0.05	0.0	25.6	38.9
1974‑09‑21	48.0	60.1		0.08	0.0	0.00	0.0	42.6	54.0
1975‑09‑20	37.9	55.9		0.02	0.0	0.02	0.0	33.0	50.0
1976‑09‑18	34.0	59.0		0.00	0.0	0.54	0.0	29.4	52.9
1977‑09‑24	36.0	48.9		0.06	0.0	0.20	0.0	31.2	43.4
1978‑09‑23	30.0	42.1		0.00	0.0	0.10	0.3	25.6	37.0
1979‑09‑22	35.1	62.1		0.00	0.0	0.17	0.0	30.4	55.8
1980‑09‑20	30.9	43.0		0.00	0.0	0.35	0.0	26.4	37.8
1981‑09‑19	37.0	43.0		0.15	0.0	0.04	0.0	32.2	37.8
1982‑09‑18	42.1	61.0		0.02	0.0	0.22	0.0	37.0	54.8
1983‑09‑17	39.9	46.9		0.00	0.0	0.05	0.0	34.9	41.5
1984‑09‑22	28.9	60.1	5.8	0.00	0.0	0.08	0.0	24.5	54.0
1985‑09‑21	30.9	42.1	6.5	0.14	2.1	0.57	0.0	26.4	37.0
1986‑09‑20	36.0	52.0	8.3	0.07	0.0	0.21	0.0	31.2	46.3
1987‑09‑19	37.9	61.0	6.3	0.00	0.0	0.00	0.0	33.0	54.8
1988‑09‑24	37.0	45.0	4.0	0.00	0.0	0.11	0.0	32.2	39.7
1989‑09‑23	36.0	61.0	8.5	0.00	0.0	0.07	0.5	31.2	54.8
1990‑09‑22	37.9	50.0	7.8	0.26	0.0	0.00	0.0	33.0	44.4
1991‑09‑21	36.0	57.0	4.5	0.04	0.0	0.03	0.0	31.2	51.0
1992‑09‑19	24.1	33.1	6.7	0.01	0.4	1.26	23.0	20.0	28.5
1993‑09‑18	28.0	37.0	4.9	0.29	4.1	0.37	0.3	23.7	32.2
1994‑09‑24	27.0	51.1	6.0	0.02	0.0	0.08	0.0	22.8	45.5
1995‑09‑23	43.0	66.9	4.0	0.00	0.0	0.00	0.0	37.8	60.4
1996‑09‑21	28.9	37.9	6.9	0.06	1.2	0.26	0.0	24.5	33.0
1997‑09‑20	27.0	55.0	3.8	0.00	0.0	0.03	0.0	22.8	49.2
1998‑09‑19	42.1	60.1	4.9	0.00	0.0	0.37	0.0	37.0	54.0
1999‑09‑18	39.0	64.9	3.8	0.00	0.0	0.26	0.0	34.1	58.5
2000‑09‑16	28.9	50.0	5.6	0.00	0.0	0.30	0.0	24.5	44.4
2001‑09‑22	33.1	57.0	1.6	0.00	0.0	0.00	0.0	28.5	51.0
2002‑09‑21	33.1	48.9	3.8	0.00	0.0	0.03	0.0	28.5	43.4
2003‑09‑20	26.1	46.0	9.6	0.00	0.0	0.00	0.0	21.9	40.7
2004‑09‑18	26.1	48.0	4.3	0.00	0.0	0.25	0.0	21.9	42.6
2005‑09‑17	37.0	63.0	0.9	0.00	0.0	0.09	0.0	32.2	56.7
2006‑09‑16	46.0	64.0	4.3	0.00	0.0	0.00	0.0	40.7	57.6
2007‑09‑22	25.0	45.0	4.7	0.00	0.0	1.05	0.0	20.9	39.7
2008‑09‑20	34.0	51.1	4.5	0.00	0.0	0.08	0.0	29.4	45.5
2009‑09‑19	39.0	50.0	5.8	0.00	0.0	0.25	0.0	34.1	44.4
2010‑09‑18	35.1	64.9	2.5	0.00	0.0	0.00	0.0	30.4	58.5
2011‑09‑17	39.9	57.9	1.3	0.00	0.0	0.44	0.0	34.9	51.9
2012‑09‑22	46.9	66.9	6.0	0.00	0.0	0.33	0.0	41.5	60.4
2013‑09‑21	24.3	44.1	5.1	0.00	0.0	0.13	0.6	20.2	38.9
2014‑09‑20	45.0	51.1	1.6	0.36	0.0	0.00	0.0	39.7	45.5
2015‑09‑19	37.9	44.1	2.9	0.01	0.0	1.49	0.0	33.0	38.9

Postscript

The weather for the 2016 race was just about perfect with temperatures ranging from 34 to 58 °F and no precipitation during the race. The airport did record 0.01 inches for the day, but this fell in the evening, after the race had finished.

Appendix: R code

 library(dplyr)
 library(readr)
 library(lubridate)
 library(ggplot2)
 library(scales)
 library(grid)
 library(gtable)

 race_dates <- read_fwf("equinox_marathon_dates.rst", skip=5, n_max=54,
                        fwf_positions(c(4, 6), c(9, 19), c("number", "race_date")))

 noaa <- src_postgres(host="localhost", dbname="noaa")
 # pivot <- tbl(noaa, build_sql("SELECT * FROM ghcnd_pivot
 #                               WHERE station_name = 'UNIVERSITY EXP STN'"))
 # pivot <- tbl(noaa, build_sql("SELECT * FROM ghcnd_pivot
 #                               WHERE station_name = 'COLLEGE OBSY'"))
 pivot <- tbl(noaa, build_sql("SELECT * FROM ghcnd_pivot
                               WHERE station_name = 'FAIRBANKS INTL AP'"))

 race_day_wx <- pivot %>%
     inner_join(race_dates, by=c("dte"="race_date"), copy=TRUE) %>%
     collect() %>%
     mutate(tmin_f=round((tmin_c*9/5.0)+32, 1), tmax_f=round((tmax_c*9/5.0)+32, 1),
            prcp_in=round(prcp_mm/25.4, 2),
            snow_in=round(snow_mm/25.4, 1), snwd_in=round(snow_mm/25.4, 1),
            awnd_mph=round(awnd_mps*2.2369, 1),
            wsf2_mph=round(wsf2_mps*2.2369), 1) %>%
     select(number, race_date, tmin_f, tmax_f, prcp_in, snow_in,
            snwd_in, awnd_mph, wsf2_mph)

 week_before_race_day_wx <- pivot %>%
     mutate(year=date_part("year", dte)) %>%
     inner_join(race_dates %>%
                    mutate(year=year(race_date)),
                copy=TRUE) %>%
     collect() %>%
     mutate(tmin_f=round((tmin_c*9/5.0)+32, 1), tmax_f=round((tmax_c*9/5.0)+32, 1),
            prcp_in=round(prcp_mm/25.4, 2),
            snow_in=round(snow_mm/25.4, 1), snwd_in=round(snow_mm/25.4, 1),
            awnd_mph=round(awnd_mps*2.2369, 1), wsf2_mph=round(wsf2_mps*2.2369, 1)) %>%
     select(number, year, race_date, dte, prcp_in, snow_in) %>%
     mutate(week_before=race_date-days(7)) %>%
     filter(dte<race_date, dte>=week_before) %>%
     group_by(number, year, race_date) %>%
     summarize(pweek_prcp_in=sum(prcp_in),
               pweek_snow_in=sum(snow_in))

 all_wx <- race_day_wx %>%
     inner_join(week_before_race_day_wx) %>%
     mutate(tavg_f=(tmin_f+tmax_f)/2.0,
            snow_label=ifelse(snow_in>0, '*', NA),
            pweek_snow_label=ifelse(pweek_snow_in>0, '*', NA)) %>%
     select(number, year, race_date, tmin_f, tmax_f, tavg_f,
            prcp_in, snow_in, snwd_in, awnd_mph, wsf2_mph,
            pweek_prcp_in, pweek_snow_in,
            snow_label, pweek_snow_label);

 write_csv(all_wx, "all_wx.csv")

 madis <- src_postgres(host="localhost", dbname="madis")

 pafa_fbsa <- tbl(madis,
                  build_sql("
   WITH pafa AS (
     SELECT dt_local, temp_f, wspd_mph
     FROM observations
     WHERE station_id = 'PAFA' AND date_part('month', dt_local) = 9),
   fbsa AS (
     SELECT dt_local, temp_f, wspd_mph
     FROM observations
     WHERE station_id = 'FBSA2' AND date_part('month', dt_local) = 9)
   SELECT pafa.dt_local, pafa.temp_f AS pafa_temp_f, pafa.wspd_mph as pafa_wspd_mph,
     fbsa.temp_f AS ester_dome_temp_f, fbsa.wspd_mph as ester_dome_wspd_mph
   FROM pafa
     INNER JOIN fbsa ON
       pafa.dt_local BETWEEN fbsa.dt_local - interval '15 minutes'
         AND fbsa.dt_local + interval '15 minutes'")) %>% collect()

 write_csv(pafa_fbsa, "pafa_fbsa.csv")

 ester_dome_temps <- lm(data=pafa_fbsa,
                        ester_dome_temp_f ~ pafa_temp_f)

 summary(ester_dome_temps)
 # Model and coefficients are significant, r2 = 0.794
 # intercept = -2.69737, slope = 0.94268

 all_wx_with_ed <- all_wx %>%
   mutate(ed_min_temp_f=round(ester_dome_temps$coefficients[1]+
                              tmin_f*ester_dome_temps$coefficients[2], 1),
          ed_max_temp_f=round(ester_dome_temps$coefficients[1]+
                              tmax_f*ester_dome_temps$coefficients[2], 1))

 make_gt <- function(outside, instruments, chamber, width, heights) {
     gt1 <- ggplot_gtable(ggplot_build(outside))
     gt2 <- ggplot_gtable(ggplot_build(instruments))
     gt3 <- ggplot_gtable(ggplot_build(chamber))
     max_width <- unit.pmax(gt1$widths[2:3], gt2$widths[2:3], gt3$widths[2:3])
     gt1$widths[2:3] <- max_width
     gt2$widths[2:3] <- max_width
     gt3$widths[2:3] <- max_width
     gt <- gtable(widths = unit(c(width), "in"), heights = unit(heights, "in"))
     gt <- gtable_add_grob(gt, gt1, 1, 1)
     gt <- gtable_add_grob(gt, gt2, 2, 1)
     gt <- gtable_add_grob(gt, gt3, 3, 1)

     gt
 }

temps <- ggplot(data=all_wx_with_ed, aes(x=year, ymin=tmin_f, ymax=tmax_f, y=tavg_f)) +
   # geom_abline(intercept=32, slope=0, color="blue", alpha=0.25) +
   geom_rect(data=all_wx_with_ed %>% head(n=1),
            aes(xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=32),
            fill="darkcyan", alpha=0.25) +
   geom_abline(aes(slope=0,
                  intercept=mean(all_wx_with_ed$tmin_f)),
               color="darkorange", alpha=0.50, linetype=2) +
   geom_abline(aes(slope=0,
                  intercept=mean(all_wx_with_ed$tmax_f)),
               color="darkorange", alpha=0.50, linetype=2) +
   geom_abline(aes(slope=0,
                  intercept=mean(all_wx_with_ed$ed_min_temp_f)),
               color="darkorange", alpha=0.50, linetype=1) +
   geom_abline(aes(slope=0,
                  intercept=mean(all_wx_with_ed$ed_max_temp_f)),
               color="darkorange", alpha=0.50, linetype=1) +
   geom_linerange(aes(ymin=ed_min_temp_f, ymax=ed_max_temp_f)) +
   # geom_smooth(method="lm", se=FALSE) +
   geom_linerange(size=3, color="grey30") +
   scale_x_continuous(name="", limits=c(1963, 2015), breaks=seq(1963, 2015, 2)) +
   scale_y_continuous(name="Temperature (deg F)", breaks=pretty_breaks(n=10)) +
   theme_bw() +
   theme(plot.margin=unit(c(1, 1, 0, 0.5), 'lines')) +  # t, r, b, l
   theme(axis.text.x=element_blank(), axis.title.x=element_blank(),
         axis.ticks.x=element_blank(), panel.grid.minor.x=element_blank()) +
   ggtitle("Weather during and in the week prior to the Equinox Marathon
            Fairbanks Airport Station")

 prcp <- ggplot(data=all_wx, aes(x=year, y=prcp_in)) +
     geom_bar(stat="identity") +
     geom_text(aes(y=prcp_in+0.025, label=snow_label)) +
     scale_x_continuous(name="", limits=c(1963, 2015), breaks=seq(1963, 2015)) +
     scale_y_continuous(name="Precipitation (inches)", breaks=pretty_breaks(n=5)) +
     theme_bw() +
     theme(plot.margin=unit(c(0, 1, 0, 0.5), 'lines')) +  # t, r, b, l
     theme(axis.text.x=element_blank(), axis.title.x=element_blank(),
           axis.ticks.x=element_blank(), panel.grid.minor.x=element_blank())

 pweek_prcp <- ggplot(data=all_wx, aes(x=year, y=pweek_prcp_in)) +
     geom_bar(stat="identity") +
     geom_text(aes(y=pweek_prcp_in+0.1, label=pweek_snow_label)) +
     scale_x_continuous(name="", limits=c(1963, 2015), breaks=seq(1963, 2015)) +
     scale_y_continuous(name="Pre-week precip (inches)", breaks=pretty_breaks(n=5)) +
     theme_bw() +
     theme(plot.margin=unit(c(0, 1, 0.5, 0.5), 'lines'),
           axis.text.x=element_text(angle=45, hjust=1, vjust=1),
           panel.grid.minor.x=element_blank())

 rescale <- 0.75
 full_plot <- make_gt(temps, prcp, pweek_prcp,
                      16*rescale,
                      c(7.5*rescale, 2.5*rescale, 3.0*rescale))
 pdf("equinox_weather_grid.pdf", height=13*rescale, width=16*rescale)
 grid.newpage()
 grid.draw(full_plot)
 dev.off()

 fai_ed_temps <- ggplot(data=pafa_fbsa, aes(x=pafa_temp_f, y=ester_dome_temp_f)) +
   geom_rect(data=pafa_fbsa %>% head(n=1),
               aes(xmin=-Inf, ymin=-Inf, xmax=(32+2.69737)/0.94268, ymax=32),
               color="black", fill="darkcyan", alpha=0.25) +
   geom_point(position=position_jitter()) +
   geom_smooth(method="lm", se=FALSE) +
   scale_x_continuous(name="Fairbanks Airport Temperature (degrees F)") +
   scale_y_continuous(name="Ester Dome Temperature (degrees F)") +
   theme_bw() +
   ggtitle("Relationship between Fairbanks Airport and Ester Dome Temperatures
           September, 2008-2013")

 pdf("pafa_fbsa_sept_temps.pdf", height=10.5, width=10.5)
 print(fai_ed_temps)
 dev.off()

 fai_ed_wspds <- ggplot(data=pafa_fbsa, aes(x=pafa_wspd_mph, y=ester_dome_wspd_mph)) +
   geom_point(position=position_jitter()) +
   geom_smooth(method="lm", se=FALSE) +
   scale_x_continuous(name="Fairbanks Airport Wind Speed (MPH)") +
   scale_y_continuous(name="Ester Dome Wind (MPH)") +
   theme_bw() +
   ggtitle("Relationship between Fairbanks Airport and Ester Dome Wind Speeds
           September, 2008-2013")

 pdf("pafa_fbsa_sept_wspds.pdf", height=10.5, width=10.5)
 print(fai_ed_wspds)
 dev.off()

tags: Equinox Marathon running weather

Earliest 80+ degree daily maximum temperature in Fairbanks

fri, 13-may-2016, 06:02

This morning’s weather forecast:

SUNNY. HIGHS IN THE UPPER 70S TO LOWER 80S. LIGHT WINDS.

May 13th seems very early in the year to hit 80 degrees in Fairbanks, so I decided to check it out. What I’m doing here is selecting all the dates where the temperature is above 80°F, then ranking those dates by year and date, and extracting the “winner” for each year (where rank is 1).

WITH warm AS (
   SELECT extract(year from dte) AS year, dte,
      c_to_f(tmax_c) AS tmax_f
   FROM ghcnd_pivot
   WHERE station_name = 'FAIRBANKS INTL AP'
      AND c_to_f(tmax_c) >= 80.0),
ranked AS (
   SELECT year, dte, tmax_f,
      row_number() OVER (PARTITION BY year
                         ORDER BY dte) AS rank
   FROM warm)
SELECT dte,
   extract(doy from dte) AS doy,
   round(tmax_f, 1) as tmax_f
FROM ranked
WHERE rank = 1
ORDER BY doy;

And the results:

Earliest 80 degree dates, Fairbanks Airport
Date	Day of year	High temperature (°F)
1995-05-09	129	80.1
1975-05-11	131	80.1
1942-05-12	132	81.0
1915-05-14	134	80.1
1993-05-16	136	82.0
2002-05-20	140	80.1
2015-05-22	142	80.1
1963-05-22	142	84.0
1960-05-23	144	80.1
2009-05-24	144	80.1
…	…	…

If we hit 80°F today, it’ll be the fourth earliest day of year to hit that temperature since records started being kept in 1904.

Update: We didn’t reach 80°F on the 13th, but got to 82°F on May 14th, tied with that date in 1915 for the fourth earliest 80 degree temperature.

tags: Fairbanks temperature weather climate

Winter Flooding on Goldstream Creek

sun, 14-feb-2016, 10:02

Since the middle of 2010 we’ve been monitoring the level of Goldstream Creek for the National Weather Service by measuring the distance from the top of our bridge to the surface of the water or ice. In 2012 the Creek flooded and washed the bridge downstream. We eventually raised the bridge logs back up onto the banks and resumed our measurements.

This winter the Creek had been relatively quiet, with the level hovering around eight feet below the bridge. But last Friday, we awoke to more than four feet of water over the ice, and since then it's continued to rise. This morning’s reading had the ice only 3.17 feet below the surface of the bridge.

//media.swingleydev.com/img/blog/2016/02/bridge_1024.jpg

Overflow within a few feet of the bridge

Water also entered the far side of the slough, and is making it’s way around the loop, melting the snow covering the old surface. Even as the main channel stops rising and freezes, water moves closer to the dog yard from the slough.

//media.swingleydev.com/img/blog/2016/02/slough_1024.jpg

Water entering the slough

One of my longer commutes to work involves riding east on the Goldstream Valley trails, crossing the Creek by Ballaine Road, then riding back toward the house on the north side of the Creek. From there, I can cross Goldstream Creek again where the trail at the end of Miller Hill Road and the Miller Hill Extension trail meet, and ride the trails the rest of the way to work. That crossing is also covered with several feet of water and ice.

//media.swingleydev.com/img/blog/2016/02/mh_crossing_1024.jpg

Trail crossing at the end of Miller Hill

Yesterday one of my neighbors sent email with the subject line, “Are we doomed?,” so I took a look at the heigh data from past years. The plot below shows the height of the Creek, as measured from the surface of the bridge (click on the plot to view or download a PDF, R code used to generate the plot appears at the bottom of this post).

The orange region is the region where the Creek is flowing; between my reporting of 0% ice in spring and 100% ice-covered in fall. The data gap in July 2014 was due to the flood washing the bridge downstream. Because the bridge isn’t in the same location, the height measurements before and after the flood aren’t completely comparable, but I don’t have the data for the difference in elevation between the old and new bridge locations, so this is the best we’ve got.

//media.swingleydev.com/img/blog/2016/02/creek_heights_2010-2016_by_year.svgz

Creek heights by year

The light blue line across all the plots shows the current height of the Creek (3.17 feet) for all years of data. 2012 is probably the closest year to our current situation where the Creek rose to around five feet below the bridge in early January. But really nothing is completely comparable to the situation we’re in right now. Breakup won’t come for another two or three months, and in most years, the Creek rises several feet between February and breakup.

Time will tell, of course, but here’s why I’m not too worried about it. There’s another bridge crossing several miles downstream, and last Friday there was no water on the surface, and the Creek was easily ten feet below the banks. That means that there is a lot of space within the banks of the Creek downstream that can absorb the melting water as breakup happens. I also think that there is a lot of liquid water trapped beneath the ice on the surface in our neighborhood and that water is likely to slowly drain out downstream, leaving a lot of empty space below the surface ice that can accommodate further overflow as the winter progresses. In past years of walking on the Creek I’ve come across huge areas where the top layer of ice dropped as much as six feet when the water underneath drained away. I’m hoping that this happens here, with a lot of the subsurface water draining downstream.

The Creek is always reminding us of how little we really understand what’s going on and how even a small amount of flowing water can become a huge force when that water accumulates more rapidly than the Creek can handle it. Never a dull moment!

Code

library(readr)
library(dplyr)
library(tidyr)
library(lubridate)
library(ggplot2)
library(scales)

wxcoder <- read_csv("data/wxcoder.csv", na=c("-9999"))
feb_2016_incomplete <- read_csv("data/2016_02_incomplete.csv",
                                na=c("-9999"))

wxcoder <- rbind(wxcoder, feb_2016_incomplete)

wxcoder <- wxcoder %>%
   transmute(dte=as.Date(ymd(DATE)), tmin_f=TN, tmax_f=TX, tobs_f=TA,
             tavg_f=(tmin_f+tmax_f)/2.0,
             prcp_in=ifelse(PP=='T', 0.005, as.numeric(PP)),
             snow_in=ifelse(SF=='T', 0.05, as.numeric(SF)),
             snwd_in=SD, below_bridge_ft=HG,
             ice_cover_pct=IC)

creek <- wxcoder %>% filter(dte>as.Date(ymd("2010-05-27")))

creek_w_year <- creek %>%
   mutate(year=year(dte),
         doy=yday(dte))

ice_free_date <- creek_w_year %>%
   group_by(year) %>%
   filter(ice_cover_pct==0) %>%
   summarize(ice_free_dte=min(dte), ice_free_doy=min(doy))

ice_covered_date <- creek_w_year %>%
   group_by(year) %>%
   filter(ice_cover_pct==100, doy>182) %>%
   summarize(ice_covered_dte=min(dte), ice_covered_doy=min(doy))

flowing_creek_dates <- ice_free_date %>%
   inner_join(ice_covered_date, by="year") %>%
   mutate(ymin=Inf, ymax=-Inf)

latest_obs <- creek_w_year %>%
   mutate(rank=rank(desc(dte))) %>%
   filter(rank==1)

current_height_df <- data.frame(
      year=c(2011, 2012, 2013, 2014, 2015, 2016),
      below_bridge_ft=latest_obs$below_bridge_ft)

q <- ggplot(data=creek_w_year %>% filter(year>2010),
            aes(x=doy, y=below_bridge_ft)) +
   theme_bw() +
   geom_rect(data=flowing_creek_dates %>% filter(year>2010),
             aes(xmin=ice_free_doy, xmax=ice_covered_doy, ymin=ymin, ymax=ymax),
             fill="darkorange", alpha=0.4,
             inherit.aes=FALSE) +
   # geom_point(size=0.5) +
   geom_line() +
   geom_hline(data=current_height_df,
              aes(yintercept=below_bridge_ft),
              colour="darkcyan", alpha=0.4) +
   scale_x_continuous(name="",
                      breaks=c(1,32,60,91,
                               121,152,182,213,
                               244,274,305,335,
                               365),
                      labels=c("Jan", "Feb", "Mar", "Apr",
                               "May", "Jun", "Jul", "Aug",
                               "Sep", "Oct", "Nov", "Dec",
                               "Jan")) +
   scale_y_reverse(name="Creek height, feet below bridge",
                   breaks=pretty_breaks(n=5)) +
   facet_wrap(~ year, ncol=1)

width <- 16
height <- 16
rescale <- 0.75
pdf("creek_heights_2010-2016_by_year.pdf",
    width=width*rescale, height=height*rescale)
print(q)
dev.off()
svg("creek_heights_2010-2016_by_year.svg",
    width=width*rescale, height=height*rescale)
print(q)
dev.off()

tags: Goldstream Creek R weather bridge flood

Average daily temperature calculation, CRN data

sat, 25-apr-2015, 10:21

Introduction

One of the best sources of weather data in the United States comes from the National Weather Service's Cooperative Observer Network (COOP), which is available from NCDC. It's daily data, collected by volunteers at more than 10,000 locations. We participate in this program at our house (station id DW1454 / GHCND:USC00503368), collecting daily minimum and maximum temperature, liquid precipitation, snowfall and snow depth. We also collect river heights for Goldstream Creek as part of the Alaska Pacific River Forecast Center (station GSCA2). Traditionally, daily temperature measurements were collecting using a minimum maximum thermometer, which meant that the only way to calculate average daily temperature was by averaging the minimum and maximum temperature. Even though COOP observers typically have an electronic instrument that could calculate average daily temperature from continuous observations, the daily minimum and maximum data is still what is reported.

In an earlier post we looked at methods used to calculate average daily temperature, and if there are any biases present in the way the National Weather Service calculates this using the average of the minimum and maximum daily temperature. We looked at five years of data collected at my house every five minutes, comparing the average of these temperatures against the average of the daily minimum and maximum. Here, we will be repeating this analysis using data from the Climate Reference Network stations in the United States.

The US Climate Reference Network is a collection of 132 weather stations that are properly sited, maintained, and include multiple redundant measures of temperature and precipitation. Data is available from http://www1.ncdc.noaa.gov/pub/data/uscrn/products/ and includes monthly, daily, and hourly statistics, and sub-hourly (5-minute) observations. We’ll be focusing on the sub-hourly data, since it closely matches the data collected at my weather station.

A similar analysis using daily and hourly CRN data appears here.

Getting the raw data

I downloaded all the data using the following Unix commands:

$ wget http://www1.ncdc.noaa.gov/pub/data/uscrn/products/stations.tsv
$ wget -np -m http://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/
$ find www1.ncdc.noaa.gov/ -type f -name 'CRN*.txt' -exec gzip {} \;

The code to insert all of this data into a database can be found here. Once inserted, I have a table named crn_stations that has the station data, and one named crn_subhourly with the five minute observation data.

Methods

Once again, we’ll use R to read the data, process it, and produce plots.

Libraries

Load the libraries we need:

library(dplyr)
library(lubridate)
library(ggplot2)
library(scales)
library(grid)

Connect to the database and load the data tables.

noaa_db <- src_postgres(dbname="noaa", host="mason")

crn_stations <- tbl(noaa_db, "crn_stations") %>%
    collect()

crn_subhourly <- tbl(noaa_db, "crn_subhourly")

Remove observations without temperature data, group by station and date, calculate average daily temperature using the two methods, remove any daily data without a full set of data, and collect the results into an R data frame. This looks very similar to the code used to analyze the data from my weather station.

crn_daily <-
    crn_subhourly %>%
        filter(!is.na(air_temperature)) %>%
        mutate(date=date(timestamp)) %>%
        group_by(wbanno, date) %>%
        summarize(t_mean=mean(air_temperature),
                  t_minmax_avg=(min(air_temperature)+
                                max(air_temperature))/2.0,
                  n=n()) %>%
        filter(n==24*12) %>%
        mutate(anomaly=t_minmax_avg-t_mean) %>%
        select(wbanno, date, t_mean, t_minmax_avg, anomaly) %>%
        collect()

The two types of daily average temperatures are calculated in this step:

summarize(t_mean=mean(air_temperature),
            t_minmax_avg=(min(air_temperature)+
                        max(air_temperature))/2.0)

Where t_mean is the value calculated from all 288 five minute observations, and t_minmax_avg is the value from the daily minimum and maximum.

Now we join the observation data with the station data. This attaches station information such as the name and latitude of the station to each record.

crn_daily_stations <-
    crn_daily %>%
        inner_join(crn_stations, by="wbanno") %>%
        select(wbanno, date, state, location, latitude, longitude,
               t_mean, t_minmax_avg, anomaly)

Finally, save the data so we don’t have to do these steps again.

save(crn_daily_stations, file="crn_daily_averages.rdata")

Results

Here are the overall results of the analysis.

summary(crn_daily_stations$anomaly)

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
## -11.9000  -0.1028   0.4441   0.4641   1.0190  10.7900

The average anomaly across all stations and all dates is 0.44 degrees Celsius (0.79 degrees Farenheit). That’s a pretty significant error. Half the data is between −0.1 and 1.0°C (−0.23 and +1.8°F) and the full range is −11.9 to +10.8°C (−21.4 to +19.4°F).

Plots

Let’s look at some plots.

Raw data by latitude

To start, we’ll look at all the anomalies by station latitude. The plot only shows one percent of the actual anomalies because plotting 512,460 points would take a long time and the general pattern is clear from the reduced data set.

set.seed(43)
p <- ggplot(data=crn_daily_stations %>% sample_frac(0.01),
            aes(x=latitude, y=anomaly)) +
    geom_point(position="jitter", alpha="0.2") +
    geom_smooth(method="lm", se=FALSE) +
    theme_bw() +
    scale_x_continuous(name="Station latitude", breaks=pretty_breaks(n=10)) +
    scale_y_continuous(name="Temperature anomaly (degrees C)",
                       breaks=pretty_breaks(n=10))

print(p)

The clouds of points show the differences between the min/max daily average and the actual daily average temperature, where numbers above zero represent cases where the min/max calculation overestimates daily average temperature. The blue line is the fit of a linear model relating latitude with temperature anomaly. We can see that the anomaly is always positive, averaging around half a degree at lower latitudes and drops somewhat as we proceed northward. You also get a sense from the actual data of how variable the anomaly is, and at what latitudes most of the stations are found.

Here are the regression results:

summary(lm(anomaly ~ latitude, data=crn_daily_stations))

##
## Call:
## lm(formula = anomaly ~ latitude, data = crn_daily_stations)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.3738  -0.5625  -0.0199   0.5499  10.3485
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.7403021  0.0070381  105.19   <2e-16 ***
## latitude    -0.0071276  0.0001783  -39.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9632 on 512458 degrees of freedom
## Multiple R-squared:  0.00311,    Adjusted R-squared:  0.003108
## F-statistic:  1599 on 1 and 512458 DF,  p-value: < 2.2e-16

The overall model and coefficients are highly significant, and show a slight decrease in the positive anomaly as we move farther north. Perhaps this is part of the reason why the analysis of my station (at a latitude of 64.89) showed an average anomaly close to zero (−0.07°C / −0.13°F).

Anomalies by month and latitude

One of the results of our earlier analysis was a seasonal pattern in the anomalies at our station. Since we also know there is a latitudinal pattern, in the data, let’s combine the two, plotting anomaly by month, and faceting by latitude.

Station latitude are binned into groups for plotting, and the plots themselves show the range that cover half of all anomalies for that latitude category × month. Including the full range of anomalies in each group tends to obscure the overall pattern, and the plot of the raw data didn’t show an obvious skew to the rarer anomalies.

Here’s how we set up the data frames for the plot.

crn_daily_by_month <-
    crn_daily_stations %>%
        mutate(month=month(date),
               lat_bin=factor(ifelse(latitude<30, '<30',
                                     ifelse(latitude>60, '>60',
                                            paste(floor(latitude/10)*10,
                                                  (floor(latitude/10)+1)*10,
                                                  sep='-'))),
                              levels=c('<30', '30-40', '40-50',
                                       '50-60', '>60')))

summary_stats <- function(l) {
    s <- summary(l)
    data.frame(min=s['Min.'],
               first=s['1st Qu.'],
               median=s['Median'],
               mean=s['Mean'],
               third=s['3rd Qu.'],
               max=s['Max.'])
}

crn_by_month_lat_bin <-
    crn_daily_by_month %>%
        group_by(month, lat_bin) %>%
        do(summary_stats(.$anomaly)) %>%
        ungroup()

station_years <-
    crn_daily_by_month %>%
        mutate(year=year(date)) %>%
        group_by(wbanno, lat_bin) %>%
        summarize() %>%
        group_by(lat_bin) %>%
        summarize(station_years=n())

And the plot itself. At the end, we’re using a function called facet_adjust, which adds x-axis tick labels to the facet on the right that wouldn't ordinarily have them. The code comes from this stack overflow post.

p <- ggplot(data=crn_by_month_lat_bin,
            aes(x=month, ymin=first, ymax=third, y=mean)) +
    geom_hline(yintercept=0, alpha=0.2) +
    geom_hline(data=crn_by_month_lat_bin %>%
                        group_by(lat_bin) %>%
                        summarize(mean=mean(mean)),
               aes(yintercept=mean), colour="darkorange", alpha=0.5) +
    geom_pointrange() +
    facet_wrap(~ lat_bin, ncol=3) +
    geom_text(data=station_years, size=4,
              aes(x=2.25, y=-0.5, ymin=0, ymax=0,
                  label=paste('n =', station_years))) +
    scale_y_continuous(name="Range including 50% of temperature anomalies") +
    scale_x_discrete(breaks=1:12,
                     labels=c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')) +
    theme_bw() +
    theme(axis.text.x=element_text(angle=45, hjust=1, vjust=1.25),
          axis.title.x=element_blank())
facet_adjust(p)

Each plot shows the range of anomalies from the first to the third quartile (50% of the observed anomalies) by month, with the dot near the middle of the line at the mean anomaly. The orange horizontal line shows the overall mean anomaly for that latitude category, and the count at the bottom of the plot indicates the number of “station years” for that latitude category.

It’s clear that there are seasonal patterns in the differences between the mean daily temperature and the min/max estimate. But each plot looks so different from the next that it’s not clear if the patterns we are seeing in each latitude category are real or artificial. It is also problematic that three of our latitude categories have very little data compared with the other two. It may be worth performing this analysis in a few years when the lower and higher latitude stations have a bit more data.

Conclusion

This analysis shows that there is a clear bias in using the average of minimum and maximum daily temperature to estimate average daily temperature. Across all of the CRN stations, the min/max estimator overestimates daily average temperature by almost a half a degree Celsius (0.8°F).

We also found that this error is larger at lower latitudes, and that there are seasonal patterns to the anomalies, although the seasonal patterns don’t seem to have clear transitions moving from lower to higher latitudes.

The current length of the CRN record is quite short, especially for the sub-hourly data used here, so the patterns may not be representative of the true situation.

tags: R temperature weather climate CRN COOP ggplot

<< 0 1 2 3 4 5 6 7 8 9 10 11 12 13 >>