Contents

1 - Statistical bootstrapping - developed statistical assessments of climate models against observations (2020-2022, academic work)

2 - Neural network regression - linked multiple modes of climate variability to Arctic sea ice (2022-2023, academic work)

3 - Database: building, querying, analyzing - learning SQL and Go by building a database of personal running data, then querying, analyzing, and visualizing it (2023-present, personal project)

1) Statistical Bootstrapping

Aim: To compare interannual variability in climate models and observational data for Arctic sea ice concentration. 

Problem: The observational record is short, and models have coarse resolution. Statistical bootstrapping is required to generate a sufficiently long time series to enable comparisons.

Methods: Resampled observed and modeled sea ice concentration anomalies 10,000 times (Fig 1.1), then compared the resulting distributions of standard deviations to determine whether models and observations were consistent.
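The resampling idea can be sketched in a few lines of Python. This is a minimal illustration, not the published method: it assumes simple resampling with replacement of a 1-D anomaly series, uses synthetic data, and treats overlapping 95% percentile intervals as the consistency test. The function names `bootstrap_std` and `intervals_overlap` are hypothetical.

```python
import numpy as np


def bootstrap_std(anomalies, n_resamples=10_000, seed=0):
    """Resample a 1-D anomaly series with replacement and return the
    standard deviation of each resampled series."""
    rng = np.random.default_rng(seed)
    anomalies = np.asarray(anomalies, dtype=float)
    n = anomalies.size
    # Each row of idx holds the indices that make up one resampled series.
    idx = rng.integers(0, n, size=(n_resamples, n))
    return anomalies[idx].std(axis=1, ddof=1)


def intervals_overlap(a, b, lower=2.5, upper=97.5):
    """Assumed consistency test: do the central percentile intervals of
    two bootstrap distributions overlap?"""
    lo_a, hi_a = np.percentile(a, [lower, upper])
    lo_b, hi_b = np.percentile(b, [lower, upper])
    return bool(lo_a <= hi_b and lo_b <= hi_a)


# Synthetic stand-ins for one grid cell and month (not real data).
rng = np.random.default_rng(42)
obs = rng.normal(0.0, 1.0, size=40)    # e.g. 40 years of observed anomalies
model = rng.normal(0.0, 1.1, size=40)  # e.g. 40 years from one model member

obs_sd = bootstrap_std(obs)
model_sd = bootstrap_std(model, seed=1)
print("consistent:", intervals_overlap(obs_sd, model_sd))
```

In practice the same calculation would be repeated for every month and location, which is where tools such as Dask and Xarray become useful.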

Conclusions: In general, models agree well with observations, but because no model is within observational uncertainty for all months and locations, choosing the right model for a given task is crucial (Fig 1.2). Consistency between models and observations is a relatively low bar because observational uncertainty is high; if this uncertainty were reduced, models would likely be considered biased for more regions and seasons.

Tools: Python - Dask, SciPy, Xarray, Matplotlib, NumPy, CartoPy. Shell scripting - Cheyenne supercomputer, Climate Data Operators, data downloading.

Journal article: Wyburn-Powell et al. (2022) Modeled Interannual Variability of Arctic Sea Ice Cover is within Observational Uncertainty. DOI:10.1175/JCLI-D-21-0958.1.

Published data: At the Arctic Data Center - DOI:10.18739/A2H98ZF3T

Published code: synthetic-ensemble GitHub repo, archived with Zenodo, DOI:10.5281/zenodo.6687725.

Figure 1.1 - Resampling methodology for the observations

Figure 1.2 - Consistency between models and observations

2) Neural network regression

Aim: Link climate variability modes to regional Arctic sea ice anomalies. Assess the effect of each of the modes at different lag times and for different regions. 

Problem: Sparse data and, until recently, small ensembles of climate models have not provided sufficient training data to detect the small signal of multiple climate modes' remote Arctic impacts.

Methods: Reduced dimensionality by using a subset of climate variability modes as input features. Regressed these features, at different lag times, onto regional Arctic sea ice concentration anomalies. Selected the best of 4 ML models of differing complexity. Removed the worst-performing variables sequentially. Retained only the relationships that increase validation r² by more than 0.2 relative to a persistence forecast.
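The lagged-feature regression and the persistence comparison can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: it swaps the PyTorch models for ordinary least squares in NumPy (the simplest linear case), uses synthetic data, and assumes a persistence forecast means "predict the previous time step's value". The names `lagged_features` and `r2_score` are hypothetical helpers.

```python
import numpy as np


def lagged_features(modes, lags):
    """Stack mode values at each lag into a feature matrix.

    modes: array of shape (n_time, n_modes)
    lags:  positive integer lags, in time steps
    Row i of the result corresponds to target time t = i + max(lags).
    """
    modes = np.asarray(modes, dtype=float)
    max_lag = max(lags)
    n_time = modes.shape[0]
    cols = [modes[max_lag - lag : n_time - lag] for lag in lags]
    return np.hstack(cols)


def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: two modes; sea ice responds to mode 0 at lag 2.
rng = np.random.default_rng(0)
n_time, lags = 500, [1, 2, 3]
max_lag = max(lags)
modes = rng.normal(size=(n_time, 2))
sea_ice = np.zeros(n_time)
sea_ice[2:] = -0.8 * modes[:-2, 0]
sea_ice += 0.3 * rng.normal(size=n_time)

X = lagged_features(modes, lags)
y = sea_ice[max_lag:]
persistence = sea_ice[max_lag - 1 : -1]  # previous value of the target

# Chronological split: first 80% for training, the rest for validation.
split = int(0.8 * len(y))
A_train = np.column_stack([X[:split], np.ones(split)])
coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
A_val = np.column_stack([X[split:], np.ones(len(y) - split)])

r2_model = r2_score(y[split:], A_val @ coef)
r2_persist = r2_score(y[split:], persistence[split:])

# Keep the relationship only if it beats persistence by more than 0.2.
keep = bool(r2_model - r2_persist > 0.2)
print(f"model r2={r2_model:.2f}, persistence r2={r2_persist:.2f}, keep={keep}")
```

The fitted coefficients play the same role as the linear coefficients in Fig 2.2: each one links a single mode at a single lag to the sea ice anomaly.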

Conclusions: The dominant climate variability modes are the global surface temperature anomaly and the Nino 3.4 Index, which have strong negative/positive correlations with regional Arctic sea ice (Fig 2.1). Despite the many nonlinearities in the climate system, including nonlinearities in our regression model is not necessary to achieve high predictability, at least with the limited data available (Fig 2.2).

Tools: Python - PyTorch, SciPy, NumPy, Pandas, Xarray, Matplotlib. Shell scripting - frequently incorporating Climate Data Operators software on the Cheyenne supercomputer.

Journal article: Large-scale Climate Modes Drive Low-Frequency Regional Arctic Sea Ice Variability. Preprint https://doi.org/10.31223/X56D59

Data archiving: in progress

Code: low-frequency-variability GitHub repo

Figure 2.1 - Comparison of 4 ML methods' validation coefficients of determination with lag time.

Figure 2.2 - Linear coefficients linking specific climate variability modes with sea ice concentration anomalies in the Chukchi Sea in October, by climate model.

3) Analyze running data with a database

Aim: Learn and practice several skills that my time as a PhD student did not cover. These include accessing data through an API, coding in a new language (Go), working with new data formats such as JSON, building a database in MySQL and running it in Docker, querying the database with SQL, and analyzing the data with a new ML method, e.g. clustering.

Code: GitHub repo; also collaborated on this GitHub repo

Phase 1: Summer 2023 - Learning Go and SQL, building and querying a database

Phase 2: Fall 2023 - Building dynamic content for this website, using JavaScript and Tableau, and automating monthly updates with cloud services.


Figure 3 - An example activity from Strava that will form part of the database and analysis. The left shows the web page for the activity and the right shows the data from Strava's API in JSON format.
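The JSON-to-database step can be sketched as below. The project itself uses Go and MySQL; this illustration swaps in Python with the standard-library sqlite3 module so that it runs without a database server. The activity records are made up, and the field names (`id`, `name`, `type`, `distance`, `moving_time`, `start_date`) are assumed to resemble those in the API response rather than copied from it. The table layout and the helper names `load_activities` and `run_summary` are hypothetical.

```python
import json
import sqlite3

# Made-up activities; distance in meters, moving_time in seconds (assumed units).
SAMPLE_JSON = """
[
  {"id": 1, "name": "Morning Run", "type": "Run", "distance": 5000.0,
   "moving_time": 1500, "start_date": "2023-06-01T13:00:00Z"},
  {"id": 2, "name": "Long Run", "type": "Run", "distance": 16000.0,
   "moving_time": 5600, "start_date": "2023-06-04T14:30:00Z"},
  {"id": 3, "name": "Recovery Ride", "type": "Ride", "distance": 20000.0,
   "moving_time": 3000, "start_date": "2023-06-05T23:15:00Z"}
]
"""


def load_activities(conn, raw_json):
    """Parse a JSON array of activities and insert them into a table."""
    activities = json.loads(raw_json)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS activities (
               id INTEGER PRIMARY KEY,
               name TEXT,
               type TEXT,
               distance_m REAL,
               moving_time_s INTEGER,
               start_date TEXT
           )"""
    )
    # Named placeholders are filled from each activity dictionary.
    conn.executemany(
        "INSERT OR REPLACE INTO activities "
        "VALUES (:id, :name, :type, :distance, :moving_time, :start_date)",
        activities,
    )
    conn.commit()
    return len(activities)


def run_summary(conn):
    """Return (number of runs, total km, average pace in min/km)."""
    return conn.execute(
        """SELECT COUNT(*),
                  SUM(distance_m) / 1000.0,
                  (SUM(moving_time_s) / 60.0) / (SUM(distance_m) / 1000.0)
           FROM activities
           WHERE type = 'Run'"""
    ).fetchone()


conn = sqlite3.connect(":memory:")
n_loaded = load_activities(conn, SAMPLE_JSON)
n_runs, total_km, pace_min_per_km = run_summary(conn)
print(f"{n_loaded} activities loaded; {n_runs} runs, "
      f"{total_km:.1f} km, {pace_min_per_km:.2f} min/km")
```

Using the activity id as the primary key with INSERT OR REPLACE means a repeated monthly update would overwrite existing rows instead of duplicating them.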