Contents

1 - Statistical bootstrapping - developed statistical assessments comparing climate models and observations (2020-2022, academic work)

2 - Neural network regression - linked multiple modes of climate variability to Arctic sea ice (2022-2023, academic work)

3 - Database: building, querying, analyzing - an opportunity to learn SQL, Spark, Go, JavaScript, and AWS by building a database of personal running data, then querying, analyzing, and visualizing it (2023-present, personal project)

1) Statistical Bootstrapping

Aim: To compare interannual variability in climate models and observational data for Arctic sea ice concentration. 

Problem: Observations cover only a short time period, while models have coarse resolutions. Statistical bootstrapping is required to generate a sufficiently long record to enable comparisons.

Methods: Resampling observational and modeled sea ice concentration anomalies 10,000 times (Fig 1.1), then comparing the distributions of standard deviations to determine whether models and observations were consistent.
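The resampling step can be sketched as follows. This is an illustrative toy version, not the published code: the anomaly series here is randomly generated stand-in data, whereas the real analysis used gridded satellite observations and model output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in: 40 years of detrended sea ice concentration
# anomalies at a single location (the real data are gridded fields).
anomalies = rng.normal(0.0, 5.0, size=40)

n_boot = 10_000
# Resample the anomaly series with replacement and record the standard
# deviation of each synthetic series.
boot_std = np.array([
    rng.choice(anomalies, size=anomalies.size, replace=True).std(ddof=1)
    for _ in range(n_boot)
])

# The spread of bootstrapped standard deviations gives an uncertainty
# range for interannual variability; a model can then be judged
# consistent if its variability falls inside, e.g., the central 95% band.
lo, hi = np.percentile(boot_std, [2.5, 97.5])
```

Comparing a model's standard deviation against the `[lo, hi]` band of the observational bootstrap distribution is one way to operationalize the consistency check described above.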

Conclusions: In general, models agree well with observations, but no model is within observational uncertainty for all months and locations, so choosing the right model for a given task is crucial (Fig 1.2). Consistency is a relatively low bar because observational uncertainty is high; if that uncertainty were reduced, models would likely be considered biased in more regions and seasons.

Tools: Python - Dask, SciPy, Xarray, Matplotlib, NumPy, Cartopy. Shell scripting - Cheyenne supercomputer, Climate Data Operators, data downloading.

Journal article: Wyburn-Powell et al. (2022), Modeled Interannual Variability of Arctic Sea Ice Cover is within Observational Uncertainty. DOI:10.1175/JCLI-D-21-0958.1.

Published data: At the Arctic Data Center - DOI:10.18739/A2H98ZF3T

Published code: synthetic-ensemble GitHub repo, archived with Zenodo, DOI:10.5281/zenodo.6687725.

Figure 1.1 - Resampling methodology for the observations

Figure 1.2 - Consistency between models and observations

2) Neural network regression

Aim: Link climate variability modes to regional Arctic sea ice anomalies. Assess the effect of each mode at different lag times and for different regions.

Problem: Sparse data and, until recently, small ensembles of climate models did not provide sufficient training data to detect the small signal of multiple climate modes' remote Arctic impacts.

Methods: Dimensional reduction by using a subset of climate variability modes, at different lag times, as input features regressed onto regional Arctic sea ice concentration anomalies. Selecting the best of 4 ML models of differing complexity. Sequentially removing the worst-performing variables. Retaining only the relationships whose validation r² exceeds that of a persistence forecast by more than 0.2.
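The lagged-regression-versus-persistence comparison can be sketched in a few lines. This toy version uses a synthetic mode index and ice anomaly series, and a plain least-squares fit standing in for the simplest of the candidate ML models; the 0.2 threshold matches the selection rule above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in data: a climate-mode index (e.g. Nino 3.4) and a
# regional sea ice anomaly it partly drives at a 3-month lag.
n = 600
lag = 3
mode = rng.normal(size=n)
ice = np.empty(n)
ice[:lag] = rng.normal(size=lag)
ice[lag:] = 0.6 * mode[:-lag] + 0.4 * rng.normal(size=n - lag)

def r2(y, yhat):
    """Coefficient of determination."""
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Least-squares fit of the lagged mode onto the ice anomalies.
X = np.column_stack([np.ones(n - lag), mode[:-lag]])
coef, *_ = np.linalg.lstsq(X, ice[lag:], rcond=None)
r2_model = r2(ice[lag:], X @ coef)

# Persistence baseline: forecast each anomaly with its value lag steps earlier.
r2_persist = r2(ice[lag:], ice[:-lag])

# Keep the relationship only if it beats persistence by more than 0.2.
keep = (r2_model - r2_persist) > 0.2
```

The same skill-over-persistence screening generalizes directly to the higher-complexity regressors and multiple lagged input modes used in the actual study.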

Conclusions: The dominant climate variability modes are the global surface temperature anomaly and the Nino 3.4 Index, which have strong negative and positive correlations, respectively, with regional Arctic sea ice (Fig 2.1). Despite the many nonlinearities in the climate system, at least with the constrained available data, nonlinearities are not important to include in our regression model to achieve high predictability (Fig 2.2).

Tools: Python - PyTorch, SciPy, NumPy, Pandas, Xarray, Matplotlib. Shell scripting - Cheyenne supercomputer, Climate Data Operators, data downloading.

Journal article: working title: Large-scale Climate Modes Drive Low-Frequency Regional Arctic Sea Ice Variability.

Data archiving: in progress

Code: low-frequency-variability Github repo

Figure 2.1 - Comparison of 4 ML methods' validation coefficients of determination with lag time.

Figure 2.2 - Linear coefficients linking specific climate variability modes with sea ice concentration anomalies in the Chukchi Sea in October, by climate model.

3) Database: building, querying, analyzing

Aim: Learn and practice several skills which my time as a PhD student did not cover, including: accessing data through an API, coding in a new language (Go), building a database in MySQL, using new data structures such as JSON, querying the database with SQL, and analyzing the data with a new ML method such as clustering.

Phase 1: Feb-Sep 2023 - Learning Go, building and querying a database, and running some analyses in Python to generate static content.
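The build-and-query pattern from Phase 1 can be illustrated as below. The project itself uses Go and MySQL; here Python's built-in sqlite3 stands in, and the table columns and sample JSON are hypothetical stand-ins for Strava's actual activity schema.

```python
import json
import sqlite3

# In-memory database as a stand-in for the MySQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activities (
        id INTEGER PRIMARY KEY,
        start_date TEXT,
        distance_m REAL,
        moving_time_s INTEGER
    )
""")

# A Strava-style activity arrives as JSON from the API.
activity_json = (
    '{"id": 1, "start_date": "2023-02-05",'
    ' "distance": 10000.0, "moving_time": 3000}'
)
a = json.loads(activity_json)
conn.execute(
    "INSERT INTO activities VALUES (?, ?, ?, ?)",
    (a["id"], a["start_date"], a["distance"], a["moving_time"]),
)

# Query: average pace in minutes per km across all stored runs.
row = conn.execute(
    "SELECT AVG(moving_time_s / 60.0 / (distance_m / 1000.0)) FROM activities"
).fetchone()
avg_pace = row[0]  # 3000 s over 10 km -> 5.0 min/km
```

The same three steps, parse the API's JSON, insert rows, aggregate with SQL, carry over directly to the Go/MySQL implementation.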

Phase 2: Oct-Dec 2023 - Building dynamic content for this website, using JavaScript and Tableau to navigate through data that has already been analyzed.

Figure 3.1 - Example activity from Strava that will form part of the database and analysis.



Phase 3: Jan-Jun 2024 - Enabling automated monthly updating of the database using AWS, and automating both the updating of the Python analysis and the visualization of dynamic data on this website.


Code: GitHub repo will be created shortly