Welcome to BEE 4850/5850!


Lecture 01

January 22, 2024

Course Overview

About Me

Instructor: Prof. Vivek Srikrishnan

Interests:

  • Bridging Earth science, data science, and decision science to improve climate risk management;
  • Unintended consequences which result from neglecting uncertainty or system dynamics.

Meet My Supervisors

My Supervisors

What Do You Hope To Get Out Of This Course?

Take a moment, write it down, and we’ll share!

Course Motivation

Why Does Data Analysis Matter?

  • Scientific insight;
  • Decision-making;
  • Understanding uncertainty

The Ideal

![](https://imgs.xkcd.com/comics/statistics.png)

Source: XKCD 2400

Unique/Challenging Features Of Data

There are many features of environmental (and biological!) data which make data analysis interesting and hard.

Extreme Events

Source: Doss-Gollin & Keller (2023)

Extreme Events

Source: XKCD 2107

Correlated Uncertainties

Source: Errickson et al. (2021)

Non-Stationarity

Source: Fagnant et al. (2020)

Forcing & Structural Uncertainty

Source: Doss-Gollin & Keller (2023)

Deep Uncertainty

Source: Srikrishnan et al. (2022)

Modes of Data Analysis

Misspecification Can Bias Inferences…

Source: Ruckert et al. (2017)

…And Projections

Source: Ruckert et al. (2017)
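The following minimal, noise-free sketch makes the point concrete outside of sea-level rise. It assumes a hypothetical "true" process with a slowly accelerating quadratic trend. It fits a straight line to time steps 0 through 50 and then compares the in-sample misfit with the error when the line is extrapolated to t = 100. This is a generic illustration of structural misspecification, not the analysis in Ruckert et al. (2017), which concerns time-varying observation errors.

```python
def true_process(t):
    """Hypothetical 'true' system: a slowly accelerating (quadratic) trend."""
    return 0.02 * t + 0.001 * t**2


def fit_line(ts, ys):
    """Ordinary least-squares fit of y = a + b * t; returns (a, b)."""
    n = len(ts)
    t_bar = sum(ts) / n
    y_bar = sum(ys) / n
    b = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys)) / sum(
        (t - t_bar) ** 2 for t in ts
    )
    return y_bar - b * t_bar, b


# Calibrate the (misspecified) linear model on t = 0, ..., 50 with no noise,
# so every error below comes from the wrong model structure, not from noise.
ts = list(range(51))
ys = [true_process(t) for t in ts]
a, b = fit_line(ts, ys)

in_sample_error = max(abs(y - (a + b * t)) for t, y in zip(ts, ys))
projection_error = abs(true_process(100) - (a + b * 100))
print(f"max in-sample error: {in_sample_error:.2f}")  # about 0.41
print(f"error at t = 100: {projection_error:.2f}")    # about 5.41
```

The line tracks the calibration data to within about 0.4 units. However, its projection at t = 100 is off by more than 5 units, because the missing curvature compounds outside the fitted window.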

Some Problems With The “Standard” Data Analysis Toolkit

  • Statistical assumptions may not be valid;
  • “Null” vs “Alternative” hypotheses and tests may be chosen for computational convenience, not scientific relevance.

Important: “Big” data doesn’t solve the problem!
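The sketch below (Python, standard library only) shows why more data does not rescue a wrong assumption. It builds the textbook 95% confidence interval for a mean, which assumes independent observations, and applies it to data that are actually autocorrelated. The AR(1) process and its coefficient phi = 0.8 are illustrative assumptions, not taken from the lecture. With independent data (phi = 0), the intervals should cover the true mean about 95% of the time. With phi = 0.8, coverage should stay at roughly half whether n is 100 or 1000, because the violated assumption, not the sample size, drives the error.

```python
import math
import random


def ar1_series(n, phi, rng):
    """Simulate n values of a zero-mean AR(1) process with unit-variance innovations."""
    # Start from the stationary distribution so there is no burn-in transient.
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi**2))
    series = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def naive_coverage(n, phi, reps=400, seed=1):
    """Share of nominal 95% intervals, built as if the data were i.i.d.,
    that contain the true mean (0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = ar1_series(n, phi, rng)
        mean = sum(y) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
        se = sd / math.sqrt(n)  # only valid for independent observations
        if abs(mean) <= 1.96 * se:
            hits += 1
    return hits / reps


for n in (100, 1000):
    print(n, naive_coverage(n, phi=0.0), naive_coverage(n, phi=0.8))
```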

Advantages of Model-Based Data Analysis

We can:

  • Examine logical implications of model assumptions.
  • Assess evidence for multiple hypotheses by generating simulated data.
  • Identify opportunities to design future experiments or observations to distinguish between competing hypotheses.
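The sketch below (Python, standard library only) illustrates the last two points. It simulates synthetic records under two hypothetical competing hypotheses: no trend, versus a linear trend of 0.05 units per time step, each with Gaussian noise of standard deviation 1. It then asks how often a fitted slope from the trend world exceeds the 95th percentile of fitted slopes from the no-trend world. All numbers are illustrative assumptions. Repeating the calculation for several record lengths shows how simulation can indicate how much more data would be needed to tell the hypotheses apart.

```python
import random


def ols_slope(y):
    """Least-squares slope of y regressed on the time index 0, 1, ..., n - 1."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    num = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


def simulate_slopes(n, trend, sigma, reps, rng):
    """Fitted slopes from `reps` synthetic records y_t = trend * t + noise."""
    return [
        ols_slope([trend * t + rng.gauss(0.0, sigma) for t in range(n)])
        for _ in range(reps)
    ]


def detection_rate(n, trend, sigma=1.0, reps=2000, seed=42):
    """Share of trend-hypothesis records whose slope exceeds the 95th percentile
    of slopes simulated under the no-trend hypothesis."""
    rng = random.Random(seed)
    null_slopes = sorted(simulate_slopes(n, 0.0, sigma, reps, rng))
    threshold = null_slopes[int(0.95 * reps)]
    alt_slopes = simulate_slopes(n, trend, sigma, reps, rng)
    return sum(s > threshold for s in alt_slopes) / reps


for n in (10, 20, 40):
    print(n, detection_rate(n, trend=0.05))
```

Under these assumptions, the detection rate should rise from roughly 0.1 with 10 observations to close to 1 with 40. A calculation like this can guide how long a record, or what kind of additional observations, would be worth collecting.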

Workflow/Course Organization

Course Policies

Background Knowledge: Computing

  • Basics (at the level of CS 111x)
  • Some extra work/effort may be needed if you haven’t coded in a while.
  • May need some additional familiarity with statistical packages (and “light” optimization)

Background Knowledge: Probability/Statistics

  • ENGRD 2700/CEE 3040
  • Summary statistics of data
  • Probability distributions
  • Basic visualizations
  • Monte Carlo basics
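For a sense of the "Monte Carlo basics" level, here is a short sketch (Python, standard library only). The example problem is our own, not from the course materials. It estimates the chance of at least one exceedance of a 1%-annual-chance ("100-year") event over 30 years, assuming independent years. It then checks the estimate and its Monte Carlo standard error against the closed form 1 - 0.99^30, which is about 0.26.

```python
import math
import random


def mc_exceedance_prob(p_annual, years, trials=20_000, seed=7):
    """Monte Carlo estimate of P(at least one exceedance in `years` years),
    assuming independent years that each exceed with probability p_annual.
    Returns (estimate, standard error)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One synthetic `years`-long record; count it if any year exceeds.
        if any(rng.random() < p_annual for _ in range(years)):
            hits += 1
    est = hits / trials
    se = math.sqrt(est * (1 - est) / trials)
    return est, se


est, se = mc_exceedance_prob(0.01, 30)
exact = 1 - (1 - 0.01) ** 30  # closed form under the same independence assumption
print(f"Monte Carlo: {est:.3f} +/- {se:.3f}; exact: {exact:.3f}")
```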

Grades

| Assessment | % of Grade |
|---|---|
| Exercises | 10% |
| Readings | 10% |
| Literature Critique | 15% |
| Homework Assignments | 30% |
| Term Project | 35% |
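The weights sum to 100%. The sketch below shows the arithmetic with made-up component scores. It assumes the overall grade is a simple weighted average of component percentages; this rule is an assumption, since the table lists only the weights. The dropped exercise and the late penalty are not modeled.

```python
# Weights from the grading table (percent of final grade).
WEIGHTS = {
    "Exercises": 10,
    "Readings": 10,
    "Literature Critique": 15,
    "Homework Assignments": 30,
    "Term Project": 35,
}


def course_grade(scores):
    """Weighted average of component scores (each 0-100), assuming a simple
    weighted-average rule; `scores` must cover every component."""
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS) / 100


# Hypothetical component scores, for illustration only.
example = {
    "Exercises": 90,
    "Readings": 100,
    "Literature Critique": 85,
    "Homework Assignments": 80,
    "Term Project": 88,
}
print(course_grade(example))  # 86.55
```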

Overall Guidelines

  • Collaboration highly encouraged, but all work must reflect your own understanding
  • Submit PDFs on Gradescope
  • 50% penalty for late submission (up to 24 hours)
  • Standard rubric available on website
  • Always cite external references

Exercises

  • (Mostly) weekly problem sets
  • Focus on conceptual material/small data analysis exercises
  • Will drop one.

Readings

  • Several readings assigned for discussion throughout the semester.
  • One student responsible for leading the discussion (Ed/in class)

Literature Critique

  • Select a peer-reviewed journal article which analyzes data;
  • Short discussion paper analyzing:
    • Scientific hypotheses;
    • Modeling and statistical choices
  • In-class presentation before spring break
  • 5850 Students: Write a referee report

Homework Assignments

  • More in-depth problems
  • Roughly 2 weeks to complete
  • Will not drop any by default
  • Regrade requests must be made within one week
  • 5850 Students: Some extra problems

Term Project

  • Analyze a data set of interest using model(s) of your choice
  • Can work individually or groups of 2
  • Several deliverables throughout the semester
  • Final in-class presentation and report

Attendance

Not required, but students tend to do better when they’re actively engaged in class.

Office Hours

  • MW 10-11 AM, 318 Riley-Robb
  • Almost impossible to find a time that works for all (or even most); please feel free to make appointments as/if needed.

Accommodations

If you have any access barriers in this class, please seek out any helpful accommodations.

  • Get an SDS letter.
  • If you need an accommodation before you have an official letter, please reach out to me ASAP!

Academic Integrity

Hopefully not a concern…

  • Collaboration is great and is encouraged!
  • Knowing how to find and use helpful resources is a skill we want to develop.
  • Don’t just copy…learn from others and give credit.
  • Submit your own original work.

Academic Integrity

Obviously, just copying down answers from Chegg or ChatGPT and passing them off as your own is not ok.

ChatGPT: The Stochastic Parrot

Think about ChatGPT as a drunk who tells stories for drinks.

It will give you plausible-looking text or code on any topic, but it doesn’t know anything beyond what it “overheard.”

Caution

ChatGPT can be useful for certain tasks (e.g., understanding code errors), but it may miss the context for why or when a given piece of information or solution works.

Just think about it as an unreliable Google search.

Class Tools

Communications

Use Ed Discussion for questions and discussions about class, homework assignments, etc.

  • Try to use public posts so others can benefit from questions and can weigh in.
  • I will make announcements through Ed.

Email

When urgency or privacy is required, email is ok.

Important

Please include BEE4850 in your email subject line! This will ensure it doesn’t get lost in the shuffle.

Better: Use Ed Discussion and reserve email for matters that are particularly urgent and/or require privacy.

Course Website

https://viveks.me/simulation-data-analysis

  • Central hub for information, schedule, and policies
  • I will add a link and some information (assignment due dates, etc.) to Canvas

Computing Tools

  • Course is programming language-agnostic.
  • Assignments will have notebooks set up for Julia (environments, etc.) on GitHub.

Some Tips For Success

  • Start the homeworks early; this gives time to sort out conceptual problems and debug.
  • Ask questions (in class and online) and try to help each other.
  • Give me feedback!

Upcoming Schedule

Next Classes

Wednesday: Hypothesis testing and data analysis

Next Week: Review of uncertainty and probability

Assessments

Homework 1 available; due next Friday.

Exercise 1 due this Friday.

References


Doss-Gollin, J., & Keller, K. (2023). A subjective Bayesian framework for synthesizing deep uncertainties in climate risk management. Earth's Future, 11, e2022EF003044. https://doi.org/10.1029/2022ef003044
Errickson, F. C., Keller, K., Collins, W. D., Srikrishnan, V., & Anthoff, D. (2021). Equity is more important for the social cost of methane than climate uncertainty. Nature, 592, 564–570. https://doi.org/10.1038/s41586-021-03386-6
Fagnant, C., Gori, A., Sebastian, A., Bedient, P. B., & Ensor, K. B. (2020). Characterizing spatiotemporal trends in extreme precipitation in Southeast Texas. Nat. Hazards, 104, 1597–1621. https://doi.org/10.1007/s11069-020-04235-x
Ruckert, K. L., Guan, Y., Bakker, A. M. R., Forest, C. E., & Keller, K. (2017). The effects of time-varying observation errors on semi-empirical sea-level projections. Clim. Change, 140, 349–360. https://doi.org/10.1007/s10584-016-1858-z
Srikrishnan, V., Guan, Y., Tol, R. S. J., & Keller, K. (2022). Probabilistic projections of baseline twenty-first century CO2 emissions using a simple calibrated integrated assessment model. Clim. Change, 170, 37. https://doi.org/10.1007/s10584-021-03279-7