I’ve been trying to process just how much new content in three unfamiliar languages I’ve been exposed to in the past three weeks: Econometrics, Catalan, and Spanish (less unfamiliar, but…). When I leave Monday, I’ll have been in Barcelona for a total of 26 days and will have been away from home for 27 days. This is the longest I’ve been abroad in a single city in my entire life. It seems apt to call this grown-up study abroad, because that’s what it has been: focused, intense study; a budget decent enough to stay in a place with a pool but still doing my own laundry; and trying to keep up with the energy of a city that doesn’t start partying proper until 12:45 am.
But it’s probably more on point to think about the highlights from the courses I took over the past three weeks and what I took away from them, both in terms of content and in terms of the professors’ pedagogy.
In-Person Week 1:
Econometrics of cross-section data with applications, Prof. Jaume Garcia
What do we do when we have survey data that isn’t actually numerical? While I would never call this “qualitative” data, economists would. Garcia started from the premise that we knew basic regression but maybe hadn’t seen it for a while. He also was not focused on showing us elaborate algebraic proofs or differencing/transformations unless they were key to understanding an important mechanism of the model at hand.
What is the end goal of all of this analysis? Figuring out the probability of observing what we observe. That’s it. Garcia is just shy of retirement, but he says he keeps teaching this class because it’s still fun for him; it’s like these models have stories and lives of their own in the rationale for their use. For a first in-person class, Garcia was an ideal prof. His focus was on developing good habits for analysis, on the steps along the way: dude, just do some descriptive analysis first, before you do anything else. Start with the simplest model. Understand how to read the output of your program (in this case Stata), and understand what assumptions are built into the commands you are using. From there came a gentle introduction to maximum likelihood estimation and the idea that individuals are going to have all sorts of reasons for ultimately making one of two choices, and we cannot know them all.
And I sort of love that approach, because it makes the error term into something human: all the reasons behind a choice that we don’t pretend to know.
The big takeaway from this class: be realistic about the data that you have. Remember that the data comes from the real world, and let it inform your modeling choices.
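To make that workflow concrete, here’s a minimal sketch of the habit in Stata, using the built-in auto dataset (my own toy example, not one from class):

```stata
* Descriptive analysis first, before anything else.
sysuse auto, clear              // load Stata's built-in example dataset
summarize price mpg foreign     // basic descriptive statistics
tabulate foreign                // how is the 0/1 variable actually distributed?

* Then start with the simplest model and actually read the output.
regress mpg price foreign
```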
So what did I learn? So much. We spent two days on discrete choice models, for when you have, essentially, a 0 or 1 option (or a 0, 1, 2, 3 option, like metro, bus, plane, car). The problem with a linear probability model is that in the equation Y_i = a + bX_i + e_i, a fitted value could be greater than one, or even negative, and a probability cannot be either. (And because Y_i is only ever 0 or 1, the error term can only take two values for any given X_i, which breaks the usual regression assumptions too.) So yeah, it gets messy. And maximum likelihood estimation is really just that: finding the parameters that make the probability of observing what we observe as large as possible!
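A quick toy illustration (mine, not from the labs) of what “messy” means here: fit a linear probability model and check whether the fitted values even stay inside [0,1].

```stata
sysuse auto, clear
* foreign is 0/1, so an ordinary regression on it is a linear probability model
regress foreign price mpg
predict phat_lpm                        // fitted "probabilities" from the LPM
count if phat_lpm < 0 | phat_lpm > 1    // how many escape [0,1]?

* a logit keeps every predicted probability inside (0,1) by construction
logit foreign price mpg
predict phat_logit, pr                  // predicted probabilities
summarize phat_lpm phat_logit           // compare the ranges
```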
So if we have two options (say option 1 = metro and option 2 = walking, with j = 1, 2 indexing the choices), then the utility person i gets from option j is U_ij = X_i β_j + ε_ij. In other words, the choice to use the metro or to walk could depend on things we can observe (X) and things we can’t observe (ε). And ultimately, to get the probability of picking the metro (and a coefficient to estimate), you only need X_i (the explanatory variables) and the difference between the coefficients for the two options (β_1 − β_2), not the utility level of either choice itself: what matters is the probability of picking choice 1 over choice 2.
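Written out (as I understand it, so caveat lector), the difference is all that survives:

```latex
% utility of option j (1 = metro, 2 = walking) for person i
U_{ij} = X_i \beta_j + \varepsilon_{ij}
% the metro gets picked whenever its utility is higher:
\Pr(\text{metro}) = \Pr(U_{i1} > U_{i2})
                  = \Pr\big(\varepsilon_{i2} - \varepsilon_{i1} < X_i(\beta_1 - \beta_2)\big)
```

Assuming a distribution for that difference of errors (logistic gives logit, normal gives probit) turns this into something you can actually estimate.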
And thus, we move on to the bread and butter of nonlinear binary choice models: logit and probit. The realization Garcia tried to emphasize was that logit and probit are close relatives, and really, in the end, you don’t need to go buckwild trying to do magic. Each element built on what came before: probit, logit, marginal effects, multinomial models, conditional logit, nested logit/mixed logit, ordered models (like when you have levels of education), count data models and everyone’s favorite way to deal with them, the Poisson model, Tobit models for censored outcomes, continuing on to limited dependent variables with double hurdles/choices (first choice: pick or not pick; once you’ve picked to do the thing, what is the probability that you do the thing? Willingness to pay! Also, just because you picked no NOW doesn’t mean you will ALWAYS pick no given other conditions), and then finally some basic concepts for dealing with change over time, such as the exponential, Weibull, and log-logistic distributions for continuous-time models, and dealing with different types of data. My absolute favorite new term is protest zeros: the people who will never, never, ever pick the other option, given any conditions (the people who will never, ever subscribe to the NYT, ever).
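On the “close relatives” point, here’s a toy comparison (mine, not a class exercise): the raw logit and probit coefficients live on different scales, but the marginal effects tend to land in nearly the same place.

```stata
sysuse auto, clear

logit foreign price mpg
margins, dydx(*)        // average marginal effects under logit

probit foreign price mpg
margins, dydx(*)        // average marginal effects under probit: compare!
```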
I did not mention the labs yet, so I’m revising this entry to include them. Jaume led the labs himself, which I learned is not necessarily to be expected. This gave him the ability to catch up on content not covered in our two-hour lecture, but also to connect the concepts from the lecture to the ones in the labs. And while I know TAs can technically do this, there’s something different about the professor himself explaining how he understands one part of his lecture to apply to the practical experience.
Stata was not bad. I can see where R came from, but Stata seems to be the standard among economists in Europe. I have the SE edition, which as far as I can tell mainly means it handles bigger datasets. There are user-written commands, too. But of all the software I’ve tried, it feels like the one that requires the most minimal level of programming. This is the program for running quantitative analysis for people who don’t feel a real need to build something new, make something more efficient (running loops over different variables), or make something custom (graphs that look extra pretty or something). What Jaume did was give us the code to copy and paste into Stata, with an explanation in the do-file of what each command was doing and what it would tell you. And then, together, we would read the output.
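To give a flavor of those annotated do-files, here’s a reconstruction of the style (mine, not Jaume’s actual code), using an ordered model since ordered categories came up above:

```stata
sysuse auto, clear

* rep78 is a 1-5 repair record: the categories are ordered but not cardinal,
* so an ordered logit fits better than treating it as a plain number
ologit rep78 price mpg

* the raw coefficients shift a latent index and are hard to read directly;
* marginal effects on a specific outcome are much easier to interpret
margins, dydx(*) predict(outcome(3))   // effect on Pr(rep78 == 3)
```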
I’ve always had such a hard time getting parens right, and I keep missing basic commas, which has always made programming or running code generally quite hard. This is not my strength, although my copy is usually quite clean and I’ve got an eagle eye for proofreading other people’s work. Why that doesn’t transfer to programming, I can’t tell you. But the tools! The new tools!
Something that worked for me was cutting and pasting the do-files into a Google Doc and then exporting as a PDF to GoodNotes, so I was able to actually write out my interpretations of the output, side by side with the columns. For me, that was huge: having a memory marker assigning actual numbers to actual meanings in the output, plus warnings about commands and so forth.
As previously mentioned, I am a kinesthetic learner and often learn by writing (hence taking notes during people’s presentations, etc. It is also an adaptive strategy for ADHD, one I taught myself, though luckily other people will get to learn it from someone who has noticed it can help). If I had had a laptop-and-iPad setup in previous stats classes, I would be curious to see what might have happened. Obviously the worst way to think about the past is as a counterfactual, but I have to say, this was so much better for learning.
Ok, that’s just one class, but I’ll post this. If it sounds like I don’t know anything or didn’t explain something correctly, please feel free to make suggestions/changes/help.