Oliver Sherouse Writes Occasionally

on Public Policy
and Python Programming

Should the Results of Scotland's Independence Referendum Matter?

15 Sep 2014

This Thursday, Scotland will hold a referendum on independence. The conventional wisdom holds that the vote will be close, but that the “No” vote will carry the day. While the UK government does have final say on constitutional matters (and thus on independence), everyone seems to assume that it will honor the results of the referendum.

This entire situation highlights the absurdity of making important decisions through majority-rules direct democracy. Consider the possible outcomes. If the vote is 52 percent “Yes” to 48 percent “No”, then 48 percent of the populace, some two and a half million souls,[1] will find their links to what they consider their home country severed. They will be forced, against their will, to either physically leave their homes to remain within the UK or try to mentally and emotionally locate themselves in a new political unit that they do not want or recognize as their own.

If, on the other hand, the vote is 52 percent “No” to 48 percent “Yes”, then you’ve got a similarly massive portion of the nation that actively wants not to be part of it—and not at some future point, but as soon as possible. No nation can tolerate that state of affairs with tranquility for long, unless we assume that the “Yes” voters are just breezy and faddish. I’m not willing to believe that.

Surely, for such an important question, fifty-plus-one doesn’t cut it. Blindly following the referendum when public opinion is divided so evenly can only lead to a sense of disaffected displacement for the losing side. Whatever the economic effects,[2] the social and psychological effects of feeling disconnected from your home country matter. A shared culture, a common historical heritage, a sense of community with your fellow citizens—these are all things that my fellow libertarians like to wave away as “tribalism,” but that are in fact necessary parts of forming a human identity.

As a matter of practicality, the best result on Thursday is a “No” vote, not because I’m convinced that Scotland should remain part of the UK, but because it’s a push. The Scots can gain independence in the future much more easily than they can regain union. But that extra time only matters if the Scots use it to build a real consensus. Perhaps that means that the UK government can do a better job identifying and satisfying the concerns of those who favor independence. Maybe it means that the pro-independence side can do a better job persuading their fellow Scots that an independent Scotland is not only attainable and advisable, but a place that will still be their home.

But that consensus has to be established before either side can claim a legitimate victory. Thursday’s referendum can only reveal the contours of the problem, not the wisest solution. In fact, it has probably already told us all it can.

  1. Yes, I’m assuming that the polls mirror the country exactly. If this bothers you, round down to one million out of five total. I don’t see how that makes things much better.

  2. Not that the economic effects aren’t important, but I haven’t studied them enough to make an argument either way, and they’re beside the point I’m making.

Comparing Price Indexes

18 Apr 2014

FRED, the excellent Federal Reserve Economic Data service, has a new blog, the goal of which seems to be to post interesting graphs and not say anything too controversial about them.

Recently, the FRED bloggers produced a post about the various price indexes used to measure inflation. There are lots of laws indexed to inflation—tax brackets, Social Security benefits, &c.—and small changes in how you measure it can matter quite a bit.

FRED identified three qualifications for a good price index in a policy context. An index, in their view, needs to: cover a sufficient part of the economy, be available immediately, and not pick up too much noise from price changes in particular products.

To see which performs best, they produced a chart with four different price indexes: the Consumer Price Index (CPI) for all items, the Consumer Price Index less food and energy, the Personal Consumption Expenditures (PCE) chain-type index, and the GDP deflator:

FRED chart

I think this is a bad chart, for two reasons: first, it uses numbers indexed to different years (1982-84 for the CPIs, 2009 for the others); second, it leaves off two increasingly mentioned options, the chained CPI and the chained CPI less food and energy. Here’s a chart with those two added, and everything indexed to 2000, which is when those two series became available:

Six Series

That’s also a bad chart, because there’s too much stuff on there. But there are two big standouts. First, the simple CPI increases way faster than everything else; in fact, we’ve known for a while that the CPI overstates inflation, so it’s almost certainly not our best option. Second, the chained CPI less food and energy also seems like an outlier, probably because it excludes a large part of the economy. That looks like a violation of FRED’s “cover a sufficient part of the economy” qualification, so let’s get rid of it, along with the regular CPI less food and energy. That leaves us three options: the PCE, the GDP deflator, and the chained CPI:

Three Series

And here’s what we get if we look at annualized inflation for those three series:

Inflation
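
If you want to poke at these series yourself, here is roughly how you might pull them and compute annualized inflation with pandas. The FRED series IDs and the use of pandas_datareader are my assumptions here, so treat this as a sketch rather than the exact code behind the charts:

    # A minimal sketch, assuming pandas_datareader and these FRED series IDs
    # (double-check the IDs before trusting the numbers):
    #   GDPDEF      - GDP deflator
    #   PCEPI       - PCE chain-type price index
    #   SUUR0000SA0 - chained CPI, all items
    from pandas_datareader import data as pdr

    raw = pdr.DataReader(["GDPDEF", "PCEPI", "SUUR0000SA0"], "fred",
                         start="2000-01-01")

    # Put everything on the same footing: annual averages, indexed to 2000 = 100
    annual = raw.resample("A").mean()
    rebased = annual.div(annual.iloc[0]).mul(100)

    # Annualized inflation is then just the year-over-year percent change
    inflation = rebased.pct_change().mul(100)
    print(inflation.round(1))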

Now, those three lines pretty clearly tell similar stories, so we’re getting into “close enough for government work” territory, here. The PCE and chained CPI are both jumpier, and show a big spike in inflation in 2008 and a big drop towards deflation in late 2008-2009. The deflator is calmer, and the financial crash looks more like a steady decline.

Ultimately, my instinct has always been to use the GDP deflator, exactly because it covers the entire economy and not some arbitrary basket of goods. The jumpiness of the chained CPI and PCE, especially when you look at the financial crisis, strikes me as pulling in a lot of noise along with the signal. But in any case, all three—the deflator, the PCE, and the chained CPI—seem to be significantly more reliable than the plain old CPI. Which, unfortunately, is what we currently use to index almost everything.

Are We Over-Reacting to the Employment-Population Ratio?

08 Feb 2014

On Twitter, my old econ prof Don Marron points to a blog post at the New York Fed describing the employment-population ratio as “A Mis-Leading Labor Market Indicator.” For context, the E/P ratio is a number that people like me have been freaking out about since the financial crisis, because it looks like this:

Ahhhhhhhhh

That bit at the end where it goes back to 1980 levels and stays there? That’s us, right now. Wave!

Samuel Kapon and Joseph Tracy, the authors of the Fed post, argue that people like me, who react to this graph by openly weeping, are over-reacting because we aren’t taking into account changing demographics. And to prove their point, they undertook a workmanlike study of the employment patterns of different groups:

To explore this question, we take all individuals age sixteen or older from the Current Population Survey Outgoing Rotation Group samples from January 1982 to November 2013. This gives us monthly data with 10.2 million observations on individuals and their employment status. We divide these individuals into 280 different cohorts defined by each individual’s decade of birth, sex, race/ethnicity, and educational attainment. We assume that individuals within a specific cohort have similar career employment rate profiles. We use the 10.2 million observations to estimate these 280 career employment rate profiles.

Kapon and Tracy take these cohorts and recombine them, according to their proportions in the population, to produce an estimated E/P ratio based on demographics alone. Plotting that against the actual E/P ratio, we see that we’re not that far off after all:

Kapon and Tracy's Estimated E/P

Well, there you go: we’re not in great shape, but things aren’t apocalyptic either.

The problem is that you can’t do that. Let’s walk through the steps Kapon and Tracy took:

  1. Disaggregate a series into constituent groups
  2. Get average behavior for those groups
  3. Recombine the groups according to their proportions
  4. Draw conclusions from the fact that the estimated series is close to the observed series

That’s a tautology: that analysis literally can’t yield anything but a trend, because they’ve included all the sample data from the period they’re observing. Saying that we don’t need to worry about people not working because people right now have a pattern of not working doesn’t tell us anything. It’s the pattern that we’re worried about!

To illustrate, here’s a simple quadratic fit of the employment-population ratio on the same scale as Kapon and Tracy’s graph:

Quadratic fit of E/P ratio

I feel bad, because Kapon and Tracy went through a lot of work to basically make that graph. But if you use only the data from before the latest recession to fit the line (I used data through January 2009), you get a different picture:

Quadratic fit of E/P ratio through 2009

Wow, now we’re way below trend, even taking into account the inverse U-shape we expect from an aging workforce. Here’s the difference between predicted and observed E/P when you use all data to predict, and also when you stop before 2009.

Difference between predicted and actual E/P ratio using two fits
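
For the curious, the whole exercise takes only a few lines of Python. This is a sketch under my own assumptions (EMRATIO as the FRED series for the employment-population ratio, pandas_datareader for the download), not the exact code behind the charts above:

    # Quadratic fits of the E/P ratio, once on all the data and once on data
    # through January 2009 only; "gaps" is predicted minus observed.
    import numpy as np
    import pandas as pd
    from pandas_datareader import data as pdr

    ep = pdr.DataReader("EMRATIO", "fred", start="1948-01-01")["EMRATIO"].dropna()
    t = np.arange(len(ep))  # a simple time index for the polynomial

    def quad_fit(x, y, x_eval):
        """Fit y = a*x**2 + b*x + c and evaluate the fitted curve at x_eval."""
        return np.polyval(np.polyfit(x, y, 2), x_eval)

    full_fit = quad_fit(t, ep.values, t)                  # fitted on everything
    pre09 = ep.index <= "2009-01-31"
    pre09_fit = quad_fit(t[pre09], ep.values[pre09], t)   # fitted through Jan 2009

    gaps = pd.DataFrame({"full sample": full_fit - ep.values,
                         "through Jan 2009": pre09_fit - ep.values},
                        index=ep.index)
    print(gaps.tail())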

So there are two lessons here. First, you can’t draw any conclusions by using data to predict itself. You just can’t.

Second, the E/P Ratio is way, way below where it should be. While demographics should absolutely be taken into account, they’re just not enough to turn what we’re seeing into standard post-recession behavior.

Is Walmart's Charity Bad?

19 Nov 2013

Jacob Weissmann at The Atlantic flags a news story from Ohio about a Walmart that’s running a food drive for some of its own employees who have fallen on particularly hard times. Moreover, he notes that the company runs a similar charity throughout its stores aimed at helping “Associates in Need”—employees who have come across unexpected expenses or hardships, “including homelessness, serious medical illnesses, and major repairs to primary vehicles.”

Weissmann reads this situation as a rich company trying to make itself feel better by encouraging charity toward the workers it exploits. His subtitle says that the program is “proof that some Walmart employees can barely afford to eat.” The final paragraph of his piece runs as follows:

Again, it’s nice that Walmart has set up a charity so that its workers can lend a hand to their homeless colleagues. It’d be nicer if the company paid enough to make sure that wasn’t a concern in the first place.

I’ll give Weissmann enough credit to assume that he knows that no company can pay its employees more than they make for the company, and that these decisions have to be made on a marginal basis. His story seems to be that Walmart chooses to pay significantly less than marginal value, because otherwise you wouldn’t have employees who were one disaster away from destitution.

The problem with that story is that real wages for unskilled service employees are relatively high right now:[1]

Real wages of service-providers are high

If Walmart’s workers are getting severely underpaid right now, they can easily improve their situation by getting a better-paying job somewhere else. But they aren’t doing that.

So here’s a story that makes more sense: Walmart does a good job of taking potential employees other companies don’t want, and making them productive enough to earn a low wage. It’s hard to get a job if you struggle with periodic homelessness, for example. But you can get one at Walmart, and now you’re earning something. How does Walmart pull this off? Some mix of management and the nature of the work, I’d imagine.

But the point is, if you’re pulling folks on or over the brink of poverty into your workforce, your workforce is going to have a lot of people on the brink of poverty. Only now those people are earning a living as respectable, productive members of society.

Where would these people be without Walmart? Unemployed. And if Walmart raised its wages, they would probably be unemployed as well, at least over time (PDF). Since those who are most vulnerable to economic distress are likely to be those who add the least marginal value, you’ll end up hurting precisely the people you’re trying to help. Instead of working, they’ll be dependent on a mix of welfare benefits and private assistance.

Weissmann wants Walmart to pay people more than they’re getting so that they don’t need to be as reliant on charity. I guess he doesn’t realize that that’s what they’re already doing.

  1. I realized I originally had slightly the wrong graph here, and updated on 4 September 2014

The Perfect Economic Analysis Workflow

13 Nov 2013

The way most people do economic data work is painful. You have to work with different programs and file types, many of which are unpleasant individually—Stata’s do-file editor, the Office equation editor—and few of which work well with one another. What’s worse, there’s no obvious way to organize files or keep track of changes. What a mess.

So since “beautiful is better than ugly”, I’ve put together a workflow for serious economic analysis that is coherent and straightforward. The basic outline is this:

  1. Track sources using BibTeX
  2. Do exploratory work in the IPython Notebook, using APIs to fetch data if possible
  3. Finalize your start-to-finish data processing in a Python script, from data fetching and munging to statistical analysis to chart and graph creation
  4. Write your prose using markdown as understood by Pandoc, with LaTeX where needed
  5. Use Pandoc to create a pdf version of your paper via LaTeX
  6. When you’re ready to receive feedback, upload all your source files and output datasets to a public repository at GitHub
  7. As you update your work, track changes using git

Let’s take a look at each of those a bit more fully.

Track Sources Using BibTeX

Since the first step of any meaningful research effort is an exploration of the existing literature, a good tool to keep track of your sources is essential. The general idea here is to track the sources you’re using in a form which can eventually be used by Pandoc to automatically format citations and save us a lot of trouble. Pandoc supports a number of formats, but I suggest staying with BibTeX because it’s mature, well-supported, and works well even if we slip into raw LaTeX later on. There are a lot of useful tools to make working with BibTeX easy, but I find it just as easy to edit the .bib file by hand.
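
If you have never seen one, a .bib file is just plain text, one entry per source, and Pandoc or LaTeX can later cite each entry by its key (friedman1957 in this example):

    @book{friedman1957,
        author    = {Friedman, Milton},
        title     = {A Theory of the Consumption Function},
        publisher = {Princeton University Press},
        year      = {1957}
    }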

Explore Data Using the IPython Notebook

It took me longer than I care to admit to get the point of the IPython Notebook, but now that I have, it’s hard to imagine doing work without it. The IPython Notebook is an interactive environment that lets you execute lines of code, grouped into “cells”, which you can re-edit and re-run as often as you like, until you get it right. Using IPython’s pylab functionality, you can create matplotlib charts in-line—by far the most pleasant way to deal with them. You can even use markdown and HTML to include images and formatted text to explain your work to yourself or to others.

This gives you a wonderful space to play around with data. In the ideal situation, you can use libraries and APIs to download your data as required. I’ll insert here a shameless plug for my wrapper libraries for the World Bank and BLS APIs, but there’s also good support for FRED. Quandl also looks promising, though I’ll admit I haven’t actually found a use for it yet.

You can also use all your other Python data libraries, including Pandas and Statsmodels. Pandas, of course, has excellent import and export functions to help you deal with any data you can’t get through a nice library or API.
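
As a taste, a single exploratory cell might look something like this; pandas_datareader is just one of the FRED options, and the series is arbitrary:

    # Pull a series, summarize it, and plot it in-line (with pylab or
    # %matplotlib inline enabled). In practice each step gets its own cell.
    from pandas_datareader import data as pdr

    gdp = pdr.DataReader("GDPC1", "fred", start="1990-01-01")  # real GDP, quarterly

    gdp.describe()                # quick summary statistics
    gdp.pct_change(4).plot(title="Real GDP, year-over-year growth")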

Finalize Your Python Script

As you’re playing around with your data, you’ll want to save the constants and functions that create your models, outputs, and graphs into a proper Python script. If you’re careful, you can download your IPython file as a script and just run it that way, but more often you’ll be developing the script and the exploratory analysis at the same time. Generally, I use my script as a library within IPython so that I never have to worry that I’ve copied my functions wrong or anything like that.

Writing a good script to accompany an economic analysis is worth its own post, but the broad principles to keep in mind are as follows, with a minimal sketch after the list:

  • Use particularly clear names for variables, and use docstrings and comments to explain what’s going on even more than you normally would
  • To maximize replicability, use one file that can do everything from start to finish, from pulling the data to running your analysis, to outputting your final datasets and creating your plots
  • In contrast to what you would normally do with a script, don’t encapsulate variables in a main function—or if you do, be sure to return important variables into the global namespace so that they can be easily accessed in an interactive context
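
Here is a minimal sketch of what such a script might look like. The file name, series, and regression are placeholders meant to show the shape of the thing, not the substance of a real analysis:

    """analysis.py: fetch the data, run the model, and write every output.

    Importable from the IPython Notebook (import analysis) so the same
    functions drive both the exploration and the final results.
    """
    import statsmodels.api as sm
    from pandas_datareader import data as pdr

    # Clear, top-level constants, so they survive an import into IPython
    START_DATE = "1990-01-01"
    FRED_SERIES = ["GDPC1", "UNRATE"]        # placeholder series
    OUTPUT_DATA = "final_dataset.csv"

    def fetch_data():
        """Pull the raw series from FRED and return a tidy DataFrame."""
        return pdr.DataReader(FRED_SERIES, "fred", start=START_DATE).dropna()

    def run_model(df):
        """Regress the first series on the second; a stand-in for the real model."""
        return sm.OLS(df[FRED_SERIES[0]], sm.add_constant(df[FRED_SERIES[1]])).fit()

    # Run at module level so the results land in the global namespace
    # when the script is imported interactively
    data = fetch_data()
    results = run_model(data)
    data.to_csv(OUTPUT_DATA)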

Write Prose in Pandoc’s Markdown

Pandoc is a heck of a program, and it’s the linchpin of the workflow presented here. Pandoc, as its name implies, takes pretty much any kind of document and turns it into pretty much any other kind. But for the best results for our purposes, you’ll want to write with markdown and output to a PDF using LaTeX.

Markdown is nothing but specially formatted plain text. Originally it was intended to be converted to HTML, but Pandoc has really brought out the fact that if you can identify italics, say, for the purposes of HTML, you can identify them for the purposes of LaTeX or Word or whatever you like. It’s easy to learn, and since it’s plain text you can use the same editor you use to create your Python script.

Pandoc also includes a number of extensions that make its version of markdown particularly useful, including support for citations, footnotes, syntax highlighting for code, and everything else you need to write a solid paper. Perhaps most importantly, you can use raw LaTeX for math expressions or other commands, meaning you get to use LaTeX’s beautifully simple language to write out your equations, while still getting all the simplicity of markdown’s formatting.

This also means that if, say, you output a table from a statsmodels OLS model object in LaTeX format, you can include that table using an \input{} command that will simply be ignored when converting to other formats. The whole system just works together, unless you want to make custom alterations, in which case it still works together pretty well.
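
A few lines give the flavor of how these pieces fit together; the citation key and the table file name here are hypothetical:

    As @friedman1957 argued, consumption tracks permanent income.^[An inline footnote, should you need one.]

    The estimated model is $c_t = \alpha + \beta y^p_t + \varepsilon_t$.

    \input{ols_table.tex}

Run with a --bibliography flag pointing at your .bib file, the citation gets formatted automatically, while the math and the \input line pass straight through to LaTeX.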

Create a PDF Using Pandoc and LaTeX

In the simplest case, all you’ll need to do to create a beautiful version of your paper is run it through pandoc using a command something like this:

pandoc -S --normalize -o mypaper.pdf mypaper.markdown

This will convert markdown to LaTeX, then compile using LaTeX. The -S flag tells pandoc to use smart quotes and dashes, while the --normalize option simply makes the output a bit cleaner technically.

If you want to do anything particularly complicated, however, such as using a custom title page layout or post-processing the translated LaTeX to set the size of figures, you may want to create a makefile on Linux or OSX, or a batch file on Windows.

For example, you might have a special skeleton LaTeX file called “mypaper.tex” with front and back matter but missing the body text, which you’ve written in Pandoc. Instead, where the body text goes, your skeleton file has an \input{body.tex} command. The makefile for this on a system that uses make might look like this:

pdf:
    pandoc -S --normalize -o body.tex mypaper.markdown
    pdflatex mypaper.tex
    bibtex mypaper
    pdflatex mypaper.tex
    pdflatex mypaper.tex

Running make would convert the body text, then run the LaTeX-BibTeX pattern to produce the final paper.

Again, this is really only something you have to mess with if you want to be fancy; Pandoc’s functionality will cover most of what you’ll generally want.

Release Files on GitHub

Once you’ve got your code, data, and prose drafted, you’ve got a working product ready to share with others. The draft is usable, but you’re going to want to make controlled changes, while keeping track of why and where you got your ideas. It seems to me that a version control system is by far the best way to do this, and the best one going right now is git.

Git tracks all your files and allows you to make changes incrementally, document the reasons for them, and even revert specific changes without affecting changes before or after. Think of it as Word’s track changes, but not terrible, and you can use it for all your (text-based) files in the same way.

Currently, the easiest way to publish your files on git is to create a public repository on GitHub. GitHub is a fantastic, convenient development platform (this blog, actually, is a GitHub repo), with excellent documentation and even a tutorial to help you learn the basics of git. And it’s a particularly good platform for data, since they’ve put work into making data pleasant to view right in the repository itself.
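
If you haven’t used git before, the publishing step is only a handful of commands; the file and repository names here are placeholders:

    git init                                    # start tracking the project
    git add analysis.py mypaper.markdown sources.bib
    git commit -m "First complete draft"
    git remote add origin git@github.com:you/my-analysis.git
    git push -u origin master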

Keep Things Up-to-date

Now all you have to do is mark the changes to your code and prose as you receive feedback from others. The nice thing about git commit messages is that they allow you to include a fuller description of the reasons you’ve changed your mind (or not changed your mind) than is usually justified in the text of a paper. They also allow you to associate particular changes with the views of particular people, so it’s easier for you to know who to thank and for others to recognize where contributions were really made. When you’re done working on your paper, you can just commit your final pdf and data set, then let the files live on the web as a fully documented contribution to economic science.

Conclusion

I’ve called this the “Perfect Economic Analysis Workflow,” and that is a lie. It’s not perfect; there are still little things that bug me (copying and pasting code between IPython Notebooks and an editor? Surely I’m missing something!), and you might choose to substitute some tools for others.

But I think it’s pretty darned good, and hopefully it will save some time and frustration for others.