CEPR Uniform Data Extracts
ceprDATA.org provides consistent, user-friendly versions of the Survey of Income and Program Participation (SIPP), Current Population Survey (CPS), American Community Survey (ACS), and other datasets used at CEPR, and makes them available to all interested policy researchers and academics.
Each dataset listed above is available to download. In addition, you can download and modify any of the underlying Stata programs used to generate the CEPR Uniform Extracts.
All of the programs and data on ceprDATA.org are free and open-source. If you have any questions or comments, please contact us.
What We Do
CEPR creates Uniform Extracts and makes them available to the public free of charge. The CEPR Uniform Extracts are fully documented, user-friendly, and cross-platform, and they use open-source code. The extracts include nearly 400 commonly used variables from each dataset, including weights, basic demographics, labor market indicators, and income. These variables have been recoded to be consistent across panels (to the extent possible) and are stored in a format that is easy to use for most analytic purposes.
A typical researcher with experience working with microdata can expect to be "up and running" after a few weeks of familiarizing themselves with the documentation, programs, and data provided by CEPR, rather than having to start from scratch and spend months (if not years, as we have done) just understanding the basics. CEPR Uniform Extracts provide a "jump start" for researchers, letting them learn from our experience working with complicated data structures and finding errors. Researchers can then spend their time on their real goal: analysis.
The documentation includes all of the code (written in Stata 8/9) that generates the Extracts, along with codebooks for each Extract. Including the Stata code allows the researcher not only to see exactly what we've done, but also to easily modify it or add new variables from each dataset as needed.
The program for each Extract runs for the years in which data is available, across platforms (Unix, Linux, Mac, Windows), creating a set of variables that, to the extent possible, are consistent across panels. Inconsistencies are clearly noted in the appropriate codebooks. The codebooks also note which raw variables were used to generate each variable.
More details about common data issues can be found on the Documentation page for each dataset. If you know something about any of the datasets that you'd like to share with users, please send us your paper or comments and we will add them to our resources as well. We can be reached at ceprdata [at] cepr [dot] net.
Why We Do It
We've been working with these datasets for years, and we wanted to stop reinventing the wheel every time we needed an extract of the data. The catch is that, in order to come back to a dataset, you need to take the time to clearly document what you've done. By the time we did all that, we figured we might as well share it. If you find this useful, we hope you'll drop us a note and send us your research paper. If you add new variables to our Uniform Extracts (which you most likely will), please send us your code and we will gladly add it to our compilation of Extracts, giving you full credit. Of course, you'll have to follow our cardinal rule: document everything you do. We can be reached at ceprdata [at] cepr [dot] net.
When using this data for analysis, please cite it accordingly.
CPS
- Basic Monthly (1994-2011)
- DWS (1994-2010)
- Job Tenure (1998-2004)
- ORG (1979-2011)
- March (1980-2011)