There is talk of eliminating the SIPP, and CEPR is organizing researchers and organizations to weigh in on this debate. Go here for more information.
SIPP Uniform Data Extracts
Background
CEPR creates Uniform Extracts of the U.S. Census Bureau's Survey of Income and Program Participation (SIPP). Download the data, or the underlying programs and documentation used to create the extracts. If you don't want to download the data files (which are quite large), you can order CDs of the final data.
About the SIPP
The SIPP is a multi-panel, nationally representative dataset created by the U.S. Census Bureau. A panel dataset is one that follows the same individuals over time. The first SIPP panel began in the mid-1980s and the latest began in 2004. The SIPP tracks individuals for two to four years, depending on the panel. SIPP respondents are interviewed every fourth month about their experiences over the prior four months.
SIPP respondents are asked about their participation in income maintenance programs (such as welfare and unemployment compensation), their household and family composition, employment and earnings, access to services (including health insurance and child care), assets, and other topics. Some of these questions are asked only once or twice during the panel, but demographics, program participation, and employment-related information are collected for every month.
The SIPP is the preferred dataset for some specific questions:
- The SIPP contains much more information about individuals over time than commonly used datasets such as the Current Population Survey. For example, the topical modules provide information on the assets of individuals as well as their work schedules and child care usage.
- Usually, labor market data is "static" in that it covers only one point in time. From the Current Population Survey, for example, we can determine someone's wages in March 2002. With the SIPP, however, we can follow individuals over time, so we know how much their wages grew from March 2000 to March 2002, how long they have been at their job, and whether they had training earlier in the year. This enriches our analysis of what is happening in the labor market.
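To make the second point concrete, here is a minimal Python sketch of what a panel structure allows: computing each person's wage growth between two months. The record layout, person identifiers, and wage values are all invented for illustration; they are not SIPP variable names or SIPP data, and this is not how the CEPR extracts are built.

```python
from collections import defaultdict

# Hypothetical person-month records: (person_id, "YYYY-MM", hourly_wage).
# The layout and values are invented; they are not SIPP variables or data.
records = [
    ("A", "2000-03", 10.00),
    ("A", "2001-03", 10.50),
    ("A", "2002-03", 11.00),
    ("B", "2000-03", 15.00),
    ("B", "2002-03", 15.00),
    ("C", "2002-03", 12.00),  # observed only once, so no growth can be computed
]


def wage_growth(records, start, end):
    """Return {person_id: percent wage change from start to end}.

    Only people observed in both months, with a positive starting wage,
    are included.
    """
    wages = defaultdict(dict)
    for person_id, month, wage in records:
        wages[person_id][month] = wage

    growth = {}
    for person_id, by_month in wages.items():
        if start in by_month and end in by_month and by_month[start] > 0:
            change = by_month[end] - by_month[start]
            growth[person_id] = 100.0 * change / by_month[start]
    return growth


growth = wage_growth(records, "2000-03", "2002-03")
for person_id in sorted(growth):
    print(person_id, round(growth[person_id], 1))
```

Person C appears in only one month, which is all a single cross-section would provide, so no growth figure can be computed for that person.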
What We Do
CEPR creates Uniform Extracts from the SIPP and makes them available to the public free of charge. The CEPR SIPP Uniform Extracts are fully documented, user-friendly, cross-platform, and built with open source code. The extracts include nearly 400 commonly used variables from the SIPP, including weights, basic demographics, labor market indicators, income, and program participation. These variables have been recoded to be consistent across panels (to the extent possible) and put in a format that is easy to use for most analytic purposes.
A typical researcher with experience working with microdata can expect to be "up and running" after a few weeks of familiarizing themselves with the documentation, programs, and data provided by CEPR, rather than having to start from scratch and spend months (if not years, as we have done) just understanding the basics. CEPR SIPP Uniform Extracts provide a "jump start" for researchers to help them learn from our experiences working with SIPP's complicated data structure and finding errors. Researchers can then spend their time focusing on their real goal, analysis.
The documentation includes all code (written in Stata 8/9) that generates the Extracts, and, for each Extract, codebooks and User Notes, which include analysis of comparability with other published data. The inclusion of the Stata code allows the researcher to not only see exactly what we've done, but also easily modify it or add new variables from the SIPP, including Topical Module data, as needed.
The Extracts are divided up thematically, with each one covering a separate topic, to make them easier to use and to keep them to a manageable size. For example, Extract B: Demographics contains demographic information on all SIPP respondents. The program for each Extract runs for SIPP panels 1990 through 2001 (with 2004 coming soon!), across platforms (Unix, Linux, Mac, Windows), creating a set of variables that, to the extent possible, are consistent across panels. Inconsistencies are clearly noted in the codebooks. The codebooks also note which raw SIPP variables were used to generate each variable. A separate document, Crosswalk [xls format], shows how the raw SIPP variables are matched across panels.
For example, the education variable changes considerably between the 1993 and 1996 panels. We have generated a new education variable that is more consistent across panels. The User Notes show how this variable compares to other published education data. Further, since we provide all our Stata programs, other researchers can reject our "fix" and use their own, but at least they know of the problem, know what we think is the best solution, and have documentation on why we think so.
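The general idea of recoding a variable so that it is consistent across panels can be sketched as follows. This is an illustration in Python, not CEPR's Stata code. It assumes, hypothetically, that earlier panels record years of schooling and later panels record a categorical code. The cutoffs, code values, category labels, and the function name `harmonize_education` are all invented for the example; the actual recode is documented in the codebooks and User Notes.

```python
# Hypothetical categorical codes for later panels.
# These are placeholders, not actual SIPP codes or CEPR's recode.
LATER_PANEL_CODES = {
    1: "less than high school",
    2: "high school",
    3: "some college",
    4: "college or more",
}


def harmonize_education(panel, raw_value):
    """Map a panel-specific education value to one common set of categories.

    Assumptions (hypothetical): panels before 1996 store years of schooling
    completed; panels from 1996 on store a code from LATER_PANEL_CODES.
    Returns None when the value is missing or not recognized.
    """
    if raw_value is None:
        return None
    if panel < 1996:
        if raw_value < 12:
            return "less than high school"
        if raw_value == 12:
            return "high school"
        if raw_value < 16:
            return "some college"
        return "college or more"
    return LATER_PANEL_CODES.get(raw_value)


# The same category comes out of two differently coded panels.
print(harmonize_education(1993, 12))
print(harmonize_education(2001, 2))
```

Keeping the mapping in one small, documented function is what lets another researcher see the choice that was made and substitute a different one.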
More details about common SIPP issues can be found on our SIPP Documentation page. If you know something about SIPP that you'd like to share with SIPP users, please send us your paper or comments and we will add them to our resources as well. We can be reached at ceprdata [at] cepr [dot] net.
Why We Do It
We've been working with the SIPP for years and wanted to stop reinventing the wheel whenever we needed an extract of the SIPP data. The problem is that in order to come back to a dataset, you need to take the time to clearly document what you've done. By the time we did all that, we figured we might as well share it. If you find this useful, we hope you'll drop us a note and send us your research paper. If you add new variables to our Uniform Extracts (which you most likely will), please send us your code and we will gladly add it to our compilation of Extracts, giving you full credit. Of course, you'll have to follow our cardinal rule: document everything you do. We can be reached at ceprdata [at] cepr [dot] net.
When using this data for analysis, please cite it as:
- Center for Economic and Policy Research. 2006. SIPP Uniform Extracts, Version 2.0. Washington, DC.
Getting Started
To use CEPR's SIPP Uniform Extracts, you should first do a few things. Take a look at the documentation to get a sense of what variables we include and how the SIPP compares to other data sources. Then, take a look at the programs to learn how we construct our SIPP Uniform Extracts and how to make the most efficient use of them. Finally, take a look at our page of publications to see what is possible with the SIPP and what others have done. Please feel free to contact us at ceprdata [at] cepr [dot] net if you have questions or comments.