The 1940 and 1950 Public Use Sample Project: Data quality issues

Published: 20 May 1981

Abstract

The 1940 and 1950 Public Use Sample Project creates 1/100 household samples from the 1940 and 1950 Censuses of Population. The data source for the samples is the microfilmed original Population Schedules, which contain the census enumerators' recordings of household information. The paper describes the procedure used to sample the universe of household listings and to transcribe the sample households' data. A pretest of the 1940 Public Use Sample compared three methods of sampling and transcription, and the results of this comparison are reported. The applicability of these procedures to similar projects is discussed.

References

  1. For a general description of a Public Use Sample (PUS) and the project to produce PUS from the 1940 and 1950 Censuses, see R. Cohn, "The Production and Research Potential of the 1940 and 1950 Public Use Samples," pp. 113-117 in Data Bases in the Humanities and Social Sciences, J. Raben (Ed.), North Holland, 1980.
  2. The sampling procedure is complicated by the fact that household members who were not at home during the enumerator's original visit were recorded on separate schedules. The sample selection procedure includes a second pass through the microfilm to transcribe the data for these household members and attach their data to the main household listing.
  3. In the 1940 PUS an additional selection stage is required. To make the household sample self-weighting, a household is selected with a probability equal to the inverse of the household size. For both 1940 and 1950, special selection procedures are used for institutional populations and populations living in other types of "group quarters," e.g., hotels and boarding houses.
  4. Preliminary inspection of the microfilm indicated that a large proportion of the coded versions of the items was illegible. The coding operation, particularly in 1940, did not take into consideration future reproduction of the schedules. Many of the colored inks and pencils used for the 1940 coding operations did not reproduce well photographically.
  5. Two versions of the direct data entry procedure were tested. One version combined the sampling and data transcription procedures as a continuous operation performed by a single operator. The second version separated the sampling and data transcription procedures, with different operators performing the distinct tasks. There was an initial concern that combining the sampling and data transcription tasks would be too complex for an operator with minimal experience in this type of activity. However, analysis of the 1940 PUS pretest indicates that combining the two procedures is more efficient.
  6. These timing estimates are the average elapsed time for processing fifty households among operators who had previously processed 300 households. There is evidence of a rather steep learning curve for both types of processing. The direct entry data transcribers improved their speed of data entry by 50% over the period of processing 350 households. Hand transcription improved by 33% over the same period.
  7. In production, the sampling and data entry procedures are verified on a 10% sample basis. The sampling verification procedure selects 10% of the microfilm reels for resampling. If the sampling verifier does not select the same households within a certain margin of error, the entire microfilm reel is rejected and resampled. The data entry verification involves rekeying the data for 10% of the selected households. If the data entered by the verifier do not match the originally entered data within a margin of error, the entire reel is rejected and the data are rekeyed.
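
The selection rule in note 3 and the reel-level sampling check in note 7 can be illustrated with a minimal Python sketch. It is not the project's actual software: the function names, the use of household identifiers, the mismatch measure (share of households chosen by only one of the two passes), and the `max_mismatch_rate` threshold are all assumptions made for illustration, since the notes do not state how the margin of error was defined.

```python
import random


def keep_household(household_size, rng):
    """Note 3: keep a candidate household with probability 1 / household_size."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    # rng.random() is uniform on [0, 1), so a one-person household is always kept.
    return rng.random() < 1.0 / household_size


def choose_reels_for_verification(reel_ids, fraction, rng):
    """Note 7: pick a fraction (10% in production) of reels for resampling."""
    reel_ids = list(reel_ids)
    count = max(1, round(len(reel_ids) * fraction))  # always check at least one reel
    return rng.sample(reel_ids, count)


def reel_passes(original_ids, verifier_ids, max_mismatch_rate):
    """Note 7: accept a reel if the two passes chose nearly the same households.

    The mismatch rate used here (households chosen by only one pass, divided
    by households chosen by either pass) is a hypothetical stand-in for the
    unspecified margin of error.
    """
    original_ids, verifier_ids = set(original_ids), set(verifier_ids)
    chosen_by_either = original_ids | verifier_ids
    if not chosen_by_either:
        return True  # nothing was selected by either pass, so nothing disagrees
    chosen_by_one_only = original_ids ^ verifier_ids
    return len(chosen_by_one_only) / len(chosen_by_either) <= max_mismatch_rate


if __name__ == "__main__":
    rng = random.Random(1940)
    reels = [f"reel-{n:03d}" for n in range(1, 21)]
    print(choose_reels_for_verification(reels, 0.10, rng))
    print(reel_passes({1, 2, 3, 4}, {1, 2, 3, 5}, max_mismatch_rate=0.10))
```

A reel that fails `reel_passes` would be rejected and resampled in full, as note 7 describes. The data entry check has the same shape, with rekeyed field values compared in place of household identifiers.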

          • Published in

            ACM SIGSOC Bulletin  Volume 12-13, Issue 4-1
            Aug. 1981
            89 pages
            ISSN:0163-5794
            DOI:10.1145/1015528
            • ACM Conferences
              CHI '81: Proceedings of the Joint Conference on Easier and More Productive Use of Computer Systems. (Part - I): Information Processing in the Social Sciences and Humanities - Volume 1981
              May 1981
              75 pages
              ISBN:0897910567
              DOI:10.1145/800275

            Copyright © 1981 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States
