Abstract
The 1940 and 1950 Public Use Sample Project is the creation of 1/100 household samples from the 1940 and 1950 Censuses of Population. The data source for the samples is the microfilmed original Population Schedules, which contain the census enumerators' recording of household information. The paper describes the procedure used to sample the universe of household listings and to transcribe the sample households' data. A pretest of the 1940 Public Use Sample included a comparison of three methods of sampling and transcription; the results of this comparison are reported. The applicability of these procedures to similar projects is discussed.
- 1 For a general description of a Public Use Sample (PUS) and the project to produce PUS from the 1940 and 1950 Censuses, see R. Cohn, "The Production and Research Potential of the 1940 and 1950 Public Use Samples," pp. 113-117 in Data Bases in the Humanities and Social Sciences, J. Raben (Ed.), North Holland, 1980.
- 2 The sampling procedure is complicated because household members who were not at home during the enumerator's original visit were recorded on separate schedules. The sample selection procedure therefore includes a second pass through the microfilm to transcribe the data for these household members and attach their data to the main household listing.
- 3 In the 1940 PUS, an additional selection stage is required. To make the household sample self-weighting, a household is selected with a probability equal to the inverse of the household size. For both 1940 and 1950, special selection procedures are used for institutional populations and for populations living in other types of "group quarters," such as hotels and boarding houses.
- 4 Preliminary inspection of the microfilm indicated that a large proportion of the coded versions of the items was illegible. The coding operation, particularly in 1940, did not take into consideration future reproduction of the schedules. Many of the colored inks and pencils used for the 1940 coding operations did not reproduce well photographically.
- 5 Two versions of the direct data entry procedure were tested. One version combined sampling and data transcription as a continuous operation performed by a single operator. The second version separated the two procedures, with different operators performing each task. There was an initial concern that the combined task would be too complex for an operator with minimal experience in this type of activity. However, analysis of the 1940 PUS pretest indicates that combining the two procedures is more efficient.
- 6 These timing estimates are the average elapsed time for processing fifty households among operators who had previously processed 300 households. There is evidence of a rather steep learning curve for both types of processing. The direct entry data transcribers improved their speed of data entry by 50% over the period of processing 350 households. Hand transcription improved by 33% over the same period.
- 7 In production, the sampling and data entry procedures are verified on a 10% sample basis. The sampling verification procedure selects 10% of the microfilm reels for resampling. If the sampling verifier does not select the same households within a certain margin of error, the entire microfilm reel is rejected and resampled. The data entry verification involves rekeying the data for 10% of the selected households. If the data entered by the verifier do not match the originally entered data within a margin of error, the entire reel is rejected and the data are rekeyed.
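The inverse-size selection stage in note 3 can be illustrated with a minimal Python sketch. It shows only the retention rule stated there: a candidate household of size n is kept with probability 1/n. The function name `keep_household`, the seed, and the trial count are hypothetical, and how a household becomes a candidate in the first place is not specified in the notes and is not modeled here.

```python
import random


def keep_household(household_size, rng):
    """Retain a candidate household with probability 1 / household_size.

    Hypothetical helper: only the 1/size rule comes from note 3.
    """
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    # rng.random() is uniform on [0, 1), so this is true with probability 1/size.
    return rng.random() < 1.0 / household_size


# Illustrative check: the retained share should fall roughly as 1/size.
rng = random.Random(1940)  # arbitrary seed, chosen only for repeatability
trials = 20_000
for size in (1, 2, 5):
    kept = sum(keep_household(size, rng) for _ in range(trials))
    print(size, round(kept / trials, 3))
```

Under the assumption (not stated in the notes) that a household's chance of becoming a candidate is proportional to its size, multiplying that chance by 1/n gives every household the same overall selection probability, which is what a self-weighting sample requires.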
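The sampling verification in note 7 can be sketched in the same way. The notes give the 10% verification rate but not the margin of error or how disagreement is measured, so the `tolerance` parameter and the mismatch measure below (share of household identifiers chosen by only one of the two operators) are assumptions, as are the function names.

```python
import random


def pick_for_verification(items, fraction, rng):
    """Draw a simple random subset of items (e.g. reels) to re-process."""
    count = max(1, round(len(items) * fraction))
    return rng.sample(list(items), count)


def reel_passes(original_ids, verifier_ids, tolerance):
    """Accept a reel if the two selections disagree on at most `tolerance`
    of the households chosen by either operator (assumed measure)."""
    original_ids, verifier_ids = set(original_ids), set(verifier_ids)
    combined = original_ids | verifier_ids
    if not combined:
        return True  # nothing was selected by either operator
    mismatched = original_ids ^ verifier_ids  # chosen by only one operator
    return len(mismatched) / len(combined) <= tolerance


# Hypothetical usage: 20 reels, 10% of them re-sampled by a verifier.
rng = random.Random(7)
reels = list(range(1, 21))
to_verify = pick_for_verification(reels, 0.10, rng)
print(to_verify)
print(reel_passes({101, 102, 103}, {101, 102, 103}, tolerance=0.05))
```

A reel for which `reel_passes` returns False would, following note 7, be rejected and resampled in full. The same comparison pattern could be applied to rekeyed data for the data entry check, with a field-level rather than household-level mismatch measure.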
The 1940 and 1950 Public Use Sample Project: Data quality issues
CHI '81: Proceedings of the Joint Conference on Easier and More Productive Use of Computer Systems. (Part - I): Information Processing in the Social Sciences and Humanities - Volume 1981