DOI: 10.1145/2939672.2939711
research-article

Identifying Earmarks in Congressional Bills

Published: 13 August 2016

Abstract

Earmarks are legislative provisions that direct federal funds to specific projects, circumventing the competitive grant-making process of federal agencies. Identifying and cataloging earmarks is a tedious, time-consuming process carried out by experts from public interest groups. In this paper, we present a machine learning system for automatically extracting earmarks from congressional bills and reports. We first describe a table-parsing algorithm for extracting budget allocations from appropriations tables in congressional bills. We then use machine learning classifiers to determine which of these budget allocations are earmarks, achieving an out-of-sample ROC AUC score of 0.89. Using this system, we construct the first publicly available database of earmarks dating back to 1995. Our machine learning approach adds transparency, accuracy, and speed to the congressional appropriations process.
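For readers unfamiliar with the metric reported in the abstract, the sketch below shows how an out-of-sample ROC AUC can be computed: it is the probability that a randomly chosen positive example (here, an earmark) receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc`, the labels, and the scores are all hypothetical toy values chosen for illustration; they are not the paper's data, classifier, or evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs in which
    the positive example gets the higher score; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out examples (assumption, not from the paper):
# label 1 = allocation is an earmark, label 0 = it is not.
held_out_labels = [1, 0, 1, 0, 0, 1, 0]
held_out_scores = [0.90, 0.20, 0.60, 0.70, 0.10, 0.80, 0.85]

auc = roc_auc(held_out_labels, held_out_scores)
print(f"toy out-of-sample ROC AUC: {auc:.2f}")
```

"Out of sample" means the scored examples were not used to fit the classifier, so the figure estimates performance on unseen bills. The quadratic pairwise loop is used here only because it mirrors the definition; a rank-based computation would be preferable on large test sets.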


Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. earmarks
  2. information extraction
  3. machine learning
  4. natural language processing

Conference

KDD '16

Acceptance Rates

KDD '16 Paper Acceptance Rate 66 of 1,115 submissions, 6%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
