DOI: 10.1145/3316482.3326344
research-article

Automating the generation of hardware component knowledge bases

Published: 23 June 2019

Abstract

Hardware component databases are critical resources in designing embedded systems. Because generating these databases requires hundreds of thousands of hours of manual data entry, they are proprietary, limited in the data they provide, and contain many random data-entry errors.
We present a machine-learning-based approach for automating the generation of component databases directly from datasheets. Extracting data directly from datasheets is challenging because (1) the data is relational in nature and relies on non-local context, (2) the documents are filled with technical jargon, and (3) the datasheets are PDFs, a format that decouples visual locality from locality in the document. The proposed approach uses a rich data model and weak supervision to address these challenges.
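To make the idea of weak supervision concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a candidate extracted from a datasheet table is a dictionary with hypothetical `unit`, `row_header`, and `value` fields, and it combines a few illustrative heuristics (labeling functions) by simple majority vote. The heuristics, field names, and the use of majority vote are all assumptions made for illustration; the abstract does not specify how the sources of supervision are combined.

```python
from collections import Counter
from typing import Callable, Dict, List

# Label values used by the heuristics (assumed convention for this sketch).
ABSTAIN, FALSE, TRUE = -1, 0, 1

# Hypothetical candidate: one table cell plus the context around it.
Candidate = Dict[str, object]


def lf_unit_is_volts(c: Candidate) -> int:
    """Vote TRUE when the unit is volts, FALSE otherwise."""
    return TRUE if c["unit"] == "V" else FALSE


def lf_header_mentions_voltage(c: Candidate) -> int:
    """Vote TRUE when the row header mentions voltage, else abstain."""
    return TRUE if "voltage" in str(c["row_header"]).lower() else ABSTAIN


def lf_value_in_range(c: Candidate) -> int:
    """Vote TRUE when the value is a plausible voltage rating (assumed range)."""
    return TRUE if 0 < float(c["value"]) <= 1000 else FALSE


def majority_vote(c: Candidate, lfs: List[Callable[[Candidate], int]]) -> int:
    """Combine noisy heuristic votes; abstain on ties or when no one votes."""
    votes = [v for v in (lf(c) for lf in lfs) if v != ABSTAIN]
    counts = Counter(votes)
    if counts[TRUE] == counts[FALSE]:
        return ABSTAIN
    return TRUE if counts[TRUE] > counts[FALSE] else FALSE


LFS = [lf_unit_is_volts, lf_header_mentions_voltage, lf_value_in_range]

voltage_row = {"unit": "V", "row_header": "Collector-Emitter Voltage", "value": 40}
current_row = {"unit": "mA", "row_header": "Collector Current", "value": 200}

print(majority_vote(voltage_row, LFS))  # all three heuristics vote TRUE
print(majority_vote(current_row, LFS))  # one FALSE, one TRUE, one abstain: tie
```

The labels produced this way are noisy, which is why they serve as training signal for a downstream model rather than as final output.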
We evaluate the approach on datasheets of three classes of hardware components and achieve an average quality of 75 F1 points, which is comparable to existing human-curated knowledge bases. We perform two application studies that demonstrate the extraction of multiple data modalities, such as numerical properties and images. We show how different sources of supervision, such as heuristics and human labels, have distinct advantages that can be utilized together within a single methodology to automatically generate hardware component knowledge bases.
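For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed by comparing a set of extracted entries against a gold-standard set. The part numbers and values are made-up placeholders, and the exact matching criteria used in the paper's evaluation are not stated in this abstract, so exact tuple equality is an assumption.

```python
from typing import Set, Tuple

# Hypothetical knowledge-base entry: (part number, attribute, value).
Entry = Tuple[str, str, str]


def f1_points(extracted: Set[Entry], gold: Set[Entry]) -> float:
    """Return F1 on a 0-100 scale: harmonic mean of precision and recall."""
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    return 100.0 * 2 * precision * recall / (precision + recall)


# Placeholder data: three correct extractions, one spurious, one missed.
gold = {
    ("PART-A", "max_voltage", "40"),
    ("PART-B", "max_voltage", "60"),
    ("PART-C", "max_voltage", "45"),
    ("PART-D", "max_voltage", "30"),
}
extracted = {
    ("PART-A", "max_voltage", "40"),
    ("PART-B", "max_voltage", "60"),
    ("PART-C", "max_voltage", "45"),
    ("PART-E", "max_voltage", "12"),
}

score = f1_points(extracted, gold)
print(round(score, 1))
```

With precision and recall both at 0.75 in this placeholder example, the score comes out to 75 F1 points, the same scale the abstract reports.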


Cited By

  • (2024) Lightweight Automated Reasoning for Network Architectures. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, 237-245. DOI: 10.1145/3696348.3696865. Online publication date: 18-Nov-2024
  • (2022) Edge-Centric Programming for IoT Applications With Automatic Code Partitioning. IEEE Transactions on Computers 71:10, 2408-2422. DOI: 10.1109/TC.2021.3129367. Online publication date: 1-Oct-2022
  • (2020) Creating Hardware Component Knowledge Bases with Training Data Generation and Multi-task Learning. ACM Transactions on Embedded Computing Systems 19:6, 1-26. DOI: 10.1145/3391906. Online publication date: 29-Sep-2020


      Published In

      LCTES 2019: Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems
      June 2019
      218 pages
      ISBN: 9781450367240
      DOI: 10.1145/3316482

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication Notes

      Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

      Author Tags

      1. Design Tools
      2. Knowledge Base Construction

      Conference

      LCTES '19

      Acceptance Rates

      Overall Acceptance Rate 116 of 438 submissions, 26%
