DOI:10.1145/3487553.3524640
short-paper

Detecting Regulation Violations for an Indian Regulatory Body through Multi Label Classification

Published: 16 August 2022

Abstract

The Securities and Exchange Board of India (SEBI) is the regulatory body for securities and commodities in India. SEBI creates and enforces regulations that all listed companies must follow. To the best of our knowledge, this is the first work on identifying the regulation(s) that a SEBI-related case violates, a capability that could be of substantial value to companies, lawyers, and other stakeholders in the regulatory process. We create a dataset for this task by automatically extracting violations from publicly available case files. Using this data, we explore various multi-label text classification methods to determine which regulations, potentially more than one, are violated by the facts of a case. Our experiments demonstrate the importance of contextual text representations for understanding complex financial and legal concepts. We also highlight the challenges that must be addressed to develop a fully functional system in the real world.
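To make the multi-label framing concrete, the following is a minimal sketch of how a case's set of violated regulations can be encoded as an indicator vector and how predictions can be scored. It is an illustration under stated assumptions, not the paper's implementation: the label names `REG_A`, `REG_B`, `REG_C` are hypothetical placeholders, and micro-averaged F1 and Hamming loss are common multi-label metrics chosen here for illustration; the abstract does not say which metrics the authors report.

```python
# Hypothetical regulation labels; the real label set would come from the case files.
LABELS = ["REG_A", "REG_B", "REG_C"]


def to_multi_hot(label_set, labels=LABELS):
    """Encode a set of violated regulations as a 0/1 indicator vector."""
    return [1 if name in label_set else 0 for name in labels]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool every (case, label) decision before scoring."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def hamming_loss(y_true, y_pred):
    """Fraction of (case, label) decisions that are wrong."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


# Toy example: three cases, each violating one or more (hypothetical) regulations.
gold = [{"REG_A", "REG_C"}, {"REG_B"}, {"REG_A"}]
predicted = [{"REG_A"}, {"REG_B", "REG_C"}, {"REG_A"}]

Y_true = [to_multi_hot(s) for s in gold]
Y_pred = [to_multi_hot(s) for s in predicted]

print("micro-F1:", micro_f1(Y_true, Y_pred))
print("Hamming loss:", hamming_loss(Y_true, Y_pred))
```

Unlike single-label classification, each case here carries a full vector of independent yes/no decisions, so a prediction can be partially correct: in the toy data the first case has one regulation found and one missed.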

Information

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN:9781450391306
DOI:10.1145/3487553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BERT
  2. information extraction
  3. multi label classification
  4. neural networks
  5. regulation violation detection

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • JPMorgan Chase and Company

Conference

WWW '22
WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
