research-article

CaPE: Category Preserving Embeddings for Similarity-Search in Financial Graphs

Authors:
Gaurav Oberoi (Mastercard, IN)
Pranav Poduval (Mastercard, IN)
Karamjit Singh (Mastercard, IN)
Sangam Verma (Mastercard, IN)
Pranay Gupta (MasterCard, IN)

ICAIF '22: Proceedings of the Third ACM International Conference on AI in Finance, November 2022, Pages 420–427
https://doi.org/10.1145/3533271.3561788

Published: 26 October 2022

ABSTRACT

Similarity search is an important problem for the payment industry, where user-merchant interaction data is available. It finds merchants similar to a given merchant and supports tasks such as peer-set generation, recommendation, community detection, and anomaly detection. Recent work has shown that, by leveraging interaction data, Graph Neural Networks (GNNs) can generate node embeddings for entities such as merchants, which can then be used for these similarity-search tasks. However, most real-world financial data comes with high-cardinality categorical features, such as city, industry, and super-industry, which are fed to GNNs as one-hot encodings. Current GNN algorithms are not designed for such sparse features, which makes it difficult for them to learn embeddings that preserve these features. In this work, we propose CaPE, a Category Preserving Embedding generation method that preserves high-cardinality feature information in the embeddings. CaPE is also designed to preserve other important numerical feature information. We compare CaPE with recent GNN-based embedding generation methods and show its superiority on peer-set generation tasks on real-world datasets, both external and internal (synthetically generated). We also compare our method on a downstream task such as link prediction.
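To make the setting concrete, the following is a minimal sketch of one-hot encoding a categorical feature and checking how well a set of vectors preserves that category under nearest-neighbor search. It is not the CaPE method, which the abstract does not specify. The function names, the toy "city" labels, and the preservation score (the fraction of top-k cosine neighbors sharing the query's category) are assumptions made purely for illustration.

```python
import numpy as np

def one_hot(labels, num_categories):
    """Return an (n, num_categories) one-hot matrix for integer labels."""
    out = np.zeros((len(labels), num_categories), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

def top_k_peers(embeddings, query_idx, k):
    """Indices of the k rows closest to the query row by cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sims = unit @ unit[query_idx]
    sims[query_idx] = -np.inf  # exclude the query itself
    return np.argsort(-sims, kind="stable")[:k]

def category_preservation(embeddings, labels, k):
    """Mean fraction of each node's top-k peers that share its category.

    Hypothetical metric for illustration; not taken from the paper.
    """
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        peers = top_k_peers(embeddings, i, k)
        scores.append(np.mean(labels[peers] == labels[i]))
    return float(np.mean(scores))

# Toy data (assumed): 6 merchants spread over 3 cities.
city = np.array([0, 0, 1, 1, 2, 2])
features = one_hot(city, num_categories=3)

# Fraction of zero entries; this grows toward 1 as cardinality increases.
sparsity = 1.0 - features.mean()

# Using the raw one-hot vectors as stand-in "embeddings".
score = category_preservation(features, city, k=1)
```

In this toy case the raw one-hot vectors trivially keep same-city merchants together. The abstract's concern is that embeddings learned by a GNN from such sparse inputs may not, and a score of this kind is one way such a loss could be measured.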

Index Terms

CaPE: Category Preserving Embeddings for Similarity-Search in Financial Graphs
1. Computing methodologies
  1. Artificial intelligence
  2. Machine learning
    1. Learning settings
      1. Semi-supervised learning settings

Published in
ICAIF '22: Proceedings of the Third ACM International Conference on AI in Finance
November 2022
527 pages
ISBN:9781450393768
DOI:10.1145/3533271
Editors:
Daniele Magazzeni (J.P. Morgan AI Research)
Senthil Kumar (Capital One)
Rahul Savani (University of Liverpool)
Renyuan Xu (University of Southern California)
Carmine Ventre (King's College London)
Blanka Horvath (University of Oxford)
Ruimeng Hu (University of California Santa Barbara)
Tucker Balch (J.P. Morgan AI Research)
Francesca Toni (Imperial College London)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Embeddings
Financial Graphs
Graph Neural Networks
Similarity Search
Sparse Features
Qualifiers
- research-article
- Research
- Refereed limited