DOI: 10.1145/3447548.3467109 · KDD Conference Proceedings · Research Article

Addressing Non-Representative Surveys using Multiple Instance Learning

Published: 14 August 2021

ABSTRACT

In recent years, non-representative survey sampling and non-response bias have become major obstacles to obtaining reliable population estimates from finite survey samples. As such, researchers have focused on methods to correct these biases. In this paper, we look at this well-known problem from a fresh perspective and formulate it as a learning problem. To meet this challenge, we propose solving the learning problem with a multiple instance learning (MIL) paradigm. We devise two MIL-based neural network topologies, each built around a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach does not require collecting sensitive personal data from respondents or accessing population-level statistics of the same sensitive attributes. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset, which we use to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures and partly counter the adverse effect of biased sampling on estimation quality. We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.
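To make the MIL framing concrete: a survey sample can be treated as a "bag" of respondent-level instances, an attention pooling layer assigns a learned weight to each respondent, and the weighted aggregate feeds a regression head that estimates the bag-level population quantity. The sketch below illustrates this idea only; the class name AttentionMILRegressor, the layer sizes, and the specific (non-gated) attention variant are illustrative assumptions, not the architecture published in the paper.

```python
# Minimal sketch of attention-based MIL pooling for bag-level regression.
# Assumption: one survey sample = one bag of respondent feature vectors.
import torch
import torch.nn as nn


class AttentionMILRegressor(nn.Module):
    """Aggregates a bag of respondent-level features into one bag-level estimate."""

    def __init__(self, in_dim: int, hidden_dim: int = 64, attn_dim: int = 32):
        super().__init__()
        # Instance-level encoder: embeds each respondent's features.
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Attention scorer: one unnormalized weight per instance.
        self.attention = nn.Sequential(
            nn.Linear(hidden_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )
        # Bag-level regression head on the pooled representation.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_instances, in_dim)
        h = self.encoder(bag)                         # (n, hidden_dim)
        a = torch.softmax(self.attention(h), dim=0)   # (n, 1), weights sum to 1
        pooled = (a * h).sum(dim=0)                   # (hidden_dim,)
        return self.head(pooled)                      # scalar bag-level estimate


# Usage sketch: a synthetic "biased survey" bag of 500 respondents, 10 features each.
model = AttentionMILRegressor(in_dim=10)
bag = torch.randn(500, 10)
estimate = model(bag)  # tensor of shape (1,)
```

In this reading, the attention weights let the network down-weight over-represented respondents and up-weight under-represented ones, which is how such a model could partly compensate for a biased sample without explicit post-stratification variables.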


Supplemental Material

addressing_nonrepresentative_surveys_using_multiple-yaniv_katz-oded_vainas-38958078-vw9T.mp4 (MP4, 134.3 MB)

