research-article

Contextual rule-based feature engineering for author-paper identification

Authors:
Erheng Zhong, Hong Kong University of Science and Technology, Hong Kong
Lianghao Li, Hong Kong University of Science and Technology, Hong Kong
Naiyan Wang, Hong Kong University of Science and Technology, Hong Kong
Ben Tan, Hong Kong University of Science and Technology, Hong Kong
Yin Zhu, Hong Kong University of Science and Technology, Hong Kong
Lili Zhao, Hong Kong University of Science and Technology, Hong Kong
Qiang Yang, Hong Kong University of Science and Technology, Hong Kong and Huawei Noah's Ark Lab, Hong Kong

KDD Cup '13: Proceedings of the 2013 KDD Cup 2013 Workshop
August 2013, Article No.: 6, Pages 1–6
https://doi.org/10.1145/2517288.2517293

Published: 11 August 2013

ABSTRACT

We present the ideas and methods we used to address the KDD Cup 2013 challenge on author-paper identification. We first formulate the problem as a personalized ranking task and then solve it within a supervised learning framework. The key point is to eliminate the papers incorrectly assigned to a given author, based on existing records. We choose Gradient Boosted Tree as our main classifier. From our exploration we conclude that effective feature engineering is the most critical factor in achieving our results. In this paper, we formulate this process as a unified framework that constructs features from contextual information and combines machine learning techniques with human intelligence. In addition, we suggest several strategies for parsing authors' names, which improve the prediction results significantly. Divide-and-conquer model building and model averaging also improve prediction precision.
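To make the pipeline in the abstract more concrete, here is a minimal sketch of two of the steps it names: a name-parsing feature and model averaging for per-author ranking. This is not the authors' implementation. The function names (`normalize_name`, `name_match_feature`, `rank_papers`), the initial-plus-surname matching rule, and the toy scores are all assumptions made for illustration, and the per-model scores stand in for the output of a trained classifier such as a Gradient Boosted Tree.

```python
import re
import unicodedata
from statistics import mean


def normalize_name(raw):
    """Reduce a name to (first_initial, last_name); a hypothetical parsing rule."""
    # Strip accents so that accented and unaccented spellings compare equal.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().strip()
    # Turn "Last, First" into "First Last".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", text)
    if not tokens:
        return ("", "")
    return (tokens[0][0], tokens[-1])


def name_match_feature(author_name, paper_author_names):
    """Return 1.0 if any name listed on the paper matches the author, else 0.0."""
    target = normalize_name(author_name)
    if target == ("", ""):
        return 0.0
    matched = any(normalize_name(n) == target for n in paper_author_names)
    return 1.0 if matched else 0.0


def rank_papers(model_scores):
    """Average each paper's score across models and rank papers, best first.

    model_scores is a list of dicts mapping paper id to a score, one dict per
    model, all covering the same candidate papers of a single author.
    """
    paper_ids = model_scores[0].keys()
    averaged = {pid: mean(s[pid] for s in model_scores) for pid in paper_ids}
    return sorted(averaged, key=averaged.get, reverse=True)


if __name__ == "__main__":
    print(name_match_feature("Erheng Zhong", ["Q. Yang", "Zhong, E."]))
    model_a = {"p1": 0.9, "p2": 0.2, "p3": 0.6}  # toy scores, not real results
    model_b = {"p1": 0.7, "p2": 0.4, "p3": 0.8}
    print(rank_papers([model_a, model_b]))
```

In a fuller system the name-match value would be one column among many contextual features fed to the classifier, and ranking would be done separately for each author's candidate papers, which is what makes the task a personalized ranking problem.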

Published in

KDD Cup '13: Proceedings of the 2013 KDD Cup 2013 Workshop
August 2013
69 pages
ISBN:9781450324953
DOI:10.1145/2517288

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States