Paper

Authors: Claudio Barros and Perrine Moreau

Affiliation: Data Science Direction, Médiamétrie, 70 rue Rivay, Levallois-Perret, France

Keyword(s): Text Mining, URLs, User Profiling, Feature Engineering, Topic Extraction, Semantics.

Abstract: Text data is undoubtedly one of the richest and most distinctive sources of information. It comes in many forms and, depending on its nature, requires specific treatment to yield meaningful features that can subsequently be used in predictive modelling. URLs in particular are quite specific and require adapted processing compared to usual text corpora. In this paper, we review different ways in which we have used URLs to create meaningful features, both by exploiting the URL itself and by scraping the content of its page. We additionally attempt to measure, in a predictive modelling use case, the impact of adding the different groups of features we created.

CC BY-NC-ND 4.0

Paper citation in several formats:
Barros, C. and Moreau, P. (2022). User Profiling: On the Road from URLs to Semantic Features. In Proceedings of the 11th International Conference on Data Science, Technology and Applications - DATA; ISBN 978-989-758-583-8; ISSN 2184-285X, SciTePress, pages 227-235. DOI: 10.5220/0011139900003269

@conference{data22,
author={Claudio Barros and Perrine Moreau},
title={User Profiling: On the Road from URLs to Semantic Features},
booktitle={Proceedings of the 11th International Conference on Data Science, Technology and Applications - DATA},
year={2022},
pages={227-235},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0011139900003269},
isbn={978-989-758-583-8},
issn={2184-285X},
}

TY - CONF
JO - Proceedings of the 11th International Conference on Data Science, Technology and Applications - DATA
TI - User Profiling: On the Road from URLs to Semantic Features
SN - 978-989-758-583-8
IS - 2184-285X
AU - Barros, C.
AU - Moreau, P.
PY - 2022
SP - 227
EP - 235
DO - 10.5220/0011139900003269
PB - SciTePress