Abstract:
Data Science and Data Engineering are two of the high-demand skills in the current market. There is increasing demand for software engineers who can combine skills from both areas to drive better and more accurate Machine Learning algorithms on an automated, distributed data platform. This paper presents one such experience of combining these skills to build a data infrastructure that forms the foundation for developing and deploying machine learning algorithms. As part of this work, we show how to develop an automated data pipeline using multiple cloud services to drive a recommendation system for a dataset from a social publishing platform (https://medium.com/), enhancing user experience through personalized content suggestions. Using machine learning (ML) and natural language processing (NLP), the system analyzes article content and user behavior to recommend similar articles, writers, and other content based on user interests. Preliminary results show the system’s effectiveness in recommending fresh content based on articles, authors, and user preferences. The automated data pipeline pulls in the latest information through APIs for the Medium website so that the dataset for the recommendation engine is always kept up to date. Thus, integrating cloud-based data processing with advanced analytical techniques improves overall digital content interaction and discoverability.
Date of Conference: 20-22 December 2024
Date Added to IEEE Xplore: 18 February 2025