ABSTRACT
Many data scientists now point out that how much Machine Learning (ML) research crosses into practice will depend not just on the ability of the specialized algorithms used to scrutinize positive/negative examples, but also on the quality of the data used to train those algorithms. Our experience in training a neural network with a huge dataset comprising over fifteen million water meter readings confirms this conjecture. In this paper, we report on the actions we took to extract from that database just those data that could correctly represent the complex statistical phenomenon in play. With an adequate reorganization of those data, we obtained an interesting, yet controversial, result. On the one hand, we improved the accuracy of predicting when a water meter fails or needs disassembly, based on a history of water consumption measurements, thus making the meter maintenance process smarter. On the other hand, all this came with the paradox of a (statistical) transformation of the initial dataset: while we alleviate a problem with a restructured and more interpretable data model, we simultaneously change the replicated form of those data.
Index Terms
- A Paradox in ML Design: Less data for a smarter water metering cognification experience