DOI: 10.1145/3109761.3109782
research-article

Analyzing used-car web listings via text mining

Published: 17 October 2017

ABSTRACT

Used-car trade is a major component of economies worldwide. Today, selling a car by placing an internet advertisement is common regardless of geography. An advertisement typically consists of two parts: structured data and free text. The structured data may include the asking price, make, model, year, and mileage of the car, along with contact information. In most cases, the seller gives important clues about the car's current condition in the free text, which can also include the title (head) of the advertisement. This paper reports preliminary results from a text mining study conducted on 75K used-car internet listings collected from two major car listing web sites in Turkey. As expected, words and phrases that describe the car are observed to be frequent. The leading concepts in the free text concern the current condition of the car, for example "no crash history".
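
As a rough illustration of how frequent words and phrases might be counted in the free text of listings, the following is a minimal sketch and not the pipeline used in the study. The function names (`ngrams`, `phrase_frequencies`) and the sample listings are hypothetical, and the sample text is English placeholder data; real Turkish listings would likely need language-specific normalization before counting.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequencies(listings, n=1):
    """Count word n-grams across the free text of all listings."""
    counts = Counter()
    for text in listings:
        # Simple tokenization: lowercase, then keep runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical free-text fields, invented for illustration only.
listings = [
    "Clean car, no crash history, single owner",
    "No crash history. Low mileage, garage kept",
    "Low mileage family car, no paint",
]

word_counts = phrase_frequencies(listings, n=1)
trigram_counts = phrase_frequencies(listings, n=3)
print(word_counts.most_common(3))
print(trigram_counts.most_common(1))
```

Counting n-grams for several values of n is one simple way to surface multi-word concepts such as "no crash history" that single-word counts would split apart.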

      • Published in

        IML '17: Proceedings of the 1st International Conference on Internet of Things and Machine Learning
        October 2017
        581 pages
        ISBN: 9781450352437
        DOI: 10.1145/3109761

        Copyright © 2017 ACM

        © 2017 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States
