ABSTRACT
The used car trade is one of the major components of the world's economies. Nowadays, it is common to sell a car by placing an internet advertisement, regardless of geography. An advertisement typically consists of two parts: structured data and free-text data. The structured data may include the asking price, make, model, year, and mileage of the car, as well as the seller's contact information. In most cases, the seller may give important clues about the car's current condition in the free text, and the title (head) of the advertisement can also be treated as free text. This paper reports preliminary results from a text mining study conducted on 75K used-car internet listings collected from two major car listing websites in Turkey. As expected, words and phrases describing the car are observed to be frequent. The leading concepts in the free text are found to concern the description of the car's current condition, for example "no crash history".
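The split between structured fields and free text, and the observation that descriptive words and phrases are frequent, can be illustrated with a minimal word and phrase (n-gram) frequency count. This is a sketch under stated assumptions, not the study's actual pipeline: the `Listing` field names, the helper functions, and the two example advertisements are hypothetical, and the naive regex tokenizer stands in for the Turkish-aware text processing that a real analysis of these listings would probably require.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Listing:
    # Structured part of the advertisement (hypothetical field names).
    price: int
    make: str
    model: str
    year: int
    mileage_km: int
    # Free-text part; the title (head) is treated as free text too.
    title: str
    description: str

    def free_text(self) -> str:
        """Join the title and the description into one free-text string."""
        return f"{self.title} {self.description}"


def tokenize(text: str) -> list[str]:
    # Naive placeholder tokenizer: lowercase, then keep runs of Unicode word characters.
    return re.findall(r"\w+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    # All contiguous n-token windows (n=1 words, n=2 word pairs, and so on).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(listings: list[Listing], n: int, k: int) -> list[tuple[tuple[str, ...], int]]:
    # Count n-grams over the free text of every listing and return the k most frequent.
    counts: Counter = Counter()
    for listing in listings:
        counts.update(ngrams(tokenize(listing.free_text()), n))
    return counts.most_common(k)


# Two made-up advertisements, for illustration only (not data from the study).
example_ads = [
    Listing(250000, "Renault", "Clio", 2015, 90000,
            "Clean car", "No crash history, no paint."),
    Listing(310000, "Fiat", "Egea", 2018, 60000,
            "No crash history", "Original paint, first owner."),
]

if __name__ == "__main__":
    print(top_ngrams(example_ads, 3, 1))
```

One caveat on the design: Python's `str.lower()` does not apply Turkish casing rules (for example, `"I".lower()` returns `"i"` rather than the dotless `"ı"`), which is one reason the tokenizer above should be read as a placeholder only.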
Index Terms
- Analyzing used-car web listings via text mining