ABSTRACT
Over the last decade, India has witnessed an explosion in the e-commerce industry. Adoption of e-commerce is increasingly extending beyond the densely populated urban centers into smaller towns and cities. In this paper, we discuss the practical challenges involved in developing high-precision geocoding engines for these geographical regions in India. These challenges motivate the next iteration of our geocoding framework. In particular, we focus on three core areas of improvement: 1) leveraging customer delivery data for geocoding, 2) understanding and handling the diversity and variation of addresses in these new regions, and 3) overcoming the limited coverage of our reference corpus. To this end, we present GeoCloud. The key contributions of GeoCloud are 1) a training algorithm for learning reference-representations from delivery coordinates and 2) a retrieval algorithm for geocoding new addresses. We test GeoCloud extensively across India to capture the regional, socioeconomic and linguistic diversity of the country. Our evaluation data is sampled from the delivery addresses of a large e-commerce platform in India and spans 72 cities and 21 states. The results show a significant improvement in precision and recall over the state-of-the-art geocoding system for India, and demonstrate the effectiveness of our intuitive, robust and generic approach. While we have shown the effectiveness of the framework for Indian addresses, we believe it can be applied to other countries as well, particularly where addresses are unstructured. To the best of our knowledge, this is the first instance of geocoding by learning reference-representations from large-scale delivery data.
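The abstract names two components, a training step that learns reference-representations from delivery coordinates and a retrieval step that geocodes new addresses, without describing their internals. The following is only a toy sketch of that general idea and is not GeoCloud's actual algorithm: the function names (`tokenize`, `learn_references`, `geocode`), the per-token median, the `min_support` threshold and the "smallest spread wins" rule are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median


def tokenize(address):
    """Lowercase an address and split it into alphanumeric tokens."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in address.lower())
    return cleaned.split()


def learn_references(deliveries, min_support=2):
    """Hypothetical training step: map each token to a reference location.

    `deliveries` is an iterable of (address_text, lat, lon) tuples.
    Tokens seen in fewer than `min_support` deliveries are dropped as noise.
    """
    points = defaultdict(list)
    for address, lat, lon in deliveries:
        for token in set(tokenize(address)):
            points[token].append((lat, lon))

    references = {}
    for token, coords in points.items():
        if len(coords) < min_support:
            continue
        lats = [lat for lat, _ in coords]
        lons = [lon for _, lon in coords]
        references[token] = {
            "lat": median(lats),
            "lon": median(lons),
            # Bounding-box extent in degrees: small means a specific place.
            "spread": (max(lats) - min(lats)) + (max(lons) - min(lons)),
            "support": len(coords),
        }
    return references


def geocode(address, references):
    """Hypothetical retrieval step: use the most spatially specific known token."""
    candidates = [references[t] for t in tokenize(address) if t in references]
    if not candidates:
        return None  # no learned token matches this address
    best = min(candidates, key=lambda ref: ref["spread"])
    return best["lat"], best["lon"]


# Made-up delivery records, for illustration only.
deliveries = [
    ("12 Gandhi Nagar, Jaipur", 26.90, 75.80),
    ("45 Gandhi Nagar Jaipur", 26.92, 75.82),
    ("7 Malviya Nagar, Jaipur", 26.85, 75.81),
    ("9 Malviya Nagar Jaipur", 26.86, 75.83),
]
references = learn_references(deliveries)
print(geocode("Flat 3, Gandhi Nagar, Jaipur", references))
```

In this toy data, a locality word such as "gandhi" covers a much smaller area than "nagar" or "jaipur", so it is the token chosen for retrieval. A production system would also need spelling normalization and outlier handling, which this sketch leaves out.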
Index Terms
- A Geocoding Framework Powered by Delivery Data