ABSTRACT
Web Index (WIX for short) is a system for joining information resources on the Web. WIX replaces keywords in Web documents with hyperlinks to other Web pages, based on a WIX file chosen by the user. A WIX file is a kind of dictionary that holds a set of WIX entries, each a pair of a keyword and a target URL. Using WIX, users can join any Web content with arbitrary dictionaries. In conventional WIX, matching and linking are performed only on fixed character strings shared by the keyword set and the input text. However, when a user wants to find phrases such as idioms, this fixed-string matching is insufficient because words inflect, verb tenses change, and so on. We therefore propose a phrasal pattern matching mechanism for WIX. It helps users easily find idiomatic expressions in text on the Web and obtain more information about them.
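To illustrate why fixed-string matching falls short for idioms, the following is a minimal sketch of matching on lemmas rather than surface forms. It is not the mechanism proposed in the paper. The entry table `WIX_ENTRIES`, the toy lemma table `LEMMAS`, the function `link_idioms`, and the example.org URLs are all hypothetical names introduced for illustration; a real system would likely use a proper lemmatizer in place of the lookup table.

```python
import re

# Hypothetical WIX entries: idiom written as a lemma sequence -> target URL.
WIX_ENTRIES = {
    ("kick", "the", "bucket"): "https://example.org/idioms/kick-the-bucket",
    ("take", "into", "account"): "https://example.org/idioms/take-into-account",
}

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "kicked": "kick", "kicks": "kick", "kicking": "kick",
    "took": "take", "takes": "take", "taken": "take", "taking": "take",
}

TOKEN = re.compile(r"[A-Za-z']+")


def lemma(word):
    """Return the base form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)


def link_idioms(text):
    """Wrap each idiom occurrence in a hyperlink, matching on lemmas."""
    tokens = list(TOKEN.finditer(text))
    lemmas = [lemma(t.group()) for t in tokens]
    out, pos, i = [], 0, 0
    while i < len(tokens):
        hit = None
        # Prefer the longest entry whose lemma sequence starts at token i.
        for pattern, url in WIX_ENTRIES.items():
            n = len(pattern)
            if tuple(lemmas[i:i + n]) == pattern and (hit is None or n > hit[0]):
                hit = (n, url)
        if hit:
            n, url = hit
            start, end = tokens[i].start(), tokens[i + n - 1].end()
            out.append(text[pos:start])
            # Keep the original surface form as the link text.
            out.append(f'<a href="{url}">{text[start:end]}</a>')
            pos = end
            i += n
        else:
            i += 1
    out.append(text[pos:])
    return "".join(out)


print(link_idioms("He kicked the bucket."))
```

A fixed-string lookup for "kick the bucket" would miss "kicked the bucket", whereas the lemma sequence matches and the original wording is preserved as the link text. This sketch only handles contiguous idioms; patterns with intervening words, such as "took it into account", would need slots or wildcards in the pattern.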
Index Terms
- A Patten Matcher for English Idioms on Web IndeX