Abstract
Different context types differ in their potential to represent the subjects of documents. However, little is known about how they differ in discriminating between documents with different subjects. The present study aimed to compare the discrimination powers of five context types (i.e. title, textual, text-citation, reference, and reference-title contexts). Using a content analysis method, a test collection consisting of 3637 papers containing 15 English homographs in 54 subject clusters was analyzed by discriminant analysis to determine the discrimination among the clusters. The results confirmed that the homograph context types significantly discriminate between the clusters. The contexts correctly cluster about 40 to 80% of the documents, with the text and reference-title context types being stronger in differentiating the clusters. This is the first study to compare the powers of different kinds of textual context, and it recommends concentrating on context types to discriminate between documents containing homographs and so improve the effectiveness of their retrieval. It is suggested that a comparison of the context types be used as a tool in re-ranking or clustering the retrieved results.
Notes
One of the homographs retrieved very few documents and was, hence, ignored in the analyses.
References
Abu-Jbara, A., Ezra, J., & Radev, D. (2013). Purpose and polarity of citation: Towards NLP-based bibliometrics. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 596–606).
Bijen Khan, M. & Morad Zade, S. (2010). Homographs in Persian: Presented in the first Persian language and computer workshops. Tehran: Samt (Persian).
Blaikie, N. (2009). Designing social research. Cambridge: Polity.
Brewster, C., & O’Hara, K. (2007). Knowledge representation with ontologies: Present challenges—Future possibilities. International Journal of Human-Computer Studies, 65(7), 563–568.
Bussmann, H. (2006). Routledge dictionary of language and linguistics (G. Trauth, & K. Kazzazi, Eds. and Trans.). New York: Routledge.
Chaplot, D. S., & Bhattacharyya, D. P. (2014). Literature Survey on Unsupervised Word Sense Disambiguation. IIT Bombay, May, 7.
Choi, S. H. (2010). Document clustering using reference titles. Journal of Information Management, 27(2), 241–252.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.
Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. In Literature survey for the language and statistics II course at CMU (Vol. 4, pp. 192–195).
Giyanani, R. (2013). A survey on word sense disambiguation. IOSR Journal of Computer Engineering (IOSR-JCE), 14, 30–33.
Glänzel, W. (2015). Bibliometrics-aided retrieval: where information retrieval meets scientometrics. Scientometrics, 102(3), 2215–2222.
Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two supervised learning approaches for name disambiguation in author citations. In Proceedings of the 2004 joint ACM/IEEE conference on digital libraries, 2004 (pp. 296–305). IEEE.
Harada, T., & Tsuda, K. (2014). Classifying homographs in Japanese social media texts using a user interest model. Procedia Computer Science, 35, 929–936.
Harmandas, V., Sanderson, M., & Dunlop, M. D. (1997). Association for computing machinery: “Image retrieval by hypertext links”. In Proceedings of the 20th Annual international ACM-SIGIR conference on research and development in information retrieval. Philadelphia, PA, July 27–31.
Hearst, M. (1991). Noun homograph disambiguation using local context in large text corpora. In Proceedings of the 7th Annual Conference of the University of Waterloo Centre for the New OED and Text Research. Oxford, October. http://people.ischool.berkeley.edu/~hearst/papers/oed91.pdf.
Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the 28th annual meeting on Association for Computational Linguistics (pp. 268–275). Association for Computational Linguistics.
Hjørland, B. (2008). What is knowledge organization (KO)? Knowledge Organization, 35(2/3), 86–101.
Hurford, J. R., Heasley, B., & Smith, M. B. (2007). Semantics: A coursebook (2nd ed.). New York: Cambridge University Press.
Ingwersen, P. (1992). Information retrieval interaction. London: Taylor Graham Publishing.
Jeong, Y. K., Song, M., & Ding, Y. (2014). Content-based author co-citation analysis. Journal of Informetrics, 8(1), 197–211.
Kalita, P., & Barman, A. K. (2015). Word sense disambiguation: A survey. International Journal of Engineering and Computer Science, 4(5), 11743–11748.
Klecka, W. R. (1980). Discriminant analysis. London: Sage Publications.
Lee, Y. K., Ng, H. T., & Chia, T. K. (2004). Supervised word sense disambiguation with support vector machines and multiple knowledge sources. In Senseval-3: Third international workshop on the evaluation of systems for the semantic analysis of text (pp. 137–140).
Liu, X., Yu, Y., Guo, C., Sun, Y., & Gao, L. (2014). Full-text based context-rich heterogeneous network mining approach for citation recommendation. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 361–370). IEEE Press.
Machlup, F. (1983). Semantic quirks in studies of information. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 641–671). New York: Wiley.
Makki, R., & Homayounpour, M. (2008). Word sense disambiguation of Farsi homographs using thesaurus and corpus. In A. Ranta & B. Nordström (Eds.), Proceedings of 6th International Conference of the Advances in Natural Language Processing (pp. 315–323). New York: Springer.
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Nakov, P. I., Schwartz, A. S., & Hearst, M. (2004). Citances: Citation sentences for semantic analysis of bioscience text. In Proceedings of the SIGIR (Vol. 4, pp. 81–88).
Nanba, H., & Okumura, M. (1999). Towards multi-paper summarization using reference information. In IJCAI (Vol. 99, pp. 926–931).
Nicolaisen, J. (2007). Citation analysis. Annual Review of Information Science and Technology, 41(1), 609–641.
Palmer, F. R. (1981). A new outline (2nd ed.). New York: Cambridge University Press.
Pao, M. L. (1989). Concepts of information retrieval. Englewood: Libraries Unlimited.
Qazvinian, V., & Radev, D. R. (2010). Identifying non-explicit citing sentences for citation-based summarization. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 555–564). Association for Computational Linguistics.
Rezapour, A. R., Fakhrahmad, S. M., & Sadreddini, M. H. (2011). Applying weighted KNN to word sense disambiguation. In Proceedings of the world congress on engineering (Vol. 3, pp. 6–8).
Riahi, N., & Sedghi, F. (2012). A semi-supervised method for Persian homograph disambiguation. In 2012 20th Iranian conference on electrical engineering (ICEE) (pp. 748–751). IEEE.
Searle, J. R. (1984). Intentionality and its place in nature. Synthese, 61(1), 3–16.
Sekkingstad, A. (2016). Word sense disambiguation in webpages. Developing a program capable to disambiguate words with a website text as context. Master’s thesis, The University of Bergen.
Shan, S. M., Cui, Y., & He, Y. H. (2016). Homonyms discovery in folksonomy based on user community analysis. Journal of Electronic Science and Technology, 14(3), 275–280.
Shin, J. C., & Ock, C. Y. (2016). Improvement of Korean homograph disambiguation using Korean lexical semantic network (UWordMap). Journal of KIISE, 43(1), 71–79.
Small, H. (2011). Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2), 373–388.
Smith, L. C. (1981). Citation analysis. Library Trends, 30, 83–106.
Soergel, D., Lauser, B., Liang, A., Fisseha, F., Keizer, J., & Katz, S. (2006). Reengineering thesauri for new applications: The AGROVOC example. Journal of digital information. https://journals.tdl.org/jodi/index.php/jodi/article/view/112/111.
Soudani, N., Bounhas, I., & Slimani, Y. (2016). Semantic information retrieval: A comparative experimental study of NLP tools and language resources for Arabic. In 2016 IEEE 28th international conference on tools with artificial intelligence (ICTAI) (pp. 879–887). IEEE.
Tang, J., Fong, A. C., Wang, B., & Zhang, J. (2012). A unified probabilistic framework for name disambiguation in digital library. IEEE Transactions on Knowledge and Data Engineering, 24(6), 975–987.
Tesprasit, V., Charoenpornsawat, P., & Sornlertlamvanich, V. (2003). A context-sensitive homograph disambiguation in Thai text-to-speech synthesis. In Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on Human Language Technology: Companion volume of the proceedings of HLT-NAACL 2003—short papers (Vol. 2, pp. 103–105). Association for Computational Linguistics.
Tong, T., Dinakarpandian, D., & Lee, Y. (2009). Literature clustering using citation semantics. In 42nd Hawaii international conference on system sciences, 2009. HICSS’09. (pp. 1–10). IEEE.
Tsai, C. T., Kundu, G., & Roth, D. (2013). Concept-based analysis of scientific literature. In Proceedings of the 22nd ACM international conference on Conference on information & knowledge management (pp. 1733–1738). ACM.
Twilley, L. C., Dixon, P., Taylor, D., & Clark, K. (1994). University of Alberta norms of relative meaning frequency for 566 homographs. Memory & Cognition, 22(1), 111–126.
Weizenbaum, J. (1976). Computer power and human reason. San Francisco: W. H. Freeman.
Wolfram, D. (2015). The symbiotic relationship between information retrieval and informetrics. Scientometrics, 102(3), 2201–2214.
Zeng, H. J., He, Q. C., Chen, Z., Ma, W. Y., & Ma, J. (2004). Learning to cluster web search results. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 210–217). ACM.
Acknowledgements
The authors would like to cordially thank Professor M. Fakhr Ahmad (at Shiraz University) for all his valuable advice and help.
Appendix: List of the homographs identified by Twilley et al. (1994)
Homograph | Homograph | Homograph | Homograph | Homograph | Homograph |
---|---|---|---|---|---|
ACE | BRIDGE | COUNT | DUCK | GAG | INTEREST |
ACT | BROKE | COUNTER | DUMP | GAME | INTIMATE |
ADMIT | BRUSH | COURSE | EAR | GAS | INVALID |
ADVANCE | BULB | COURT | ENTRANCE | GEAR | IRON |
AFFAIR | CABINET | COVERED | EXCISE | GERM | ISSUE |
AIR | CABLE | CRAB | EXPRESS | GIN | JACK |
ANGLE | CALF | CRAFT | FAIR | GLARE | JAM |
ANNUAL | CALL | CRANE | FALL | GLASS | JAR |
BALL | CAN | CRANK | FAN | GRACE | JERK |
ARM | CANE | CREST | FANCY | GRADE | JET |
ARTICLE | CAP | CRICKET | FARE | GRAFT | JOINT |
BAND | CAPE | CROOK | FAST | GRAIN | JUICE |
BANK | CAPITAL | CRUST | FAULT | GRASS | JUNK |
BAR | CARD | CUE | FAWN | GRATE | KERNEL |
BARK | CARP | CUFF | FELT | GRAVE | KEY |
BASE | CASE | CURB | FENCE | GREEN | KICK |
BASS | CAST | CYCLE | FIELD | GRILL | KID |
BAT | CELL | DART | FIGURE | GRIND | KIND |
BATTERY | CHAIN | DASH | FILE | GROSS | LACE |
BAY | CHANCE | DATE | FILM | GROUND | LAND |
BEAD | CHANGE | DECK | FINE | GUY | LAP |
BEAM | CHARGE | DEED | FINISH | HABIT | LASH |
BEAR | CHARM | DEEP | FIRE | HAIL | LEAD |
BEEF | CHECK | DEPOSIT | FIRM | HAM | LEAF |
BEING | CHEST | DESERT | FIT | HAMPER | LEAN |
BELT | CHEW | DIAMOND | FIX | HAND | LEFT |
BEND | CHINA | DIE | FLAT | HANG | LETTER |
BILL | CHIP | DIGEST | FLEET | HARD | LIE |
BIT | CHOP | DIGIT | FLIGHT | HARP | LIGHT |
BITTER | CHUCK | DIP | FLING | HATCH | LIKE |
BLOCK | CLIP | DIRT | FLOAT | HAUNT | LIME |
BLOW | CLOG | DIVE | FLUSH | HEAD | LIMP |
BLUE | CLUB | DOUGH | FLY | HEAT | LINE |
BLUFF | COAST | DOVE | FOIL | HEEL | LIP |
BLUNT | COAT | DOWN | FOLD | HEM | LIST |
BOARD | COLD | DRAFT | FOOT | HIDE | LITTER |
BOIL | COMB | DRAG | FORCE | HOLD | LOAF |
BOLT | COMPACT | DRAW | FORM | HOOD | LOBBY |
BOND | COMPANY | DRESS | FOUL | HOP | LOCK |
BOOM | COMPOUND | DRILL | FRAME | HORN | LOG |
BOOT | CONSOLE | DRIP | FRAY | HOST | LOT |
BOUND | CONTACT | DRIVE | FREE | HOUND | LOUNGE |
BOW | CONTRACT | DROP | FRESH | HULL | LOW |
BOWL | COPY | DROVE | FRISK | HUSKY | MAD |
BOX | CORD | DRUM | FRONT | INCENSE | MAJOR |
BREAK | CORN | DRY | FUSE | INCLINE | MARBLE |
MARCH | PERCH | RARE | SECOND | STAKE | TENDER |
MARK | PERFECT | RASH | SENSE | STALK | TERM |
MAROON | PERIOD | REAR | SENTENCE | STALL | TERMINAL |
MASS | PERMIT | RECORD | SET | STAMP | TERMS |
MATCH | PET | REEL | SHARE | STAND | THROW |
MEAL | PICK | REFLECT | SHARP | STAPLE | TICK |
MEAN | PICKET | REFRAIN | SHED | STAR | TIE |
MESS | PILE | REFUSE | SHELL | STATE | TILL |
MIGHT | PINCH | REGISTER | SHIFT | STATIC | TIP |
MIND | PIPE | RELISH | SHIP | STEEP | TIRE |
MINE | PIT | RENT | SHOOT | STEER | TOAST |
MINT | PITCH | RESERVATION | SHOT | STERN | TOLL |
MINUTE | PITCHER | RESERVE | SHOWER | STEW | TOOL |
MISS | PLAIN | RESORT | SHUTTLE | STICK | TOP |
MODEL | PLANE | REST | SIDE | STILL | TRAC |
MOLD | PLANT | RIB | SIGN | STING | TRACK |
MOLE | PLAY | RICH | SINK | STIR | TRADE |
MOTION | PLOT | RIDDLE | SKIRT | STITCH | TRAIN |
MUG | POACH | RIGHT | SLAB | STOCK | TREAT |
NAG | POINT | RING | SLIDE | STOLE | TRIAL |
NAIL | POKER | ROAD | SLING | STORE | TRIM |
NAP | POLE | ROCK | SLIP | STORY | TRIP |
NET | POOL | ROLL | SLUG | STRAIN | TRUNK |
NOTE | PORT | ROOM | SMACK | STRAND | TRUST |
NOVEL | POST | ROOT | SMART | STRAW | TRY |
NUT | POT | ROSE | SMELT | STRAY | TURN |
OBJECT | POUND | ROUND | SNAP | STRESS | TYPE |
ODD | PRESENT | ROW | SOCK | STRIKE | UPSET |
OPERATION | PRESS | RUBBER | SOLE | STRIP | UPSET |
ORDER | PRIME | RULER | SORE | STROKE | VAULT |
ORGAN | PRODUCE | RUNG | SOUND | STUD | VENT |
PACK | PROJECT | RUNNER | SOW | SUBJECT | VESSEL |
PAD | PROOF | SACK | SPADE | SUIT | VICE |
PAGE | PRUNE | SAGE | SPARE | SWALLOW | VOLUME |
PALM | PUMP | SAP | SPEAKER | SWAMP | WAKE |
PANEL | PUNCH | SASH | SPEED | SWEAR | WALKER |
PARK | PUPIL | SAW | SPELL | SWITCH | WASH |
PART | QUACK | SCALE | SPOT | TAB | WASTE |
PARTY | QUEEN | SCALLOP | SPRAY | TACK | WATCH |
PASS | QUIVER | SCHOOL | SPREAD | TAG | WAVE |
PASSAGE | RACE | SCOOP | SPRING | TAP | WAX |
PAT | RACKET | SCRAP | SQUARE | TAPER | WEAR |
PATIENT | RAKE | SCRATCH | SQUASH | TART | WELL |
PAWN | RAM | SCREEN | STABLE | TAX | WILL |
PEER | RANGE | SCRUB | STAFF | TEAR | WIND |
PELT | RANK | SEAL | STAG | TEMPLE | WING |
PEN | RAP | SEASON | STAGE | TEND | WORK |
WOUND | | | | | |
YARD | | | | | |
YARN | | | | | |
YELLOW | | | | | |
YIELD | | | | | |
YOKE | | | | | |
ZEST | | | | | |
Cite this article
Sotudeh, H., Houshyar, M. Comparing discrimination powers of text and citation-based context types. Scientometrics 114, 229–251 (2018). https://doi.org/10.1007/s11192-017-2566-9
DOI: https://doi.org/10.1007/s11192-017-2566-9