
Comparing discrimination powers of text and citation-based context types

Scientometrics

Abstract

Different context types differ in their potential to represent the subjects of documents. However, little is known about how they differ in discriminating between documents on different subjects. The present study aimed to compare the discrimination powers of five context types (i.e. title, textual, text-citation, reference, and reference-title contexts). Using a content analysis method, a test collection consisting of 3637 papers containing 15 English homographs in 54 subject clusters was analyzed by discriminant analysis to determine the discrimination among the clusters. The results confirmed that the homograph context types significantly discriminate between the clusters. The contexts correctly cluster about 40 to 80% of the documents, with the text and reference-title context types being stronger in differentiating the clusters. This is the first study to compare the powers of different kinds of textual context, and it recommends concentrating on context types to discriminate between documents containing homographs and thereby improve the effectiveness of their retrieval. It is suggested that a comparison of the context types be used as a tool in re-ranking or clustering retrieved results.
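To make the idea of a context type's "discrimination power" concrete, the following minimal Python sketch computes, for each context type, the share of documents assigned to their correct subject cluster. It is only an illustration under stated assumptions: the toy documents for the homograph "bank", the cluster labels, and the names `DOCS`, `vectorize`, `cosine`, and `hit_rate` are all hypothetical, and a leave-one-out nearest-centroid classifier stands in for the discriminant analysis used in the study.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy collection (not data from the study): each document has a
# subject cluster and one short bag of words per context type.
DOCS = [
    {"cluster": "finance", "title": "bank stability", "text": "bank loan credit interest"},
    {"cluster": "finance", "title": "bank growth", "text": "bank credit deposit loan"},
    {"cluster": "finance", "title": "bank erosion of trust", "text": "bank interest deposit credit"},
    {"cluster": "geography", "title": "bank erosion", "text": "bank river sediment erosion"},
    {"cluster": "geography", "title": "bank stability", "text": "bank river flood sediment"},
    {"cluster": "geography", "title": "bank vegetation", "text": "bank flood erosion river"},
]


def vectorize(text):
    """Turn a context string into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hit_rate(docs, context):
    """Leave-one-out share of documents assigned to their own cluster.

    Each held-out document is assigned to the cluster whose summed term
    vector (built from the remaining documents) is most similar to it.
    """
    correct = 0
    for i, held_out in enumerate(docs):
        centroids = {}
        for j, doc in enumerate(docs):
            if j != i:
                centroids.setdefault(doc["cluster"], Counter()).update(
                    vectorize(doc[context])
                )
        query = vectorize(held_out[context])
        # Sorting the cluster names makes tie-breaking deterministic.
        predicted = max(sorted(centroids), key=lambda c: cosine(query, centroids[c]))
        correct += predicted == held_out["cluster"]
    return correct / len(docs)


if __name__ == "__main__":
    for context in ("title", "text"):
        print(context, round(hit_rate(DOCS, context), 2))
```

In this toy collection the titles share most of their vocabulary across clusters, so the title context separates the clusters less well than the text context. The same comparison could, in principle, be run per context type on a real collection to decide which contexts to weight when re-ranking or clustering retrieved results.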


Notes

  1. One of the homographs yielded very few retrieved documents and was, hence, ignored in the analyses.

References

  • Abu-Jbara, A., Ezra, J., & Radev, D. (2013). Purpose and polarity of citation: Towards NLP-based bibliometrics. In Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: Human language technologies (pp. 596–606).

  • Bijen Khan, M. & Morad Zade, S. (2010). Homographs in Persian: Presented in the first Persian language and computer workshops. Tehran: Samt (Persian).

  • Blaikie, N. (2009). Designing social research. Cambridge: Polity.

  • Brewster, C., & O’Hara, K. (2007). Knowledge representation with ontologies: Present challenges—Future possibilities. International Journal of Human-Computer Studies, 65(7), 563–568.


  • Bussmann, H. (2006). Routledge dictionary of language and linguistics (G. Trauth, & K. Kazzazi, Eds. and Trans.). New York: Routledge.

  • Chaplot, D. S., & Bhattacharyya, D. P. (2014). Literature survey on unsupervised word sense disambiguation. IIT Bombay, May, 7.

  • Choi, S. H. (2010). Document clustering using reference titles. Journal of Information Management, 27(2), 241–252.


  • Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.


  • Das, D., & Martins, A. F. (2007). A survey on automatic text summarization. In Literature survey for the language and statistics II course at CMU (Vol. 4, pp. 192–195).

  • Giyanani, R. (2013). A survey on word sense disambiguation. IOSR Journal of Computer Engineering (IOSR-JCE), 14, 30–33.


  • Glänzel, W. (2015). Bibliometrics-aided retrieval: where information retrieval meets scientometrics. Scientometrics, 102(3), 2215–2222.


  • Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two supervised learning approaches for name disambiguation in author citations. In Proceedings of the 2004 joint ACM/IEEE conference on digital libraries, 2004 (pp. 296–305). IEEE.

  • Harada, T., & Tsuda, K. (2014). Classifying homographs in Japanese social media texts using a user interest model. Procedia Computer Science, 35, 929–936.

  • Harmandas, V., Sanderson, M., & Dunlop, M. D. (1997). Image retrieval by hypertext links. In Proceedings of the 20th annual international ACM SIGIR conference on research and development in information retrieval. Philadelphia, PA, July 27–31.

  • Hearst, M. (1991). Noun homograph disambiguation using local context in large text corpora. In Proceedings of the 7th Annual Conference of the University of Waterloo Centre for the New OED and Text Research. Oxford, October. http://people.ischool.berkeley.edu/~hearst/papers/oed91.pdf.

  • Hindle, D. (1990). Noun classification from predicate-argument structures. In Proceedings of the 28th annual meeting on Association for Computational Linguistics (pp. 268–275). Association for Computational Linguistics.

  • Hjørland, B. (2008). What is knowledge organization (KO)? Knowledge Organization, 35(2/3), 86–101.


  • Hurford, J. R., Heasley, B., & Smith, M. B. (2007). Semantics: A coursebook (2nd ed.). New York: Cambridge University Press.

  • Ingwersen, P. (1992). Information retrieval interaction. London: Taylor Graham Publishing.


  • Jeong, Y. K., Song, M., & Ding, Y. (2014). Content-based author co-citation analysis. Journal of Informetrics, 8(1), 197–211.


  • Kalita, P., & Barman, A. K. (2015). Word sense disambiguation: A survey. International Journal of Engineering and Computer Science, 4(5), 11743–11748.

  • Klecka, W. R. (1980). Discriminant analysis. London: Sage Publications.


  • Lee, Y. K., Ng, H. T., & Chia, T. K. (2004). Supervised word sense disambiguation with support vector machines and multiple knowledge sources. In Senseval-3: Third international workshop on the evaluation of systems for the semantic analysis of text (pp. 137–140).

  • Liu, X., Yu, Y., Guo, C., Sun, Y., & Gao, L. (2014). Full-text based context-rich heterogeneous network mining approach for citation recommendation. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 361–370). IEEE Press.

  • Machlup, F. (1983). Semantic quirks in studies of information. In F. Machlup & U. Mansfield (Eds.), The study of information: Interdisciplinary messages (pp. 641–671). New York: Wiley.


  • Makki, R., & Homayounpour, M. (2008). Word sense disambiguation of Farsi homographs using thesaurus and corpus. In A. Ranta & B. Nordström (Eds.), Proceedings of 6th International Conference of the Advances in Natural Language Processing (pp. 315–323). New York: Springer.


  • Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.


  • Nakov, P. I., Schwartz, A. S., & Hearst, M. (2004). Citances: Citation sentences for semantic analysis of bioscience text. In Proceedings of the SIGIR (Vol. 4, pp. 81–88).

  • Nanba, H., & Okumura, M. (1999). Towards multi-paper summarization using reference information. In IJCAI (Vol. 99, pp. 926–931).

  • Nicolaisen, J. (2007). Citation analysis. Annual Review of Information Science and Technology, 41(1), 609–641.


  • Palmer, F. R. (1981). Semantics: A new outline (2nd ed.). New York: Cambridge University Press.

  • Pao, M. L. (1989). Concepts of information retrieval. Englewood: Libraries Unlimited.

  • Qazvinian, V., & Radev, D. R. (2010). Identifying non-explicit citing sentences for citation-based summarization. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 555–564). Association for Computational Linguistics.

  • Rezapour, A. R., Fakhrahmad, S. M., & Sadreddini, M. H. (2011). Applying weighted KNN to word sense disambiguation. In Proceedings of the world congress on engineering (Vol. 3, pp. 6–8).

  • Riahi, N., & Sedghi, F. (2012). A semi-supervised method for Persian homograph disambiguation. In 2012 20th Iranian conference on electrical engineering (ICEE) (pp. 748–751). IEEE.

  • Searle, J. R. (1984). Intentionality and its place in nature. Synthese, 61(1), 3–16.


  • Sekkingstad, A. (2016). Word sense disambiguation in webpages. Developing a program capable to disambiguate words with a website text as context. Master’s thesis, The University of Bergen.

  • Shan, S. M., Cui, Y., & He, Y. H. (2016). Homonyms discovery in folksonomy based on user community analysis. Journal of Electronic Science and Technology, 14(3), 275–280.


  • Shin, J. C., & Ock, C. Y. (2016). Improvement of Korean homograph disambiguation using Korean lexical semantic network (UWordMap). Journal of KIISE, 43(1), 71–79.


  • Small, H. (2011). Interpreting maps of science using citation context sentiments: A preliminary investigation. Scientometrics, 87(2), 373–388.


  • Smith, L. C. (1981). Citation analysis. Library Trends, 30, 83–106.

  • Soergel, D., Lauser, B., Liang, A., Fisseha, F., Keizer, J., & Katz, S. (2006). Reengineering thesauri for new applications: The AGROVOC example. Journal of digital information. https://journals.tdl.org/jodi/index.php/jodi/article/view/112/111.

  • Soudani, N., Bounhas, I., & Slimani, Y. (2016). Semantic information retrieval: A comparative experimental study of NLP tools and language resources for Arabic. In 2016 IEEE 28th international conference on tools with artificial intelligence (ICTAI) (pp. 879–887). IEEE.

  • Tang, J., Fong, A. C., Wang, B., & Zhang, J. (2012). A unified probabilistic framework for name disambiguation in digital library. IEEE Transactions on Knowledge and Data Engineering, 24(6), 975–987.


  • Tesprasit, V., Charoenpornsawat, P., & Sornlertlamvanich, V. (2003). A context-sensitive homograph disambiguation in Thai text-to-speech synthesis. In Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on Human Language Technology: Companion volume of the proceedings of HLT-NAACL 2003: Short papers (Vol. 2, pp. 103–105). Association for Computational Linguistics.

  • Tong, T., Dinakarpandian, D., & Lee, Y. (2009). Literature clustering using citation semantics. In 42nd Hawaii international conference on system sciences, 2009. HICSS’09. (pp. 1–10). IEEE.

  • Tsai, C. T., Kundu, G., & Roth, D. (2013). Concept-based analysis of scientific literature. In Proceedings of the 22nd ACM international conference on information & knowledge management (pp. 1733–1738). ACM.

  • Twilley, L. C., Dixon, P., Taylor, D., & Clark, K. (1994). University of Alberta norms of relative meaning frequency for 566 homographs. Memory & Cognition, 22(1), 111–126.


  • Weizenbaum, J. (1976). Computer power and human reason. San Francisco: W. H. Freeman.


  • Wolfram, D. (2015). The symbiotic relationship between information retrieval and informetrics. Scientometrics, 102(3), 2201–2214.


  • Zeng, H. J., He, Q. C., Chen, Z., Ma, W. Y., & Ma, J. (2004). Learning to cluster web search results. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 210–217). ACM.


Acknowledgements

The authors would like to cordially thank Professor M. Fakhr Ahmad (at Shiraz University) for all his valuable advice and help.

Author information


Correspondence to Hajar Sotudeh.

Appendix: List of the homographs identified by Twilley et al. (1994)


Homograph    Homograph    Homograph    Homograph    Homograph    Homograph
ACE          BRIDGE       COUNT        DUCK         GAG          INTEREST
ACT          BROKE        COUNTER      DUMP         GAME         INTIMATE
ADMIT        BRUSH        COURSE       EAR          GAS          INVALID
ADVANCE      BULB         COURT        ENTRANCE     GEAR         IRON
AFFAIR       CABINET      COVERED      EXCISE       GERM         ISSUE
AIR          CABLE        CRAB         EXPRESS      GIN          JACK
ANGLE        CALF         CRAFT        FAIR         GLARE        JAM
ANNUAL       CALL         CRANE        FALL         GLASS        JAR
BALL         CAN          CRANK        FAN          GRACE        JERK
ARM          CANE         CREST        FANCY        GRADE        JET
ARTICLE      CAP          CRICKET      FARE         GRAFT        JOINT
BAND         CAPE         CROOK        FAST         GRAIN        JUICE
BANK         CAPITAL      CRUST        FAULT        GRASS        JUNK
BAR          CARD         CUE          FAWN         GRATE        KERNEL
BARK         CARP         CUFF         FELT         GRAVE        KEY
BASE         CASE         CURB         FENCE        GREEN        KICK
BASS         CAST         CYCLE        FIELD        GRILL        KID
BAT          CELL         DART         FIGURE       GRIND        KIND
BATTERY      CHAIN        DASH         FILE         GROSS        LACE
BAY          CHANCE       DATE         FILM         GROUND       LAND
BEAD         CHANGE       DECK         FINE         GUY          LAP
BEAM         CHARGE       DEED         FINISH       HABIT        LASH
BEAR         CHARM        DEEP         FIRE         HAIL         LEAD
BEEF         CHECK        DEPOSIT      FIRM         HAM          LEAF
BEING        CHEST        DESERT       FIT          HAMPER       LEAN
BELT         CHEW         DIAMOND      FIX          HAND         LEFT
BEND         CHINA        DIE          FLAT         HANG         LETTER
BILL         CHIP         DIGEST       FLEET        HARD         LIE
BIT          CHOP         DIGIT        FLIGHT       HARP         LIGHT
BITTER       CHUCK        DIP          FLING        HATCH        LIKE
BLOCK        CLIP         DIRT         FLOAT        HAUNT        LIME
BLOW         CLOG         DIVE         FLUSH        HEAD         LIMP
BLUE         CLUB         DOUGH        FLY          HEAT         LINE
BLUFF        COAST        DOVE         FOIL         HEEL         LIP
BLUNT        COAT         DOWN         FOLD         HEM          LIST
BOARD        COLD         DRAFT        FOOT         HIDE         LITTER
BOIL         COMB         DRAG         FORCE        HOLD         LOAF
BOLT         COMPACT      DRAW         FORM         HOOD         LOBBY
BOND         COMPANY      DRESS        FOUL         HOP          LOCK
BOOM         COMPOUND     DRILL        FRAME        HORN         LOG
BOOT         CONSOLE      DRIP         FRAY         HOST         LOT
BOUND        CONTACT      DRIVE        FREE         HOUND        LOUNGE
BOW          CONTRACT     DROP         FRESH        HULL         LOW
BOWL         COPY         DROVE        FRISK        HUSKY        MAD
BOX          CORD         DRUM         FRONT        INCENSE      MAJOR
BREAK        CORN         DRY          FUSE         INCLINE      MARBLE
MARCH        PERCH        RARE         SECOND       STAKE        TENDER
MARK         PERFECT      RASH         SENSE        STALK        TERM
MAROON       PERIOD       REAR         SENTENCE     STALL        TERMINAL
MASS         PERMIT       RECORD       SET          STAMP        TERMS
MATCH        PET          REEL         SHARE        STAND        THROW
MEAL         PICK         REFLECT      SHARP        STAPLE       TICK
MEAN         PICKET       REFRAIN      SHED         STAR         TIE
MESS         PILE         REFUSE       SHELL        STATE        TILL
MIGHT        PINCH        REGISTER     SHIFT        STATIC       TIP
MIND         PIPE         RELISH       SHIP         STEEP        TIRE
MINE         PIT          RENT         SHOOT        STEER        TOAST
MINT         PITCH        RESERVATION  SHOT         STERN        TOLL
MINUTE       PITCHER      RESERVE      SHOWER       STEW         TOOL
MISS         PLAIN        RESORT       SHUTTLE      STICK        TOP
MODEL        PLANE        REST         SIDE         STILL        TRACE
MOLD         PLANT        RIB          SIGN         STING        TRACK
MOLE         PLAY         RICH         SINK         STIR         TRADE
MOTION       PLOT         RIDDLE       SKIRT        STITCH       TRAIN
MUG          POACH        RIGHT        SLAB         STOCK        TREAT
NAG          POINT        RING         SLIDE        STOLE        TRIAL
NAIL         POKER        ROAD         SLING        STORE        TRIM
NAP          POLE         ROCK         SLIP         STORY        TRIP
NET          POOL         ROLL         SLUG         STRAIN       TRUNK
NOTE         PORT         ROOM         SMACK        STRAND       TRUST
NOVEL        POST         ROOT         SMART        STRAW        TRY
NUT          POT          ROSE         SMELT        STRAY        TURN
OBJECT       POUND        ROUND        SNAP         STRESS       TYPE
ODD          PRESENT      ROW          SOCK         STRIKE       UPSET
OPERATION    PRESS        RUBBER       SOLE         STRIP        UPSET
ORDER        PRIME        RULER        SORE         STROKE       VAULT
ORGAN        PRODUCE      RUNG         SOUND        STUD         VENT
PACK         PROJECT      RUNNER       SOW          SUBJECT      VESSEL
PAD          PROOF        SACK         SPADE        SUIT         VICE
PAGE         PRUNE        SAGE         SPARE        SWALLOW      VOLUME
PALM         PUMP         SAP          SPEAKER      SWAMP        WAKE
PANEL        PUNCH        SASH         SPEED        SWEAR        WALKER
PARK         PUPIL        SAW          SPELL        SWITCH       WASH
PART         QUACK        SCALE        SPOT         TAB          WASTE
PARTY        QUEEN        SCALLOP      SPRAY        TACK         WATCH
PASS         QUIVER       SCHOOL       SPREAD       TAG          WAVE
PASSAGE      RACE         SCOOP        SPRING       TAP          WAX
PAT          RACKET       SCRAP        SQUARE       TAPER        WEAR
PATIENT      RAKE         SCRATCH      SQUASH       TART         WELL
PAWN         RAM          SCREEN       STABLE       TAX          WILL
PEER         RANGE        SCRUB        STAFF        TEAR         WIND
PELT         RANK         SEAL         STAG         TEMPLE       WING
PEN          RAP          SEASON       STAGE        TEND         WORK
                                                                 WOUND
                                                                 YARD
                                                                 YARN
                                                                 YELLOW
                                                                 YIELD
                                                                 YOKE
                                                                 ZEST


About this article


Cite this article

Sotudeh, H., Houshyar, M. Comparing discrimination powers of text and citation-based context types. Scientometrics 114, 229–251 (2018). https://doi.org/10.1007/s11192-017-2566-9


  • DOI: https://doi.org/10.1007/s11192-017-2566-9
