Abstract
Readers commonly decide whether to read a research paper from its title and abstract alone. Although content and merit should drive that decision, other factors such as writing style may intervene, and more readings could eventually produce more citations. We investigated the stylistic factors in the titles and abstracts of research papers that affect their “citability”, and built a prediction model for citations at 5, 10, and 15 years. Since the number of citations is the preferred ranking function of several academic search engines, our “citability” function could alleviate the under-representation of recent, not-yet-cited papers in query results. For this study, we collected a large dataset of around 750,000 titles and abstracts from articles in Scopus, intended to be representative of science as a whole. For each instance, we extracted a relatively large set of 3578 stylistic features at different linguistic levels, i.e. characters, syllables, tokens (i.e. words), sentences, stop/content words, and part-of-speech (POS) tags. In particular, we present a novel set of corpus-based stylistic features that we call Corpus Spectral Signatures (CSS). We found that a linear prediction model for citations (binned into quartiles) built with only the top-250 correlated features achieved a mean absolute error of 0.805 quartiles, and that, on average, predictions were highly correlated with their real values (Spearman’s \(\rho = 0.515\)). CSS features were among the top correlated features, but POS features were the most predictive group of features in an ablation study.
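The evaluation quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the rank-based quartile binning, and the use of Pearson correlation to rank features are all assumptions made for illustration, and the linear model itself is not fitted here.

```python
from statistics import mean

def quartile_bins(citations):
    """Assign each paper a quartile label 0..3 by citation rank.
    Assumption: ties are broken by input order, a simplification."""
    n = len(citations)
    order = sorted(range(n), key=lambda i: citations[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = min(3, (4 * rank) // n)
    return bins

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx and syy else 0.0

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, here measured in quartiles."""
    return mean(abs(a - b) for a, b in zip(y_true, y_pred))

def top_k_features(columns, target, k):
    """Indices of the k feature columns with the largest |r| against target."""
    scored = sorted(range(len(columns)),
                    key=lambda j: -abs(pearson(columns[j], target)))
    return scored[:k]

# Toy usage with made-up citation counts and made-up predictions.
labels = quartile_bins([5, 0, 12, 3, 40, 1, 7, 2])
predicted = [2, 1, 3, 1, 2, 0, 1, 1]
print(mean_absolute_error(labels, predicted), spearman(labels, predicted))
```

In this sketch, a selection such as the paper's top-250 features would correspond to calling `top_k_features` with `k=250` on the full feature matrix before fitting a linear model.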











Notes
Scopus also provides an API, but its weekly quota limit is restrictive for our purposes.
Queries in Scopus may also be restricted by ‘Date of Publication’, but a one-day filter is too limited for our purposes.
The only exception was “artificial intelligence” because, unlike other single-word keywords, the individual words in this bigram do not fairly represent the category.
The number of hits per domain was collected in January 2020.
See several examples in Abdel-Rahman et al. (2017).
See https://en.wikipedia.org/wiki/N-gram for a definition and examples of n-grams.
Lease and Charniak (2005) observed in a corpus of titles and abstracts from articles in the biomedical domain that 71% of the titles are noun phrases.
References
Abdel-Rahman, F., Okeremgbo, B., Alhamadah, F., Jamadar, S., Anthony, K., & Saleh, M. A. (2017). Caenorhabditis elegans as a model to study the impact of exposure to light emitting diode (LED) domestic lighting. Journal of Environmental Science and Health, Part A, 52(5), 433–439.
Agirre, E., Cer, D., Diab, M., & Gonzalez-Agirre, A. (2012). SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics–Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), Association for Computational Linguistics, Montréal, Canada, (pp. 385–393), https://www.aclweb.org/anthology/S12-1051
Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.
Bornmann, L., & Leydesdorff, L. (2017). Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data. Journal of Informetrics, 11(1), 164–175.
Brzezinski, M. (2015). Power laws in citation distributions: Evidence from Scopus. Scientometrics, 103(1), 213–228.
Crossley, S. A., Skalicky, S., Dascalu, M., McNamara, D. S., & Kyle, K. (2017). Predicting text comprehension, processing, and familiarity in adult readers: New approaches to readability formulas. Discourse Processes, 54(5–6), 340–359.
De-Arteaga, M., Jimenez, S., Dueñas, G., Mancera, S., & Baquero, J. (2013). Author profiling using corpus statistics, lexicons and stylistic features-notebook for PAN at CLEF-2013. In P. Forner, R. Navigli, & D. Tufis (Eds.), CLEF 2013 evaluation labs and workshop–working notes papers, 23–26 September, Valencia, Spain. CEUR-WS.org.
Didegah, F., & Thelwall, M. (2013). Which factors help authors produce the highest impact research? Collaboration, journal and document properties. Journal of Informetrics, 7(4), 861–873.
Didegah, F., & Thelwall, M. (2014). Article properties associating with the citation impact of individual articles in the social sciences. In E. Noyons (Ed.), Proceedings of the science and technology indicators conference 2014 Leiden “Context Counts: Pathways to Master Big and Little Data”, Universiteit Leiden, (pp. 169–175).
Dong, Y., Johnson, R. A., & Chawla, N. V. (2015). Will this paper increase your h-index?: Scientific impact prediction. In Proceedings of the eighth ACM international conference on web search and data mining–WSDM ’15 (pp. 149–158). ACM Press.
Falahati Qadimi Fumani, M. R., Goltaji, M., & Parto, P. (2015). The impact of title length and punctuation marks on article citations. Annals of Library and Information Studies, 62(3), 126–132.
Fawcett, T. W., & Higginson, A. D. (2012). Heavy use of equations impedes communication among biologists. Proceedings of the National Academy of Sciences, 109(29), 11735–11739.
Garfield, E. (1965). Can citation indexing be automated? In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), Statistical association methods for mechanized documentation (Vol. 269, pp. 189–192). National Bureau of Standards Miscellaneous Publication.
Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295(1), 90–93.
Gnewuch, M., & Wohlrabe, K. (2017). Title characteristics and citations in economics. Scientometrics, 110(3), 1573–1578.
Golosovsky, M. (2017). Power-law citation distributions are not scale-free. Physical Review E, 96(3), 032306.
Gruber, M. (2017). Improving efficiency by shrinkage: The James–Stein and ridge regression estimators. Routledge.
Guo, F., Ma, C., Shi, Q., & Zong, Q. (2018). Succinct effect or informative effect: The relationship between title length and the number of citations. Scientometrics, 116(3), 1531–1539.
Habibzadeh, F., & Yadollahie, M. (2010). Are shorter article titles more attractive for citations? Cross-sectional study of 22 scientific journals. Croatian Medical Journal, 51(2), 165–170.
Hartley, J. (2007). Planning that title: Practices and preferences for titles with colons in academic articles. Library & Information Science Research, 29(4), 553–568.
Holmes, D. I. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing, 13(3), 111–117.
Jacques, T. S., & Sebire, N. J. (2010). The impact of article titles on citation hits: An analysis of general and specialist medical journals. JRSM Short Reports, 1(1), 1–5.
Jamali, H. R., & Nikzad, M. (2011). Article title type and its relation with the number of downloads and citations. Scientometrics, 88(2), 653–661.
Jimenez, S., Becerra, C., & Gelbukh, A. (2012). Soft cardinality: A parameterized similarity function for text comparison. In Proceedings of the sixth international workshop on semantic evaluation, Association for Computational Linguistics, (pp. 449–453).
Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Tech. Rep. 56, University of Central Florida.
Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.
Lease, M., & Charniak, E. (2005). Parsing biomedical literature. In International conference on natural language processing (pp. 58–69). Springer.
Lee, D. H. (2019). Predictive power of conference-related factors on citation rates of conference papers. Scientometrics, 118(1), 281–304.
Liang, F. M. (1983). Word hy-phen-a-tion by com-put-er. Tech. rep., Calif. Univ. Stanford. Comput. Sci. Dept.
Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Research commentary–Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.
Lokker, C., McKibbon, A., McKinlay, J., Wilczynski, N., & Haynes, B. (2008). Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: Retrospective cohort study. BMJ, 336(7645), 655–657.
Moed, H. F. (2005). Citation analysis in research evaluation. New York: Springer.
Nair, L. B., & Gibbert, M. (2016). What makes a ‘good’ title and (how) does it matter for citations? A review and general model of article title attributes in management science. Scientometrics, 107(3), 1331–1359.
Paiva, C. E., Lima, J. P. d. S. N., & Paiva, B. S. R. (2012). Articles with short titles describing the results are cited more often. Clinics, 67(5), 509–513.
Price, D. J. D. S. (1965). Networks of scientific papers. Science, 149(3683), 510–515.
Rennie, J., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling, Kluwer Norwell, MA, Vol. 1.
Rostami, F., Mohammadpoorasl, A., & Hajizadeh, M. (2014). The effect of characteristics of title on citation rates of articles. Scientometrics, 98(3), 2007–2010.
Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638.
Severance, S. J., & Cohen, K. B. (2015). Measuring the readability of medical research journal abstracts. Proceedings of BioNLP, 15, 127–133.
Smith, L. C. (1981). Citation analysis. Library Trends, 30(1), 83–106.
Sohrabi, B., & Iraj, H. (2017). The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts. Scientometrics, 110(1), 243–251.
Tahamtan, I., Afshar, A., & Ahamdzadeh, K. (2016). Factors affecting number of citations: A comprehensive review of the literature. Scientometrics, 107(3), 1195–1225.
Tang, L. (2013). Does “birds of a feather flock together” matter-Evidence from a longitudinal study on US-China scientific collaboration. Journal of Informetrics, 7(2), 330–344.
Thelwall, M., & Wilson, P. (2014). Regression for citation data: An evaluation of different methods. Journal of Informetrics, 8(4), 963–971.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.
Van Wesel, M., Wyatt, S., & ten Haaf, J. (2014). What a difference a colon makes: How superficial factors influence subsequent citation. Scientometrics, 98(3), 1601–1615.
Zhang, W., Yoshida, T., & Tang, X. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3), 2758–2765.
Acknowledgements
This work was partially supported by the CONACYT, Mexico, under Grant A1-S-47854, and by the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico, under Grants 20200859, 20200797, and 20201948. We would like to express our gratitude to the anonymous reviewers.
Appendices
A Search keywords for data extraction from Scopus
abnormal | crystal | finance | mental | quality |
accounting | cultural | fish | metabolism | radiation |
acoustics | cure | flow | metal | recycling |
aerodynamics | customer | fluid | method | religion |
aerospace | data | food | microbiology | renewable |
aging | database | forensic | microelectronics | respiratory |
agricultural | debates | forestry | microwaves | risk |
algebra | decision | fuel | mining | robot |
algorithm | demand | gender | molecular | safety |
alloys | dementia | gene | molecules | sensory |
analysis | demography | genetics | monetary | signal |
anatomy | dental | geology | music | simulation |
animal | dentistry | geometry | nanotechnology | sleep |
anthropology | dermatology | globalization | nature | social |
antibiotics | develop | graphics | network | software |
applications | development | hardware | neurology | soil |
archeology | diabetes | health | neuron | space |
architecture | diagnosed | heart | neuroscience | spectroscopy |
artificial intelligence | diagnosis | hematology | nonlinear | speech |
arts | diet | histology | nuclear | sports |
asteroid | disease | history | nursing | statistics |
astronomy | disorder | human | nutrition | strategy |
atmospheric | DNA | hypothesis | ocean | structure |
atomic | drugs | immunology | optical | sun |
automotive | dynamics | industrial | optimization | supply |
biochemistry | earth | infectious | organic | surface |
bioinformatics | ecology | inference | orthodontics | surgery |
biology | econometrics | innovation | oxides | symptoms |
biophysics | economics | inorganic | pandemic | system |
bird | economy | insect | parasitology | taxes |
brain | ecosystem | instrumentation | particle | testing |
building | education | interface | pathology | theoretical |
business | electrical | labor | pediatrics | therapy |
cancer | electronics | language | pharmacology | thermodynamics |
carbon | element | law | pharmacy | topology |
cardiology | endangered | learning | philosophy | tourism |
cardiovascular | endocrine | legislation | physics | toxicology |
catalysis | endocrinology | library | physiology | treatment |
cells | energy | life | planet | tree |
ceramics | engineering | linear | planetary | tropical |
chemical | epidemiology | linguistics | plastics | uncertaintly |
chemistry | equilibrium | literature | policy | urban |
civil | equation | logic | political | urology |
classics | equine | mammal | pollution | user |
climate | ergonomics | management | polymers | vaccine |
clinical | estimation | manufacturing | pricing | veterinary |
cloud | ethics | market | privacy | virology |
combustion | evidence | marketing | probability | virus |
communication | evolution | material | problem | vision |
companies | evolutionary | matter | process | visual |
complexity | exchange | measure | profit | waste |
computation | experimental | media | property | water |
conservation | exploration | medical | psychiatry | weapon |
control | fever | medicine | psychology | |
criptography | films | memory | pulmonary |
SJR journal categories
List of the Scimago Journal & Country Rank (SJR) subject categories taken from https://www.scimagojr.com/journalrank.php. Words in capital letters are occurrences of the keywords listed in Appendix A. Words between square brackets are additional search keywords semantically related to the subject category, selected by one of the authors (a professional linguist).
- ACCOUNTING [TAXES]
- ACOUSTICS and Ultrasonics
- Advanced and Specialized NURSING
- AEROSPACE ENGINEERING [AERODYNAMICS]
- AGING
- AGRICULTURAL and Biological Sciences
- Agronomy and Crop Science
- ALGEBRA and Number Theory
- ANALYSIS
- Analytical CHEMISTRY
- ANATOMY
- Anesthesiology and Pain MEDICINE
- ANIMAL Science and Zoology [BIRD, FISH, MAMMAL]
- ANTHROPOLOGY
- Applied Mathematics
- Applied MICROBIOLOGY and Biotechnology
- Applied PSYCHOLOGY
- Aquatic Science
- ARCHEOLOGY
- ARCHEOLOGY (arts and humanities)
- ARCHITECTURE
- ARTIFICIAL INTELLIGENCE
- ARTS and Humanities
- Assessment and DIAGNOSIS [SYMPTOMS, FEVER, MEASURE]
- ASTRONOMY and Astrophysics [ASTEROID, PLANET, SUN]
- ATMOSPHERIC Science
- ATOMIC and MOLECULAR Physics, and Optics [PARTICLE]
- AUTOMOTIVE ENGINEERING
- Behavioral NEUROSCIENCE
- BIOCHEMISTRY [CARBON]
- BIOCHEMISTRY, GENETICS and MOLECULAR BIOLOGY [EXCHANGE]
- BIOCHEMISTRY (medical)
- Bioengineering [BIOINFORMATICS]
- Biological PSYCHIATRY
- Biomaterials
- Biomedical ENGINEERING
- BIOPHYSICS
- Biotechnology
- BUILDING and Construction
- BUSINESS and International MANAGEMENT [GLOBALIZATION]
- BUSINESS, MANAGEMENT and ACCOUNTING [PROFIT]
- CANCER Research [TREATMENT]
- CARDIOLOGY and CARDIOVASCULAR MEDICINE [HEART]
- Care Planning
- CATALYSIS
- CELLs BIOLOGY
- Cellular and MOLECULAR NEUROSCIENCE
- CERAMICS and Composites
- CHEMICAL ENGINEERING
- CHEMICAL HEALTH and SAFETY
- CHEMISTRY [OXIDES, STRUCTURE]
- Chiropractics
- CIVIL and Structural ENGINEERING
- CLASSICS
- CLINICAL BIOCHEMISTRY
- CLINICAL PSYCHOLOGY
- Cognitive NEUROSCIENCE [DEMENTIA]
- Colloid and SURFACE CHEMISTRY
- COMMUNICATION [MICROWAVES]
- Community and Home Care
- Complementary and Alternative MEDICINE
- Complementary and Manual THERAPY
- COMPUTATIONal Mathematics
- COMPUTATIONal Mechanics
- COMPUTATIONal Theory and Mathematics [COMPLEXITY, CRIPTOGRAPHY]
- Computer GRAPHICS and Computer-Aided Design
- Computer NETWORKs and Communications
- Computer Science APPLICATIONS [CLOUD, PRIVACY]
- Computer Science [ALGORITHM]
- Computers in EARTH Sciences
- Computer VISION and Pattern Recognition
- Condensed MATTER PHYSICS
- CONSERVATION [ENDANGERED]
- CONTROL and OPTIMIZATION
- CONTROL and SYSTEMs ENGINEERING
- Critical Care and Intensive Care MEDICINE
- Critical Care NURSING
- CULTURAL Studies [RELIGION]
- DECISION Sciences
- DEMOGRAPHY
- DENTAL Assisting
- DENTAL Hygiene
- DENTISTRY
- DERMATOLOGY
- DEVELOPMENT
- Developmental and Educational PSYCHOLOGY
- Developmental BIOLOGY
- Developmental NEUROSCIENCE
- Discrete Mathematics and Combinatorics
- DRUG(s) Discovery
- DRUG(s) Guides
- EARTH and PLANETARY Sciences [CLIMATE, EXPLORATION]
- Earth-Surface PROCESSes
- Ecological Modeling
- ECOLOGY [ECOSYSTEM]
- ECOLOGY, EVOLUTION, Behavior and Systematics [EVOLUTIONARY]
- Economic GEOLOGY
- ECONOMICS and ECONOMETRICS [ECONOMY, MONETARY]
- ECONOMICS, ECONOMETRICS and FINANCE
- EDUCATION [LEARNING]
- E-learning
- ELECTRICAL and ELECTRONIC(s) ENGINEERING
- Electrochemistry
- Electronic, OPTICAL and Magnetic MATERIALs
- Embryology
- Emergency MEDICAL Services
- Emergency MEDICINE [DEVELOP]
- Emergency NURSING
- ENDOCRINE and Autonomic SYSTEMs
- ENDOCRINOLOGY
- ENDOCRINOLOGY, DIABETES and METABOLISM
- ENERGY ENGINEERING and Power Technology [SUPPLY]
- ENERGY [COMBUSTION, THERMODYNAMICS]
- ENGINEERING [PROBLEM, DYNAMICS, ELEMENT]
- Environmental CHEMISTRY
- Environmental ENGINEERING
- Environmental Science
- EPIDEMIOLOGY [PANDEMIC]
- EQUINE
- EXPERIMENTAL and Cognitive PSYCHOLOGY [MEMORY]
- Family Practice [PRIVACY]
- Filtration and Separation
- FINANCE
- FLUID FLOW and Transfer PROCESSes
- FOOD Animals
- FOOD Science
- FORESTRY
- FUEL Technology
- Fundamentals and Skills
- Gastroenterology [DIET]
- GENDER Studies
- GENETICS [GENE, DNA]
- GENETICS (clinical)
- Geochemistry and Petrology
- Geography, Planning and DEVELOPMENT
- GEOLOGY [MINING]
- GEOMETRY and TOPOLOGY
- Geophysics
- Geotechnical ENGINEERING and Engineering GEOLOGY
- Geriatrics and Gerontology
- Gerontology
- Global and PLANETARY Change
- HARDWARE and ARCHITECTURE [MICROELECTRONICS]
- HEALTH Informatics
- HEALTH Information MANAGEMENT
- HEALTH POLICY
- HEALTH Professions
- HEALTH (social science)
- HEALTH, TOXICOLOGY and Mutagenesis
- HEMATOLOGY
- Hepatology
- HISTOLOGY
- HISTORY
- HISTORY and PHILOSOPHY of Science
- Horticulture
- HUMAN-Computer Interaction [USER]
- HUMAN Factors and ERGONOMICS
- IMMUNOLOGY
- IMMUNOLOGY and Allergy
- IMMUNOLOGY and MICROBIOLOGY [VACCINE, VIRUS]
- INDUSTRIAL and MANUFACTURING ENGINEERING
- INDUSTRIAL Relations
- INFECTIOUS DISEASEs
- Information SYSTEMs [DATABASE, DATA]
- Information SYSTEMs and MANAGEMENT
- INORGANIC CHEMISTRY
- INSECT Science
- INSTRUMENTATION
- Internal MEDICINE [DIAGNOSED]
- Issues, ETHICS and Legal Aspects
- LANGUAGE and LINGUISTICS
- LAW
- Leadership and MANAGEMENT
- LIBRARY and Information Sciences
- LIFE-span and LIFE-course Studies
- LINGUISTICS and LANGUAGE
- LITERATURE and Literary Theory
- LOGIC
- MANAGEMENT Information SYSTEMs
- MANAGEMENT, Monitoring, POLICY and LAW [LEGISLATION]
- MANAGEMENT of Technology and INNOVATION
- MANAGEMENT Science and Operations Research [COMPANIES]
- MARKETING [MARKET, PRICING, CUSTOMER, DEMAND]
- MATERIALs CHEMISTRY [CRYSTAL]
- MATERIALs Science [PROPERTY]
- Maternity and Midwifery
- Mathematical PHYSICS
- Mathematics [EQUATION]
- Mechanical ENGINEERING [ROBOT]
- Mechanics of MATERIALs [TESTING]
- MEDIA Technology
- MEDICAL and Surgical NURSING
- MEDICAL Assisting and Transcription
- MEDICAL Laboratory Technology
- MEDICAL Terminology
- MEDICINE [ABNORMAL, METHOD, HYPOTHESIS]
- METALs and ALLOYS
- MICROBIOLOGY
- MICROBIOLOGY (medical)
- Modeling and SIMULATION
- MOLECULAR BIOLOGY
- MOLECULAR MEDICINE [MOLECULES]
- Multidisciplinary
- Museology
- MUSIC
- Nanoscience and NANOTECHNOLOGY
- NATURE and Landscape CONSERVATION
- Nephrology
- NEUROLOGY [BRAIN, NEURON]
- NEUROLOGY (clinical)
- Neuropsychology and Physiological PSYCHOLOGY
- NEUROSCIENCE [SLEEP]
- NUCLEAR and High ENERGY PHYSICS
- NUCLEAR ENERGY and ENGINEERING
- Numerical ANALYSIS
- Nurse Assisting
- NURSING
- NUTRITION and Dietetics
- Obstetrics and Gynecology
- Occupational THERAPY
- OCEAN ENGINEERING
- Oceanography
- Oncology
- Oncology (nursing)
- Ophthalmology
- Optometry
- Oral SURGERY
- ORGANIC CHEMISTRY [CARBON]
- Organizational Behavior and HUMAN Resource MANAGEMENT [LABOR]
- ORTHODONTICS
- Orthopedics and SPORTS MEDICINE
- Otorhinolaryngology
- Paleontology
- PARASITOLOGY
- PATHOLOGY and FORENSIC MEDICINE
- PEDIATRICS
- PEDIATRICS, Perinatology and Child HEALTH
- Periodontics
- Pharmaceutical Science [ANTIBIOTICS]
- PHARMACOLOGY
- PHARMACOLOGY (medical)
- PHARMACOLOGY (nursing)
- PHARMACOLOGY, TOXICOLOGY and Pharmaceutics
- PHARMACY
- PHILOSOPHY
- Physical and THEORETICAL CHEMISTRY
- Physical THERAPY, SPORTS THERAPY and Rehabilitation [CURE]
- PHYSICS and ASTRONOMY [EQUILIBRIUM]
- PHYSIOLOGY
- PHYSIOLOGY (medical)
- Plant Science [TREE, TROPICAL]
- Podiatry
- POLITICAL Science and International Relations
- POLLUTION [RECYCLING]
- POLYMERS and PLASTICS
- PROCESS CHEMISTRY and Technology
- Psychiatric MENTAL HEALTH
- PSYCHIATRY and MENTAL HEALTH [DISORDER]
- PSYCHOLOGY
- Public Administration
- Public Health, Environmental and Occupational Health
- PULMONARY and RESPIRATORY MEDICINE
- RADIATION
- Radiological and Ultrasound Technology
- Radiology, NUCLEAR MEDICINE and Imaging
- Rehabilitation
- Religious Studies
- RENEWABLE Energy, Sustainability and the Environment
- Reproductive MEDICINE
- Research and Theory
- RESPIRATORY Care
- Review and Exam Preparation
- Reviews and References (medical)
- Rheumatology
- SAFETY Research
- SAFETY, RISK, Reliability and QUALITY
- SENSORY SYSTEMs
- SIGNAL Processing
- Small Animals
- SOCIAL PSYCHOLOGY
- SOCIAL Sciences [DEBATES, WEAPON]
- SOCIAL Work [LABOR]
- Sociology and POLITICAL Science
- SOFTWARE
- SOIL Science
- SPACE and PLANETARY Science
- SPECTROSCOPY
- SPEECH and Hearing
- SPORTS Science
- Statistical and NONLINEAR PHYSICS [LINEAR]
- STATISTICS and PROBABILITY [INFERENCE]
- STATISTICS, PROBABILITY and UNCERTAINTLY [EVIDENCE, ESTIMATION]
- STRATEGY and MANAGEMENT
- Stratigraphy
- Structural BIOLOGY
- Surfaces and INTERFACEs
- Surfaces, Coatings and FILMS
- SURGERY
- THEORETICAL Computer Science
- TOURISM, Leisure and Hospitality MANAGEMENT
- TOXICOLOGY
- Transplantation
- Transportation
- URBAN Studies
- UROLOGY
- VETERINARY
- VIROLOGY
- VISUAL ARTS and Performing Arts
- WASTE MANAGEMENT and Disposal
- WATER Science and Technology
Science subject areas in the Scopus web search engine
1. Agricultural and Biological Sciences
2. Arts and Humanities
3. Biochemistry, Genetics and Molecular Biology
4. Business, Management and Accounting
5. Chemistry and Chemical Engineering
6. Computer Science
7. Decision Sciences
8. Dentistry
9. Earth and Planetary Sciences
10. Economics, Econometrics and Finance
11. Energy
12. Engineering
13. Environmental Science
14. Health Professions
15. Immunology and Microbiology
16. Materials Science
17. Mathematics
18. Medicine
19. Multidisciplinary
20. Neuroscience
21. Nursing
22. Pharmacology, Toxicology and Pharmaceutics
23. Physics and Astronomy
24. Psychology
25. Social Sciences
26. Undefined
27. Veterinary
Part-of-speech tags from the Penn Treebank Project
POS | Description | Examples |
---|---|---|
CC | coordinating conjunction | and, but, or |
CD | cardinal number | 1, one |
DT | determiner | the, a, an |
EX | existential there | ‘there’ is |
FW | foreign word | chercheur (fr), muestra (es) |
IN | preposition/subordinating conjunction | in, on, before |
JJ | adjective | big |
JJR | adjective, comparative | bigger |
JJS | adjective, superlative | biggest |
LS | list marker | 1) |
MD | modal | could, will |
NN | noun, singular | desk |
NNS | noun plural | desks |
NNP | proper noun, singular | Harrison |
NNPS | proper noun, plural | Americans |
PDT | predeterminer | ‘all the kids’ |
PNC | punctuation mark | ‘.,;: ...’ |
POS | possessive ending | parent’s |
PRP | personal pronoun | I, he, she |
PRP$ | possessive pronoun | my, his, hers |
RB | adverb | very, silently |
RBR | adverb, comparative | better |
RBS | adverb, superlative | best |
RP | particle | give up |
TO | to | go ‘to’ the store. |
UH | interjection | errrrrrrrm |
VB | verb, base form | take |
VBD | verb, past tense | took |
VBG | verb, gerund/present participle | taking |
VBN | verb, past participle | taken |
VBP | verb, sing. present, non-3rd person | take |
VBZ | verb, 3rd person sing. present | takes |
WDT | wh-determiner | which |
WP | wh-pronoun | who, what |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | where, when |
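As a rough illustration of how tags like those in the table can become stylistic features, the sketch below computes the relative frequency of each tag in a tagged title. The function name and the hand-assigned tags are assumptions for illustration only; the paper's actual POS features may be defined differently.

```python
from collections import Counter

def pos_tag_frequencies(tagged_tokens):
    """Relative frequency of each POS tag in a list of (token, tag) pairs."""
    counts = Counter(tag for _token, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}

# Hand-tagged title (illustrative tags, not the output of a real tagger).
title = [("Emergence", "NN"), ("of", "IN"), ("scaling", "NN"),
         ("in", "IN"), ("random", "JJ"), ("networks", "NNS")]
freqs = pos_tag_frequencies(title)
print(freqs)
```

In practice the (token, tag) pairs would come from an automatic tagger that uses the Penn Treebank tagset.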
Stopword list from the Natural Language Toolkit (NLTK)
a | can | here | myself | shouldn | was |
about | couldn | hers | needn | so | wasn |
above | d | herself | no | some | we |
after | did | him | nor | such | were |
again | didn | himself | not | t | weren |
against | do | his | now | than | what |
ain | does | how | o | that | when |
all | doesn | i | of | the | where |
am | doing | if | off | their | which |
an | don | in | on | theirs | while |
and | down | into | once | them | who |
any | during | is | only | themselves | whom |
are | each | isn | or | then | why |
aren | few | it | other | there | will |
as | for | its | our | these | with |
at | from | itself | ours | they | won |
be | further | just | ourselves | this | wouldn |
because | had | ll | out | those | y |
been | hadn | m | over | through | you |
before | has | ma | own | to | your |
being | hasn | me | re | too | yours |
below | have | mightn | s | under | yourself |
between | haven | more | same | until | yourselves |
both | having | most | shan | up | |
but | he | mustn | she | ve | |
by | her | my | should | very |
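A stop/content word feature can be sketched as the share of tokens that appear in the stopword list. The example below is a minimal illustration under stated assumptions: it uses only a small subset of the list above, a simple lowercase alphabetic tokenizer, and a hypothetical function name, none of which are taken from the paper.

```python
import re

# Small subset of the NLTK stopword list shown above (assumption: a real
# feature extractor would load the full list).
STOPWORDS = frozenset({
    "a", "about", "and", "by", "for", "in", "is",
    "of", "on", "or", "the", "to", "with",
})

def stopword_share(text, stopwords=STOPWORDS):
    """Fraction of alphabetic tokens in `text` that are stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in stopwords for token in tokens) / len(tokens)

share = stopword_share("The history and meaning of the journal impact factor")
print(share)
```

The content-word share is simply one minus this value under the same tokenization.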
About this article
Cite this article
Jimenez, S., Avila, Y., Dueñas, G. et al. Automatic prediction of citability of scientific articles by stylometry of their titles and abstracts. Scientometrics 125, 3187–3232 (2020). https://doi.org/10.1007/s11192-020-03526-1
DOI: https://doi.org/10.1007/s11192-020-03526-1