Automatic prediction of citability of scientific articles by stylometry of their titles and abstracts

Scientometrics

Abstract

The decision whether to read a research paper is commonly made while reading its title and abstract. Although content and merit should drive that decision, other factors such as writing style may intervene. In turn, more readings could produce more citations. We investigated the stylistic factors in the title and abstract of research papers that affect their “citability”, and built a prediction model for citations at 5, 10, and 15 years. Since the number of citations is the preferred ranking function of several academic search engines, our “citability” function could alleviate the under-representation of recent, not-yet-cited papers in query results. For this study, we collected a large dataset of around 750,000 titles and abstracts from articles in Scopus, intended to be representative of science as a whole. For each instance, we extracted a relatively large set of 3578 stylistic features at different linguistic levels, i.e., characters, syllables, tokens (words), sentences, stop/content words, and part-of-speech (POS) tags. In particular, we present a novel set of corpus-based stylistic features that we call Corpus Spectral Signatures (CSS). We found that a linear prediction model for citations (binned into quartiles) built with only the top-250 correlated features achieved a mean absolute error of 0.805 quartiles, and that, on average, predictions were highly correlated with their real values (Spearman’s \(\rho = 0.515\)). CSS features were among the top correlated features, but POS features were the most predictive group of features in an ablation study.
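
The two evaluation measures named in the abstract (mean absolute error over quartile labels and Spearman's rank correlation) can be illustrated with a small sketch. This is not the authors' code: the binning rule based on `statistics.quantiles`, the helper names `quartile_bins`, `mean_absolute_error`, `average_ranks` and `spearman_rho`, and the toy numbers are all assumptions made for illustration.

```python
from bisect import bisect_left
from math import sqrt
from statistics import mean, quantiles


def quartile_bins(values):
    """Assign each value a quartile label 0..3 using sample cut points."""
    cuts = quantiles(values, n=4)  # three cut points (assumed binning rule)
    return [bisect_left(cuts, v) for v in values]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted quartiles."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy citation counts and predictions (made up for illustration only).
citations = [0, 1, 2, 3, 5, 8, 13, 40]
true_q = quartile_bins(citations)
pred_q = [0, 1, 1, 0, 2, 3, 3, 2]  # hypothetical model output
print(true_q)
print(mean_absolute_error(true_q, pred_q))
print(spearman_rho(true_q, pred_q))
```

Under this reading, an error of 0.805 quartiles means that predictions miss the true citation quartile by a little less than one bin on average.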

Notes

  1. https://www.scopus.com.

  2. Scopus also provides an API, but its weekly quota limit is restrictive for our purposes.

  3. Queries in Scopus may also be restricted by ‘Date of Publication’, but a one-day filter is too limited for our purposes.

  4. The only exception was “artificial intelligence” because, unlike other single-word keywords, the individual words in this bigram do not fairly represent the category.

  5. The numbers of hits per domain were collected in January 2020.

  6. https://www.nltk.org/_modules/nltk/tokenize.html.

  7. http://www.speech.cs.cmu.edu/cgi-bin/cmudict.

  8. https://nedbatchelder.com/code/modules/hyphenate.py.

  9. https://explosion.ai/blog/part-of-speech-pos-tagger-in-python.

  10. See several examples in Abdel-Rahman et al. (2017).

  11. See https://en.wikipedia.org/wiki/N-gram for a definition and examples of n-grams.

  12. https://doi.org/10.1524/zkri.218.11.725.20298.

  13. https://doi.org/10.1103/PhysRevD.68.042001.

  14. https://doi.org/10.1145/954339.954342.

  15. Lease and Charniak (2005) observed in a corpus of titles and abstracts from articles in the biomedical domain that 71% of the titles are noun phrases.

  16. https://scikit-learn.org.

  17. https://docs.scipy.org/doc/scipy/reference/stats.html.

  18. https://pythonhosted.org/mord/.

References

  • Abdel-Rahman, F., Okeremgbo, B., Alhamadah, F., Jamadar, S., Anthony, K., & Saleh, M. A. (2017). Caenorhabditis elegans as a model to study the impact of exposure to light emitting diode (LED) domestic lighting. Journal of Environmental Science and Health, Part A, 52(5), 433–439.

  • Agirre, E., Cer, D., Diab, M., & Gonzalez-Agirre, A. (2012). SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics, Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp. 385–393). Montréal, Canada: Association for Computational Linguistics. https://www.aclweb.org/anthology/S12-1051

  • Barabási, A. L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.

  • Bornmann, L., & Leydesdorff, L. (2017). Skewness of citation impact data and covariates of citation distributions: A large-scale empirical analysis based on Web of Science data. Journal of Informetrics, 11(1), 164–175.

  • Brzezinski, M. (2015). Power laws in citation distributions: Evidence from Scopus. Scientometrics, 103(1), 213–228.

  • Crossley, S. A., Skalicky, S., Dascalu, M., McNamara, D. S., & Kyle, K. (2017). Predicting text comprehension, processing, and familiarity in adult readers: New approaches to readability formulas. Discourse Processes, 54(5–6), 340–359.

  • De-Arteaga, M., Jimenez, S., Dueñas, G., Mancera, S., & Baquero, J. (2013). Author profiling using corpus statistics, lexicons and stylistic features: Notebook for PAN at CLEF-2013. In P. Forner, R. Navigli, & D. Tufis (Eds.), CLEF 2013 evaluation labs and workshop: Working notes papers, 23–26 September, Valencia, Spain. CEUR-WS.org.

  • Didegah, F., & Thelwall, M. (2013). Which factors help authors produce the highest impact research? Collaboration, journal and document properties. Journal of Informetrics, 7(4), 861–873.

  • Didegah, F., & Thelwall, M. (2014). Article properties associating with the citation impact of individual articles in the social sciences. In E. Noyons (Ed.), Proceedings of the science and technology indicators conference 2014 Leiden “Context Counts: Pathways to Master Big and Little Data”, Universiteit Leiden, (pp. 169–175).

  • Dong, Y., Johnson, R. A., & Chawla, N. V. (2015). Will this paper increase your h-index?: Scientific impact prediction. In Proceedings of the eighth ACM international conference on web search and data mining–WSDM ’15 (pp. 149–158). ACM Press.

  • Falahati Qadimi Fumani, M. R., Goltaji, M., & Parto, P. (2015). The impact of title length and punctuation marks on article citations. Annals of Library and Information Studies, 62(3), 126–132.

  • Fawcett, T. W., & Higginson, A. D. (2012). Heavy use of equations impedes communication among biologists. Proceedings of the National Academy of Sciences, 109(29), 11735–11739.

  • Garfield, E. (1965). Can citation indexing be automated? In M. E. Stevens, V. E. Giuliano, & L. B. Heilprin (Eds.), Statistical association methods for mechanized documentation (Vol. 269, pp. 189–192). National Bureau of Standards Miscellaneous Publication.

  • Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295(1), 90–93.

  • Gnewuch, M., & Wohlrabe, K. (2017). Title characteristics and citations in economics. Scientometrics, 110(3), 1573–1578.

  • Golosovsky, M. (2017). Power-law citation distributions are not scale-free. Physical Review E, 96(3), 032306.

  • Gruber, M. (2017). Improving efficiency by shrinkage: The James–Stein and ridge regression estimators. Routledge.

  • Guo, F., Ma, C., Shi, Q., & Zong, Q. (2018). Succinct effect or informative effect: The relationship between title length and the number of citations. Scientometrics, 116(3), 1531–1539.

  • Habibzadeh, F., & Yadollahie, M. (2010). Are shorter article titles more attractive for citations? Cross-sectional study of 22 scientific journals. Croatian Medical Journal, 51(2), 165–170.

  • Hartley, J. (2007). Planning that title: Practices and preferences for titles with colons in academic articles. Library & Information Science Research, 29(4), 553–568.

  • Holmes, D. I. (1998). The evolution of stylometry in humanities scholarship. Literary and Linguistic Computing, 13(3), 111–117.

  • Jacques, T. S., & Sebire, N. J. (2010). The impact of article titles on citation hits: An analysis of general and specialist medical journals. JRSM Short Reports, 1(1), 1–5.

  • Jamali, H. R., & Nikzad, M. (2011). Article title type and its relation with the number of downloads and citations. Scientometrics, 88(2), 653–661.

  • Jimenez, S., Becerra, C., & Gelbukh, A. (2012). Soft cardinality: A parameterized similarity function for text comparison. In Proceedings of the sixth international workshop on semantic evaluation (pp. 449–453). Association for Computational Linguistics.

  • Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Tech. Rep. 56, University of Central Florida.

  • Kyle, K., & Crossley, S. A. (2015). Automatically assessing lexical sophistication: Indices, tools, findings, and application. TESOL Quarterly, 49(4), 757–786.

  • Lease, M., & Charniak, E. (2005). Parsing biomedical literature. In International conference on natural language processing (pp. 58–69). Springer.

  • Lee, D. H. (2019). Predictive power of conference-related factors on citation rates of conference papers. Scientometrics, 118(1), 281–304.

  • Liang, F. M. (1983). Word hy-phen-a-tion by com-put-er. Tech. rep., Calif. Univ. Stanford. Comput. Sci. Dept.

  • Lin, M., Lucas, H. C., Jr., & Shmueli, G. (2013). Research commentary–Too big to fail: Large samples and the p-value problem. Information Systems Research, 24(4), 906–917.

  • Lokker, C., McKibbon, A., McKinlay, J., Wilczynski, N., & Haynes, B. (2008). Prediction of citation counts for clinical articles at two years using data available within three weeks of publication: Retrospective cohort study. BMJ, 336(7645), 655–657.

  • Moed, H. F. (2005). Citation analysis in research evaluation. New York: Springer.

  • Nair, L. B., & Gibbert, M. (2016). What makes a ‘good’ title and (how) does it matter for citations? A review and general model of article title attributes in management science. Scientometrics, 107(3), 1331–1359.

  • Paiva, C. E., Lima, J. P. d. S. N., & Paiva, B. S. R. (2012). Articles with short titles describing the results are cited more often. Clinics, 67(5), 509–513.

  • Price, D. J. D. S. (1965). Networks of scientific papers. Science, 149(3683), 510–515.

  • Rennie, J., & Srebro, N. (2005). Loss functions for preference levels: Regression with discrete ordered labels. In Proceedings of the IJCAI multidisciplinary workshop on advances in preference handling, Kluwer Norwell, MA, Vol. 1.

  • Rostami, F., Mohammadpoorasl, A., & Hajizadeh, M. (2014). The effect of characteristics of title on citation rates of articles. Scientometrics, 98(3), 2007–2010.

  • Seglen, P. O. (1992). The skewness of science. Journal of the American Society for Information Science, 43(9), 628–638.

  • Severance, S. J., & Cohen, K. B. (2015). Measuring the readability of medical research journal abstracts. Proceedings of BioNLP, 15, 127–133.

  • Smith, L. C. (1981). Citation analysis. Library Trends, 30(1), 83–106.

  • Sohrabi, B., & Iraj, H. (2017). The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts. Scientometrics, 110(1), 243–251.

  • Tahamtan, I., Afshar, A., & Ahamdzadeh, K. (2016). Factors affecting number of citations: A comprehensive review of the literature. Scientometrics, 107(3), 1195–1225.

  • Tang, L. (2013). Does “birds of a feather flock together” matter-Evidence from a longitudinal study on US-China scientific collaboration. Journal of Informetrics, 7(2), 330–344.

  • Thelwall, M., & Wilson, P. (2014). Regression for citation data: An evaluation of different methods. Journal of Informetrics, 8(4), 963–971.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267–288.

  • Van Wesel, M., Wyatt, S., & ten Haaf, J. (2014). What a difference a colon makes: How superficial factors influence subsequent citation. Scientometrics, 98(3), 1601–1615.

  • Zhang, W., Yoshida, T., & Tang, X. (2011). A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications, 38(3), 2758–2765.

Acknowledgements

This work was partially supported by the CONACYT, Mexico, under Grant A1-S-47854, and by the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico, under Grants 20200859, 20200797, and 20201948. We would like to express our gratitude to the anonymous reviewers.

Author information

Corresponding author

Correspondence to Sergio Jimenez.

Appendices

A Search keywords for data extraction from Scopus

abnormal, accounting, acoustics, aerodynamics, aerospace, aging, agricultural, algebra, algorithm, alloys, analysis, anatomy, animal, anthropology, antibiotics, applications, archeology, architecture, artificial intelligence, arts, asteroid, astronomy, atmospheric, atomic, automotive, biochemistry, bioinformatics, biology, biophysics, bird, brain, building, business, cancer, carbon, cardiology, cardiovascular, catalysis, cells, ceramics, chemical, chemistry, civil, classics, climate, clinical, cloud, combustion, communication, companies, complexity, computation, conservation, control, cryptography, crystal, cultural, cure, customer, data, database, debates, decision, demand, dementia, demography, dental, dentistry, dermatology, develop, development, diabetes, diagnosed, diagnosis, diet, disease, disorder, DNA, drugs, dynamics, earth, ecology, econometrics, economics, economy, ecosystem, education, electrical, electronics, element, endangered, endocrine, endocrinology, energy, engineering, epidemiology, equilibrium, equation, equine, ergonomics, estimation, ethics, evidence, evolution, evolutionary, exchange, experimental, exploration, fever, films, finance, fish, flow, fluid, food, forensic, forestry, fuel, gender, gene, genetics, geology, geometry, globalization, graphics, hardware, health, heart, hematology, histology, history, human, hypothesis, immunology, industrial, infectious, inference, innovation, inorganic, insect, instrumentation, interface, labor, language, law, learning, legislation, library, life, linear, linguistics, literature, logic, mammal, management, manufacturing, market, marketing, material, matter, measure, media, medical, medicine, memory, mental, metabolism, metal, method, microbiology, microelectronics, microwaves, mining, molecular, molecules, monetary, music, nanotechnology, nature, network, neurology, neuron, neuroscience, nonlinear, nuclear, nursing, nutrition, ocean, optical, optimization, organic, orthodontics, oxides, pandemic, parasitology, particle, pathology, pediatrics, pharmacology, pharmacy, philosophy, physics, physiology, planet, planetary, plastics, policy, political, pollution, polymers, pricing, privacy, probability, problem, process, profit, property, psychiatry, psychology, pulmonary, quality, radiation, recycling, religion, renewable, respiratory, risk, robot, safety, sensory, signal, simulation, sleep, social, software, soil, space, spectroscopy, speech, sports, statistics, strategy, structure, sun, supply, surface, surgery, symptoms, system, taxes, testing, theoretical, therapy, thermodynamics, topology, tourism, toxicology, treatment, tree, tropical, uncertainty, urban, urology, user, vaccine, veterinary, virology, virus, vision, visual, waste, water, weapon

SJR journal categories

List of the Scimago Journal & Country Rank (SJR) subject categories taken from https://www.scimagojr.com/journalrank.php. Words in capital letters are occurrences of keywords from the list in Appendix A. Words in square brackets are additional search keywords, semantically related to the subject category, that were selected by one of the authors (a professional linguist).

  • ACCOUNTING [TAXES]

  • ACOUSTICS and Ultrasonics

  • Advanced and Specialized NURSING

  • AEROSPACE ENGINEERING [AERODYNAMICS]

  • AGING

  • AGRICULTURAL and Biological Sciences

  • Agronomy and Crop Science

  • ALGEBRA and Number Theory

  • ANALYSIS

  • Analytical CHEMISTRY

  • ANATOMY

  • Anesthesiology and Pain MEDICINE

  • ANIMAL Science and Zoology [BIRD, FISH, MAMMAL]

  • ANTHROPOLOGY

  • Applied Mathematics

  • Applied MICROBIOLOGY and Biotechnology

  • Applied PSYCHOLOGY

  • Aquatic Science

  • ARCHEOLOGY

  • ARCHEOLOGY (arts and humanities)

  • ARCHITECTURE

  • ARTIFICIAL INTELLIGENCE

  • ARTS and Humanities

  • Assessment and DIAGNOSIS [SYMPTOMS, FEVER, MEASURE]

  • ASTRONOMY and Astrophysics [ASTEROID, PLANET, SUN]

  • ATMOSPHERIC Science

  • ATOMIC and MOLECULAR Physics, and Optics [PARTICLE]

  • AUTOMOTIVE ENGINEERING

  • Behavioral NEUROSCIENCE

  • BIOCHEMISTRY [CARBON]

  • BIOCHEMISTRY, GENETICS and MOLECULAR BIOLOGY [EXCHANGE]

  • BIOCHEMISTRY (medical)

  • Bioengineering [BIOINFORMATICS]

  • Biological PSYCHIATRY

  • Biomaterials

  • Biomedical ENGINEERING

  • BIOPHYSICS

  • Biotechnology

  • BUILDING and Construction

  • BUSINESS and International MANAGEMENT [GLOBALIZATION]

  • BUSINESS, MANAGEMENT and ACCOUNTING [PROFIT]

  • CANCER Research [TREATMENT]

  • CARDIOLOGY and CARDIOVASCULAR MEDICINE [HEART]

  • Care Planning

  • CATALYSIS

  • CELLs BIOLOGY

  • Cellular and MOLECULAR NEUROSCIENCE

  • CERAMICS and Composites

  • CHEMICAL ENGINEERING

  • CHEMICAL HEALTH and SAFETY

  • CHEMISTRY [OXIDES, STRUCTURE]

  • Chiropractics

  • CIVIL and Structural ENGINEERING

  • CLASSICS

  • CLINICAL BIOCHEMISTRY

  • CLINICAL PSYCHOLOGY

  • Cognitive NEUROSCIENCE [DEMENTIA]

  • Colloid and SURFACE CHEMISTRY

  • COMMUNICATION [MICROWAVES]

  • Community and Home Care

  • Complementary and Alternative MEDICINE

  • Complementary and Manual THERAPY

  • COMPUTATIONal Mathematics

  • COMPUTATIONal Mechanics

  • COMPUTATIONal Theory and Mathematics [COMPLEXITY, CRYPTOGRAPHY]

  • Computer GRAPHICS and Computer-Aided Design

  • Computer NETWORKs and Communications

  • Computer Science APPLICATIONS [CLOUD, PRIVACY]

  • Computer Science [ALGORITHM]

  • Computers in EARTH Sciences

  • Computer VISION and Pattern Recognition

  • Condensed MATTER PHYSICS

  • CONSERVATION [ENDANGERED]

  • CONTROL and OPTIMIZATION

  • CONTROL and SYSTEMs ENGINEERING

  • Critical Care and Intensive Care MEDICINE

  • Critical Care NURSING

  • CULTURAL Studies [RELIGION]

  • DECISION Sciences

  • DEMOGRAPHY

  • DENTAL Assisting

  • DENTAL Hygiene

  • DENTISTRY

  • DERMATOLOGY

  • DEVELOPMENT

  • Developmental and Educational PSYCHOLOGY

  • Developmental BIOLOGY

  • Developmental NEUROSCIENCE

  • Discrete Mathematics and Combinatorics

  • DRUG(s) Discovery

  • DRUG(s) Guides

  • EARTH and PLANETARY Sciences [CLIMATE, EXPLORATION]

  • Earth-Surface PROCESSes

  • Ecological Modeling

  • ECOLOGY [ECOSYSTEM]

  • ECOLOGY, EVOLUTION, Behavior and Systematics [EVOLUTIONARY]

  • Economic GEOLOGY

  • ECONOMICS and ECONOMETRICS [ECONOMY, MONETARY]

  • ECONOMICS, ECONOMETRICS and FINANCE

  • EDUCATION [LEARNING]

  • E-learning

  • ELECTRICAL and ELECTRONIC(s) ENGINEERING

  • Electrochemistry

  • Electronic, OPTICAL and Magnetic MATERIALs

  • Embryology

  • Emergency MEDICAL Services

  • Emergency MEDICINE [DEVELOP]

  • Emergency NURSING

  • ENDOCRINE and Autonomic SYSTEMs

  • ENDOCRINOLOGY

  • ENDOCRINOLOGY, DIABETES and METABOLISM

  • ENERGY ENGINEERING and Power Technology [SUPPLY]

  • ENERGY [COMBUSTION, THERMODYNAMICS]

  • ENGINEERING [PROBLEM, DYNAMICS, ELEMENT]

  • Environmental CHEMISTRY

  • Environmental ENGINEERING

  • Environmental Science

  • EPIDEMIOLOGY [PANDEMIC]

  • EQUINE

  • EXPERIMENTAL and Cognitive PSYCHOLOGY [MEMORY]

  • Family Practice [PRIVACY]

  • Filtration and Separation

  • FINANCE

  • FLUID FLOW and Transfer PROCESSes

  • FOOD Animals

  • FOOD Science

  • FORESTRY

  • FUEL Technology

  • Fundamentals and Skills

  • Gastroenterology [DIET]

  • GENDER Studies

  • GENETICS [GENE, DNA]

  • GENETICS (clinical)

  • Geochemistry and Petrology

  • Geography, Planning and DEVELOPMENT

  • GEOLOGY [MINING]

  • GEOMETRY and TOPOLOGY

  • Geophysics

  • Geotechnical ENGINEERING and Engineering GEOLOGY

  • Geriatrics and Gerontology

  • Gerontology

  • Global and PLANETARY Change

  • HARDWARE and ARCHITECTURE [MICROELECTRONICS]

  • HEALTH Informatics

  • HEALTH Information MANAGEMENT

  • HEALTH POLICY

  • HEALTH Professions

  • HEALTH (social science)

  • HEALTH, TOXICOLOGY and Mutagenesis

  • HEMATOLOGY

  • Hepatology

  • HISTOLOGY

  • HISTORY

  • HISTORY and PHILOSOPHY of Science

  • Horticulture

  • HUMAN-Computer Interaction [USER]

  • HUMAN Factors and ERGONOMICS

  • IMMUNOLOGY

  • IMMUNOLOGY and Allergy

  • IMMUNOLOGY and MICROBIOLOGY [VACCINE, VIRUS]

  • INDUSTRIAL and MANUFACTURING ENGINEERING

  • INDUSTRIAL Relations

  • INFECTIOUS DISEASEs

  • Information SYSTEMs [DATABASE, DATA]

  • Information SYSTEMs and MANAGEMENT

  • INORGANIC CHEMISTRY

  • INSECT Science

  • INSTRUMENTATION

  • Internal MEDICINE [DIAGNOSED]

  • Issues, ETHICS and Legal Aspects

  • LANGUAGE and LINGUISTICS

  • LAW

  • Leadership and MANAGEMENT

  • LIBRARY and Information Sciences

  • LIFE-span and LIFE-course Studies

  • LINGUISTICS and LANGUAGE

  • LITERATURE and Literary Theory

  • LOGIC

  • MANAGEMENT Information SYSTEMs

  • MANAGEMENT, Monitoring, POLICY and LAW [LEGISLATION]

  • MANAGEMENT of Technology and INNOVATION

  • MANAGEMENT Science and Operations Research [COMPANIES]

  • MARKETING [MARKET, PRICING, CUSTOMER, DEMAND]

  • MATERIALs CHEMISTRY [CRYSTAL]

  • MATERIALs Science [PROPERTY]

  • Maternity and Midwifery

  • Mathematical PHYSICS

  • Mathematics [EQUATION]

  • Mechanical ENGINEERING [ROBOT]

  • Mechanics of MATERIALs [TESTING]

  • MEDIA Technology

  • MEDICAL and Surgical NURSING

  • MEDICAL Assisting and Transcription

  • MEDICAL Laboratory Technology

  • MEDICAL Terminology

  • MEDICINE [ABNORMAL, METHOD, HYPOTHESIS]

  • METALs and ALLOYS

  • MICROBIOLOGY

  • MICROBIOLOGY (medical)

  • Modeling and SIMULATION

  • MOLECULAR BIOLOGY

  • MOLECULAR MEDICINE [MOLECULES]

  • Multidisciplinary

  • Museology

  • MUSIC

  • Nanoscience and NANOTECHNOLOGY

  • NATURE and Landscape CONSERVATION

  • Nephrology

  • NEUROLOGY [BRAIN, NEURON]

  • NEUROLOGY (clinical)

  • Neuropsychology and Physiological PSYCHOLOGY

  • NEUROSCIENCE [SLEEP]

  • NUCLEAR and High ENERGY PHYSICS

  • NUCLEAR ENERGY and ENGINEERING

  • Numerical ANALYSIS

  • Nurse Assisting

  • NURSING

  • NUTRITION and Dietetics

  • Obstetrics and Gynecology

  • Occupational THERAPY

  • OCEAN ENGINEERING

  • Oceanography

  • Oncology

  • Oncology (nursing)

  • Ophthalmology

  • Optometry

  • Oral SURGERY

  • ORGANIC CHEMISTRY [CARBON]

  • Organizational Behavior and HUMAN Resource MANAGEMENT [LABOR]

  • ORTHODONTICS

  • Orthopedics and SPORTS MEDICINE

  • Otorhinolaryngology

  • Paleontology

  • PARASITOLOGY

  • PATHOLOGY and FORENSIC MEDICINE

  • PEDIATRICS

  • PEDIATRICS, Perinatology and Child HEALTH

  • Periodontics

  • Pharmaceutical Science [ANTIBIOTICS]

  • PHARMACOLOGY

  • PHARMACOLOGY (medical)

  • PHARMACOLOGY (nursing)

  • PHARMACOLOGY, TOXICOLOGY and Pharmaceutics

  • PHARMACY

  • PHILOSOPHY

  • Physical and THEORETICAL CHEMISTRY

  • Physical THERAPY, SPORTS THERAPY and Rehabilitation [CURE]

  • PHYSICS and ASTRONOMY [EQUILIBRIUM]

  • PHYSIOLOGY

  • PHYSIOLOGY (medical)

  • Plant Science [TREE, TROPICAL]

  • Podiatry

  • POLITICAL Science and International Relations

  • POLLUTION [RECYCLING]

  • POLYMERS and PLASTICS

  • PROCESS CHEMISTRY and Technology

  • Psychiatric MENTAL HEALTH

  • PSYCHIATRY and MENTAL HEALTH [DISORDER]

  • PSYCHOLOGY

  • Public Administration

  • Public Health, Environmental and Occupational Health

  • PULMONARY and RESPIRATORY MEDICINE

  • RADIATION

  • Radiological and Ultrasound Technology

  • Radiology, NUCLEAR MEDICINE and Imaging

  • Rehabilitation

  • Religious Studies

  • RENEWABLE Energy, Sustainability and the Environment

  • Reproductive MEDICINE

  • Research and Theory

  • RESPIRATORY Care

  • Review and Exam Preparation

  • Reviews and References (medical)

  • Rheumatology

  • SAFETY Research

  • SAFETY, RISK, Reliability and QUALITY

  • SENSORY SYSTEMs

  • SIGNAL Processing

  • Small Animals

  • SOCIAL PSYCHOLOGY

  • SOCIAL Sciences [DEBATES, WEAPON]

  • SOCIAL Work [LABOR]

  • Sociology and POLITICAL Science

  • SOFTWARE

  • SOIL Science

  • SPACE and PLANETARY Science

  • SPECTROSCOPY

  • SPEECH and Hearing

  • SPORTS Science

  • Statistical and NONLINEAR PHYSICS [LINEAR]

  • STATISTICS and PROBABILITY [INFERENCE]

  • STATISTICS, PROBABILITY and UNCERTAINTY [EVIDENCE, ESTIMATION]

  • STRATEGY and MANAGEMENT

  • Stratigraphy

  • Structural BIOLOGY

  • Surfaces and INTERFACEs

  • Surfaces, Coatings and FILMS

  • SURGERY

  • THEORETICAL Computer Science

  • TOURISM, Leisure and Hospitality MANAGEMENT

  • TOXICOLOGY

  • Transplantation

  • Transportation

  • URBAN Studies

  • UROLOGY

  • VETERINARY

  • VIROLOGY

  • VISUAL ARTS and Performing Arts

  • WASTE MANAGEMENT and Disposal

  • WATER Science and Technology

Science subject areas in the Scopus web search engine

  1. Agricultural and Biological Sciences

  2. Arts and Humanities

  3. Biochemistry, Genetics and Molecular Biology

  4. Business, Management and Accounting

  5. Chemistry and Chemical Engineering

  6. Computer Science

  7. Decision Sciences

  8. Dentistry

  9. Earth and Planetary Sciences

  10. Economics, Econometrics and Finance

  11. Energy

  12. Engineering

  13. Environmental Science

  14. Health Professions

  15. Immunology and Microbiology

  16. Materials Science

  17. Mathematics

  18. Medicine

  19. Multidisciplinary

  20. Neuroscience

  21. Nursing

  22. Pharmacology, Toxicology and Pharmaceutics

  23. Physics and Astronomy

  24. Psychology

  25. Social Sciences

  26. Undefined

  27. Veterinary

Part-of-speech tags from the Penn Treebank Project

| POS | Description | Examples |
|---|---|---|
| CC | coordinating conjunction | and, but, or |
| CD | cardinal number | 1, one |
| DT | determiner | the, a, an |
| EX | existential there | ‘there’ is |
| FW | foreign word | chercheur (fr), muestra (es) |
| IN | preposition/subordinating conjunction | in, on, before |
| JJ | adjective | big |
| JJR | adjective, comparative | bigger |
| JJS | adjective, superlative | biggest |
| LS | list marker | 1) |
| MD | modal | could, will |
| NN | noun, singular | desk |
| NNS | noun, plural | desks |
| NNP | proper noun, singular | Harrison |
| NNPS | proper noun, plural | Americans |
| PDT | predeterminer | ‘all the kids’ |
| PNC | punctuation mark | ‘.,;: ...’ |
| POS | possessive ending | parent’s |
| PRP | personal pronoun | I, he, she |
| PRP$ | possessive pronoun | my, his, hers |
| RB | adverb | very, silently |
| RBR | adverb, comparative | better |
| RBS | adverb, superlative | best |
| RP | particle | give up |
| TO | to | go ‘to’ the store |
| UH | interjection | errrrrrrrm |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, non-3rd person sing. present | take |
| VBZ | verb, 3rd person sing. present | takes |
| WDT | wh-determiner | which |
| WP | wh-pronoun | who, what |
| WP$ | possessive wh-pronoun | whose |
| WRB | wh-adverb | where, when |
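
As a rough illustration of how tags like these can be turned into numeric stylistic features, the sketch below computes relative frequencies of POS tags and of POS-tag bigrams. It is not the feature definition used in the article: the function name `pos_features` is hypothetical, and the example title is tagged by hand rather than by a tagger.

```python
from collections import Counter


def pos_features(tagged_tokens, n=1):
    """Relative frequency of each POS-tag n-gram in a tagged text."""
    tags = [tag for _, tag in tagged_tokens]
    grams = [" ".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


# Hand-assigned tags for illustration (an assumption, not tagger output).
title = [
    ("Emergence", "NN"), ("of", "IN"), ("scaling", "NN"),
    ("in", "IN"), ("random", "JJ"), ("networks", "NNS"),
]

unigrams = pos_features(title, n=1)
bigrams = pos_features(title, n=2)
print(unigrams)
print(bigrams)
```

In practice the `(word, tag)` pairs would come from an automatic tagger; any tagger that emits Penn Treebank tags could feed this function.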

Stopword list from the Natural Language Toolkit (NLTK)

a, about, above, after, again, against, ain, all, am, an, and, any, are, aren, as, at, be, because, been, before, being, below, between, both, but, by, can, couldn, d, did, didn, do, does, doesn, doing, don, down, during, each, few, for, from, further, had, hadn, has, hasn, have, haven, having, he, her, here, hers, herself, him, himself, his, how, i, if, in, into, is, isn, it, its, itself, just, ll, m, ma, me, mightn, more, most, mustn, my, myself, needn, no, nor, not, now, o, of, off, on, once, only, or, other, our, ours, ourselves, out, over, own, re, s, same, shan, she, should, shouldn, so, some, such, t, than, that, the, their, theirs, them, themselves, then, there, these, they, this, those, through, to, too, under, until, up, ve, very, was, wasn, we, were, weren, what, when, where, which, while, who, whom, why, will, with, won, wouldn, y, you, your, yours, yourself, yourselves
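
A stopword list like this one supports the stop/content word level of analysis mentioned in the abstract. The sketch below is a minimal illustration, not the authors' implementation: the `STOPWORDS` set is a small subset of the list above, the regular-expression tokenizer is a simplifying assumption, and `stopword_features` is a hypothetical name.

```python
import re

# Small subset of the stopword list above, enough for the example.
STOPWORDS = {"a", "and", "in", "is", "not", "of", "or", "the", "to", "we"}


def stopword_features(text, stopwords=STOPWORDS):
    """Count stop and content words and the share of stop words."""
    # Simplified tokenizer (assumption): lowercase alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())
    n_stop = sum(1 for token in tokens if token in stopwords)
    n_tokens = len(tokens)
    return {
        "n_tokens": n_tokens,
        "n_stop": n_stop,
        "n_content": n_tokens - n_stop,
        "stop_ratio": n_stop / n_tokens if n_tokens else 0.0,
    }


sentence = ("We investigated the stylistic factors in the title "
            "and abstract of research papers")
features = stopword_features(sentence)
print(features)
```

With this tokenizer a contraction such as "shouldn't" splits into "shouldn" and "t", which is why fragments of that kind appear as entries in the list.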

 

About this article

Cite this article

Jimenez, S., Avila, Y., Dueñas, G. et al. Automatic prediction of citability of scientific articles by stylometry of their titles and abstracts. Scientometrics 125, 3187–3232 (2020). https://doi.org/10.1007/s11192-020-03526-1

  • DOI: https://doi.org/10.1007/s11192-020-03526-1
