The authors evaluated supervised automatic classification algorithms for determining whether health-related web pages comply with individual HONcode criteria of conduct, using character n-gram vectors of varying length to represent healthcare web page documents. The training/testing collection comprised web page fragments extracted by HONcode experts during the manual certification process. Using a Naive Bayes classifier and DF (document frequency) dimensionality reduction, the authors compared the classification performance of n-gram tokenization with that of document words and of Porter-stemmed document words. The study attempted to determine whether this automated, language-independent approach might safely replace word-based classification. Using 5-grams as document features, the authors also compared the baseline DF reduction function with Chi-square and Z-score dimensionality reduction. Overall, the results indicate that n-gram tokenization provides a potentially viable alternative to document word stemming.
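The pipeline described above (character n-gram features, DF-based feature selection, Naive Bayes classification) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the lowercasing and whitespace normalisation, the Laplace smoothing constant `alpha=1`, the feature budget `k`, the helper names (`char_ngrams`, `select_by_df`, `chi_square`, `MultinomialNB`) and the toy "compliant"/"non_compliant" fragments are all hypothetical. The abstract does not say which Naive Bayes variant was used, so a multinomial model is assumed, and the Z-score reduction is not shown.

```python
import math
from collections import Counter

def char_ngrams(text, n=5):
    """Overlapping character n-grams of a lowercased, whitespace-normalised text.
    Assumption: texts shorter than n yield the whole string as a single feature."""
    s = " ".join(text.lower().split())
    if len(s) < n:
        return [s] if s else []
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def select_by_df(docs, k):
    """DF reduction: keep the k features that occur in the most documents."""
    df = Counter()
    for feats in docs:
        df.update(set(feats))  # count each feature at most once per document
    # Descending DF; ties broken alphabetically so the result is deterministic.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {f for f, _ in ranked[:k]}

def chi_square(docs, labels, feature, positive):
    """Chi-square statistic of one feature vs. one class (2x2 document contingency table)."""
    a = b = c = d = 0
    for feats, y in zip(docs, labels):
        has, pos = feature in feats, y == positive
        if has and pos:
            a += 1   # feature present, class positive
        elif has:
            b += 1   # feature present, class negative
        elif pos:
            c += 1   # feature absent, class positive
        else:
            d += 1   # feature absent, class negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing over a fixed vocabulary."""

    def __init__(self, vocab, alpha=1.0):
        self.vocab = vocab
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(labels))
            # Feature occurrence counts for this class, restricted to the vocabulary.
            counts = Counter(f for d in class_docs for f in d if f in self.vocab)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_like[c] = {f: math.log((counts[f] + self.alpha) / total)
                                for f in self.vocab}
        return self

    def predict(self, feats):
        # Log-space scoring; features outside the vocabulary are ignored.
        scores = {c: self.log_prior[c] + sum(self.log_like[c][f] for f in feats if f in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)

# Hypothetical toy fragments for one criterion (not data from the study).
train = [
    ("This information is written by a qualified physician", "compliant"),
    ("Content reviewed by medical doctors with credentials", "compliant"),
    ("Buy cheap pills now best price", "non_compliant"),
    ("Click here for discount offers today", "non_compliant"),
]
docs = [char_ngrams(text) for text, _ in train]
labels = [y for _, y in train]
vocab = select_by_df(docs, k=500)
model = MultinomialNB(vocab).fit(docs, labels)
print(model.predict(char_ngrams("written by a qualified physician")))  # -> compliant
print(model.predict(char_ngrams("cheap pills at a discount")))         # -> non_compliant
```

Ranking features with `chi_square` against the target class instead of `select_by_df` is one way to compare reduction functions on the same 5-gram features, in the spirit of the comparison the abstract describes; on a realistic collection `k` would be much smaller than the number of distinct n-grams, so the choice of ranking function actually changes the vocabulary.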
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.