Controlled vocabularies are increasingly made available on the Web of Data using the Simple Knowledge Organization System (SKOS) ontology. Assessment of vocabulary quality is important for determining the suitability of vocabularies for reuse in applications and for improving vocabulary development processes. We define 26 quality issues, i.e., computable functions that expose potential quality problems. In an analysis of a representative set of 24 SKOS vocabularies, we found all of them to contain structural errors and/or other quality problems. We propose a set of correction heuristics which we have used to automatically correct a significant proportion of the identified problems. Our reference implementations of these methods, the quality assessment tool qSKOS and the quality improvement tool Skosify, are available for reuse as open-source software.

In particular, neither OWL nor OWL 2 includes any means to express the integrity condition S14: “A resource has no more than one value of skos:prefLabel per language tag.”
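Because S14 cannot be stated in OWL, it has to be checked procedurally. The following is a minimal sketch of such a check, not the qSKOS or Skosify implementation. It assumes triples are given as plain Python tuples and that label objects are (text, language tag) pairs; the function name `find_s14_violations` and the abbreviated property string are hypothetical.

```python
from collections import defaultdict

PREF_LABEL = "skos:prefLabel"  # abbreviated property name, used here for illustration


def find_s14_violations(triples):
    """Return {(resource, language): [labels]} for resources violating S14.

    Each triple is (subject, predicate, object). Objects of prefLabel
    triples are assumed to be (text, language_tag) pairs.
    """
    labels = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate != PREF_LABEL:
            continue
        text, lang = obj
        # Language tags are compared case-insensitively (an assumption of this sketch).
        labels[(subject, lang.lower())].add(text)
    # Keep only the resource/language pairs with more than one distinct prefLabel.
    return {key: sorted(texts) for key, texts in labels.items() if len(texts) > 1}
```

Grouping by (resource, language tag) means that labels in different languages never conflict with each other, which matches the wording of S14.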
e.g., public-esw-thes@w3.org and public-lod@w3.org.
The script, sparqldump.py, is included in the Skosify distribution.
Missing namespace declarations were added manually for UMBEL. In NYTL, the invalid language tag fr_1793 was manually changed to fr-1793 in order to comply with BCP47 and the Turtle specification. In Reegle, an unparseable line in the original RDF dump was manually removed. For GEMET, the source file containing Arabic labels was excluded because its labels had improper Unicode encoding, which caused the Jena toolkit to fail when parsing the file.
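The language tag fix described above was made by hand. As an illustration only, the following sketch shows how such a repair could be scripted. It assumes that the only defect is an underscore used in place of a hyphen, and it checks the result against the LANGTAG production of the Turtle grammar (without the leading "@"). The function names are hypothetical.

```python
import re

# Turtle's LANGTAG production without the leading "@":
# letters, then optional hyphen-separated alphanumeric subtags.
LANGTAG = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")


def is_wellformed_langtag(tag):
    """Return True if the tag matches the Turtle LANGTAG pattern."""
    return LANGTAG.fullmatch(tag) is not None


def repair_langtag(tag):
    """Replace underscores with hyphens and verify the result is well-formed."""
    candidate = tag.replace("_", "-")
    if not is_wellformed_langtag(candidate):
        raise ValueError(f"cannot repair language tag: {tag!r}")
    return candidate
```

This check is purely syntactic. It does not consult the subtag registry, so it cannot tell whether a well-formed tag is also meaningful.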
The Turtle files were condensed by removing extra whitespace, including all indentation, and using short 0–2 character namespace prefixes.
Typographical note: words set in typewriter style that do not include a namespace prefix, such as Concept and prefLabel, refer to terms defined by SKOS [28].
http://sindice.com/ indexes the Web of Data, which is composed of pages with semantic markup in RDF, RDFa, Microformats or Microdata. Currently, it covers approximately 230 M documents with over 11 billion triples.
http://datahub.io/ is a “community-run catalogue” of currently 5,045 datasets, many of them following the Linked Data guidelines.
SKOS-XL is an extension schema to SKOS that enhances the labeling capabilities by treating labels as resources and not as literals.
TheSoz Thesaurus for the Social Sciences, http://datahub.io/dataset/gesis-thesoz
In the most common case, there is only one concept scheme (often the one created in the previous step), and it is selected as the default concept scheme; otherwise, the default concept scheme is chosen arbitrarily and Skosify shows a warning message.
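The selection rule can be sketched as follows. This is a simplified illustration rather than Skosify's actual code: the function name `choose_default_scheme` is hypothetical, concept schemes are represented as plain identifier strings, and the "arbitrary" choice is made deterministic by taking the first identifier in sorted order, which is an assumption of this sketch.

```python
import warnings


def choose_default_scheme(concept_schemes):
    """Pick the default concept scheme from a collection of scheme identifiers."""
    schemes = sorted(concept_schemes)
    if not schemes:
        # The text assumes at least one scheme exists at this point.
        raise ValueError("no concept scheme available")
    if len(schemes) > 1:
        # Several candidates: pick one and tell the user which one was used.
        warnings.warn(
            f"Multiple concept schemes found; using {schemes[0]} as the default."
        )
    return schemes[0]
```

With a single scheme the function returns it silently; with several, it returns one of them and emits a warning so that the user can override the choice if needed.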
