Text analysis of academic papers archived in institutional repositories

Abstract:

This paper examines whether the more than one million academic papers available in institutional repositories in Japan can serve as a language resource. The verification uses (1) the word length distribution of nouns extracted from the collected academic papers in PDF and (2) word analogy tasks using word2vec. The results suggest that, when full text is extracted from academic papers in PDF without preprocessing, words are split at meaningless positions, and that related words can be found by analogy through similarity calculations among words in a feature space. We conclude that academic papers in institutional repositories in Japan have potential as a language resource, although preprocessing of the distributed files still needs to be addressed.
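
The two checks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the noun list and the three-dimensional vectors below are hypothetical toy data, the nouns are assumed to have already been extracted by some morphological analyzer, and the analogy is computed with plain cosine similarity rather than a trained word2vec model.

```python
from collections import Counter
import math


def word_length_distribution(nouns):
    """Return {length: relative frequency} for a list of noun strings."""
    counts = Counter(len(word) for word in nouns)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [y - x + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical extracted nouns. The single-character items stand in for
# a word that was split at a meaningless position during PDF extraction.
nouns = ["研究", "論文", "大学", "リポジトリ", "解", "析"]
dist = word_length_distribution(nouns)

# Hypothetical toy embeddings; a real experiment would use vectors
# learned by word2vec from the extracted full text.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "paper": [0.5, 0.5, 0.5],
}
answer = analogy(toy_vectors, "man", "woman", "king")

print(dist)
print(answer)
```

Under these assumptions, an unusually large share of very short tokens in `dist` is one possible sign of words being split during extraction, and `analogy` shows how related words can be ranked by similarity in a feature space.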
Date of Conference: 26-29 June 2016
Date Added to IEEE Xplore: 25 August 2016
Conference Location: Okayama, Japan
