Abstract
This study aims to determine the time-frame in which the author of a given document lived. The documents are rabbinic texts written in Hebrew-Aramaic. They are undated and contain no bibliographic section, which makes the task challenging. To estimate the time-frame, we define a set of key-phrases and formulate three types of rules: “Iron-clad”, Heuristic, and Greedy. These rules are based on key-phrases and key-words found in the authors' documents. Identifying an author's time-frame can help determine the generation in which specific documents were written, can support the examination of documents (for example, to conclude whether they were edited), and can help identify an anonymous author. We tested the rules on two corpora of responsa documents. The results are promising and are better for the larger corpus than for the smaller one.
Notes
1. Contained in the Global Jewish Database (The Responsa Project at Bar-Ilan University). http://www.biu.ac.il/ICJI/Responsa.
Appendix
Data Set Information.
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Mughaz, D., HaCohen-Kerner, Y., Gabbay, D. (2017). Mining and Using Key-Words and Key-Phrases to Identify the Era of an Anonymous Text. In: Nguyen, N., Kowalczyk, R., Pinto, A., Cardoso, J. (eds) Transactions on Computational Collective Intelligence XXVI. Lecture Notes in Computer Science, vol 10190. Springer, Cham. https://doi.org/10.1007/978-3-319-59268-8_6
DOI: https://doi.org/10.1007/978-3-319-59268-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59267-1
Online ISBN: 978-3-319-59268-8
eBook Packages: Computer Science (R0)