Mining and Using Key-Words and Key-Phrases to Identify the Era of an Anonymous Text

Mughaz, Dror; HaCohen-Kerner, Yaakov; Gabbay, Dov

doi:10.1007/978-3-319-59268-8_6

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 10190)

Abstract

This study aims to determine the time frame in which the author of a given document lived. The documents are rabbinic texts written in Hebrew-Aramaic. They are undated and contain no bibliographic section, which makes the task challenging. To address it, we define a set of key-phrases and formulate three types of rules for delimiting the time frame: “Iron-clad”, Heuristic, and Greedy. The rules are based on key-phrases and key-words found in the authors' documents. Identifying an author's time frame can help determine the generation in which specific documents were written, can support the examination of documents (for example, to conclude whether they were edited), and can help identify an anonymous author. We tested the rules on two corpora of responsa documents. The results are promising, and they are better for the larger corpus than for the smaller one.
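
The abstract does not spell out the rules themselves, so the following is only a minimal sketch of how key-phrase evidence could narrow an author's time frame. Everything in it is an assumption made for illustration: the two phrase labels (`LIVING`, `DECEASED`), the `Mention` and `Bounds` types, the `constrain` function, and the years in the usage note are hypothetical and are not taken from the chapter. The idea is that a mention of a dated author carrying a "deceased" phrase was written after that author died, while a mention carrying a "living" phrase was written during that author's lifetime.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical phrase labels; the chapter's actual key-phrase list is not shown here.
LIVING = "living"      # a formula used only for someone still alive
DECEASED = "deceased"  # a memorial formula used only after death


@dataclass(frozen=True)
class Mention:
    """One reference to a dated author, tagged by the key-phrase attached to it."""
    cited_birth: int
    cited_death: int
    phrase_kind: str  # LIVING or DECEASED


@dataclass
class Bounds:
    """Constraints on the unknown author's lifetime; None means no bound yet."""
    died_no_earlier_than: Optional[int] = None
    born_no_later_than: Optional[int] = None


def constrain(mentions: Iterable[Mention]) -> Bounds:
    """Combine all mentions into the tightest bounds they jointly imply."""
    bounds = Bounds()
    for m in mentions:
        if m.phrase_kind == DECEASED:
            # Written after the cited author's death, so the writer was
            # still alive in that year or later.
            lower = m.cited_death
        else:
            # Written during the cited author's life: the writer was alive
            # no earlier than the cited birth and was born before the cited death.
            lower = m.cited_birth
            if bounds.born_no_later_than is None:
                bounds.born_no_later_than = m.cited_death
            else:
                bounds.born_no_later_than = min(bounds.born_no_later_than, m.cited_death)
        if bounds.died_no_earlier_than is None:
            bounds.died_no_earlier_than = lower
        else:
            bounds.died_no_earlier_than = max(bounds.died_no_earlier_than, lower)
    return bounds
```

With two made-up mentions, `constrain([Mention(1500, 1570, DECEASED), Mention(1530, 1600, LIVING)])` yields an author who died no earlier than 1570 and was born no later than 1600. Constraints of this kind only bound the time frame; narrowing it further would need additional, less certain rules.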

Notes

1. Contained in the Global Jewish Database (The Responsa Project at Bar-Ilan University). http://www.biu.ac.il/ICJI/Responsa.

Author information

Authors and Affiliations

Department of Computer Science, Bar-Ilan University, 5290002, Ramat-Gan, Israel
Dror Mughaz & Dov Gabbay
Department of Computer Science, Lev Academic Center, 9116001, Jerusalem, Israel
Dror Mughaz & Yaakov HaCohen-Kerner
Department of Informatics, King's College London, Strand, London, WC2R 2LS, UK
Dov Gabbay

Corresponding author

Correspondence to Dror Mughaz.

Editor information

Editors and Affiliations

Institute of Informatics, Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Swinburne University of Technology, Hawthorn, South Australia, Australia
Ryszard Kowalczyk
University of Lisbon, Lisbon, Portugal
Alexandre Miguel Pinto
Huawei German Research Center, Munich, Germany
Jorge Cardoso

Appendix

Data Set Information.

About this chapter

Cite this chapter

Mughaz, D., HaCohen-Kerner, Y., Gabbay, D. (2017). Mining and Using Key-Words and Key-Phrases to Identify the Era of an Anonymous Text. In: Nguyen, N., Kowalczyk, R., Pinto, A., Cardoso, J. (eds) Transactions on Computational Collective Intelligence XXVI. Lecture Notes in Computer Science(), vol 10190. Springer, Cham. https://doi.org/10.1007/978-3-319-59268-8_6

DOI: https://doi.org/10.1007/978-3-319-59268-8_6
Published: 15 June 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59267-1
Online ISBN: 978-3-319-59268-8
eBook Packages: Computer Science, Computer Science (R0)
