Abstract
In this paper we present an in-depth discussion of the architecture of a new plagiarism detection platform developed by a consortium of Polish universities. The algorithms used by the platform are briefly described in Sect. 3. The main goal of this paper is to present the high-level structure of the services, which results from a nontrivial attempt to strike an appropriate balance between locality and centralization while working under strict constraints of both a technological and a legal nature.
This work is supported by MUCI (Międzyuniwersyteckie Centrum Informatyzacji — Interuniversity Centre for IT).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Sobieski, Ś., Kowalski, M.A., Kruszyński, P., Sysak, M., Zieliński, B., Maślanka, P. (2016). OSA Architecture. In: Kozielski, S., Mrozek, D., Kasprowski, P., Małysiak-Mrozek, B., Kostrzewa, D. (eds) Beyond Databases, Architectures and Structures. Advanced Technologies for Data Mining and Knowledge Discovery. BDAS 2015, BDAS 2016. Communications in Computer and Information Science, vol 613. Springer, Cham. https://doi.org/10.1007/978-3-319-34099-9_44
DOI: https://doi.org/10.1007/978-3-319-34099-9_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-34098-2
Online ISBN: 978-3-319-34099-9
eBook Packages: Computer Science, Computer Science (R0)