Abstract
Due to the ever-increasing number of digital lecture libraries and lecture video portals, retrieving lecture videos has become a significant and demanding task in recent years. Accordingly, the literature presents different techniques for video retrieval that consider video contents as well as signal data. Here, we propose a lecture video retrieval system using multimodal features and probability extended nearest neighbor (PENN) classification. Two modalities are utilized for feature extraction. The first is textual information, which is extracted from the lecture video using optical character recognition. The second modality, used to preserve video content, is the local vector pattern. These two modal features are extracted, and the retrieval of videos is performed using the proposed PENN classifier, an extension of the extended nearest neighbor classifier that assigns different weightages to the first-level and second-level neighbors. The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure, which are computed by matching the retrieved videos against manually classified videos. The experimentation shows that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods.
1 Introduction
Current video search and retrieval systems such as Google, YouTube, and Bing retrieve videos based on available textual metadata such as title, genre, person, and user-given tags, which may not always be available or relevant to the video content [8]. In general, such metadata has to be generated by a human to ensure high quality; however, this generation step consumes time and cost. Moreover, metadata provided by humans is brief, high level, and subjective in nature. Hence, beyond the existing techniques, upcoming video retrieval systems concentrate on automatically generating metadata using video analysis technologies, so that more effective content-based metadata can be created [6], [16], [19]. Generally, video retrieval methods are classified as text-based or content-based. Text-based methods take text as input and apply traditional textual search methodologies to search for textual information linked within a video, while content-based methods take images or videos as input and search for similar visual contents within a video [7].
Content-based video retrieval is commonly used in various image and video applications, such as video editing, composition, surveillance, object manipulation, scene composition, and health informatics [15]. The first step of video retrieval is partitioning a video sequence into shots. A shot is an image sequence that captures continuous action from a single operation of a single camera. Key frames can be used to represent video features, so retrieval can be performed based on the visual features of key frames, and queries may be directed at key frames using retrieval algorithms. After extracting the key frames, the next step is to extract the features. The features are generally extracted offline, so computation time is not a critical factor; nevertheless, feature computation can still take a long time [9]. The most common methods adopt low-level visual features such as color, texture, shape, and motion to measure the similarity between videos [5]. The last step matches the features with the query image to obtain the desired videos. Following these key steps, different methods for video retrieval have been presented in the literature, with a wide range of applications depending on the videos taken for retrieval.
Yang and Meinel [16] proposed character and speech recognition methods for retrieval. This method exploits textual content and improved retrieval performance, but considering both sources of information increases the computational overhead. Chen et al. [2] proposed a latent variable-based technique for retrieval. It performs well even when only noisy text is available, but its major drawback is that visual features are not considered. Cooper [3] also proposed character and speech recognition methods for retrieval. It combines both modalities to improve performance, but its weak indexing method limits the strength of retrieval. Yang et al. [18] proposed a weighted discrete cosine transform-based method for retrieval. Thanks to time-based text occurrence information, it improves retrieval performance, but it suffers from a text detector that is not robust to noisy pixels. Yang et al. [19] proposed character and speech recognition methods for retrieval. It builds search indices to improve search performance in video retrieval, but it requires manual annotation of videos. Yang et al. [17] proposed a video segmenter and geometry-based optical character recognition (OCR). This technique improved retrieval through video indexing; however, its dictionary-based learning requires more training sequences. Che et al. [1] proposed character recognition-based retrieval, which has the advantage of exploiting logical correlation among slides, but it fails to account for the textual characteristics and capturing characteristics of the slides.
In this paper, a lecture video retrieval system is developed using multimodal features and probability extended nearest neighbor (PENN) classification. The proposed content-based lecture video retrieval system utilizes textual information and content features for the retrieval purpose. At first, the input videos are read and frames are extracted from them. Then, key frames are identified from the input frames. Once the key frames are identified, two levels of information are extracted from them. The first level of information is the textual content, which is extracted using OCR methods [10], [12]. The second set of information is based on the visual content and is extracted based on the texture strength. Texture consistency is effectively estimated using a local pattern descriptor [4], which is one of the recent and effective techniques for describing image texture. These two sets of information are extracted from every video and stored in the indexed database. When a query frame or text information is given as input, the proposed system extracts these two levels of information from the input and matches them against the database using the proposed PENN, a method modified from extended nearest neighbor (ENN) classification [13] through probability modeling.
The major contributions made in the paper are given as follows:
A lecture video retrieval system is developed by combining the local vector pattern (LVP) and OCR with a classifier.
A new classifier is developed by modifying the ENN classifier to include a membership degree.
The paper is organized as follows: Section 2 presents the motivation behind the approach. Section 3 explains the proposed video retrieval technique, and Section 4 presents the experimentation of the proposed technique. Finally, the conclusion is given in Section 5.
2 Motivation Behind the Approach
2.1 Problem Definition
Let us assume that the lecture video database D contains N videos belonging to various categories. The aim here is to retrieve k similar videos by inputting the query Q, which may be a video VQ or a text TQ. The input database can be represented as follows:
where Vi is a video containing M frames. Every video is composed of a set of frames, represented as follows:
A frame is a two-dimensional array containing the pixel information gx,y; it is a group of pixels with dimension m×n. It can be represented as
Finally, the objective of retrieving the similar videos VR from the input database for the input query can be indicated as follows:
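In set notation, the problem formulation described above can be sketched as follows (a reconstruction from the definitions in the text; the operator name Retrieve is illustrative rather than part of the original formulation):

```latex
D = \{V_1, V_2, \ldots, V_N\}, \qquad
V_i = \{F_1, F_2, \ldots, F_M\}, \qquad
F_j = \big[\, g_{x,y} \,\big]_{m \times n},
\qquad
V_R = \mathrm{Retrieve}(D, Q, k), \quad Q \in \{V_Q, T_Q\}.
```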
2.2 Challenges
Due to the ever-increasing amount of information stored as video, finding videos that match the user’s intent in a database is indispensable in the current world. This presents the challenge of searching for and finding suitable, intent-matching videos from the user query, which may be frames or videos.
In today’s world, lecture videos play a major role for students in understanding and clarifying the algorithms or concepts presented by eminent professors. This poses the additional challenge of identifying the most suitable lecture video for their query input, which may be an image frame or a text string.
Converting the core content of a video into textual information raises further practical challenges. The textual characteristics of the contents presented in each lecture video differ completely in line spacing, font, and size. In addition, the capturing characteristics, such as illumination and intensity, also differ completely. These practical challenges need to be considered.
In Ref. [16], content-based lecture video retrieval was developed using character recognition and speech processing. Considering both sources of information adds computational overhead and, in most cases, both sources carry the same information in two different formats. Thus, retrieving the suitable video from a single source of information is an important research issue to be solved.
3 Proposed Methodology: Multimodal Features and PENN for Content-Based Lecture Video Retrieval
This section presents the proposed methodology for lecture video retrieval using multimodal features and PENN classification. The input for the proposed technique is the lecture video database containing different subjects. The feature library is then constructed from the input videos after extracting keywords and texture content. The constructed feature library is utilized with the PENN classifier for finding the neighbor videos of the input query, which may be video or text. The PENN classifier finds the probability of belonging for every video based on distance matching with the query, and it uses two levels of neighbors for the probability computation. Retrieval is then performed based on the neighbor videos found by the PENN classifier. Figure 1 shows the block diagram of the proposed video retrieval technique using the PENN classifier.
3.1 Extraction of Key Frames
This step extracts the key frames from the input videos to find the feature information. Key frame extraction is important because an input video may have a large number of frames, and extracting feature information from every frame is difficult and computationally complex. Therefore, the right selection of key frames, and the extraction of features only from those key frames, yields better retrieval efficiency as well as effectiveness. To meet this objective, each frame is subtracted from its previous frame, and frames with a large difference are taken as key frames. A frame that differs substantially from its previous frame carries significant new information; for a presentation video, it may correspond to the next slide. The input video is thus reduced to only the important frames, which are known as key frames:
where KFl denotes the key frames and L denotes the number of key frames. The number of key frames L should be smaller than the number of frames M in the input video Vi. Figure 2 shows the visualization of key frames from four categories of videos. Figure 2A is a video related to data mining, Figure 2B is related to image processing, Figure 2C is related to networking, and Figure 2D is related to soft computing.
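As a rough illustration of this frame-differencing step, the following Python sketch keeps a frame as a key frame when its mean absolute difference from the previous frame exceeds a threshold; the use of OpenCV and the threshold value are assumptions for illustration, not details specified in the paper.

```python
import cv2
import numpy as np

def extract_key_frames(video_path, diff_threshold=30.0):
    """Keep frames that differ strongly from their predecessor (key frames)."""
    cap = cv2.VideoCapture(video_path)
    key_frames, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None:
            key_frames.append(frame)            # always keep the first frame
        elif np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            key_frames.append(frame)            # large change, e.g. a new slide
        prev_gray = gray
    cap.release()
    return key_frames
```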
3.2 Construction of the Feature Library
An important phase of the video retrieval scheme is the construction of the feature library, which is then used to find the relevant videos. In this paper, we utilize two modalities for feature extraction. The first modality is textual information, which is obtained from the lecture video using OCR [10], [11], [12], the most common method for finding textual contents in image or video data; the OCR methods given in Refs. [10], [11], [12] are applied directly to the video to find the keywords present in the frames. The second modality utilized in this paper is visual content, which is extracted using a texture descriptor.
3.2.1 OCR on Key Frames
Once we identify the key frames from the input video, OCR is applied to the key frames to extract the keywords present in them. The reason for using OCR text as a feature is that the text in the lecture slides is closely related to the lecture topic and can thus provide important information for the retrieval task. The literature presents various algorithms for OCR; this paper utilizes the popular algorithm given in Refs. [10], [11], [12], which is based on the benchmark OCR framework called Tesseract. The extraction of keywords present in the lecture videos proceeds in five important steps:
Step 1. Line finding: It directly reads the key frames and the lines are extracted using two main processes, called blob filtering and line construction. In blob filtering, the size of the characters is identified by finding median heights, which are then utilized for safely filtering out blobs. In the second process, line creation is performed by merging the blobs that overlap by at least half horizontally.
Step 2. Baseline fitting: A quadratic spline is utilized here to fit the baselines more accurately after the text lines are found. Here, the blobs are partitioned into groups with a reasonably continuous displacement from the original straight baseline, and the baseline is fitted to the groups.
Step 3. Fixed pitch detection and chopping: Here, characters are segmented by checking the pitch of the text. The determination of the pitch information is carried out using Tesseract, which is then utilized to chop the words into characters for the word recognition step.
Step 4. Segmentation and search: When the result of chopping a word is not good enough, the associator performs an A* (best-first) search on the segmentation graph of possible combinations of the chopped pieces to find candidate characters and select the optimal result from the search.
Step 5. Shape classification: Once the characters are segmented, word recognition is performed by extracting features using polygonal approximation [12]. A classifier is initially trained on features from different sets of words, and the words are then classified using this trained classifier.
After performing the above steps, the words are recognized from the key frames. Then, stop words such as “an,” “the,” “he,” “she,” “can,” and so on are removed from the recognized text to obtain the important keywords of the key frames:
where OCR(KFl) denotes the OCR applied to the key frames, Wp denotes the extracted keywords, and Nw is the total number of keywords. Figure 3 shows a sample set of keywords extracted from the videos using OCR.
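A minimal sketch of this keyword extraction step, assuming the Tesseract engine is accessed through the pytesseract wrapper and using a small illustrative stop-word list (the paper prescribes neither):

```python
import re
import pytesseract  # Python wrapper around the Tesseract OCR engine of Refs. [10]-[12]

STOP_WORDS = {"a", "an", "the", "he", "she", "can", "is", "of", "and", "to", "in"}

def extract_keywords(key_frame_image):
    """Run OCR on a key frame and return the recognized words minus stop words."""
    text = pytesseract.image_to_string(key_frame_image)
    words = re.findall(r"[a-zA-Z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]
```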
3.2.2 LVP on Key Frames
To extract the visual features, a texture descriptor called LVP [4] is utilized here to find the important contents for effective retrieval. Texture features are selected because texture plays a major role in computer recognition tasks: texture features are easy to understand, model, and process, and ultimately help simulate the human visual learning process using computer technologies. Here, the key frames are given directly to the LVP operator, which produces a texture histogram as the feature content. The LVP of the key frame in the δ direction of the vector at reference pixel r is given below:
where LVPd(•) refers to the LVP at neighborhood distance d and δ is the index angle.
where KF(δ,d) is the intensity of the pixel located at distance d and angle δ from the reference pixel r. Once the texture image is obtained, the texture vector is computed as the histogram of the texture image. The texture histogram of the key frame is represented as follows:
where Rq is the count of the qth bin and 255 is the total number of bins. Figure 4 shows the visualization of the LVP of four videos from four different categories.
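The full LVP operator of Ref. [4] is defined in a high-order derivative space; the sketch below computes only a simplified first-order local-difference pattern and its histogram, to illustrate how a texture histogram is obtained from a key frame (the 256-bin code and the 8-neighbour comparison are simplifying assumptions, not the exact LVP formulation):

```python
import numpy as np

def local_pattern_histogram(gray, bins=256):
    """Simplified local pattern: encode each pixel by comparing its 8 neighbours
    against it, then return the normalised histogram of the codes.
    `gray` is an 8-bit grayscale key frame as a 2-D numpy array."""
    h, w = gray.shape
    center = gray[1:-1, 1:-1].astype(np.int16)
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx].astype(np.int16)
        codes |= (neighbour >= center).astype(np.uint8) << bit
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / hist.sum()   # normalised texture histogram
```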
3.2.3 Feature Concatenation
Feature concatenation is the step that stores the features extracted from the videos in an organized way. The features of every video consist of the OCR keywords and the LVP of every key frame. For example, every video has L key frames, and every key frame has an LVP feature vector and a set of keywords as feature elements. The feature vector for the input video Vi can be represented as follows:
The feature library f contains the feature fi of every video in the input database:
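One simple way to organize the concatenated features, assuming a plain Python structure per video and reusing the helper sketches given earlier (the paper does not fix a storage format):

```python
import cv2

def build_feature_library(video_paths):
    """Build f = {f_1, ..., f_N}: for every video, the keywords and texture
    histogram of each of its key frames."""
    library = []
    for video_id, path in enumerate(video_paths):
        entry = {"video": video_id, "frames": []}
        for kf in extract_key_frames(path):                      # sketch from Section 3.1
            gray = cv2.cvtColor(kf, cv2.COLOR_BGR2GRAY)
            entry["frames"].append({
                "keywords": extract_keywords(kf),                # sketch from Section 3.2.1
                "lvp_hist": local_pattern_histogram(gray),       # sketch from Section 3.2.2
            })
        library.append(entry)
    return library
```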
3.3 PENN Classifier for Video Retrieval
This section presents the proposed PENN classifier for video retrieval, which matches the query video or text query with the feature library. The PENN classifier is newly proposed here by extending the ENN classifier of Ref. [13]. The ENN method considers the neighbors of the retrieved neighbors when taking the classification decision; however, the decisions based on the neighbors and on the neighbors of neighbors are given equal importance when classifying the data objects. To give different degrees of membership to the neighbors and to the neighbors of neighbors, we propose a new mathematical model for better classification. The proposed PENN classifier considers different weightages for the first-level and second-level neighbors, and the membership degree is computed using the probability of assignment. Table 1 shows the algorithmic description of the PENN classifier.
Table 1: Algorithmic description of the PENN classifier.

Algorithm: PENN classifier
Input: Feature library f, query Q, K
Output: Retrieved videos
Start
  For i = 1 to N do
    Compute the probability
    Compute the cumulative probability
  End for
  Compute the set R containing the K videos with the smallest cumulative probability
  Return the K videos
End
Let us assume that the input query Q is passed through the PENN classifier for the retrieval of K neighbors from the feature library. At first, the query is matched to the feature library that contains the features of all the videos. Then, the top K neighbors are selected from the matching process using the following probability formula:
where Sim(Q,fi) is the similarity measure, which is computed by matching the features of the query video with those of the ith video in the feature library. The similarity measure is computed using the following equation:
Based on the above equation, the similarity measurement is performed over all key frames, and the minimum value among the frames is taken as the final similarity value of the query video with the ith video. The similarity measure between two frames is the summation of the distance between the LVP vectors and the distance between the keywords. If a text query is given as the input, only the keywords are used for the similarity measurement. The formula for computing the similarity between two frames is given as follows:
Once the probability measure of the query video with the videos in the database is found, the videos having the minimum probability are taken as the K relevant videos of the input query. Then, these K videos act as queries, and their corresponding relevant videos are found using the following equation:
The similarity of these two videos is found using the above equation, and the cumulative probability used to decide the relevant videos is computed using the following equation:
where α and β are weighting constants. Here, the first term refers to the probability of membership based on the first level of neighbors, and the second term refers to the probability of membership based on the second level of neighbors. The probability of membership of the query video with all the videos is thus obtained, and the top K videos with the minimum probability are taken as the relevant videos for the input query. Figure 5 shows the visualization of query videos and the retrieved results.
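A compact sketch of the retrieval logic described in this section for a video query, assuming the feature-library structure sketched earlier, a simple frame-to-frame distance (histogram difference plus keyword mismatch), and illustrative values for the weights α and β; the paper’s exact similarity and probability formulas are not reproduced here:

```python
import numpy as np

def frame_distance(query_frame, db_frame):
    """Distance between two frames: LVP-histogram distance plus keyword mismatch."""
    hist_dist = np.sum(np.abs(query_frame["lvp_hist"] - db_frame["lvp_hist"]))
    q, d = set(query_frame["keywords"]), set(db_frame["keywords"])
    word_dist = 1.0 - len(q & d) / max(len(q | d), 1)
    return hist_dist + word_dist

def video_distance(query_frames, db_video):
    """Minimum frame-level distance between the query frames and a database video."""
    return min(frame_distance(q, f) for q in query_frames for f in db_video["frames"])

def penn_retrieve(query_frames, library, k, alpha=0.6, beta=0.4):
    """Two-level neighbour search: first-level neighbours of the query, then
    neighbours of those neighbours, combined with the weights alpha and beta."""
    first_level = sorted(library, key=lambda v: video_distance(query_frames, v))[:k]
    scores = []
    for video in library:
        p1 = video_distance(query_frames, video)                           # first level
        p2 = min(video_distance(n["frames"], video) for n in first_level)  # second level
        scores.append((alpha * p1 + beta * p2, video["video"]))
    scores.sort()
    return [vid for _, vid in scores[:k]]   # k videos with the smallest combined score
```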
4 Results and Discussion
This section presents the experimental results and the comparative discussion with the existing methods using three different metrics.
4.1 Experimental Setup
The proposed multimodal features and PENN classification for content-based lecture video retrieval are implemented using MATLAB, and the performance of the proposed system and the existing systems is validated using the metrics precision, recall, and F-measure.
Dataset description: The videos utilized for video retrieval are collected from publicly available resources. In total, 40 videos are taken from four different categories: data mining, image processing, soft computing, and wireless communication. Every category contains 10 lecture presentation videos.
Evaluation metrics: The performance of the proposed video retrieval is evaluated using precision, recall, and F-measure. The definitions of these metrics are given as follows:
where Nrel is the number of relevant videos and Nret is the number of retrieved videos. Here, the relevant videos are the manually classified videos, and the retrieved videos are the outputs obtained by the methods.
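Assuming the standard definitions of these metrics (the paper’s exact equations are not reproduced here), they can be computed from the retrieved and relevant video sets as follows:

```python
def precision_recall_f(retrieved, relevant):
    """Standard precision, recall, and F-measure over sets of video ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure
```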
Parameters fixed: The parameters considered in the proposed method are k and the radius R. The k-value is a user-specified parameter, as it is the number of videos the user wants to retrieve from the database. The value of R in the LVP is analyzed, and the best value is suggested. The experimentation is performed with four video queries and four text queries. The four video queries are taken from the input database, one from each category. The text queries utilized in the experimentation are {“data”, “image”, “network”, “computing”}. The comparison is performed with the ENN classifier described in Ref. [13] and the KNN classifier given in Ref. [14].
4.2 Analysis of k-Value from the PENN Classifier
This section presents an extensive analysis of the proposed video retrieval scheme for various k-values with both video and text input queries. Figure 6A shows the precision graph of the four different video queries, each taken from one of the four categories of videos. After inputting the video query, the k-value, i.e. the number of retrieved videos, is varied from 2 to 6 and the results are analyzed. From the results, we observe that VQ3 and VQ4 obtained the maximum precision of 83.3%, which is higher than that of the other video queries, while VQ1 and VQ2 obtained a precision of 80%. Similarly, Figure 6B shows the precision graph for the text queries. Here, the maximum precision of 90% is obtained when TQ4 is given as input and the k-value is 2. From both precision graphs, we clearly see that the precision value decreases as the k-value increases.
Figure 7 shows the recall graph for the four different queries with various numbers of neighbors. From the results, we observe that the recall value decreases when the k-value increases. The maximum recall of 76% is obtained when the input is VQ4 and the number of neighbors is 2. The maximum recall for VQ1, VQ2, and VQ3 is 74%, 74%, and 76%, respectively. Similarly, the recall values for the text queries are analyzed in Figure 7B. This graph shows that the recall value for the input query TQ4 is 80%, which is higher than that of the other text queries. Also, the recall value decreases with increasing k-value.
Figure 8 shows the F-measure graphs for the video and text queries. For the video queries, the maximum F-measure of 77.5% is obtained for VQ4. The F-measure for VQ1, VQ2, VQ3, and VQ4 is 75.71%, 75.71%, 77.5%, and 77.5%, respectively, when the number of retrieved videos is equal to 2. Similarly, Figure 8B shows the F-measure graph for the text queries. Here, the F-measure decreases as the number of retrieved videos increases. The maximum F-measure for the text queries TQ1, TQ2, TQ3, and TQ4 is 77.5%, 78%, 73.08%, and 83.3%, respectively. The minimum F-measure for all text queries, 70%, is obtained when the number of retrieved videos is equal to 6.
4.3 Analysis of Radius from LVP
This section presents an extensive analysis of the proposed video retrieval scheme for text and video queries when the radius parameter R of the LVP is set to 1, 2, and 3. Figure 9A shows the precision graph for the video queries for the various radius values. From the figure, we see that the maximum precision is achieved when the radius is equal to 2 for video query VQ4, while the minimum precision is obtained when the radius is equal to 1 for VQ2 and VQ3. Similarly, the precision graph for the text queries is plotted for the various radius values in Figure 9B. From the figure, we see that the maximum precision of 90% is obtained when the radius is equal to 1 for text query TQ4.
Figure 10 shows the recall graphs for the video and text queries for the various radius values. For a radius of 1, the maximum recall is 74%, obtained for video query VQ1; a maximum recall of 76% is obtained when the radius is equal to 2, and a maximum recall of 72% is achieved when the radius is equal to 3. Figure 10B shows the recall values of text queries TQ1, TQ2, TQ3, and TQ4. Here, the maximum recall for radius values of 1, 2, and 3 is 80%, 72%, and 70%, respectively.
Figure 11 shows the F-measure values of the video and text queries for the various radius values. From Figure 11A, the best F-measure of 75.3% for a radius of 1 is obtained for VQ1 and VQ4. Also, the best F-measure of 78% for a radius of 2 is obtained for VQ4, and the maximum F-measure for the largest radius is 72.67%, which is constant for all the video queries. Similarly, Figure 11B shows the F-measure graph for the text queries. Here, the maximum F-measure of 72.67% for a radius of 3 is obtained for TQ1, and the overall maximum F-measure of 83.3% is obtained when the radius is equal to 1 for TQ4.
4.4 Comparative Analysis
The comparative analysis of the proposed technique with the existing techniques is discussed in this section. Here, two proposed variants – PENN+VQ and PENN+TQ – are evaluated. For the existing works, KNN+VQ, KNN+TQ, ENN+VQ, and ENN+TQ are taken for the comparative analysis, where KNN [14] and ENN [13] are the two existing classification methods and TQ and VQ denote the text and video query modes used with each classifier. Four different video and text queries are taken for computing precision, recall, and F-measure, and the average performance over these four queries is used to plot the graphs for various values of k. Figure 12A shows the precision comparison of the methods. From Figure 12A, we see that the maximum precision of KNN+VQ, ENN+VQ, and PENN+VQ is 77.5%, 77.5%, and 78.3%, respectively, showing that the proposed PENN obtained a higher precision value. Similarly, the maximum precision of KNN+TQ, ENN+TQ, and PENN+TQ is 77.5%, 76.67%, and 78%, respectively.
Figure 12B shows the recall plot of the proposed methods against the existing methods. From the figure, we see that the maximum recall of 74% for a k-value of 2 is reached by PENN+TQ, and the maximum recall of 73.5% for a k-value of 3 is also reached by PENN+TQ; however, when the k-value is equal to 5, the maximum recall is reached by PENN+VQ. Overall, the maximum recall for the different k-values is reached by either PENN+VQ or PENN+TQ. Figure 13 shows the comparison of the proposed and existing methods using the F-measure. Here, the maximum F-measure of 75.33% is obtained by the proposed PENN+TQ, which is higher than that of the existing methods taken for comparison. The results clearly show that the proposed method achieved better performance than the existing methods. The reason for the improvement is that the proposed PENN classifier assigns different degrees of membership to the neighbors and to the neighbors of neighbors, whereas the existing methods assign equal weights.
5 Conclusion
We have presented a PENN classifier for video retrieval that accepts either video or text as input. Here, we combined multiple modalities, namely OCR keywords and texture-based video content features, for the retrieval of lecture videos. For the identification of characters in the videos, we utilized the well-known recognition engine Tesseract, and the video content-based texture was extracted using the LVP descriptor. Finally, the retrieval of the user-required number of videos was performed using the proposed PENN classifier, which considers the neighbors of the retrieved neighbors when taking the classification decision by computing the probability of assignment. The experimentation was performed with a lecture video database collected from publicly available resources, and the performance of the proposed video retrieval was evaluated using precision, recall, and F-measure. The results show that the average precision of the proposed PENN+VQ is 78.3%, which is higher than that of the existing methods. In the future, this method can be enhanced with intent-aware optimization, applied after the retrieved videos are obtained.
Bibliography
[1] X. Che, H. Yang and C. Meinel, Lecture video segmentation by automatically analyzing the synchronized slides, in: Proceedings of the 21st ACM International Conference on Multimedia, pp. 345–348, ACM, 2013. doi: 10.1145/2502081.2508115.
[2] H. Chen, M. Cooper, D. Joshi and B. Girod, Multi-modal language models for lecture video retrieval, in: Proceedings of the ACM International Conference on Multimedia, pp. 1081–1084, 2014. doi: 10.1145/2647868.2654964.
[3] M. Cooper, Presentation video retrieval using automatically recovered slide and spoken text, in: Proceedings of SPIE, Multimedia Content and Mobile Devices, 2013. doi: 10.1117/12.2008433.
[4] K. C. Fan and T. Y. Hung, A novel local pattern descriptor-local vector pattern in high-order derivative space for face recognition, IEEE Trans. Image Process. 23 (2014), 2877–2891. doi: 10.1109/TIP.2014.2321495.
[5] J. Han, X. Ji, X. Hu, J. Han and T. Liu, Clustering and retrieval of video shots based on natural stimulus fMRI, Neurocomputing 144 (2014), 128–137. doi: 10.1016/j.neucom.2013.11.052.
[6] C. Kofler, M. Larson and A. Hanjalic, Intent-aware video search result optimization, IEEE Trans. Multimedia 16 (2014), 1421–1433. doi: 10.1109/TMM.2014.2315777.
[7] Y. H. Lai and C. K. Yang, Video object retrieval by trajectory and appearance, IEEE Trans. Circuits Syst. Video Technol. 25 (2015), 1026–1037. doi: 10.1109/TCSVT.2014.2358022.
[8] T. C. Lin, M. C. Yang, C. Y. Tsai and Y. C. F. Wang, Query-adaptive multiple instance learning for video instance retrieval, IEEE Trans. Image Process. 24 (2015), 1330–1340. doi: 10.1109/TIP.2015.2403236.
[9] B. V. Patel and B. B. Meshram, Content based video retrieval systems, Int. J. UbiComp (IJU) 3 (2012), 13–30. doi: 10.5121/iju.2012.3202.
[10] R. Smith, An overview of the Tesseract OCR engine, in: Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), vol. 2, pp. 629–633, 2007. doi: 10.1109/ICDAR.2007.4376991.
[11] R. Smith, Hybrid page layout analysis via tab-stop detection, in: Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009. doi: 10.1109/ICDAR.2009.257.
[12] R. Smith, D. Antonova and D. Lee, Adapting the Tesseract open source OCR engine for multilingual OCR, in: Proceedings of the International Workshop on Multilingual OCR, 2009. doi: 10.1145/1577802.1577804.
[13] B. Tang and H. He, ENN: extended nearest neighbor method for pattern recognition, IEEE Comput. Intell. Mag. 10 (2015), 52–60. doi: 10.1109/MCI.2015.2437512.
[14] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand and D. Steinberg, Top 10 algorithms in data mining, Knowl. Inform. Syst. 14 (2007), 1–37. doi: 10.1007/s10115-007-0114-2.
[15] P. Yadav, Case retrieval algorithm using similarity measure and adaptive fractional brain storm optimization for health informaticians, Arab. J. Sci. Eng. 41 (2016), 829–840. doi: 10.1007/s13369-015-1928-y.
[16] H. Yang and C. Meinel, Content based lecture video retrieval using speech and video text information, IEEE Trans. Learn. Technol. 7 (2014), 142–154. doi: 10.1109/TLT.2014.2307305.
[17] H. Yang, H. Sack and C. Meinel, Lecture video indexing and analysis using video OCR technology, in: Proceedings of the Seventh International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), pp. 54–61, 2011. doi: 10.1109/SITIS.2011.20.
[18] H. Yang, M. Siebert, P. Lühne, H. Sack and C. Meinel, Automatic lecture video indexing using video OCR technology, in: Proceedings of the IEEE International Symposium on Multimedia (ISM), pp. 111–116, 2011. doi: 10.1109/ISM.2011.26.
[19] H. Yang, F. Grünewald, M. Bauer and C. Meinel, Lecture video browsing using multimodal information resources, in: Lecture Notes in Computer Science, Advances in Web-Based Learning, vol. 8167, pp. 204–213, 2013.