Abstract
Specifying exact query concepts has become increasingly challenging for end users, because many query concepts (e.g., those for looking up a multimedia object) are hard to articulate, and articulation can be subjective. In this study, we propose a query-concept learner that learns query criteria through an intelligent sampling process. Our concept learner aims to fulfill two primary design objectives: (1) it must be expressive enough to model most practical query concepts, and (2) it must learn a concept quickly and from a small number of labeled instances, since online users tend to be too impatient to provide much feedback. To fulfill the first objective, we model query concepts in k-CNF, which can express almost all practical query concepts. To fulfill the second objective, we propose the maximizing expected generalization algorithm (MEGA), which converges to target concepts quickly through two complementary steps: sample selection and concept refinement. We also propose a divide-and-conquer method that divides the concept-learning task into G subtasks to achieve speedup. We note that a task must be divided carefully, or search accuracy may suffer. Through analysis and mining results, we observe that organizing image features in a multiresolution manner and minimizing intragroup feature correlation can speed up query-concept learning substantially while maintaining high search accuracy. Through examples, analysis, experiments, and a prototype implementation, we show that MEGA converges to query concepts significantly faster than traditional methods.
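To make the k-CNF representation concrete, the following is a minimal sketch of the classic elimination learner for k-CNF (in the style of Valiant's PAC-learning result), not of MEGA itself. It assumes features are Boolean, and the function names (all_clauses, refine, predict) are hypothetical, chosen for illustration only. The hypothesis starts as the conjunction of every disjunction of at most k literals, and each positively labeled example removes the clauses it violates. MEGA's sample selection step and its handling of negative feedback are not modeled here.

```python
from itertools import combinations, product


def all_clauses(n_features, k):
    """Enumerate every disjunction of at most k literals over n Boolean features.

    A literal is a pair (feature_index, required_value); a clause is a
    frozenset of literals over distinct features.
    """
    clauses = set()
    for size in range(1, k + 1):
        for idxs in combinations(range(n_features), size):
            for values in product([True, False], repeat=size):
                clauses.add(frozenset(zip(idxs, values)))
    return clauses


def satisfies(clause, example):
    """A disjunction holds if at least one of its literals matches the example."""
    return any(example[i] == v for i, v in clause)


def refine(clauses, positive_example):
    """Concept refinement: drop every clause that a positive example violates."""
    return {c for c in clauses if satisfies(c, positive_example)}


def predict(clauses, example):
    """The k-CNF hypothesis is the conjunction of all surviving clauses."""
    return all(satisfies(c, example) for c in clauses)


# Illustrative target concept (an assumption): x0 AND (x1 OR x2), with k = 2.
hypothesis = all_clauses(n_features=3, k=2)
for positive in [(True, True, False), (True, False, True), (True, True, True)]:
    hypothesis = refine(hypothesis, positive)

print(predict(hypothesis, (True, True, False)))   # a labeled positive
print(predict(hypothesis, (False, True, True)))   # violates x0
print(predict(hypothesis, (True, False, False)))  # violates (x1 OR x2)
```

The initial hypothesis is the most specific one expressible in k-CNF, so each positive example can only generalize it. The number of candidate clauses grows on the order of n^k in the number of features n, which is one way to see why splitting the features into groups can reduce the work.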
Index Terms
MEGA---the maximizing expected generalization algorithm for learning complex query concepts