
1 Motivation and Background

Improving internet penetration and digital literacy are driving the growth of crowdsourcing (image recognition, language translation, responses to questions, and other micro and macro tasks). Because workers respond for quick money or with incomplete or incorrect knowledge, responses suffer in quality. Current quality control processes use majority voting, peer reviews, data mining, fault-tolerant sub-tasks, game theory and other hybrid modes [6]. Crowdsourcing Q&A platforms such as StackExchange and Quora use majority voting to assess the quality of responses, which implies manual intervention and latency until viewers rate or vote. We extracted 135 responses to a sample of 50 questions on ‘Phishing’ from Quora. More than 77% of the responses were not related answers. Among the related and relevant responses, more than 90% received few or no votes from the crowd even though they were semantically equivalent to a relevant, higher-rated answer. Lack of recognition (viewer rating) could de-motivate intrinsic contributors. We also conducted an online survey in September 2016 to understand the quality concerns in crowdsourcing; social media sites, including the CrowdsourcingWeek LinkedIn group, were used to recruit participants. The majority of the survey respondents were IT-savvy working professionals from Asian countries, and 76% of these 212 respondents stated that crowdsourced responses have poor quality. The quality gaps in crowdsourced Q&A are the motivation for this research.

We propose a Completeness, Consistency and Correctness (3Cs) approach, adopted from Software Requirements Engineering (RE), for quality control of crowdsourced questions and answers. Just as software products are built from stakeholders’ requirements (their understanding and knowledge level), responses in crowdsourcing reflect workers’ knowledge level. Hence, we hypothesize that the rigor of the 3Cs will differentiate good responses from bad ones, thus leading to quality control. Though there are many other RE quality characteristics, such as traceability, modifiability and unambiguity, the importance of the 3Cs is unequivocally stated in research publications [1, 10, 14], the ISO/IEC 25010:2011 standard and Gartner market research.

An example of a complete response to the question ‘what are the key characteristics of Information Security’ is ‘Confidentiality, Integrity and Availability’. Completeness has a puritan view, with many forms such as functional, syntactic and semantic completeness. Taking direction from Gabriel’s comments in ‘The rise of worse is better’, we measure completeness as the degree of coverage of real-world situations in the response(s), ensuring unnecessary or irrelevant features are not captured. Obtaining complete information for a domain is a never-ending problem [3]. Hence, our completeness measure for a response is computed with reference to an extracted knowledge base (KB) and termed Adequate Completeness (ACP).
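As a rough illustration (hypothetical function and term names, not our final ACP definition), a coverage-based completeness measure could score a response by the fraction of topic-relevant KB terms it mentions:

```python
import re

def adequate_completeness(response, kb_terms):
    """Fraction of topic-relevant KB terms covered by the response (token overlap).

    Illustrative only: the actual measure may rely on semantic similarity
    rather than exact token matches.
    """
    tokens = set(re.findall(r"[a-z0-9]+", response.lower()))
    if not kb_terms:
        return 0.0
    covered = {t for t in kb_terms if t.lower() in tokens}
    return len(covered) / len(kb_terms)

# KB terms for 'key characteristics of information security'
kb_terms = {"confidentiality", "integrity", "availability"}
print(adequate_completeness("Confidentiality, Integrity and Availability", kb_terms))  # 1.0
print(adequate_completeness("Integrity only", kb_terms))                               # ~0.33
```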

Consistency is the measure of conflict-free sentences in the response with respect to the objective (question). An example of consistency in a response, as extracted from a crowdsourcing platform for the question ‘What are the security features of Amex credit card’, is: ‘Amex credit card has 2 levels of security: they have the normal CVV (Card Verification Value) and the 3 digits are a CID (Customer Card Identity). CVV is a calculated highly secure 4 digit code based on your card number that is not contained in the card magnetic strip’. Based on an evolving ontology with increasing instances in the KB, our consistency measure ACN counts conflict-free tuples (Concept + Relationship + Concept) in the response. This also means that the response is not just a bag of words but a set of sentences that are cohesive and conflict-free. The history of past contributions (credibility) of a worker in the topic (question-answer) is also a factor in our consistency measure. We relate credibility to consistency rather than correctness, as both are measures of trust rather than rightness.
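A minimal sketch of the intended tuple-level conflict check follows; tuple extraction from free text (via the ontology) is assumed to happen elsewhere, and the KB facts shown are purely illustrative:

```python
def adequate_consistency(response_tuples, kb_tuples):
    """Fraction of response tuples that do not conflict with the KB.

    Simplified rule: a response tuple (s, r, o) conflicts when the KB holds
    the same subject and relation but a different object.
    """
    if not response_tuples:
        return 0.0
    kb_index = {}
    for s, r, o in kb_tuples:
        kb_index.setdefault((s, r), set()).add(o)
    consistent = 0
    for s, r, o in response_tuples:
        known = kb_index.get((s, r))
        if known is None or o in known:
            consistent += 1
    return consistent / len(response_tuples)

# Illustrative KB fact and a conflicting response tuple
kb = [("password", "stored_as", "salted hash")]
resp = [("password", "stored_as", "plain text")]
print(adequate_consistency(resp, kb))  # -> 0.0 (conflicts with the KB)
```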

Correctness is the degree to which a response contains the conditions and limitations for the desired capability (question). Hence, the correctness of a response is not necessarily binary (Yes/No or True/False) but a degree of match/similarity. An example of a correct response is ‘Authentication is used for providing an access entry into the system’. Like completeness, our correctness measure [3] is computed with respect to the extracted KB. The Adequate Correctness (ACR) of a crowdsourced response is based on the occurrences of semantically similar content in the extracted KB and the relation to the question type (What, Why, When, Where, Who and How - the 5W & 1H).
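As a simple illustration of the question-type component of ACR, a rule-based first pass over the 5W & 1H keywords could look like the following (names are illustrative; the actual approach may use the ML classifiers discussed in Sect. 4.4):

```python
# Minimal sketch of mapping a question to its 5W & 1H type for the ACR measure.
QUESTION_TYPES = ("what", "why", "when", "where", "who", "how")

def question_type(question):
    """Return the 5W1H type of a question, or 'other' if none is found."""
    for word in question.lower().split():
        if word in QUESTION_TYPES:
            return word
    return "other"

print(question_type("What are the key characteristics of Information Security?"))  # what
print(question_type("How does authentication control access to a system?"))        # how
```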

We propose the CSQuaRE score, based on ACP, ACN and ACR, for assessing the quality of CrowdSourced responses to information-security-related questions using a Requirements Engineering approach. Our proposed approach would be demonstrated on ‘Information Security’ crowdsourced responses. Our past experiences and the existence of security-related information-exchange platforms such as StackExchange, AlienVault, etc. give us confidence that individuals are comfortable seeking and responding to security-related questions on public platforms. Crowdsourcing Week, a leading website on crowdsourcing, has identified security information exchange as one of the top emerging trends.

2 Research Questions

Addressing the following research questions would provide a quantitative measure, CSQuaRE, for assessing the 3Cs in a response.

  • (Q1) What are the dimensions of completeness, correctness and consistency of a response that can be measured automatically?

    Increasing the KB for completeness can lead to inconsistency, while completeness and consistency of a response enhance correctness. The interplay among these 3Cs has to be identified to avoid double-counting or negation in the CSQuaRE calculation; this includes determining the degree of the relationship (linear or polynomial).

  • (Q2) What is the credibility of a worker in the past while responding to questions in a specific topic/domain?

    Most existing crowdsourcing platforms limit credibility assessment to the platform and/or task level. A crowd worker may not be active on the crowdsourcing platform but may have deep knowledge in the question domain and could be a prolific contributor on other internet sites. This research includes identifying the person and his/her credibility in the specific question domain from the obtained KB.

  • (Q3) What is the temporal effect on the response of a question with respect to completeness, consistency and correctness?

    As the KB grows with time, a response that had a certain CSQuaRE may change over a period. As an example, the strength of cryptographic hash algorithms has improved from SHA-1 to SHA-2 and so on. Hence, a response’s CSQuaRE requires re-calibration to maintain the 3Cs.

As part of our research, we also plan to crawl the internet to extract security-related information, conduct a study on the importance of text cohesion in crowdsourced responses, and develop an evolving ontology (KB) based on new instances of extracted information.

3 Related Work

The related work covers quality control in crowdsourcing, the 3Cs and their attributes for quality control, credibility assessment, and Q&A platforms.

3.1 Quality in Crowdsourcing

Afra et al. [8] used credibility based on past contributions and contributors’ mobility patterns for quality control. Aroyo et al. [11] performed quality assessments of Q&A postings using disagreement-based metrics to harness human interpretation. In a recent study, Bernstein et al. [12] discuss the reputation of crowd workers and the importance of peer reviews. The existing quality control mechanisms are hard-wired and not multi-dimensional. Other related literature discusses the use of game theory (e.g., multi-armed bandits), better task clarity, cascade-model effects, ground truth, and worker experience and language nativity for evaluating the quality of workers/tasks. We plan to use the reputation/credibility of crowd workers based on past contributions, and ground truth in the form of a KB, in our quality control approach.

3.2 Completeness, Consistency and Correctness

Siegemund et al. [9] use an ontology model to identify consistency and completeness of evolving requirements. Lami et al. [7] presented a methodology and tool for evaluating natural-language requirements for consistency and completeness. McCall’s quality model and the work of Zowghi et al. [15] identified the interplay among the 3Cs, which we plan to extend in our CSQuaRE measurement. Behavioural aspects of the worker, such as credibility based on past contributions and profile, contribute to consistency in quality. We plan to extend the work of Kumaraguru et al. [4] on Twitter tweets to compute a credibility score for responses specific to the question domain.

3.3 Question and Answers

Question answering systems have transformed considerably over the last four decades, in step with natural language processing (NLP) techniques. In 1978, the first classic Q&A book, based on Lehnert’s thesis, provided a fundamental basis for research. The availability of the TREC corpus and research in the biomedical domain gave impetus to Q&A platforms. Hirschman et al. [5] factored in the importance of completeness and correctness in Q&A platforms. The articles and publications on AnswerBus [13], START from MIT, etc. describe the use of advances in NLP and AI for providing these services. Publicly available literature on IBM Watson, Apple Siri, etc. states the usefulness of Q&A and the importance of continuous evolution/training based on a KB. While none of these platforms involves crowd workers for quality control, their reliance on an evolving KB and on cohesion in responses aligns with our approach.

As evident from the reviewed literature, there is no comprehensive approach for quality control of crowdsourced responses that uses the credibility of a worker on the internet, the temporal effect on the quality of a response, and domain knowledge for measuring completeness, consistency and correctness.

4 Proposed Approach

To demonstrate the 3Cs approach for quality control, we plan to build a crowdsourcing platform prototype using available open-source Q&A software after a technical and functional evaluation. The following sections describe the progress of work, marked as ‘In-progress’ or ‘Yet to begin’, for addressing the research questions. The schematic in Fig. 1 depicts the approach for implementing quality control of crowdsourced responses.

Fig. 1. Approach for quality control in crowdsourcing

4.1 Building Domain Repository: In-Progress

The measurement of CSQuaRE is based on the extracted domain content. More than 934,000 security-related URLs have been obtained from Wikipedia and Twitter. These URLs are categorized into the 14 groups and 114 controls of ISO/IEC 27001:2013 to ensure representation across sub-domains. The content crawled from the seed URLs is cleansed (stop-word removal and stemming) and classified into security sub-domains. As there is no prevalent security search engine, we plan to provide an interface (search engine) to the extracted domain content by June 2017. This would also provide a user base and an opportunity to seek feedback on data relevance from security experts associated with the banking community and DSCI, a voluntary organization with security experts as its members.
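A minimal sketch of the cleansing step, assuming NLTK’s English stop-word list and the Porter stemmer (the actual pipeline may differ):

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)  # one-time corpus download

STOP_WORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def cleanse(text):
    """Lower-case, tokenize, drop stop words and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [STEMMER.stem(t) for t in tokens if t not in STOP_WORDS]

print(cleanse("Phishing emails are designed to steal user credentials"))
# -> stemmed tokens such as 'phish', 'email', 'design', 'steal', 'user', ...
```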

4.2 Ontology Evolution: In-Progress

The extracted domain content would be represented as an ordered pair of TBox (concepts and relationships) and ABox (assertions) using an existing security ontology [2] and Word2Vec. We use Word2Vec, trained on roughly 100 billion words from the Google News archives, for similarity mapping between ontology terms and the extracted internet content. This ordered pair (TBox and ABox) of the ontology would be treated as the knowledge base (KB), which would be used to evaluate the ACP, ACN and ACR of crowdsourced responses. However, the KB needs updating over time as the concepts and relationships of a domain evolve or change, which in turn requires re-calibration of past responses. Automatically updating the ontology based on the increased domain content may not be acceptable to ontologists, so an observable pattern in the KB would be identified for ontologists to update the security ontology. The observable pattern of extracted domain content would be assessed on text cohesion, relevance of the text to the 114 security controls, and credibility of the information source. The text of the responses will also be used for ontology evolution, as the responses may contain information that is not available in the crawled content, and this text could be used for quality control of related future questions.
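The following sketch shows the intended Word2Vec-based similarity mapping between a crawled term and ontology concepts, using gensim and the pre-trained Google News vectors; the file name, threshold and function name are illustrative assumptions:

```python
from gensim.models import KeyedVectors

# Pre-trained Google News vectors (~100B-word corpus); path is an assumption.
model = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def maps_to_concept(term, ontology_concepts, threshold=0.6):
    """Return the ontology concept most similar to a crawled term, if above threshold."""
    best, best_sim = None, threshold
    for concept in ontology_concepts:
        if term in model and concept in model:
            sim = model.similarity(term, concept)
            if sim > best_sim:
                best, best_sim = concept, sim
    return best

# Which security concept does a crawled term map to (if any)?
print(maps_to_concept("login", ["authentication", "encryption", "availability"]))
```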

4.3 Credibility: Yet to Begin

As stated earlier, credibility in a question domain is part of our consistency measure. We plan to have users log in to our crowdsourcing platform with their Twitter ID. Crowd worker credibility would be based on past contributions on the crowdsourcing platform and the credibility score on Twitter in the question-answer related topic. We are also in the process of evaluating the credibility of websites containing information-security content, to ensure that not every available piece of content is used for ontology evolution.

$$Credibility = \lbrace TwitterCred,\ Site,\ Contributions, \ldots \rbrace$$
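A minimal sketch of how these signals could be combined, with placeholder weights and inputs assumed normalized to [0, 1] (not the final credibility model):

```python
def credibility(twitter_cred, site_cred, past_contributions, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of normalized credibility signals into a [0, 1] score.

    Signal names and weights are illustrative placeholders.
    """
    signals = (twitter_cred, site_cred, past_contributions)
    return sum(w * s for w, s in zip(weights, signals))

print(credibility(twitter_cred=0.8, site_cred=0.5, past_contributions=0.9))  # 0.74
```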

4.4 Assignment of CSQuaRE: Yet to Begin

A question posted on the crowdsourcing platform may have one or more responses, and every response would be assigned a CSQuaRE score based on ACP, ACN and ACR. Information retrieval evaluation metrics such as recall and Latent Semantic Indexing are being explored for measuring Completeness (ACP). NLP techniques such as cohesion analysis (part of discourse analysis, to ensure responses are not just a bag of words but related sentences), together with the individual’s credibility in the domain based on past contributions, would form the Consistency (ACN) measure. Users of our proposed crowdsourcing platform will also be able to vote on responses; this voting would act as a feedback loop for improving the credibility of the crowd worker. For the Correctness (ACR) measure, we are exploring machine learning approaches (Decision Tree and SVM) for matching the question type against the response, and FrameNet to obtain the semantic similarity of the response with reference to the KB.

The initial weights for each of the components (ACP, ACN and ACR) of the CSQuaRE score would be equal, the score would be scaled to 10 (0 being unrelated and 10 being highest), and the weights would be refined based on the feedback loop. Some of the factors in calculating the scores are:

$$ACP = \lbrace TermCoverage,\ OntologyDepth, \ldots \rbrace$$
$$ACN = \lbrace DLMatch,\ IndividualCredibility, \ldots \rbrace$$
$$ACR = \lbrace QuestionType,\ ResponseSimilarity, \ldots \rbrace$$
$$CSQuaRE = \lbrace ACP,\ ACN,\ ACR \rbrace$$
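Putting these together, a minimal sketch of the initial, equally weighted CSQuaRE scoring (to be refined via the feedback loop; component scores are assumed normalized to [0, 1]):

```python
def csquare(acp, acn, acr, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine component scores (each in [0, 1]) into a 0-10 CSQuaRE score.

    Equal initial weights, as planned; they would later be refined by the
    feedback loop from viewer voting and expert evaluation.
    """
    score = sum(w * c for w, c in zip(weights, (acp, acn, acr)))
    return round(10 * score, 2)

print(csquare(acp=0.9, acn=0.7, acr=0.8))  # -> 8.0
```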

5 Evaluation Plan

An empirical approach would be used to validate the solutions to the research problems. We plan to extract questions and responses from StackExchange that relate to the 114 control groups of ISO 27001, have more than three respondents, and are rated by viewers. We will provide these responses to security experts and ask them to evaluate the relevance of each response on a scale of 0–10 (0 being unrelated and 10 being highest). We will then assess the CSQuaRE of these responses on our crowdsourcing platform, holding the credibility score constant. We hypothesize that the CSQuaRE score should be similar to the score assigned by the security experts; the viewer ratings of the StackExchange responses may also be high for responses that are scored high by our approach. We also plan to perform a controlled data experiment to assess the applicability of CSQuaRE. As part of the evaluation, each of the measures ACP, ACN and ACR would in turn be held constant to measure its effectiveness for quality control.
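A minimal sketch of the planned comparison between CSQuaRE scores and expert ratings using rank correlation (the numbers below are illustrative, not experimental results):

```python
from scipy.stats import spearmanr

# Both rating sets are on a 0-10 scale, one value per response (illustrative data).
expert_ratings = [9, 7, 2, 8, 5, 1]
csquare_scores = [8.5, 6.0, 3.0, 8.0, 5.5, 2.0]

rho, p_value = spearmanr(expert_ratings, csquare_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```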

Also, a survey would be conducted to obtain feedback on CSQuaRE from security experts. This survey would guide us in identifying gaps and the scope for further refinement of the approach.