Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China

Authors:
Zachary Weinberg, University of Massachusetts, Amherst, USA
Diogo Barradas, INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Portugal
Nicolas Christin, Carnegie Mellon University, USA

WWW '21: Proceedings of the Web Conference 2021, April 2021, pages 472–483. https://doi.org/10.1145/3442381.3450076

Published: 3 June 2021

ABSTRACT

The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets, and blocks further traffic between the same two hosts for a few minutes.
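The detect, reset, and block behaviour described above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the GFW's implementation or the authors' measurement code: the class name ToyKeywordCensor, the placeholder keyword, and the 90-second penalty (a stand-in for "a few minutes") are all hypothetical, and the model uses a plain substring test on each payload, ignoring TCP reassembly and the location-dependent HTTP parsing reported in the next paragraph.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PENALTY_SECONDS = 90.0  # hypothetical stand-in for "a few minutes"


@dataclass
class ToyKeywordCensor:
    """Toy model: scan cleartext payloads for forbidden keywords, 'reset' the
    offending flow, then drop all traffic between the same two hosts until a
    penalty period expires."""

    keywords: set[bytes]
    penalty: float = PENALTY_SECONDS
    # unordered host pair -> time at which the block expires
    blocked_until: dict[tuple[str, ...], float] = field(default_factory=dict)

    def inspect(self, src: str, dst: str, payload: bytes, now: float) -> str:
        pair = tuple(sorted((src, dst)))  # same key in both directions
        if self.blocked_until.get(pair, 0.0) > now:
            return "dropped"  # still inside the penalty window
        if any(k in payload for k in self.keywords):
            self.blocked_until[pair] = now + self.penalty
            return "rst"  # stands in for injecting TCP RST packets
        return "pass"


censor = ToyKeywordCensor(keywords={b"forbidden"})
print(censor.inspect("10.0.0.1", "198.51.100.7", b"GET /?q=forbidden HTTP/1.1\r\n", 0.0))  # rst
print(censor.inspect("198.51.100.7", "10.0.0.1", b"HTTP/1.1 200 OK\r\n", 10.0))            # dropped
print(censor.inspect("10.0.0.1", "198.51.100.7", b"GET / HTTP/1.1\r\n", 100.0))            # pass
```

Keying the penalty table on the unordered host pair models the observation that, after a match, traffic between the same two hosts is blocked regardless of direction, even when the later packets contain no keyword.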

We report on a detailed investigation of the GFW’s application-layer understanding of HTTP. Forbidden keywords are only detected in certain locations within an HTTP request. Requests that contain the English word “search” are inspected for a longer list of forbidden keywords than requests without this word. The firewall can be evaded by bending the rules of the HTTP specification. We observe censorship based on the cleartext TLS Server Name Indication (SNI), but we find no evidence for bulk decryption of HTTPS.
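Testing where in a request a keyword is detected calls for requests that differ only in the keyword's position. The helper below is a hypothetical sketch of such a probe generator, not the harness used in the study: build_probe, the X-Probe header name, the use of "/search" as a path prefix to test the "search" effect, and the placeholder host example.com are all assumptions. It only builds raw request bytes. The keyword is inserted verbatim, so the caller chooses whether to pass UTF-8, GB 2312, or percent-encoded bytes.

```python
LOCATIONS = ("path", "query", "host", "header")


def build_probe(keyword: bytes, location: str, host: bytes = b"example.com",
                with_search: bool = False) -> bytes:
    """Build a raw HTTP/1.1 GET request with `keyword` placed at `location`.

    The keyword is inserted verbatim; encoding it is the caller's choice.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")
    # Hypothetical: put the word "search" in the path when requested.
    prefix = b"/search" if with_search else b""
    path = prefix + b"/" + keyword if location == "path" else (prefix or b"/")
    query = b"?q=" + keyword if location == "query" else b""
    host_value = keyword + b"." + host if location == "host" else host
    lines = [b"GET " + path + query + b" HTTP/1.1", b"Host: " + host_value]
    if location == "header":
        lines.append(b"X-Probe: " + keyword)  # hypothetical header name
    lines.append(b"Connection: close")
    return b"\r\n".join(lines) + b"\r\n\r\n"


for loc in LOCATIONS:
    print(build_probe(b"kwtest", loc, with_search=True).decode("ascii"))
```

Under these assumptions, each variant would be compared against a keyword-free control sent over the same path, with a reset on one variant but not the control indicating that the corresponding location is inspected.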

We also report on changes since 2014 in the contents of the forbidden keyword list. Over 85% of the forbidden keywords have been replaced since 2014, with the surviving terms referring to perennially sensitive topics. The new keywords refer to recent events and controversies. The GFW’s keyword list is not kept in sync with the blocklists used by Chinese chat clients.
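The turnover figure is a set comparison: the share of the 2014 list that is absent from the current list, |K_2014 \ K_now| / |K_2014|, with the intersection giving the surviving terms. A minimal sketch, assuming both lists have already been decoded into comparable Unicode strings; the function name and the toy keywords (kw-a, kw-b, and so on) are placeholders, not data from the study.

```python
def replaced_fraction(old: set[str], new: set[str]) -> float:
    """Share of `old` keywords that do not appear in `new`."""
    if not old:
        raise ValueError("old keyword list is empty")
    return len(old - new) / len(old)


# Toy placeholder lists, NOT the study's data.
old_list = {"kw-a", "kw-b", "kw-c", "kw-d"}
new_list = {"kw-a", "kw-x", "kw-y"}

print(replaced_fraction(old_list, new_list))  # 0.75
print(sorted(old_list & new_list))            # ['kw-a'] (surviving terms)
```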

Published in: WWW '21: Proceedings of the Web Conference 2021, April 2021, 4054 pages
ISBN: 9781450383127
DOI: 10.1145/3442381
Editors: Jure Leskovec (Stanford), Marko Grobelnik (Jožef Stefan Institute), Marc Najork (Google), Jie Tang (Tsinghua University), Leila Zia (Wikimedia Foundation)
Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher: Association for Computing Machinery, New York, NY, United States

Author Tags: Censorship, Keyword filtering, Measurement