DOI: 10.1145/3442381.3450076
research-article
Open Access

Chinese Wall or Swiss Cheese? Keyword filtering in the Great Firewall of China

Published: 03 June 2021

ABSTRACT

The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets and blocks further traffic between the same two hosts for a few minutes.
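
To make this blocking behavior concrete, the following Python sketch models a toy on-path filter. It is an illustrative assumption, not the GFW's implementation: keywords are matched as plain byte substrings of a single payload (no TCP reassembly), the residual block is keyed on the unordered pair of host addresses, and the 120-second duration is a hypothetical stand-in for "a few minutes". The class name `ToyKeywordCensor` and the returned action strings are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKeywordCensor:
    """Toy model of keyword-triggered RST injection plus residual blocking.

    Hypothetical sketch only; it does not reproduce the GFW's real logic.
    """
    keywords: set[bytes]
    block_seconds: float = 120.0  # assumption: "a few minutes" in the abstract
    _blocked_until: dict[frozenset[str], float] = field(default_factory=dict)

    def inspect(self, src: str, dst: str, payload: bytes, now: float) -> str:
        # Assumption: the block applies to the unordered pair of hosts,
        # so traffic in either direction is affected.
        pair = frozenset((src, dst))
        if self._blocked_until.get(pair, 0.0) > now:
            return "drop"  # residual blocking of a recently flagged pair
        # Plain substring search over the unencrypted payload.
        if any(kw in payload for kw in self.keywords):
            self._blocked_until[pair] = now + self.block_seconds
            return "inject_rst"  # tear down the offending stream
        return "pass"


censor = ToyKeywordCensor(keywords={b"forbidden-term"})
print(censor.inspect("192.0.2.1", "198.51.100.7", b"GET /?q=hello HTTP/1.1\r\n\r\n", now=0))           # pass
print(censor.inspect("192.0.2.1", "198.51.100.7", b"GET /?q=forbidden-term HTTP/1.1\r\n\r\n", now=1))  # inject_rst
print(censor.inspect("198.51.100.7", "192.0.2.1", b"harmless", now=60))                                # drop
print(censor.inspect("192.0.2.1", "198.51.100.7", b"harmless", now=200))                               # pass
```

The last two calls illustrate why residual blocking matters to measurement: an innocuous request sent shortly after a trigger is still blocked, so probes between the same pair of hosts must be spaced out or they will contaminate each other.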

We report on a detailed investigation of the GFW’s application-layer understanding of HTTP. Forbidden keywords are only detected in certain locations within an HTTP request. Requests that contain the English word “search” are inspected for a longer list of forbidden keywords than requests without this word. The firewall can be evaded by bending the rules of the HTTP specification. We observe censorship based on the cleartext TLS Server Name Indication (SNI), but we find no evidence for bulk decryption of HTTPS.
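
The location-dependent matching and the "search" trigger can be sketched in the same toy style. Which parts of the request are inspected is a hypothetical choice here (the request-target and the `Host` header value); the abstract states only that keywords are detected in certain locations. The keyword lists are placeholders, and the trigger is modeled as a case-sensitive substring test for "search" anywhere in the request, which is also an assumption. The function names are invented for illustration.

```python
def inspected_fields(request: bytes) -> list[bytes]:
    """Return the parts of an HTTP/1.1 request this toy filter inspects.

    Hypothetical choice: only the request-target and the Host header value.
    """
    head, _, _ = request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    fields = []
    parts = lines[0].split(b" ")  # e.g. [b"GET", b"/path", b"HTTP/1.1"]
    if len(parts) == 3:
        fields.append(parts[1])
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            fields.append(value.strip())
    return fields


def matched_keywords(request: bytes, base: set[bytes], extended: set[bytes]) -> set[bytes]:
    """Keywords that would trigger this toy filter for the given request."""
    fields = inspected_fields(request)
    active = set(base)
    # Per the abstract, requests containing "search" face a longer list;
    # modeling it as a substring test over the whole request is an assumption.
    if b"search" in request:
        active |= extended
    return {kw for kw in active if any(kw in f for f in fields)}


base, extended = {b"banned-a"}, {b"banned-b"}
print(matched_keywords(b"GET /banned-b HTTP/1.1\r\nHost: example.com\r\n\r\n", base, extended))           # set()
print(matched_keywords(b"GET /search?q=banned-b HTTP/1.1\r\nHost: example.com\r\n\r\n", base, extended))  # {b'banned-b'}
print(matched_keywords(b"GET / HTTP/1.1\r\nHost: example.com\r\nX-Note: banned-a\r\n\r\n", base, extended))  # set()
```

The third request shows the general consequence of location-dependent matching: a keyword placed somewhere the filter does not look goes unnoticed, even though it is present in the cleartext stream.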

We also report on changes since 2014 in the contents of the forbidden keyword list. Over 85% of the forbidden keywords have been replaced since 2014, with the surviving terms referring to perennially sensitive topics. The new keywords refer to recent events and controversies. The GFW’s keyword list is not kept in sync with the blocklists used by Chinese chat clients.
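
One simple way to quantify keyword-list turnover and cross-list agreement is with set arithmetic: turnover as |K_old \ K_new| / |K_old| (the fraction of the old list absent from the new one) and agreement as Jaccard similarity. These are illustrative choices, not necessarily the measures used in the paper, and the example sets are placeholders.

```python
def turnover(old: set[str], new: set[str]) -> float:
    """Fraction of the old keyword list that no longer appears in the new list."""
    if not old:
        raise ValueError("old list is empty")
    return len(old - new) / len(old)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


old_list = {"a", "b", "c", "d"}   # placeholder for an earlier keyword list
new_list = {"a", "e", "f"}        # placeholder for a later keyword list
chat_list = {"e", "g"}            # placeholder for a chat client's blocklist
print(turnover(old_list, new_list))   # 0.75
print(jaccard(new_list, chat_list))   # 0.25
```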


  • Published in

    WWW '21: Proceedings of the Web Conference 2021
    April 2021
    4054 pages
    ISBN:9781450383127
    DOI:10.1145/3442381

    Copyright © 2021 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%

