ABSTRACT
The Great Firewall of China (GFW) prevents Chinese citizens from accessing online content deemed objectionable by the Chinese government. One way it does this is to search for forbidden keywords in unencrypted packet streams. When it detects them, it terminates the offending stream by injecting TCP RST packets, and blocks further traffic between the same two hosts for a few minutes.
We report on a detailed investigation of the GFW’s application-layer understanding of HTTP. Forbidden keywords are only detected in certain locations within an HTTP request. Requests that contain the English word “search” are inspected for a longer list of forbidden keywords than requests without this word. The firewall can be evaded by bending the rules of the HTTP specification. We observe censorship based on the cleartext TLS Server Name Indication (SNI), but we find no evidence for bulk decryption of HTTPS.
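To make the idea of location-dependent keyword detection concrete, the following is a minimal sketch of how probe requests could be built with a test string placed in different parts of an HTTP request. The function name `build_probe`, the set of locations, the `X-Probe` header, and the placeholder keyword are all illustrative assumptions; the abstract does not say which locations the firewall inspects or how the authors constructed their probes. The sketch only builds request bytes and sends nothing.

```python
from urllib.parse import quote

# Hypothetical set of places where a test string can be embedded.
LOCATIONS = ("path", "query", "host", "header")


def build_probe(host: str, keyword: str, location: str) -> bytes:
    """Build a raw HTTP/1.1 GET request with `keyword` in exactly one location.

    Assumes `keyword` is ASCII when placed in a header; path and query
    placements are percent-encoded.
    """
    if location not in LOCATIONS:
        raise ValueError(f"unknown location: {location!r}")

    path = "/"
    headers = {"Host": host, "Connection": "close"}

    if location == "path":
        path = "/" + quote(keyword, safe="")
    elif location == "query":
        path = "/?q=" + quote(keyword, safe="")
    elif location == "host":
        headers["Host"] = keyword
    elif location == "header":
        headers["X-Probe"] = keyword  # made-up header name for illustration

    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


if __name__ == "__main__":
    for loc in LOCATIONS:
        print(loc, build_probe("example.com", "placeholder", loc))
```

Comparing which of these otherwise identical requests are reset would, under these assumptions, indicate which parts of the request are being inspected.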
We also report on changes since 2014 in the contents of the forbidden keyword list. Over 85% of the forbidden keywords have been replaced since 2014, with the surviving terms referring to perennially sensitive topics. The new keywords refer to recent events and controversies. The GFW’s keyword list is not kept in sync with the blocklists used by Chinese chat clients.
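As a worked illustration of the turnover figure, the sketch below computes the fraction of an older keyword list that no longer appears in a newer one. It assumes the percentage is measured relative to the older list, which the abstract does not state explicitly; the function name `replaced_fraction` and the placeholder entries are hypothetical and are not taken from the actual lists.

```python
from typing import Set


def replaced_fraction(old: Set[str], new: Set[str]) -> float:
    """Fraction of `old` entries that are absent from `new`.

    Assumption: turnover is measured against the older list.
    """
    if not old:
        raise ValueError("old list must not be empty")
    surviving = old & new  # terms present in both snapshots
    return 1 - len(surviving) / len(old)


if __name__ == "__main__":
    # Placeholder entries only; not real keywords.
    old_list = {"term-a", "term-b", "term-c", "term-d"}
    new_list = {"term-a", "term-x", "term-y"}
    print(replaced_fraction(old_list, new_list))  # 0.75
```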