Massively scalable near duplicate detection in streams of documents using MDSH | IEEE Conference Publication | IEEE Xplore