Chrome Extension
WeChat Mini Program
Use on ChatGLM
AI Reads Science
Chat
Search
ChatPaper

57,339,259 Researchers
310,206,138 Publications
8,932,894 Concepts
2,216,761,963 Citations
Follow
Explore
Report
Trend
Enter keywords and let AI filter and summarize the latest papers.
The following are popular content recommendations; they become more accurate after you add subscriptions.
Topic
Hardware-Aligned and Natively Trainable Sparse Attention
The latest paper from DeepSeek introduces NSA, a hardware-aligned, natively trainable sparse attention mechanism for ultra-fast long-context training and inference (a rough sketch of the idea follows the entry below).
YiFan Zhang, Shanglin Lei, Runqi Qiao, Zhuoma GongQue, Xiaoshuai Song, Guanting Dong, Qiuna Tan, Zhe Wei, Peiqing Yang, Ye Tian, Yadong Xue, Xiaofei Wang, et al.
CoRR (2024)
Cited 0 · Views 7,775 · Rating 4.5
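As a rough orientation only, the sketch below illustrates the general idea of a natively trainable sparse attention layer: cheap attention branches (here a block-compressed branch and a sliding-window branch) are mixed by learned per-token gates, so the sparsity pattern is trained end to end. The branch choices, the class name `GatedSparseAttention`, and all hyperparameters are illustrative assumptions, not DeepSeek's NSA design; causal masking is also omitted for brevity.

```python
# Hedged, highly simplified sketch of a gated sparse attention layer.
# Not DeepSeek's NSA implementation; branches and names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSparseAttention(nn.Module):
    def __init__(self, dim, block_size=64, window=128):
        super().__init__()
        self.dim, self.block_size, self.window = dim, block_size, window
        self.gate = nn.Linear(dim, 2)   # learned weights over the two branches

    def _attend(self, q, k, v, mask=None):
        scores = q @ k.transpose(-2, -1) / self.dim ** 0.5
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    def forward(self, q, k, v):
        # q, k, v: [seq_len, dim]; assumes seq_len is a multiple of block_size.
        seq_len = q.shape[0]
        assert seq_len % self.block_size == 0

        # Branch 1: attention over block-compressed (mean-pooled) keys/values.
        k_c = k.view(-1, self.block_size, self.dim).mean(dim=1)
        v_c = v.view(-1, self.block_size, self.dim).mean(dim=1)
        coarse = self._attend(q, k_c, v_c)

        # Branch 2: local sliding-window attention.
        pos = torch.arange(seq_len)
        local = self._attend(q, k, v, (pos[:, None] - pos[None, :]).abs() < self.window)

        # Per-token learned gates decide how much each branch contributes.
        g = torch.softmax(self.gate(q), dim=-1)              # [seq_len, 2]
        return g[:, :1] * coarse + g[:, 1:] * local
```

A call such as `GatedSparseAttention(64)(torch.randn(256, 64), torch.randn(256, 64), torch.randn(256, 64))` runs as written; NSA itself adds a fine-grained token-selection branch and hardware-aligned kernels that are beyond this sketch.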
Computing Research Repository (2024) · Cited 5 · Views 1,243
Topic
Mixture of Block Attention for Long-Context LLMs
Kimi proposes a new attention mechanism, MoBA, which borrows the principles of MoE to improve the efficiency of LLMs in long-context scenarios without sacrificing performance (see the sketch below).
Minghao Xu, Lichuan Xiang, Xu Cai, Hongkai Wen
CoRR (2024)
Cited 2 · Views 1,319
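For intuition, here is a minimal, hedged sketch of MoE-style block routing for attention: each query is routed to the few key/value blocks whose mean-pooled keys score highest against it. The function name, shapes, and routing rule are simplifying assumptions; MoBA's actual gating, causal handling, and kernels differ.

```python
# Hedged sketch of block-routed attention (not the MoBA authors' code).
# Causal masking and the local-window fallback used in practice are omitted.
import torch
import torch.nn.functional as F

def moba_attention(q, k, v, block_size=64, top_k=2):
    """q, k, v: [seq_len, dim]. Each query attends to at most top_k blocks."""
    seq_len, dim = k.shape
    n_blocks = (seq_len + block_size - 1) // block_size

    # Mean-pool the keys inside each block to get a cheap block descriptor.
    pad = n_blocks * block_size - seq_len
    k_pad = F.pad(k, (0, 0, 0, pad))
    block_keys = k_pad.view(n_blocks, block_size, dim).mean(dim=1)   # [n_blocks, dim]

    # Route each query to its top_k blocks by query-block affinity.
    affinity = q @ block_keys.T                                      # [seq_len, n_blocks]
    top_blocks = affinity.topk(min(top_k, n_blocks), dim=-1).indices

    out = torch.zeros_like(q)
    for i in range(seq_len):
        # Gather the token indices belonging to the blocks selected for query i.
        idx = torch.cat([
            torch.arange(b * block_size, min((b + 1) * block_size, seq_len))
            for b in top_blocks[i].tolist()
        ])
        attn = F.softmax((q[i] @ k[idx].T) / dim ** 0.5, dim=-1)
        out[i] = attn @ v[idx]
    return out
```

For example, `moba_attention(torch.randn(256, 64), torch.randn(256, 64), torch.randn(256, 64))` lets each query read at most 2 x 64 of the 256 cached positions.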
Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, et al.
CoRR (2024)
Cited 53 · Views 888
Frank F. Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z. Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, Mingyang Yang, Hao Yang Lu, et al.
Computing Research Repository (2024)
Cited 17 · Views 739
Hot
Top 100 papers viewed in the last 7 days
Top 100 papers viewed in the last 30 days
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr
Generative large language models (LLMs) have opened up numerous novel possibilities, but, due to their significant computational requirements, their ubiquitous use remains challenging. Some of the most useful applications require processing large numbers of samples at a time and using long contexts, both of which significantly increase the memory communication load of the models. We introduce SparQ Attention, a technique for increasing the inference throughput of LLMs by reducing the memory bandwidth requirements within the attention blocks through selective fetching of the cached history. Our proposed technique can be applied directly to off-the-shelf LLMs during inference, without requiring any modification to the pre-training setup or additional fine-tuning. We show how SparQ Attention can decrease the attention memory bandwidth requirements by up to eight times without any loss in accuracy by evaluating Llama 2 and Pythia models on a wide range of downstream tasks.
CoRR (2023)
Cited 47 · Views 9,728
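To make the "selective fetching" idea in the abstract above concrete, here is a hedged single-step sketch: approximate scores are computed from only the largest-magnitude query components, and full keys/values are fetched only for the highest-scoring positions. The function name, the choices of `r` and `top_k`, and the omission of the paper's corrective terms are assumptions for illustration, not the reference SparQ implementation.

```python
# Hedged sketch of SparQ-style selective KV fetching (simplified).
import torch

def sparq_attention_step(q, k_cache, v_cache, r=16, top_k=64):
    """Single-token decode step. q: [dim]; k_cache, v_cache: [seq_len, dim].
    Only top_k cache entries are fetched for the full attention computation."""
    seq_len, dim = k_cache.shape

    # 1) Approximate scores using only the r largest-magnitude query components,
    #    so only r columns of the key cache need to be read.
    comp = q.abs().topk(min(r, dim)).indices
    approx_scores = q[comp] @ k_cache[:, comp].T          # [seq_len]

    # 2) Fetch full keys/values only for the top_k highest approximate scores.
    idx = approx_scores.topk(min(top_k, seq_len)).indices
    k_sel, v_sel = k_cache[idx], v_cache[idx]

    # 3) Exact attention over the fetched subset.
    attn = torch.softmax((q @ k_sel.T) / dim ** 0.5, dim=-1)
    return attn @ v_sel
```

In a decode loop this replaces reading the entire KV cache with reading r columns of K plus top_k full rows of K and V, which is where the bandwidth saving comes from.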
Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell (Top Scholar)
Scaling up the size of vision models has been the de facto standard to obtain more powerful visual representations. In this work, we discuss the point beyond which larger vision models are not necessary. First, we demonstrate the power of Scaling on Scales (S^2), whereby a pre-trained and frozen smaller vision model (e.g., ViT-B or ViT-L), run over multiple image scales, can outperform larger models (e.g., ViT-H or ViT-G) on classification, segmentation, depth estimation, Multimodal LLM (MLLM) benchmarks, and robotic manipulation. Notably, S^2 achieves state-of-the-art performance in detailed understanding of MLLM on the V* benchmark, surpassing models such as GPT-4V. We examine the conditions under which S^2 is a preferred scaling approach compared to scaling on model size. While larger models have the advantage of better generalization on hard examples, we show that features of larger vision models can be well approximated by those of multi-scale smaller models. This suggests most, if not all, of the representations learned by current large pre-trained models can also be obtained from multi-scale smaller models. Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S^2 can match or even exceed the advantage of larger models. We release a Python package that can apply S^2 on any vision model with one line of code: https://github.com/bfshi/scaling_on_scales.
arXiv (2024)
Cited 49 · Views 8,431
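The sketch below is a hedged rendition of the multi-scale idea described above: one frozen backbone is run on the base image and on base-sized tiles of an upscaled copy, and the resulting features are concatenated channel-wise. The tiling and pooling details and the `backbone` interface (any callable mapping `[B, 3, 224, 224]` to `[B, C]`) are simplifying assumptions; the released package at https://github.com/bfshi/scaling_on_scales operates on spatial feature maps.

```python
# Hedged sketch of multi-scale (S^2-style) feature extraction with one frozen backbone.
import torch
import torch.nn.functional as F

def multiscale_features(backbone, images, scales=(1, 2), base=224):
    """images: [B, 3, H, W]. Returns per-scale features concatenated along channels."""
    feats = []
    for s in scales:
        size = base * s
        x = F.interpolate(images, size=(size, size), mode="bilinear", align_corners=False)
        if s == 1:
            feats.append(backbone(x))                                  # [B, C]
        else:
            # Tile the upscaled image into s*s base-sized crops, encode each crop
            # with the same frozen backbone, then average-pool the crop features.
            crops = x.unfold(2, base, base).unfold(3, base, base)      # [B, 3, s, s, base, base]
            crops = crops.permute(0, 2, 3, 1, 4, 5).reshape(-1, 3, base, base)
            f = backbone(crops).reshape(images.shape[0], s * s, -1).mean(dim=1)
            feats.append(f)
    return torch.cat(feats, dim=-1)                                    # [B, C * len(scales)]
```

With a frozen ViT-style encoder as `backbone`, `multiscale_features(backbone, images, scales=(1, 2))` yields a feature twice as wide as the single-scale output, which is then fed to the downstream head.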
Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang
In this work we systematically review the recent advancements in code processing with language models, covering 50+ models, 30+ evaluation tasks, and 500 related works. We break down code processing models into general language models, represented by the GPT family, and specialized models that are specifically pretrained on code, often with tailored objectives. We discuss the relations and differences between these models, and highlight the historical transition of code modeling from statistical models and RNNs to pretrained Transformers and LLMs, which mirrors the course taken by NLP. We also discuss code-specific features such as ASTs, CFGs, and unit tests, along with their application in training code language models, and identify key challenges and potential future directions in this domain. We keep the survey open and updated in a GitHub repository at https://github.com/codefuse-ai/Awesome-Code-LLM.
CoRR (2023)
Cited 62 · Views 18,516
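As a tiny, self-contained illustration of one of the code-specific features the survey discusses, the snippet below extracts an abstract syntax tree with Python's built-in `ast` module and flattens it into node types; how such structure is actually consumed by a code LLM varies by model and is not shown here.

```python
# Minimal illustration of one code-specific feature: AST extraction with the
# standard-library ast module, flattened into a sequence of node type names.
import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

# Breadth-first walk over the tree, keeping only the node type names.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)  # ['Module', 'FunctionDef', 'arguments', 'Return', 'arg', 'arg', 'BinOp', ...]
```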
Popular Recommendations
Most Viewed Papers & Topics
This paper introduces SparQ Attention, a technique that significantly reduces the memory bandwidth requirements of generative large language models during inference, thereby improving inference throughput.
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley, Charlie Blake, Carlo Luschi, Douglas Orr
CoRR (2023)
Cited 0 · Views 9,719 · Rating 3.5
Scaling up vision models has become the standard route to more powerful visual representations, but is bigger always better? This paper examines the point beyond which larger vision models are no longer necessary.
Baifeng Shi, Ziyang Wu, Maolin Mao, Xin Wang, Trevor Darrell (Top Scholar)
arXiv (2024)
Cited 0 · Views 8,428 · Rating 5.0
Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, Rui Wang
CoRR (2023)
Cited 4 · Views 18,502 · Rating 4.5
Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su (Top Scholar)
CVPR 2024 (2023)
Cited 41 · Views 7,003 · Rating 4.3
CoRR (2023) · Cited 15 · Views 4,743 · Rating 4.0
Hongxuan Zhang, Zhining Liu, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen (Top Scholar)
CoRR (2023)
Cited 0 · Views 3,378 · Rating 3.5

© 2005-2025 AMiner