3
LISA: Layerwise Importance Sampling for Memory-Efficient LLM Fine-Tuning (arxiv.org)
4 weeks ago | convexstrictly | arxiv.org | newest
2
NTIA AI Open Model Weights RFC (regulations.gov)
4 weeks ago | convexstrictly | regulations.gov | newest
1
Mechanics of Next Token Prediction with Self-Attention (arxiv.org)
a month ago | convexstrictly | arxiv.org | newest
1
Dive Deeper into Yi-9B (huggingface.co)
a month ago | convexstrictly | huggingface.co | newest
3
You can now train a 70B language model at home (answer.ai)
a month ago | convexstrictly | answer.ai | newest
1
Shape Suffixes – Good Coding Style (medium.com/noamshazeer)
a month ago | convexstrictly | medium.com | newest
3
Star Trek prompt optimal for grade school math on Llama-70B (twitter.com/emollick)
a month ago | convexstrictly | twitter.com | newest
1
(US Dept of Commerce) NTIA Solicits Comments on Open-Weight AI Models (commerce.gov)
2 months ago | convexstrictly | commerce.gov | newest
4
BitDelta: Your Fine-Tune May Only Be Worth One Bit (arxiv.org)
2 months ago | convexstrictly | arxiv.org | newest
37
Time is encoded in the weights of finetuned language models (arxiv.org)
4 months ago | convexstrictly | arxiv.org | best
2
Zoology 1: Measuring and Improving Recall in Efficient Language Models (stanford.edu)
4 months ago | convexstrictly | stanford.edu | newest
2
TinyGSM: Achieving >80% on GSM8k with small language models (arxiv.org)
4 months ago | convexstrictly | arxiv.org | newest
2
Androids built to meet the labor demands (1x.tech)
4 months ago | convexstrictly | 1x.tech | newest
6
Sam Altman will likely start another company with researchers leaving OpenAI (twitter.com/emilychangtv)
5 months ago | convexstrictly | twitter.com | newest
492
Three senior researchers have resigned from OpenAI
5 months ago | convexstrictly | ycombinator.com | best
1
Ron Conway disapproves of Sam Altman's firing (twitter.com/ronconway)
5 months ago | convexstrictly | twitter.com | newest
111
Sutskever: OpenAI board doing its mission to build AGI that benefits all (twitter.com/garymarcus)
5 months ago | convexstrictly | twitter.com | best
3
Kara Swisher: OpenAI dev day and store were "pushing too fast" (twitter.com/karaswisher)
5 months ago | convexstrictly | twitter.com | newest
1
GPT4 coding regression claims misleading (twitter.com/si_boehm)
9 months ago | convexstrictly | twitter.com | frontpage
3
Model 4 bit inference 4.2x faster than 16 bit with full HF support (twitter.com/tim_dettmers)
9 months ago | convexstrictly | twitter.com | newest
3
SqueezeLLM: Dense-and-Sparse Quantization (arxiv.org)
10 months ago | convexstrictly | arxiv.org | newest
2
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model (arxiv.org)
10 months ago | convexstrictly | arxiv.org | newest
2
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 (arxiv.org)
10 months ago | convexstrictly | arxiv.org | newest
4
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression (arxiv.org)
10 months ago | convexstrictly | arxiv.org | newest
2
Azure GPT 3.5 completion endpoint bumps HumanEval from <50% to 74% (twitter.com/amanrsanger)
10 months ago | convexstrictly | twitter.com | newest
76
Falcon 40B LLM (which beats Llama) now Apache 2.0 (twitter.com/thom_wolf)
11 months ago | convexstrictly | twitter.com | best
3
Tim Dettmers: QLoRA finetunes a 65B model on a single 48 GB GPU (twitter.com/tim_dettmers)
11 months ago | convexstrictly | twitter.com | newest