gpjt - Hacker News

46

10Gb/s Ethernet: what I did to get it working in my home (gilesthomas.com)

10 hours ago gpjt gilesthomas.com

1

10Gb Ethernet: what I had to (re)learn (gilesthomas.com)

2 days ago gpjt gilesthomas.com

3

LLM from scratch, part 33 – what I learned from the appendices (gilesthomas.com)

a week ago gpjt gilesthomas.com

1

LLM from scratch (32l) – Interventions: updated instruction fine-tuning results (gilesthomas.com)

a week ago gpjt gilesthomas.com

1

How an LLM becomes more coherent as we train it (gilesthomas.com)

a week ago gpjt gilesthomas.com

2

LLM from scratch, part 32k – Interventions: gradient accumulation (gilesthomas.com)

2 weeks ago gpjt gilesthomas.com

2

Provision: LLM-powered server setup from Markdown (provision.sh)

2 weeks ago gpjt provision.sh

2

LLM from scratch, part 32j – trying to train a better model in the cloud (gilesthomas.com)

3 weeks ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 32i – Interventions: what is in the noise? (gilesthomas.com)

3 weeks ago gpjt gilesthomas.com

2

Writing an LLM from scratch, part 32h – Interventions: full fat float32 (gilesthomas.com)

3 weeks ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 32g – Interventions: weight tying (gilesthomas.com)

a month ago gpjt gilesthomas.com

5

Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com)

a month ago gpjt gilesthomas.com

3

Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com)

a month ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com)

3 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com)

3 months ago gpjt gilesthomas.com

2

Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com)

3 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com)

3 months ago gpjt gilesthomas.com

1

Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com)

4 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com)

4 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com)

4 months ago gpjt gilesthomas.com

2

LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com)

4 months ago gpjt gilesthomas.com

46

LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com)

5 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com)

6 months ago gpjt gilesthomas.com

3

Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com)

6 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com)

7 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 24 – the transcript hack (gilesthomas.com)

7 months ago gpjt gilesthomas.com

2

Retro Language Models: Rebuilding Karpathy's RNN in PyTorch (gilesthomas.com)

7 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 23 – fine-tuning for classification (gilesthomas.com)

7 months ago gpjt gilesthomas.com

60

Writing an LLM from scratch, part 22 – training our LLM (gilesthomas.com)

7 months ago gpjt gilesthomas.com

2

Revisiting Karpathy's 'Unreasonable Effectiveness of Recurrent Neural Networks' (gilesthomas.com)

7 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 21 – perplexed by perplexity (gilesthomas.com)

7 months ago gpjt gilesthomas.com

7

Writing an LLM from scratch, part 20 – starting training, and cross entropy loss (gilesthomas.com)

7 months ago gpjt gilesthomas.com

2

How Do LLMs Work? (gilesthomas.com)

8 months ago gpjt gilesthomas.com

63

The maths you need to start understanding LLMs (gilesthomas.com)

8 months ago gpjt gilesthomas.com

2

What AI chatbots are doing under the hood (gilesthomas.com)

9 months ago gpjt gilesthomas.com

1

LLM from scratch, part 18 – residuals, shortcut connections, and the Talmud (gilesthomas.com)

9 months ago gpjt gilesthomas.com

1

The fixed length bottleneck and the feed forward network (gilesthomas.com)

9 months ago gpjt gilesthomas.com

3

Writing an LLM from scratch, part 17 – the feed-forward network (gilesthomas.com)

9 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 16 – layer normalisation (gilesthomas.com)

10 months ago gpjt gilesthomas.com

1

Leaving PythonAnywhere (gilesthomas.com)

11 months ago gpjt gilesthomas.com

2

Writing an LLM from scratch, part 15 – from context vectors to logits (gilesthomas.com)

12 months ago gpjt gilesthomas.com

1

Writing an LLM from scratch, part 14 – the complexity of self-attention at scale (gilesthomas.com)

12 months ago gpjt gilesthomas.com

41