Exploration vs Exploitation in Tuning Playbook - Need Help Understanding the Process [D]

[edited] I'm reading through the "Tuning Playbook" and I'm having some trouble understanding the concept of exploration vs. exploitation in the context of hyperparameter tuning. Can anyone explain this concept in a more concrete manner rather than in the abstract, or maybe provide an example of how exploration is conducted in hyperparameter tuning?...
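As a concrete illustration of the distinction the post asks about: in tuning, "exploration" is usually a broad (e.g. log-uniform or quasi-random) search to learn the shape of the landscape, while "exploitation" narrows the search around the best configurations found so far. A minimal Python sketch, where `train_and_eval` is a toy stand-in for a real training run (not from the playbook):

```python
import math
import random

def train_and_eval(lr):
    # Toy stand-in for a real training run: a noisy validation score
    # that peaks near lr = 3e-4 (purely illustrative).
    return -abs(math.log10(lr) - math.log10(3e-4)) + random.gauss(0, 0.1)

# Exploration: sample the learning rate broadly on a log scale
# to learn the rough shape of the search space.
results = {}
for _ in range(20):
    lr = 10 ** random.uniform(-6, -1)
    results[lr] = train_and_eval(lr)

best_lr = max(results, key=results.get)

# Exploitation: sample narrowly around the best point found so far.
for _ in range(10):
    lr = best_lr * 10 ** random.uniform(-0.3, 0.3)
    results[lr] = train_and_eval(lr)

print(f"best lr: {max(results, key=results.get):.2e}")
```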

Sat Jun 1, 2024 06:59
[D] Can research in other areas, such as the recent mapping of a cubic millimeter of human brain tissue, help the Machine Learning field?

https://www.scientificamerican.com/article/a-cubic-millimeter-of-a-human-brain-has-been-mapped-in-spectacular-detail/ Can research surrounding the human brain, such as this latest map of human brain tissue, provide some insight for the Machine Learning field and help build more efficient AI models and algorithms? Layperson here. submitted by...

Sat Jun 1, 2024 04:00
[D] Cheaper way to do model inference?

Does anyone know of any solutions for saving GPU compute during server downtime? I'm currently doing model inference, and most of the time I'm just paying for compute without serving any user requests. submitted by /u/Fun_Win_6054
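One common answer to this kind of question is a scale-to-zero setup: stop (not just idle) the GPU instance when no requests have arrived for a while, and accept a cold start when traffic returns. A minimal watchdog sketch; the shutdown command is an assumption you would replace with your provider's stop/delete call:

```python
import subprocess
import threading
import time

IDLE_LIMIT_S = 15 * 60           # stop the instance after 15 idle minutes
_last_request = time.monotonic()

def note_request():
    """Call from the inference handler on every incoming request."""
    global _last_request
    _last_request = time.monotonic()

def _idle_watchdog():
    while True:
        time.sleep(60)
        if time.monotonic() - _last_request > IDLE_LIMIT_S:
            # Assumption: stopping the VM ends GPU billing on your provider;
            # swap this for the provider's SDK/CLI stop call if needed.
            subprocess.run(["sudo", "shutdown", "-h", "now"])

threading.Thread(target=_idle_watchdog, daemon=True).start()
```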

Sat Jun 1, 2024 04:00
[D] Need help using a dedicated GPU in a VS Code Jupyter notebook.

Hey, I'm currently working in both Colab and the VS Code Jupyter extension. Since I have an Nvidia card, I want to use it for training all kinds of models (deep and simple) from Jupyter notebooks in VS Code. How do I set this up? To put it simply: I want to use the dedicated GPU for the VS Code Jupyter notebook. submitted by /u/Saheenus...
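For reference, once the VS Code notebook kernel points at a Python environment with a CUDA-enabled build of the framework (selected via the kernel picker), a quick cell like this confirms the dedicated GPU is visible. PyTorch is an assumption here, since the post doesn't name a framework:

```python
import torch

# Run in a notebook cell to confirm the kernel sees the dedicated GPU.
print(torch.cuda.is_available())            # True once drivers + CUDA build match
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # name of the Nvidia card

# Standard device-selection pattern: move the model and inputs explicitly.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 1).to(device)
x = torch.randn(4, 10, device=device)
print(model(x).device)                      # should print cuda:0
```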

Sat Jun 1, 2024 04:00
[D] Bigram tokenizers better than the status quo? Especially for multilingual

[My first post here; is there a better place for this, such as an LLM subreddit? It was removed until I added [D]; is [R] better for my sort of research?] The tokenizer (tiktoken) for gpt-4o has seemingly added tokens (at least for Icelandic) since gpt-4, and I believe it's going in the opposite direction from where we should be going. I don't think it's sustainable, nor needed,...
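The token-count comparison the post alludes to is easy to reproduce with tiktoken (recent versions map gpt-4o to the o200k_base encoding); the Icelandic sample sentence below is illustrative, not from the post:

```python
import tiktoken  # pip install tiktoken

text = "Ég tala smá íslensku."  # illustrative Icelandic sample

# Compare how the gpt-4 and gpt-4o tokenizers split the same text.
for model in ("gpt-4", "gpt-4o"):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    print(f"{model} ({enc.name}): {len(tokens)} tokens -> {tokens}")
```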

Sat Jun 1, 2024 04:00
[D] Is sequence packing common for training transformers?

Hi all, I want to train a small transformer language model from scratch, and I'm trying to squeeze out as much training efficiency as reasonably possible. I was thinking about how to build the training batches, which brought me to this paper: "Efficient Sequence Packing Without Cross-Contamination: Accelerating Large Language Models Without Impacting Performance"...
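For readers unfamiliar with the technique: packing concatenates several short sequences into one fixed-length training example instead of padding each one separately, and the paper's "cross-contamination" point is that attention must then be masked per sequence. A minimal greedy (first-fit) sketch, assuming every sequence already fits within `max_len`:

```python
def pack_sequences(seqs, max_len, pad_id=0):
    """Greedy first-fit packing of tokenized sequences into fixed-length bins.

    Returns (tokens, seq_ids) pairs; seq_ids lets you build a block-diagonal
    attention mask so sequences in the same bin cannot attend to each other.
    """
    bins = []  # each bin: (token list, per-token sequence id list)
    for seq_id, seq in enumerate(seqs):
        for tokens, ids in bins:
            if len(tokens) + len(seq) <= max_len:
                tokens.extend(seq)
                ids.extend([seq_id] * len(seq))
                break
        else:
            bins.append((list(seq), [seq_id] * len(seq)))
    # Pad to max_len; id -1 marks padding for the attention mask.
    return [(t + [pad_id] * (max_len - len(t)),
             i + [-1] * (max_len - len(i))) for t, i in bins]

for tokens, ids in pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9]], max_len=6):
    print(tokens, ids)
```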

Sat Jun 1, 2024 04:00
