Vendor-recommended LLM parameter quick reference

2025Q2.

I've been kicking the tires on various LLMs lately, and like many people I've been struck by the pace of new releases, especially of models with weights distributed under open licenses, invariably accompanied by impressive benchmark results. I don't have local GPUs, so trying out different models necessarily means using an external host. There are various configuration parameters you can set when sending a query that affect generation, and many vendors document recommended settings on the model card or in associated documentation. For my own purposes I wanted to collect these together in one place, and also to confirm in which cases common serving software like vLLM will use defaults provided alongside the model.

Main conclusions

Overview of parameters

The parameters supported by vLLM are documented here, though not all of them are exposed in the HTTP APIs provided by different vendors. For instance, the subset of parameters supported for models hosted on Parasail (an inference API provider I've been trying out recently) is documented here. I cover just that subset below:
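(For orientation, this is roughly what setting these parameters looks like in practice when talking to an OpenAI-compatible endpoint such as a vLLM server. It's only a sketch: the base URL, model name, and values are placeholders rather than recommendations, and parameters like top_k, min_p, and repetition_penalty aren't part of the official OpenAI API, so they're passed via extra_body, which vLLM accepts.)

```python
# Sketch only: query an OpenAI-compatible endpoint with explicit sampling parameters.
# The base_url, api_key, model name, and values below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://example.invalid/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="some-org/some-model",
    messages=[{"role": "user", "content": "Write a haiku about sampling parameters."}],
    temperature=0.7,   # standard OpenAI-compatible parameters
    top_p=0.9,
    extra_body={       # non-standard parameters, passed through in the request body
        "top_k": 20,
        "min_p": 0.0,
        "repetition_penalty": 1.0,
    },
)
print(response.choices[0].message.content)
```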

Default vLLM behaviour

The above settings are typically exposed via the API, but what if you don't explicitly set them? vLLM documents that by default it will apply settings from the generation_config.json distributed with the model on HuggingFace, if it exists (overriding vLLM's own defaults), but you can ignore generation_config.json and use vLLM's own defaults by passing --generation-config vllm when launching the server. This behaviour was introduced in a PR that landed in early March this year. We'll explore below which models actually ship a generation_config.json containing their recommended settings, but what about parameters not set in that file, or cases where the file isn't present at all? As far as I can see, that's where _DEFAULT_SAMPLING_PARAMS comes in, giving temperature=1.0 and values of repetition_penalty, top_p, top_k and min_p that have no effect on the sampler.
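If you want to check this for a given model yourself, something like the following works: a sketch using the huggingface_hub client (the repo ID is just an example, and the set of keys inspected is my own choice) to show which sampling-related settings a model's generation_config.json actually sets, if the file exists at all.

```python
# Sketch: report the sampling-related keys set in a model's generation_config.json.
# The repo ID below is only an example; a model without the file returns {}.
import json
from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

def recommended_sampling(repo_id: str) -> dict:
    try:
        path = hf_hub_download(repo_id, "generation_config.json")
    except EntryNotFoundError:
        return {}  # no generation_config.json: vLLM falls back to its own defaults
    with open(path) as f:
        cfg = json.load(f)
    keys = ("temperature", "top_p", "top_k", "min_p", "repetition_penalty")
    return {k: cfg[k] for k in keys if k in cfg}

print(recommended_sampling("Qwen/Qwen3-32B"))  # output depends on the model
```

Anything missing from the returned dict is exactly what falls back to vLLM's _DEFAULT_SAMPLING_PARAMS described above.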

Although Parasail appear to use vLLM to serve most (all?) of their hosted models, it's not clear whether they run it with a configuration that allows defaults to be taken from generation_config.json. I'll update this post if that is clarified.

As all of these models are distributed with benchmark results front and center, it should be easy to at least find the settings used for those results, even if there's no explicit recommendation on which parameters to use - right? Let's find out. I've decided to step through the models grouped by their level of openness.

Open weight and open dataset models

Open weight models

Weight available (non-open) models

model.yaml

As it happens, while writing this blog post I saw that Simon Willison had blogged about model.yaml, an initiative from the LM Studio folks to provide a definition of a model and its sources that can be used with multiple local inference tools. This includes the ability to specify preset options for the model. It doesn't appear to be used by anyone else though, and looking at the LM Studio model catalog, taking qwen/qwen3-32b as an example: although the Qwen3 series has very strongly recommended default settings, its model.yaml only sets top_k and min_p, leaving temperature and top_p unset.


Article changelog