This repository contains minimal code to run Mistral models.
Blog 7B: https://mistral.ai/news/announcing-mistral-7b/
Blog 8x7B: https://mistral.ai/news/mixtral-of-experts/
Blog 8x22B: https://mistral.ai/news/mixtral-8x22b/
Blog Codestral 22B: https://mistral.ai/news/codestral
Blog Codestral Mamba 7B: https://mistral.ai/news/codestral-mamba/
Blog Mathstral 7B: https://mistral.ai/news/mathstral/
Blog Nemo: https://mistral.ai/news/mistral-nemo/
Blog Mistral Large 2: https://mistral.ai/news/mistral-large-2407/
Blog Pixtral 12B: https://mistral.ai/news/pixtral-12b/
Blog Mistral Small 3.1: https://mistral.ai/news/mistral-small-3-1/
Discord: https://discord.com/invite/mistralai
Documentation: https://docs.mistral.ai/
Guardrailing: https://docs.mistral.ai/usage/guardrailing
Note: You will need a GPU to install mistral-inference, as it currently requires xformers to be installed, and xformers itself needs a GPU for installation.
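If you are unsure whether your machine has a usable GPU, you can check before installing (a minimal sketch, assuming PyTorch is already installed in your environment):
import torch
# xformers needs a CUDA-capable GPU at install time, so make sure one is visible.
print("CUDA available:", torch.cuda.is_available())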
Install with pip:
pip install mistral-inference
Or install from source:
cd $HOME && git clone https://github.com/mistralai/mistral-inference
cd $HOME/mistral-inference && poetry install
Note:
- Important:
  - mixtral-8x22B-Instruct-v0.3.tar is exactly the same as Mixtral-8x22B-Instruct-v0.1, only stored in .safetensors format
  - mixtral-8x22B-v0.3.tar is the same as Mixtral-8x22B-v0.1, but has an extended vocabulary of 32768 tokens.
  - codestral-22B-v0.1.tar has a custom non-commercial license, called Mistral AI Non-Production (MNPL) License
  - mistral-large-instruct-2407.tar has a custom non-commercial license, called Mistral AI Research (MRL) License
- All of the listed models above support function calling. For example, Mistral 7B Base/Instruct v3 is a minor update to Mistral 7B Base/Instruct v2, with the addition of function calling capabilities.
- The "coming soon" models will include function calling as well.
- You can download the previous versions of our models from our docs.
Name | ID | URL |
---|---|---|
Pixtral Large Instruct | mistralai/Pixtral-Large-Instruct-2411 | https://huggingface.co/mistralai/Pixtral-Large-Instruct-2411 |
Pixtral 12B Base | mistralai/Pixtral-12B-Base-2409 | https://huggingface.co/mistralai/Pixtral-12B-Base-2409 |
Pixtral 12B | mistralai/Pixtral-12B-2409 | https://huggingface.co/mistralai/Pixtral-12B-2409 |
Mistral Small 3.1 24B Base | mistralai/Mistral-Small-3.1-24B-Base-2503 | https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503 |
Mistral Small 3.1 24B Instruct | mistralai/Mistral-Small-3.1-24B-Instruct-2503 | https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503 |
News!!!: Mistral Large 2 is out. Read more about its capabilities in the Mistral Large 2 blog post linked above.
Create a local folder to store models
export MISTRAL_MODEL=$HOME/mistral_models
mkdir -p $MISTRAL_MODEL
Download any of the above links and extract the content, e.g.:
export M12B_DIR=$MISTRAL_MODEL/12B_Nemo
wget https://models.mistralcdn.com/mistral-nemo-2407/mistral-nemo-instruct-2407.tar
mkdir -p $M12B_DIR
tar -xf mistral-nemo-instruct-2407.tar -C $M12B_DIR
or
export M8x7B_DIR=$MISTRAL_MODEL/8x7b_instruct
wget https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar
mkdir -p $M8x7B_DIR
tar -xf Mixtral-8x7B-v0.1-Instruct.tar -C $M8x7B_DIR
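After extraction, the model folder should contain the weights, the model config and a tokenizer file. A quick way to sanity-check the contents (a minimal sketch; the exact tokenizer filename, e.g. tekken.json or tokenizer.model.v3, depends on the model):
from pathlib import Path
model_dir = Path.home() / "mistral_models" / "12B_Nemo"  # change to the folder you extracted into
for f in sorted(model_dir.iterdir()):
    print(f.name)
# Expect something like: consolidated.safetensors, params.json and a tokenizer file.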
To download model weights from the Hugging Face Hub, here is an example for Mistral Small 3.1 24B Instruct:
from pathlib import Path
from huggingface_hub import snapshot_download
mistral_models_path = Path.home().joinpath("mistral_models")
model_path = mistral_models_path / "mistral-small-3.1-instruct"
model_path.mkdir(parents=True, exist_ok=True)
repo_id = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
snapshot_download(
    repo_id=repo_id,
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=model_path,
)
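Note that Mistral repositories on the Hugging Face Hub may be gated. If the download fails with an authorization error, accept the model's terms on its Hugging Face page and authenticate first (a minimal sketch using the huggingface_hub login helper):
from huggingface_hub import login
# Prompts for a Hugging Face access token; alternatively pass token="hf_..." directly.
login()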
The following sections give an overview of how to run the model from the Command-line interface (CLI) or directly within Python.
- Demo
To test that a model works in your setup, you can run the mistral-demo command.
E.g. the 12B Mistral-Nemo model can be tested on a single GPU as follows:
mistral-demo $M12B_DIR
Large models, such as 8x7B and 8x22B, have to be run in a multi-GPU setup. For these models, you can use the following command:
torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR
Note: Change --nproc-per-node to more GPUs if available.
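If you are unsure how many GPUs are visible, you can check with a short snippet (assuming PyTorch, which the mistral-inference install pulls in):
import torch
# Use this value for --nproc-per-node.
print("Visible GPUs:", torch.cuda.device_count())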
- Chat
To interactively chat with the models, you can make use of the mistral-chat command.
mistral-chat $M12B_DIR --instruct --max_tokens 1024 --temperature 0.35
For large models, you can make use of torchrun.
torchrun --nproc-per-node 2 --no-python mistral-chat $M8x7B_DIR --instruct
Note: Change --nproc-per-node to more GPUs if necessary (e.g. for 8x22B).
- Chat with Codestral
To use Codestral as a coding assistant, you can run the following command using mistral-chat.
Make sure $M22B_CODESTRAL is set to a valid path to the downloaded codestral folder, e.g. $HOME/mistral_models/Codestral-22B-v0.1.
mistral-chat $M22B_CODESTRAL --instruct --max_tokens 256
If you prompt it with "Write me a function that computes fibonacci in Rust", the model should generate something along the following lines:
Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.
fn fibonacci(n: u32) -> u32 {
match n {
0 => 0,
1 => 1,
_ => fibonacci(n - 1) + fibonacci(n - 2),
}
}
fn main() {
let n = 10;
println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}
This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.
You can continue chatting afterwards, e.g. with "Translate it to Python".
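For instance, the follow-up answer might look something along these lines (illustrative only; actual output will vary):
def fibonacci(n: int) -> int:
    # Same naive recursive approach as the Rust version above.
    if n == 0:
        return 0
    if n == 1:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

if __name__ == "__main__":
    n = 10
    print(f"The {n}th Fibonacci number is: {fibonacci(n)}")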
- Chat with Codestral-Mamba
To use Codestral-Mamba as a coding assistant, you can run the following command using mistral-chat.
Make sure $M7B_CODESTRAL_MAMBA is set to a valid path to the downloaded codestral-mamba folder, e.g. $HOME/mistral_models/mamba-codestral-7B-v0.1.
You then need to install the following additional packages:
pip install packaging mamba-ssm causal-conv1d transformers
before you can start chatting:
mistral-chat $M7B_CODESTRAL_MAMBA --instruct --max_tokens 256
- Chat with Mathstral
To use Mathstral as an assistant, you can run the following command using mistral-chat.
Make sure $M7B_MATHSTRAL is set to a valid path to the downloaded mathstral folder, e.g. $HOME/mistral_models/mathstral-7B-v0.1.
mistral-chat $M7B_MATHSTRAL --instruct --max_tokens 256
If you prompt it with "Albert likes to surf every week. Each surfing session lasts for 4 hours and costs $20 per hour. How much would Albert spend in 5 weeks?", the model should answer with the correct calculation.
You can then continue chatting afterwards, e.g. with "How much would he spend in a year?".
- Chat with Mistral Small 3.1 24B Instruct
To use Mistral Small 3.1 24B Instruct as an assistant, you can run the following command using mistral-chat.
Make sure $MISTRAL_SMALL_3_1_INSTRUCT is set to a valid path to the downloaded mistral small folder, e.g. $HOME/mistral_models/mistral-small-3.1-instruct.
mistral-chat $MISTRAL_SMALL_3_1_INSTRUCT --instruct --max_tokens 256
If you prompt it with "The above image presents an image of which park ? Please give the hints to identify the park." with the following image URL https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png, the model should answer with the Yosemite park and give hints to identify it.
You can then continue chatting afterwards, e.g. with "What is the name of the lake in the image?". The model should respond that it is not a lake but a river.
- Instruction Following:
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json") # change to extracted tokenizer file
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1") # change to extracted model dir
prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."
completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
- Multimodal Instruction Following:
from pathlib import Path
from huggingface_hub import snapshot_download
from mistral_common.protocol.instruct.messages import ImageURLChunk, TextChunk
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer
model_path = Path.home().joinpath("mistral_models") / "mistral-small-3.1-instruct" # change to extracted model
tokenizer = MistralTokenizer.from_file(model_path / "tekken.json")
model = Transformer.from_folder(model_path)
url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
prompt = "The above image presents an image of which park ? Please give the hints to identify the park."
user_content = [ImageURLChunk(image_url=url), TextChunk(text=prompt)]
tokens, images = tokenizer.instruct_tokenizer.encode_user_content(user_content, False)
out_tokens, _ = generate(
    [tokens],
    model,
    images=[images],
    max_tokens=256,
    temperature=0.15,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
result = tokenizer.decode(out_tokens[0])
print("Prompt:", prompt)
print("Completion:", result)
- Function Calling:
from mistral_common.protocol.instruct.tool_calls import Function, Tool
completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
    ],
)
tokens = tokenizer.encode_chat_completion(completion_request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)
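The decoded completion contains the model's tool call rather than a natural-language answer. Its exact formatting depends on the tokenizer version; a minimal sketch for extracting the call, assuming the payload is a JSON list (possibly preceded by a [TOOL_CALLS] marker):
import json
raw = result.strip()
# Strip the optional tool-call marker if the tokenizer keeps it in the decoded text.
if raw.startswith("[TOOL_CALLS]"):
    raw = raw[len("[TOOL_CALLS]"):]
for call in json.loads(raw):
    print(call["name"], call["arguments"])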
- Fill-in-the-middle (FIM):
Make sure to have mistral-common >= 1.2.0 installed:
pip install --upgrade mistral-common
You can simulate a code completion in-filling as follows.
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest
tokenizer = MistralTokenizer.from_model("codestral-22b")
model = Transformer.from_folder("./mistral_22b_codestral")
prefix = """def add("""
suffix = """ return sum"""
request = FIMRequest(prompt=prefix, suffix=suffix)
tokens = tokenizer.encode_fim(request).tokens
out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])
middle = result.split(suffix)[0].strip()
print(middle)
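As a quick sanity check, you can stitch the three pieces back together to see the completed snippet (purely illustrative; whitespace may differ slightly since the middle was stripped):
print(prefix + middle + "\n" + suffix)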
To run the logits equivalence tests:
python -m pytest tests
The deploy folder contains code to build a vLLM image with the required dependencies to serve the Mistral AI model. In the image, the transformers library is used instead of the reference implementation. To build it:
docker build deploy --build-arg MAX_JOBS=8
Instructions to run the image can be found in the official documentation.
- Use Mistral models on Mistral AI official API (La Plateforme)
- Use Mistral models via cloud providers