OPT-TPU

This project deploys the OPT-6.7B large language model on the BM1684X. The model is converted to a bmodel with the TPU-MLIR compiler and then deployed with C++ code to a BM1684X PCIe environment or an SoC environment.

Build

Requirements

  • A gcc or clang compiler that supports C++20 (the demo itself only needs C++17, but the ctre dependency requires C++20)
  • The libsophon library (by default under /opt/sophon/libsophon-current; if it is not in the default location, or a specific libsophon version is needed, pass LIBSOPHON_DIR when building as shown below)
  • The converted OPT-6.7B bmodel file, placed in the same directory as the vocab.json and merges.txt cloned from the Hugging Face repository. The easiest way is GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/facebook/opt-6.7b && cp *.bmodel opt-6.7b so that everything ends up together (a quick sanity-check sketch follows this list)
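
As a quick sanity check that the directory is assembled correctly, the tokenizer files can be loaded with the Python transformers package. This is a minimal sketch, assuming the clone above landed in ./opt-6.7b and that transformers is installed; adjust the path to your own layout.

from pathlib import Path
from transformers import AutoTokenizer

model_dir = Path("opt-6.7b")  # assumed location of the cloned Hugging Face repo

# The demo expects vocab.json, merges.txt and at least one *.bmodel side by side.
assert (model_dir / "vocab.json").exists() and (model_dir / "merges.txt").exists()
assert any(model_dir.glob("*.bmodel")), "copy the compiled bmodel(s) into this directory"

# vocab.json and merges.txt are small text files, so GIT_LFS_SKIP_SMUDGE=1 keeps them intact.
tokenizer = AutoTokenizer.from_pretrained(str(model_dir))
print(tokenizer("Hello, world!").input_ids)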

Build

cd build
cmake .. -GNinja -DLIBSOPHON_DIR=...
ninja

Inference

After the build above completes, the executable build/demo/demo is produced; it loads the bmodel and runs inference on a BM1684X device. Run build/demo/demo -h to see what each option means. Pass -c to use your own input; otherwise a default prompt is used for the demonstration.

Model Conversion

Preparation

Modify OPTLearnedPositionalEmbedding in transformers.models.opt.modeling_opt. The original implementation is:

class OPTLearnedPositionalEmbedding(nn.Embedding):
    """
    This module learns positional embeddings up to a fixed maximum size.
    """

    def __init__(self, num_embeddings: int, embedding_dim: int):
        # OPT is set up so that if padding_idx is specified then offset the embedding ids by 2
        # and adjust num_embeddings appropriately. Other models don't have this hack
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
        """`input_ids_shape` is expected to be [bsz x seqlen]."""
        attention_mask = attention_mask.long()

        # create positions depending on attention_mask
        positions = (torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask).long() - 1

        # cut positions if `past_key_values_length` is > 0
        positions = positions[:, past_key_values_length:]

        return super().forward(positions + self.offset)

The modified OPTLearnedPositionalEmbedding looks like this:

class OPTLearnedPositionalEmbedding(nn.Embedding):
    """
    This module learns positional embeddings up to a fixed maximum size.
    """

    def __init__(self, num_embeddings: int, embedding_dim: int):
        # OPT is set up so that if padding_idx is specified then offset the embedding ids by 2
        # and adjust num_embeddings appropriately. Other models don't have this hack
        self.offset = 2
        super().__init__(num_embeddings + self.offset, embedding_dim)

    # def forward(self, attention_mask: torch.LongTensor, past_key_values_length: int = 0):
    #     """`input_ids_shape` is expected to be [bsz x seqlen]."""
    #     attention_mask = attention_mask.long()

    #     # create positions depending on attention_mask
    #     positions = (torch.cumsum(attention_mask, dim=1).type_as(attention_mask) * attention_mask).long() - 1

    #     # cut positions if `past_key_values_length` is > 0
    #     positions = positions[:, past_key_values_length:]

    #     return super().forward(positions + self.offset)
    def forward(self, position_ids):
        return super().forward(position_ids + self.offset)

Here, for convenience, we pass position_ids in manually instead of having the module compute them from the attention_mask; the caller is now responsible for the cumsum-based position computation that was in the original forward.
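
A minimal sketch of what the caller now has to do, mirroring the cumsum logic removed from forward above (build_position_ids is an illustrative helper name, not something defined in this repository):

import torch

def build_position_ids(attention_mask: torch.LongTensor, past_key_values_length: int = 0):
    # Same computation the original forward performed internally:
    # padded positions stay at -1, real tokens get positions 0, 1, 2, ...
    mask = attention_mask.long()
    positions = (torch.cumsum(mask, dim=1).type_as(mask) * mask) - 1
    # Keep only the positions for the tokens fed in this step.
    return positions[:, past_key_values_length:]

# Example: batch of 1 with five real tokens and no padding.
attention_mask = torch.ones(1, 5, dtype=torch.long)
position_ids = build_position_ids(attention_mask)  # tensor([[0, 1, 2, 3, 4]])
# The patched embedding adds self.offset (2) internally before the lookup, e.g.
# pos_embeds = model.model.decoder.embed_positions(position_ids)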

Conversion

python export_to_onnx.py
./compile.sh --mode int8

After a short wait, this exports the int8-quantized bmodel file; for int4 quantization, run ./compile.sh --mode int4 instead.

Because the scripts rely on many relative paths, a reference directory layout is shown below:

(directory layout image)
