## Introduction

## Demo

## Process

## Model Training and Evaluation

If you want to go further and train a model on your own dataset, ModelScope provides a rich set of pre-trained models, along with easy-to-use calling interfaces and a unified configuration file design, allowing you to launch a fine-tuning task with only a dozen or so lines of Python code.
The following uses a simple text classification task as an example to demonstrate how to launch an end-to-end fine-tuning task in a dozen or so lines of code. The overall flow consists of the following steps:

  1. Dataset Loading
  2. Data Preprocessing
  3. Training
  4. Evaluation

#### Dataset Loading

ModelScope provides the standard MsDataset interface for loading data sources from the ModelScope ecosystem. The following example loads the NLP dataset afqmc (Ant Financial Question Matching Corpus).

```python
from modelscope.msdatasets import MsDataset

# Load the training split
train_dataset = MsDataset.load('afqmc_small', split='train')

# Load the evaluation split
eval_dataset = MsDataset.load('afqmc_small', split='validation')
```

For a detailed description of datasets, see [Dataset Introduction](../dataset/introduction.md).
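Each afqmc example is a pair of sentences with a binary similarity label. As a rough illustration only (the field names below follow the public AFQMC/CLUE format and are an assumption here; inspect a loaded sample to confirm the exact schema):

```python
# One afqmc-style example: two sentences and a "0"/"1" similarity label.
# Field names are assumed from the public AFQMC format, not verified here.
example = {
    "sentence1": "花呗如何还款",
    "sentence2": "花呗怎么还钱",
    "label": "1",
}

def is_paraphrase(ex):
    """Return True when the example is labeled as a matching pair."""
    return ex["label"] == "1"

print(is_paraphrase(example))  # True
```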

#### Data Preprocessing

In ModelScope, data preprocessing is tightly coupled to the model. Therefore, once a model is specified, the ModelScope framework automatically reads the preprocessor key from the configuration file on the corresponding model card and instantiates the preprocessor.

```python
# Text classification model on ModelHub
model_id = 'damo/nlp_structbert_sentence-similarity_chinese-tiny'
```

Configuration file:

```json
...
"preprocessor": {
    "type": "sen-cls-tokenizer"
},
...
```

For advanced users, the configuration file can also be customized and read from any local path. For a detailed description of the configuration file, see [Configuration File Explanation](../configuration/explanation.md).
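The general pattern of reading a configuration from a local path and overriding individual fields can be sketched with the standard library alone. This is an illustration of the idea, not ModelScope's actual configuration API; the file name `configuration.json` and the dotted-key override helper are assumptions for the example:

```python
import json
import tempfile
from pathlib import Path

def load_and_override(cfg_path, overrides):
    """Load a JSON configuration file and apply dotted-key overrides,
    e.g. {"preprocessor.type": "sen-cls-tokenizer"}."""
    cfg = json.loads(Path(cfg_path).read_text(encoding="utf-8"))
    for dotted_key, value in overrides.items():
        node = cfg
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return cfg

# Write a minimal config to a local path, then override the preprocessor type.
cfg_path = Path(tempfile.gettempdir()) / "configuration.json"
cfg_path.write_text(json.dumps({"preprocessor": {"type": "bert-tokenizer"}}),
                    encoding="utf-8")
cfg = load_and_override(cfg_path, {"preprocessor.type": "sen-cls-tokenizer"})
print(cfg["preprocessor"]["type"])  # sen-cls-tokenizer
```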

#### Training

Both single-GPU and distributed training are supported. Choose one of the following approaches according to your machine configuration; if you are new to this, the single-GPU approach is recommended.

#### Single GPU

First, configure the parameters required for training:

```python
from modelscope.trainers import build_trainer

# Specify the work directory
tmp_dir = "/tmp"

# Training parameters
kwargs = dict(
    model=model_id,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    work_dir=tmp_dir)
```

Next, instantiate the trainer object from these parameters:

```python
trainer = build_trainer(default_args=kwargs)
```

Finally, call the trainer's training interface:

```python
trainer.train()
```

Congratulations, you have completed model training! 😀

#### Distributed

First, prepare the training script by saving the following code as ./train.py:

```python
import argparse
import os

from modelscope.msdatasets import MsDataset
from modelscope.trainers import build_trainer

parser = argparse.ArgumentParser(description='Train a model')
parser.add_argument('--local_rank', type=int, default=0)
args = parser.parse_args()
if 'LOCAL_RANK' not in os.environ:
    os.environ['LOCAL_RANK'] = str(args.local_rank)

# Model and data, as in the single-GPU example, so the script is self-contained
model_id = 'damo/nlp_structbert_sentence-similarity_chinese-tiny'
train_dataset = MsDataset.load('afqmc_small', split='train')
eval_dataset = MsDataset.load('afqmc_small', split='validation')

# Specify the work directory
tmp_dir = "/tmp"

# Training parameters
kwargs = dict(
    model=model_id,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    work_dir=tmp_dir,
    launcher='pytorch')  # distributed startup
```

Next, instantiate the trainer object from these parameters:

```python
trainer = build_trainer(default_args=kwargs)
```

Finally, call the trainer's training interface:

```python
trainer.train()
```

Then, launch distributed training.

PyTorch:

Single node, multiple GPUs:

```shell
python -m torch.distributed.launch --nproc_per_node=${NUMBER_GPUS} --master_port=${MASTER_PORT} ./train.py
```

- `nproc_per_node`: the number of processes created on the current host (i.e. the number of GPUs used), for example `--nproc_per_node=8`.
- `master_port`: the port number of the master node, for example `--master_port=29527`.

Multiple nodes, multiple GPUs:

Take two nodes as an example.

Node 1:

```shell
python -m torch.distributed.launch --nproc_per_node=${NUMBER_GPUS} --nnodes=2 --node_rank=0 --master_addr=${YOUR_MASTER_IP_ADDRESS} --master_port=${MASTER_PORT} ./train.py
```

Node 2:

```shell
python -m torch.distributed.launch --nproc_per_node=${NUMBER_GPUS} --nnodes=2 --node_rank=1 --master_addr=${YOUR_MASTER_IP_ADDRESS} --master_port=${MASTER_PORT} ./train.py
```

- `nproc_per_node`: the number of processes created on the current host (i.e. the number of GPUs used), for example `--nproc_per_node=8`.
- `nnodes`: the number of nodes.
- `node_rank`: the index of the current node.
- `master_addr`: the IP address of the master node, for example `--master_addr=104.171.200.62`.
- `master_port`: the port number of the master node, for example `--master_port=29527`.

Congratulations, you have completed distributed model training! 😀
#### Evaluation

After training completes, configure the evaluation dataset and call the trainer object's evaluate function to evaluate the model.

```python
# Call trainer.evaluate directly; you can pass in a checkpoint generated
# during the train stage, or pass None to evaluate the current model.
metrics = trainer.evaluate(checkpoint_path=None)
print(metrics)
```
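The evaluate call returns a dictionary mapping metric names to values; the exact keys depend on the task's metric configuration. A small sketch (with made-up checkpoint names, metric key, and scores) of how you might compare such dictionaries across several checkpoints:

```python
# Hypothetical metric dicts, as trainer.evaluate might return for different
# checkpoints; the real metric keys depend on the task configuration.
results = {
    "epoch_1.pth": {"accuracy": 0.71},
    "epoch_2.pth": {"accuracy": 0.74},
    "epoch_3.pth": {"accuracy": 0.73},
}

def best_checkpoint(results, metric="accuracy"):
    """Return the checkpoint name whose metrics maximize the given metric."""
    return max(results, key=lambda ckpt: results[ckpt][metric])

print(best_checkpoint(results))  # epoch_2.pth
```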

ModelScope also supports running evaluation synchronously during training. To enable this, configure an EvaluationHook under train.hooks in the config file, as follows:

```json
{
  ...
  "train": {
    ...
    "hooks": [
      ...
      {
        "type": "EvaluationHook",
        "by_epoch": false,
        "interval": 100
      }
    ]
  }
}
```

You can adjust the configuration file to your actual needs, or register your own hook and reference it in the configuration file through its type field.
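The `by_epoch` and `interval` fields control how often the hook fires. As a rough sketch of these semantics only (not ModelScope's actual hook machinery): with `"by_epoch": false` and `"interval": 100`, evaluation is triggered every 100 training iterations; with `by_epoch` true, every `interval` epochs instead.

```python
def evaluation_steps(total_iters, interval=100, by_epoch=False,
                     iters_per_epoch=None):
    """Return the iteration indices at which an EvaluationHook with the
    given settings would fire: every `interval` iterations, or every
    `interval` epochs (of `iters_per_epoch` iterations) when by_epoch."""
    step = interval * iters_per_epoch if by_epoch else interval
    return [i for i in range(1, total_iters + 1) if i % step == 0]

print(evaluation_steps(350))  # [100, 200, 300]
print(evaluation_steps(400, interval=1, by_epoch=True, iters_per_epoch=200))
# [200, 400]
```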
## Tutorials

Congratulations! At this point you have learned the complete workflow for using a model. If you want to learn more about the platform's features, refer to the corresponding feature modules. The platform also provides tutorials to help you better understand how to apply models. We also welcome you to join our community, contribute your models and ideas, and help build a thriving open-source community! For detailed tutorials, please see:

## Conclusion
