Request for step-by-step SFT (single & multi-task) pipeline guidance for MFTCoder #91

@alvi75

Hello MFTCoder authors 👋,

First, thank you for releasing MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning and for sharing the code. I’m a third-year PhD student trying to reproduce your SFT experiments (both the single-task and multi-task baselines) using the repo.

I’ve successfully set up the environment (conda, CUDA 12.x; multi-GPU available) and explored the repo (e.g., build_model.py, atorch_trainer.py). However, I’m still unclear on the exact SFT pipelines. Could you please clarify or provide a minimal set of example scripts/configs?

What I’m Hoping To Clarify

  1. Data Loading & Formats
    • Expected JSON/JSONL schema per task (fields for input/output, roles, label masking).
    • Where task IDs / TASK2ID are defined and how they map to datasets.

  2. SFT-Single (SFT-S-*)
    • One concrete command (e.g., CodeLlama-13B-Python + QLoRA) to fine-tune on a single task (e.g., text-to-code or code completion).
    • Example config/flags for:
      • optimizer, LR schedule
      • max sequence length
      • gradient accumulation
      • PEFT settings (LoRA/QLoRA)

  3. SFT-Mixed (Multi-task)
    • How to specify multiple datasets in one run (CLI flags vs config file).
    • Task sampling/mixing policy: uniform vs size-based?
    • How to switch between:
      • sample-count weighted loss
      • valid-token weighted loss
    • Any recommendations on per-task batch sizes or temperature scaling.
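To make the sampling question concrete, here is a minimal sketch of what I mean by temperature scaling. It assumes each task is drawn with probability proportional to its dataset size raised to 1/T, so T = 1 is size-based sampling and a large T approaches uniform. The function name and the example sizes are my own and are not taken from the repo.

```python
def task_sampling_probs(sizes, temperature=1.0):
    """Return per-task sampling probabilities p_i proportional to n_i ** (1 / T).

    sizes: dict mapping a task name to its number of training samples.
    temperature: T = 1 gives size-proportional sampling; larger T flattens
    the distribution toward uniform. (Hypothetical helper, not from MFTCoder.)
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not sizes or any(n <= 0 for n in sizes.values()):
        raise ValueError("every task needs a positive sample count")
    weights = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(weights.values())
    return {task: w / total for task, w in weights.items()}


if __name__ == "__main__":
    sizes = {"text_to_code": 900, "code_completion": 100}  # made-up sizes
    for t in (1.0, 2.0, 100.0):
        print(t, task_sampling_probs(sizes, temperature=t))
```

If the repo uses a different mixing rule (for example, concatenating all datasets and shuffling), a pointer to where that happens would be enough.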

  4. Loss Functions
    • Confirmation that SFT experiments used cross-entropy with weighted loss.
    • Whether focal loss or FAMO were excluded in the official SFT baseline results.
    • The exact flag names to enable:
      • weighted by valid tokens
      • weighted by samples
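For reference, this is how I currently understand the two weighting modes, written as a framework-free sketch. My assumption is that each task's loss is normalized within the task (by its valid-token count or by its sample count) and the per-task losses are then averaged. The function and argument names are hypothetical and are not the repo's flags; please correct me if the actual computation differs.

```python
from collections import defaultdict


def task_balanced_loss(token_losses, loss_masks, task_ids, weight_by="token"):
    """Average per-task losses, normalizing inside each task first.

    token_losses: per-sample lists of token-level cross-entropy values.
    loss_masks:   per-sample lists of 0/1 flags (1 = token counts toward loss).
    task_ids:     one task id per sample.
    weight_by:    "token"  -> task loss = sum of valid-token losses / valid tokens
                  "sample" -> task loss = mean of per-sample mean losses
    (Hypothetical helper illustrating my reading, not MFTCoder code.)
    """
    if weight_by not in ("token", "sample"):
        raise ValueError("weight_by must be 'token' or 'sample'")
    numer = defaultdict(float)
    denom = defaultdict(float)
    for losses, mask, task in zip(token_losses, loss_masks, task_ids):
        valid = sum(mask)
        if valid == 0:
            continue  # a fully masked sample contributes nothing
        sample_sum = sum(l * m for l, m in zip(losses, mask))
        if weight_by == "token":
            numer[task] += sample_sum
            denom[task] += valid
        else:
            numer[task] += sample_sum / valid
            denom[task] += 1
    if not numer:
        raise ValueError("no valid tokens in batch")
    per_task = [numer[t] / denom[t] for t in numer]
    return sum(per_task) / len(per_task)


if __name__ == "__main__":
    losses = [[1.0, 1.0, 1.0, 1.0], [3.0, 0.0], [2.0, 2.0]]
    masks = [[1, 1, 1, 1], [1, 0], [1, 1]]
    tasks = [0, 0, 1]
    print(task_balanced_loss(losses, masks, tasks, "token"))
    print(task_balanced_loss(losses, masks, tasks, "sample"))
```

The two modes diverge whenever samples within a task have very different numbers of valid tokens, which is why I would like to know which one the reported SFT numbers used.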

  5. Evaluation
    • Commands to evaluate on:
      • HumanEval / HumanEval-X
      • MBPP
      • CodeFuseEval
    • pass@k evaluation protocol, execution-based scoring, and seeds for reproducibility.
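On the pass@k protocol, I am assuming the standard unbiased estimator introduced with HumanEval: generate n samples per problem, count the c that pass the unit tests, and compute 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below is my own and only illustrates that assumption; if the reported numbers instead come from greedy decoding (a single sample per problem), please let me know.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """results: list of (n, c) pairs, one per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up counts for three problems, 10 samples each.
    print(mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1))
```

Knowing n, the sampling temperature, and the seed used for the reported results would let me match the numbers exactly.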

  6. Reproducibility
    • Example run logs or expected training curves.
    • Early stopping criteria and typical step counts.
    • Any specific branches (e.g., mftcoder_accelerate vs mftcoder_atorch) that contain the canonical SFT scripts.
