Request for step-by-step SFT (single & multi-task) pipeline guidance for MFTCoder #91

Hello MFTCoder authors 👋,

First, thank you for releasing MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning and sharing the code. I’m a third-year PhD student currently attempting to reproduce your SFT experiments (both single-task and multi-task baselines) using the repo.

I’ve successfully set up the environment (conda, CUDA 12.x; multi-GPU available) and explored the repo (e.g., build_model.py, atorch_trainer.py). However, I’m still unclear on the exact SFT pipelines. Could you please clarify or provide a minimal set of example scripts/configs?

What I’m Hoping To Clarify

1. Data Loading & Formats
	•	Expected JSON/JSONL schema per task (fields for input/output, roles, label masking).
	•	Where task IDs / TASK2ID are defined and how they map to datasets.
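To make the label-masking question concrete, here is my current (unconfirmed) understanding, written as a small sketch. The `chat_rounds` field and the `human`/`bot` role names are my assumptions about the schema, and `toy_tokenize` is a hypothetical stand-in for the real tokenizer. The idea is that all rounds are concatenated into `input_ids`, but only the model's replies keep their token ids as labels; everything else is set to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import json
from typing import List, Tuple

# Default ignore_index of torch.nn.CrossEntropyLoss: these positions add no loss.
IGNORE_INDEX = -100


def toy_tokenize(text: str) -> List[int]:
    """Hypothetical stand-in for the real tokenizer: one id per character."""
    return [ord(ch) for ch in text]


def build_example(record: dict, target_role: str = "bot") -> Tuple[List[int], List[int]]:
    """Concatenate all rounds; keep labels only on tokens from target_role."""
    input_ids: List[int] = []
    labels: List[int] = []
    for turn in record["chat_rounds"]:  # assumed field name, please confirm
        ids = toy_tokenize(turn["content"])
        input_ids.extend(ids)
        if turn["role"] == target_role:
            labels.extend(ids)  # supervised: the model's reply
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked: prompt/system text
    return input_ids, labels


line = '{"chat_rounds": [{"role": "human", "content": "hi"}, {"role": "bot", "content": "ok"}]}'
input_ids, labels = build_example(json.loads(line))
print(input_ids)  # [104, 105, 111, 107]
print(labels)     # [-100, -100, 111, 107]
```

If the actual pipeline also masks a system prompt differently, or supervises special end-of-turn tokens, that is exactly the detail I am hoping you can confirm.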

2. SFT-Single (SFT-S-*)
	•	One concrete command (e.g., CodeLlama-13B-Python + QLoRA) to fine-tune on a single task (e.g., text-to-code or code completion).
	•	Example config/flags for:
          	•	optimizer, LR schedule
          	•	max sequence length
          	•	gradient accumulation
          	•	PEFT settings (LoRA/QLoRA)
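For context on why I am asking about these together: the numbers only become comparable across hardware once the effective batch size is known. Below is the arithmetic I am assuming, plus linear warmup followed by cosine decay as an example schedule. The cosine schedule is only a common choice I picked for illustration, not something I know MFTCoder uses, and all function and parameter names are my own.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer step under data parallelism."""
    return per_device_batch * grad_accum_steps * num_gpus


def steps_per_epoch(num_samples: int, eff_batch: int) -> int:
    """Optimizer steps needed to see every sample once (last step may be partial)."""
    return math.ceil(num_samples / eff_batch)


def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_steps: int, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr at total_steps."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


eff = effective_batch_size(per_device_batch=2, grad_accum_steps=8, num_gpus=4)
print(eff)                           # 64
print(steps_per_epoch(10_000, eff))  # 157
```

Knowing your per-device batch size, accumulation steps, GPU count, and schedule type would let me match the effective batch size even on a different number of GPUs.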

3. SFT-Mixed (Multi-task)
	•	How to specify multiple datasets in one run (CLI flags vs config file).
	•	Task sampling/mixing policy: uniform vs size-based?
	•	How to switch between:
          	•	sample-count weighted loss
          	•	valid-token weighted loss
	•	Any recommendations on per-task batch sizes or temperature scaling.
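By "temperature scaling" I mean the following mixing rule, which I am assuming as a reference point and not attributing to MFTCoder: each task is sampled with probability proportional to its size raised to 1/T. T = 1 gives size-proportional sampling, and larger T moves toward uniform. The task names and sizes below are made up.

```python
from typing import Dict


def task_sampling_probs(sizes: Dict[str, int], temperature: float = 1.0) -> Dict[str, float]:
    """p_i proportional to n_i ** (1 / T); T=1 is size-based, T -> infinity is uniform."""
    scaled = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(scaled.values())
    return {task: v / total for task, v in scaled.items()}


sizes = {"code_completion": 900, "text2code": 100}  # made-up dataset sizes
print(task_sampling_probs(sizes, temperature=1.0))  # {'code_completion': 0.9, 'text2code': 0.1}
print(task_sampling_probs(sizes, temperature=2.0))  # roughly 0.75 / 0.25
```

It would help to know whether the official runs simply concatenate and shuffle all datasets (which is equivalent to T = 1) or rebalance them in some other way.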

4. Loss Functions
	•	Confirmation that SFT experiments used cross-entropy with weighted loss.
	•	Whether focal loss or FAMO were excluded in the official SFT baseline results.
	•	The exact flag names to enable:
        	•	weighted by valid tokens
        	•	weighted by samples

5. Evaluation
	•	Commands to evaluate on:
        	•	HumanEval / HumanEval-X
        	•	MBPP
        	•	CodeFuseEval
	•	pass@k evaluation protocol, execution-based scoring, and seeds for reproducibility.
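For pass@k, I am assuming the unbiased estimator from the HumanEval/Codex paper (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and compute 1 - C(n-c, k) / C(n, k), averaged over problems. A minimal sketch of that estimator, using the numerically stable product form; the (n, c) numbers in the usage lines are made up.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k = 1 - C(n - c, k) / C(n, k), computed as a stable product.

    n: samples generated per problem, c: samples passing the tests, k: budget.
    """
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    prob_all_fail = 1.0
    for i in range(n - c + 1, n + 1):
        prob_all_fail *= 1.0 - k / i
    return 1.0 - prob_all_fail


# Sanity check against the closed form with binomial coefficients.
assert abs(pass_at_k(20, 5, 10) - (1 - math.comb(15, 10) / math.comb(20, 10))) < 1e-12

# Benchmark score = mean of per-problem estimates.
results = [(20, 5), (20, 0), (20, 20)]  # (n, c) per problem, made-up numbers
print(sum(pass_at_k(n, c, 1) for n, c in results) / len(results))
```

The part I cannot infer from the paper is the sampling setup behind it: whether pass@1 was computed with greedy decoding (n = 1) or with n > 1 samples at some temperature, and which seeds were used.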

6. Reproducibility
	•	Example run logs or expected training curves.
	•	Early stopping criteria and typical step counts.
	•	Which branch or directory (e.g., mftcoder_accelerate vs. mftcoder_atorch) contains the canonical SFT scripts.
