Training Your Own Agent in Unity ML-Agents

This guide walks you through how to run machine learning training on an agent inside Unity using the ML-Agents Toolkit. You’ll train an agent to learn behavior through reinforcement learning—where it improves by trial and error using reward feedback.

Before starting, make sure you've already set up the Python environment using venv.
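
If you haven't done that yet, a minimal setup looks roughly like this (this assumes Python 3 is on your PATH and installs the standard mlagents package from PyPI; you may need to match the package version to your Unity ML-Agents release):

python -m venv venv
source venv/bin/activate        # use venv\Scripts\Activate.ps1 in Windows PowerShell
pip install mlagents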

Step 1: Activate the Python Environment

You must activate the virtual environment before training.

Windows PowerShell:

venv\Scripts\Activate.ps1

(In the classic Command Prompt, use venv\Scripts\activate instead.)

macOS/Linux:

source venv/bin/activate

If you see (venv) in your terminal prompt, you're ready. 
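
To double-check that the toolkit itself is available inside the environment, print the trainer's help text (this assumes mlagents was installed with pip as above):

mlagents-learn --help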


Step 2: Locate the Training Configuration File

Each example environment in ML-Agents has a corresponding .yaml config file that defines:

  • The training algorithm (PPO, SAC, etc.)

  • Learning rate

  • Batch size

  • Network architecture

  • Reward signals

For 3DBall, the config is located here:

ml-agents/config/ppo/3DBall.yaml

You can open it in any text editor if you're curious, but no changes are needed to begin. Here is the configuration for 3DBall:

behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 64
      buffer_size: 12000
      learning_rate: 0.0003
      beta: 0.001
      epsilon: 0.2
      lambd: 0.99
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 128
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    keep_checkpoints: 5
    max_steps: 500000
    time_horizon: 1000
    summary_freq: 12000


Step 3: Configure Unity Editor for Training

Before you start training, make sure Unity is configured properly. These settings allow ML-Agents to communicate with the Python trainer and ensure smooth simulation.

1. Open the 3DBall Scene

In Unity:

  • Navigate to:
    Assets/ML-Agents/Examples/3DBall/Scenes/

  • Double-click 3DBall.unity to open it

2. Select the Agent

In the Hierarchy, expand one of the 3DBall GameObjects and select its child Agent GameObject. Each 3DBall GameObject is the platform-and-ball combination that the agent controls.


3. Configure the Behavior Parameters

With the Agent selected, go to the Inspector and find the component:

Behavior Parameters

Set the following:

  • Behavior Name: 3DBall

  • Vector Observation: 8 (already set)

  • Action Type: Continuous

  • Actions: Size = 2

  • Behavior Type: Default

  • Model: None (leave it blank while training)



4. Add a Decision Requester Script (if missing)

Still on the Agent GameObject, make sure it has this component:

Decision Requester

If it's missing:

  • Click Add Component

  • Search for Decision Requester and add it

  • Set:

    • Decision Period = 5 (default)

    • Take Actions Between Decisions = ✅ (checked)

The Decision Requester tells Unity when to ask the Python policy for a new action.

5. Time Scale (Optional)

To make training faster:

  • Open Edit > Project Settings > Time and find the Time Scale field

  • Increase it to something like 10 or 20

This speeds up the physics simulation so the agent gathers experience faster.
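
Alternatively, you can leave the Editor settings alone and let the trainer control the time scale from the command line. Recent mlagents-learn releases accept a --time-scale option (treat the exact flag as an assumption if you are on an older version); you would then launch the Step 4 command like this:

mlagents-learn config/ppo/3DBall.yaml --run-id=My3DBallRun --time-scale=20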


Step 4: Start the Training Process

Now you’ll launch the training script that connects Unity to the ML training backend (using PyTorch).

Run this in the terminal, from the root of your ml-agents clone so the config path resolves:


mlagents-learn config/ppo/3DBall.yaml --run-id=My3DBallRun

What this command does:

  • It loads the 3DBall.yaml training configuration

  • It creates a new training run ID called My3DBallRun

  • It prepares to receive simulation data from Unity

You’ll see a message like:


Start training by pressing the Play button in the Unity Editor.


Step 5: Press Play in Unity to Begin Training

  1. Return to the Unity Editor

  2. Ensure the 3DBall scene is open

  3. Click the Play ▶️ button at the top of the Unity window

You should see:

  • The platforms begin moving

  • Ball movement looks clumsy at first (because the agent is untrained)

  • Your terminal will begin printing training progress (reward scores, step count, etc.)

The agent is now learning by interacting with the environment and adjusting its neural network weights using reinforcement learning. 


Step 6: Monitor the Training Progress

In the terminal, you’ll see output like:

Step: 5000. Mean Reward: -0.3. Std of Reward: 0.7.
Step: 10000. Mean Reward: 0.8. Std of Reward: 0.4.

  • "Mean Reward" tells you how well the agents are doing

  • Over time, the value should rise steadily as the agent learns to keep the ball balanced (the ML-Agents documentation lists a benchmark mean reward of 100 for 3DBall)

Training typically takes 5–15 minutes depending on your computer and settings.
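
For a graphical view of the same metrics, you can point TensorBoard at the results folder (TensorBoard normally comes with the mlagents install; if not, pip install tensorboard) and open http://localhost:6006 in a browser:

tensorboard --logdir results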


 

Step 7: Stop Training and Save the Model

You can stop training early by pressing Ctrl + C in the terminal.

The model is saved automatically to:


ml-agents/results/My3DBallRun/

Inside, you’ll find:

  • 3DBall.onnx → the trained neural network model (the file is named after the behavior, not the run ID)

  • Training logs (for TensorBoard)

Step 8: Use Your Trained Model in Unity

Now you'll plug in your custom-trained .onnx model and test it inside Unity.

How to do it:

  1. Drag 3DBall.onnx from results/My3DBallRun/ into:

    Project/Assets/ML-Agents/Examples/3DBall/TFModels/

  2. In Unity, select the Agent GameObject in the Hierarchy (the same one you configured in Step 3)

  3. In the Behavior Parameters component:

    • Under Model, drag in your new .onnx file

    • Set Behavior Type to: Inference Only


  4. Press Play in Unity

You should now see a competent agent balancing the ball! 

Optional Step 9: Experiment

Now that you’ve trained your own agent, try these:

  • Modify the reward function (in the C# Agent script)

  • Add obstacles to the environment

  • Change the .yaml training parameters (batch size, hidden layers, etc.); see the sketch after this list

  • Increase Unity’s Time Scale to speed up simulation
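
For the .yaml idea mentioned above, here is a minimal sketch of an edited 3DBall entry; the values are illustrative rather than tuned recommendations, and any key you leave out should fall back to its default:

behaviors:
  3DBall:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128          # larger batches give smoother gradient estimates
      learning_rate: 0.0003
    network_settings:
      hidden_units: 256        # a wider network than the default 128
      num_layers: 2
    max_steps: 500000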

Rerun Training with Different Names

To run another training session, just change the --run-id:


mlagents-learn config/ppo/3DBall.yaml --run-id=FastBallTest

This keeps your previous model safe and avoids overwriting results.
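
If you ever do want to reuse an existing run ID, recent versions of mlagents-learn provide two flags for that (run mlagents-learn --help to confirm they exist in your version):

mlagents-learn config/ppo/3DBall.yaml --run-id=My3DBallRun --resume    # continue the run from its last checkpoint
mlagents-learn config/ppo/3DBall.yaml --run-id=My3DBallRun --force     # start over, overwriting the previous results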