Sim2Real Deployment¶
Overview¶
This section describes how to deploy a trained policy from simulation to a real robot.
The objective is to ensure that a policy trained in simulation can be executed on real hardware without modifying its structure or semantics.
This process includes:
- Model export
- Runtime integration
- Control loop alignment
- Safety validation
- Real-world debugging
Core Principle¶
Sim2Real is not a conversion step.
It is a consistency verification process between:
- simulation pipeline
- deployment pipeline
A successful deployment requires that both pipelines share:
- identical observation structure
- identical action interpretation
- identical timing behavior
1. Model Export¶
After training, export the policy:
instinct-play <task_name> --export-onnx
Output:
actor.onnx
metadata.json
Requirements¶
- ONNX graph must be static and deterministic
- Input/output tensor shapes must match deployment runtime
- Metadata must define:
  - observation order
  - action scaling
  - default joint pose
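A load-time check for the three metadata requirements above could look like the following minimal sketch. The key names (`observation_order`, `action_scale`, `default_joint_pos`) and the observation names are assumptions for illustration; the actual schema of `metadata.json` is whatever the export step writes.

```python
import json

# Hypothetical key names; adapt to the schema your exporter actually writes.
REQUIRED_KEYS = ("observation_order", "action_scale", "default_joint_pos")


def validate_metadata(meta: dict, num_joints: int) -> None:
    """Raise ValueError if the metadata cannot drive the deployment runtime."""
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    if missing:
        raise ValueError(f"metadata is missing keys: {missing}")
    if len(meta["default_joint_pos"]) != num_joints:
        raise ValueError("default_joint_pos length does not match joint count")
    scale = meta["action_scale"]
    # Accept either one scalar scale or one scale per joint.
    if isinstance(scale, (list, tuple)) and len(scale) != num_joints:
        raise ValueError("action_scale length does not match joint count")
    if len(set(meta["observation_order"])) != len(meta["observation_order"]):
        raise ValueError("observation_order contains duplicate entries")


# Inline document standing in for metadata.json (names are illustrative).
example = json.loads("""
{
  "observation_order": ["base_ang_vel", "projected_gravity", "joint_pos", "joint_vel"],
  "action_scale": 0.25,
  "default_joint_pos": [0.0, 0.8, -1.5]
}
""")
validate_metadata(example, num_joints=3)
```

Running this check once at startup, before any command is sent, turns a silent mismatch into an immediate error.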
2. Runtime Integration¶
The exported model must be integrated into a runtime system on the robot.
Typical stack:
Sensor → Observation Builder → Policy (ONNX) → Action Decoder → Controller → Robot
Minimal Inference Loop¶
obs = build_observation(sensor_data)
action = onnx_model.run(obs)
target_q = decode_action(action)
send_to_controller(target_q)
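The four lines above can be sketched as one step function with its components injected, so the same code path can be exercised with stand-ins before touching hardware. This is an illustrative sketch, not the project's runtime. `control_step` and the stand-in lambdas are hypothetical. With ONNX Runtime, `policy` would typically wrap a call to `InferenceSession.run`.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def control_step(
    sensor_data: dict,
    build_observation: Callable[[dict], Vector],
    policy: Callable[[Vector], Vector],
    decode_action: Callable[[Vector], Vector],
    send_to_controller: Callable[[Vector], None],
) -> Vector:
    """Run one policy step: sensors -> observation -> action -> joint targets."""
    obs = build_observation(sensor_data)
    action = policy(obs)
    target_q = decode_action(action)
    send_to_controller(target_q)
    return target_q


# Stand-in components so the step can run without a robot or a model file.
sent = []
target = control_step(
    sensor_data={"q": [0.1, -0.2], "dq": [0.0, 0.0]},
    build_observation=lambda s: list(s["q"]) + list(s["dq"]),
    policy=lambda obs: [0.0, 0.0],                        # zero-action policy
    decode_action=lambda a: [0.5 + 0.25 * x for x in a],  # q_default + scale * action
    send_to_controller=sent.append,
)
```

Injecting the components keeps the loop itself identical between a dry run and the real deployment, which is the consistency the core principle asks for.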
3. Control Interface Mapping¶
The policy output must be mapped to the robot control interface.
Typical mapping:
target_q = q_default + scale * action
torque = pd_control(target_q, current_q, current_dq)
Requirements:
- Joint order must match exactly
- Scaling must match training configuration
- Controller gains (kp/kd) must match those used in simulation
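As a sketch of the mapping above, the following uses plain Python lists. The PD law shown (zero target velocity) is one common form and an assumption here; the gains, scale, and default pose are illustrative values, not ones from a real training configuration.

```python
def decode_action(action, q_default, scale):
    """Map raw policy output to joint position targets (same joint order as training)."""
    return [qd + scale * a for qd, a in zip(q_default, action)]


def pd_control(target_q, current_q, current_dq, kp, kd):
    """Joint-space PD law with a zero target velocity (assumed form)."""
    return [
        kp_i * (tq - q) - kd_i * dq
        for tq, q, dq, kp_i, kd_i in zip(target_q, current_q, current_dq, kp, kd)
    ]


# Illustrative values only; real ones come from the training configuration.
q_default = [0.0, 1.0]
target_q = decode_action([1.0, -2.0], q_default, scale=0.5)
torque = pd_control(
    target_q,
    current_q=[0.0, 0.0],
    current_dq=[0.0, 1.0],
    kp=[20.0, 20.0],
    kd=[0.5, 0.5],
)
```

Because every quantity is indexed by joint, a single permutation error in the joint order silently corrupts all three requirements at once.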
4. Control Frequency Alignment¶
The policy must run on the real robot at the same rate as in simulation, with the low-level controller typically running at a fixed multiple of that rate.
Example:
| Component | Frequency |
|---|---|
| Policy | 50 Hz |
| Controller | 500 Hz |
Implementation:
- the policy runs at the lower frequency
- the controller holds or interpolates the latest command between policy steps
Mismatch leads to:
- delayed response
- oscillation
- instability
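The 50 Hz / 500 Hz example can be sketched as a decimation loop in which the controller holds the latest policy command. This only illustrates the rate ratio. A real system would drive the ticks from a real-time timer, and `run_ticks` and the stand-in steps are hypothetical.

```python
POLICY_HZ = 50
CONTROLLER_HZ = 500
DECIMATION = CONTROLLER_HZ // POLICY_HZ  # 10 controller ticks per policy step


def run_ticks(num_ticks, policy_step, controller_step):
    """Call controller_step every tick and policy_step every DECIMATION ticks."""
    command = None
    for tick in range(num_ticks):
        if tick % DECIMATION == 0:
            command = policy_step()   # new target at the policy rate
        controller_step(command)      # held target tracked at the controller rate


# Stand-ins that record calls instead of touching hardware.
policy_outputs = []
commands_sent = []


def fake_policy_step():
    policy_outputs.append(len(policy_outputs))
    return policy_outputs[-1]


run_ticks(
    num_ticks=CONTROLLER_HZ,  # one second of control
    policy_step=fake_policy_step,
    controller_step=commands_sent.append,
)
```

Holding the command is the simplest choice. Interpolating between consecutive targets gives smoother motion, but it should only be used if simulation did the same.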
5. Safety Layer (Mandatory)¶
Before enabling full control, implement safety constraints:
Joint Limits¶
target_q = clip(target_q, lower_limits, upper_limits)
Torque Limits¶
torque = clip(torque, -tau_limit, tau_limit)
Emergency Stop¶
- hardware-level stop
- software watchdog timeout
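The software side of these constraints might be sketched as below. The limits are placeholder numbers, and `Watchdog` is a hypothetical helper that does not replace a hardware-level stop.

```python
import time


def clip(values, lower, upper):
    """Element-wise clamp of values to [lower, upper]."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(values, lower, upper)]


class Watchdog:
    """Software watchdog: expires if no fresh command arrives within timeout_s."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_feed = clock()

    def feed(self):
        """Call once per successfully computed command."""
        self.last_feed = self.clock()

    def expired(self):
        return self.clock() - self.last_feed > self.timeout_s


# Placeholder limits; use the values from the robot's specification.
target_q = clip([2.0, -3.0, 0.1], lower=[-1.0, -1.0, -1.0], upper=[1.0, 1.0, 1.0])
tau_limit = [5.0, 5.0, 5.0]
torque = clip([9.0, -7.5, 1.0], [-t for t in tau_limit], tau_limit)
```

The control loop would call `feed()` after each policy step and stop sending policy commands as soon as `expired()` returns true.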
6. Zero-Action Validation¶
Before running the policy:
- Set action = 0
- Run system
Expected:
- robot remains stable
- no drift
- no oscillation
If failure occurs:
- check joint direction
- check default pose
- check controller gains
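One way to make the "no drift" expectation measurable is to log joint positions during the zero-action run and compare them with the default pose. The sketch below assumes such a log exists. The function name, tolerance, and data are illustrative.

```python
def zero_action_report(q_log, q_default, drift_tol):
    """Summarize a zero-action run from logged joint positions.

    q_log: list of joint-position samples, each ordered like q_default.
    Returns per-joint max deviation from the default pose and a pass flag.
    """
    max_dev = [0.0] * len(q_default)
    for q in q_log:
        for i, (qi, qd) in enumerate(zip(q, q_default)):
            max_dev[i] = max(max_dev[i], abs(qi - qd))
    return max_dev, all(d <= drift_tol for d in max_dev)


# Synthetic log: joint 0 holds the pose, joint 1 drifts away from it.
q_default = [0.0, 1.0]
q_log = [[0.0, 1.0], [0.01, 1.1], [0.0, 1.3]]
max_dev, passed = zero_action_report(q_log, q_default, drift_tol=0.05)
```

A per-joint report narrows the search: a single drifting joint points toward its default pose or direction, while deviation on every joint points toward the gains.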
7. Incremental Activation Strategy¶
Do not run the full policy directly.
Recommended steps:
- Zero-action test
- Small amplitude action test
- Single-joint test
- Full policy rollout
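The four stages can share one code path if the raw action passes through a gate before decoding. This is a sketch under that assumption. `gate_action`, its parameters, and the amplitude values are hypothetical.

```python
def gate_action(action, amplitude=1.0, active_joints=None):
    """Scale the policy action and optionally zero all but selected joints.

    amplitude: 0.0 reproduces the zero-action test, 1.0 is the full policy.
    active_joints: indices allowed to move; None means all joints.
    """
    if not 0.0 <= amplitude <= 1.0:
        raise ValueError("amplitude must be in [0, 1]")
    gated = []
    for i, a in enumerate(action):
        enabled = active_joints is None or i in active_joints
        gated.append(amplitude * a if enabled else 0.0)
    return gated


raw = [0.8, -0.4, 0.2]
stage_1 = gate_action(raw, amplitude=0.0)                     # zero-action test
stage_2 = gate_action(raw, amplitude=0.25)                    # small amplitude
stage_3 = gate_action(raw, amplitude=1.0, active_joints={1})  # single joint
stage_4 = gate_action(raw)                                    # full policy
```

Moving to the next stage then only changes two parameters, so no new code is introduced between a test that passed and the one that follows.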
8. Real-World Debugging¶
Common debugging steps:
Step 1: Verify Observation¶
- check sensor values
- compare with simulation range
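Comparing against the simulation range can be automated if per-channel minima and maxima were logged during simulated rollouts. The sketch below assumes that. The channel names and ranges are made up for illustration.

```python
def out_of_range(obs, names, sim_ranges):
    """Return (name, value) pairs whose value leaves the range seen in simulation."""
    flagged = []
    for name, value in zip(names, obs):
        low, high = sim_ranges[name]
        if not low <= value <= high:
            flagged.append((name, value))
    return flagged


# Hypothetical channels and ranges, e.g. min/max logged during sim rollouts.
names = ["joint_pos_0", "joint_vel_0", "gravity_z"]
sim_ranges = {
    "joint_pos_0": (-1.0, 1.0),
    "joint_vel_0": (-10.0, 10.0),
    "gravity_z": (-1.0, -0.5),
}
flagged = out_of_range([0.2, 35.0, 0.98], names, sim_ranges)
```

In this made-up sample, the velocity channel is far outside its range and the gravity component has the wrong sign, the kind of unit or axis-convention mismatch this step is meant to catch.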
Step 2: Verify Action¶
- log raw policy output
- ensure values stay within the range the policy produced in simulation
Step 3: Verify Mapping¶
- confirm action → joint mapping
- check sign and scaling
Step 4: Verify Timing¶
- measure loop frequency
- ensure stable scheduling
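Loop frequency and scheduling stability can be estimated from timestamps recorded once per iteration, for example with `time.monotonic()`. The following sketch uses synthetic timestamps. `loop_timing` is a hypothetical helper.

```python
import statistics


def loop_timing(timestamps):
    """Return (mean frequency in Hz, worst-case period in s) from loop timestamps."""
    periods = [t1 - t0 for t0, t1 in zip(timestamps, timestamps[1:])]
    if not periods:
        raise ValueError("need at least two timestamps")
    return 1.0 / statistics.mean(periods), max(periods)


# Synthetic timestamps: a nominal 50 Hz loop with one late iteration.
stamps = [0.0, 0.02, 0.04, 0.07, 0.08]
mean_hz, worst_period = loop_timing(stamps)
```

The mean alone can hide scheduling problems. Here the average is still close to 50 Hz, while the worst period is 1.5 times the nominal 20 ms.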
9. Common Failures¶
| Problem | Cause | Solution |
|---|---|---|
| unstable robot | wrong gains | tune kp/kd |
| wrong movement | joint mismatch | fix mapping |
| oscillation | timing mismatch | align frequency |
| no response | inference failure | check ONNX runtime |
| drift at zero | offset error | fix default pose |
10. Execution Workflow¶
flowchart TD
A[Train Policy] --> B[Export ONNX]
B --> C[Integrate Runtime]
C --> D[Map Control Interface]
D --> E[Safety Layer]
E --> F[Zero-Action Test]
F --> G[Incremental Activation]
G --> H[Full Deployment]
Key Takeaways¶
- Sim2Real depends on strict consistency
- Most failures come from mismatch, not model quality
- Always validate step-by-step
- Never skip safety checks