
The first observability tool for LLM-powered robots.
When your robot does something unexpected, you have no idea why.
Was it the vision model? The planner? A bad command? The network? There's no way to know. Robot SDKs have no logging, no observability, no debugging. You're flying blind.
One line of code. Wrap your robot client and see everything:

```python
from unitree_sdk2py.go2.sport.sport_client import SportClient
from shadowdance import ShadowDance

# Your existing robot code
client = SportClient()
client.Init()

# ONE LINE - wrap with ShadowDance
client = ShadowDance(client)  # <- THAT'S IT. Everything below is traced.

# All robot commands now traced with inputs, outputs, timing
client.StandUp()
client.Move(0.3, 0, 0)
client.Damp()
```

No refactoring. No code changes. Just wrap and go.
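The actual ShadowDance internals aren't shown in this README, but the wrap-and-go behavior above relies on the classic wrap-and-delegate pattern. A minimal, hypothetical sketch (`TracingProxy` and `FakeRobot` are illustrative names, not part of ShadowDance):

```python
import time

# Hypothetical sketch of the wrap-and-delegate pattern;
# NOT ShadowDance's actual implementation.
class TracingProxy:
    def __init__(self, client):
        self._client = client
        self._events = []  # (method, args, kwargs, elapsed_ms)

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself,
        # so delegation to the wrapped client is automatic.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        def traced(*args, **kwargs):
            start = time.perf_counter()
            result = attr(*args, **kwargs)
            elapsed_ms = (time.perf_counter() - start) * 1000
            self._events.append((name, args, kwargs, elapsed_ms))
            return result

        return traced


class FakeRobot:
    def StandUp(self):
        return 0

    def Move(self, vx, vy, vyaw):
        return 0


robot = TracingProxy(FakeRobot())
robot.StandUp()
robot.Move(0.3, 0, 0)
print([e[0] for e in robot._events])  # → ['StandUp', 'Move']
```

Because `__getattr__` forwards every method call, the caller's code is unchanged: the proxy is a drop-in replacement for the client.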
Modern LLM-powered robots span multiple systems:

```
┌─────────────────────────────────────────┐
│ Cloud LLM (OpenAI, Anthropic)           │
│ "pick up the white box" → commands      │
├─────────────────────────────────────────┤
│ Your Agent Code                         │
│ Vision → Planning → Execution           │
├─────────────────────────────────────────┤
│ Robot (Unitree Go2, H1, etc.)           │
│ Move, StandUp, Damp, gripper control    │
└─────────────────────────────────────────┘
```
ShadowDance traces the entire pipeline:

```python
from shadowdance import ShadowDance
from openai import OpenAI
from unitree_sdk2py.go2.sport.sport_client import SportClient

# Wrap your LLM (ONE LINE)
llm = OpenAI()
llm = ShadowDance(llm, run_type="llm")

# Wrap your robot (ONE LINE)
robot = SportClient()
robot = ShadowDance(robot, run_type="tool")

# Now you see the FULL chain in your dashboard:
# LLM prompt → generated commands → robot execution → timing → errors
```

LangSmith (default):

```bash
export PLATFORM=langsmith
export LANGCHAIN_API_KEY=...
```

Langfuse:

```bash
export PLATFORM=langfuse
export LANGFUSE_PUBLIC_KEY=...
export LANGFUSE_SECRET_KEY=...
```

Weave (Weights & Biases):

```bash
export PLATFORM=weave
export WANDB_API_KEY=...
```

```
Run: robot_session
├── StandUp() 8ms ✓
├── Move(vx=0.3, vy=0, vyaw=0) 12ms ✓
├── Move(vx=0, vy=0.3, vyaw=0) 11ms ✓
└── Damp() 9ms ✓
```
```python
from shadowdance import task, ShadowDance
from unitree_sdk2py.go2.sport.sport_client import SportClient

@task("pick_up_box")
def pick_up_box():
    robot = ShadowDance(SportClient())
    robot.StandUp()
    robot.Move(0.3, 0, 0)
    robot.Damp()
```

In your dashboard:

```
pick_up_box (chain)
├── StandUp (tool)
├── Move (tool)
└── Damp (tool)
```
See how LLM decisions affect robot behavior:

```
code_as_policies_task (chain)
├── vision_analysis (llm)
│   └── "white_box at [0.0, 0.1, 0.72]"
├── code_generation (llm)
│   └── "robot.move_to(0.0, 0.1, 0.72)"
└── code_execution (tool)
    ├── move_to (tool) ✓
    └── close_gripper (tool) ✓
```
```python
# Log all robot commands to a dataset
robot = ShadowDance(
    SportClient(),
    run_type="tool",
    log_to_dataset="robot-tasks",
)

# Every command logged for evaluation
robot.StandUp()        # ✓ Logged
robot.Move(0.3, 0, 0)  # ✓ Logged
```

In your dashboard:

- Go to Datasets & Experiments
- Find `robot-tasks` with all executions
- Compare robot versions
- Run regression tests
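The exact row schema depends on the platform adapter, but a logged command plausibly carries the command name, inputs, output, and timing. A hypothetical sketch of such a row (`make_row` is illustrative, not ShadowDance's API):

```python
# Hypothetical shape of one row in a "robot-tasks"-style dataset;
# the real schema depends on the platform adapter.
def make_row(command, args, output, elapsed_ms):
    return {
        "command": command,
        "inputs": list(args),
        "output": output,
        "elapsed_ms": elapsed_ms,
    }

row = make_row("Move", (0.3, 0, 0), 0, 12.0)
print(row["command"])  # → Move
```

Rows in this shape are enough for regression testing: replay the same inputs against a new robot version and diff the outputs and timings.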
```bash
pip install shadowdance
```

Then install your chosen platform:

```bash
# For LangSmith (default)
pip install langsmith

# For Langfuse
pip install langfuse

# For Weave
pip install wandb
```

```bash
# Set your platform
export PLATFORM=langsmith
export LANGCHAIN_API_KEY=your-key

# Run your robot code
python your_robot_script.py
```

View traces at:

- LangSmith: smith.langchain.com
- Langfuse: Your Langfuse dashboard
- Weave: Your Weave project in W&B
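Platform selection via the `PLATFORM` environment variable can be sketched as a small factory with a safe default. This is a hypothetical illustration of the pattern, not ShadowDance's actual factory code:

```python
import os

# Hypothetical sketch of PLATFORM-based selection; the supported
# values mirror the env settings shown above.
SUPPORTED = {"langsmith", "langfuse", "weave"}

def pick_platform():
    platform = os.environ.get("PLATFORM", "langsmith").lower()
    if platform not in SUPPORTED:
        raise ValueError(
            f"Unknown PLATFORM {platform!r}; expected one of {sorted(SUPPORTED)}"
        )
    return platform

os.environ["PLATFORM"] = "langfuse"
print(pick_platform())  # → langfuse
```

Failing fast on an unknown value beats silently falling back: a typo in `PLATFORM` would otherwise send traces nowhere.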
Wraps any client object with observability tracing.

Args:

- `client`: The client object to wrap (Unitree SDK, OpenAI, etc.)
- `run_type`: Type for filtering (`"tool"`, `"llm"`, `"chain"`, etc.)
- `log_to_dataset`: Optional dataset name for evaluation

Example:

```python
# Robot
robot = ShadowDance(SportClient(), run_type="tool")

# LLM
llm = ShadowDance(OpenAI(), run_type="llm")

# Agent
agent = ShadowDance(MyAgent(), run_type="chain")
```

Decorator to create parent runs for nested tracing.

Example:

```python
@task("pick_up_box")
def pick_up_box():
    robot = ShadowDance(SportClient())
    robot.StandUp()  # Nested under "pick_up_box"
```

Context manager for creating parent runs.

Example:

```python
with task_context("move_to_kitchen"):
    robot = ShadowDance(SportClient())
    robot.Move(0.5, 0, 0)
```

| Run Type | Use Case | Example |
|---|---|---|
| `"llm"` | LLM/VLM API calls | OpenAI, Anthropic, vision models |
| `"tool"` | Robot commands, API calls | Move, StandUp, gripper control |
| `"chain"` | Orchestration logic | Agents, multi-step workflows |
| `"retriever"` | Document retrieval | RAG systems, vector stores |
| `"embedding"` | Embedding generation | Text embeddings |
Before ShadowDance:
- Robot SDKs have zero observability
- No way to debug why robot did X instead of Y
- Can't correlate LLM decisions with robot actions
- No regression testing for robot behavior
- Flying blind in production
After ShadowDance:
- Every robot command traced with timing and results
- Full LLM → robot pipeline visibility
- Organized traces by task
- Datasets for evaluation and regression
- Debug production issues from your dashboard
One line of code. That's all it takes to go from blind to full visibility.
```
./shadowdance/                   # Main package
├── __init__.py                  # ShadowDance wrapper + factory
└── adapters/
    ├── __init__.py              # Base interface + TraceEvent
    ├── langsmith.py             # LangSmith adapter
    ├── langfuse.py              # Langfuse adapter
    ├── weave.py                 # Weave adapter (W&B)
    ├── passthrough.py           # Pass-through adapter (testing)
    ├── example.py               # Template for custom adapters
    └── README.md                # Adapter documentation
./tests/                         # Test suite
├── test_adapter_comparison.py   # Adapter overhead tests
├── test_unitree_examples.py     # Unitree SDK verification
└── unitree_examples/            # Copied Unitree SDK examples
./examples/                      # Example code
./pyproject.toml                 # Package configuration
./requirements.txt               # Dependencies
```
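The adapters directory mentions a base interface plus a `TraceEvent` type. A hypothetical minimal in-memory adapter in that spirit might look like the following (the field names and `ListAdapter` class are illustrative assumptions, not the real definitions in `shadowdance/adapters/__init__.py`):

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical shapes inspired by the "base interface + TraceEvent"
# noted above; the real definitions may differ.
@dataclass
class TraceEvent:
    name: str
    run_type: str
    inputs: dict = field(default_factory=dict)
    outputs: Any = None
    elapsed_ms: float = 0.0

class ListAdapter:
    """Collects events in memory, in the spirit of the pass-through
    adapter used for testing."""

    def __init__(self):
        self.events = []

    def emit(self, event: TraceEvent):
        self.events.append(event)

adapter = ListAdapter()
adapter.emit(TraceEvent(name="StandUp", run_type="tool", elapsed_ms=8.0))
print(adapter.events[0].name)  # → StandUp
```

An adapter this small is also handy as a test double: assert on `adapter.events` instead of hitting a real tracing backend.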
```bash
# Run adapter comparison tests (verifies <1ms overhead)
python tests/test_adapter_comparison.py

# Run Unitree example verification tests
python tests/test_unitree_examples.py

# Run Unitree examples with different adapters
python tests/run_examples.py --examples-dir tests/unitree_examples
```

MIT