[EVAL] AI-generated Gson 1.6 instrumentation (blind test) #10940
jordan-wong wants to merge 1 commit into master
…ate)

Generated by apm-instrumentation-toolkit using the java_integration workflow.

This is a BLIND TEST run: gson was deleted from the repo before generation. The agent had ZERO access to the original implementation (shallow clone + config override).

**Generation Metrics:**
- Runtime: 425.3s (7.1 minutes)
- Agent turns: 96
- Cost: $3.29

**Layer 1 Validation:** ✅ ALL PASS
- compileJava: ✅ PASS
- spotlessCheck: ✅ PASS
- codenarcTest: ✅ PASS
- muzzle: ✅ PASS
- test: ✅ PASS
- latestDepTest: ✅ PASS

**Key Innovations:**
- NEW: GsonHelper abstraction class for CallDepthThreadLocalMap
- Broader method matchers (catches all toJson/fromJson overloads)
- Cleaner code structure with consistent naming

**Contamination Check:** ✅ ZERO
- Verified agent logs show no git show commands
- All file paths show /tmp/dd-trace-java-gson-clean/
- Agent used jackson-core and hystrix as references (both exist in clean clone)

**Evaluation:** See eval-comparison/ directory for comprehensive analysis

🤖 Generated with apm-instrumentation-toolkit
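To illustrate what "broader method matchers" means in practice, here is a minimal sketch using plain reflection. It is not the generated instrumentation, which is not shown in this thread. `FakeGson` and `matchByName` are hypothetical names introduced for illustration only. The idea is that matching on the method name alone selects every public overload, where a matcher that also pins the parameter types would select just one.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class OverloadMatchSketch {

  // Hypothetical stand-in with several overloads; NOT the real Gson class.
  static class FakeGson {
    public String toJson(Object src) { return String.valueOf(src); }
    public void toJson(Object src, Appendable writer) {}
    public <T> T fromJson(String json, Class<T> type) { return null; }
    public Object fromJson(java.io.Reader reader, Class<?> type) { return null; }
    @Override public String toString() { return "FakeGson"; }
  }

  // Name-only matching: returns "name/parameterCount" for every public,
  // non-synthetic declared method whose name is in the given list.
  static List<String> matchByName(Class<?> type, String... names) {
    List<String> wanted = Arrays.asList(names);
    return Arrays.stream(type.getDeclaredMethods())
        .filter(m -> Modifier.isPublic(m.getModifiers()))
        .filter(m -> !m.isSynthetic())
        .filter(m -> wanted.contains(m.getName()))
        .map(m -> m.getName() + "/" + m.getParameterCount())
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Lists all four toJson/fromJson overloads; toString is not matched.
    System.out.println(matchByName(FakeGson.class, "toJson", "fromJson"));
  }
}
```

A side effect of matching every overload is that overloads which delegate to one another are all instrumented, which is presumably why the description pairs this item with call-depth tracking.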
Benchmarks
Startup
Summary: Found 1 performance improvement and 0 performance regressions. Performance is the same for 60 metrics; 10 metrics are unstable.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.053 s) : 0, 1053300
Total [baseline] (10.928 s) : 0, 10927723
Agent [candidate] (1.058 s) : 0, 1057761
Total [candidate] (11.007 s) : 0, 11007414
section appsec
Agent [baseline] (1.246 s) : 0, 1245997
Total [baseline] (11.12 s) : 0, 11119572
Agent [candidate] (1.256 s) : 0, 1256060
Total [candidate] (11.259 s) : 0, 11258982
section iast
Agent [baseline] (1.23 s) : 0, 1229703
Total [baseline] (11.262 s) : 0, 11262227
Agent [candidate] (1.234 s) : 0, 1233566
Total [candidate] (11.376 s) : 0, 11376452
section profiling
Agent [baseline] (1.183 s) : 0, 1182876
Total [baseline] (10.963 s) : 0, 10962994
Agent [candidate] (1.199 s) : 0, 1199409
Total [candidate] (11.055 s) : 0, 11054585
gantt
title petclinic - break down per module: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.21 ms) : 0, 1210
crashtracking [candidate] (1.219 ms) : 0, 1219
BytebuddyAgent [baseline] (626.936 ms) : 0, 626936
BytebuddyAgent [candidate] (629.134 ms) : 0, 629134
AgentMeter [baseline] (29.243 ms) : 0, 29243
AgentMeter [candidate] (29.358 ms) : 0, 29358
GlobalTracer [baseline] (255.94 ms) : 0, 255940
GlobalTracer [candidate] (257.109 ms) : 0, 257109
AppSec [baseline] (31.598 ms) : 0, 31598
AppSec [candidate] (31.768 ms) : 0, 31768
Debugger [baseline] (60.43 ms) : 0, 60430
Debugger [candidate] (60.33 ms) : 0, 60330
Remote Config [baseline] (590.817 µs) : 0, 591
Remote Config [candidate] (590.862 µs) : 0, 591
Telemetry [baseline] (7.989 ms) : 0, 7989
Telemetry [candidate] (8.068 ms) : 0, 8068
Flare Poller [baseline] (3.56 ms) : 0, 3560
Flare Poller [candidate] (4.307 ms) : 0, 4307
section appsec
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.218 ms) : 0, 1218
BytebuddyAgent [baseline] (658.283 ms) : 0, 658283
BytebuddyAgent [candidate] (661.683 ms) : 0, 661683
AgentMeter [baseline] (12.105 ms) : 0, 12105
AgentMeter [candidate] (12.304 ms) : 0, 12304
GlobalTracer [baseline] (257.853 ms) : 0, 257853
GlobalTracer [candidate] (260.959 ms) : 0, 260959
IAST [baseline] (24.142 ms) : 0, 24142
IAST [candidate] (24.657 ms) : 0, 24657
AppSec [baseline] (177.599 ms) : 0, 177599
AppSec [candidate] (179.484 ms) : 0, 179484
Debugger [baseline] (65.93 ms) : 0, 65930
Debugger [candidate] (66.779 ms) : 0, 66779
Remote Config [baseline] (631.667 µs) : 0, 632
Remote Config [candidate] (624.83 µs) : 0, 625
Telemetry [baseline] (8.365 ms) : 0, 8365
Telemetry [candidate] (8.416 ms) : 0, 8416
Flare Poller [baseline] (3.623 ms) : 0, 3623
Flare Poller [candidate] (3.657 ms) : 0, 3657
section iast
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.214 ms) : 0, 1214
BytebuddyAgent [baseline] (796.847 ms) : 0, 796847
BytebuddyAgent [candidate] (800.204 ms) : 0, 800204
AgentMeter [baseline] (11.422 ms) : 0, 11422
AgentMeter [candidate] (11.606 ms) : 0, 11606
GlobalTracer [baseline] (247.635 ms) : 0, 247635
GlobalTracer [candidate] (247.98 ms) : 0, 247980
IAST [baseline] (25.431 ms) : 0, 25431
IAST [candidate] (25.433 ms) : 0, 25433
AppSec [baseline] (26.665 ms) : 0, 26665
AppSec [candidate] (26.683 ms) : 0, 26683
Debugger [baseline] (70.52 ms) : 0, 70520
Debugger [candidate] (68.561 ms) : 0, 68561
Remote Config [baseline] (538.034 µs) : 0, 538
Remote Config [candidate] (515.251 µs) : 0, 515
Telemetry [baseline] (9.825 ms) : 0, 9825
Telemetry [candidate] (11.276 ms) : 0, 11276
Flare Poller [baseline] (3.479 ms) : 0, 3479
Flare Poller [candidate] (3.949 ms) : 0, 3949
section profiling
crashtracking [baseline] (1.17 ms) : 0, 1170
crashtracking [candidate] (1.188 ms) : 0, 1188
BytebuddyAgent [baseline] (682.794 ms) : 0, 682794
BytebuddyAgent [candidate] (692.943 ms) : 0, 692943
AgentMeter [baseline] (8.986 ms) : 0, 8986
AgentMeter [candidate] (9.102 ms) : 0, 9102
GlobalTracer [baseline] (215.459 ms) : 0, 215459
GlobalTracer [candidate] (218.223 ms) : 0, 218223
AppSec [baseline] (32.086 ms) : 0, 32086
AppSec [candidate] (32.703 ms) : 0, 32703
Debugger [baseline] (64.47 ms) : 0, 64470
Debugger [candidate] (66.623 ms) : 0, 66623
Remote Config [baseline] (564.797 µs) : 0, 565
Remote Config [candidate] (586.107 µs) : 0, 586
Telemetry [baseline] (8.48 ms) : 0, 8480
Telemetry [candidate] (7.828 ms) : 0, 7828
Flare Poller [baseline] (4.21 ms) : 0, 4210
Flare Poller [candidate] (3.551 ms) : 0, 3551
ProfilingAgent [baseline] (93.724 ms) : 0, 93724
ProfilingAgent [candidate] (94.876 ms) : 0, 94876
Profiling [baseline] (94.285 ms) : 0, 94285
Profiling [candidate] (95.442 ms) : 0, 95442
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.061 s) : 0, 1061263
Total [baseline] (8.836 s) : 0, 8835661
Agent [candidate] (1.058 s) : 0, 1058319
Total [candidate] (8.838 s) : 0, 8837746
section iast
Agent [baseline] (1.222 s) : 0, 1222093
Total [baseline] (9.527 s) : 0, 9527343
Agent [candidate] (1.226 s) : 0, 1225838
Total [candidate] (9.539 s) : 0, 9539038
gantt
title insecure-bank - break down per module: candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.23 ms) : 0, 1230
crashtracking [candidate] (1.227 ms) : 0, 1227
BytebuddyAgent [baseline] (632.895 ms) : 0, 632895
BytebuddyAgent [candidate] (629.962 ms) : 0, 629962
AgentMeter [baseline] (29.574 ms) : 0, 29574
AgentMeter [candidate] (29.349 ms) : 0, 29349
GlobalTracer [baseline] (257.364 ms) : 0, 257364
GlobalTracer [candidate] (257.18 ms) : 0, 257180
AppSec [baseline] (31.632 ms) : 0, 31632
AppSec [candidate] (31.756 ms) : 0, 31756
Debugger [baseline] (59.611 ms) : 0, 59611
Debugger [candidate] (59.599 ms) : 0, 59599
Remote Config [baseline] (585.298 µs) : 0, 585
Remote Config [candidate] (592.191 µs) : 0, 592
Telemetry [baseline] (8.034 ms) : 0, 8034
Telemetry [candidate] (8.163 ms) : 0, 8163
Flare Poller [baseline] (4.249 ms) : 0, 4249
Flare Poller [candidate] (4.36 ms) : 0, 4360
section iast
crashtracking [baseline] (1.213 ms) : 0, 1213
crashtracking [candidate] (1.233 ms) : 0, 1233
BytebuddyAgent [baseline] (792.974 ms) : 0, 792974
BytebuddyAgent [candidate] (795.263 ms) : 0, 795263
AgentMeter [baseline] (11.383 ms) : 0, 11383
AgentMeter [candidate] (11.358 ms) : 0, 11358
GlobalTracer [baseline] (245.929 ms) : 0, 245929
GlobalTracer [candidate] (247.186 ms) : 0, 247186
IAST [baseline] (25.28 ms) : 0, 25280
IAST [candidate] (25.379 ms) : 0, 25379
AppSec [baseline] (26.429 ms) : 0, 26429
AppSec [candidate] (26.508 ms) : 0, 26508
Debugger [baseline] (67.166 ms) : 0, 67166
Debugger [candidate] (67.077 ms) : 0, 67077
Remote Config [baseline] (523.851 µs) : 0, 524
Remote Config [candidate] (529.501 µs) : 0, 530
Telemetry [baseline] (11.175 ms) : 0, 11175
Telemetry [candidate] (11.249 ms) : 0, 11249
Flare Poller [baseline] (3.994 ms) : 0, 3994
Flare Poller [candidate] (3.958 ms) : 0, 3958
Load
Summary: Found 4 performance improvements and 1 performance regression. Performance is the same for 16 metrics; 15 metrics are unstable.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (19.136 ms) : 18944, 19328
. : milestone, 19136,
appsec (18.913 ms) : 18721, 19105
. : milestone, 18913,
code_origins (17.642 ms) : 17468, 17815
. : milestone, 17642,
iast (17.807 ms) : 17630, 17983
. : milestone, 17807,
profiling (18.568 ms) : 18383, 18754
. : milestone, 18568,
tracing (17.73 ms) : 17554, 17905
. : milestone, 17730,
section candidate
no_agent (18.065 ms) : 17880, 18250
. : milestone, 18065,
appsec (19.922 ms) : 19715, 20129
. : milestone, 19922,
code_origins (17.657 ms) : 17483, 17831
. : milestone, 17657,
iast (18.068 ms) : 17888, 18248
. : milestone, 18068,
profiling (18.512 ms) : 18331, 18694
. : milestone, 18512,
tracing (17.532 ms) : 17356, 17708
. : milestone, 17532,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (1.182 ms) : 1170, 1194
. : milestone, 1182,
iast (3.121 ms) : 3080, 3162
. : milestone, 3121,
iast_FULL (6.096 ms) : 6033, 6159
. : milestone, 6096,
iast_GLOBAL (3.782 ms) : 3719, 3846
. : milestone, 3782,
profiling (2.321 ms) : 2297, 2345
. : milestone, 2321,
tracing (1.774 ms) : 1760, 1789
. : milestone, 1774,
section candidate
no_agent (1.17 ms) : 1159, 1181
. : milestone, 1170,
iast (3.207 ms) : 3164, 3249
. : milestone, 3207,
iast_FULL (5.867 ms) : 5808, 5927
. : milestone, 5867,
iast_GLOBAL (3.585 ms) : 3526, 3644
. : milestone, 3585,
profiling (1.981 ms) : 1964, 1999
. : milestone, 1981,
tracing (1.788 ms) : 1774, 1803
. : milestone, 1788,
Dacapo
Summary: Found 0 performance improvements and 0 performance regressions. Performance is the same for 11 metrics; 1 metric is unstable.
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (14.847 s) : 14847000, 14847000
. : milestone, 14847000,
appsec (14.814 s) : 14814000, 14814000
. : milestone, 14814000,
iast (18.905 s) : 18905000, 18905000
. : milestone, 18905000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
. : milestone, 17785000,
profiling (15.011 s) : 15011000, 15011000
. : milestone, 15011000,
tracing (14.98 s) : 14980000, 14980000
. : milestone, 14980000,
section candidate
no_agent (15.516 s) : 15516000, 15516000
. : milestone, 15516000,
appsec (14.521 s) : 14521000, 14521000
. : milestone, 14521000,
iast (17.835 s) : 17835000, 17835000
. : milestone, 17835000,
iast_GLOBAL (17.785 s) : 17785000, 17785000
. : milestone, 17785000,
profiling (15.387 s) : 15387000, 15387000
. : milestone, 15387000,
tracing (14.812 s) : 14812000, 14812000
. : milestone, 14812000,
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.61.0-SNAPSHOT~668e51355f, baseline=1.61.0-SNAPSHOT~c00f676bb9
dateFormat X
axisFormat %s
section baseline
no_agent (1.482 ms) : 1470, 1493
. : milestone, 1482,
appsec (3.79 ms) : 3570, 4009
. : milestone, 3790,
iast (2.261 ms) : 2192, 2330
. : milestone, 2261,
iast_GLOBAL (2.309 ms) : 2240, 2379
. : milestone, 2309,
profiling (2.115 ms) : 2059, 2172
. : milestone, 2115,
tracing (2.085 ms) : 2031, 2139
. : milestone, 2085,
section candidate
no_agent (1.479 ms) : 1468, 1491
. : milestone, 1479,
appsec (3.816 ms) : 3594, 4037
. : milestone, 3816,
iast (2.267 ms) : 2198, 2335
. : milestone, 2267,
iast_GLOBAL (2.312 ms) : 2242, 2381
. : milestone, 2312,
profiling (2.093 ms) : 2038, 2147
. : milestone, 2093,
tracing (2.08 ms) : 2027, 2134
. : milestone, 2080,
PerfectSlayer left a comment

Feedback from LP about generated instrumentation

    @Override
    protected String[] instrumentationNames() {
      return new String[] {"gson"};

❔ question: Should there be an alias with the version?

    import datadog.trace.bootstrap.CallDepthThreadLocalMap;

    public class GsonHelper {

❔ question: What is the benefit of such a helper? Only one type is instrumented, so why not use that type for the CallDepthThreadLocalMap calls?
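The reviewer's point can be sketched with a simplified per-thread depth counter keyed by class. The code below is a self-contained stand-in, not the real `datadog.trace.bootstrap.CallDepthThreadLocalMap`. `FakeGson`, `spansStarted`, and the map implementation are hypothetical and exist only for illustration. It uses the instrumented type itself as the key, so no separate helper class is needed, and only the outermost call among delegating overloads starts a span.

```java
import java.util.HashMap;
import java.util.Map;

final class CallDepthSketch {

  // Simplified per-thread depth counter keyed by class (illustrative stand-in).
  private static final ThreadLocal<Map<Class<?>, Integer>> DEPTHS =
      ThreadLocal.withInitial(HashMap::new);

  // Returns the depth before incrementing: 0 means this is the outermost call.
  static int incrementCallDepth(Class<?> key) {
    Map<Class<?>, Integer> depths = DEPTHS.get();
    int previous = depths.getOrDefault(key, 0);
    depths.put(key, previous + 1);
    return previous;
  }

  // Clears the counter for the key once the outermost call exits.
  static void reset(Class<?> key) {
    DEPTHS.get().remove(key);
  }

  // Hypothetical stand-in for the single instrumented type.
  static class FakeGson {
    int spansStarted = 0;

    String toJson(Object src) {
      // "Advice enter": the instrumented type itself is the key.
      boolean outermost = incrementCallDepth(FakeGson.class) == 0;
      if (outermost) spansStarted++;
      try {
        return toJson(src, new StringBuilder()); // delegates to another overload
      } finally {
        if (outermost) reset(FakeGson.class); // "advice exit"
      }
    }

    String toJson(Object src, StringBuilder out) {
      boolean outermost = incrementCallDepth(FakeGson.class) == 0;
      if (outermost) spansStarted++;
      try {
        return out.append('"').append(src).append('"').toString();
      } finally {
        if (outermost) reset(FakeGson.class);
      }
    }
  }

  public static void main(String[] args) {
    FakeGson gson = new FakeGson();
    gson.toJson("a"); // the nested overload call does not start a second span
    System.out.println(gson.spansStarted);
  }
}
```

A dedicated key class can still make sense when several unrelated types must share one counter, but that is not the situation the reviewer describes here.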

    import datadog.trace.agent.test.InstrumentationSpecification
    import datadog.trace.bootstrap.instrumentation.api.Tags

    class GsonTest extends InstrumentationSpecification {

🔨 issue: At the very least, error/exception handling is missing.
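One reading of this issue is that the test suite never exercises a call that throws. The sketch below shows the behavior such a test would typically pin down: when the wrapped call fails, the span is marked as errored, the span is still finished, and the exception reaches the caller unchanged. `FakeSpan` and `traced` are hypothetical and do not reflect the dd-trace-java span API or the generated advice.

```java
import java.util.function.Function;

final class ErrorTaggingSketch {

  // Hypothetical minimal span used only for this illustration.
  static class FakeSpan {
    boolean error;
    Throwable throwable;
    boolean finished;

    void addThrowable(Throwable t) {
      error = true;
      throwable = t;
    }

    void finish() {
      finished = true;
    }
  }

  // Runs the parse call inside a span. On failure the span records the
  // exception and the exception is rethrown; the span is always finished.
  static <T> T traced(FakeSpan span, String json, Function<String, T> parser) {
    try {
      return parser.apply(json);
    } catch (RuntimeException e) {
      span.addThrowable(e);
      throw e;
    } finally {
      span.finish();
    }
  }

  public static void main(String[] args) {
    FakeSpan span = new FakeSpan();
    try {
      traced(span, "{not json", s -> {
        throw new IllegalStateException("malformed input: " + s);
      });
    } catch (IllegalStateException expected) {
      System.out.println("error=" + span.error + ", finished=" + span.finished);
    }
  }
}
```

In the Spock test this would correspond to a case that feeds malformed input, asserts the thrown exception, and then asserts that the resulting span carries the error.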
Summary
AI-generated instrumentation for Gson 1.6 using the apm-instrumentation-toolkit. This is a blind test evaluation - the original implementation was deleted before generation to ensure zero contamination.
🎯 Evaluation Context
📊 Generation Metrics
✅ Layer 1 Validation (Automated)
All checks passed:
💡 Key Innovations
📉 Known Regressions vs Original
📚 Comprehensive Analysis
See the eval-comparison/ directory in apm-instrumentation-toolkit for detailed evaluation.
🎓 Evaluation Outcome
Overall Score: Generated: 7.8/10 | Original: 7.5/10
Recommendation: Adopt with modifications - restore span metadata and add ClassLoader matcher.
🤖 Generated with apm-instrumentation-toolkit | Run #4 (Blind Test)