Conduct controlled experiments on LLM behavior
Runs controlled experiments, varying prompt variables and sampling temperature, to characterize LLM behaviors and biases.
Use cases: bias studies, cognitive studies, logical-performance analysis, and consistency testing.
| Parameter | Default | Type | Description |
|---|---|---|---|
| `prompt_templates` | | `List<String>` | The base prompt templates to test (use `{variable}` for case-insensitive substitution) |
| `prompt_variables` | | `Map<String, List<String>>` | Variables to substitute in the prompt templates, each with its possible values |
| `metrics` | `["response_length", "response_time"]` | `List<String>` | Specific metrics to track (e.g., `response_length`, `sentiment`, `contains_keywords`) |
| `temperature_values` | `[0.1, 0.7]` | `List<Double>` | List of temperature values to test |
| `repetitions` | `3` | `Int` | Number of times to repeat each experimental condition |
| `statistical_analysis` | `true` | `Boolean` | Whether to analyze the statistical significance of results |
| `significance_level` | `0.05` | `Double` | Significance level for statistical tests (e.g., 0.05 for 95% confidence) |
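The table does not spell out how these parameters combine into individual trials. The sketch below assumes a full-factorial design: every template, crossed with every combination of variable values and every temperature, with each condition repeated `repetitions` times. It also assumes placeholder names match case-insensitively. `fill_template` and `expand_conditions` are hypothetical helpers, not part of `LLMExperimentTask`'s API, and no model is actually called.

```python
import itertools
import re


# Hypothetical helper (not the task's API): fills {variable} placeholders,
# matching names case-insensitively as the parameter table describes.
def fill_template(template: str, values: dict) -> str:
    lookup = {name.lower(): value for name, value in values.items()}

    def substitute(match: re.Match) -> str:
        # Unknown placeholders are left untouched rather than raising an error.
        return lookup.get(match.group(1).lower(), match.group(0))

    return re.sub(r"\{(\w+)\}", substitute, template)


# Hypothetical helper: assumes a full-factorial design, i.e. every template is
# crossed with every combination of variable values and every temperature.
def expand_conditions(prompt_templates, prompt_variables, temperature_values, repetitions=3):
    names = list(prompt_variables)
    for template in prompt_templates:
        # itertools.product() with no variables yields a single empty tuple,
        # so a template without placeholders still produces one prompt.
        for combo in itertools.product(*(prompt_variables[n] for n in names)):
            prompt = fill_template(template, dict(zip(names, combo)))
            for temperature in temperature_values:
                for rep in range(repetitions):
                    yield prompt, temperature, rep


trials = list(expand_conditions(
    prompt_templates=["The {Profession} was {trait}."],
    prompt_variables={"profession": ["nurse", "engineer"], "trait": ["careful", "creative"]},
    temperature_values=[0.1, 0.7],
))
print(len(trials))  # 1 template x 4 variable combinations x 2 temperatures x 3 repetitions = 24
print(trials[0])    # ('The nurse was careful.', 0.1, 0)
```

Under this assumption, the number of model calls grows multiplicatively. Adding a third temperature or a variable with five values quickly inflates the cost of a run.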
Task identifier: `LLMExperimentTask`
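The page does not say which statistical test `statistical_analysis` runs. As a stand-in, the sketch below uses a two-sided permutation test on the difference in means of one metric between two conditions, then compares the resulting p-value against `significance_level`. `permutation_test` and `compare_conditions` are hypothetical names, and the sample values are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means of samples a and b."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Relabel the shuffled values into two groups of the original sizes.
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


def compare_conditions(a, b, significance_level=0.05):
    """Return the p-value and whether it falls below significance_level."""
    p_value = permutation_test(a, b)
    return p_value, p_value < significance_level


# Made-up response_length values for two temperature conditions (3 repetitions each).
low_temp = [120, 118, 121]
high_temp = [180, 210, 195]
p, significant = compare_conditions(low_temp, high_temp, significance_level=0.05)
print(f"p = {p:.3f}, significant = {significant}")
```

The p-value here comes out near 0.1, even though the two groups do not overlap at all. With three values per group, only 2 of the C(6,3) = 20 equally likely relabelings are as extreme as complete separation, so the smallest attainable p-value is 0.1. Under this particular test, the default `repetitions = 3` cannot reach `significance_level = 0.05` for a single pair of conditions. Reaching it would require more repetitions or pooling samples across prompts.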