Improve results for pending pods (DO NOT MERGE YET) #77

Open · wants to merge 1 commit into base: master
29 changes: 26 additions & 3 deletions holmes/plugins/prompts/generic_ask.jinja2
For example, for deployments first run kubectl on the deployment, then a replicaset, then the pods
When investigating a pod that crashed, fetch the pod's logs with --previous so you see logs from before the crash.

Do not fetch logs for a pod that crashed with kubectl_logs; use the kubectl_previous_logs tool instead
If dealing with pending pods due to insufficient resources, run the kubectl_view_allocations tool if available before giving your final answer

If asked about problems, do not stop investigating until you reach the deepest root cause you are able to find.
Use the "five whys" methodology to find the root cause.

For example, if you found a problem in microservice A that is due to an error in microservice B, look at microservice B too and find the error there.
If there are incompatibilities between the versions of microservice A and microservice B, state the exact version on each side.
Do not give an answer like "The pod is pending", as that doesn't state why the pod is pending or how to fix it.
Do not give an answer like "Insufficient CPU" if you are able to provide more details like "0/X nodes have the required Y CPU to run this pod"

Reply with terse output. Be painfully concise. Leave out "the" and filler words when possible. Be terse, but not at the expense of leaving out important data like the root cause and how to fix it.
If there is a bash one-liner that would fix the issue, suggest it. If there is a patch to the code or YAML that would fix the issue, suggest it.

Examples:

User: Why did the webserver-example app crash?
(Call tool kubectl_find_resource kind=pod keyword=webserver)
(Call tool kubectl_previous_logs namespace=demos pod=webserver-example-1299492-d9g9d # this pod name was found from the previous tool call)

AI: `webserver-example-1299492-d9g9d` crashed due to email validation error during HTTP request for /api/create_user
Relevant logs:
```
2021-01-01T00:00:00.000Z [ERROR] Missing required field 'email' in request body
```

Validation error led to unhandled Java exception causing a crash.
Suggested fix: update create_user() in Server.java or update the client to send the email field.

--

User: What is wrong with the FooBar deployment?
(Call tool kubectl_find_resource kind=deployment keyword=foo)
(Call tool kubectl_find_resource kind=pod keyword=foo-bar)
(Call tool kubectl_describe kind=pod name=foo-bar-1299492-d9g9d namespace=demos # this pod name was found from the previous tool call)
(Call tool kubectl_view_allocations resource_type=cpu) # we called this tool even though we already had enough information to answer; calling it helped us provide more detailed numbers in the answer

AI: `foo-bar` deployment has 1 pod that cannot be scheduled.
foo-bar needs 4 CPU but no node in the cluster has 4 CPU available. Adding more nodes of the same type won't help because the maximum CPU on any node is 3 CPU.

CPU usage in the cluster:
```
(output of kubectl_view_allocations)
```
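The file changed above, generic_ask.jinja2, is a Jinja2 template, so prompt edits like these take effect when the template is rendered into the system prompt. A minimal sketch of that rendering step, using the jinja2 library (the template text and the `tool_names` variable here are illustrative placeholders, not the real file's contents):

```python
from jinja2 import Environment, BaseLoader

# Hypothetical prompt fragment modeled on generic_ask.jinja2;
# the real template lives in holmes/plugins/prompts/.
TEMPLATE = """You are a Kubernetes troubleshooting assistant.
If dealing with pending pods due to insufficient resources, run the
kubectl_view_allocations tool if available before giving your final answer.
Available tools: {{ tool_names | join(", ") }}"""

env = Environment(loader=BaseLoader(), trim_blocks=True, lstrip_blocks=True)

# Render with the tool names the agent currently has access to.
prompt = env.from_string(TEMPLATE).render(
    tool_names=["kubectl_describe", "kubectl_view_allocations"]
)
print(prompt)
```

Because the tool list is injected at render time, enabling a new toolset (like the view-allocations one below) only requires the template to mention it conditionally, not a code change.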
11 changes: 10 additions & 1 deletion holmes/plugins/toolsets/kubernetes.yaml
toolsets:
        description: "Fetch the definition of a Prometheus target"
        command: "kubectl get --raw '/api/v1/namespaces/{{prometheus_namespace}}/services/{{prometheus_service_name}}:9090/proxy/api/v1/targets' | jq '.data.activeTargets[] | select(.labels.job == \"{{ target_name }}\")'"

  - name: "kubernetes/kube-lineage"
    tools:
      - name: "kubectl_lineage"
        description: "Get all children of a Kubernetes resource, recursively, including their status"
        command: "kubectl lineage {{ kind }} {{ name }} -n {{ namespace }}"
        prerequisites:
          - command: "kubectl lineage --version"

  - name: "kubernetes/view-allocations"
    tools:
      - name: "kubectl_view_allocations"
        description: "Get a report of resource allocation, to troubleshoot insufficient resources and pending pods (resource_type can be cpu, mem, or gpu)"
        command: "kubectl view-allocations -r {{ resource_type }}"
        prerequisites:
          - command: "kubectl view-allocations --version"
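Each tool's `command` is itself a Jinja2 template, and a tool is only offered to the model when its `prerequisites` commands succeed. A hedged sketch of how a loader might render and gate such an entry — the dictionary mirrors the YAML above, but the loader functions are illustrative, not HolmesGPT's actual implementation:

```python
import subprocess
from jinja2 import Environment, BaseLoader

# Illustrative tool entry mirroring the kubernetes/view-allocations toolset.
TOOL = {
    "name": "kubectl_view_allocations",
    "command": "kubectl view-allocations -r {{ resource_type }}",
    "prerequisites": [{"command": "kubectl view-allocations --version"}],
}

def prerequisites_met(tool) -> bool:
    """A tool is usable only if every prerequisite command exits 0
    (e.g. the kubectl plugin is actually installed)."""
    for prereq in tool.get("prerequisites", []):
        try:
            subprocess.run(prereq["command"].split(), check=True,
                           capture_output=True)
        except (OSError, subprocess.CalledProcessError):
            return False
    return True

def render_command(tool, **params) -> str:
    """Substitute the model-chosen parameters into the command template."""
    env = Environment(loader=BaseLoader())
    return env.from_string(tool["command"]).render(**params)

print(render_command(TOOL, resource_type="cpu"))
```

The prerequisite gate is what makes the prompt's "run the kubectl_view_allocations tool if available" phrasing safe: on clusters without the plugin, the tool simply never appears in the rendered tool list.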