Best Practices for Prompting LLMs for Data Analysis

This guide outlines best practices for prompting LLMs effectively in your data analysis workflow.

Be Explicit Rather Than Implicit

❌

Analyze this sales data

✅

Identify the top 3 performing products by revenue in Q2 2024

When working with LLMs, explicit instructions produce more accurate and relevant results. Never assume the model understands your implicit goals or context. Spell out exactly what you need.

Specific vs. General Prompts

❌

Tell me something about this customer dataset

✅

Calculate the customer retention rate for each month in 2024, segmented by customer tier

Specific prompts guide the LLM toward precise analytical goals. General prompts often result in generic observations that may not address your actual analytical needs.

One Task at a Time vs. Long Lists

❌

Analyze customer churn, create a forecast model, identify key drivers, suggest retention strategies, and create visualizations all from this dataset.

✅

First, calculate the monthly customer churn rate from this dataset. After reviewing those results, we'll explore potential drivers.

Break complex analyses into sequential steps. This creates a clearer analytical path and allows you to review intermediate results before proceeding.

Include Analysis Parameters

❌

Find outliers in this dataset

✅

Identify outliers in the transaction amounts column using the IQR method, defining outliers as values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR

Specify your analytical parameters, methodologies, and thresholds to ensure the LLM applies the appropriate techniques.
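
As a rough illustration of what such a prompt asks for, here is a minimal Python sketch of the IQR rule: values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR are flagged. The function name, the sample amounts, and the choice of the "inclusive" quartile method are assumptions made for this example; quartile definitions differ between tools, which is one more reason to state the method in your prompt.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: quartiles use the "inclusive" method; other tools may
    # compute Q1 and Q3 slightly differently.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical transaction amounts, for illustration only.
amounts = [12.5, 14.0, 13.2, 15.1, 12.9, 14.4, 13.8, 250.0]
outliers = iqr_outliers(amounts)
print(outliers)
```

Having the threshold as an explicit parameter (k) mirrors the prompt: if you wanted a stricter or looser rule, you would change the number in both places.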

Provide Context and Background

❌

Run a statistical test on these two groups

✅

Run an appropriate statistical test to determine if the difference in HbA1c reduction between groups is statistically significant. We're particularly concerned about Type II errors due to our relatively small sample size

Contextual information helps the LLM understand the domain and the constraints of your study, such as a small sample size, and deliver more relevant analyses.

Use Concrete Examples

❌

Standardize phone numbers in this dataset

✅

Standardize phone numbers in this dataset to the format (XXX) XXX-XXXX, e.g., 555-123-4567 becomes (555) 123-4567

Examples clarify your expectations and improve the precision of LLM responses.
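
To make the target format concrete, the following Python sketch shows one way the standardization could be implemented. The function name and the handling of edge cases (dropping a leading country code "1", returning None for anything that is not 10 digits) are assumptions for this example, not rules taken from any particular dataset.

```python
import re


def standardize_phone(raw):
    """Format a 10-digit number as (XXX) XXX-XXXX, or return None."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    # Assumption: an 11-digit number starting with "1" carries a country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # flag for manual review instead of guessing
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


print(standardize_phone("555-123-4567"))
print(standardize_phone("+1 555.123.4567"))
print(standardize_phone("12345"))
```

Edge cases like these are worth stating in the prompt as well, since the target format alone does not say what to do with numbers that cannot be converted.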

Handling Missing Data

Explicitly indicate how missing values are represented (NA, NULL, empty string, etc.)
State your preferred handling strategy in prompts (imputation method, removal, etc.)
Consider asking the LLM to report on missing data patterns before proceeding with analysis
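
A missing-data report of the kind described above might look like the following Python sketch. The set of missing-value markers, the column names, and the sample rows are all hypothetical; the point is that the markers are declared explicitly rather than left for the tool to infer.

```python
import csv
import io

# Assumption: these strings are the markers that mean "missing" here.
MISSING_MARKERS = {"", "NA", "NULL"}


def missing_report(rows, markers=MISSING_MARKERS):
    """Count missing values per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_missing = value is None or value.strip() in markers
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts


# Hypothetical sample data, for illustration only.
sample = io.StringIO(
    "customer_id,tier,amount\n"
    "1,gold,20.5\n"
    "2,NA,\n"
    "3,NULL,13.0\n"
)
rows = list(csv.DictReader(sample))
report = missing_report(rows)
print(report)
```

Reviewing a report like this before choosing between imputation and removal keeps the handling strategy a deliberate decision.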

Agentic Analysis Techniques

Initial Analysis: Start with a basic analytical prompt
Review Output: Assess the quality and relevance of the results
Refine Prompt: Adjust your prompt based on the output
Repeat: Continue refining until you achieve satisfactory results

This iterative approach treats the LLM as a collaborative analytical partner: each round of review tells you what the next prompt needs to spell out.
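
The four steps can be sketched as a small loop. This is a minimal illustration, not a real client: ask_llm and review are hypothetical callables that you would supply (for example, a wrapper around your LLM provider and a manual or automated check), and max_rounds is an assumed safety limit.

```python
def refine_until_satisfied(initial_prompt, ask_llm, review, max_rounds=5):
    """Run the prompt, review the output, refine the prompt, and repeat.

    ask_llm(prompt) -> str and review(output) -> (ok, feedback) are
    hypothetical callables supplied by the caller.
    """
    prompt = initial_prompt
    history = []
    for _ in range(max_rounds):
        output = ask_llm(prompt)       # Initial Analysis (or a later round)
        ok, feedback = review(output)  # Review Output
        history.append((prompt, output))
        if ok:
            break
        # Refine Prompt: fold the reviewer's feedback into the next prompt.
        prompt = f"{prompt}\n\nRevision request: {feedback}"
    return history


# Stand-in callables so the sketch runs without any LLM service.
def fake_llm(prompt):
    return "segmented" if "segment" in prompt else "overall"


def fake_review(output):
    return (output == "segmented", "segment by customer tier")


history = refine_until_satisfied(
    "Calculate the monthly churn rate", fake_llm, fake_review
)
print(len(history))
```

Keeping the full history of prompts and outputs makes it easier to see which refinement actually improved the result.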

Use Keptune AI

Keptune AI enables agentic analysis workflows and provides several advantages:

Automated Feedback Loops: Keptune AI can automatically evaluate LLM outputs and refine prompts
Tool Integration: Keptune AI can connect LLMs with specialized analytical tools for statistical analysis, visualization, etc.
Memory and Context Management: Keptune AI can maintain analytical context across multiple queries
Hallucination Reduction: Keptune AI reduces the risk of hallucinated figures by having the LLM write and execute code, so results come from computation rather than from the model's direct reading of the data