Best Practices for Prompting LLMs for Data Analysis

Large language models (LLMs) make complex analytical tasks more accessible by letting you describe them in natural language. This guide outlines best practices for prompting LLMs effectively in your data analysis workflow.

Be Explicit Rather Than Implicit

Less effective: "Analyze this sales data"

More effective: "Identify the top 3 performing products by revenue in Q2 2024"

When working with LLMs, explicit instructions produce more accurate and relevant results. Never assume the model understands your implicit goals or context. Spell out exactly what you need.

Specific vs. General Prompts

Less effective: "Tell me something about this customer dataset"

More effective: "Calculate the customer retention rate for each month in 2024, segmented by customer tier"

Specific prompts guide the LLM toward precise analytical goals. General prompts often result in generic observations that may not address your actual analytical needs.

One Task at a Time vs. Long Lists

Less effective: "Analyze customer churn, create a forecast model, identify key drivers, suggest retention strategies, and create visualizations all from this dataset."

More effective: "First, calculate the monthly customer churn rate from this dataset. After reviewing those results, we'll explore potential drivers."

Break complex analyses into sequential steps. This creates a clearer analytical path and allows you to review intermediate results before proceeding.

Include Analysis Parameters

Less effective: "Find outliers in this dataset"

More effective: "Identify outliers in the transaction amounts column using the IQR method, defining outliers as values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR"

Specify your analytical parameters, methodologies, and thresholds to ensure the LLM applies the appropriate techniques.
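The IQR rule named in the prompt above can be sketched in a few lines of Python, which is also a handy way to check what an LLM reports. This is a minimal sketch using only the standard library; the sample amounts are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since different tools compute quartiles slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the IQR rule."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    # Assumption: the "inclusive" quartile convention; others shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # A value is an outlier if it falls outside either fence.
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical transaction amounts, for illustration only.
amounts = [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 250.0]
lower, upper, outliers = iqr_outliers(amounts)
print(lower, upper, outliers)
```

Stating the multiplier and the quartile convention in the prompt makes it easier to compare the LLM's answer with a check like this one.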

Provide Context and Background

Less effective: "Run a statistical test on these two groups"

More effective: "Run an appropriate statistical test to determine if the difference in HbA1c reduction between groups is statistically significant. We're particularly concerned about Type II errors due to our relatively small sample size."

Contextual information helps the LLM understand the domain, the constraints of your data, and what is at stake, so it can choose suitable methods and deliver more relevant analyses.

Use Concrete Examples

Less effective: "Standardize phone numbers in this dataset"

More effective: "Standardize phone numbers in this dataset to the format (XXX) XXX-XXXX. For example, 555-123-4567 becomes (555) 123-4567"

Examples clarify your expectations and improve the precision of LLM responses.
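For reference, the target format in the prompt above corresponds to a transformation like the following. This is a rough sketch, not a full phone number parser: the function name is hypothetical, it assumes 10-digit numbers, and it treats a leading 1 on an 11-digit number as a US country code.

```python
import re

def standardize_phone(raw):
    """Return raw formatted as (XXX) XXX-XXXX, or None if it is not a 10-digit number."""
    # Drop everything except digits (spaces, dots, dashes, parentheses, plus signs).
    digits = re.sub(r"\D", "", raw)
    # Assumption: a leading 1 on an 11-digit number is a US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # leave unparseable values for manual review
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(standardize_phone("555-123-4567"))
print(standardize_phone("+1 555.123.4567"))
print(standardize_phone("12345"))
```

Giving the LLM one or two input and output pairs like these, including a case that should be rejected, removes most of the ambiguity about edge cases.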

Handling Missing Data

  • Explicitly indicate how missing values are represented (NA, NULL, empty string, etc.)
  • State your preferred handling strategy in prompts (imputation method, removal, etc.)
  • Consider asking the LLM to report on missing data patterns before proceeding with analysis
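A missing-data report of the kind suggested above might look like the sketch below, which you could run yourself or ask the LLM to produce before any analysis. The set of missing-value markers and the sample rows are assumptions made for illustration; adjust them to however your dataset encodes missing values.

```python
# Assumption: these strings mark missing values in the dataset.
MISSING_MARKERS = {"", "NA", "NULL", "N/A"}

def missing_report(rows):
    """Count missing values per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # Treat None and any configured marker (ignoring surrounding spaces) as missing.
            is_missing = value is None or str(value).strip() in MISSING_MARKERS
            counts[column] = counts.get(column, 0) + int(is_missing)
    return counts

# Hypothetical rows, for illustration only.
rows = [
    {"id": "1", "tier": "gold", "amount": "10.5"},
    {"id": "2", "tier": "NA", "amount": ""},
    {"id": "3", "tier": None, "amount": "NULL"},
]
report = missing_report(rows)
print(report)
```

Including the resulting counts in your prompt, along with your preferred handling strategy, gives the LLM the same picture of the gaps that you have.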

Agentic Analysis Techniques

Iterative Refinement Loop

  1. Initial Analysis: Start with a basic analytical prompt
  2. Review Output: Assess the quality and relevance of the results
  3. Refine Prompt: Adjust your prompt based on the output
  4. Repeat: Continue refining until you achieve satisfactory results

This iterative approach leverages the LLM as a collaborative analytical partner.
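The four steps above can be sketched as a small loop. This is an illustrative outline rather than any particular tool's API: `ask` stands in for whatever function sends a prompt to your LLM, `review` stands in for your own (or an automated) quality check, and the two stub functions exist only so the sketch runs on its own.

```python
def refine(prompt, ask, review, max_rounds=3):
    """Run the prompt, review the output, and refine until the review passes."""
    history = []
    for _ in range(max_rounds):
        output = ask(prompt)               # steps 1 and 4: run the current prompt
        history.append((prompt, output))
        feedback = review(output)          # step 2: assess the output
        if feedback is None:               # None means the result is satisfactory
            break
        # Step 3: fold the reviewer's feedback into the next prompt.
        prompt = f"{prompt}\n\nRevise the previous answer: {feedback}"
    return history

# Hypothetical stand-ins so the sketch is runnable without a real model.
def fake_llm(prompt):
    return "churn by tier" if "tier" in prompt else "overall churn only"

def reviewer(output):
    return None if "tier" in output else "segment the result by customer tier"

history = refine("Calculate the monthly churn rate.", fake_llm, reviewer)
print(len(history), history[-1][1])
```

Capping the number of rounds keeps the loop from running indefinitely when the review never passes, and keeping the full history lets you see which refinement actually changed the result.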

Use Keptune AI

Keptune AI enables agentic analysis workflows and provides several advantages:

  • Automated Feedback Loops: Keptune AI can automatically evaluate LLM outputs and refine prompts
  • Tool Integration: Keptune AI can connect LLMs with specialized analytical tools for statistical analysis, visualization, etc.
  • Memory and Context Management: Keptune AI can maintain analytical context across multiple queries
  • Hallucination Reduction: Keptune AI reduces the risk of hallucination by having the LLM write and execute code rather than interpret the data directly