LLMs perform best with well-structured, tidy data that follows these principles:
date | city | temperature_c |
---|---|---|
2024-01-01 | New York | 3.2 |
2024-01-01 | San Diego | 15.8 |
2024-01-02 | New York | 2.4 |
2024-01-02 | San Diego | 16.1 |

date | New York | San Diego |
---|---|---|
2024-01-01 | 3.2 | 15.8 |
2024-01-02 | 2.4 | 16.1 |

Why it’s bad:
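In the wide table above, the city names are values, yet they serve as column headers. A minimal sketch of reshaping it into the long layout of the first table follows. It assumes the rows have already been parsed into a list of dicts; the helper name `wide_to_long` and its parameters are illustrative, not part of any library.

```python
# Wide layout from the table above: one column per city.
wide_rows = [
    {"date": "2024-01-01", "New York": 3.2, "San Diego": 15.8},
    {"date": "2024-01-02", "New York": 2.4, "San Diego": 16.1},
]


def wide_to_long(rows, id_column, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row.

    Hypothetical helper for illustration only.
    """
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row
            long_rows.append(
                {id_column: row[id_column], var_name: column, value_name: value}
            )
    return long_rows


tidy_rows = wide_to_long(wide_rows, "date", "city", "temperature_c")
for tidy_row in tidy_rows:
    print(tidy_row)
```

If the data lives in a pandas DataFrame instead, `DataFrame.melt` with `id_vars`, `var_name`, and `value_name` performs the same reshape.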
date | New York Temp | San Diego Temp | New York Humidity | San Diego Humidity |
---|---|---|---|---|
2024-01-01 | 3.2 | 15.8 | 65% | 50% |
2024-01-02 | 2.4 | 16.1 | 70% | 55% |

Why it’s bad:
The column headers combine two variables (`city`, `variable`).

date | city | temperature_c | comment |
---|---|---|---|
2024-01-01 | New York | 3.2 | Cold morning |
2024-01-01 | New York | 3.2 | Windy |
2024-01-01 | San Diego | 15.8 | Warm and sunny |

Why it’s bad:

variable | New York | San Diego |
---|---|---|
2024-01-01 | 3.2 | 15.8 |
2024-01-02 | 2.4 | 16.1 |

Why it’s bad:

date | location_temp |
---|---|
2024-01-01 | New York:3.2 |
2024-01-01 | San Diego:15.8 |
2024-01-02 | New York:2.4 |

Why it’s bad:
The `location_temp` column combines two variables.
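A combined column like this can be split back into one column per variable. The sketch below assumes every `location_temp` value has the form `city:temperature`, as in the table; the function name `split_location_temp` and the output column names `city` and `temperature_c` are illustrative choices borrowed from the first table.

```python
rows = [
    {"date": "2024-01-01", "location_temp": "New York:3.2"},
    {"date": "2024-01-01", "location_temp": "San Diego:15.8"},
    {"date": "2024-01-02", "location_temp": "New York:2.4"},
]


def split_location_temp(row):
    """Split 'city:temperature' into separate city and temperature_c fields.

    Hypothetical helper; assumes the temperature follows the last colon.
    """
    # rpartition splits on the last colon, so a city name containing ":" stays whole.
    city, _, temperature = row["location_temp"].rpartition(":")
    return {
        "date": row["date"],
        "city": city.strip(),
        "temperature_c": float(temperature),
    }


tidy_rows = [split_location_temp(row) for row in rows]
for tidy_row in tidy_rows:
    print(tidy_row)
```

Converting the temperature to a float at this step also lets malformed values fail early with a `ValueError` rather than passing through as text.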