# MASTER TEMPLATE: Validation Rules # Comprehensive validation for SQL queries and question decomposition ## QUESTION DECOMPOSITION VALIDATION ### 1. Question Type Classification ✅ VALID: Question correctly classified based on keywords ❌ INVALID: Wrong classification (e.g., earnings question classified as person search) ### 2. Table Usage Validation ✅ VALID: Correct table used for question type - EARNINGS questions → earning table - PERSON questions → member table ❌ INVALID: Wrong table usage - earnings question using member table - person search using earning table ### 3. Column Usage Validation ✅ VALID: Using existing columns - earning table: date, amount, name - member table: id, name, email ❌ INVALID: Using non-existent columns - agent_id, income_date, agent_name ### 4. SQL Hint Validation ✅ VALID: SQL hint matches question type and table ❌ INVALID: SQL hint uses wrong table/columns ## SQL GENERATION VALIDATION ### 1. Syntax Validation ✅ VALID: Proper SQL syntax - Correct SELECT, FROM, WHERE clauses - Proper GROUP BY, ORDER BY usage - Valid LIMIT clauses ❌ INVALID: Syntax errors - Missing FROM clause - Invalid WHERE conditions - Wrong GROUP BY syntax ### 2. Column Alias Validation ✅ VALID: Underscores, no spaces - total_earnings, last_date, agent_name ❌ INVALID: Spaces in aliases - "Total Earnings", "Last Date" ### 3. Date Function Validation ✅ VALID: Correct date functions - CONCAT(YEAR(date), '-', LPAD(MONTH(date), 2, '0')) - NOW() for current date - DATE_SUB(NOW(), INTERVAL X DAY/MONTH/YEAR) ❌ INVALID: Wrong date functions - DATE_FORMAT(date, '%Y-%m') - CURDATE() ### 4. Table/Column Existence Validation ✅ VALID: All tables and columns exist ❌ INVALID: References to non-existent tables/columns ## COMMON ERROR PATTERNS ### Pattern 1: Over-interpretation ❌ WRONG: "agent earned income" → WHERE name = 'agent' ✅ CORRECT: "agent earned income" → general earnings question ### Pattern 2: Wrong Table Usage ❌ WRONG: SELECT * FROM income WHERE agent_id = 'x' ✅ CORRECT: SELECT * FROM earning WHERE name = 'x' ### Pattern 3: Non-existent Columns ❌ WRONG: agent_id, income_date, agent_name ✅ CORRECT: name, date, amount ### Pattern 4: Spaces in Aliases ❌ WRONG: SELECT SUM(amount) as "Total Earnings" ✅ CORRECT: SELECT SUM(amount) as total_earnings ### Pattern 5: Wrong Date Functions ❌ WRONG: SELECT DATE_FORMAT(date, '%Y-%m') as month ✅ CORRECT: SELECT CONCAT(YEAR(date), '-', LPAD(MONTH(date), 2, '0')) as month ## VALIDATION CHECKLIST ### Question Decomposition: 1. ✅ Question type correctly identified 2. ✅ Correct table selected 3. ✅ SQL hint uses valid columns 4. ✅ No non-existent columns referenced 5. ✅ Question interpretation matches intent ### SQL Generation: 1. ✅ Syntax is valid 2. ✅ All tables/columns exist 3. ✅ Aliases use underscores 4. ✅ Date functions are correct 5. ✅ WHERE/GROUP BY/ORDER BY are proper ### Final Validation: 1. ✅ SQL executes without errors 2. ✅ Results match question intent 3. ✅ No data type mismatches 4. ✅ Proper aggregation usage 5. ✅ Correct sorting applied ## ERROR CORRECTION RULES ### 1. Column Alias Correction - Replace spaces with underscores - Convert to lowercase - Remove special characters ### 2. Date Function Correction - Replace DATE_FORMAT with CONCAT - Replace CURDATE() with NOW() - Ensure proper interval syntax ### 3. Table/Column Correction - Remove non-existent column references - Use correct table names - Validate against schema ### 4. Question Interpretation Correction - Distinguish between general and specific questions - Use correct table based on question type - Validate SQL hints match intent ## TESTING SCENARIOS ### Valid Scenarios: 1. "When was the last earning?" → earning table, MAX(date) 2. "Find agent John" → member table, name LIKE 3. "Earnings by month" → earning table, GROUP BY month 4. "Total earnings this month" → earning table, SUM(amount) ### Invalid Scenarios: 1. "agent earned income" → WHERE name = 'agent' (WRONG!) 2. Using 'agent_id' column (doesn't exist) 3. Using 'income' table (should be 'earning') 4. Spaces in column aliases ## IMPLEMENTATION GUIDELINES ### 1. Pre-validation Checks - Check question keywords for classification - Validate table selection - Verify column existence ### 2. During Generation Checks - Validate SQL syntax - Check alias formatting - Verify date function usage ### 3. Post-validation Checks - Test SQL execution - Verify result format - Check data consistency ### 4. Error Handling - Log validation failures - Provide specific error messages - Suggest corrections - Fallback to safe defaults