Project 03
Text-to-SQL is usually framed as a model problem. ADAPT-SQL argues that it is mostly a pipeline engineering problem. With the pipeline held fixed, three architecturally different models (Gemma4-31B dense, Qwen3-Coder-30B MoE at roughly 3B active, Qwen3-235B MoE at roughly 22B active) score within 0.8 points of each other on 89% of Spider queries. The 93.7% execution accuracy comes from the engineering, not the model.
The pipeline has 11 steps. The highest-impact step is three-layer schema linking: lexical string matching, then LLM semantic linking, then FK-graph connectivity inference. The graph traversal finds join paths that run through tables the question never mentions directly, which is where most schema linking systems quietly fail.
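The third layer can be pictured as a shortest-path search over the foreign-key graph. What follows is a minimal sketch, not the project's code: the function name join_path, the edge-list format, and the example schema are all assumptions made for illustration.

```python
from collections import deque


def join_path(fk_edges, start, goal):
    """Return the shortest chain of tables linking start to goal, or None.

    fk_edges is a list of (table_a, table_b) pairs, one per foreign key.
    Edges are treated as undirected because a join can go either way.
    """
    graph = {}
    for a, b in fk_edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        # sorted() keeps the traversal order deterministic
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical schema: the question mentions only "singer" and "stadium".
FK_EDGES = [
    ("singer", "singer_in_concert"),
    ("concert", "singer_in_concert"),
    ("concert", "stadium"),
]
PATH = join_path(FK_EDGES, "singer", "stadium")
```

Here the search surfaces singer_in_concert and concert as bridge tables even though neither appears in the question, which is the case lexical and semantic linking alone would miss.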
Complexity classification routes each query to one of three generation strategies. EASY queries use few-shot prompting (96.1%). NON_NESTED queries use NatSQL IR (93.8%), a grammar constraint that prevents invalid set operations at the generation level. NESTED_COMPLEX queries are decomposed into sub-problems (88.4%).
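The routing itself amounts to a dispatch table. The sketch below assumes the complexity label has already been produced by the classifier (which is not shown), and the three strategy functions are placeholder stubs with hypothetical names rather than the project's actual generators.

```python
from enum import Enum


class Complexity(Enum):
    EASY = "EASY"
    NON_NESTED = "NON_NESTED"
    NESTED_COMPLEX = "NESTED_COMPLEX"


# Placeholder strategies: each would build a prompt and call the model.
def few_shot(question):
    return f"[few-shot] {question}"


def natsql_ir(question):
    return f"[NatSQL IR] {question}"


def decompose(question):
    return f"[decomposed sub-problems] {question}"


STRATEGIES = {
    Complexity.EASY: few_shot,
    Complexity.NON_NESTED: natsql_ir,
    Complexity.NESTED_COMPLEX: decompose,
}


def route(question, label):
    """Send the question to the generation strategy for its complexity label."""
    return STRATEGIES[label](question)


ROUTED = route("Which singers performed in every stadium?", Complexity.NESTED_COMPLEX)
```

Keeping the mapping in one table makes it easy to swap a strategy for a single class without touching the other two.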
In-context example selection uses structural reranking over FAISS retrieval: semantic similarity weighted at 0.5, structural similarity at 0.3. Structural similarity (how similar the SQL skeletons are) is the more predictive signal for whether an example will help.
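A rough sketch of the reranking step follows, with several assumptions stated up front. The FAISS retrieval is replaced by a hard-coded candidate list carrying precomputed semantic scores. Structural similarity is approximated with difflib.SequenceMatcher over keyword skeletons, which may not be the measure the project uses. A draft SQL for the question is assumed to be available for comparison. Only the two weights given above (0.5 and 0.3) are applied; whatever the remaining weight covers is left out.

```python
import re
from difflib import SequenceMatcher

# Assumed keyword set; everything else is masked out of the skeleton.
KEYWORDS = {"SELECT", "FROM", "WHERE", "GROUP", "ORDER", "BY", "HAVING",
            "JOIN", "ON", "COUNT", "UNION", "INTERSECT", "EXCEPT", "LIMIT"}


def skeleton(sql):
    """Keep SQL keywords and collapse every other run of tokens into '_'."""
    out = []
    for tok in re.findall(r"\w+|[^\w\s]", sql.upper()):
        item = tok if tok in KEYWORDS else "_"
        if item == "_" and out and out[-1] == "_":
            continue
        out.append(item)
    return out


def structural_similarity(sql_a, sql_b):
    return SequenceMatcher(None, skeleton(sql_a), skeleton(sql_b)).ratio()


def rerank(candidates, draft_sql, w_sem=0.5, w_struct=0.3):
    """candidates: list of (example_sql, semantic_score) from the retriever."""
    scored = [
        (sql, w_sem * sem + w_struct * structural_similarity(sql, draft_sql))
        for sql, sem in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


DRAFT = "SELECT name FROM singer WHERE age > 30"
CANDIDATES = [
    ("SELECT count(*) FROM concert GROUP BY year", 0.90),
    ("SELECT title FROM song WHERE length > 200", 0.85),
]
RANKED = rerank(CANDIDATES, DRAFT)
```

In this toy case the second candidate moves to the top despite its lower semantic score, because its skeleton matches the draft exactly.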
A 6-checker validation chain runs after every generated query. Failures feed the specific error back as a prompt for retry: not a generic re-roll, but the exact constraint violated. This recovers 13-19% of initially failing queries.
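The retry loop can be sketched as below. Only two toy checkers stand in for the six, and their names, the feedback wording, the retry limit, and the generate callable's signature are all assumptions for illustration.

```python
def check_starts_with_select(sql):
    """Return an error message, or None if the check passes."""
    if not sql.lstrip().upper().startswith("SELECT"):
        return "query does not start with SELECT"
    return None


def check_balanced_parens(sql):
    if sql.count("(") != sql.count(")"):
        return "unbalanced parentheses"
    return None


# The real chain has six checkers; these two are illustrative stand-ins.
CHECKERS = [check_starts_with_select, check_balanced_parens]


def validate(sql):
    """Run the checkers in order and return the first error found, if any."""
    for check in CHECKERS:
        error = check(sql)
        if error is not None:
            return error
    return None


def generate_with_retry(generate, question, max_retries=2):
    """generate(question, feedback) -> sql; feedback is None on the first try."""
    feedback = None
    sql = ""
    for _ in range(max_retries + 1):
        sql = generate(question, feedback)
        error = validate(sql)
        if error is None:
            return sql
        # Feed back the specific violation rather than simply re-sampling.
        feedback = f"The previous query failed validation: {error}. Fix this issue."
    return sql  # best effort after exhausting retries


def fake_generate(question, feedback):
    """Stand-in for the model: fixes its output once it receives feedback."""
    if feedback:
        return "SELECT count(*) FROM singer"
    return "SELECT count(* FROM singer"


RESULT = generate_with_retry(fake_generate, "How many singers are there?")
```

The point of the design is that the second call sees the named violation, so the model is asked to repair a known fault instead of guessing again.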
Architecture matters only on nested-complex queries, where MoE routing opens a 4.4-point gap over a dense architecture with more active parameters. On standard queries, active parameter count predicts nothing.
What I Learned
Tech Stack
Links