{"id":154972,"date":"2024-04-23T15:47:14","date_gmt":"2024-04-23T19:47:14","guid":{"rendered":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/?p=154972"},"modified":"2025-05-02T14:53:17","modified_gmt":"2025-05-02T18:53:17","slug":"causal-dataset-discovery-with-large-language-models","status":"publish","type":"post","link":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/","title":{"rendered":"Causal Dataset Discovery with Large Language Models"},"content":{"rendered":"\r\n<h2 class=\"wp-block-heading\">Author<\/h2>\r\n\r\n\r\n\r\n<p>Junfei Liu<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Mentors<\/h2>\r\n\r\n\r\n\r\n<p>Fatemeh Nargesian and Anson Kahng<\/p>\r\n\r\n\r\n\r\n<h2 class=\"wp-block-heading\">Abstract<\/h2>\r\n<p>Causal discovery, which uncovers causal links among observed variables, is crucial to scientific research, yet inferring causality across relations (tables) in large-scale repositories remains challenging. Identifying causal relationships in bulk is complex and time-intensive, and the difficulty is significantly amplified when the analysis spans columns from many tables in diverse collections such as data lakes. In this paper, we introduce the causal data lake discovery problem and propose a large language model (LLM)-based framework to discover potential pairwise causal links between columns from different tables. We heuristically improve the LLM\u2019s grasp of causality through prompting and fine-tuning, and we address the extreme imbalance in the distribution of causal candidates that arises from the natural sparsity of causal connections. 
We create benchmarks specific to this task, show experimentally that our framework performs strongly, and propose extensions of this problem for future research.<\/p>\r\n<p><a href=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Junfei-Liu-POSTER.pdf\">Causal Dataset Discovery with Large Language Models<\/a><\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>In this paper, we introduce the causal data lake discovery problem and propose a large language model (LLM)-based framework to discover potential pairwise causal links between columns from different tables.<\/p>\n","protected":false},"author":6242,"featured_media":160382,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_coblocks_attr":"","_coblocks_dimensions":"","_coblocks_responsive_height":"","_coblocks_accordion_ie_support":"","_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4442,96],"tags":[],"coauthors":[8612],"class_list":["post-154972","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-archive","category-csc-archive"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Causal Dataset Discovery with Large Language Models - Senior Design Day<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Causal Dataset Discovery with Large Language Models - Senior Design Day\" 
\/>\n<meta property=\"og:description\" content=\"In this paper, we introduce the causal data lake discovery problem and propose a large language model (LLM)-based framework to discover potential pairwise causal links between columns from different tables.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Senior Design Day\" \/>\n<meta property=\"article:published_time\" content=\"2024-04-23T19:47:14+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-05-02T18:53:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png\" \/>\n\t<meta property=\"og:image:width\" content=\"368\" \/>\n\t<meta property=\"og:image:height\" content=\"396\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/\"},\"author\":{\"name\":\"admin\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\"},\"headline\":\"Causal Dataset Discovery with Large Language Models\",\"datePublished\":\"2024-04-23T19:47:14+00:00\",\"dateModified\":\"2025-05-02T18:53:17+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/\"},\"wordCount\":164,\"image\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/Screen-Shot-2024-04-23-at-3.20.45-PM.png\",\"articleSection\":[\"3. 
Programs Archive\",\"CSC Archive\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/\",\"name\":\"Causal Dataset Discovery with Large Language Models - Senior Design Day\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/Screen-Shot-2024-04-23-at-3.20.45-PM.png\",\"datePublished\":\"2024-04-23T19:47:14+00:00\",\"dateModified\":\"2025-05-02T18:53:17+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/Screen-Shot-2024-04-23-at-3.20.45-PM.png\",\"contentUrl\":\"https:\\\/\\
\/www.hajim.rochester.edu\\\/senior-design-day\\\/wp-content\\\/uploads\\\/2024\\\/04\\\/Screen-Shot-2024-04-23-at-3.20.45-PM.png\",\"width\":368,\"height\":396},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/causal-dataset-discovery-with-large-language-models\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Causal Dataset Discovery with Large Language Models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#website\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/\",\"name\":\"Senior Design Day\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/#\\\/schema\\\/person\\\/351018fbcf84ed8cac6d8072ba5b347c\",\"name\":\"admin\",\"url\":\"https:\\\/\\\/www.hajim.rochester.edu\\\/senior-design-day\\\/author\\\/seniordesign\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Causal Dataset Discovery with Large Language Models - Senior Design Day","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/","og_locale":"en_US","og_type":"article","og_title":"Causal Dataset Discovery with Large Language Models - Senior Design Day","og_description":"In this paper, we introduce the causal data lake discovery problem and propose a large language model (LLM)-based framework to discover potential pairwise causal links between columns from different tables.","og_url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/","og_site_name":"Senior Design Day","article_published_time":"2024-04-23T19:47:14+00:00","article_modified_time":"2025-05-02T18:53:17+00:00","og_image":[{"url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png","width":368,"height":396,"type":"image\/png"}],"author":"admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"admin","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#article","isPartOf":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/"},"author":{"name":"admin","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c"},"headline":"Causal Dataset Discovery with Large Language Models","datePublished":"2024-04-23T19:47:14+00:00","dateModified":"2025-05-02T18:53:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/"},"wordCount":164,"image":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png","articleSection":["3. 
Programs Archive","CSC Archive"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/","name":"Causal Dataset Discovery with Large Language Models - Senior Design Day","isPartOf":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#primaryimage"},"image":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#primaryimage"},"thumbnailUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png","datePublished":"2024-04-23T19:47:14+00:00","dateModified":"2025-05-02T18:53:17+00:00","author":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c"},"breadcrumb":{"@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/causal-dataset-discovery-with-large-language-models\/#primaryimage","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png","contentUrl":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-content\/uploads\/2024\/04\/Screen-Shot-2024-04-23-at-3.20.45-PM.png","width":368,"height":396},{"@type":"BreadcrumbList","@id":"https:\/\/www.hajim.rochester.edu\/senior-design
-day\/causal-dataset-discovery-with-large-language-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/"},{"@type":"ListItem","position":2,"name":"Causal Dataset Discovery with Large Language Models"}]},{"@type":"WebSite","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#website","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/","name":"Senior Design Day","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/#\/schema\/person\/351018fbcf84ed8cac6d8072ba5b347c","name":"admin","url":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/author\/seniordesign\/"}]}},"_links":{"self":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/154972","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/users\/6242"}],"replies":[{"embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/comments?post=154972"}],"version-history":[{"count":5,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/154972\/revisions"}],"predecessor-version":[{"id":160302,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/posts\/154972\/revisions\/160302"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.hajim.ro
chester.edu\/senior-design-day\/wp-json\/wp\/v2\/media\/160382"}],"wp:attachment":[{"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/media?parent=154972"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/categories?post=154972"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/tags?post=154972"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/www.hajim.rochester.edu\/senior-design-day\/wp-json\/wp\/v2\/coauthors?post=154972"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}