Hanchao Ma

I am a postdoctoral associate in Computer Science at Case Western Reserve University, advised by Yinghui Wu. My research focuses on next-generation AI data systems, with the goal of establishing a new systems paradigm that integrates knowledge graphs and large language models for end-to-end knowledge discovery, reasoning, and autonomous management of ML assets and workflows. My work spans graph-based knowledge discovery and scientific workflow platforms such as CRUX, with publications in leading venues including VLDB, SIGMOD, and ICDE.

I previously worked as a research intern at Pacific Northwest National Laboratory and Microsoft Research Asia, and as an Applied Scientist Intern at Amazon, where I focused on graph data systems, knowledge discovery, and LLM-driven applications, including large-scale knowledge graph and recommendation systems.

Current Focus

Next-Generation AI Data Systems: A new paradigm unifying knowledge graphs and LLMs for end-to-end knowledge discovery and reasoning.

Graph-Based Knowledge Discovery: Interactive systems for unbiased, exploratory, and explainable graph exploration.

ML Asset & Workflow Ecosystems: Self-organizing systems for automated, reliable, and reproducible AI workflows.

Projects


Intelligent Graph Data Systems for Knowledge Exploration in AI Infrastructure

Interactive systems for fair, diverse, and explainable exploration of large-scale graph data, enabling unbiased knowledge discovery.

Related papers:

  • Ontology-Based Entity Matching in Attributed Graphs (VLDB 2019)
  • Diversified Subgraph Query Generation with Group Fairness (WSDM 2022)
  • Subgraph Query Generation with Fairness and Diversity Constraints (ICDE 2022)
  • Fair Group Summarization with Graph Patterns (ICDE 2023)
  • Explaining Missing Data in Graphs: A Constraint-Based Approach (ICDE 2021)
  • GRIP: Constraint-Based Explanation of Missing Answers for Graph Queries (SIGMOD demo)

Robust Graph Learning for Graph Data Quality

Learning-based systems for detecting errors and improving robustness in knowledge graphs under limited supervision.

Related papers:

  • GEDet: Adversarially Learned Few-Shot Detection of Erroneous Nodes in Graphs (BigData 2020)
  • GALE: Active Adversarial Learning for Erroneous Node Detection in Graphs (ICDE 2023)
  • RoboGNN: Robust Node Classification under Link Perturbation (IJCAI 2022)

AI-Powered Scientific Data Assets & Workflow Management Systems

A unified platform for scientific data management, provenance tracking, and workflow exploration, enabling reproducible research and automated dataset/model discovery.

Related papers:

  • CRUX: Crowdsourced Materials Science Resource and Workflow Exploration (CIKM 2022)
  • Selecting Top-k Data Science Models by Example Dataset (CIKM 2023)
  • ModsNet: Performance-Aware Top-k Model Search Using Exemplar Datasets (VLDB 2024)
  • Generating Skyline Datasets for Data Science Models (EDBT 2025)