Real-time collaborative analysis with (almost) pure SQL: a case study in biogeochemical oceanography
We consider a case study using SQL-as-a-Service to support "instant analysis" of weakly structured relational data at a multi-investigator science retreat. Here, "weakly structured" means tabular, rows-and-columns datasets that share some common context but have limited a priori agreement on file formats, relationships, types, schemas, metadata, or semantics. In this case study, the data were acquired from hundreds of distinct locations during a multi-day oceanographic cruise using a variety of physical, biological, and chemical sensors and assays. Months after the cruise, once preliminary data processing was complete, more than 40 researchers from a variety of disciplines participated in a two-day "data synthesis workshop." At this workshop, two computer scientists used a web-based query-as-a-service platform called SQLShare to perform "SQL stenography": capturing the scientific discussion in real time in order to integrate data, test hypotheses, and populate visualizations that then informed and enhanced further discussion. In this "field test" of our technology and approach, we found not only that it was feasible to support interactive science Q&A with essentially pure SQL, but also that doing so significantly increased the value of the "face time" at the meeting: researchers from different fields were able to validate assumptions and resolve ambiguity about each other's fields. As a result, new science emerged from a meeting that was originally intended only as a planning meeting. In this paper, we describe the details of this experiment, discuss our major findings, and lay out a new research agenda for collaborative science database services.