Paul Holtzman (Website)
Consultant, Chicago, USA
Protecting Data from Users
Through the years, I have worked with an assortment of companies essentially developing software for platforms. These platforms serve as the base for a process from which the company’s clients or customers can address research questions... a process including the steps of creating and deploying surveys and then analyzing and reporting on data captured by those surveys, as appropriate to the research questions. Much of my work has been oriented towards automating this process as much as possible, to assist users of the platform. Back in the old days, the process culminated in a report, provided with a good bit of researcher intervention. Today, in the age of democratization of software, the trend is towards users executing steps along the process by themselves, with little to no intervention.
A point of great concern, amplified by democratization, has always been generating, and then handing off, results to users who may or may not fully appreciate what they are receiving. Some of the concern could be addressed by direct contact with users, for example via training. But this has been replaced to some extent by reliance on annotation and result-triggered comments (with the occasional link to a “white paper” or blog note) added to reports automatically generated on the platform. (Just to be sure, I am a strong supporter of annotation, triggering and offering links.) These can be helpful but only if the user reads and understands them… often doubtful. A more subtle way of addressing this concern, for me, has been building protections into the process, offered at least as options. This is the focus of my monologue. And just to be sure, by “protections”, I mean “protecting data from the user”.
Data are meant to be innocent (themselves having no agenda) pieces of information, evidence, then reduced via analysis to some insight or result upon which a decision can be made regarding “next steps”. At issue then is how data can be captured, analyzed and reported while protecting this information, ensuring at least a reasonably clear direction for the required decision, reported with an equally clear sense of risk assessment.
I’ll focus on a small set of guidelines: (1) simplicity, (2) transparency, (3) robust and unbreakable analyses, all providing results that are (4) transportable beyond the specific data set from which they were extracted. These guidelines are then applied to three parts of the process mentioned above: (1) survey construction, specifically measurements, and sampling plan (a good bit of pressure needs to be applied here to capture those innocent data), (2) analysis, (3) reporting results.
Sugnet Lubbe (Website)
MuViSU (Centre for Multi-Dimensional Data Visualisation), Department of Statistics and Actuarial Science, Stellenbosch University, South Africa
Visualising and Exploring Multi-Dimensional Sensory Data
The biplot, as introduced by Gabriel in 1971, is a very useful tool for visually exploring multi-dimensional data sets. Here we will take a different point of view, considering the biplot as a multi-dimensional scatterplot. Different data sets from the literature and sensory analyses of wine will be used to illustrate the use and characteristics of biplots. Attention will be given to incorporating ordinal categorical data, mixtures of numerical and categorical data, multi-way data as well as grouped data sets.
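To make the idea of a biplot as a multi-dimensional scatterplot concrete, here is a minimal sketch of one common construction, the PCA biplot, computed with a singular value decomposition. It is an illustration under stated assumptions and not necessarily the construction used in the talk: the function names `pca_biplot_coordinates` and `predict_from_biplot` and the small data matrix are hypothetical, and the numbers are invented purely for the example (they are not sensory measurements).

```python
import numpy as np


def pca_biplot_coordinates(X, n_dims=2):
    """Return sample points and variable-axis directions for a PCA biplot.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # centre each variable (column)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt.T[:, :n_dims]                 # row j gives the direction of the axis for variable j
    Z = Xc @ V                           # row i gives the plotted point for sample i
    return Z, V


def predict_from_biplot(Z, V, column_means):
    """Approximate the original values by projecting points onto the variable axes."""
    return Z @ V.T + column_means


# Invented data: 6 samples measured on 3 variables, built so that the
# centred matrix has rank 2 and a two-dimensional biplot loses nothing.
scores = np.array([[1.0, 0.5], [-1.0, 0.2], [0.3, -0.7],
                   [0.8, 0.9], [-0.6, -0.4], [-0.5, -0.5]])
directions = np.array([[2.0, 1.0, 0.0],
                       [0.0, 1.0, 3.0]])
X = scores @ directions + np.array([10.0, 20.0, 30.0])

Z, V = pca_biplot_coordinates(X, n_dims=2)
X_hat = predict_from_biplot(Z, V, X.mean(axis=0))
```

Plotting the rows of `Z` as points and the rows of `V` as calibrated axes gives the scatterplot reading of the biplot: each sample's value on a variable is read off by projecting its point onto that variable's axis. With real data of higher rank, `X_hat` would only approximate `X`.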
Andrea Ahlemeyer-Stubbe (Website)
Director Strategic Analytics, servicepro GmbH, and owner of Data Mining & More, Gengenbach, Germany
AI - wonder weapon or hype
In the field of mass data processing, Industry 4.0, text and image processing and in marketing, especially in dealing with customer behaviour and customer needs, AI is a key to success. The advantages of automation, speed and a high degree of individualisation contribute to this. A special aspect is the creation of artificially generated but seemingly real statements/images. The application possibilities are manifold and will generate new insights now and in the future, especially through simulations.
But unfortunately, AI is often only used as a prestige-promoting buzzword and the actual processes behind it are neither new nor particularly spectacular. The keynote offers possibilities for classification and demarcation, which arise from the area of tension between the expert-based, classical, experimental design and statistical methods and the approach of AI. Particularly in areas of application that in the past often worked with very small data sets derived from controlled surveys and used the large spectrum of statistical methods, it is necessary to weigh up how and whether AI can currently make an additional contribution to success.