Wednesday, January 15, 2014

Data Use and Ethics

I've been working on this piece for a few months now trying to refine my points to create something coherent and concise.  It's hard to say how successful I've been at either.  I think the final impetus for finishing it was certain topics in this post about a session at the Joint Statistical Meetings, this post about data scientists vs statisticians, and a discussion on MPR about the recent data hack at Target. 

As an individual just on the periphery of graduate level statistics, and someone from a profession where ethics are prominent and a part of almost every discussion, my concerns are somewhere between philosophical and questioning.  As a professional consumer of data analysis and statistics, and an occasional creator, I am concerned and confused about some of the seemingly inadequate elements of both statistics and data use culture.  I think some of my concerns are complicated by the transdisciplinary nature of statistics and data analysis; so many people using them in so many different fields of inquiry.  I also recognize that my concerns could potentially be lobbed at any number of professions and schools of thought.  I would add, however, that ethical and social implications of any practice should be examined widely and openly for any profession, field of inquiry, school of thought, or general hobby.  Question and lob away as far as I'm concerned.

First, as far as "statisticians" and "data scientists" go, as an outsider, it's like Spy vs. Spy.  I would contend, though, that statisticians who work as educators have a significant hand in producing budding "data scientists" anytime they are working with undergraduate students.  Most of the students are not going to be statisticians, and they may only need the bare minimum in statistics to move on in their field.  They learn a few cool ways to make visualizations in R, or less cool visualizations in SPSS, and they go on their way.  Collecting data is unbelievably easy.  Also, finding data to play around with is equally easy.  Understanding the deeper concepts and responsibilities related to analysis, inference, and interpretation, that's something else.  I do understand that part of educating the masses is through undergraduate education; however, higher education is a privileged experience.

I have a few questions and thoughts surrounding the professional culture of statistics and data use.  I think my general questions/concerns are: 
  • Is there a social and/or ethical obligation to provide education on a given subject to the general population by individuals trained in such subjects?  For example, what level of action and advocacy for accuracy, open discussion, and education falls on learned individuals from specific disciplines regarding society at large?
  • If information is created to be consumed, should the consumers be educated?
  • Also, where does the education regarding the ethical use of statistics and data begin? 

How and Why?

I struggle with the notion that any field of inquiry is not responsible for a broad dissemination and defense of the information it produces, resulting in education of the masses. Not only is it good for society, it protects the discipline from misinformation about what it is they do every day and why they are important to society. In addition to this, society provides the data that data scientists use in their work. Educating people about the role they play in another individual's livelihood seems reasonable. Statisticians and data analysts use public (or private) data to make all sorts of inferences for all sorts of purposes, and they profit from it (sometimes). Why is educating the subjects of their analyses any different than educating a corporate client about the analyses performed and conclusions drawn?

Additionally, the argument that society at large should get no further consideration because everyone benefits from changes or opportunities that result from decisions based on statistical analyses is weak. It may be true to a degree, but it lacks sufficient nuance to capture the whole situation. How can we be educated consumers if we have no foundation to think critically or to appropriately critique information provided to us? Or to critique the methods by which our personal data is collected? (Ahem, Target shoppers and Gmail users.) Is it possible to make an informed decision about something without having a basic understanding of what that something is? If our educators have gaps in their knowledge, how large is the gap they leave in the knowledge of students they teach?

In the past, the general population was guided by the intuition of individuals either naturally gifted or trained (sometimes both) to use that intuition for inquiry. Currently, as a society we are pushing the concept of omnipresent data collection, analysis, and interpretation. There continues to be some reliance on the "expert" knowledge of statisticians and trained analysts, but it seems that society is moving toward a more self-directed, self-informed, corporation-supported conceptualization of data collection and use. While there are benefits to this, there are equally valid and concerning problems as well. How do we ensure data collection, use, and dissemination are done ethically and in an informed manner? Where does it start? I do not think that statisticians and data analysts are the general cause of data misuse and unethical behavior. I do, however, believe that they are uniquely positioned to mitigate the damage and harm done by the real trouble sources.

It seems likely that most formalized education in applied statistics and data work is grounded in being aware of the assumptions being made regarding the analyses and being prepared to support your position with sound theoretical or substantive reasoning. That's certainly been the bulk of my training in statistics, which by general consensus has been excellent at the graduate level. But what about the steps before and after the analysis? Ethics should be an element of all parts of an inquiry or process. Data is absolutely tied to a context, and when that context is not attended to, the possibility of erroneous conclusions and eventual harm becomes more likely. 

I should state that my idea of ethics in conjunction with data use and statistics includes ethical considerations given to: whether or not data should even be collected, why and how the data are collected and stored, why and how the data are analyzed, why and how the data are presented in this way or that, why and how the "findings" and "results" are used, and perhaps the most seemingly overlooked consideration, the level of education and advocacy surrounding the appropriate and responsible use of data. 

I recognize that oftentimes statisticians are brought in after data has been collected, and they have no say in the methods used. However, with support and persistence, a culture of responsible data, especially big data, could be fostered. And who better to cultivate it than the people who have to work with the data?  I cannot count the number of times I've heard comments on the quality of data people obtain and the grumbling that comes with having to get it in proper order for analysis.

Who and When?

The American Statistical Association has published ethical guidelines that outline a number of expectations for individuals working in the field of statistics. Some of the statements regarding ethical obligations and professional citizenship include:
  • Support for improved public understanding of and respect for statistics.
  • Exposure of dishonest or incompetent uses of statistics.
  • "...all practitioners of statistics, whatever their training and occupation, have social obligations to perform their work in a professional, competent, and ethical manner."
  • "Before participating in a study involving human beings or organizations, analyzing data from such a study, or accepting resulting manuscripts for review, consider whether appropriate research subject approvals were obtained...Consider also what assurances of privacy and confidentiality were given and abide by those assurances."
The American Statistical Association states that students should be encouraged to follow its ethical guidelines. In practice, are they? How are statisticians and the potential data scientists they create supposed to act in accordance with these guidelines in an ethical manner if they are not aware of them? Some time ago I asked a friend, a statistics educator, how students were exposed to ethical reasoning and practice in his program. The response was that it is the student's responsibility to seek that information out. It was acknowledged that there are discussions around appropriate research design and how you treat participants in research, etc., but that was essentially it. He also implied, by referencing the implementation of ethics courses at Harvard and the lack of apparent change in ethical operating in the business world, that education around ethics and ethical reasoning has little impact. I wondered how that might change if all business programs included an overt culture of ethical reasoning and action as well as ethics courses. It seems that all fields of scientific inquiry suffer from the deluded notion that their own impatience with the rate of measurable or otherwise demonstrable change equates to no change. The drive to find "significant" results at all costs has derailed many academic careers, and negatively impacted society at large (Who wants a vaccination?). It also perfectly exemplifies the need for ethics training.

Ethical guidelines are created to protect other people from you and your profession or discipline. They are also created as a form of recourse for other professionals to be gatekeepers in their profession or discipline. If those are not reason enough for ethical guidelines and training, the fact that established guidelines can provide recourse for consumers who experience unethical behavior by giving a framework for what to expect should be considered. The establishment of, and education around, ethics will not eliminate unethical behavior, and I suppose it could be argued that it does not necessarily reduce it either. However, it does allow others (professionals and consumers) to be informed and mindful.

The requirement of obligatory ethics courses and an overt culture of ethics has certainly had an impact in the field of Counseling Psychology. For example, in my current program from the first day of classes we are informed we are bound by the ethical principles of the American Psychological Association. Our syllabi include this statement, and it is highlighted in our program handbooks at both the Master's and PhD level. We are also informed that failure to comply with these guidelines is grounds for dismissal from the program (and later from the profession). Counselors in training are required to take courses in professional ethics, and then continuing education once we are licensed. Does this prevent all unethical behavior? Absolutely not. Some of it? Maybe. Perhaps the ultimate point of ethical guidelines is not prevention, but the creation of informed consumers and colleagues...

A Snapshot?

For the sake of an example in comparison to Counseling Psychology, I'll use the resident Quantitative Methods program in my department. (This is a small sample to be sure, and it is hopefully not representative of all Quant. programs around the world.) However, going local, according to the information provided openly to prospective students and the information on their website, it appears that there are no such parallel indications or requirements for ethics. While the general program handbook makes one mention of professional ethics associated with research methodology, there are no courses on ethics, and the single comment on professional ethics is only stated for students at the PhD level. This is concerning given that MA level individuals are just as likely to have contact with the general public as PhD level data scientists, if not more. Neither the Master's handbook nor the Doctoral handbook makes a single statement involving the word "ethics" or the ethical code to which students are bound during their course of study. The handbooks also fail to indicate where students should look for professional guidance on ethics. None of the course descriptions provided on the website include the word "ethics" or indicate how ethical reasoning and principles are applied to the work of individuals working in quantitative fields.

Most individuals are born with the ability to learn language, yet we go through years of formal education to learn to communicate effectively.  Athletes are typically born with the ability to run, swim, jump, breathe, kick, throw, etc.  However, they also spend years in training to enhance their performance.  Statisticians and data scientists are likely naturally skilled with numbers and good internal visualizations of data; why have advanced degrees if it is something they can learn to cultivate on their own?  "Critical thinking" skills are often a target skill for universities to "improve".  If we can improve critical thinking performance through education, why not ethical reasoning?

This whole topic is such a bear: