The manifesto: Computer science is culturally appropriating statistics and how it must be stopped.

No, the title of this post is not a joke.

Yes, I do think that computer science is culturally appropriating statistics.

And yes, I do think that is bad.

This post first defines cultural appropriation in the context of applied science disciplines, then describes why it is undesirable, and finally offers my suggestions as to how to stop the appropriation. The third part is still a work in progress, as evidenced by the continued appropriation of statistics.

Cultural appropriation is when one culture or group harvests elements of another, generally in such a manner that the original purposes of the elements are warped or lost. It is also common for the appropriated cultural elements to be viewed as simulacra. While the concept of authenticity is dumb, it seems self-evident (i.e., I can’t readily prove it) that copies of a thing, whether shoddy or accurate, devalue the original. Cultural appropriation also seems to occur only when there is a power imbalance (real or perceived) between the originator and the appropriator of the element. Normally it is the under-powered group that has its elements appropriated by the over-powered group. (When the cultural element is appropriated by the under-powered group, it seems to be called cultural imperialism.) Cultural appropriation is quite common in food, sport, music, or frankly everything [citation needed].

In the case of academic disciplines the concept of cultural appropriation is more tenuous, largely because there is no clear structure that gives one discipline power over another. In addition, using analogy to apply solutions from one discipline to problems in another is healthy, even if there are a few clear examples where this sort of process leads to madness. However, in the specific case of computer science appropriating statistics there is a power divide: money. In addition, computer science has an internal culture of libraries, i.e., using black-box systems that fulfill stated purposes. This results in statistical processes being used without a clear understanding of them, and in particular of their limitations. I know this sounds condescending: “only I know how to do anything right”. However, this is the simulacra part: many statistical algorithms will converge to an answer even when the answer is garbage, and common implementations of these algorithms provide no tools for diagnostics. And finally, computer science is creating its own newspeak for statistics. The term machine learning is a perfect example of this. Machine learning is parameter estimation and model selection. Why is a new term necessary, especially as I have seen articles that describe linear regression as machine learning? (Least squares, the method underlying linear regression, was published by Legendre in 1805.) Just like the 1994 reinvention of calculus, this is unsettling.
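To make the "converges to an answer even when the answer is garbage" point concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are made up (the true relationship is deliberately quadratic), and the helper names fit_line, r_squared, and residual_sign_changes are hypothetical, not taken from any library. The sign-change count is only a crude stand-in for looking at a residual plot.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Note that it returns an answer for any data with varying x,
    whether or not a straight line is a sensible model.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def residual_sign_changes(xs, ys, intercept, slope):
    """Crude diagnostic: how often residuals flip sign along sorted x.

    Residuals from an adequate model flip sign often; long runs of
    one sign suggest the model is missing structure in the data.
    """
    pairs = sorted(zip(xs, ys))
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    signs = [r > 0 for r in residuals if r != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)


# Made-up data: the truth is quadratic, but we fit a line anyway.
xs = list(range(11))
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
changes = residual_sign_changes(xs, ys, intercept, slope)

print(f"intercept={intercept:.1f} slope={slope:.1f} R^2={r2:.3f}")
print(f"residual sign changes: {changes} across {len(xs)} points")
```

The fit happily reports a slope and an R^2 of about 0.93, which looks respectable on its own. The residuals, however, are positive at both ends and negative in the middle, flipping sign only twice across eleven points: a U shape that says the straight line is the wrong model. Nothing in the fitting routine itself would have flagged that.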

Even given buy-in to the idea that computer science has appropriated and is still appropriating statistics, there is still the natural question: “why is that bad?”

Well, I am glad you asked!

As already noted, automating statistical processes leads to superficial analysis. In addition, the influx of computer science people leads to stupid things like piping in RStudio (or frankly RStudio itself; the birds that ate the worms that will have eaten my corpse will be long dead (because of climate change?) before I stop using the command line in the R GUI). Hyperbole aside, I think that great programmers are interested in programming: they enjoy new syntax, mechanics, etc. In data analysis, however, the point is the data, not the tool. Simple linear regression or GLM regression is in many cases sufficient to solve problems that are foisted onto neural nets because neural nets are trendy. And most importantly, it makes me feel sad!

To combat this appropriation, I think the newspeak must first be dismantled. To this end I suggest “inversion” instead of “machine learning”, and “statistical analyst” instead of “data scientist”. I also propose highlighting the diagnostic procedures that go with inversions instead of just the algorithms (I am clearly guilty of not doing this). Finally, I propose that the purpose of an inversion be highlighted above the tools used to accomplish that purpose.

Tune in next time, same Bayesian time, same Bayesian channel, for more crazy rants!