Machine learning is transforming software development. Can academia keep pace?
On paper, new disciplines in computer science and electrical engineering, such as deep learning, facial recognition, and advanced graphics processing, look easy for universities to exploit as they update their STEM curricula. After all, the business press is awash with gushing propaganda on vertical applications for neural networks and pattern-recognizers exploiting big data sets. A broad-based academic institution could pick domains in which it excels, such as medicine or industrial automation, and apply emerging chip and subsystem architectures to the writing of dedicated applications for those vertical domains.
Easy, right? The model has worked before in communications and embedded processing.
But academia will hit fundamental speed bumps in coming years that could stymie efforts to develop effective curricula. The most obvious problem is the simple expansion of high-level software languages in vertical domains. We’ve already seen the colloquial wonderland of “coding,” which has been the magical mantra for institutions wishing to excel in software, as well as for parents of students wanting their offspring to get a good job as a code jockey. However, as AI penetrates further, all that touted training in coding will no longer be that important.
This trend toward generative software modules does not mean the death of traditional programming languages, but rather the relegation of such code to an embedded-like status. In fact, the danger in relying on high-level modules and non-programmed training platforms is that the collective knowledge of writing in compiled or object-oriented languages could fade, just as the knowledge of BASIC or FORTRAN has largely disappeared.
Hence, educators must strike a balance between preserving an institutional knowledge of high-level programming and disabusing students and families of the idea that a lucrative career can be built around C++ or Perl expertise.
We don’t know what we know and don’t know
A far bigger problem is only beginning to dawn on engineering departments. Academic institutions, like non-profit corporations and government agencies, face growing demands for compliance testing, outcome-based quantification of results and the ability to replicate course work for applications in new domains.
Yet leading AI researchers cannot explain how their systems reach optimal results for difficult problems.
In some early use cases like autonomous vehicles, the U.S. Department of Transportation has encountered the simplest version of the problem with “black box” data inputs. If the data sets do not correspond to the real world in any one dimension, or along multiple vectors, a system like a self-driving car could fail catastrophically. This challenge of “adequacy and completeness of a large multivariate data set” can be a deal-breaker for deep learning in several vertical domains, yet it is only the simplest version of a far bigger problem stemming from the very nature of neural networks.
The type of neural network that has been most successful in untrained learning uses multiple hidden layers of convolutional connections among many simulated neurons. AI researchers can explain in general terms the type of back-propagation and genetic algorithms used within the hidden layers, but they can’t explain in detail how the neural network reaches its conclusions, or how the network adjusts its synaptic weights to improve its answers.
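The "general terms" are easy enough to write down. The sketch below is a toy illustration, not any researcher's actual system: a network with one hidden layer, trained by back-propagation on the XOR function. The layer size, learning rate, seed, and helper names (`make_weights`, `forward`, `train_step`) are all assumptions chosen for brevity.

```python
import math
import random

def make_weights(n_inputs, n_hidden, seed=0):
    """Small random starting weights (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    """Return the hidden activations and the output for one input vector."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, output

def loss(data, w_hidden, w_out):
    """Mean squared error over the whole data set."""
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train_step(data, w_hidden, w_out, lr=0.05):
    """One pass over the data, updating weights by back-propagation."""
    for x, y in data:
        hidden, output = forward(x, w_hidden, w_out)
        err = output - y  # gradient of the squared error, up to a factor of 2
        # Error passed back to each hidden unit through the tanh derivative,
        # computed before the output weights are changed.
        deltas = [err * w_out[j] * (1 - hidden[j] ** 2) for j in range(len(hidden))]
        for j in range(len(hidden)):
            w_out[j] -= lr * err * hidden[j]
            for i in range(len(x)):
                w_hidden[j][i] -= lr * deltas[j] * x[i]

# XOR inputs, with a constant 1.0 appended to act as a bias term (an assumption).
XOR = [((0.0, 0.0, 1.0), 0.0), ((0.0, 1.0, 1.0), 1.0),
       ((1.0, 0.0, 1.0), 1.0), ((1.0, 1.0, 1.0), 0.0)]

w_hidden, w_out = make_weights(n_inputs=3, n_hidden=4)
before = loss(XOR, w_hidden, w_out)
for _ in range(2000):
    train_step(XOR, w_hidden, w_out)
after = loss(XOR, w_hidden, w_out)
print(f"loss before training: {before:.3f}, after: {after:.3f}")
```

Every line of the update rule is specified, yet the numbers left in `w_hidden` after training say nothing obvious about why the network answers as it does. That gap only widens as layers and neurons multiply.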
In short, the best engineering and mathematical minds in the industry have no idea why neural networks work so well, and there is little prospect of human AI experts understanding this in the future, let alone explaining the results to a lay audience.
Computer historian George Dyson says this concept is fundamental to deep learning. As he puts it: “Any system simple enough to be understandable will not be complicated enough to behave intelligently, while any system complicated enough to behave intelligently will be too complicated to understand.”
Many of his colleagues in the AI community have informally dubbed this “Dyson’s Law,” and some worry that the search for “explainable AI” may itself be an example of the type of math problem called nondeterministic polynomial-time complete, or NP-complete. (Such problems are akin to the traveling-salesman problem, where the difficulty of identifying an optimal route rises exponentially with each additional data point, so that no known method can solve large instances in what is known as polynomial time.)
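To see how quickly the traveling-salesman search space grows, consider a brute-force solver. The sketch below is illustrative only: the function names and the four-city example are assumptions, and it says nothing about whether explainable AI is in fact NP-complete.

```python
from itertools import permutations
from math import dist, factorial

def tour_count(n):
    """Distinct round trips through n cities when distances are symmetric."""
    return factorial(n - 1) // 2 if n > 2 else 1

def shortest_tour(cities):
    """Check every tour that starts and ends at cities[0]; keep the shortest."""
    start, rest = cities[0], cities[1:]
    best_length, best_route = float("inf"), None
    for order in permutations(rest):
        route = (start, *order, start)
        length = sum(dist(a, b) for a, b in zip(route, route[1:]))
        if length < best_length:
            best_length, best_route = length, route
    return best_length, best_route

# Hypothetical example: four cities on the corners of a unit square.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
length, route = shortest_tour(square)
print(length, route)

for n in (5, 10, 15, 20):
    print(n, "cities:", tour_count(n), "possible tours")
```

Ten cities yield 181,440 possible tours. Twenty yield roughly 6 x 10^16, which is why exhaustive search stops being practical almost immediately.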
If an EE professor with a deep background in math cannot understand how a deep-learning platform works, how will such a gap in knowledge be greeted by department heads controlling purse strings?
An algorithm researcher with a background in large data sets worked on a contract basis with a Midwestern university that was attempting to apply broad pattern recognition to the retrieval of medical records. (The researcher asked not to be identified for fear of retribution.) She blames the nature of the industry as much as the over-ambition of the university administrators.
“Are department heads trying to jump on a new trend with only incomplete knowledge of how it might be applied to their discipline? Of course. They always do,” she said. “But deep-learning AI is unique because the answers that will be demanded by a university’s audit staff don’t exist anywhere. The best minds around say, ‘This seems to work, but we don’t really know how the hell it does,’ and that just isn’t an answer that’s going to be accepted by many administrators.”
A startup founded by University of Waterloo researchers hopes to simultaneously make individual neural layers more transparent and easier to explain, while also giving academia another metric to ease the development of vertical applications in deep learning. DarwinAI, founded by Alexander Wong, uses a method called “generative synthesis” to optimize synaptic weights and connections for small, reusable “modules” of neural nets.
DarwinAI touts its toolkit not only as a faster way to develop vertical applications, but also as a way to understand how a vertical deep-learning app works, and to make those apps more practical to develop for small companies and academic institutions. Intel Corp. has validated the generative synthesis method and is using it in collaboration with Audi on autonomous vehicle architectures. The early involvement of Intel and Audi, however, shows that academia must jump quickly if it wishes to exploit toolkits that seek to level the playing field.
Who controls the research?
The lack of accountability does not stymie all academic research, as attested by dozens of new research grants sponsored by the National Science Foundation along with private institutions. Still, grant winners worry they are gleaning mere scraps in a discipline that favors the large existing data centers of Microsoft, Facebook, Alphabet/Google, and Amazon Web Services. A Sept. 26 report in The New York Times, citing in part a review of the field from the Allen Institute for AI, suggests that universities will always lag in AI because they cannot approach even a fraction of the compute power of centralized corporate data centers. Part of the problem stems from the exponential growth in the number of calculations necessary to run basic tasks in deep learning, which the Allen Institute and OpenAI estimate has soared by a factor of 300,000 in six years.
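A little arithmetic puts that 300,000-fold figure in perspective. The sketch below assumes the growth was steadily exponential over the full six years, which is a simplification rather than something the estimate itself states; the function names are illustrative.

```python
import math

def doubling_time_months(growth_factor, years):
    """Months per doubling, assuming steady exponential growth (an assumption)."""
    return years * 12 / math.log2(growth_factor)

def annual_multiplier(growth_factor, years):
    """Equivalent constant year-over-year growth factor."""
    return growth_factor ** (1 / years)

months = doubling_time_months(300_000, 6)
per_year = annual_multiplier(300_000, 6)
print(f"doubling time: {months:.1f} months")
print(f"growth per year: {per_year:.1f}x")
```

Under that assumption, compute demand doubles roughly every four months, or about eightfold a year. A hardware budget that merely doubles every couple of years falls behind almost at once.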
Even the neutral arbiters are falling under corporate control. The Allen Institute, founded by Microsoft co-founder Paul Allen, has remained independent, but OpenAI, another analytical AI firm financed by Elon Musk and other leading executives, became a for-profit company in early 2019, receiving a $1 billion investment from Microsoft as a means of gaining a guaranteed source of compute power. OpenAI remains respected for its work in analyzing platforms, but outside analysts worry about its continuing independence.
NSF has meanwhile tried to encourage more independence among software developers, where academia’s role can in theory be protected. The National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign recently received a $2.7 million grant from NSF to develop a deep-learning software infrastructure, though even this effort requires tight links with IBM and Nvidia. Even where NSF finances university centers, corporate partners can still skew the research. The strong role played by Intel, Mellanox, Dell EMC, IBM, and Nvidia in the Texas Advanced Computing Center at the University of Texas at Austin is another example.
Waiting for clarity – and explanations
The incremental steps made by startups such as DarwinAI may be harbingers of a new era where deep learning is understood on a finer-grained level, and can be exploited more easily by academic researchers. Unease over the direction and control of AI research may eventually turn out to be as overblown as Defense Department dominance of tube-based computing 70 years ago.
What is clear, however, is that traditional academic models for incorporating CS/EE projects into overall STEM goals are sorely in need of an update. Software education programs based on traditional high-level languages may be so outdated that they could well be scrapped, or retained only as a means of providing programmers with historical background.
For now, universities are unlikely to cobble together realistic deep-learning research projects until better means of explaining the results of neural-network training emerge. Even after these platforms become more understandable, universities will have to seek out vertical domains and specialized optimization methods in which their projects need not compete head-to-head with corporate data centers, whose computing resources far exceed those of even the best-endowed universities.