References
[goodfellow2016deep] - Goodfellow, Ian and Bengio, Yoshua and Courville, Aaron - Deep learning. - 2016.
[Rosenblatt1958ThePA] - Rosenblatt, Frank - The perceptron: A probabilistic model for information storage and organization in the brain. - 1958.
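This entry carries no abstract, so a brief illustration may be useful: the paper introduced the perceptron learning rule, which updates a weight vector only on misclassified examples. Below is a minimal sketch in Python/numpy on an invented toy dataset (all names and values here are illustrative, not taken from the paper):

    import numpy as np

    # Toy linearly separable data with labels in {-1, +1} (invented for illustration).
    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
    y = np.array([1, 1, -1, -1])

    w, b, lr = np.zeros(2), 0.0, 1.0  # weights, bias, learning rate

    # Classic perceptron rule: update only when a point is misclassified.
    for epoch in range(100):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # wrong side of (or on) the boundary
                w += lr * yi * xi
                b += lr * yi
                errors += 1
        if errors == 0:  # converged: every point classified correctly
            break

    print(w, b)  # parameters of a separating hyperplane for this toy set

On linearly separable data this loop is guaranteed to terminate (the perceptron convergence theorem); on non-separable data it cycles forever, which foreshadows the Minsky and Papert entry below.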
[nakkiran_deep_2019] - Nakkiran, Preetum and Kaplun, Gal and Bansal, Yamini and Yang, Tristan and Barak, Boaz and Sutskever, Ilya - Deep Double Descent: Where Bigger Models and More Data Hurt. - 2019.
Summary/Abstract
We show that a variety of modern deep learning tasks exhibit a double-descent phenomenon where, as we increase model size, performance first gets worse and then gets better. Moreover, we show that double descent occurs not just as a function of model size, but also as a function of the number of training epochs. We unify the above phenomena by defining a new complexity measure we call the effective model complexity and conjecture a generalized double descent with respect to this measure. Furthermore, our notion of model complexity allows us to identify certain regimes where increasing (even quadrupling) the number of train samples actually hurts test performance.
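The model-size double descent described in this abstract can be reproduced at toy scale. The sketch below (Python/numpy, with invented sizes and a random-ReLU-feature model standing in for the paper's deep networks) fits minimum-norm least squares while sweeping the number of features past the interpolation threshold; test error typically rises toward p = n_train and falls again beyond it:

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, d = 40, 500, 5  # illustrative sizes, not the paper's setup

    # Synthetic linear-plus-noise regression task.
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    w_true = rng.normal(size=d)
    y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n_train)
    y_te = X_te @ w_true

    def relu_features(X, W):
        # Random ReLU feature map; the number of columns of W is the "model size".
        return np.maximum(X @ W, 0.0)

    for p in [5, 10, 20, 40, 80, 160, 320]:
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
        # lstsq returns the minimum-norm interpolating solution once p > n_train.
        beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
        mse = np.mean((Phi_te @ beta - y_te) ** 2)
        print(f"p={p:4d}  test MSE={mse:8.3f}")  # peak expected near p = n_train

Averaging over several random seeds makes the peak at the interpolation threshold more reliable; a single seed can be noisy.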
[belkin_reconciling_2019] - Belkin, Mikhail and Hsu, Daniel and Ma, Siyuan and Mandal, Soumik - Reconciling modern machine learning practice and the bias-variance trade-off. - 2019.
Summary/Abstract
Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. In this paper, we reconcile the classical understanding and the modern practice within a unified performance curve. This double descent curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. We provide evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and we posit a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.
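The "interpolation" this abstract refers to has a concrete closed form in the linear case, where double descent is most often analyzed. With n training points, p > n parameters, and a design matrix X of full row rank, infinitely many weight vectors fit the data exactly; the one studied in this literature is the minimum-norm interpolant (a standard formulation, not an equation quoted from the paper):

    \hat{\beta} = \arg\min_{\beta \in \mathbb{R}^{p}} \|\beta\|_{2}
    \quad \text{subject to} \quad X\beta = y,
    \qquad \text{with solution} \qquad
    \hat{\beta} = X^{\top}\left(XX^{\top}\right)^{-1} y .

Adding features past p = n enlarges the feasible set (old solutions survive, padded with zeros), so the norm of \hat{\beta} can only decrease; this implicit regularization is one proposed mechanism for the second descent.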
[mcculloch_logical_1943] - McCulloch, Warren S. and Pitts, Walter - A logical calculus of the ideas immanent in nervous activity. - 1943.
Summary/Abstract
Because of the “all-or-none” character of nervous activity, neural events and the relations among them can be treated by means of propositional logic. It is found that the behavior of every net can be described in these terms, with the addition of more complicated logical means for nets containing circles; and that for any logical expression satisfying certain conditions, one can find a net behaving in the fashion it describes. It is shown that many particular choices among possible neurophysiological assumptions are equivalent, in the sense that for every net behaving under one assumption, there exists another net which behaves under the other and gives the same results, although perhaps not in the same time. Various applications of the calculus are discussed.
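The net-to-logic correspondence in this abstract is simple to demonstrate: a McCulloch-Pitts unit emits 1 exactly when its weighted input sum reaches a threshold, and suitable weights and thresholds realize the propositional connectives. A minimal Python sketch (the encoding below is one standard convention, not notation from the paper):

    def mp_neuron(inputs, weights, threshold):
        # McCulloch-Pitts unit: binary output, fires iff the weighted sum
        # of binary inputs meets the threshold ("all-or-none" activity).
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

    def AND(a, b): return mp_neuron([a, b], [1, 1], threshold=2)
    def OR(a, b):  return mp_neuron([a, b], [1, 1], threshold=1)
    def NOT(a):    return mp_neuron([a], [-1], threshold=0)  # inhibitory input

    # Composing units yields any loop-free propositional formula, e.g. NAND:
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, NOT(AND(a, b)))

Nets "containing circles" (feedback loops), which the abstract flags as needing more complicated logical machinery, correspond to adding state: the expressions must then refer to activity at earlier times.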
[WERKER198449] - Werker, Janet F. and Tees, Richard C. - Cross-language speech perception: Evidence for perceptual reorganization during the first year of life. - 1984.
Summary/Abstract
Previous work in which we compared English infants, English adults, and Hindi adults on their ability to discriminate two pairs of Hindi (non-English) speech contrasts has indicated that infants discriminate speech sounds according to phonetic category without prior specific language experience (Werker, Gilbert, Humphrey, & Tees, 1981), whereas adults and children as young as age 4 (Werker & Tees, in press) may lose this ability as a function of age and/or linguistic experience. The present work was designed to (a) determine the generalizability of such a decline by comparing adult English, adult Salish, and English infant subjects on their perception of a new non-English (Salish) speech contrast, and (b) delineate the time course of the developmental decline in this ability. The results of these experiments replicate our original findings by showing that infants can discriminate nonnative speech contrasts without relevant experience, and that there is a decline in this ability during ontogeny. Furthermore, data from both cross-sectional and longitudinal studies show that this decline occurs within the first year of life, and that it is a function of specific language experience.
[10.5555/50066] - Minsky, Marvin L. and Papert, Seymour A. - Perceptrons: expanded edition. - 1988.
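This book is best known for its limitative results, e.g. that no single-layer perceptron computes XOR (the two-input parity function), because no line separates the inputs mapped to 1 from those mapped to 0. The brute-force check below illustrates the claim on a coarse weight grid (a sketch invented here for illustration; the book's actual arguments are mathematical, not grid searches):

    import itertools
    import numpy as np

    X = [(0, 0), (0, 1), (1, 0), (1, 1)]
    y_xor = [0, 1, 1, 0]  # XOR truth table

    def realizes_xor(w1, w2, b):
        # Does the threshold unit step(w1*x1 + w2*x2 + b) reproduce XOR?
        return all(int(w1 * x1 + w2 * x2 + b >= 0) == t
                   for (x1, x2), t in zip(X, y_xor))

    # No parameter setting on this grid works, and none exists at any
    # resolution: the four points are not linearly separable.
    grid = np.linspace(-2.0, 2.0, 41)
    found = any(realizes_xor(w1, w2, b)
                for w1, w2, b in itertools.product(grid, repeat=3))
    print("single-layer unit computing XOR found:", found)  # False

A two-layer net escapes the limitation, e.g. XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))), which connects this entry back to [mcculloch_logical_1943].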