Welcome. Let’s get into this survey of artificial intelligence, one of the most pervasive technologies of our time.
Artificial intelligence is the creation of machines with intents, purposes, and qualities that closely resemble those of another intelligent construct, such as a human. Taken at its broadest, this definition stretches to cover any conceivable form of intelligent life.
The topic has been around since long before the modern era; see, for example, Descartes (1637).
Descartes on AI
Pioneers like Marvin Minsky, often called the godfather of AI, shaped the field in its earliest stage (1940s-1960s).
The concept of the layered brain - the \(A,B,C\)-brain (Marvin Minsky, 1956)
The Society of Mind - according to Marvin Minsky
So, what is the general idea? What kind of construct can bear the role of artificial intelligence?
The general concept of artificial intelligence - the agent and the world
Using such a construct, we can investigate the system and, ideally, deduce processes to be used accordingly.
A sophisticated reflex agent
A learning agent
As usual, this looks like a kind of mathematical modelling, and indeed it is. With a structure conforming to \(M\) and \(S\) of the system, and \(Q\) depending on the environment, we effectively obtain a mathematical modelling system with some added quirks: the ability to recognize, to remember, to ‘self-process’, and to adapt, that is, to learn. Theoretically.
One of the most important aspects of such a model, for any given application, is the ability to think and to learn. This is encoded in the theory of machine learning.
The construct of an ML system
Here is our point:
Machine learning can be thought of as a fundamentally powerful method for automatic modelling. Or, put more simply, it is a way to make “programs without being explicitly programmed”.
What does that mean? Before machine learning, mathematical modelling was static and intrinsically rule-heavy.
The typical mathematical modelling construct
Let’s take an example: the population of a given world.
Suppose the world has population \(p(t)\), so that \(dp/dt\) is its rate of change over time. If we take \(dp/dt=ap\) for an arbitrary \(a\in \mathbb{R}\):
The simple model of growth
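As a quick aside (a hypothetical sketch, not part of the slides), this model can be integrated numerically and checked against its closed form \(p(t)=p_0 e^{at}\); the values of \(p_0\), \(a\), and the step count are arbitrary:

```python
import math

def euler_growth(p0, a, t_end, steps):
    """Integrate dp/dt = a*p with the forward Euler method."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * a * p
    return p

# The exact solution is p(t) = p0 * exp(a*t); Euler should approach it.
p_numeric = euler_growth(p0=100.0, a=0.1, t_end=10.0, steps=10_000)
p_exact = 100.0 * math.exp(0.1 * 10.0)
```

Shrinking the step size brings the numerical answer arbitrarily close to the exact one, which is exactly the sense in which the classical model is "static": everything is pinned down by the hand-written rule \(dp/dt=ap\).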
If, we introduce the concept of birth and death, then for \(b(t)\) and \(d(t)\),
\[ dp/dt = b(t) - d(t)\longleftarrow\begin{cases} b(t) = 2 + \sin{(t)} \\ d(t) = 1 + 0.5 \cos{(t)} \end{cases} \]
The population, with arbitrary death and birth
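The birth-death model admits the same treatment. A minimal sketch (initial population and horizon are arbitrary choices), comparing forward Euler against the closed-form antiderivative \(p(t) = p_0 + t + (1-\cos t) - 0.5\sin t\):

```python
import math

def birth(t):   # b(t) = 2 + sin(t)
    return 2.0 + math.sin(t)

def death(t):   # d(t) = 1 + 0.5*cos(t)
    return 1.0 + 0.5 * math.cos(t)

def euler_population(p0, t_end, steps):
    """Integrate dp/dt = b(t) - d(t) with forward Euler."""
    dt = t_end / steps
    p, t = p0, 0.0
    for _ in range(steps):
        p += dt * (birth(t) - death(t))
        t += dt
    return p

p_numeric = euler_population(p0=10.0, t_end=20.0, steps=20_000)
# Closed form: p(t) = p0 + t + (1 - cos t) - 0.5 sin t
p_exact = 10.0 + 20.0 + (1 - math.cos(20.0)) - 0.5 * math.sin(20.0)
```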
Machine learning’s goal is to automate this process.
We know the differential equation, we know the functions; we just need the implementation. Machine learning automates that.
A machine learning model approximating data under the assumption that it is polynomial. Can you do better?
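To make the automation concrete, here is a minimal sketch of such a fit: ordinary least squares for a degree-1 polynomial, in plain Python (the data and coefficients are made up for illustration).

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ w*x + b (a degree-1 polynomial)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - w * mx
    return w, b

# Noiseless data from y = 3x + 1: the fit should recover the coefficients.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * x + 1.0 for x in xs]
w, b = fit_line(xs, ys)
```

No rule about the data was written by hand; the model family (polynomials) plus the data determine the program.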
Machine learning can be divided into two main types:
The archetypal difference between the classical and the deep regime of AI, illustrated
A tree of architectures in classical and neural designs. Note that the neural side, or deep learning, is, as usual, far cleaner.
The main similarity they do share, however, is a reliance on data, which means both must abide by the concept of partitioning data into training and test sets. The main reason is that data is generally not perfect.
The intuition on generalization
The concept of overfitting and the resulting errors, illustrated for the empirical and the generalization error.
Indeed, while current theory suggests that double descent can occur, it does not appear in every application; and even when it does, it is not necessarily all that beneficial.
The typical training-testing partition phase of an ML model
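A minimal sketch of such a partition, assuming an arbitrary 80/20 split:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data, then hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(100)), test_fraction=0.2)
```

The model never sees the test portion during training, so its score there estimates the generalization error rather than the empirical one.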
Deep learning, in general, relies on the notion of the neuron \(\mathcal{N}\), which encapsulates an idealized view of a biological neuron.
The biological neuron
An early attempt: the perceptron, by Rosenblatt (1957)
The artificial neuron, derived from its biological counterpart and simplified to a computation
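The computation this simplification arrives at can be sketched as a weighted sum plus bias passed through a sigmoid activation (the weights here are arbitrary):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# With these weights the pre-activation is 0.5*1 - 0.25*2 = 0,
# so the sigmoid returns exactly 0.5.
y = neuron([1.0, 2.0], [0.5, -0.25], 0.0)
```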
Many such components can be created and joined together into an artificial neural network (ANN).
The standard fully-connected, feed-forward artificial network.
A rough parallel between a deep network and the complex neural networks in sections of the biological brain
To train these networks, the standard technique is backpropagation.
Typical backpropagation on a standard network.
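The mechanics can be sketched on a scalar two-layer network \(y = w_2\tanh(w_1 x)\): the chain rule propagates the loss gradient backwards through each layer, and gradient descent updates the weights (all numbers are illustrative):

```python
import math

# A scalar two-layer network y = w2 * tanh(w1 * x), trained by
# backpropagation and gradient descent on a single sample.
x, target = 1.0, 0.5
w1, w2 = 0.3, 0.2
lr = 0.2

losses = []
for _ in range(500):
    # Forward pass
    h = math.tanh(w1 * x)
    y = w2 * h
    loss = (y - target) ** 2
    losses.append(loss)
    # Backward pass: chain rule, layer by layer
    dy = 2.0 * (y - target)          # d(loss)/dy
    dw2 = dy * h                     # d(loss)/dw2
    dh = dy * w2                     # d(loss)/dh
    dw1 = dh * (1.0 - h * h) * x     # d(loss)/dw1 via tanh'
    # Gradient-descent update
    w1 -= lr * dw1
    w2 -= lr * dw2
```

The loss at the end of training should be far below its initial value, showing the gradients computed by the backward pass actually point downhill.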
Cutting-edge research and applications rely on deep learning models the most, so it is worth reviewing some of their main flavours.
For image processing, the most widely used models are convolutional neural networks (CNNs) and their variations.
LeNet, by Yann LeCun, one of the first convolutional neural networks
The main mechanism is the convolutional layer, which compresses information over an image mesh.
The convolutional layer on a \(3\times 3\) image
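A minimal sketch of the 'valid' convolution (strictly, cross-correlation, as in most CNN implementations) on a \(3\times 3\) image; the \(2\times 2\) averaging kernel is an arbitrary choice:

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (the usual CNN convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 3x3 image with a 2x2 averaging kernel yields a 2x2 feature map,
# illustrating how the layer compresses spatial information.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0.25, 0.25],
          [0.25, 0.25]]
feature_map = conv2d(image, kernel)
```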
Result of a CNN with six different channels.
Application of CNN on face matching
In more modern settings that require a one-to-one correspondence in spatial information (since a CNN compresses spatial pixels), the fully convolutional network (FCN) solves this problem.
A fully convolutional network
Results from an FCN on classification. For these test images, we print their cropped areas, prediction results, and ground truth row by row.
Moving on: if we want the network to have memory, this can be done using a recurrent neural network (RNN).
RNN with hidden states
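A scalar sketch of the recurrent update: the hidden state is fed back in at each step, so earlier inputs influence later outputs (the weights are arbitrary):

```python
import math

def rnn_step(x, h, w_x, w_h, b):
    """One recurrent update: h' = tanh(w_x*x + w_h*h + b) (scalar case)."""
    return math.tanh(w_x * x + w_h * h + b)

# Run a short sequence through the cell; the final hidden state
# depends on every input seen so far, not just the last one.
h = 0.0
for x in [1.0, 0.0, -1.0]:
    h = rnn_step(x, h, w_x=0.5, w_h=0.8, b=0.0)
```

Note the final input is \(-1\), yet \(h\) does not simply equal \(\tanh(-0.5)\): the carried-over state from the first two inputs shifts it.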
They are mostly used in language processing:
A recurrent network for word processing.
Gated memory cell
The LSTM structure with gating partially incorporated
Memory cell incorporation
Hidden state incorporation into LSTM
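Putting the gate, memory cell, and hidden state together, the standard LSTM update can be summarized as follows (\(\sigma\) is the sigmoid, \(\odot\) the elementwise product; conventions vary slightly between sources):

\[ \begin{cases} f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \\ i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \\ o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \\ \tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \\ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ h_t = o_t \odot \tanh(c_t) \end{cases} \]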
Illustrative GRU reset-update mechanism
The concept of adding a candidate state to the main flow of the GRU.
Adding the hidden state to the GRU, finalizing it.
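Likewise, the reset gate, update gate, candidate state, and hidden state combine into the standard GRU update (one common convention; the roles of \(z_t\) and \(1-z_t\) are sometimes swapped between sources):

\[ \begin{cases} r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \\ z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \\ \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \end{cases} \]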
The form of an attention layer
The structure of the Transformer architecture
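A minimal sketch of the scaled dot-product attention at the heart of the Transformer, \(\mathrm{softmax}(QK^{\top}/\sqrt{d})\,V\), with made-up toy matrices:

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two keys/values. The attention weights sum
# to 1, so each output row is a convex combination of the value rows.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)
```

The query is more similar to the first key, so the output leans towards the first value row.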
It can also be incorporated into computer vision:
Computer vision plus Transformer.
An alternative: a parallel CNN-Transformer hybrid
Usually, larger structures increase the computational cost exponentially. We can then use transfer learning to mitigate this.
The transfer learning (TL) approach.
The stack includes pre-training and fine-tuning, each of which serves different scalability criteria.
The pre-training and fine-tuning methods
The GAN is introduced for back-and-forth verification between two parties - the generator model and its adversary, the discriminator, which tries to tell the generator's misinformation from real data.
GAN architectural considerations.
GAN results from the discriminator and the generator
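For reference, the contest between the generator \(G\) and the discriminator \(D\) is formalized by the standard minimax objective:

\[ \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right] \]

The discriminator pushes the objective up by telling real and generated samples apart; the generator pushes it down by fooling the discriminator.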
There are more details that I have left out: reinforcement learning, online learning, more advanced learning schemes (unsupervised, semi-supervised), more sophisticated methods such as genetic algorithms, physics-informed learning or approximation, classical algorithmic approaches, Bayesian statistics, spatial statistics and its applications, concentration inequalities, network-in-network architectures, encoder-decoder architectures, and so on. But those are too long for this presentation, and I opt for a clearer treatment in later presentations.
Thank you. (For references, just ask.)