Introduction to ML — History

Machine learning (ML) is living through its third period of recognition. Almost every company, regardless of its size, uses machine learning to process data and aggregate it in a way that lets us make predictions.

These predictions can be applied in many fields. Questions such as what will happen in the stock market, what the weather will be, or how a robot will move after an action can be answered thanks to machine learning.

McCulloch (right) and Pitts (left)

The history of machine learning starts with the first mathematical model of neural networks, presented in the scientific paper “A logical calculus of the ideas immanent in nervous activity” by Walter Pitts and Warren McCulloch in 1943. Their proposal was based on the logical connectives NOT, AND and OR, and on the idea that neural connections could be reproduced by combining them.
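To make the idea of combining NOT, AND and OR concrete, here is a minimal sketch of a threshold unit in the spirit of the McCulloch-Pitts model. It is a simplification, not the paper's exact formalism: the function name `mcp_neuron`, the weights and the thresholds are assumptions chosen for illustration, and a negative weight stands in for the inhibitory input of the original model.

```python
def mcp_neuron(inputs, weights, threshold):
    """Return 1 when the weighted sum of binary inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


# Hypothetical weight/threshold choices that realise each logical connective.
def AND(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    return mcp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    # The negative weight plays the role of an inhibitory connection.
    return mcp_neuron([a], [-1], threshold=0)


def XOR(a, b):
    # Combining the basic units gives a function no single unit here computes.
    return AND(OR(a, b), NOT(AND(a, b)))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), XOR(a, b))
```

The `XOR` function shows the point of the proposal: more complex behaviour comes from wiring simple units together.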

Arthur Samuel

The term “machine learning” was coined in 1959 by Arthur Samuel. He created a program that played checkers at a championship level. The mind shift was in the behaviour of the program. Instead of writing a program that covered every possible scenario, he created an algorithm that analyzed data to decide how to play checkers. Additionally, Samuel used a minimax algorithm to find the optimal move, assuming that the opponent also plays optimally. The program was able to learn from previous games and improve over time.
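The minimax idea mentioned above can be sketched in a few lines. This is not Samuel's program, which played real checkers and also learned how to score positions. It is a toy example under stated assumptions: the game tree is a nested list, the leaf numbers are made-up scores from the first player's point of view, and the names `minimax` and `best_move` are hypothetical.

```python
def minimax(node, maximizing):
    """Score a game tree, assuming both players play optimally."""
    if isinstance(node, (int, float)):
        return node  # leaf: the final score of this line of play
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)


def best_move(children):
    """Index of the move whose worst-case outcome is best for the first player."""
    return max(range(len(children)), key=lambda i: minimax(children[i], False))


# Hypothetical two-ply game: we pick a branch, then the opponent picks a leaf.
tree = [[3, 5], [2, 9], [0, 7]]

print(minimax(tree, True))  # 3
print(best_move(tree))      # 0
```

The second branch contains the highest score (9), but an optimal opponent would answer with 2, so the first branch with a guaranteed 3 is the better choice.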

In 1965, the Ukrainian-born Soviet scientists Alexey (Oleksii) Ivakhnenko and Valentin Lapa developed a hierarchical representation of a neural network that uses polynomial activation functions and is trained using the Group Method of Data Handling (GMDH). It is considered the first-ever multi-layer perceptron, and Ivakhnenko is often considered the father of deep learning.

Two years later, in 1967, Thomas Cover and Peter E. Hart from Stanford University published an article about one of the most famous algorithms, the nearest neighbour algorithm. Two tuning decisions are needed: the number of nearest neighbours used to form the classification rule, and the distance measure. The algorithm classifies a new item by looking at the most similar known examples, which allowed us to recognize patterns without knowing anything about the dataset except for some key features.
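The two tuning decisions described above, the number of neighbours and the distance measure, map directly onto two parameters in a minimal sketch. The function names, the toy data and the labels below are assumptions for illustration, not taken from the original article.

```python
import math
from collections import Counter


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_predict(train, query, k=3, distance=euclidean):
    """Label `query` by majority vote among its k nearest labelled examples."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labelled examples: (features, label).
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]

print(knn_predict(train, (2.0, 2.0), k=3))                      # A
print(knn_predict(train, (6.0, 7.0), k=3, distance=manhattan))  # B
```

Choosing an odd `k` for a two-class problem avoids tied votes. With larger or noisier datasets, both `k` and the distance measure usually need to be tuned on held-out data.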

And then it all changed.

A visionary group noted a weak point in the knowledge-based systems that were all the rage in artificial intelligence in the 1970s. The results of machine learning were difficult to communicate. Understandably, people tend to resist something new, so they insisted that knowledge should take the form of if-else rules written through the joint effort of engineers and experts in the field. The supporters of ML replied that if it is so hard to write instructions for a specific problem, why not provide the instructions indirectly and educate a machine through examples from which it will learn? However, the main difficulty was the absence of the requisite machine-learning techniques.

The scientific community saw this absence as a challenge. Hence, in 1983 the volume of ML papers started to surge. The increase in computing power allowed researchers to create realistic applications, classifiers, automatic fine-tuning programs and much more. Adding to this uptrend, a textbook by Tom Mitchell summarized the state of the art of the field in a formal way. Universities built courses around this textbook, and machine learning became popular again in the community.

Three years later, in 1986, Paul Smolensky came up with the Restricted Boltzmann Machine (RBM), which can analyze a set of inputs and learn a probability distribution from them. Nowadays, this model is useful for finding the most popular words in an article or for developing AI-powered recommendation systems. In 1990, the paper “The Strength of Weak Learnability” by Robert Schapire introduced boosting, which enhanced the power of machine learning algorithms. The random decision forests algorithm was introduced in a paper published by Tin Kam Ho in 1995. The name “forest” comes from the creation of multiple decision trees whose outputs are combined to make accurate decisions.


Even though this snowball effect of brilliant algorithms by those beautiful minds was so impactful, it was IBM that demonstrated the importance of machine learning with a chess game. Deep Blue was a chess-playing computer that won its first game against world champion Garry Kasparov in game one of a six-game match on 10 February 1996. For many, that was proof that machines can learn and outsmart real people.

Later, in 1997, the first “deepfake” software was developed by Christoph Bregler, Michele Covell and Malcolm Slaney. The software used machine learning techniques to connect the sounds produced by a video’s subject with the shape of their face.

These achievements also cultivated the belief that we can imitate the human mind. The first mention of the term “deep learning” was by the neural networks researcher Igor Aizenberg. However, the enthusiasm for machine learning waned. It took another nine years for the next achievement, which came with the rise of computing power.

Fei-Fei Li

In 2009, ImageNet, a massive visual database of labelled images, was launched by Fei-Fei Li. She believed that every image has a story behind it. She therefore led a team of students and collaborators to organize an international competition on ImageNet recognition tasks in the academic community, the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC), which ran between 2010 and 2017.

Andrew Ng

This passed the baton to Google’s X Lab. Andrew Ng’s team developed Google Brain, an artificial intelligence system that, in 2012, became famously good at image processing. This eventually led to facial recognition, generative adversarial networks (GANs) and many other advances.

Undoubtedly, 2020 is a year that will not be forgotten. In addition to the virus, two major innovations that contribute to the evolution of machine learning were announced this year.

AlphaFold, a neural network developed by Google-owned DeepMind, has made an enormous step towards solving one of biology’s most significant challenges: determining a protein’s 3D shape from its amino-acid sequence. Accurately predicting protein structures from their amino-acid sequences would vastly accelerate efforts to understand the building blocks of cells and enable quicker and more advanced drug discovery.

Lastly, NVIDIA AI managed to create new images that reproduce an original artist’s style. These images can then be used to help train further AI models. The technique is called Adaptive Discriminator Augmentation (ADA), and NVIDIA underlines that it can reduce the number of training images required by 10–20x while still getting these fantastic results.

To conclude, machine learning has a long way to go. I am one of those who support the AI dream. We have been living in an era of enthusiasm for this field of science since 2009. This gives me the intuition that we will witness amazing innovations in the near future. And I hope that all of us will be safe and healthy to witness them together.

Computer science has been my passion since I entered university. Programming always keeps me motivated because it allows me to improve our lives.
