Research and Collaboration

Below are the areas where I am currently focused, along with the people and partners with whom I have worked, and continue to work, in exploring these exciting fields.

Large Language Models

Foundation LLMs like ALBERT, GPT-4, FLAN-T5, and LaMDA are changing the landscape of commerce, and research will be disrupted just as much, if not more, in the near future.

Large language models (LLMs) have completely dominated the technical landscape over the last six months. Where this hype will end, and which roles and tasks will have changed by the end of it, is still hard to tell. As an educational exercise, I believe in empowering people to understand how these models work, what they are under the hood, and how they were built, so that the mystery can be removed and the promise realized. In my research and education, I focus on:

  • Pre-Training - The original datasets used to train the underlying LLM architecture are gargantuan and contain so much information that we may never know everything that’s included. This data is used to train the Foundation model over vast amounts of GPU time.

  • Fine-Tuning - A foundation model on its own is not quite as useful as you’d expect. This is mostly because, for a model like GPT-x, the training objective is simply to output the next token… which could come from a literary article, a piece of Python code, an instruction manual, or pretty much anything you can think of. This diversity is incredible and allows for the magical knowledge experience GPT-x affords, but the model needs refining before it can interact well with a user. Fine-tuning is the way to take a Foundation LLM and have it perform how you need it to. This can take a number of forms:

    • Traditional fine-tuning on custom data - say you have sensitive data that wouldn’t be on the public web, or you just need the linguistic power of the Foundation model on a smaller, more focused dataset. This type of fine-tuning takes the weights of the trained LLM as a starting point and continues training on the data you provide.

    • Prompt-engineering (few/one/zero-shot learning) - If you have little or no labeled data to train with, you can use a curated format of the prompt (the input to the LLM) to give the model more context, and even a few examples. The LLM will use this context and do as well as it can (which can be remarkably performant with even around 50 samples). A minimal sketch of this idea appears just after this list.

    • Prompt-tuning - This is still a very early field where, somewhat akin to training only the classifier head of a CNN, the prompt embeddings are trained rather than the weights of the whole model. This shifts the word/token embedding space around as a means to improve performance of the frozen Foundation model.

  • LLM Fundamentals - What is attention? Auto-encoders vs. autoregressive models? There are tons of new topics that LLMs bring to deep learning, while also leaning on important deep learning fundamentals. A bare-bones attention example is included after this list.

  • LLM Chaining - Probably the most important next step in our development of LLMs and AI is the notion of allowing these models to think, plan, and execute those plans. With tools like LangChain, LLMs can be used to interact with multiple APIs (say Google search, or Wolfram Alpha) so that the reasoning power of the LLM can be combined with more application-focused but “dumb” tools, allowing more elaborate tasks to be attempted. The last sketch after this list shows the basic loop behind this idea.
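
To make the prompt-engineering point concrete, here is a minimal sketch of how a few-shot prompt can be assembled. The complete function and the example labels are hypothetical placeholders for whichever LLM API or local model you use; this illustrates the idea rather than any particular provider’s interface.

    # Minimal few-shot prompt assembly. `complete` is a hypothetical stand-in
    # for whatever text-completion API or local model you have available.

    EXAMPLES = [
        ("The battery died after two days.", "negative"),
        ("Setup took five minutes and it just works.", "positive"),
    ]

    def build_prompt(examples, query):
        # Each example shows the model the input/output format we expect.
        lines = ["Classify the sentiment of each review as positive or negative.", ""]
        for text, label in examples:
            lines += [f"Review: {text}", f"Sentiment: {label}", ""]
        # The final, unanswered entry is what we ask the LLM to complete.
        lines += [f"Review: {query}", "Sentiment:"]
        return "\n".join(lines)

    def classify(query, complete):
        prompt = build_prompt(EXAMPLES, query)
        return complete(prompt).strip()  # e.g. "positive"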
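
For the fundamentals, the attention mechanism at the heart of these models is only a few lines of linear algebra. Below is a bare-bones NumPy sketch of scaled dot-product attention; it illustrates the standard formula rather than reproducing any particular model’s code.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Q, K, V: arrays of shape (sequence_length, d_model)."""
        d_k = K.shape[-1]
        # Similarity of every query with every key, scaled for numerical stability.
        scores = Q @ K.T / np.sqrt(d_k)
        # Softmax over the keys turns scores into attention weights.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each output position is a weighted mixture of the value vectors.
        return weights @ V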
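
Finally, the core loop behind chaining frameworks like LangChain can be sketched in plain Python. The llm callable, the reply format, and the tools below are assumed placeholders; real frameworks add prompt templates, output parsing, and error handling on top of this basic reason-act-observe cycle.

    # A stripped-down "reason, act, observe" loop: the LLM decides which tool
    # to call, the tool does the work, and the observation is fed back in.

    def run_agent(question, llm, tools, max_steps=5):
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            # Ask the model for its next move, e.g. "search: melting point of gold"
            # or "final: 1,064 C". The exact format is set by your prompt design.
            reply = llm(transcript)
            transcript += reply + "\n"
            if reply.startswith("final:"):
                return reply[len("final:"):].strip()
            tool_name, _, tool_input = reply.partition(":")
            if tool_name in tools:
                observation = tools[tool_name](tool_input.strip())
                transcript += f"Observation: {observation}\n"
        return "No answer within the step budget."

    # Example wiring with a single dummy tool:
    # answer = run_agent("What is 12 * 7?", llm=my_model,
    #                    tools={"calculator": lambda expr: str(eval(expr))})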

 

Physics-Informed Machine Learning

Physics-based loss functions allow for better generalized deep learning models.

Synthetic data can inform new design tools for complex biomedical devices.

Big data and ever-increasing computational power have meant that, in many areas of business and science, A.I. has become a powerful tool for inference and classification. However, in fields like the physical sciences, where the quantities of interest can be extremely difficult or impossible to measure, data is hard to come by. This is where Physics-Informed Machine Learning comes into play. Using this concept, my research has enabled the design of novel biomedical devices, improved the detection of mild traumatic brain injury, enabled better oil-well performance predictions, and continues to push the bounds of what is possible using simulation and A.I. together. This area of research is rich with possibilities and combines many disciplines to solve some of the hardest problems. My current focus is on the three pillars of Physics-Informed Machine Learning:

  • Pre-Training - using simulated results to steer a deep learning model to a point where the limited amount of real-world data can push the model to its optimal state (see the first sketch after this list).

  • Physics-based Loss Functions - adding terms like conservation of energy, momentum, and other important laws into the optimization process to improve model generalizability and ensure realistic predictions (the second sketch after this list shows the idea).

  • Synthetic Data for Inverse Modeling - some numerical models that predict how a system evolves from one state to another cannot be run backwards. Synthetic datasets generated from these forward models can be used to train a deep learning model that inverts the problem and recovers the initial or boundary conditions (the last sketch below illustrates this).
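
To make the pre-training pillar concrete, here is a minimal PyTorch sketch of the two-stage recipe: train on plentiful simulated data first, then continue training on the scarce real-world measurements at a lower learning rate. The model, data loaders, and learning rates are illustrative assumptions rather than settings from any specific project.

    import torch

    def pretrain_then_finetune(model, sim_loader, real_loader):
        loss_fn = torch.nn.MSELoss()

        # Stage 1: learn the broad physics from cheap, abundant simulation data.
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for x, y in sim_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

        # Stage 2: refine with the small real dataset, using a lower learning
        # rate so the knowledge gained from simulation is retained.
        opt = torch.optim.Adam(model.parameters(), lr=1e-4)
        for x, y in real_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        return model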
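
The physics-based loss idea can likewise be sketched in a few lines. Here, a network u(t) predicting the height of an object in free fall is penalized both for missing the measured data and for violating the governing equation d²u/dt² = -g, with the residual computed by automatic differentiation at collocation points. The network, weighting factor, and data are assumed for illustration.

    import torch

    g = 9.81  # gravitational acceleration, m/s^2

    def physics_informed_loss(model, t_data, u_data, t_collocation, lam=1.0):
        # Ordinary data-fitting term on the (few) measured points.
        data_loss = torch.mean((model(t_data) - u_data) ** 2)

        # Physics term: the prediction should satisfy d2u/dt2 = -g, enforced
        # at collocation points where no measurements exist.
        t = t_collocation.clone().requires_grad_(True)
        u = model(t)
        du_dt = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
        d2u_dt2 = torch.autograd.grad(du_dt.sum(), t, create_graph=True)[0]
        physics_loss = torch.mean((d2u_dt2 + g) ** 2)

        # lam trades off data fidelity against physical consistency.
        return data_loss + lam * physics_loss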
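
And for the inverse-modeling pillar: if a forward simulator maps physical parameters to observations, a synthetic dataset of (observation, parameter) pairs can be used to train a network that runs the map in reverse. The toy simulator below stands in for an expensive numerical model and is purely illustrative.

    import torch

    def forward_simulator(params):
        # Placeholder for a numerical model that maps physical parameters
        # (here, amplitude and frequency) to an observable signal.
        amplitude, frequency = params[:, 0:1], params[:, 1:2]
        t = torch.linspace(0, 1, 64)
        return amplitude * torch.sin(2 * torch.pi * frequency * t)

    # Build a synthetic dataset by sampling parameters and simulating forward.
    params = torch.rand(1024, 2) * torch.tensor([5.0, 10.0])
    signals = forward_simulator(params)

    # Train a network on the reversed pairs: observation in, parameters out.
    inverse_net = torch.nn.Sequential(
        torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 2))
    opt = torch.optim.Adam(inverse_net.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(inverse_net(signals), params)
        loss.backward()
        opt.step()

    # inverse_net(new_signal) now estimates the parameters that produced it.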

 

Partners and Collaborators

Collaboration is fundamental to success in all areas of research. I believe the future of research depends strongly on bridging industry and academia; there is so much that they can, and should, learn from each other.


Research Partners*

 
  • Databricks

  • Amazon Web Services

  • NVIDIA

  • MathWorks

  • Massachusetts Institute of Technology

  • Stanford University

  • Monash University

  • University of Melbourne

*Being listed as a partner in no way attributes any particular work or opinion on this site to any of the entities mentioned. These partners have simply contributed, directly or indirectly, to my work. Further details of each relationship can be obtained by emailing me.