Deep Learning (9)

What are Large Language Models? What are they not?

This is a high-level, introductory article about Large Language Models (LLMs), the core technology that enables the much-en-vogue chatbots as well as other Natural Language Processing (NLP) applications. It is directed at a general audience, possibly with some technical and/or scientific background, but no knowledge is assumed of either deep learning or NLP. Having looked at major model ingredients, training workflow, and mechanics of output generation, we also talk about what these models are not. “At this writing, the only serious ELIZA scripts which exist are some which cause ELIZA to respond roughly as would certain psychotherapists (Rogerians). ELIZA performs best when its human correspondent is initially instructed to “talk” to it, via the typewriter of course, just as one would…

Continue reading...

Group-equivariant neural networks with escnn

escnn, built on PyTorch, is a library that, in the spirit of Geometric Deep Learning, provides a high-level interface for designing and training group-equivariant neural networks. This post introduces important mathematical concepts, the library’s key actors, and essential library use. Today, we resume our exploration of group equivariance. This is the third post in the series. The first was a high-level introduction: what this is all about; how equivariance is operationalized; and why it is of relevance to many deep-learning applications. The second sought to concretize the key ideas by developing a group-equivariant CNN from scratch. While instructive, that approach is too tedious for practical use; today, we look at a carefully designed, highly performant library that hides the technicalities and enables…
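For a taste of what that looks like in practice, here is a minimal sketch of basic escnn usage – a single group-equivariant convolution on the plane, with the rotation group C4 acting on it. It follows the pattern of the library’s introductory examples, but sizes and names are chosen for illustration only.

```python
import torch
from escnn import gspaces
from escnn import nn as enn

# The base space is R^2, acted on by rotations in steps of 90 degrees (C4)
r2_act = gspaces.rot2dOnR2(N=4)

# Input: a gray-scale image, i.e., a single scalar ("trivial") field
feat_in = enn.FieldType(r2_act, [r2_act.trivial_repr])
# Output: eight "regular" fields, each carrying one response per group element
feat_out = enn.FieldType(r2_act, 8 * [r2_act.regular_repr])

conv = enn.R2Conv(feat_in, feat_out, kernel_size=3, padding=1)

x = enn.GeometricTensor(torch.randn(1, 1, 16, 16), feat_in)
y = conv(x)

# Equivariance check: transforming the input by a group element (here, the
# 90-degree rotation) should be the same as transforming the output
g = r2_act.fibergroup.elements[1]
assert torch.allclose(conv(x.transform(g)).tensor, y.transform(g).tensor, atol=1e-4)
```

The FieldType is the pivotal abstraction here: it couples a tensor’s channels to how they transform under the group action, and every layer checks that its input actually has the type it expects.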

Continue reading...

Deep Learning and Scientific Computing with R torch: the book

Please allow us to introduce Deep Learning and Scientific Computing with R torch. Released in e-book format today, and available freely online, this book starts out by introducing torch basics. From there, it moves on to various deep-learning use cases. Finally, it shows how to use torch for more general topics, such as matrix computations and the Fourier Transform. First things first: Where can you get it? As of today, you can download the e-book or order a print copy from the publisher, CRC Press; the free online edition is here. There is, to my knowledge, no problem with perusing the online version – besides one: it doesn’t have the squirrel that’s on the book cover. So if you’re a…

Continue reading...

Implementing rotation equivariance: Group-equivariant CNN from scratch

We code up a simple group-equivariant convolutional neural network (GCNN) that is equivariant to rotation. The world may be upside down, but the network will know. Convolutional neural networks (CNNs) are great – they’re able to detect features in an image no matter where they appear. Well, not exactly: they’re not indifferent to just any kind of movement. Shifting up or down, or left or right, is fine; rotating around an axis is not. That’s because of how convolution works: traverse by row, then traverse by column (or the other way round). If we want “more” (e.g., successful detection of an upside-down object), we need to extend convolution to an operation that is rotation-equivariant. An operation that is equivariant to some type…
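To make “extend convolution” concrete, here is a minimal, hypothetical sketch (plain PyTorch, not the post’s code) of the lifting convolution underlying a GCNN: correlate the input with four rotated copies of the filter and stack the responses along a new group dimension.

```python
import torch
import torch.nn.functional as F

def lift_conv_c4(x, weight):
    # Correlate the input with the filter rotated by 0, 90, 180, and 270
    # degrees, stacking the responses along a new "group" dimension.
    # x: (batch, in_ch, H, W); weight: (out_ch, in_ch, k, k), k odd
    responses = [
        F.conv2d(x, torch.rot90(weight, r, dims=(-2, -1)),
                 padding=weight.shape[-1] // 2)
        for r in range(4)
    ]
    return torch.stack(responses, dim=2)  # (batch, out_ch, 4, H, W)

x = torch.randn(1, 1, 8, 8)
w = torch.randn(3, 1, 3, 3)
y = lift_conv_c4(x, w)

# Equivariance: rotating the input rotates the feature maps spatially *and*
# cyclically shifts the group dimension
y_rot = lift_conv_c4(torch.rot90(x, 1, dims=(-2, -1)), w)
expected = torch.rot90(y, 1, dims=(-2, -1)).roll(1, dims=2)
assert torch.allclose(y_rot, expected, atol=1e-5)
```

The final assertion is the whole point: the rotated world does not confuse the network, it just permutes where the responses show up.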

Continue reading...

AO, NAO, ENSO: A wavelet analysis example

El Niño-Southern Oscillation (ENSO), North Atlantic Oscillation (NAO), and Arctic Oscillation (AO) are atmospheric phenomena of global impact that strongly affect people’s lives. ENSO, first and foremost, brings with it floods, droughts, and ensuing poverty in developing countries in the Southern Hemisphere. Here, we use the new torchwavelets package to comparatively inspect patterns in the three series. Recently, we showed how to use torch for wavelet analysis. A member of the family of spectral analysis methods, wavelet analysis bears some similarity to the Fourier Transform, and specifically, to its popular two-dimensional application, the spectrogram. As explained in that book excerpt, though, there are significant differences. For the purposes of the current post, it suffices to know that frequency-domain patterns are…
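For readers who have not seen a wavelet transform before, here is a generic, self-contained sketch of a continuous wavelet transform with a Morlet wavelet. This is deliberately not the torchwavelets API (which the post itself introduces); normalization conventions and boundary handling are simplified.

```python
import math
import torch

def morlet(t, omega0=6.0):
    # Complex Morlet wavelet (analytic approximation, fine for omega0 >= 6)
    return math.pi ** -0.25 * torch.exp(1j * omega0 * t - t ** 2 / 2)

def cwt(x, scales, dt=1.0):
    # Continuous wavelet transform as cross-correlation with scaled wavelets,
    # computed in the Fourier domain (circular boundary; good enough here)
    n = x.shape[0]
    t = (torch.arange(n, dtype=torch.float64) - n // 2) * dt
    xf = torch.fft.fft(x.to(torch.complex128))
    rows = []
    for s in scales:
        psi = morlet(t / s) / s ** 0.5                  # scaled wavelet
        psif = torch.fft.fft(torch.fft.ifftshift(psi))  # center it at t = 0
        rows.append(torch.fft.ifft(xf * psif.conj()))
    return torch.stack(rows)                            # (n_scales, n)

# Toy series: a fast oscillation in the first half, a slow one in the second
t = torch.arange(512, dtype=torch.float64)
x = torch.where(t < 256, torch.sin(t / 4), torch.sin(t / 16))
power = cwt(x, scales=torch.arange(2, 64, 2)).abs() ** 2
```

Unlike a global Fourier spectrum, the resulting scalogram is two-dimensional: for the toy series, power concentrates at small scales in the first half and at larger scales in the second.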

Continue reading...

Getting started with deep learning on graphs

This post introduces deep learning on graphs by mapping its central concept – message passing – to minimal usage patterns of PyTorch Geometric’s foundation-laying MessagePassing class. Exploring those patterns, we gain some basic, very concrete insights into how graph DL works. If, in the deep-learning world, the first half of the last decade was the age of images, and the second, that of language, one could say that now we’re living in the age of graphs. At least, that’s what commonly cited research metrics suggest. But as we’re all aware, deep-learning research is anything but an ivory tower. To see real-world implications, it suffices to reflect on how many things can be modeled as graphs. Some things quite naturally “are”…
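For a first feel, here is about the smallest MessagePassing subclass one can write – a layer whose output for each node is simply the sum of its neighbors’ features. It follows the class’s documented usage pattern; the layer itself is a toy, chosen for illustration.

```python
import torch
from torch_geometric.nn import MessagePassing

class SumNeighbors(MessagePassing):
    def __init__(self):
        super().__init__(aggr='add')  # how incoming messages are aggregated

    def forward(self, x, edge_index):
        # x: (num_nodes, num_features); edge_index: (2, num_edges)
        return self.propagate(edge_index, x=x)

    def message(self, x_j):
        # x_j holds the features of the source node of each edge;
        # here, the message is just that feature vector, unchanged
        return x_j

# Tiny directed cycle: 0 -> 1 -> 2 -> 0
edge_index = torch.tensor([[0, 1, 2],
                           [1, 2, 0]])
x = torch.eye(3)  # one-hot features make the flow easy to follow
out = SumNeighbors()(x, edge_index)
print(out)  # node 1 now holds node 0's feature, and so on around the cycle
```

Everything more sophisticated – learnable transformations, attention, edge features – is obtained by putting more into message() (and, optionally, update()).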

Continue reading...

Starting to think about AI Fairness

The topic of AI fairness metrics is as important to society as it is confusing. Confusing it is, due to a number of reasons: terminological proliferation, an abundance of formulae, and, last but not least, the impression that everyone else seems to know what they’re talking about. This text hopes to counteract some of that confusion by starting from a common-sense approach of contrasting two basic positions: on the one hand, the assumption that dataset features may be taken as reflecting the underlying concepts ML practitioners are interested in; on the other, that there inevitably is a gap between concept and measurement, a gap that may be bigger or smaller depending on what is being measured. In contrasting these fundamental views, we…

Continue reading...

Forecasting El Niño-Southern Oscillation (ENSO)

El Niño-Southern Oscillation (ENSO) is an atmospheric phenomenon, located in the tropical Pacific, that greatly affects ecosystems as well as human well-being on a large portion of the globe. We use the convLSTM introduced in a prior post to predict the Niño 3.4 Index from spatially-ordered sequences of sea surface temperatures. Today, we use the convLSTM introduced in a previous post to predict El Niño-Southern Oscillation (ENSO). ENSO refers to a changing pattern of sea surface temperatures and sea-level pressures occurring in the equatorial Pacific. Of its three overall states, the best known is probably El Niño. El Niño occurs when surface water temperatures in the eastern Pacific are higher than normal, and the strong winds that normally blow from east…
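For orientation, data preparation for such a model might look roughly like the following sketch – shapes and the helper function are hypothetical, not the post’s actual preprocessing.

```python
import torch

def make_sequences(sst, nino34, n_timesteps):
    # sst: (T, H, W) sea surface temperature grids; nino34: (T,) index series.
    # Pair each length-n_timesteps sequence of grids with the index value
    # observed right after it.
    xs, ys = [], []
    for t in range(sst.shape[0] - n_timesteps):
        xs.append(sst[t:t + n_timesteps].unsqueeze(1))  # add a channel dim
        ys.append(nino34[t + n_timesteps])
    return torch.stack(xs), torch.stack(ys)

# Toy stand-in for the real data: 100 monthly 24 x 72 grids
sst = torch.randn(100, 24, 72)
nino34 = torch.randn(100)
x, y = make_sequences(sst, nino34, n_timesteps=6)
print(x.shape, y.shape)  # (94, 6, 1, 24, 72) and (94,)
```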

Continue reading...

Convolutional LSTM for spatial forecasting

In forecasting spatially-determined phenomena (the weather, say, or the next frame in a movie), we want to model temporal evolution, ideally using recurrence relations. At the same time, we’d like to efficiently extract spatial features, something that is normally done with convolutional filters. Ideally then, we’d have at our disposal an architecture that is both recurrent and convolutional. In this post, we build a convolutional LSTM with torch. This post is the first in a loose series exploring forecasting of spatially-determined data over time. By spatially-determined I mean that whatever the quantities we’re trying to predict – be they univariate or multivariate time series, of spatial dimensionality or not – the input data are given on a spatial grid. For…
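The post builds this in R torch; purely for orientation, here is a minimal sketch of the core idea in PyTorch syntax: the usual LSTM gate equations, with the affine transformations replaced by convolutions so that hidden and cell state keep their spatial layout.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        # A single convolution computes all four gates at once
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels, kernel_size,
                               padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        # Gates see the current input and the previous hidden state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)  # cell state: forget, then write
        h = o * torch.tanh(c)          # hidden state: gated read-out
        return h, c

# One step on a batch of two single-channel 16 x 16 "frames"
cell = ConvLSTMCell(in_channels=1, hidden_channels=8)
x = torch.randn(2, 1, 16, 16)
h = c = torch.zeros(2, 8, 16, 16)
h, c = cell(x, (h, c))  # h, c: (2, 8, 16, 16) -- spatial layout preserved
```

Stacking such cells and looping over time steps yields the full architecture; in a real model, the last hidden state (or all of them) would feed a prediction head.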

Continue reading...