Chapter 60 — Attention Is All You Need

 

The Curve of Time, Chapter 60 —— Attention Is All You Need, in which Saskia transfers her attention to Mica.

Followed by a quick shout out to pi-day.

Rufus Williams

— 60 —

Attention Is All You Need

They had left Saskia’s house in search of a bagel, and because the space was still charged with Wassily’s specter.

“Look,” Saskia tried reasoning, “my past is my past, I can’t change that. There’s——”

“Maybe you can change it,” Mica interrupted.

“There’s no point being jealous of my history.”

“I’m not jealous of your history, I’m jealous of the way you mindmelded with him.”

Saskia sighed.

“He’s the data your mind was trained on.”

That made Saskia smile, Mica throwing her jargon back at her. She couldn’t help it. “So update my biases.”

“What?” Mica tilted her head.

Saskia explained that the transformer architecture that underpinned the modern chatbots wasn’t a static model. “It famously pays attention.” With her fingers, Saskia air-quoted the title of the famous 2017 paper that had kicked everything off: “Attention is all you need.”

“Attention?”

But they had arrived at the bagel shop and Saskia gestured inside. “Let’s order our zeros.”

“Why would you call a bagel a zero?”

Saskia squinted at Mica, confused.

Mica shook her head. “You just called the bagels zeros.”

“Zero is awesome.” Saskia laughed. “It’s an affectionate term. You’re just taking the concept for granted.”

Mica bristled, but rather than elaborate, Saskia had the good sense to steer them both inside her favorite sandwich store.

Mica’s mood improved with their food, and it inclined her to make a peace offering. “So, tell me about attention.”

Saskia explained that attention was meant both in a long-term sense and as a contextual adjustment to a model’s understanding of words. Words could be imbued with specific meanings depending on the updated circumstances of the user’s question. “It can account for frame of reference. Like donut.” Saskia held up her bagel.

“You’re holding a bagel.”

Saskia grinned. “But you’d call the shape a donut. A mathematician would call it a torus. A donut is something filled with custard that you eat. So in the word2vec embedding space ‘donut’ could shade either way; context would determine if it referred to the shape or a fried dough delight. Attention determines which.”

Mica pursed her lips and squinted at Saskia.

“The transformer moves the embedding vector representing ‘donut’ around the embedding space. It uses a sequence of linear transformations determined by the words in your query.”

Mica’s eyes popped wide open and her mouth followed suit. “Wait, how does the embedding space work? Are the words like pieces of string——like knots——or are they the stretched and folded sheets of paper?”

Saskia rubbed her furrowed brow before suddenly dropping her hand and meeting Mica’s eyes. “No, each word is a single point. It’s not like the more topologically intriguing manifolds Wassily was talking about back at your place. And the embedding space, rather than being two, or three, or even four dimensional, has, like, ten thousand dimensions. We’re looking at points in a super high dimensional space.”

Mica frowned again.

“There’s still a bunch of interesting geometry going on, though,” Saskia enthused. “Like, if you look at the difference between the king and queen vectors, the direction and length from the point where king is embedded to the point where queen is embedded, then that’s pretty similar to the difference between, say, uncle and aunt, or brother and sister. So, those difference vectors somehow encode switching gender.” Mica’s frown was turning into a scowl, and Saskia could see she was not merely losing her but frustrating her too. Pulling the metaphorical rip cord, she switched subjects. “You said that something Wassily said gave you an idea. You ready to tell me about it?”

Mica’s eyes brightened.

Saskia reached out and adjusted a lock of Mica’s hair. “You see, I pay attention too.”

The corner of Mica’s mouth lifted. “I think there might be a way for you to fix the DBG rig in the gulf.”

“I’m not sure I’ve got it in me to try going back that far again,” Saskia responded cautiously.

Mica waved her hands in front of her. “It’s not that. It just requires slowing time down. Employing your super strength.”

Well, Friends, that was chapter 60, I hope you enjoyed it!

For the commentary today, rather than connect my musings to the chapter we just heard, I wanted to give a topical shoutout. March 14th is pi-day, it being 14 days into the third month: 3.14.

Though it might not have crossed your radar in the past, the idea of pi-day has been slowly building steam over the last few years; I suspect you’re more likely to know about it if you have kids. In any event, those celebrating typically highlight some fun fact about pi.

For those paying attention, I obviously glossed over this shoutout last year. But now that we’ve come full circle (in the sense that I’ve been at this podcast for over a full year), I’d feel remiss to ignore it a second time. So, let me convince you that pi is a really natural number.

As I suspect most of you know, pi is the ratio of the circumference of a circle to its diameter. No matter how big or small your circle is, or where you find it! Most people, when they first hear this, are surprised that the ratio of the circumference to the diameter of any circle is the same, regardless of its size, or whereabouts. If I cast my own mind back, I’m pretty sure I was one of ‘most people’.

Anyway, I thought it might be fun to give a quick intuition for why this must be the case.

To start, imagine a circle. Now, imagine the smallest possible square surrounding it. Let’s call the width of that square one unit. Hopefully, you can see that that implies that our inscribed circle must also have diameter one unit.

Now, suppose I want an upper bound for how far it is around our circle. Well, clearly circumnavigating the circle is shorter than going all the way around the square, since going around the circle simply cuts each corner of the square. But the square, of course, is four units around. Since we can always put a smallest square around any circle, it must be true that for any circle, the distance around it is less than four times the distance across it.

Alternatively, we could inscribe the biggest possible square in our circle. This time, the diagonal of that square would be the same one unit across the circle, making each side of our square 1 over the square root of 2. And now, since going around the circle means a longer path than circumnavigating the square, we get a lower bound of two times the square root of 2 (the sum of the four sides of the inscribed square) for the distance around the circle. Thus, the distance around any circle must be between 2.82... and 4 times the distance across it.

Perhaps you see the trick we can now apply? Instead of a square, put a pentagon, or a hexagon, or an octagon inside and outside your circle. The geometry is a bit trickier, but the same ideas apply. The clever thing is that we could do the same trick with polygons of more and more sides, and the gap between the upper and lower bounds clearly gets tighter and tighter.

In fact, the lower bound for a twenty-sided polygon would be about 3.13; for a one-hundred-sided polygon, 3.14, which I suspect means that at a glance, a one-hundred-sided polygon looks more or less identical to a circle. Pi is simply the limit of these two converging bounds, and, as we’ve just seen, it doesn’t depend on where our circle lives.
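For anyone who would like to check those numbers, here is a minimal Python sketch of the polygon squeeze. The function names `polygon_bounds` and `doubled_bounds` are my own, made up for illustration. Note that `polygon_bounds` uses `math.pi` to work out the angles, so it is a sanity check on the figures rather than an independent way of finding pi. `doubled_bounds` avoids that circularity: it starts from the two squares described above and repeatedly doubles the number of sides with a classical Archimedes-style recurrence (a harmonic mean followed by a geometric mean), which needs only square roots.

```python
import math


def polygon_bounds(n):
    """Perimeters of the regular n-sided polygons drawn inside and
    outside a circle of diameter 1. Returns (lower, upper)."""
    if n < 3:
        raise ValueError("a polygon needs at least 3 sides")
    lower = n * math.sin(math.pi / n)  # inscribed polygon: shorter than the circle
    upper = n * math.tan(math.pi / n)  # circumscribed polygon: longer than the circle
    return lower, upper


def doubled_bounds(doublings):
    """Start from the inner and outer squares (4 sides) and double the
    side count `doublings` times. Returns (sides, lower, upper)."""
    lower, upper = 2 * math.sqrt(2), 4.0  # the two square bounds from the text
    sides = 4
    for _ in range(doublings):
        upper = 2 * lower * upper / (lower + upper)  # harmonic mean of the old bounds
        lower = math.sqrt(lower * upper)             # geometric mean, using the new upper
        sides *= 2
    return sides, lower, upper


if __name__ == "__main__":
    for n in (4, 20, 100):
        lower, upper = polygon_bounds(n)
        print(f"{n:>3} sides: {lower:.4f} < pi < {upper:.4f}")

    sides, lower, upper = doubled_bounds(5)
    print(f"{sides:>3} sides (by doubling): {lower:.4f} < pi < {upper:.4f}")
```

With four sides this reproduces the square bounds of roughly 2.83 and 4, and the lower bounds for twenty and one hundred sides round to 3.13 and 3.14, matching the figures quoted above.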

Until next week, be kind to someone and keep an eye out for the ripples of joy you’ve seeded.

Cheerio
Rufus

PS. If you think of someone who might enjoy joining us on this experiment, please forward them this email. And if you are one of those someones and you’d like to read more

SUBSCRIBE HERE
