Sam Altman interviewed at MIT - Transcript (April 13, 2023)

Date of conversation: April 13, 2023. https://www.youtube.com/watch?v=T5cPoNwO7II
Speaker 1 0:11 Context for you: I know you're on this worldwide tour, trying to help control the fire that GPT-4 and ChatGPT have started. This particular room is maybe a little different from a lot of that. Most of the people here in this room are either building companies or working on plans to build companies that are in the ecosystem really triggered by ChatGPT. My people. I wish I were there. Yeah, I wish you were here too. Actually, they are exactly your kind of people. And I know that part of the mission here is to make the world a better place, but also to build on top of the platform that you've created. And obviously, you navigated to the position you're in in life very deliberately, and you're the perfect person to help advise them. So we're going to try and keep this focused in a way that helps this room as much as possible, these 900 people, create successful companies. And so the first thing we're going to ask you about is, you know, if AGI is in the near-term future, then we're right now at this inflection point where human history has a period of time up till AGI and then obviously has a completely different history from here forward. So it seems to me that at this stage, you're going to be a centerpiece of the history books, no matter how this evolves. Do you think it's the same?
Sam Altman 1:36 Do I think it's the same in terms of what?
Speaker 1 1:38 In terms of the way history will describe this moment, this moment being this year of innovation in this field.
Sam Altman 1:47 Um, I mean, I hope this will be like a, you know, a page or a chapter in the history books. But I think that over the next several billion years, such unbelievable things are going to happen that this will be just sort of like, you know, one small part, and there'll be new and bigger and more exciting opportunities and challenges in front of us.
Speaker 1 2:08 So I think one of the things that a lot of people are asking: you know, with prior iterations of GPT, open source iterations, you had a whole variety of ways of taking that source code and making a vertical company out of it, or an adjacent company, something like federated learning or something. In the future iteration of these companies, you've got this highly tunable closed API to start from. Any quick advice on, okay, I'm starting a company now, I have to make some decisions right out of the gate. What do I start with? How do I make it work in any given vertical use case?
Sam Altman 2:44 You know, I think there's always more that stays the same about how to make a good company than what changes. And a lot of people, whenever there's like a new platform shift like this, think that just because they're using the platform, like, that's what's going to guide business strategy. It doesn't. Nothing lets you off the hook for building a product that people love, for being like very close to your users, fulfilling their needs, for thinking about a long-term durable business strategy. Like, that's actually probably only more important during a platform shift, not less. Like if we think back to the launch of the App Store, which is probably the most recent similar example, there were a ton of companies that built very lightweight things with, I don't want to call them like exploitive mechanics, but just like, you know, it was not something durable. And those companies had incredible meteoric rises and falls. And then the companies that really like did all the normal things you're supposed to do to build a great business endured for the last 15 years. And so you definitely want to be in that latter category. And the technology is just like, this is a new enabler. But what you have to do as a company is like build a great company that has a long-term compounding strategic advantage.
Speaker 1 3:57 And then what about foundation models, just as a starting point? You know, if I look back two years, one of the best ways to start was to take an existing foundation model, maybe add some layers and retrain it in a vertical use case. Now the base model has maybe a trillion parameters, so it's much, much bigger. But your ability to manipulate it without having to retrain it is also far, far more flexible. I think you have 50,000 tokens to play with right now in the basic model. Is that right?
Sam Altman 4:28 About 32,000 in the biggest model.
Speaker 1 4:34 32,000 in the biggest one, okay. And actually, so how's that going to evolve? There are new iterations that are going to come out pretty quickly.
Sam Altman 4:40 We're still trying to figure out exactly what developers want in terms of model customization. We're open to doing a lot of things here, and, you know, our developers are our users. Our goal is to make developers super happy and figure out what they need. We thought it was going to be much more of a fine-tuning story, and we have been thinking about how to offer that in different ways. But people are doing pretty amazing things with the base model, and for a bunch of reasons often seem to prefer that. So we're like actively reconsidering what customization to prioritize, given what users seem to want and seem to be making work. As the models get better and better, it does seem like there is a trend towards less and less of a need to fine-tune, and you can do more and more in the context.
Speaker 1 5:30 And when you say fine-tuning, you mean changing parameter weights? Yeah. I mean, is there going to be an ability at all to change the parameter weights in the GPT?
Sam Altman 5:40 Well, yeah, we'll definitely offer something there.
But like right now, it looks like maybe that will be less used than the ability to offer like super cheap context, like 1 million, if we can ever figure that out. Yeah. Base model? Yeah.
Speaker 1 5:54 Let's drill in on that just a little bit. Because it seems like, regardless of the specifics, the trend is that as the models are getting bigger and bigger and bigger, so you go from 1 trillion to 10 trillion parameters, the amount you can achieve with just changing prompt engineering, or changing the tokens that are feeding into it, is growing disproportionately to the model size. Does that sound right?
Sam Altman 6:19 Um, disproportionately to the model size, yes. But I think we're like at the end of the era where it's going to be these like giant, giant models, and we'll make them better in other ways. But I would say like it grows proportionate to the model capability.
Speaker 1 6:35 Yep. And then the investment in the creation of the foundation models is on the order of 50 million, 100 million, just in the training process. So it seems like, what's the magnitude there?
Sam Altman 6:50 We don't share, but it's much more than that.
Speaker 1 6:55 Okay. And rising, I assume, over time. Yeah. So then somebody trying to start from scratch, you know, is trying to catch up to something that's, maybe...
Sam Altman 7:05 Or maybe we're all being incredibly dumb, and we're missing one big idea, and all of this is not as hard or expensive as we think, and there will be a totally new paradigm that obsoletes us, which will be great. Not great for us, but like great for the world.
Speaker 1 7:17 Yeah. So let me get your take on some. So Paul Graham calls you the greatest business strategist that he's ever encountered. And of course, all these people are wrestling with their business strategy and what exactly to build and where.
And so I've been asking you questions that are more or less vertical use cases that sit on top of GPT-4 and ChatGPT, and soon GPT-5, and so on. But there's also all these business models that are adjacent. So things like federated learning, or data conditioning, or just deployment. And so those are interesting business models too. If you were just investing in a class of company that's in the ecosystem, any thoughts on where the greater returns are, where the faster-growing, more interesting business models are?
Sam Altman 8:05 I don't think PG quite said that. I know he said something like in that direction. But in any case, I don't think it'd be true. I think there are people who are like unbelievable business strategists, and I'm not one of them. So I hesitate to give advice here. The only thing I know how to do, I think, is one strategy again and again, which is very long time horizon, capital-intensive, difficult technology bets. And I don't even think I'm particularly good at those, I just think not many people try them. So there's very little competition, which is nice. I mean, a lot of competition. But the strategy that it takes to now like take a platform like OpenAI and build a new, fast-growing, defensible consumer or enterprise company, I know almost nothing about. Like, I know all of the theory, but none of the practice, and I would go find people who have done it and get the practice, get the advice from them.
Speaker 1 9:03 All right. Good advice. Couple questions about the underlying tech platform here. So I've been building neural networks myself since the parameter count was sub 1 million, and they were actually very useful for a bunch of commercial applications. And then kind of watched them tip into the billion, and then, you know, with GPT-2, I think about one and a half billion or so.
And then GPT-3, and now GPT-4. So you go up, we don't know the current parameter count, but I think it was 175 billion in GPT-3, and it was just mind-blowingly different from GPT-2, and then GPT-4 is even more mind-blowingly different. So the raw underlying parameter count seems like it's on a trend, just listening to Nvidia's forecasts, where you can go from a trillion to 10 trillion. And then they're saying up to 10 quadrillion in a decade. So you've got four factors of 10, or 10,000x, in a decade. Does that even sound like it's in the right ballpark?
Sam Altman 10:05 I think there's way too much focus on parameter count. I mean, parameter count will trend up for sure. But this reminds me a lot of the gigahertz race in chips in the late 90s and 2000s, where everybody was trying to like point to a big number. And then eventually, like, probably most of you don't even know how many gigahertz are on your iPhone, but it's fast. Like, what we actually care about is capability. And I think it's important that what we keep the focus on is rapidly increasing capability. And if there's some reason that parameter count should decrease over time, or we should have like multiple models working together, each of which are smaller, we would do that. Like, what we want to deliver to the world is the most capable, useful, and safe models. We are not here to like jerk ourselves off about parameter count.
Speaker 1 10:57 Can we quote you on that? It's gonna get quoted no matter what, so yeah. Okay, well, thank you for taking that away from me. So, one thing that's absolutely unique about this class of algorithm versus anything I've ever seen before is that it surprises you with raw horsepower, regardless of whether you measure it in parameter count or some other way. It does things that you didn't anticipate, purely by putting more horsepower behind it.
And so it takes advantage of the scale. The analogy I was making this morning is, if you have a spreadsheet, you coded it up, you run it on a computer that's 10,000 times faster, it doesn't really surprise you. It's nice and responsive, it's still a spreadsheet. Whereas this class of algorithm does things that it just couldn't do before. And so actually one of our partners in our venture fund wrote an entire book on GPT-2, and you can buy it on Amazon. It's called Start Here or Start Here: Romance. I think about 10 copies have sold. I bought one of them, so maybe nine copies have sold. But if you read the book, it's just not a good book. And here we are, that was four years ago, it's only been four years. And now the quality of the book has gone from, you know, GPT-2, 3, 4: not a good book, to now it's a somewhat reasonable book, to now it's possible to write a truly excellent book. You have to give it the framework, you're still effectively writing the concept, but it's filling in the words just beautifully. And so as an author, that could be a force multiplier of something like 10 or 100. And it just enables an author to be that much more powerful. So this class of algorithm, then, if the underlying substrate is getting faster and faster and faster, it's going to do surprising things on a relatively short timescale. And so I think one of the things that people in the room need to predict is, okay, what is the next real-world, society-benefiting use case that hits that tipping point on this curve? So any insights you can give us into, you know, what's going to be possible that wasn't possible a year prior, two years prior?
Sam Altman 13:04 Okay, I said I don't have like business strategy advice. I just thought of something I do. I think in new areas like this, one of the right approaches is to let tactics become strategy instead of the other way around. And, you know, I have my ideas. I'm sure you all have your ideas.
Maybe we'll be mostly right, we'll be wrong in some ways. And even where we're right, we'll be wrong about the details. I think you never want to lose sight of vision and focus on the long term, but a very tight feedback loop of paying attention to what is working and what is not working, and doing more of the stuff that's working and less of the stuff that's not working, and just very, very careful user observation can go super far. So like, you know, I can speculate on ideas, you can speculate on ideas. None of that will be as valuable as putting something out there and really deeply understanding what's happening and being responsive to it.
Speaker 3 14:10 As Dave is getting ready for the next question, Sam, when did you know your baby ChatGPT was something really special? And what was the special sauce that allowed you to pull off something that others haven't? And they will come back. But yeah. Oh, who likes Sam so far? All right. If Sam was hiring, would you consider being part of his team? Okay.
Sam Altman 14:32 All right. We got a lot of hands. Right? Yeah, please, please come, we really need help, and it's going to be a pretty exciting next few years. I mean, we've been working on it for so long that it's like you kind of know with gradually increasing confidence that it's really going to work. But we've been doing the company for seven years. These things take a long time. I would say, by and large, in terms of why it worked when others haven't, it's just because we've like been on the grind, sweating every detail for a long time, and most people aren't willing to do that. In terms of when we knew that ChatGPT in particular was going to like catch fire as a consumer product, probably like 48 hours after launch. Yeah. All right.
Unknown Speaker 15:17 So before Dave comes back, I asked Lex to ask a sexy question. Hey, do you want us to communicate? Are you good? What is it? It's a Star Trek. Or you're good?
Unknown Speaker 15:29 I'm good.
I grew up in the Soviet Union. We didn't have
Speaker 3 15:35 Chekov, second season.
Speaker 4 15:38 You may ask some sexy, controversial questions. So you got legends in artificial intelligence, Ilya Sutskever and Andrej Karpathy, over there. Who's smarter? Just kidding. I'm just kidding. You don't have to answer that, it's a joke. But he was about to, he was thinking about it. I like it. No, so we're at MIT, and from here, Max Tegmark and others put together this open letter to halt AI development for six months. What are your thoughts about this open letter?
Sam Altman 16:12 There's parts of the thrust that I really agree with. We spent more than six months after we finished training GPT-4 before we released it. So taking the time to really study the safety of a model, to get external audits, external red teamers, to really try to understand what's going on and mitigate as much as you can, that's important. It's been really nice, since we have launched GPT-4, how many people have said, like, wow, this is not only the most capable model OpenAI has put out, but like by far the safest and most aligned, and unless I'm trying to get it to do something bad, it won't. So that, I totally agree with. I also agree that as capabilities get more and more serious, the safety bar has got to increase. But unfortunately, I think the letter is missing like most technical nuance about where we need the pause. Like, an earlier version of the letter claimed OpenAI is training GPT-5 right now. We are not, and won't for some time. So in that sense, it was sort of silly. But we are doing other things on top of GPT-4 that I think have all sorts of safety issues that are important to address, and were totally left out of the letter. So I think moving with caution, and an increasing rigor for safety issues, is really important. The letter, I don't think, is the optimal way to address it.
Speaker 4 17:40 Just a quick question from me, one more: you have been extremely open, having a lot of conversations, being honest. Others at OpenAI as well. What's the philosophy behind that? Because compared to other companies, they're much more closed in that regard. And do you plan to continue doing that?
Sam Altman 17:59 We certainly plan to continue doing that. The trade-off is like, we say dumb stuff sometimes, you know, stuff that turns out to be totally wrong. And I think a lot of other companies don't want to say something until they're sure it's right. But I think this technology is going to so impact all of us that we believe that engaging everyone in the discussion, putting these systems out into the world, deeply imperfect though they are in their current state, so that people get to experience them, think about them, understand the upsides and the downsides, it's worth the trade-off, even though we do tend to embarrass ourselves in public and have to change our minds with new data frequently. So we're gonna keep doing that, because we think it's better than any alternative. And a big part of our goal at OpenAI is to like get the world to engage with this and think about it, and gradually update and build new institutions, or adapt our existing institutions, to be able to figure out what the future we all want is. So that's kind of like why we're here.
Speaker 1 19:00 So we only have a few minutes left, and I have to ask you a question that has been on my mind since I was 13 years old. So I think if you read Ray Kurzweil or any of the luminaries in the sector, the day when the algorithms start writing the code that improves the algorithms is a pivotal day. It accelerates the process toward infinity or, in the singularity view of the world, to absolute infinity. And so now a lot of the companies that I'm an investor in or have been co-founder of are starting to use LLMs for code generation.
And it's interesting, there's a very wide range of lift or improvement in the performance of an engineer, ranging from about 5% to about 20x. And it depends on what you're trying to do, what type of code, how much context it needs. A lot of it is related to tuning in the system. So there's two questions in there. First, within OpenAI, how much of a force multiplier do you already see within the creation of the next iteration of the code? And then the follow-on question is, okay, what does it look like a few months from now, a year from now, two years from now? Are we getting close to that day where the thing is so rapidly self-improving that it hits some...
Sam Altman 20:12 Great question. I think that it is going to be a much fuzzier boundary for, you know, getting to self-improvement or not. I think what will happen is that more and more of the improvement loop will be aided by AIs, but humans will still be driving it. And it's going to go like that for a long time. And there's like a whole bunch of other things. I have never believed in the like one-day or one-month takeoff, for a bunch of reasons, but like one of which is how incredibly long it takes to build new data centers, bigger data centers. Like even if we knew how to do it right now, just like waiting for the concrete to dry, getting the power into the building, stuff takes a while. But I think what will happen is humans will be more and more augmented and be able to do things in the world faster and faster. And it will not work out, like, most of these things don't end up working out quite like the sci-fi books, and neither will this one. But the rate of change in the world will increase forevermore from here, as humans get better and better tools.