EP. 150: ARTIFICIAL INTELLIGENCE AND THE PHYSICIAN OF TOMORROW

WITH MICHAEL HOWELL, MD

The Chief Clinical Officer of Google unpacks how artificial intelligence is transforming the work of doctors, raising urgent questions about ethics, empathy, and what it truly means to heal in an age of intelligent machines.


What happens to the practice of medicine when machines begin to reason, summarize and even empathize — at least in the linguistic sense — better than humans do? 

In this episode, we meet with Michael Howell, MD, MPH, Chief Clinical Officer at Google, to explore the seismic shifts underway in healthcare as artificial intelligence becomes more deeply embedded in clinical workflows. Dr. Howell, a pulmonary and critical care physician, has spent his career at the crossroads of clinical excellence and systems innovation. Before joining Google, he served as chief quality officer at University of Chicago Medicine. At Google, he leads the development and implementation of AI technologies intended to support scalable, safe and equitable medical care. 

Over the course of our conversation, we examine what AI is and isn't. We delve into how large language models are reshaping the cognitive labor of clinicians, the implications of machines that may someday outperform humans in diagnosis, and whether there is something inherently human about healing that algorithms will never capture. Along the way, we discuss not only the promises of AI, but also its hidden dangers, ethical landmines, and the enduring question: in a future defined by ever smarter machines, what does it mean to be a good doctor?

  • Michael Howell, MD is the Chief Clinical Officer at Google, where he leads the team of clinical experts who provide guidance for Google’s health-related products, research, and services. His career has been devoted to improving the quality, safety, and science of how care is delivered and helping people get the best information across their health journey. He previously served as the University of Chicago Medicine's Chief Quality Officer, was associate professor of medicine at the University of Chicago and at Harvard Medical School, and practiced pulmonary and critical care medicine for many years. Michael has published more than 100 research articles, editorials, and book chapters, and is the author of Understanding Healthcare Delivery Science, one of the foundational textbooks in the field. He has also served as an advisor for the CDC, for the Centers for Medicare and Medicaid Services, and for the National Academy of Medicine.

  • In this episode, you will hear about:

    • 2:43 - Dr. Howell’s path to medicine and eventually to becoming Chief Clinical Officer at Google 

    • 6:45 - Why examining the differences between theory and implementation of technology matters

    • 17:35 - The evolution of AI and its clinical capabilities

    • 26:05 - The definition of “thinking” in the age of AI

    • 36:11 - How AI could change the landscape of healthcare on a global scale

    • 50:26 - The ethics of using — and not using — AI in medicine

    • 54:36 - The role of a doctor in 20 years 

  • Henry Bair: [00:00:01] Hi, I'm Henry Bair.

    Tyler Johnson: [00:00:03] And I'm Tyler Johnson.

    Henry Bair: [00:00:04] And you're listening to The Doctor's Art, a podcast that explores meaning in medicine. Throughout our medical training and careers, we have pondered what makes medicine meaningful. Can a stronger understanding of this meaning create better doctors? How can we build healthcare institutions that nurture the doctor-patient connection? What can we learn about the human condition from accompanying our patients in times of suffering?

    Tyler Johnson: [00:00:27] In seeking answers to these questions, we meet with deep thinkers working across healthcare, from doctors and nurses to patients and healthcare executives: those who have collected a career's worth of hard-earned wisdom probing the moral heart that beats at the core of medicine. We will hear stories that are by turns heartbreaking, amusing, inspiring, challenging, and enlightening. We welcome anyone curious about why doctors do what they do. Join us as we think out loud about what illness and healing can teach us about some of life's biggest questions.

    Tyler Johnson: [00:01:02] What happens to the practice of medicine when machines begin to reason, summarize, and even empathize, at least in the linguistic sense, better than humans do? In this episode, we meet with Dr. Michael Howell, Chief Clinical Officer at Google, to explore the seismic shifts underway in healthcare as artificial intelligence becomes more deeply embedded in clinical workflows. Dr. Howell, a pulmonary and critical care physician, has spent his career at the crossroads of clinical excellence and systems innovation. Before joining Google, he served as chief quality officer at University of Chicago Medicine. At Google, he leads the development and implementation of AI technologies intended to support scalable, safe and equitable medical care. Over the course of our conversation, we examine what AI actually is and isn't. We delve into how large language models are reshaping the cognitive labor of clinicians, the implications of machines that may someday outperform humans in diagnosis, and whether there is something inherently human about healing that algorithms will never capture. Along the way, we discuss not only the promises of AI, but also its hidden dangers, ethical landmines, and the enduring question: in a future defined by ever smarter machines, what does it mean to be a good doctor? As technology evolves, the essence of healing may lie not in doing more, but in being more fully present. Michael, thank you for taking the time to join us and welcome to the show.

    Dr. Michael Howell: [00:02:42] Thanks for having me.

    Tyler Johnson: [00:02:43] Before we talk about how you got to where you are, can you give us a brief overview of where you are and what you do, the day-to-day responsibilities of your work?

    Dr. Michael Howell: [00:02:53] I'm the chief clinical officer at Google. And if that sounds a little bit like a made-up job title for the best job in the world, you know, it could be, right? I have teams of doctors and nurses and psychologists who work in areas where Google has products that impact health. And so that would be things like Search and YouTube, but also research and Cloud and Google DeepMind, sort of across the company. And then I have the team that does the human subjects research trials, because we have a few regulated medical devices, so we do some prospective human subjects research studies. And there's sort of a lot to unpack there, enough that my boss, Karen DeSalvo, and I wrote an article a couple of years ago in NEJM Catalyst describing what a clinical team at a large technology company should be in the modern era.

    Tyler Johnson: [00:03:49] Okay. So I think it's probably the case that most people, when they're in medical school studying, you know, the cranial nerves and abdominal anatomy and whatever, probably are not thinking, oh, when I grow up, I want to be the chief clinical officer at Google. In fact, most people probably don't even know that Google has a chief clinical officer. They may not even know that Google has any medical enterprises, period. So could you first trace for us how you got into medicine in the first place? But then also, how did your path eventually end up taking you to Google, of all places, to work?

    Dr. Michael Howell: [00:04:23] Yeah, I mean, I was one of those kids who, for reasons that are still opaque to me, never wanted to be anything besides a doctor. Don't really know why. Nobody who does any kind of medicine or nursing or physical therapy or anything in the family. Um, but apparently, you know, the first toy I supposedly asked for was a Fisher-Price doctor set. Like, there must be a gene, right? And so it's really quite a blessing to know what you want to do, I have come to realize. I did my undergrad at Rice, where I was an Asian Studies major focused on, like, 2,000-year-old literature in China, and then went to med school, as one does, and then thought I would be a primary care doc in the 'burbs with a private practice. I learned I love taking care of sick people, and that I really loved intensive care medicine, and then had what is a more common academic career now. So between 1995 and 2017, I was either training or practicing as an ICU doc. I ran parts of health systems, mostly focused on quality and safety, but also, you know, a few other things here and there.

    Dr. Michael Howell: [00:05:41] And then I did a bunch of research in this area that a few of us tried to brand as health care delivery science, which is the idea of how to use research, quality math, and good methods to understand what you're doing in actual care delivery instead of theoretical care delivery. And I did some research with the Google team. I supervised a couple of MIT doctoral students in 2010-ish, trying to use neural networks on full-fidelity electronic health record data, and the experience was so bad, I shut down machine learning work in my lab for the next five years. And there are some reasons: historically, there was this big, important discovery in 2011, 2012, and this was just before that. And we'd done some research together. And Google had, in the kind of 2016, 2017 span, figured out that machine learning now really worked on healthcare data. And so I came over as one of the quite early hires in trying to figure out, like, what would Google do now that ML worked in healthcare?

    Tyler Johnson: [00:06:45] So I want to spend a lot of our time today talking about artificial intelligence, large language models, machine learning and all of that. But before we get to that, you have mentioned already that a part of your work over your career has dealt with the gap between what we as doctors think is going to happen when a particular procedure or treatment or algorithm or whatever is introduced into medical practice, and what actually happens. Right? And I think that's an interesting idea, and probably one that a lot of medical students and even doctors don't think about that much. It would seem like, oh, well, gosh, if there's a phase three trial in the New England Journal of Medicine, then of course that, you know, shows that the thing works. And yet what you are saying is that at least part of your work has dealt with the gaps between the theory and the implementation. And so can you talk to us a little bit about why that's important, or maybe even share an example of why looking at that gap and really analyzing it matters?

    Dr. Michael Howell: [00:07:43] I'll give a few examples. If you take my field, pulmonary and critical care: for decades, it was defined by a few things, right? The Swan-Ganz catheter would be an example.

    Tyler Johnson: You know, that's funny, because I wonder if younger doctors who are just coming out of training, or just coming up through training, even know what a Swan-Ganz catheter is. Which is ironic, because I think I was sort of coming into medicine right at the end of the era when getting a wedge pressure was considered a, you know, a key constituent of taking care of most ICU patients, right? Or at least many ICU patients. Whereas more recently it's something that I see only very rarely. Now, I'm not an ICU doctor, but I take care of, you know, a lot of ICU patients who have cancer or whatever when I'm attending in the hospital. All of which is just to say that it is interesting to note how much that has changed.

    Dr. Michael Howell: [00:08:36] Yeah, I might say it this way. You know, I made the mistake one time (I don't know if it's a mistake, but I made the mistake) of writing a textbook about healthcare delivery science. And we have this section in it called problems in thinking about health care delivery. To put it simply, the really risky time is when something seems obviously correct. So it's obviously correct that if someone is in shock, more information will be better, right? That's what the Swan-Ganz catheter was about. Turns out, empirically, it's not true, and you expose patients to risk and expense and other things with it. You know, it's useful in some situations, but not all. It's obviously correct that if a patient with diabetes has high blood sugar in the ICU, you should control that tightly. Well, it turns out that that approach, which spread into quality measures and across the United States, may have been responsible for thousands of deaths once the right trials were done. There's some debate about that. It's obviously correct that we should treat pain as the fifth vital sign and provide pain medication to make it go away. We've seen consequences with the opioid epidemic related to that. And so the gap between "we think we know what to do" and "we've tested what to do" is quite wide, because the human body is miraculously complex, and there are lots of things we don't understand about it. And so that's what the field was about.

    Tyler Johnson: [00:10:13] So I do want to pause here to talk for a minute about a tension that I think maybe we don't give enough attention to, which is, you know, at least when I was in medical school, the mantra that we repeated and tried to live by was evidence-based medicine. Right? And so we had, you know, whole classes about how to discern what was good evidence and what wasn't, how to read evidence, how to implement it, how to think about sort of what constitutes high-level evidence, all of those kinds of things. But one of the things that is so striking to me is that the very elements of, for example, a large blinded phase three randomized controlled trial, the very things that make it so powerful, are also, in another sense, the things that make it so limiting. Right? In the sense that, in theory, what is supposed to happen is that when there's a phase three trial, you look at it and you say, okay, here are the results, and so now I'm going to go implement those results. But then the question is, right, that in some sense all the phase three trial can do is look at a group of people, or, in a sense, at what almost feels like a sort of, quote, average patient for a given scenario.

    Tyler Johnson: [00:11:28] But of course, there is no such thing as an average patient. Right? And even though you may be able to try to put patients into different groups, it seems questionable to assume that anybody who, for example, meets the inclusion and exclusion criteria in a trial is therefore necessarily going to warrant that particular treatment in that particular scenario. So then there's this sort of an intuitive sense that, well, yes, we need to use the evidence to guide us, but then we also need to know when to make exceptions. Right? That seems sort of intuitive. But then as soon as you start making exceptions, then at some level you wonder, well, gosh, what was the point of the phase three trial if we're just going to make exceptions to how we apply the evidence anyway? So I guess all of that is just to say, can you talk through a little bit that tension between how we think about the importance of the statistical considerations that make a phase three trial or whatever statistically and clinically significant, and yet how we have to think about applying that evidence to an individual patient?

    Dr. Michael Howell: [00:12:31] Yeah, I mean, I was the chief quality officer for University of Chicago Medicine and also did a lot of clinical time. And so I had this sort of beautiful job where you'd make some policy and then you'd, like, go in the ICU. You'd be doing work, and you're like, who did this? And you're like, oh, it's me. This is terrible. And then, you know, you go back to the office after you're off service and you'd be like, all right, I've got to fix that. And that's a useful feedback loop to have, number one. But let me tell you how I described it to our faculty, which is: for things that are reasonably common, you should agree on how you're going to practice, and you should write that down. And if you choose to not practice that way because the patient's characteristics or the family's needs mean that you should practice some other way, great. Like, write a sentence down about why. Because care should be instantly, infinitely variable when driven by the needs of the patient or the family. But it shouldn't vary because they see you or they see me, or they see me on a Tuesday, or me on a Thursday, and my practice pattern is different on Thursday than Tuesday.

    Dr. Michael Howell: [00:13:45] So those kinds of things of, you know, does this particular trial, for my practice, on proning or on how to set PEEP, does it apply to this patient? There's a lot of nuance to that. Every protocol that's ever written is always wrong for this specific patient and is best on average. And so there's a lot of art and practice to figuring out what to apply when. But when you look at it, the amount of variation that's driven by things other than the needs of the patient or family is extraordinarily high, even within an individual physician. And so that is hard to defend. And so that was some of the work that we looked at. And then the other piece that we would look at is what, you know, my colleagues in the pharmaceutical industry would call off-target effects, and Don Berwick and I would call balancing measures: you know, the health care system is a complex adaptive system. That means, by definition, you can't always correctly predict what will happen. And so when you implement something, you need to check for it doing other things you didn't intend.

    Tyler Johnson: [00:14:53] As I listened to you talk, it occurs to me that I think in a sense, what is needed is a balance where on the one hand, certainly we need to understand the need for and value of rigorous evidence with statistical and clinical and methodological rigor as we have been discussing. And yet, at the same time, we also have to have the humility and insight and recognition to know that nothing about which we might feel certain is ever as certain as we think it is, right. Which is to say that even if we have results from a large phase three randomized controlled trial or what have you, we are still going to have to be thoughtful and careful about how we implement those results, right? We have to have the humility to recognize that the person in front of us is never as simple as an archetype, and that we have to be thoughtful both about the underlying background pathophysiology and about the potential untoward effects that the things that we implement may cause in spite of our best efforts or best intentions.

    Dr. Michael Howell: [00:16:05] Yes, 100% agree with that. I would add to it that there are a lot of things that determine patients' outcomes in a healthcare encounter that have nothing to do with the doctor, and there are ways that more sophisticated analytic methods can help with that. And so I'll give an example. You know, people talk about throughput being important, and like, oh, it's important to margin, it's important to all these things. Very true. But if you've ever done CPR in the waiting room of an emergency department, because the hospital was backed up because people couldn't get discharged, and therefore the ER was backed up so people couldn't get into the ER, and therefore the waiting room was full of sick people, you understand that throughput has a real consequence for actual human beings. So you should care about throughput. So if I ask this question: let's say you have a 500-bed hospital. You know, you might have 1,500 or more nurses to staff that. Before you start applying rules like we want X number of workdays per week or other things like that, how many options are there for how to staff your hospital tomorrow? Well, it turns out that the answer is that that number is larger than the number of atoms in the observable universe. And, like, maybe we should have analytic methods more sophisticated than a pie chart to help us out with that kind of complexity.
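
    A rough back-of-the-envelope check of that combinatorial claim, sketched in Python. The 1,500-nurse pool comes from the conversation; the 300 nurses needed for tomorrow is a made-up illustrative number, and no scheduling rules are applied.

        import math

        nurses_available = 1500   # staff pool size mentioned above
        nurses_needed = 300       # hypothetical roster for tomorrow (illustrative)

        # Ways to choose tomorrow's roster, before any scheduling rules apply.
        options = math.comb(nurses_available, nurses_needed)

        ATOMS_IN_OBSERVABLE_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate

        print(f"roster options ~ 10^{len(str(options)) - 1}")  # roughly 10^326
        print(options > ATOMS_IN_OBSERVABLE_UNIVERSE)          # True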

    Tyler Johnson: [00:17:35] Okay, so I think this makes for a good segue. One of the reasons that we are excited to have you on the program today is to be able to talk about AI and large language models and all the rest. And so, you know, I feel like, on the one hand, this is something that is very much in the water right now. If you read the news, right, you can read about the new presidential administration and what they think about AI. You can read about Elon Musk, you can read about Sam Altman, you can read about the intrigue at, uh, you know, OpenAI and the launch of ChatGPT or whatever. And especially since the launch of ChatGPT to the public, I feel like that was sort of the horizon event that really put this onto the radar screen of most people, including probably many doctors. Yet at the same time, I feel like if you were to ask most doctors, what is AI and what's it going to mean for you and your career? There's this kind of vague notion of sort of what it is.

    Tyler Johnson: [00:18:30] And then there is this even vaguer notion that it's probably going to play a big role in health care at some point, but I think if you asked people to be at all specific about those things, they would probably have great difficulty doing that. And so I think, just to begin with some sort of definition of terms, what is AI? Or, more specifically, you know, I think sometimes we fail to recognize that doctors have been using assistive technology for a long time, right? So if I order a PET scan, the way that the PET scan is protocoled into the computer so that I can view the images, or the way that I look things up on Google or on UpToDate or whatever, all of those are assistive technological innovations, right? And so I guess the question that I have is: compared to the things that we have already had, what is it about AI that makes it different? Like, what is it that AI can do that the other current or past technologies couldn't do? What really distinguishes it?

    Dr. Michael Howell: [00:19:23] Yeah. So if we took the general term of AI as being the science and practice of making machines seem intelligent in the way humans are intelligent, let's take that as a working definition. Its history is really useful here, because there are two technical inflection points to talk about. We wrote a paper last year in JAMA called "The Three Epochs of Artificial Intelligence in Health Care." It's, like, two pages, one figure, that sort of summarizes everything I'm about to say. Let's go back to 1950 for a minute. Alan Turing wrote a paper, and that's the paper that the idea of the Turing test came from, which is how we thought about what counted as intelligence for a long time. The Turing test is really quite well known. But what's not as well known is that Turing cited only nine other papers, papers by, like, Gödel, these amazing, amazing names. But one of them was from the British Medical Journal: he cited the 1949 Lister Oration at the Royal College of Surgeons. That oration's title was "The Mind of Mechanical Man." And so, going back to before the term artificial intelligence was even coined (artificial intelligence was coined in 1956-ish), there was a dialectic between people who thought machines could not be intelligent and people who thought they could. And the thing that all the listeners should take away is that medicine and AI have been entwined from the very, very beginning. It's not something that's new. It goes all the way back to Turing's paper. Okay, fast forward, and we see a huge amount of work in artificial intelligence over the ensuing decades. So things like INTERNIST-1, a system studied in the New England Journal of Medicine in the '80s. There was an artificial ID doc called MYCIN. And they all had one thing in common: they tried to encode human knowledge into the computer as a set of expert rules. If gram-positive, then do this.

    Tyler Johnson: [00:21:31] Okay, so it's sort of a super-fancy algorithm, right?

    Dr. Michael Howell: [00:21:36] And then they would use regular statistical methods. And so that could do some amazing things, but it was brutal in practice, brutal in practice. It just didn't work quite right. You have those in your health system today, with complicated if-then order sets that try to encode clinical practice and have a bunch of branching logic. It's fair to think about that as AI 1.0. Then, in about 2011, computers began to be able to see the world around them with convolutional networks, and they began to be able to process words at scale with things like recurrent neural networks, or this big innovation called word2vec. And so then we got this decade, from 2010, 2011, 2012 to 2020, 2022, where all of a sudden we didn't have to label all the photos in our phones as cat and dog, and meanwhile we got reasonable autocomplete. We got quite good machine translation out of this. And in healthcare, our group had one of JAMA's most influential papers of the decade, about diabetic retinopathy detection using AI and machine learning, and showed that you could do that as well as a US board-certified ophthalmologist. Since then, there have been a thousand papers that are like that. The important thing about AI 2.0, and it's different from what we have today, is that every algorithm only did one thing at a time. You have a breast cancer detection algorithm; we just published, with Northwestern, a randomized controlled trial of breast cancer detection AI. Now you want it to screen for lung cancer? You need a whole new set of data and 20, 30, 40 engineers and a year to make a new algorithm to do the next thing.

    Tyler Johnson: [00:23:29] Okay, so in version 2.0 you can teach a bot to do a thing, but you can only teach it, in effect, to do one thing, and teaching it to do that one thing requires an enormous amount of person power and energy and time and cost. And then you have to effectively deploy it to do that single one thing over and over and over again.

    Dr. Michael Howell: [00:23:51] Yes. And so it can have superhuman performance, but on a very narrow bit. And so the next big technical innovation happened in 2018, and it's the creation of something called the transformer. And, you know, I'm a Gen Xer, so I feel like I have to say: not Autobots and Decepticons, but a neural network architecture. And that was invented, I don't know, like 275 yards from my desk or something. The transformer is the T in GPT, and it's the foundation on which almost all modern foundation models in AI today are built. And the important part about the transformer, and sort of all of the training that happened with it, is that it can do many different things without a new data set and without being retrained. So think about when you talk to one of the consumer chatbots and you say, oh, I'm going on a podcast (I didn't do this, but I probably should have), I'm going on a podcast called The Doctor's Art, can you help me think about what questions they might ask? And it'll write you some great questions.

    Dr. Michael Howell: [00:25:01] I like the fourth question, give me a response to that question. Oh, now do it as a poem. Oh, can you... For each one of those things, all you've had to do to get it to do a new task is give it a sentence, and it does something completely different. In the AI 2.0 era, that would have required an entire new data set and 20, 30, 40 engineers and a year; here, you did it with a sentence. It's remarkable that that works. And so that's the key part about what's going on now: it can do many different things. Some of its capabilities are that it can sound like a human, in 2025 as opposed to 2024; that it can take in many different kinds of input (video, sound, a picture of an EKG); and that it can do things that seem like thinking about what's going on, in a way that it can explain itself to you. And so that's how the three big epochs differ, and why today is different.
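
    To make that contrast concrete, here is a minimal sketch in Python. The generate function is a purely hypothetical stand-in for an instruction-tuned chat model API, not any real interface; the point is only that one model gets retargeted with a sentence instead of a new data set and a year of engineering.

        # Hypothetical stand-in for an instruction-tuned chat model API.
        def generate(prompt: str) -> str:
            return f"<model output for: {prompt!r}>"

        # AI 2.0 era: each task needed its own model, data set, and team
        # (one model for retinopathy, another for breast cancer, and so on).

        # AI 3.0 era: the same model is redirected with plain language.
        questions = generate("I'm going on a podcast called The Doctor's Art. "
                             "What questions might the hosts ask?")
        response = generate("Give me a response to the fourth question.")
        poem = generate("Now do it as a poem.")
        print(questions, response, poem, sep="\n")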

    Tyler Johnson: [00:26:05] Okay. So right at the end there, you made a really interesting statement, which is that this latest generation seems like, or looks like, it is thinking. So I want to unpack that statement for a minute in a few different ways. If you look at version one and version two and then compare them to the latest version, implicit in what you said is that those earlier versions did not do this thing that looks like thinking, and this most recent version does. So I guess my question is: what is thinking, as you are defining it here? And what is it that the third version does that looks like that, that the first versions didn't do?

    Dr. Michael Howell: [00:26:47] Yeah. So if you take generation two, basically what it would do is predict a number or predict a classification event. "This person is likely to have diabetes" would be an example of that. Super useful in context. AI 3.0, at its heart, is a next-word prediction engine. That's basically what it does. It learns the semantics and structure of human language, and then it will predict the next most likely thing. And that explains a bunch of things, like why they hallucinate. So it says, oh, you know, normally here somebody would cite a medical journal. I know what medical journal citations look like. Here are words that are plausible for a medical journal citation.

    Tyler Johnson: [00:27:32] Even though they don't actually exist.

    Dr. Michael Howell: [00:27:34] It doesn't go look it up in PubMed. It just is the model. And so in practice, what you do is you train it so that it knows it's going to give a citation, and then you use PubMed or whatever to actually do that. What I mean when I say that it looks like it's thinking: let's say that I gave it the task of, help me make questions for a podcast interview. It'll give you an explanation that sounds like a person wrote it, and it will do that in a way that sounds planful and usually sounds like a reasonable explanation for whatever it did before. The other piece of this that's quite remarkable, if we sort of jump over into healthcare for a second: the term people use is "domain align," and we've done a lot of work on how to domain align these models for healthcare. And one of the studies that we did depends on something that I know is in your background, which is medical education. A tricky thing to do is to figure out what a safe testing harness for an AI is. And so we picked the OSCE, and I imagine that many of the listeners know what an OSCE is, but I bet you could explain it better than me.
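
    A toy illustration, in Python, of the next-word prediction idea described above, with an entirely made-up vocabulary and probabilities. It also shows why an ungrounded model can emit a plausible-sounding citation that doesn't exist: it samples likely-looking tokens rather than looking anything up in PubMed.

        import random

        # Entirely made-up next-token distributions, for illustration only.
        NEXT_TOKEN_PROBS = {
            "<start>": {"The": 1.0},
            "The": {"patient": 0.6, "Journal": 0.4},
            "patient": {"improved.": 1.0},
            "Journal": {"of": 1.0},
            "of": {"Plausible": 1.0},
            "Plausible": {"Medicine.": 1.0},  # sounds real; need not exist
        }

        def sample_next(token: str) -> str:
            """Sample the next token, or end when no continuation is known."""
            dist = NEXT_TOKEN_PROBS.get(token)
            if not dist:
                return "<end>"
            tokens, weights = zip(*dist.items())
            return random.choices(tokens, weights=weights)[0]

        tokens = ["<start>"]
        while tokens[-1] != "<end>":
            tokens.append(sample_next(tokens[-1]))
        print(" ".join(tokens[1:-1]))  # e.g. "The Journal of Plausible Medicine."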

    Tyler Johnson: [00:28:58] Yeah. So with the OSCE exam, imagine a spectrum where over on one end are exams that you might take early in medical school, where you have a number of multiple choice questions, right? Either an anatomy exam where you have to identify a bunch of structures, or a biochemistry exam where you're asked about the details of a biochemical pathway or whatever. The OSCE exam is on the opposite side of that spectrum. A student is presented with a series of cases. They are in a simulated environment with simulated patients, and they go through and receive, you know, certain things in the history and in the exam and whatever. And then they are graded both on their communication and the way that they get the information, all that, but also on how they put all of that together into: what do we think is going on, and what do we want to do about it? So it's meant to be one of the highest-level reasoning exams that you encounter in medical school, because it's asking you to synthesize all of this knowledge and all of these skills into figuring out what's going on and what you want to do about it, which, of course, is, you know, sort of the prime calling of an actual doctor.

    Dr. Michael Howell: [00:30:07] Yeah. So: objective structured clinical exam. A standardized patient who knows the script, right, for what it's like to have lupus or heart failure or breast cancer, and a medical student or resident goes in and interviews them and maybe does an exam and makes a plan. And so we hired standardized patients, and then we randomized them to do an OSCE by text, by chat. And they were randomized to board-certified primary care docs or to a model that was trained to do this. And then, like in a real OSCE, we had experts grade the technical component of it. Cardiologists looked at heart failure cases and looked at diagnostic accuracy, and the model outperformed the primary care doctors. But then, one of the things I've always loved about the OSCE is that the patient grades the learner on things like empathy, rapport, and connection. And we had the patients do that. And, much to my, uh, kind of upset at the result: the model outperformed the primary care doctors on something like seven of eight dimensions that our research team looked at. It's quite an upsetting result in many ways. And the thing that I think is important is: does the model have empathy? Obviously not, right? Does the model interact in a way that a human being perceives as empathic? Empirically, that seems to be true. That's a really interesting dichotomy.

    Tyler Johnson: [00:31:45] Okay, so there have been a number of studies now that have shown similar sorts of findings in, you know, slightly different scenarios and whatever. So I want to ask some questions that come out of those kinds of studies. But before we get to that, I want to go back first to that final statement that you made at the end of some of your extended comments a few minutes ago, where you said that it looks like, or seems like, the AI is thinking. So what you have just done now is explain what it is that this latest iteration does, that the previous iterations didn't do, that makes it seem like it is thinking. But what we have not done yet is have you explain to us (and I recognize that this question may be philosophical or even metaphysical, but I still think it's important) why you say it looks like or seems like the AI is thinking, rather than just saying that the AI is thinking. Like, what is the thing that the AI bot lacks? Why do you believe that there is a difference between the thinking that a human does and whatever it is that the AI is doing?

    Dr. Michael Howell: [00:32:48] I don't know that there is a great answer to that. I think it's probably been debated for 3,000 years, right, through, like, you know, Plato, Hobbes, Locke, Aristotle: what defines cognition? So, you know, I'm convinced that these models are primarily predicting the next token in a sequence, and that they're really good at that. They're tuned to do that. When they predict weird tokens, like making up a medical journal that doesn't exist, we have ways to give feedback and adjust the weights of the neural network so that they're less likely to do that in the future. And so I don't have evidence that they're thinking, in the same way that I don't have evidence that they have empathy. But the outputs are similar to what we talk about when we think about thinking. So what we think about is training time and inference time. There's sort of an important thing called a scaling law, a nonlinear thing about, like, how much computation do you have to give to make the model 10% better, or twice as good, or whatever. And that's why you hear that these models are so expensive to train: because we're pretty far along on the scaling law of the training time. It turns out that if you give them longer to do their processing and consider what token to emit next, they perform better on a number of tasks. And you can also do things where you say, oh, instead of just giving the answer, make a plan about how you would get to the answer. Then you do another loop with the model and you say, now execute on this plan. And empirically, it turns out that they do better on a wide range of tasks when you do that.
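
    A sketch of the two-pass, plan-then-execute loop Dr. Howell describes, again with a purely hypothetical generate stand-in for a chat model API; the clinical question is invented for illustration.

        # Hypothetical stand-in for a chat model API.
        def generate(prompt: str) -> str:
            return f"<model output for: {prompt!r}>"

        question = "What could explain this patient's hyponatremia?"

        # Pass 1: ask the model for a plan instead of an immediate answer.
        plan = generate(f"Don't answer yet. Make a step-by-step plan "
                        f"for answering: {question}")

        # Pass 2: feed the plan back and ask the model to execute it.
        answer = generate(f"Execute this plan step by step, then answer.\n"
                          f"Plan: {plan}\nQuestion: {question}")
        print(answer)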

    Tyler Johnson: [00:34:44] And yet I also hear a little bit of a hesitation or a caveat, in the sense that I think I also hear you saying that even if you have a bot who is equipped with (or that is equipped; so here we go, I'm anthropomorphizing all over the place) words and the ability to react and whatever that make it seem as though the bot is more empathetic, as rated by the people who are using it, there is still some key thing that is lacking, something that is lost when you move from human-to-human interaction to human-to-bot interaction, irrespective of how the humans rate the bot or how the bot does on some sort of measurement. Am I saying that about right?

    Dr. Michael Howell: [00:35:39] I think there's an empirical question to be answered, which is: when people know whether they are talking to a bot or a person, do they have the same reaction as they do when the condition is blinded? That's an empirical question. There's a little bit of evidence accumulating that they don't, that the perception of empathy changes when you know that it's not a person. And I'm interested in how the evidence accumulates there.

    Tyler Johnson: [00:36:11] Okay, so with this in mind, let me pose to you a theoretical ethical question. Let's go back to the OSCE, but let's say that this time it is done in such a way that it is unblinded. So maybe we could imagine two rooms, and standardized patients are randomly assigned to go to room type A or room type B. In room type A, they interact with a person, a medical student, and in room type B, they interact with a humanoid bot, but one that is very obviously still a bot, so they can tell by looking at it. And then let's say that at the end of the encounter, they are asked to evaluate the empathy, bedside manner, rapport, communication skills, whatever, of the two entities. And let's say that the results are statistically indistinguishable, that the bot does just as good a job, by whatever, you know, methods of measurement you want to use, as the human. As a leading healthcare AI researcher, or however we want to classify you: if that were the case, would you then say that receiving healthcare delivered by the bot was equally valuable to receiving healthcare from the human?

    Dr. Michael Howell: [00:37:28] I think the right way to think about this is that today on the planet, there are 4.5 billion people, billion with a B, who don't have access to essential health services. And, I mean, have you tried to get a primary care doctor lately? It's murder, right? Even, you know, I have, like, the world's best insurance, and it still is murder to get a primary care doctor, let alone if you need a specialist or something like that. And so one of the things that I'm hopeful about is: let's say that we had evidence that there was an AI that could help scale access, in the way that nurse practitioners and physician assistants have. Wouldn't that be amazing? I think the question that you're asking is a little bit of a false dichotomy. Let me give an example. I lived in Chicago, like, home of the world's best pizza. But sometimes I didn't want to talk to anybody. And, you know, to get the good pizza, you have to call and, like, talk to a human being to order, and, like, you know, maybe give them your credit card. And sometimes, I will confess, I would just order Domino's, because I could do it on the app and I didn't want to talk to somebody, and I just, like, wanted some pizza. There's a company called Esurance, maybe; I don't know anything about Esurance, except that they used to have TV ads, like, on the regular old TV. And at the end of this ad, their tagline was something along the lines of: technology when you want it, people when you don't. I think that's the right way to think about the future, if the evidence base accumulates: sometimes you'll want technology and sometimes you want pizza. I recently swapped from having a virtual primary care doctor to an in-person doc, because I feel like there's something different there.

    Tyler Johnson: [00:39:26] Okay, so recently on the program we had an author and thinker, Christine Rosen, who released a book a month or so ago called The Extinction of Experience. I don't know if you have read it.

    Dr. Michael Howell: [00:39:38] I have not read it.

    Tyler Johnson: [00:39:39] She's sort of one of the world's experts in how changes in technology change the way that we interact with the world, the way that we interact with each other, and to some degree, the way that we interact with ourselves. And one of the things that she discusses in that book is her concern that accelerating change in the digital sphere will eventually bring us to the point where the privilege of interacting with another human, in healthcare or in whatever field you want to talk about, will become a luxury good that is only available to the rich. And so I guess what I'm wondering is: could you foresee a future in which, if you have regular old Joe Schmo insurance, you get to go see a bot for your care, but if you have really fancy-schmancy insurance, or the equivalent of concierge care now, where you pay, like, you know, a yearly retainer above and beyond what you would pay for insurance or what have you, only in that case do you get to go and see a human? Do you think that that is a possible or probable future scenario? And what would you say about that?

    Dr. Michael Howell: [00:40:53] I mean, I worry about the inequitable distribution of the benefits of health care all the time. My family is scattered across northern Alabama, and if I look at the cancer care that my family gets in rural Alabama compared to what patients who seek care elsewhere get, you've got to think it's different. And if I look at maps of global travel time to health care, I know that for large swaths of the planet, even if you have a car, it will take you more than 24 hours to reach any health care. I don't think there's any reason to think that that will be different in the future unless we work on making it different. And my hope is that there are things that AI assists with that let us get back to the place that we grew up hoping that medicine would be, which, it sounds like, you know, is what your friends find in this very niche and boutique area of medicine. And so imagine if we had the ability to support the parts of practice where there's evidence that AI can help. I mean, there was a time when only physicians took blood pressure, and then nurses were allowed, and now you can do it at your house and at CVS and all those things. Like, I think that there are some good counterexamples of things that you don't enjoy doing that don't need your training. I mean, I don't know what grade you were in when they let you out to practice; I was in the 28th grade. And yet I had an administrative fellow who followed me around on rounds one time. We had this procedure service at UChicago. And, you know, you have to be really good at ripping labels that are printed out to then stick them on things in order to be able to do your job. It's a real skill to successfully rip the labels and get them on all the things. And I didn't need 28 grades of school to know how to rip labels. And I think that there are a lot of other examples of that.

    Tyler Johnson: [00:43:02] So I hear you on the one hand, but on the other hand, I want to push back a little bit. Let me tell a story that I think is a cautionary tale, and let that sort of ask the question for me. So during the Barack Obama era, there was an enormous amount of excitement about electronic medical records, right? This was back, even though it was not that long ago, at a time when the uptake of electronic medical records was still very spotty, and there were a lot of places that were still using paper charts. And there was a big push from the federal government to have electronic medical records used universally, and there was a lot of excitement about what that was going to mean. Now, to be clear, I 100% support electronic medical records, in the sense that I'm just old enough to have been in some hospitals where they did not yet have EMRs instituted fully, and it was a total disaster. Right? Like, in retrospect, it's sort of a miracle that any care ever got delivered safely and efficiently, because you could have one patient with their vital signs in this chart, their lab results in that chart, their radiographs on this computer program, and their notes in this other chart. And doing what we call pre-rounding, which is getting ready for rounds with the attending, was just an absolutely inefficient disaster.

    Tyler Johnson: [00:44:16] So certainly I am glad for Epic, where you can, you know, just look up the stuff that you need as quickly as you need it. Having said that, though, what is also, I think, pretty clear in retrospect is that the way that the electronic medical record got implemented was largely focused on maximizing the profit and the ability to bill for the health care companies who are now ultimately financially responsible for employing most physicians and for providing kind of the healthcare infrastructure behind seeing most of the patients, at least in the United States. And so what that has resulted in is an electronic medical record that is optimized for profit, but is not really optimized for the welfare of the doctors or for the welfare of the patients. As a consequence, for example, for a number of years I was a member of, you know, sort of the wellness committee at Stanford, and much of the work of the wellness committee is focused on freeing doctors from the overwhelming burden imposed by the electronic medical record. Right? Your inbox is never empty. Your notes are never all signed. And even if they are today, then they won't be tomorrow. Right? And some of that, of course, is just the nature of medical care.

    Tyler Johnson: [00:45:39] But there is also this sense in which there are all of these sort of both limits and prerogatives that are imposed upon you by outside bodies, that is to say, by, you know, sort of the corporate or executive suite at the hospital, or by those who are trying to raise profits or lower costs or what have you. And so I guess all of that is to say that I just don't see any compelling or even convincing reason to think that the implementation of AI in medicine is really going to be any different, at the end of the day, than the implementation of electronic medical records. It seems like what will happen is that we will optimize for profit. And so even if AI does make it more efficient to come to a diagnosis, or even to express empathy or whatever the thing is, in healthcare it seems like the result of that is not going to be that it, quote unquote, frees up doctors. It's going to be that it just provides healthcare corporations further room to squeeze and squeeze and squeeze, so that doctors end up being asked to do more with less, again. And so I guess I'm asking: are you seeing something that I'm not seeing that makes it so that that is unlikely to be the case?

    Dr. Michael Howell: [00:46:50] Yeah. Uh, let me disentangle two things here. One is the way that electronic medical records happened in the United States, and one is a thing called the productivity paradox, which is a historical, maybe not a law, but a repeated thing in history. The productivity paradox is: we put in all these computers, or we put in all these general purpose technologies, and it's been five, six, seven, ten years, and, like, things are no better, and maybe they're worse. And so it turns out, if you go back to the 1900s and the electrification of factories, what happened was they just replaced the big water wheel with one big motor. They didn't redesign the work around the new technology, and it took 15 to 20 years to be like, oh, we can put a bunch of small motors at the places where people work. And then productivity went up. Or look at computers in banking, right? A common thing is like, why isn't the EHR like ATMs? I can use my ATM card anywhere in the world and get cash. Well, if you look at the implementation of computers in banking, there was not a giant "all of a sudden, ATMs appeared." They redid the work as it existed, and then 15, 20 years later, we're like, oh, look at all these other things that we can do. So when did we digitize healthcare in the United States? It was around 2010, with the HITECH Act and other things like that, where we went from 6% of hospitals and practices having an EHR to, you know, now more than 90%. Well, how far are we from 2010? Fifteen years. And so I am hopeful, and we're starting to see some of this, that people are using the technology to redesign the work, not replicate the way we did things before.

    Dr. Michael Howell: [00:48:47] And so, historically, like, I am a believer that history predicts the future. Historically, we're at a good time where things might change. How the EHR was implemented in the United States is a different thing. And so, you know, sometimes I'm like: if you've never, like, worked all day and gone home and finished your notes and fallen asleep and woken up with dee dee dee dee dee dee dee dee dee, have you ever really practiced? Right? And so, you know, as an example, one of the things where we certainly see people having a lot of uptake with AI is ambient dictation and scribes, and scribes are a workaround for issues with the human-technology interface of the EHR. And so I do think that, you know, I worry for US healthcare that one out of every $5 that is spent in the United States is spent on healthcare. I worry that many states, because of healthcare costs in their pension plans, are laying off teachers. I worry that everyone's cost is someone else's revenue. I also have some hope that we're at the right time historically. It's very hard to do AI on paper in a manila folder in a file cabinet that no one has the key to. It is possible that the convergence of AI being what AI is today, and the fact that we're at the right historical point after the digitization of US healthcare, will matter. I'm not without hope. I think I would say that things may get better.

    Tyler Johnson: [00:50:26] Okay, so I want to go back for a minute to this idea of randomized controlled trials comparing human- and bot-delivered care, and I want to pose a thought experiment and then ask a question. Let's imagine that you set up an experiment where, in a particular city, when patients come to the emergency department in any of the major hospitals there, they get randomly assigned to be initially evaluated by either a human or a bot. And that evaluation is going to include taking their history, coming to an initial set of conclusions about what is likely going on, and coming up with initial orders, that is to say, giving initial medications or what have you. And then let's imagine that those patients are followed for two years after they are discharged from the emergency department, whether or not they also get admitted to the hospital. And then let's imagine that after those two years, the patients who were cared for by a bot rather than a human, at least initially, are demonstrated to have a clinically and statistically significantly higher median overall survival than those who were taken care of by humans. So, in other words, we show that the bots are better able to provide good care than the humans. And then again, for the sake of the thought experiment, let's assume that this is repeated at multiple hospitals and in multiple conditions and whatever, and that the findings are consistent and the evidence is robust. If that were the case, do you think it would then become unethical to have patients cared for in a way that does not involve the bot? Or, in other words, do you think there is some threshold of evidence after which, if we demonstrate that bots are better at something than humans, even within the sphere of health care, it becomes unethical to allow the humans to continue to do a thing that evidence has shown bots are better at?

    Dr. Michael Howell: [00:52:24] Wouldn't that be a lovely problem to have? I mean, I think we're a long way from even being able to approach that kind of a study. The analogy I would use: I proposed, and had written the full IRB application for, a randomized trial of thoracentesis done the way I was trained (like, thunk, thunk, thunk) versus with ultrasound. And once you've done a few thoracenteses with ultrasound, you're like, ah, it'd be really hard to go back. Same thing with central venous access: I'm at, like, exactly the right vintage. I've done, I don't know, 100 and change, you know, with my hands, and then there's compelling evidence that ultrasound is better for it. So I think it'll be like the adoption of any new technology in healthcare: we need an evidence base. Some things are within scope for medical device regulators; certainly what you described would be. And the important piece to remember is that the piece of technology by itself is one component of a complex adaptive system. But you know what, if I put in a central line without ultrasound when ultrasound was available? Hard to argue. And so I'm hopeful. One of the things I love in your thought experiment is, A, it's very detailed, and, B, it presupposes that healthcare will continue to accumulate evidence in a way where we can distinguish "shiny" from "works." And so one of the things that I'm really focused on is how we help the industry, and help with the accumulation of evidence, in a way where we really understand how this technology works and where it doesn't. And so I like the question of multiple repeated randomized controlled trials showing that an intervention works or doesn't work. I think that's core to the modern practice of medicine, and I don't think that AI is different.

    Tyler Johnson: [00:54:36] Okay, so here's our parting shot. Then let me, um, call up an image from a movie and then use that as sort of the springboard to the question. So, you know, many years ago there was this movie that Robin Williams was in called Patch Adams, which is the story of this sort of iconoclast who goes through medical school, faces various trials and whatever, and then eventually gets his degree. Anyway, there is a scene in that movie where the character played by Robin Williams, this guy named Patch Adams, is portrayed as kind of a savant, in the sense that he doesn't seem to have to study very much and yet gets top grades in all of his classes. But then he has a roommate, played by Philip Seymour Hoffman, who is this very kind of straight-laced, type-A personality guy who's just studying all the time. Right? And there is a scene where Philip Seymour Hoffman's character is calling the Robin Williams character out on the fact that he never studies. And the Robin Williams character makes some kind of snide remark about how he's doing well on the exams anyway, and, you know, what is making the other guy so mad? And then Philip Seymour Hoffman delivers this kind of weighty speech where he says, in effect: what you don't understand is that the fact that I'm studying this book on this afternoon, as part of my medical education, may make the difference between whether a patient lives or dies 15 years from now.

    Tyler Johnson: [00:56:05] And so, in effect, he is saying that he has a sort of moral obligation to study endlessly, because it is going to be his brain's access to those facts that is going to make the difference down the road. Now, that movie was made in, I don't even know, I'm going to say 1995, maybe 2000, something like that. But of course, the joke, if you're watching that movie today, is that that scene makes no sense, because all doctors know that if there is any fact about anything that you don't know, you can Google it or UpToDate it in two seconds. In fact, I would argue, and have argued before, that our smartphones have made us functionally like cyborgs. Right? We have them on us so ubiquitously that Google might as well be an almost physiologic extension of my brain. And so then the question that I have is this: if that's what we think looking 15 years backwards, then consider you, who are a person at the cutting edge of science as it is advancing.

    Tyler Johnson: [00:57:08] Now, if you look 15 years forward: you know, if I'm just entering medical school right now, I'm thinking, well, gosh, I've got four years of medical school and then three years of residency and then three years of fellowship and maybe a year for research or teaching or whatever. Right? I may not be a full-fledged independent physician until 10 or 15 years from now, even if I'm entering medical school right now. What is it even going to mean to be a doctor in 10 or 15 years? Like, if we have seen this much advance in large language models and artificial intelligence and all the rest in just the past five years, and if these things often advance at a, you know, sort of exponential rate, then it seems like diagnostic and therapeutic and even empathetic bots are only going to get better and better and better. Like, what is the role for a human even going to be? Are humans going to become superfluous? And if they're not going to become superfluous, what is the thing that humans are going to do? Like, what is the irreducible or irreplaceable part in medicine that humans will play in the not-so-distant future, if there is one at all?

    Dr. Michael Howell: [00:58:16] Uh, in 2019, my mom got diagnosed with gastric cancer at the GE junction. And, um, I was down to help my dad take care of her, right? And it was one where, uh, you and I both understand the actuarial tables of that, right? So it was clear early that it was a non-survivable injury. For folks who don't know, this cancer is in a particularly difficult anatomic location, because you stop being able to eat quite early. So my mom had a feeding tube. I was down there, and I was working at Google at the time, and my dad had a box of stuff that he used to help take care of her feeding tube. You know, we're on a podcast, but I have a picture of that, which I sometimes show when I go and do grand rounds and other things. There's a lot of medicine that's cognitive, and of course we should want all the cognitive supports that we can get. Before PubMed, do you remember, you had to, like, go to the library, and there was, like, a book where you could look up what all the journal articles were. PubMed was a giant advance in the world, right? Google Scholar and all these things. Of course you should want all those things. But there's a lot of medicine that is not only isolated cognitive work. It's emotional work. It's reciprocity, human reciprocity. It is respect. It's physical care. Of course we should want all the cognitive aids that we can get. But I think that, you know, if I were talking to somebody who's a first-year medical student now, I would say it's going to be a really interesting and pretty amazing field, and that you should use every tool that you can lay your hands on, whether that's an ultrasound or a platinum-based therapy or a checkpoint inhibitor or an AI. Of course you should use every tool that you can lay your hands on to help your patients. I think that will be true. I could be wrong, but I think it will be true.

    Tyler Johnson: [01:00:28] Well, we are so grateful to have had you join us. You have been so generous with your time. We appreciate all of your experiences and your expertise. We know it's taken many years to garner all of those, and we are deeply grateful. And thank you so much for being on the show.

    Dr. Michael Howell: [01:00:44] Thanks for having me.

    Henry Bair: [01:00:49] Thank you for joining our conversation on this week's episode of The Doctor's Art. You can find program notes and transcripts of all episodes at thedoctorsart.com. If you enjoyed the episode, please subscribe, rate, and review our show, available for free on Spotify, Apple Podcasts, or wherever you get your podcasts.

    Tyler Johnson: [01:01:08] We also encourage you to share the podcast with any friends or colleagues who you think might enjoy the program. And if you know of a doctor or patient, or anyone working in healthcare who would love to explore meaning in medicine with us on the show, feel free to leave a suggestion in the comments.

    Henry Bair: [01:01:22] I'm Henry Bair.

    Tyler Johnson: [01:01:23] And I'm Tyler Johnson. We hope you can join us next time. Until then, be well.

 


 

LINKS

Learn more about Dr. Howell’s work here.
