
Yale Psychiatry Grand Rounds: "Multiple Steps to the Precipice: Risk Aversion and Worry in Sequential Decision-Making"

January 19, 2024
  • 00:14And just swap your screen and
  • 00:16then we'll be done. Exactly.
  • 00:18We have this all nicely prepared,
  • 00:19of course. That's OK. Perfect. Super.
  • 00:25OK, well, thank you very much indeed.
  • 00:27Sorry about that hiccup.
  • 00:28No, nothing is quite as smooth as you hope.
  • 00:30Thanks so much for that really generous introduction.
  • 00:32You know, it's a really great pleasure and honour to be here.
  • 00:34I've really followed Phil's work over many years as well,
  • 00:36and really learned an awful lot from it.
  • 00:37So it's really great to be here, and thanks again.
  • 00:41So the work I'm going to talk about is joint with a number of people.
  • 00:45So Chris Gagne, who was a postdoc in Tübingen and now works for a company called Hume in New York,
  • 00:51and two research assistants in Tübingen, Kevin Shen and Yannick Striker.
  • 00:55And then I might also talk about some work with two of my other colleagues in Tübingen, Kevin Lloyd and Shin Sui.
  • 01:03So to introduce this, imagine the following game.
  • 01:06You're controlling this rather crude, refrigerator-like robot here,
  • 01:12and your job is to get to this treasure chest here.
  • 01:16And there's a reward for getting to the treasure chest, worth five points to our subjects.
  • 01:22There's a cost for falling into these things, which Chris loves to call lava pits;
  • 01:27this is the Iceland version, with the volcanoes.
  • 01:31When you try to move north, south, east and west, there are some blockages shown by these brick walls.
  • 01:39And there's also a chance of an error of an eighth when you try to move.
  • 01:44So if you try to go north, there's an eighth chance you'll move in one of the other directions instead,
  • 01:48and then we have a discount factor to try and encourage you to get to the goal quickly.
  • 01:52So the question we then pose to our subjects is: which route would you take, given this?
  • 01:57There are three obvious routes, I think.
  • 01:59There's this route that goes down here, through all the lava pits, to get to the reward: the most direct route.
  • 02:04There's an intermediate route, which goes around here and then goes close to this lava pit, but not the main bulk of lava, to get to here like this.
  • 02:14And then there's this long route that goes all the way around here, skirting the lava pits, and gets to the goal in that way.
  • 02:21So we administered this to our subjects in the lab.
  • 02:24I promised I wouldn't tell you who they are, because it's kind of revealing about your colleagues when you do this,
  • 02:30and you can see that the subjects divided into about a third, a third, a third, maybe a few fewer.
  • 02:36So some people took this very direct route to get to the goal.
  • 02:39Another group took this intermediate one, and you can see here where they were deviated off this route by the random movement errors.
  • 02:48And then some other subjects took the route all the way around.
  • 02:51And so the question for this talk is: what is it that goes on in terms of evaluating the risk associated with these paths,
  • 02:58and how do you make these choices?
  • 03:01In this instance, we're very interested in the case where you're not just making a single choice:
  • 03:06by committing to this path here, you have to successively adjust yourself to the many steps of risk that you encounter.
  • 03:11And I think, you know, a lot of the work that we and other people have done in reinforcement learning is about sequential decision problems, where you don't only make one choice, you make many choices.
  • 03:25And when those choices are affected by risk, risk can accumulate along paths in rather interesting ways.
  • 03:31And that really is the context of my talk: to think about what the consequences of that are, and how we should think about it as a whole.
  • 03:38So some of the original thinking about risk actually came from the Bernoullis, thinking about what then became known as the Saint Petersburg problem.
  • 03:48The way that you pose this is: you're tossing a fair coin, and you look at the number of heads that you get before you get a tail.
  • 03:56So if you get one head before a tail, you get €2, or two monetary units.
  • 04:00If you get two heads, you get 4; three heads, 8; and so forth.
  • 04:03And the question is: how much would you be willing to pay me to give you an instance of this game?
  • 04:08And the reason why it's a problem, or a paradox, is that the expected value, the mean value of this sequence of outcomes, of playing this game, is actually infinite,
  • 04:22because with a probability of a half you get €2, with a probability of a quarter you get €4, with a probability of an eighth you get €8, and so forth.
  • 04:30And so each of these possibilities contributes €1 to the sum, and that just goes off to infinity.
  • 04:38And so the expected value is infinity, but most people will only pay, you know, somewhere between four and eight euros, or four and eight dollars, to play a game like this.
  • 04:49And so that's the paradox: to try and understand why.
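Written out, the divergence described here is just the following sum, using the probabilities and payouts stated above:

\[
\mathbb{E}[\text{payout}] \;=\; \tfrac{1}{2}\cdot 2 \;+\; \tfrac{1}{4}\cdot 4 \;+\; \tfrac{1}{8}\cdot 8 \;+\;\dots \;=\; \sum_{k=1}^{\infty} \frac{1}{2^{k}}\cdot 2^{k} \;=\; \sum_{k=1}^{\infty} 1 \;=\; \infty .
\]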
  • 04:52But I think the paradox becomes sharper, or at least the task becomes sharper, when you think of it in the sequential manner in which it could also originally be posed.
  • 05:00So here you're tossing the first coin, and at stake is €2.
  • 05:04If you get a tail, that's what you're going to walk away with: just two euros.
  • 05:09On the other hand, if we're lucky, we get a head.
  • 05:12This is the world's smallest gold coin, which has Einstein on it; it's a Swiss coin.
  • 05:16If you get a head, that means that now the stake is €4.
  • 05:21And again you're tossing this coin and you're thinking, you know, what's going to happen: will I get a head or a tail?
  • 05:26If I'm lucky, I'll get a head, and then the stake becomes €8, and so forth;
  • 05:31and then you get a tail, and in this instance you'd walk away with the €8.
  • 05:35And so you can imagine that essentially more and more money is at stake as you do this.
  • 05:42I'm sure many of you are familiar with the Balloon Analogue Risk Task, the BART task, which has something very similar,
  • 05:49where you're pumping up a balloon and you know that at some point one pump is going to make it burst and you lose everything.
  • 05:54And the question is: when do you quit?
  • 05:56In the Saint Petersburg problem, you have to pay before you ever start.
  • 06:01OK. So the plan for the talk is to talk a bit about risk aversion in general, how it comes up;
  • 06:06to talk about a measure of risk which I think is particularly useful for the sort of work that we do,
  • 06:13and which I think also applies in animal cases too; I'll give you a little example of that at the end of my talk, I hope, if I have time;
  • 06:17then to talk about tail risk in sequential problems;
  • 06:23then to talk about risk-averse online behaviour,
  • 06:26so thinking about our subjects making their choices in that little maze with the robot and the lava pits and so forth;
  • 06:35and to say a word about risk-averse offline planning.
  • 06:37So the idea is that if you're in an environment which is replete with risk,
  • 06:43then maybe there are things that you can do ahead of time to try and mitigate it.
  • 06:47Maybe that's going to change the way you go about thinking about aspects of the world,
  • 06:52doing some offline planning to prepare yourself correctly,
  • 06:55and then we think about what that looks like in the context of risk aversion and risk sensitivity.
  • 07:02And then also, as I say, if I have a chance I'll say a word about some modelling we've done of some lovely data on how mice do apparently risk-sensitive exploration, from Mitsuko Watabe-Uchida's work at Harvard.
  • 07:20OK, so decision making and risk.
  • 07:22So as you all know, risk is a very critical aspect of decision making, and it comes up any time that we have uncertain or probabilistic outcomes.
  • 07:32So here, in Saint Petersburg, we're spinning a coin; in other contexts, we have other sorts of ways of generating these probabilities.
  • 07:41Obviously whole industries have been designed around it, things like insurance markets; this is a little picture of the famous Lloyd's of London.
  • 07:50And I think risk likely plays a very crucial role in many aspects of psychopathology,
  • 07:54and this is something that has been studied by very many groups, including obviously groups working in Yale too.
  • 08:01So in things like anxiety and mania there are obviously issues about what might happen; you'd see that in OCD as well, again something that Phil has actually worked on too.
  • 08:12And you also have this notion of these sort of ruminative what-ifs.
  • 08:16So in the context of the complex world that we occupy, there are many risks that come with very low probability: events like cars swerving on the ice; it was very icy in Tübingen this morning.
  • 08:31So you can imagine that when you're, you know, walking on the pavement, there is a chance that something nasty can happen.
  • 08:35If you pay a lot of attention to these very low probability outcomes, then of course that's going to be problematical for your expectations about what might happen.
  • 08:46And when you commit to a long series of choices, then as I said, you have to worry about how risk accumulates along these paths.
  • 08:56So this has been beautifully studied using single-shot gambling paradigms.
  • 09:00So here's a classic example where you have a choice of either a sure $5, or a 50/50 chance of $10, or a 50/50 chance of $16 in this case here.
  • 09:10There are so many paradigms like this; obviously Kahneman and Tversky did a lot of work on that, and Ifat has done a lot of beautiful work along these lines in Yale too.
  • 09:19But what we want to look at is sequential problems, and not only single-shot games.
  • 09:24And so we'll see how that comes out.
  • 09:27So in order to make progress, we have to define what measure of risk we're going to use.
  • 09:33There are a number of measures that have been studied in the literature.
  • 09:37Prospect theory, for instance, very famously gives us ways of thinking about how to combine utilities and probabilities in these risky cases.
  • 09:46But there's also a lot of work from the insurance industry, which of course has been worried about many aspects of risk for a long time, and in a very quantitative way.
  • 09:55And they, or the mathematical side of that field, have come up with ideas about how to systematize risk.
  • 10:04And one of the systematic ways they think about it is to think about tail events.
  • 10:10So here we think of the distribution of possible returns as just some sort of histogram,
  • 10:14and then the risks that we worry about, the risks we care about, are the risks which are found typically in the lower tail: they're the nastiest things that can happen.
  • 10:23So for instance, many of you will know that you could think about these Markowitz utilities, where you add to the mean some fraction of the variance,
  • 10:32but the variance of the distribution includes not only the lower tail but also the upper tail; it reflects the whole structure of the distribution.
  • 10:39Whereas the things that we worry about are the tail risks, the nastiest things that could possibly happen.
  • 10:43That comes up naturally in medicine, finance, engineering, and maybe also in things like predation in animals too.
  • 10:51So how does that work? Let me just illustrate this with our very simple case, the Saint Petersburg problem.
  • 10:57So what I'm now doing is showing you all the outcomes, weighted by their probabilities.
  • 11:03So this goes from 50/50 for €2 down to probabilities that get vanishingly small, with the average outcome being worth infinity.
  • 11:10And if you think about the tail, what we might do is to say: let's choose, in this instance, the lower 7/8 of the distribution.
  • 11:17So that's just these three dark blue bars,
  • 11:20and that cuts off the upper 1/8 of this distribution, which is all the much nicer outcomes you could possibly have.
  • 11:27And then the value of the outcome which is defined by this lower 7/8 tail, that's a quantile; it's just the 7/8 quantile of this distribution.
  • 11:42That's a risk measure itself, called the Value at Risk or VaR, shown here.
  • 11:47It turns out that the Value at Risk doesn't satisfy some of the nice properties that we expect, which were nicely worked out for the insurance industry by Artzner, Rockafellar, Uryasev and many others as well.
  • 12:00But a measure which also thinks about the lower tail and does satisfy these axioms is called the Conditional Value at Risk (CVaR), which is simply the average value in that lower tail.
  • 12:12So the idea is you say: I'm worried about the tail, and we have an alpha value saying which tail I'm worried about; here, the 7/8 tail.
  • 12:18If it's the 100% tail, the alpha-equals-one tail, it's just the whole distribution.
  • 12:23Here it's the seven-eighths tail: I've cut off all the really nice outcomes and I'm left only with the nastiest outcomes.
  • 12:28And as that gets more extreme, I think about less and less of the distribution: more and more, the nastiest things that can happen are the things that I imagine happening.
  • 12:37And the average value of those outcomes then defines this Conditional Value at Risk, this CVaR value, itself.
  • 12:44So how does that look?
  • 12:47As we reduce alpha: at alpha equals one, we have the whole distribution, and that's infinity.
  • 12:51If alpha is 15/16, we just get these four bars; at 7/8, the three bars; at 3/4, these two bars; and at alpha equals 1/2 we just have this one bar left.
  • 13:01And so as alpha gets smaller, we're getting more and more risk averse: we're thinking about this lower tail of the outcomes that we could possibly have.
  • 13:10So formally, you can write that down as the expected value in this lower tail: you just write down the expected value underneath this quantile of the distribution.
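One standard way of writing the two quantities just described, for a return Z and the lower-tail convention used here (ignoring ties exactly at the quantile), is:

\[
\mathrm{VaR}_{\alpha}(Z) \;=\; \inf\{\, z : \Pr(Z \le z) \ge \alpha \,\}, \qquad
\mathrm{CVaR}_{\alpha}(Z) \;=\; \mathbb{E}\!\left[\, Z \;\middle|\; Z \le \mathrm{VaR}_{\alpha}(Z) \,\right],
\]

so CVaR is the average of the worst alpha-fraction of outcomes; alpha equal to one recovers the plain mean, and alpha near zero approaches the worst case.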
  • 13:22But there's another way of thinking about this, exactly the same calculation, almost like a dual view,
  • 13:27which also relates to the way that prospect theory thinks about probabilities, which is to have what they call a probability distortion function.
  • 13:37So here I've also now written down explicitly the probabilities of these outcomes: a half, a quarter, and so forth.
  • 13:44And what you do with probability distortion is to say: I'm allowed to change the probabilities; I boost the probabilities of the nastier outcomes and I suppress the probabilities of the nicer ones.
  • 13:59And the idea inside this Conditional Value at Risk is that there's a maximum amount of possible distortion.
  • 14:08So if my alpha value is 7/8, which means I'm interested in the bottom 7/8 of the distribution, it means I'm allowed to multiply all my nastiest probabilities by 8/7, by 1 over alpha.
  • 14:20And then I just keep on doing that until I run out of road, until I run out of probability mass, because in the end it still has to be a probability distribution.
  • 14:28So in this instance, I multiply these outcomes by a weighting factor, which is 8/7 here, until I run out of road,
  • 14:36and that leaves only these three bars as contributing to my values.
  • 14:42And you can see that that's exactly equivalent to the three bars that we had here in terms of the Value at Risk.
  • 14:48So these are equivalent ways of thinking about the effect of these tails,
  • 14:56and they're both, I think, very useful constructs for thinking about these nasty possible outcomes.
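A minimal sketch of this dual, distorted-probability view (illustrative code, not from the talk's own materials): each outcome's probability may be re-weighted by at most 1/alpha, and the pessimistic re-weighting pushes as much mass as the cap allows onto the worst outcomes first. Lumping the whole upper tail of the Saint Petersburg game into the last bar is my own simplification.

```python
import numpy as np

def cvar_by_distortion(values, probs, alpha):
    """Lower-tail CVaR_alpha computed by adversarially re-weighting probabilities.
    Each distorted weight is capped at probs/alpha; mass goes to the worst outcomes first."""
    order = np.argsort(values)                 # worst outcomes first
    distorted = np.zeros_like(probs, dtype=float)
    remaining = 1.0
    for i in order:
        distorted[i] = min(probs[i] / alpha, remaining)
        remaining -= distorted[i]
    return float(np.dot(distorted, values))

# truncated Saint Petersburg example from the slide: outcomes 2, 4, 8, ...
values = np.array([2.0, 4.0, 8.0, 16.0])
probs = np.array([1/2, 1/4, 1/8, 1/8])         # the rest of the tail lumped into the last bar
print(cvar_by_distortion(values, probs, alpha=7/8))   # equals the lower-7/8 conditional mean, 24/7
```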
  • 15:06OK, so just to summarise on CVaR: it's what's called a coherent risk measure,
  • 15:11and that refers to the axioms I was mentioning, which we want from insurance and which have to do with things like wanting the risk to decrease if you diversify your assets, something that the Value at Risk does not have.
  • 15:23It emphasises the lower tail, so we're always interested in the nasty things that can happen.
  • 15:28If alpha is one, it's the regular mean; we just think about the overall mean of the distribution, which was the infinity here.
  • 15:33As alpha tends to zero, we only care about the worst possible case, which is the minimum that can happen.
  • 15:41And we have this nice equivalence to these probability distortion measures, in which we up-weight the bad outcomes.
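For reference, the coherence axioms being alluded to, in the usual convention where the risk measure rho is applied to losses (signs flip if you work with returns, as in the talk), are:

\[
\begin{aligned}
&\text{monotonicity:} && X \le Y \;\Rightarrow\; \rho(X) \le \rho(Y),\\
&\text{translation invariance:} && \rho(X + c) = \rho(X) + c,\\
&\text{positive homogeneity:} && \rho(\lambda X) = \lambda\,\rho(X) \quad (\lambda \ge 0),\\
&\text{sub-additivity (diversification):} && \rho(X + Y) \le \rho(X) + \rho(Y).
\end{aligned}
\]

CVaR satisfies all four; VaR can violate sub-additivity, which is the diversification point mentioned above.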
  • 15:49OK, so that's when we can see the whole distribution in front of us, as you do in a regular gambling case where it's just specified.
  • 15:56What happens if, as in the way we started thinking about this, we consider the sequential case, where we spin the coin, get either a head or a tail, and then can spin the coin again?
  • 16:08So how does that work in this domain?
  • 16:10And you'll see that a sort of surprise comes up that we then have to cope with.
  • 16:14So here we start off with the first flip of the coin: if we get the tail, we get €2; if we get the head, we get the chance to carry on and spin the coin again.
  • 16:27And then if you spin the coin again: if you get a tail, you get €4; if you get the head, you get, excuse me, the chance to spin the coin again; you spin again, you get €8, and so forth, and it just carries on down and down.
  • 16:42So as I mentioned, what we want to do when we're thinking about the risk is to distort our probabilities.
  • 16:49So we start at the beginning.
  • 16:51I said that if alpha is 7/8, we get to distort the probabilities by 8/7; we can distort those probabilities up to some maximum value,
  • 17:02which means that we make it more likely to get the tail and less likely to get the head.
  • 17:06So we make the left bar slightly higher and the right bar slightly lower.
  • 17:11That's our distortion.
  • 17:12Our risk sensitivity has said: OK, even though it should really be 50/50, the real answer is 50/50, in our subjective evaluation of this we boost the nasty one and slightly suppress the nice one,
  • 17:26and the amount that we suppress it by is set to make sure that the probabilities still add up to 1.
  • 17:34So you might think it would be a very natural thing: well, now we have another choice, and we do the same distortion again, and then the same distortion again, and so forth.
  • 17:43But that does actually generate a version of CVaR; it just doesn't generate the version of CVaR that we started off thinking about.
  • 17:52So here I said what you want to do is just look only at the lower possible tail.
  • 17:56You can see that if we just keep on distorting by the same fraction every single time,
  • 18:00then instead of slicing off the tail like this, we're actually going to get a contribution from all the possible outcomes.
  • 18:09But now, instead of going down like a half, a quarter, and so forth, the weight on each outcome actually goes down like 3/7, (3/7) squared, and so forth.
  • 18:22There's a sort of technical reason for that.
  • 18:24You can see that that doesn't have the property that I talked about, in which we just slice off this bottom aspect of the distribution.
  • 18:32It is a risk measure that we could also use, and in fact in many cases it's a very severe risk measure, a more severe risk measure.
  • 18:42But the measure we wanted to talk about instead actually requires us to do a different sort of calculation,
  • 18:47which I think is really important for thinking about how risk processing works in this sequential way.
  • 18:53So instead, what happens is that after we're lucky and we get a head,
  • 19:00at that point, if you think about it, we're trying to account for the amount of luck that we can have over a whole sequence of choices.
  • 19:07This is the sequential aspect.
  • 19:09And if we start off and we're already lucky, it means we've already consumed some of our good luck,
  • 19:14which means that now we have to be a little bit more risk averse in the future, in order that the total amount of good or bad luck we're expecting to get is pegged to what it was right at the beginning.
  • 19:27So that means that, having been this lucky, in this case we got our first head, we got Einstein first, we now have to be more risk averse.
  • 19:39So alpha started out at 7/8, and now it turns out that the amount of risk aversion has to be boosted,
  • 19:47which means that the alpha value decreases from being 7/8 to being 3/4.
  • 19:52So now when we do our probability distortion, we make it even more likely, now four-thirds more likely rather than eight-sevenths more likely, that we're going to get the unfortunate outcome, which is the tail in this case,
  • 20:10and we make it less likely that we're going to get the head.
  • 20:13And now if we do get the head, we've been lucky again; we've consumed even more of our good luck.
  • 20:18And so now we become even more risk averse: the alpha value goes down further, to 1/2.
  • 20:25And so now when we do the distortion, it turns out we do maximal distortion.
  • 20:28So now the tail, instead of being 50/50, in our minds has a probability that has gone up to 1, and the probability of getting the head has gone to zero.
  • 20:40And that means that we can therefore never get any further down the tree.
  • 20:45And so, in order to compute CVaR in this way, when we think about a sequential problem, we have to keep on revaluing our alphas.
  • 20:55If we're lucky, it means we become more risk averse, which means alpha gets lower.
  • 21:00If we're unlucky, it means in fact we can become more risk seeking in the future,
  • 21:04because we're sort of trying to peg the total amount of risk that we suffer along the whole path towards the end.
  • 21:12So there's this notion here of precommitment.
  • 21:15When we start the problem, we think about how much risk we are willing to endure, and then as we are lucky or unlucky, we adjust the way that we evaluate future outcomes.
  • 21:32So in precommitted CVaR (pCVaR) we're privileging a start: we're saying this is where we're defining risk from, and we're then revaluing our alpha, our risk aversion, in order to peg where we're going.
  • 21:43So you might think of that as being like a home or a nest for an animal, for instance.
  • 21:47And then we have to change alpha, and the way we change it is like a justified form of the gambler's fallacy.
  • 21:53If you've been unlucky for a while, then in some sense you can be a little bit more risk seeking, meaning less risk averse.
  • 22:02If you've been lucky, then you're expecting to be more unlucky in the future, so your alpha decreases,
  • 22:06in order to peg the total amount of risk you have along a whole path.
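A minimal sketch of the bookkeeping this describes, for the two-outcome coin of the Saint Petersburg example (illustrative code, not the talk's own implementation): the distortion weights are capped at 1/alpha, and after each flip the new alpha is the old alpha multiplied by the weight that was assigned to the outcome that actually happened.

```python
def distortion_weights(alpha):
    """Weights for a fair coin with a 'bad' (tail) and a 'good' (head) outcome.
    The bad outcome is up-weighted as far as the 1/alpha cap allows; the good
    outcome gets whatever is left so that 0.5*w_bad + 0.5*w_good = 1."""
    w_bad = min(1.0 / alpha, 2.0)      # cannot exceed 2, since 0.5 * 2 already uses all the mass
    w_good = 2.0 - w_bad
    return w_bad, w_good

alpha = 7 / 8
for flip in range(3):
    w_bad, w_good = distortion_weights(alpha)
    print(f"alpha={alpha:.3f}: distorted P(tail)={0.5 * w_bad:.3f}, P(head)={0.5 * w_good:.3f}")
    alpha = alpha * w_good             # we were 'lucky' (head), so we become more risk averse

# alpha goes 7/8 -> 3/4 -> 1/2 -> 0, the sequence described above; being unlucky
# would instead multiply alpha by w_bad, pushing it back up towards 1.
```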
  • 22:13Alpha equals zero and one are special.
  • 22:15Alpha equals one is just the mean, and then you never revalue it: you just keep the value alpha equals one.
  • 22:23Alpha equals zero is the minimum, and you stick with that too, because you can never get more risk averse: you've run out of road; you're always thinking about the worst possible outcome that can ever happen.
  • 22:35So to do this you either have to monitor how much luck you've had along a path, or you just think about changing the value of alpha as you go along,
  • 22:45in the way I showed you for the Saint Petersburg problem, where we made alpha smaller and smaller because we kept on being lucky every time we got a head,
  • 22:56until, in this evaluation, we ran out of road at the third outcome.
  • 23:05So how does that look in a more conventional sort of random walk?
  • 23:09So here's a simple random walk where we have an agent which can go left or right, or try to stay where it is.
  • 23:16There are two rewards: one on the right hand side, a small reward worth +1, and one on the left hand side worth +2.
  • 23:24And then here's one of Chris's lava pits, which is threatening,
  • 23:28and you again have a small probability of an error in the choices.
  • 23:31So here, if you have completely uniform choice, you go left, right, or try to stay where you are equally often.
  • 23:38Then if this is our start state, this is the distribution of outcomes you would actually get, with a discount factor of 0.9,
  • 23:45because in the end you get trapped by the lava pit, and then that's the end of the game.
  • 23:50And so here, from the start state, this is the distribution.
  • 23:53So thinking about CVaR, we're obviously thinking about the tails of this distribution.
  • 24:02So how can we evaluate the locations in this world?
  • 24:05Well, suppose you have this uniform policy and our alpha value is 1.
  • 24:11Then we're just a regular reinforcement learner, thinking about the average value of each of the states.
  • 24:16So you can see that here I've shown them in colour, from -10 up to +10.
  • 24:19The ones on the right are relatively good, because you have this reward of one and it tends to be a while before you end up in the lava pit, which means that that value is discounted by a lot.
  • 24:30If alpha is 0, you always think the worst possible thing that can happen will happen.
  • 24:35The way I'm showing you that is with these grey arrows here: inside these choices, it says how frequently you try to go left, right, or stay where you are.
  • 24:49The re-weighting system says: well, I'm going to think about the worst possible outcome, because my alpha is 0, and that puts all the weight on going left, because the nastiest thing that can happen is going left.
  • 25:00And so here you can see that all the values are then much, much worse, and indeed you then just go left, and every time you just end up in the lava pit.
  • 25:08And then for intermediate values of alpha, you can see how states get evaluated.
  • 25:15And again you can see this effect: as I said, if you are lucky, which in this instance means you're going right, because the right states are better, then you tend to decrease your value of alpha.
  • 25:26So these little grey arrows, outside the choices that you make, tend to point downwards.
  • 25:33If you're unlucky, which in this instance means going left, then you become a bit less risk averse, which means that the arrows then point upwards.
  • 25:43And so, as we become more and more risk averse as this alpha value falls, we have this very nice way of looking at how states, for instance on the right, go from being good to being bad.
  • 25:58So you don't only have to think about evaluation here; you can also optimise your policy based on your risk aversion.
  • 26:06You try to work out the policy which maximises this precommitted CVaR value with a given value of alpha.
  • 26:16So if your alpha is 1, then you're not risk averse at all; you're just thinking about the mean.
  • 26:26We designed it such that, from the start state here, if alpha equals one, the best thing you can do is just to go left and try to stay at the reward worth 2 for as long as you can, and that's then a way of maximizing your reward.
  • 26:41If alpha equals zero, it actually doesn't matter at all what you try to do,
  • 26:47because there's a chance that if you try to stay where you are, you'll nonetheless go left,
  • 26:52and if you think about the worst outcome, it's always to go left.
  • 26:55And so you can see that at alpha equals 0, the optimum policy is just the same as the uniform policy, or any other policy as well: you'll always go left.
  • 27:03So in fact this is sort of a form of learned helplessness, where although you really have some control in this world, because you think about the worst thing that could happen, you sort of don't trust your own control.
  • 27:17And therefore you think the worst thing that could happen will happen, and therefore it doesn't matter what you do; there's nothing you can do to mitigate that chance.
  • 27:26And then in the middle: remember, the precommitment is relative to a start state.
  • 27:32So here our start state is this one, at alpha equals 0.3, and you can see that now we have a policy where, in this particular domain, the optimal policy at that start state is to go right rather than to go left, because of the problems of the risk.
  • 27:49And then that is what you try to do, and then you try to stay here as long as you can.
  • 27:55And so you can see, as you might expect, that everywhere else in this random walk, apart from at alpha equals zero, you have a better outcome:
  • 28:09all the values of the optimum policy are much better than the values of the uniform policy here,
  • 28:14except at this nastiest possible degree of risk aversion, where you just think that whatever terrible thing can happen will happen, no matter what.
  • 28:25I should just say, there's also this other mechanism, nested CVaR (nCVaR), which doesn't precommit to a value but instead just sticks at a particular value of alpha the whole time.
  • 28:36That's what I showed you in the Saint Petersburg paradox, where you just weighted the heads and tails the same way every single time.
  • 28:43So in this domain, for alpha equals one it turns out to be the same as pCVaR, which is just the mean; for alpha equals 0, again it just focuses on the minimum, the worst thing that can happen, and so it also looks the same.
  • 28:55But in between, for intermediate values, you can again get evaluations of states,
  • 29:04and in this instance it turns out that this nCVaR mechanism is generally more risk averse, so the values are worse than the values for pCVaR.
  • 29:16That's not true in the Saint Petersburg paradox, because in that problem the only way you get to carry on is by being lucky, whereas in this problem you can be lucky or unlucky as you carry on.
  • 29:26And in pCVaR, if you're unlucky then you become less risk averse,
  • 29:32whereas in the Saint Petersburg paradox, or in the BART task, every time you continue you must have been lucky, and therefore you become more risk averse,
  • 29:43and so, relatively, there's a greater degree of risk aversion in the Saint Petersburg paradox.
  • 29:48Whereas in these sorts of other problems, nCVaR is generally more risk averse; in these sorts of cases you see that by these values all being more red than the other ones.
  • 30:00And then you can work out the optimal policy, which has similar characteristics.
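As a rough illustration of how this kind of nested, fixed-alpha evaluation can be computed (illustrative code on a simplified toy chain, not the exact random walk from the slides): each backup re-weights the successor distribution, with weights capped at 1/alpha, towards the worst successor values.

```python
import numpy as np

def distort(probs, values, alpha):
    """Pessimistic CVaR re-weighting: weights capped at probs/alpha,
    with mass pushed onto the worst successor values first."""
    order = np.argsort(values)
    q = np.zeros_like(probs)
    remaining = 1.0
    for i in order:
        q[i] = min(probs[i] / alpha, remaining)
        remaining -= q[i]
    return q

# toy chain: state 0 is a lava pit (-10, terminal), state 4 a small reward (+1, terminal)
rewards = np.array([-10.0, 0.0, 0.0, 0.0, 1.0])
gamma, alpha = 0.9, 0.5

V = np.zeros(5)
for _ in range(200):                          # nested-CVaR evaluation of a uniform policy
    V_new = V.copy()
    for s in range(1, 4):                     # interior, non-terminal states
        succ = np.array([s - 1, s, s + 1])    # left / stay / right, chosen uniformly
        vals = rewards[succ] + gamma * V[succ]
        q = distort(np.full(3, 1/3), vals, alpha)
        V_new[s] = q @ vals
    V = V_new
print(V)   # lower (more pessimistic) than the corresponding risk-neutral values
```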
  • 30:06OK, so let's come back to our lava pits, where, excuse me, we gave our subjects this task: we showed them this and asked them how they would move.
  • 30:19And so we designed this domain so that it would start to distinguish different values of alpha, so different values of risk aversion, as a way of interrogating what subjects would be like in these cases.
  • 30:30So it turns out that this most direct path is associated with alpha equals one: if you are risk neutral, then you would take this rather risky path.
  • 30:42If your value of alpha is about 0.5, which means you just think about the bottom 50% of that distribution, then you tend to take this intermediate path like this.
  • 30:52And then if you're much more risk averse, if you care about the bottom 15% of the outcomes, then you take this much more extreme, risk-averse route here.
  • 31:01And I think it's interesting, as one of these cases where it's very hard, when you see how somebody in your lab performs this:
  • 31:08if you're a sort of 0.4 person, it's very hard to imagine somebody who would be so risk seeking as to take the very short route,
  • 31:16or if you're the person who takes this very, very long path, you think: how could anybody take these short paths?
  • 31:22So I think there are some interesting phenomena that come up with this.
  • 31:38individual subject in the way that
  • 31:40they would be risk averse in these,
  • 31:41in these, in these domains.
  • 31:43And we saw a very nice
  • 31:45degree of of of consistency.
  • 31:48So if it's here,
  • 31:49you can see one another of these
  • 31:50mazes where the start stage is here,
  • 31:52the goal is here.
  • 31:53And so again we have a very sort of a
  • 31:56path which is for the people who are
  • 31:59pretty risk neutral would take which
  • 32:01gets close to these two lava pits.
  • 32:03You have this intermediate path
  • 32:04which is longer,
  • 32:05which is why it would be less favoured,
  • 32:07but only goes close to one
  • 32:08of these lava pits.
  • 32:09And then we have an an even
  • 32:11longer path which looks,
  • 32:12which goes all the way around here
  • 32:14to get to the goal which really
  • 32:16avoids these lava pits dramatically.
  • 32:17And so these are three individual
  • 32:20subjects and so these choices
  • 32:22were themselves associated with
  • 32:23three different values of alpha,
  • 32:25point,
  • 32:26you know like point 2.5 and point 2.9 or so.
  • 32:29And then here is the behaviour of the same subjects in a different maze.
  • 32:34So this one is a bit like a cliff: there are just two pits here, and the question is how far around them you go.
  • 32:41So one option is just to go directly from the start here to the goal; that's the most risk neutral.
  • 32:48Here's one which is a bit more risk averse.
  • 32:50You can think about how far away from the cliff you would choose to be yourself.
  • 32:55And again, it's very hard, if you're a sort of risk neutral person, to think: isn't it crazy to go so far away from the goal?
  • 33:03We took these 30 mazes that we administered, looked at the first half and the second half, and inferred the values of alpha that our subjects had for those mazes by fitting the choices that they made.
  • 33:16And you can see that we had a reasonable degree of consistency between the first 15 mazes and the second 15 mazes.
  • 33:21So this shows the peak, the MAP, the maximum-likelihood alpha value for the first and second half of the mazes.
  • 33:30So we see that they are reasonably well pinned down, and indeed the means are fairly similar too.
  • 33:34And then if we look across all our subjects: this axis shows you the value of alpha; this is the posterior value of alpha across all the tasks we have, from a hierarchical fit.
  • 33:50And then we just ordered the subjects from the people with the smallest value of alpha to the people with the largest value of alpha.
  • 33:56And you can see that we nicely cover the range of possible alphas in this domain, and that for some people we can't infer alpha so well, just from these plots.
  • 34:05And then, in order to fit their behaviour, we have a couple of other statistics as well.
  • 34:13They have a temperature, or an inverse temperature, which is how noisy their behaviour is generally,
  • 34:18and then a lapse rate, which says that although we imagine they might try to go north, perhaps they just, by mistake, go in a different direction instead.
  • 34:25So these are very standard things you'd have in a model of their behaviour.
  • 34:29But the thing we're focusing on is this risk sensitivity, which is then just a histogram of the values that we can infer;
  • 34:36it's nicely arrayed across the different possible values of alpha, as you can see.
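A minimal sketch of the kind of choice model these statistics imply (hypothetical code; the parameter names beta and lapse and the grid-based fit are my own illustration): action values computed for one candidate alpha feed a softmax with an inverse temperature, mixed with a uniform lapse.

```python
import numpy as np

def choice_loglik(q_values, choices, beta, lapse):
    """Log-likelihood of observed moves under a softmax policy with a lapse rate.
    q_values: (n_steps, n_actions) action values computed for one candidate alpha;
    choices:  index of the action actually taken at each step."""
    logits = beta * q_values
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p = (1 - lapse) * p + lapse / q_values.shape[1]   # occasional random action
    return float(np.log(p[np.arange(len(choices)), choices]).sum())

# fitting alpha then amounts to recomputing the risk-sensitive action values on a grid of
# candidate alphas and keeping the (alpha, beta, lapse) combination with the highest
# likelihood, or the posterior mode in a hierarchical fit.
```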
  • 34:44So we then tried to interrogate our mechanism for changing values of alpha, and here we had what to us was a bit of a surprise in terms of what happened.
  • 34:54So here what we're looking at is how alpha changed depending on whether, on one trial, one maze, you got a win or you got a loss.
  • 35:06So what this shows is: if we infer the value of alpha on one maze, and you then won on that maze, what happens to the next value of alpha? Are you more risk averse or more risk seeking in that case?
  • 35:19And from the pCVaR mechanism I talked about, what we would have expected is that if you were lucky on that maze, you'd become more risk averse next.
  • 35:27What we actually saw was, interestingly, the opposite: after a lava pit, so after you got trapped in one maze, you in fact became a bit more risk averse in the next maze.
  • 35:40And so we're sort of contemplating why that might be.
  • 35:43We are also looking inside the choices you make within a single maze, because if you remember, we have noisy actions, so sometimes you're lucky or unlucky inside a single maze,
  • 35:55and there we do see a pCVaR-like effect, which is that if you've been lucky then in the future you're a little bit more risk averse, and if you've been unlucky you're a little bit less risk averse.
  • 36:05So there's a conflict between different time scales of how this is operating.
  • 36:10And that conflict also comes up a little bit when we look across the first and second half of these mazes, the first 15 mazes versus the second 15 mazes.
  • 36:20If you had more losses in the first half, we can ask: are you more risk averse or more risk seeking in the second half?
  • 36:29And there's some small evidence that, on average, you are a bit more risk seeking in the second half if you've had more losses in the first half.
  • 36:35So that suggests that this phenomenon, which is a maze-to-maze effect, may itself not completely generalise over the whole context of the mazes.
  • 36:44So there are really some interesting things to investigate in this domain.
  • 36:50OK, so an interim summary.
  • 36:51What I've tried to show you is this sort of parametric risk-avoidant behaviour, which can come from this precommitted CVaR, pCVaR.
  • 36:58And precommitment means that you think: well, how much risk am I willing to take? Which part of this distribution am I willing to think about, right from the beginning?
  • 37:08And that requires you to have this gambler's fallacy, changing the value of alpha as you are lucky or unlucky.
  • 37:15So obviously the inference is a little bit more complicated here, but in fact almost every way that we have of thinking about risk in the sequential case is going to rely on a more complicated way of doing evaluation.
  • 37:27Because, for instance, if you have a nonlinear utility function, then to think about your total utility on a path you're going to have to monitor that total utility, so that you can then manipulate it in this nonlinear way.
  • 37:48You also see that in prospect theory, for instance.
  • 37:51If we have this nested version, what we sometimes call nCVaR, the one where we just fix the value of alpha and apply the same value as you go down and down, then in some cases you can get excessive risk aversion, as in the random walk that we saw;
  • 38:05and then again, we can still think about that at different values of alpha itself.
  • 38:12We're now also worrying about an indeterminacy between your prior expectation, for instance of getting caught in the maze by a lava pit, versus the degree of risk aversion.
  • 38:24And those two work opposite to each other in terms of pCVaR.
  • 38:29So if you get caught, that increases your prior on the possibility of getting caught, but it also increases the value of alpha, which makes you a little bit less risk averse.
  • 38:38And so those two things are fighting with each other, we think, in the context of these mazes.
  • 38:43And of course it would be interesting to look at ambiguity as well as risk.
  • 38:45So here, all I've talked about are cases where the probabilities are frankly expressed: subjects know exactly what the probability is of having a lapse in the way that they move in the maze, and they know the values of everything; we didn't make it ambiguous.
  • 39:03But of course ambiguity, as a sort of second-order probability, gives you an extra aspect of probability that you don't know.
  • 39:10And so if you then think about the lower tail of those probabilities you don't know, that's a way of inducing ambiguity aversion, because of the extra, second-order uncertainty that you have in those cases too.
  • 39:24From a psychiatric point of view, what you can see is a sort of aspect of pathological avoidance right here:
  • 39:29the way you're evaluating what could be a relatively benign world is that you're thinking about all the nasty things that can happen; that's what becomes really critically important.
  • 39:41And then if you're living in a stochastic environment, which of course we all do, then if you're really extremely risk averse, so alpha is really near to zero, that's a route to indifference or helplessness,
  • 39:53because it doesn't matter what you try to do, you're always worried about the nastiest thing that can happen.
  • 39:58So that makes life super complicated.
  • 40:02OK, so that's online behaviour: here we think about planning, and we imagine what our subjects are doing as they're thinking about how to move in that maze with those choices.
  • 40:12So there we can do, as Phil mentioned at the beginning, something a bit like model-based reinforcement learning, where we have a model of the world and we're planning in that model,
  • 40:22thinking about the risk that accumulates along these paths and changing these values of alpha as we go.
  • 40:28But there's a lot of interest at the moment in also thinking about offline processing that can happen during periods of, for instance, quiet wakefulness or sleep in animals, and also in inter-trial intervals in humans, which we've been looking at too.
  • 40:42And so the idea has been that there is hippocampal and cortical replay, themselves coordinated, which can be used to do aspects of offline planning.
  • 40:54Which is to say that we normally think about a model of the world as being like a generative model of the environment.
  • 41:01The inverse of that model is a policy: what should I do in the environment in order to optimise my return, or optimise my CVaR return?
  • 41:10And so in that case, the inverse of the model is something you can calculate offline, when you're not having to use the model to make your choices as you go.
  • 41:19And there's evidence in both rodents and also in humans in the last few years, typically using MEG, that subjects are actually engaging in offline processing which actually has an impact on their behaviour when it happens in the future.
  • 41:35So in the reinforcement learning world, this has been closely associated with an idea from Rich Sutton in the 90s called Dyna, where he thought about offline, replay-like processing to enable exploration,
  • 41:47and it then got embedded in advanced forms of reinforcement learning in AI, in replay buffers for things like the DQN,
  • 41:56the deep Q-learning networks that, for instance, DeepMind used very successfully, and for things like AlphaGo to win at Go.
  • 42:05And then, slightly more recently, there's a lovely paper from Marcelo Mattar and Nathaniel Daw which speculated that the replay that we see in rodents might be optimised to improve the way that these rodents plan in the environment.
  • 42:22So given that they discover something
  • 42:23about the world, say
  • 42:25a reward they didn't know about or
  • 42:27maybe one they've forgotten, then
  • 42:28they have to do some relearning.
  • 42:30Then what Mattar and Daw suggested is
  • 42:33that the sequence in which the animal
  • 42:36engages in replay is informative:
  • 42:38it is chosen in order to optimize the way
  • 42:40that the animal will then subsequently
  • 42:42move through the world using a
  • 42:44simpler way of doing planning.
  • 42:46And they pointed out that you should
  • 42:48choose to make updates to your model
  • 42:50based on the product of 2 quantities,
  • 42:52gain and need.
  • 42:53So gain is: if you were to do a replay
  • 42:56at a particular location in the maze
  • 42:58(maybe somewhere where you're not;
  • 43:00you have this notion of distal
  • 43:01replay in the hippocampal world),
  • 43:03then the gain is how much you would
  • 43:06change your policy if you made an update.
  • 43:08So there's no point in making an update
  • 43:10if it is not going to change your
  • 43:12actions, because it will have no
  • 43:14impact on your final return. And
  • 43:15the need is how frequently you're
  • 43:17going to visit that state in the
  • 43:19future given your current policy.
  • 43:21So it turns out the product of those two
  • 43:23governs the sequencing that you should
  • 43:25apply to looking at states in the world.
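Here is a minimal sketch in the spirit of that gain-times-need prioritisation (my own simplified rendering for greedy policies; the function names and the use of a successor matrix for need are assumptions, not code from the paper).

```python
import numpy as np

def gain(q_before, q_after):
    """Gain: how much better the replayed state's value becomes if the backup
    changes the greedy action there (zero if the action does not change)."""
    return float(np.max(q_after) - q_after[int(np.argmax(q_before))])

def need(successor_matrix, current_state, replayed_state):
    """Need: expected discounted future occupancy of the replayed state from
    the current state, read off a successor matrix M = (I - gamma * T)^-1."""
    return float(successor_matrix[current_state, replayed_state])

def backup_priority(q_before, q_after, M, s_now, s_replay):
    """Priority of a candidate backup: the product of gain and need."""
    return gain(q_before, q_after) * need(M, s_now, s_replay)

# Repeatedly performing the backup with the highest gain x need is what, on the
# account described above, produces forward and reverse replay sequences.
```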
  • 43:27And so if you think about,
  • 43:28you know you discover something,
  • 43:29how should you go about planning
  • 43:32during these offline cases?
  • 43:34So we thought about, well,
  • 43:37what does optimal planning
  • 43:38look like for CVaR,
  • 43:40if you're risk averse?
  • 43:42So here,
  • 43:44excuse me,
  • 43:44we're showing again another simple
  • 43:46domain where you have a start state.
  • 43:47There's just a single reward
  • 43:49at this location here, and
  • 43:50there's one of these
  • 43:51lava pits here.
  • 43:53But what these numbers show is the following.
  • 43:55All you know about is where you start;
  • 43:57you have a model of the world,
  • 43:59and you know about the
  • 44:00lava pit and the reward,
  • 44:02but you don't know how to plan;
  • 44:03you haven't got a plan of what to do.
  • 44:04We're thinking of the replay
  • 44:06in the Mattar and Daw world.
  • 44:07The replay is constructing that plan
  • 44:10for you by essentially focusing
  • 44:12on a state in the world and then
  • 44:15doing a little Bellman update,
  • 44:17just one step of reinforcement learning,
  • 44:19and the order of the
  • 44:22steps is shown by these numbers.
  • 44:23So it turns out that if you prioritise
  • 44:26based on being risk neutral (and
  • 44:29what I mean by prioritisation here is
  • 44:31that you're thinking about what planning
  • 44:33you should do that has the most effect on
  • 44:35the value of the start state, because
  • 44:37that's the value of where
  • 44:41you're beginning),
  • 44:44then for some reason
  • 44:46you do one step at
  • 44:49this location away from the lava pit,
  • 44:52and then all the subsequent steps you do,
  • 44:54in this case the subsequent six or
  • 44:57seven steps, essentially plan in
  • 44:59this instance backwards from the goal,
  • 45:01from the reward back to the beginning.
  • 45:03And this notion of backward sequencing,
  • 45:06like reverse replay in the
  • 45:09hippocampal world, is also seen
  • 45:11in something called prioritised sweeping,
  • 45:13which is an old idea in reinforcement
  • 45:16learning from Andrew Moore, where
  • 45:18you'd optimise the sequence of
  • 45:21updates you would do. If you prioritise
  • 45:23instead based on a value of alpha
  • 45:25which is much lower,
  • 45:26so much more risk averse,
  • 45:28now you can see that,
  • 45:30instead of
  • 45:32planning how to get to the reward,
  • 45:34you spend all your planning time
  • 45:36thinking about the lava pit,
  • 45:38thinking about how
  • 45:39you would avoid the lava
  • 45:40pit if you were there.
  • 45:41So the first step is the same one,
  • 45:43but then all the subsequent ones
  • 45:44are all about avoiding the lava pit and
  • 45:46have nothing to do with getting to
  • 45:48the reward. So you can see how
  • 45:50even the structure of thinking
  • 45:52offline could
  • 45:54get really dominated by
  • 45:58these nasty things that could happen.
  • 45:59And if alpha equals 0,
  • 46:01there's no point in doing planning
  • 46:02at all, because you can't mitigate
  • 46:04the risk of getting
  • 46:05to the lava pit at all.
  • 46:06So you just sit there;
  • 46:08you just can't help yourself.
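To sketch how the planning priorities flip in this way (a toy rendering under my own assumptions, with invented data structures, rather than the actual model): the only change to prioritised sweeping is inside the backup, which can either average over the stochastic successors (risk neutral) or, in the alpha-near-zero limit, take the worst successor, which is what drags the useful updates toward the lava pit and, at alpha equal to zero, makes planning pointless.

```python
import heapq
import numpy as np

def prioritized_sweeping(values, transitions, rewards, start, gamma=0.95,
                         risk_averse=False, n_updates=8, theta=1e-3):
    """Toy prioritised sweeping: repeatedly back up the state at the front of a
    priority queue, then push its predecessors if its value changed a lot.
    transitions[s] = (list of successor states, array of their probabilities).
    With risk_averse=True the backup uses the worst successor value, the
    alpha -> 0 limit of a CVaR backup, instead of the expectation."""
    queue = [(0.0, start)]                          # (negative priority, state)
    for _ in range(n_updates):
        if not queue:
            break
        _, s = heapq.heappop(queue)
        succ, probs = transitions[s]
        v_succ = np.array([values[s2] for s2 in succ])
        backed_up = v_succ.min() if risk_averse else float(np.dot(probs, v_succ))
        new_v = rewards[s] + gamma * backed_up
        delta = abs(new_v - values[s])
        values[s] = new_v
        if delta > theta:                           # propagate the change backwards
            for s_pred, (succ_p, _) in transitions.items():
                if s in succ_p:
                    heapq.heappush(queue, (-delta, s_pred))
    return values
```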
  • 46:11So as I mentioned,
  • 46:12this is not only for humans.
  • 46:14So there's a lovely study that
  • 46:15comes from Mitsuko Watabe-Uchida,
  • 46:18from Uchida's lab,
  • 46:19where she used a very simple task
  • 46:22for mice.
  • 46:24So here she had a simple arena,
  • 46:27just an open
  • 46:29field arena, shown here.
  • 46:30And then the mice were put
  • 46:32in for a couple of days.
  • 46:33There's nothing there.
  • 46:34They had 25 minutes for a
  • 46:36session just to run around.
  • 46:37And here's a path of
  • 46:39one of the mice just
  • 46:41running around this maze.
  • 46:42Then on the third day after this habituation,
  • 46:45Mitsuko put in a novel object,
  • 46:48just basically a bunch of Lego
  • 46:49blocks near to one corner of
  • 46:52the environment, and then
  • 46:54monitored what
  • 46:55the animals then
  • 46:56did over the subsequent days,
  • 46:58so the subsequent four days, with
  • 46:59this same novel object in the
  • 47:01same location of the maze.
  • 47:03And you can see, even just eyeballing
  • 47:05the trajectories, that the
  • 47:07animals have this really interesting
  • 47:10mix of essentially neophobia and
  • 47:12neophilia, and neophobia is much
  • 47:13more apparent here.
  • 47:15So it really changes the structure
  • 47:16of the movement
  • 47:19through the environment.
  • 47:20So for various reasons,
  • 47:21Mitsuko characterized being within
  • 47:237 centimetres of the object as being
  • 47:25sort of a critical distance where
  • 47:27the animal is sort of
  • 47:30inspecting this object.
  • 47:31And then what she's showing
  • 47:33here is how much per minute of
  • 47:35these 25 minutes in each of these
  • 47:37sessions the animals spend
  • 47:39within 7 centimetres of the object.
  • 47:41So in the habituation days it's just
  • 47:42being within 7 centimetres of that circle,
  • 47:44this circle shown here.
  • 47:45And you see that, you know,
  • 47:47the animals spent some time there.
  • 47:48But there's nothing,
  • 47:49there's nothing at those locations here.
  • 47:51When she puts in the novel object,
  • 47:54you can see that this
  • 47:55really dramatically changes the
  • 47:56structure of behaviour.
  • 47:57And here she's ordered the animals,
  • 48:00the 26 animals, by the total amount of
  • 48:02time they spend near the object.
  • 48:04So these animals,
  • 48:05these early animals, spend barely
  • 48:08any time near the object at all.
  • 48:09These animals, which are late here,
  • 48:12spend much more time near the
  • 48:14object than the first ones do.
  • 48:16And so there's a sense in which
  • 48:17these are very risk averse animals.
  • 48:19They had what we would think of
  • 48:21as being this low value of alpha,
  • 48:22whereas these animals are
  • 48:24much less risk averse:
  • 48:25they're much more willing to
  • 48:27get close to the object.
  • 48:29And so you can see that the way that
  • 48:31they approach the object also changes.
  • 48:33So here you can see, for the
  • 48:34first day with the object,
  • 48:36what she's done is
  • 48:37use DeepLabCut, from the Mathis lab,
  • 48:40to classify whether the animal has
  • 48:42its nose pointing to the object
  • 48:43or its tail pointing to the object.
  • 48:45You see in the early days the animal only
  • 48:47has what they call cautious approach,
  • 48:49so only approaches the object with
  • 48:51its nose in front and its tail behind.
  • 48:54Then over time the animals,
  • 48:55or some of the animals,
  • 48:56are more willing to just engage the
  • 48:58object, so that they're not protecting
  • 48:59their tail in this particular way.
  • 49:01Very appropriate for tail
  • 49:02risk as you can imagine.
  • 49:04So if we look at the frequency of approach,
  • 49:07so frequency per minute of
  • 49:09approach with the tail behind,
  • 49:11you can see that
  • 49:14all the animals are here.
  • 49:15Again, this is segmented
  • 49:17into these sessions.
  • 49:18So all the animals start
  • 49:19off with their tail behind.
  • 49:20So this is the cautious approach, and then,
  • 49:23again using the same sorting of the animals,
  • 49:25the same sorting between one and 26,
  • 49:27you can see that the animals who are timid,
  • 49:29who don't approach the object,
  • 49:30or barely approach it or
  • 49:31spend any time near the object,
  • 49:33also never risk their tail:
  • 49:37they're spending no time
  • 49:38with their tail exposed, whereas the
  • 49:40brave animals, these ones down at the
  • 49:42bottom, not only spend more time
  • 49:44near the object, they also do it with
  • 49:46their tail exposed in this way.
  • 49:48But we were very struck by these huge
  • 49:50individual differences in
  • 49:51the way that these animals
  • 49:53approach the object, and so we were
  • 49:55interested in modelling that.
  • 49:56So they
  • 49:59characterize various aspects of
  • 50:01the behaviour: the fraction of
  • 50:04time they're close to the object.
  • 50:06I showed you that already, here shown
  • 50:07with confident and cautious approach,
  • 50:09so cautious in green, confident in blue.
  • 50:12And again, with the same
  • 50:13sorting of the animals, you can see that there's
  • 50:14only green at the top while there's
  • 50:16some blue at the bottom.
  • 50:17And this is only showing the days
  • 50:18since the
  • 50:20object has been introduced.
  • 50:23You can look at how long they
  • 50:24spend near the object, and again you
  • 50:26can see that that's
  • 50:28shown by this colour.
  • 50:29So the brave ones spend a lot of time,
  • 50:30the timid ones spend very little
  • 50:32time. And how frequently they visit
  • 50:34the object, how often they go there:
  • 50:36again the brave ones visit frequently,
  • 50:38the timid ones barely visit at all.
  • 50:42So here's a model of this,
  • 50:43but I'm not going to,
  • 50:44I haven't got time to, go through
  • 50:45all the details of the model,
  • 50:46but just to give you
  • 50:47a hint of what's inside it.
  • 50:49So why do they visit the object at all?
  • 50:51Well, that's neophilia:
  • 50:52they're interested.
  • 50:52There's an exploration bonus, we imagine,
  • 50:54which is associated with that, and we
  • 50:56imagine that this exploration bonus
  • 50:58replenishes, as if they don't know
  • 50:59that the object
  • 51:01never actually gives them
  • 51:03a real return, right?
  • 51:05The object is just a bunch of Lego blocks;
  • 51:06there's no food or anything
  • 51:08positive associated with it.
  • 51:09And we imagine that when the animals
  • 51:12do a confident approach,
  • 51:13they can stay and enjoy it more;
  • 51:15they consume the reward faster.
  • 51:17Then we have a hazard function.
  • 51:19Why are they neophobic?
  • 51:20Well, maybe at some
  • 51:22point a predator or something is
  • 51:24going to jump out from this object,
  • 51:26or something nasty might happen,
  • 51:27and we imagine that the hazard increases
  • 51:29over time spent near the object.
  • 51:31So the longer they spend near the object,
  • 51:33the more they're worried
  • 51:35about predation.
  • 51:35And then we imagine that that
  • 51:37resets when they move away from the object.
  • 51:39And we imagine that it's less
  • 51:41dangerous when they do a cautious
  • 51:42approach than a confident approach,
  • 51:43which is why they want to approach in
  • 51:46this cautious way in the first place.
  • 51:47And, critical to this,
  • 51:50the uncertainty
  • 51:51about whether there's
  • 51:52a predator or not will only reduce
  • 51:54if they actually visit the object.
  • 51:56If they don't visit the object
  • 51:57or don't spend time there,
  • 51:58they're not going to find out that
  • 51:59in fact the object is completely
  • 52:01benign and never hurts them.
  • 52:02And so we have this nice,
  • 52:04this important path dependence, whereby
  • 52:06the timid animals don't visit for long,
  • 52:08they don't find out the object is
  • 52:10safe, and therefore they carry
  • 52:11on not visiting for long, because
  • 52:13they haven't found out about
  • 52:14this safety itself.
  • 52:15And then we have this risk
  • 52:17aversion too. And then we
  • 52:20build a model of their behaviour.
  • 52:22So here I've just characterised that,
  • 52:23sort of abstracted away from
  • 52:25the animal data themselves.
  • 52:26You can see we sort of capture
  • 52:27the
  • 52:28general trends in the animals.
  • 52:31With this abstraction you can
  • 52:32see we do a really good job;
  • 52:33we have quite a lot of parameters, I must say.
  • 52:35We can do a really good job of
  • 52:37fitting their data essentially by
  • 52:38varying the amount
  • 52:40to which they're risk averse,
  • 52:42this CVaR mechanism, and also
  • 52:44the degree to which they
  • 52:47are influenced by their prior over what
  • 52:49the object is like; and that prior
  • 52:51is not influenced enough
  • 52:53if they don't visit the object:
  • 52:54they don't discover that the object
  • 52:55is safe, in the way that I described.
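A toy rendering of the ingredients just listed (my own simplification with invented parameter names, not the fitted model): an exploration bonus that is consumed near the object and replenishes away from it, a hazard that grows with time spent at the object and resets on leaving, and uncertainty about the object's safety that only shrinks while visiting.

```python
from dataclasses import dataclass

@dataclass
class ObjectBelief:
    bonus: float = 1.0        # neophilia: exploration bonus left to consume
    hazard: float = 0.0       # current worry that something bad is imminent
    uncertainty: float = 1.0  # how unsure the animal still is that the object is safe

def step(b, near_object, confident, dt=1.0,
         consume=0.10, replenish=0.02, hazard_rate=0.05, learn=0.05):
    """One time-step of the toy dynamics; all rates are illustrative."""
    if near_object:
        speed = 2.0 if confident else 1.0            # confident approach consumes faster
        b.bonus = max(0.0, b.bonus - consume * speed * dt)
        b.hazard = min(1.0, b.hazard + hazard_rate * speed * dt)  # worry grows at the object
        b.uncertainty *= (1.0 - learn * dt)          # only visiting reveals the object is safe
    else:
        b.bonus = min(1.0, b.bonus + replenish * dt) # novelty replenishes away from the object
        b.hazard = 0.0                               # the hazard resets on leaving
    return b

# A timid (low alpha) agent weights hazard and uncertainty heavily when deciding whether
# to approach, so it rarely visits, its uncertainty never shrinks, and the path-dependent
# avoidance described above sustains itself.
```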
  • 52:58OK.
  • 52:59So because I'm running out of time,
  • 53:00let me just go to the general discussion,
  • 53:03which is really a discussion about that.
  • 53:05So just to sum up then on this risk aversion,
  • 53:08I think
  • 53:09it's nice to think from a sort of
  • 53:11computational psychiatric point of
  • 53:12view about
  • 53:14the way that evaluation happens in
  • 53:16the context of this risk aversion.
  • 53:17So you can think of people who
  • 53:20are highly risk averse as, in some sense,
  • 53:21maybe solving a different
  • 53:23problem from others.
  • 53:24And so here we've shown
  • 53:26that, optimally,
  • 53:27if you have a really low value
  • 53:28of alpha, or in some contexts
  • 53:30this nested CVaR,
  • 53:31nCVaR, then you'll see this
  • 53:34dysfunctional avoidance.
  • 53:34And also this rumination process
  • 53:36in the sense that you'll keep on
  • 53:37worrying about all the nasty things
  • 53:39that can happen. If alpha is near 0,
  • 53:40you have action
  • 53:41indifference and helplessness,
  • 53:42and that's the correct answer;
  • 53:44that's the right thing to do
  • 53:45if your value of alpha is so low
  • 53:47and you live in a stochastic world.
  • 53:49How much rumination should you do?
  • 53:51There's some sort of threshold:
  • 53:52how much planning you want to do,
  • 53:54how much improvement you need to have, is
  • 53:56something which again is under your control.
  • 53:58Maybe you want to really squeeze
  • 54:00out all possibilities.
  • 54:01Then you're going to have to do an
  • 54:02awful lot of rumination to worry
  • 54:04about all the really low probability
  • 54:05outcomes that can happen.
  • 54:07And then for humans we have this problem
  • 54:08that we live in a very complicated world.
  • 54:10We can always imagine another
  • 54:12catastrophe around the corner.
  • 54:13If you pay a lot of attention
  • 54:15to low probability outcomes,
  • 54:16then we can always invent nasty low
  • 54:19probability outcomes that will cause
  • 54:20you to have problems.
  • 54:22And then, as in the
  • 54:23case of the rodents,
  • 54:24we can see there's an effect on this
  • 54:27exploration-exploitation trade-off,
  • 54:28in the sense that the animals that
  • 54:29don't explore can't find out about
  • 54:31safety, and therefore they
  • 54:32will never be able to
  • 54:35essentially treat the object in its
  • 54:37natural way. Another
  • 54:39source of problems and risk in terms
  • 54:41of evaluation is that maybe, when
  • 54:43we're thinking about this rumination,
  • 54:45we think maybe there's some subjects
  • 54:47who try to do this ruminative planning,
  • 54:50they try to think, well, OK,
  • 54:51if I'm near the aversive object,
  • 54:52here's what I would do to go away from it.
  • 54:54But it's so aversive to think about it
  • 54:56that they will never consummate that planning;
  • 54:58they never stop doing that
  • 54:59planning in this way.
  • 55:00And so that's an idea that Quentin
  • 55:02Huys and I worked on a long,
  • 55:04long time ago: that
  • 55:05there's a sort of internal behavioural
  • 55:08inhibition associated with
  • 55:10a thought,
  • 55:10if you like,
  • 55:11about a piece of planning.
  • 55:12So maybe that leads you never
  • 55:13to consummate the planning,
  • 55:14which means you have to do
  • 55:15it again and again and again,
  • 55:17so again leading to a sort
  • 55:18of rumination itself.
  • 55:19You can imagine that you don't
  • 55:21adjust for luck appropriately.
  • 55:22So if you're unlucky, you don't
  • 55:24think:
  • 55:25I can now afford to be a bit more
  • 55:27risk neutral again.
  • 55:28So again you'll then have more
  • 55:30negative evaluation
  • 55:31than you should have. And then, in terms
  • 55:33of maybe the environment you have,
  • 55:36the way that you're evaluating risk is
  • 55:38not appropriate for the environment you have.
  • 55:39I think one nice way to think
  • 55:41about that is in terms of over
  • 55:43generalizing representations.
  • 55:43This is something again you see in
  • 55:45depression. I've shown you
  • 55:46that this risk sort of infects states: so
  • 55:48if you think that something nasty
  • 55:49might happen, then the value of that
  • 55:51state gets associated with the nastiest
  • 55:53thing that can possibly happen.
  • 55:54So if you over-generalize
  • 55:56your representations,
  • 55:56you're putting nice states and
  • 55:58nasty states together, and therefore
  • 56:00the value of the nasty states
  • 56:02infects the values of the nice
  • 56:03states you could possibly have.
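As a small worked example of that over-generalisation point (my own toy numbers): if a benign state and a dangerous state are lumped into one representational cluster, then a worst-case style evaluation of the cluster drags the benign state down to the dangerous one's value.

```python
# Two states that ought to have different values...
value = {"safe_street": 4.0, "dangerous_street": -10.0}

# ...but an over-general representation maps both onto a single cluster.
cluster_of = {"safe_street": "street", "dangerous_street": "street"}

# Under a strongly risk-averse (alpha near 0) evaluation, the shared value is
# pulled toward the worst member, so the nice state inherits the catastrophe.
cluster_value = {}
for state, v in value.items():
    c = cluster_of[state]
    cluster_value[c] = min(cluster_value.get(c, float("inf")), v)

print(cluster_value["street"])  # -10.0: the nasty state infects the nice one
```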
  • 56:04So lots of things to investigate
  • 56:06about risk in the future,
  • 56:08hopefully using these different
  • 56:10aspects of sequential evaluation.
  • 56:11So thank you very much.