Trading Spaces: Dimensionality Reduction for Neural Recordings
In this episode, friend of the podcast Vikash Gilja reprises his role as Vikash Gilja. We are also joined by Konrad Kording, Chethan Pandarinath, and Carsen Stringer. We talk about how dimensionality reduction is used to better understand large scale neural recordings. This episode is fairly technical, but it contains many great references if you are interested in learning more. We open with a brief explainer video by Paradromics’ own Aditya Singh.
00:40 | Dimensionality Intro
04:42 | Podcast Start
07:50 | Janelia Research Campus
08:56 | Translational Neuroengineering Lab
09:35 | Stanford Neural Prosthetics Translational Lab
10:10 | Shenoy Lab
12:00 | Deep Brain Stimulation
12:57 | Chethan’s work on retinal prosthetics
15:00 | Immunology
15:20 | Jonathan Rubin
15:30 | Byron Yu
15:41 | Gatsby Computational Neuroscience Unit
18:00 | Joshua Tenenbaum
18:30 | Kording Lab at UPenn
18:46 | Neuromatch Academy
19:47 | Neuromatch Academy Q&A
21:21 | Dimensionality reduction for neural recordings
26:22 | The Curse of Dimensionality
30:11 | Principal Component Analysis
32:20 | Neural Firing as a Poisson Process
33:13 | Shared Variance Component Analysis
35:18 | Cross validation in large scale recording
38:29 | A theory of multineuronal dimensionality
39:10 | Random projections explained with visuals
42:24 | Correcting a reductionist bias
48:30 | Noise Correlations
49:35 | More on Noise Correlations
57:40 | LFADS
01:01:51 | What is a stationary process?
01:06:02 | Inferring single-trial neural population dynamics
01:06:46 | Task Specificity
01:07:28 | Lee Miller
01:08:18 | “I don’t know, I might be wrong”
01:13:16 | Neural Constraints on Learning
01:15:00 | A recent exciting paper from Yu and Batista Labs
01:19:01 | Hume on Causation
Matt Angle:
Today, we’re going to talk about dimensional reduction and its importance in analyzing neural population activity. For this discussion, we welcome back Vikash Gilja, Professor of Electrical and Computer Engineering at UC San Diego. We’re also joined by Konrad Kording, Professor of Bioengineering and Neuroscience at the University of Pennsylvania. Chethan Pandarinath, Professor of Biomedical Engineering at Emory University and Georgia Tech. And Carsen Stringer, group leader at Janelia Farm Research Campus. In this episode, we dive right into the deep end. So for those of you who are new to the topic, I’ve asked our own Aditya Singh to prepare a short explainer video about dimensional reduction. Without further ado, I leave it to Aditya.
Aditya Singh:
This episode of Neurotech Pub is a little bit special, as we’re diving into a fundamental but often misunderstood concept in neural engineering: dimensionality, and dimensionality reduction. When we first think about dimensionality, a couple of real world ideas readily come to mind. We see dimensions all around us, living in our 3D world, exploring new ones with VR and AR, and watching our online personas grow in a 2D world of screens and social media. But more generally, a dimension isn’t necessarily a simple measure like length, width, height, or time. To many scientists, the dimensionality of a dataset is the number of columns of data in an Excel sheet. When we work with high-dimensional datasets, each dimension refers to a unique observation of a real world phenomenon acquired at some sampling frequency. No single column of data, no single dimension, can explain the whole phenomenon. But when we have hundreds or thousands of unique observations of the same real world event, from different perspectives, at different sampling frequencies, interacting with each other over time, we can understand the phenomenon in aggregate. But such a dataset contains much more information than the pure underlying structure that drives this activity.
Aditya Singh:
This underlying structure can be understood more efficiently with far fewer features, or, in data speak, in a lower dimensional representation. This low-dimensional representation can reconstruct the thousands of original observations across time with far fewer features, encoding information more efficiently. Another useful benefit of this low-D representation is that it allows you to reconstruct the original information with much less noise, squeezing the important bits of information into fewer features and removing the jitter and randomness that comes with real-world data. A neat example to understand this is a Formula One racing circuit. Here we have race cars hurtling down the track at adrenaline pumping speeds and fans eagerly watching the cars cling to every turn. Now, imagine if we gave speed guns to a hundred of these fans with different views of the racetrack, and told them to take recordings every time they saw something fly by on the track. We would get a hundred streams of data showing each person’s speed gun reading from different angles and at different times during the race.
Aditya Singh:
With a hundred people all collecting data around the racetrack, we get a hundred dimensional dataset where each dimension is a unique person’s observation of cars racing around the track at different sampling frequencies. Expand this to a thousand, 10,000, or a hundred thousand fans with speed guns, and this suddenly becomes a very high-dimensional dataset with many more perspectives. But the underlying structure, race cars going around the 2D track, is still the same. Now let’s imagine giving this dataset to a data scientist with no understanding of where this dataset came from or how it was populated. At first sight, they’re going to see an Excel sheet with hundreds of thousands of rows stretching over time, some visual similarities between different rows, and a few recognizable patterns. But to get a better sense of what this data is actually observing, that data scientist can use a dimensional reduction technique to extract the simpler, lower dimensional, underlying activity that can be used to reconstruct the original high-dimensional speed gun dataset.
Aditya Singh:
For example, if we reduce this hundred dimensional dataset down to two dimensions, we see the underlying activity that is driving our data. In this case, it is the race car circuit, which makes sense, as the changes in velocity for the race cars are based on the turns and straights they take on this convenient two dimensional X, Y grid. We can therefore reconstruct the hundred dimensional dataset of car speeds based on this two dimensional mapping of the circuit. In neural data, this lower dimensional space corresponds to the dynamical systems that drive the activity of hundreds and thousands of different neurons. When we record a lot of brain data, our sensors capture the activation of neurons, and we can see this activity in a sparse format called spiking, or firing rates. This almost binary code is how the networks in your brain communicate with each other to get things done, whether it’s moving, talking, or thinking. Each electrode, like each F1 fan holding a speed gun, becomes a row, a feature, a dimension in our high-dimensional, large scale neural datasets that use thousands of these sensors to record from thousands of neurons.
Aditya Singh:
But what we want to understand is the underlying force in the brain, the racetrack that is triggering these neurons to activate. We can then use similar dimensional reduction techniques to model these high-dimensional datasets as manifolds that constrain this neural activity space, unraveling dynamically over time across some feature-rich axes that we call the latent subspace. Here, we can more efficiently understand how these neurons are tied to each other. We can visualize not just the speed of the race cars or the activity of the neurons, but the racetrack, that is, the underlying trajectory guiding the patterns of activity we see from thousands of neurons in the brain over time. The denoising process of dimensional reduction also removes the randomness and noise we see from single isolated neurons, and instead allows scientists to predict the more important stimulus-dependent brain activity.
Aditya Singh:
In this podcast, some of the leading experts from the neurotech field are going to be discussing how they utilize their understanding of this latent neural activity space with respect to the high-dimensional neural datasets that they collect. I hope this analogy has given you a more intuitive understanding of dimensionality and how it applies to neural decoding.
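To make the racetrack analogy concrete, here is a minimal sketch in Python. Nothing in it comes from the episode: the numbers, the two-dimensional "track" signal, and the use of scikit-learn's PCA are illustrative assumptions. It simulates a 2D latent trajectory, observes it through 100 noisy "speed gun" channels, and recovers a low-dimensional representation from the high-dimensional recordings.

```python
# Hypothetical sketch of the racetrack analogy: a 2D latent trajectory observed
# through 100 noisy "speed gun" channels, then summarized again with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
T = 1000                                    # time points sampled during the race
t = np.linspace(0, 8 * np.pi, T)
latent = np.stack([np.cos(t), np.sin(2 * t)], axis=1)    # (T, 2) "position on the track"

W = rng.normal(size=(2, 100))               # each fan sees a different mixture of the 2 latents
observations = latent @ W + 0.5 * rng.normal(size=(T, 100))   # (T, 100) noisy dataset

pca = PCA(n_components=2)
recovered = pca.fit_transform(observations)               # (T, 2) low-dimensional representation
print(pca.explained_variance_ratio_)        # most of the variance lives in two components
```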
Matt Angle:
Cheers. Today, we are, again, supporting our local brewery Jester King.
Konrad Kording:
Nice. Well, if I had known that you’ll all be drinking, I would have gotten myself a beer before we call this.
Matt Angle:
It’s Neurotech Pub, Konrad.
Konrad Kording:
I didn’t know. Okay. In that case, can you guys wait for one moment?
Matt Angle:
Yeah, we can wait. We can wait.
Vikash Gilja:
I was out of beer so I mixed up a quick cocktail. It’s not very good, but it’ll do the job.
Matt Angle:
It actually doesn’t look very good, to be honest. It looks like the ice is melted.
Chethan Pandarinath:
What about you, Carsen?
Carsen Stringer:
I have water, but I’ll be fine. So wait, where are you located? Where is Paradromics located?
Matt Angle:
We’re in Austin, Texas.
Carsen Stringer:
Cool. I guess it’s definitely a good place to be now.
Matt Angle:
Yeah, definitely. And you’re in Janelia, right?
Carsen Stringer:
Yeah.
Matt Angle:
I love Janelia Farm.
Carsen Stringer:
It is a very nice place to do science.
Konrad Kording:
They’re no longer called farm.
Chethan Pandarinath:
Oh, it’s not a farm anymore?
Konrad Kording:
No, and it’s very important to them that they’re no longer a farm.
Matt Angle:
Oh, no. It’s not called that?
Carsen Stringer:
Technically now, it’s Janelia Research Campus. But I’m not going to correct people.
Matt Angle:
I was thinking if everyone could give just a really quick introduction, yourself, where you are. And also a lot of young sort of aspiring scientists, and entrepreneurs, and engineers have been watching this. I think they’re very interested in knowing how one comes to the various disciplines of neurotechnology. I think particularly, in neural decoding and computational neuroscience, there’s so many different paths to get there. I think it’d be very interesting for people to hear how your own paths kind of brought you here.
Konrad Kording:
So do you want to kick that off? Give us an example.
Matt Angle:
Maybe Vikash, can you tell us a little bit about how you got to where you are.
Vikash Gilja:
Yeah. Sure. I’m not even sure I’d call myself a computational neuroscientist, but I can give my background. So I’m currently an Associate Professor at UC San Diego in Electrical and Computer Engineering. I got here from a longstanding interest in neuroscience and engineering in my undergrad. I majored both in ECE and brain and cog sci, and that was mostly due to indecision. I really didn’t know what I wanted to focus on, but that allowed me to take some depth on the engineering side. Looking back at the way I picked my classes in brain and cog sci, I was getting a lot of breadth in neuroscience. And then I went over to Stanford to do my PhD with Krishna Shenoy developing neural prostheses.
Vikash Gilja:
So there I was able to really take a depth in neural engineering, really integrating those two fields I had been playing in, and then stayed on as a postdoc working on clinical translation, so getting a little more depth on the clinical side of the problem. And that really drove me to be where I am today. Along the way I got to hang out with that other guy on screen, Chethan, during the postdoctoral years.
Chethan Pandarinath:
I guess I started out as an undergrad. I was definitely not good at picking one direction so I triple majored in Computer Engineering, and Physics, and a Humanities major on science policy. But none of that was anywhere near biology. I thought I was done with biology and then I went to grad school at Cornell. I was planning on going into Electrical Engineering, working on solid state devices and maybe quantum computing. Pretty undecided at that point, but very, very different from neuroscience around that time. So my dad, he has Parkinson’s disease and he’s had it for awhile. But really, when I was growing up, it wasn’t that big of a deal. It wasn’t really affecting him. When I started grad school, that was around the time when things started to get pretty bad. He had to stop working. He wasn’t able to drive on his own anymore.
Chethan Pandarinath:
Parkinson’s is a degenerative condition where you lose control of your movements to some degree. So he couldn’t reliably move, which meant he couldn’t drive himself. There’s one time when he was driving to work, and I remember he just had to pull over because he couldn’t control the car. So my mom had to go pick him up, and that’s when it really hit me how serious this was. So a couple things: one, that was probably my first real exposure to neurodegenerative diseases and how they can really affect people, and it was very personal. But two, around that time was when he got deep brain stimulators implanted.
Chethan Pandarinath:
So for people who maybe aren’t familiar, deep brain stimulation is where a neurosurgeon will implant electrodes into subcortical structures that are important for movement. And it was pretty amazing to see: you implant these things and you turn them on and start delivering current. We have very little idea, I’d say, how this entire system works. But you just turn these things on and it’s night and day. And now my dad really can’t live without those stimulators. So it was a really cool experience to see everything I was interested in, in terms of Electrical Engineering, being applied to this really complex system which I didn’t understand, but it turns out a lot of other people didn’t understand either. But you could really make a difference in people’s lives. So that kind of inspired me to take a sharp course correction and head over to Neuroengineering and Computational Neuroscience.
Chethan Pandarinath:
So my PhD was in the visual system, understanding how information is transmitted from the eye to the brain, and developing visual prostheses to ultimately, hopefully, help people with macular degeneration or retinitis pigmentosa or other causes of blindness. And then afterwards I took another sharp turn in a very different direction and got to hook up with this guy over here, Vikash, with Krishna Shenoy and Jaimie Henderson over at Stanford, working on clinical neural prostheses, which was really, as I think Vikash was saying, just an amazing experience to really see translational neuroengineering and work with people directly. So it was a long and winding path, but I think I really got hooked on the power of BCIs and what they can really do for people, and that’s where I am now.
Konrad Kording:
Yeah. Just to add to this, I think these brain stimulators are the closest thing we have to magic in all of neuroscience in a way. I think it’s, you turn them on and lives are better and it’s just wonderful to see that.
Chethan Pandarinath:
Yeah, truly amazing. Especially when we have all these debates about, what do we really know about the brain? Oftentimes I come out of that thinking, not a whole lot, to be honest. Yet with this very simple device, we can work magic. It’s pretty cool.
Vikash Gilja:
Also a reminder that a semi-educated guess-and-check can be really impactful. And kind of in our day jobs, we’re very focused on understanding mechanism and knowing what we know with complete rigor. But sometimes on the translational side, you’ve got to try it out, right? Carefully, but you’ve got to try it out.
Matt Angle:
Yeah. I mean, I always think the really extreme example of that is early vaccination. Vaccines preceded immunology by decades. So I agree with you, it’s pretty impressive what medical science has done empirically, often leading the kind of mechanistic work.
Carsen Stringer:
I started in engineering more like some of you, but then I took a nonlinear dynamics course with Jonathan Rubin at Pitt, and I fell in love with chaos theory and all of this work, and modeling neurocircuits in that way. So I switched to an applied math and physics major. And then I took a course with Byron Yu, who some of you might know, and he convinced me to go to Gatsby for my PhD. So Gatsby Computational Neuroscience Unit at University College London. And yeah. So I’ve been combining what I’ve learned in math and computer science, and yeah, to try to make good tools for the community to try to understand large neural data.
Chethan Pandarinath:
Wow, that’s awesome. At some point I’ll have to thank Byron for steering you into our field. That’s very well done there. That’s a win for neuroscience, I think.
Matt Angle:
Konrad, what was your entrance to computational neuroscience?
Konrad Kording:
Yeah, it’s interesting. I started as a physicist, and in a way I always wanted to become a physicist. But then around term three or so, I started really being interested in biology and molecular biology. And then I did this rotation in a neuroscience lab, and I just madly fell in love with neuroscience. What I was doing back then, I was looking at video microscopy of little cultures of neurons. I still remember how I had my culture on video taken over by lots of bacteria because I wasn’t so good at it. But just seeing the neurites grow in a dish was really making a huge difference to me. I then tried to convince the physicists in Heidelberg that in a way, lots of aspects of neuroscience are almost like physics, so they should let me get my undergraduate degree doing neuroscience.
Konrad Kording:
And they were adamant, that was a no. So then I defected to Zurich, because the physicists in Zurich did not think that neuroscience wasn’t physics. They felt that there’s a lot of area in between. And it was just wonderful then moving, at the end of my undergraduate thesis, to Zurich for my PhD, where I really developed an interest in computational neuroscience. And then, well, my path wasn’t very direct from then on. So I simulated neurons while being in Zurich, and I failed as an experimentalist recording from cat primary visual cortex. And then I moved to London doing mostly movement experiments. And then I moved to MIT working with Josh Tenenbaum, mostly doing statistical models. And then I joined the faculty of Northwestern for a while, rising through the ranks, basically always combining the areas that I love, which is a physicsy way of thinking about models and technology, a data analysis focus, and really caring about biology.
Konrad Kording:
Now, I joined the University of Pennsylvania about three years ago. And there again, the people in my lab combine all of these. And more recently I’ve been very interested in how we can teach computational neuroscience. We ran, for the first time, Neuromatch Academy this summer, where Carsen was one of the great speakers. We had thousands of students participating at the same time. We tried to give them an understanding of how to think about neural data. It was just a great opportunity to link with them.
Matt Angle:
Where are most of those students coming from? What are their backgrounds? Are they coming from physics, engineering, math, biology?
Konrad Kording:
Yeah. All of them. So cognitive science as well, I should add to your list. There are people who come from a strong computational background who learn about biology. People who come from a strong biology background, learn about statistics, and everything in between. A lot of experimentalists that realize that thinking about data is useful for what they’re doing. It was really like this great experience of people coming together from all directions.
Matt Angle:
Carsen, what was your experience as a speaker there?
Carsen Stringer:
Yeah. So the speaking part was the easy part. I actually was also co-managing the TAs with another great person, Kate Bonnen. So it was really fascinating to watch people learning over Zoom. We had these small pods of about eight students with one TA, teaching assistant. We kind of told the TAs how to organize the sessions and how to guide the students through the material. I think overall, we were very impressed with how positive the experience was despite the fact that it was all virtual. And the fact that it’s able to have such a broad reaching impact because it’s virtual, I think made it, in my opinion, a big success.
Konrad Kording:
Yeah. The TA experience was the soul of Neuromatch. It was really, it brought people together in small groups and it was just fantastic how well the TAs were able to guide their teams with Carsen’s help.
Matt Angle:
Are you doing it again for people who might be interested in applying? What should they do?
Konrad Kording:
Yes. The plan is that this will be a yearly program now. I mean, there was so much interest in it. And we believe that these computational techniques… even if you’re an experimentalist, making sense of the data is one of the big problems for all of us. So I think there’s a real need for people to learn those skills, and therefore a real need for a large summer school to exist every year.
Matt Angle:
So getting into neural data, the main thing I’d like to talk about today is the use of dimensional reduction in analyzing population data in neural recordings. That’s something that, to all of you, seems very natural and intuitive, but a lot of people coming from the biology angle may not have heard about those techniques when they were coming up through their undergrad. And I think it actually is kind of conceptually accessible. So I think it would be great if we could talk about, first of all, what is dimensional reduction? Could anyone take a stab at a broad definition?
Konrad Kording:
Okay, why don’t I get started. So in a typical experiment, now let’s start with the kind of data. In a typical experiment, we might get data from a hundred nerve cells. So we have their spikes as a function of time. So we would have a matrix where we have a hundred neurons by 10,000 time points or something. If we are very lucky, great and gifted with producing the right kind of data like Carsen, we might have more like thousands of neurons at the same time. Of course, as we know, there are companies trying to push these numbers higher. But what we have is we have these matrices where we have lots of neurons and lots of time points. Now, when it comes to the things that we care about: much of my background is in movement control. You want to move your arm. If you want to build a prosthetic device, you want to know how people move. What matters there? What matters is kind of this low-dimensional control of what I do with the arm. So the idea is that instead of having something that is a thousand channels over time, I want to have something that is maybe 20 channels over time where the interesting…
Konrad Kording:
Maybe 20 channels over time, where the interesting things happen in the data. Now defining interesting is very complicated, and that’s where a lot of the tricks in dimensionality reduction happen. But it’s the idea of: we have the activity of all those neurons over time. How can we represent it as the activity in some low dimensional space? And let’s say in the case of the arm, you could say good ways of describing the movement of my arm, or the representation in my head, might be basically X and Y position, maybe the velocity, maybe the acceleration over time, which is a relatively low dimensional signal. Dimensionality reduction is basically: find the low dimensional spaces in which the interesting things happen.
Vikash Gilja:
Kind of summarizing a little bit in a very simple way: it’s to take data that may have a more complex representation and find a more distilled, simpler form for representing those data without eliminating the important parts of the data. Taking something complex and giving yourself a simpler explanation. [crosstalk 00:24:17]
Matt Angle:
Does anyone have a good example of dimensionality reduction outside of neuroscience in a maybe tangible practice that someone might be able to latch on to? Easily visualized?
Konrad Kording:
Yeah. Here’s a fun example. In fact, one that we used at Neuromatch Academy. The English language has countless words. You could say texts live in this super high dimensional space. You can say every document I can describe by the set of all the words that are used within that document. The problem is this is still a super, super mega high dimensional space. What we do with dimensionality reduction in text is we represent a paper, with maybe a thousand words out of a dictionary of a hundred thousand words, in maybe a 600 dimensional space. Every word then is in this space that is relatively low dimensional, which tells me what this document is about. And we can use this low dimensional space to then ask to what degree a paper that Vikash writes is similar to a paper that Konrad writes, or similar to a paper that Carsen writes. And that’s really useful if we want to understand large collections of texts.
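As a rough illustration of that idea, and not anything discussed in the episode: the three toy documents, the two-dimensional "topic" space, and the scikit-learn calls below are all assumptions for the sketch; real text models use far larger vocabularies and many more dimensions.

```python
# Illustrative only: bag-of-words documents reduced to a small latent space with
# truncated SVD (latent semantic analysis), then compared by cosine similarity.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "neural population dynamics during reaching movements",
    "dimensionality reduction of neural population recordings",
    "formula one race strategy and tire wear",
]
X = CountVectorizer().fit_transform(docs)          # documents x vocabulary (high dimensional)
Z = TruncatedSVD(n_components=2).fit_transform(X)  # documents x 2 latent dimensions
print(cosine_similarity(Z))                        # the two neuroscience documents should land closer together
```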
Vikash Gilja:
Other examples are really common in day-to-day life. We all talk to our smartphones. We all use our smartphones to take pictures of things. Along the way, the representation of images, the representation of acoustics, are projected from the original, higher dimensional sensing space. In the case of images, you have tons of pixels, you have millions of pixels. In the case of audio, you have the individual samples over time. In both cases, there is a lower dimensional representation of those data that is generated prior to the machine understanding it. Along the way, there’s a simpler explanation.
Matt Angle:
Probably a lot of people who have touched neural data in some way have heard about the curse of dimensionality. What does the curse of dimensionality mean for neural decoding, and how are dimensional reduction techniques used to overcome it? Because it probably strikes many as counter-intuitive that you have the Paradromics, the Neuralinks, the Janelias of the world trying to push toward these really high channel count recordings, and the first thing that everyone wants to do is then reduce the dimensionality of the recording. It seems counter-intuitive, but I think maybe some of you could help.
Konrad Kording:
Yeah. Let’s briefly talk about the curse of dimensionality. If I have data that lives in a very high dimensional space, there are lots of ways of mapping that data onto something low dimensional. Like if I want to steer an arm movement prosthetic, I want to have maybe a four dimensional output or something. And if I have lots and lots of inputs to such a system, we effectively have a big matrix, namely from each of the inputs to each of the outputs, and it’s very hard to estimate. And it’s not just hard for our algorithms; there are a lot of different ways of getting the same quality of a mapping. And because there are so many different ways, we need a lot of data to know which one is the right one. If we can use dimensionality reduction, we can reduce this problem with lots of dimensions to a problem with far fewer dimensions, for which we need much less data to train, which therefore is a really useful way of reformulating it.
Konrad Kording:
In general, the problem of the curse of dimensionality is that if we have data with lots of dimensions, it makes the estimation problems that we have more difficult.
Carsen Stringer:
Just in terms of what can we do with this high dimensional data, we want to think about, like Konrad was saying, there’s many ways to reduce the dimensionality, but we want to think about ways that take advantage of the structure we think we know about that’s in the data already, that we know the way that neurons fire and their probability of connections and these sorts of things. But there’s ways to kind of add those kinds of principles to dimensionality reduction techniques as well, that might make this problem slightly easier. Although, I would say it’s definitely unsolved at this point.
Vikash Gilja:
Yeah. On a practical side, you can use data-driven approaches where you’re looking for structure in the actual data traces that Konrad described. And then I think, Carsen, what you’re suggesting is you could also use your knowledge, your domain-specific knowledge. If you know something about the underlying physiology, in the case of neuroscience, you can bring that knowledge to play. A simple example would be if you had sensors like electrocorticography sensors that measure data through volume conductance: if we know where those electrodes are in the brain relative to one another, you may be able to leverage some of that anatomical information to set some prior, some pre-established knowledge, on what the structure might look like.
Vikash Gilja:
I think some of the bigger questions that, Carsen, you are alluding to are that there might be more generalized rules in the structure of data that we can learn over time, from experience on particular problems, and bring that knowledge to bear on new datasets.
Konrad Kording:
When we talk about dimensionality reduction, I want to geek out just for one minute if you’ll indulge me there. The most commonly used dimensionality reduction technique is what’s called principal component analysis. The first principal component is the one dimension that describes most of the variance of the inputs that we have. What does that mean? We have lots of neurons, they’re all correlated with one another. The first principal component, if you want, is the axis along which most neurons co-vary up or down. The second one is the axis where most of the remaining variance happens. And so on and so forth.
Konrad Kording:
Now, what does that mean? It means that the first principal component that I get out of dimensionality reduction in that case is something where there’s a lot of change happening over time and where there’s relatively little noise. The signal to noise ratio is biggest. We have a lot of clean signal and very little noise if we look at the first principal component. And as we go further down the list of principal components, they have less signal relative to the amount of noise that they have.
Konrad Kording:
The first principal component is the dimension along which we can learn most or know most at a given point of time. And that also means that if we do principal component analysis, we can usually get most of … describe most of what’s happening in a group of neurons with the smallest number of parameters, and therefore we will have basically a representation that has less noise. And that is why almost all dimensionality reduction techniques that people use in a way relate to a principal component analysis.
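To make "the axis along which most neurons co-vary" concrete, here is a small from-scratch sketch on invented data; the function name and the simulated numbers are assumptions, not anything from the conversation.

```python
# Minimal from-scratch PCA on a (time x neurons) matrix: the first principal
# component is the axis along which the neurons co-vary the most.
import numpy as np

def principal_components(X, k):
    """X: (T, N) array of T time points by N neurons; returns the top-k components."""
    Xc = X - X.mean(axis=0)                   # center each neuron
    cov = Xc.T @ Xc / (len(Xc) - 1)           # (N, N) covariance across neurons
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # sort by variance explained, descending
    return eigvecs[:, order[:k]], eigvals[order[:k]]

rng = np.random.default_rng(1)
shared = rng.normal(size=(500, 1))                                    # one shared signal
X = shared @ rng.normal(size=(1, 50)) + rng.normal(size=(500, 50))    # 50 noisy "neurons"
components, variances = principal_components(X, k=3)
print(variances)   # the first component captures far more variance than the rest
```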
Vikash Gilja:
I want to push back on one thing you said, which was the idea that looking for the axes of highest variance is going to allow us to zero in on signal versus noise. Because this is where … I think you’re trying to give a big picture, but I want to dive in here a little bit, where the assumptions we make with respect to dimensionality reduction are going to obviously affect the result. And in the case of firing neurons, we know you can roughly model firing neurons with a Poisson process. What does that mean? That means that the mean firing rate and the variance are roughly equal, which means that if you have a higher firing rate, you have higher variance. If you believe the firing rate is the true information and not the spike counts, then that means that your highest variance neurons are going to be the ones that the first principal component tilts towards. Those in a way are your noisier neurons. This is where some of these choices have to be made really carefully relative to our knowledge of the problem and the underlying data. And I want to give a shout out to [inaudible 00:33:07] because he was the first person to teach me that principle.
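A toy simulation of the point Vikash is making, not data from anyone's lab: independent Poisson "neurons" with no shared signal at all, where PCA on raw counts still tilts toward the high-rate neurons, while a square-root transform (one common variance-stabilizing choice, assumed here purely for illustration) spreads the weights more evenly.

```python
# Hedged illustration: for Poisson spiking, variance tracks the mean, so
# raw-count PCA tilts toward high-rate neurons even when there is no shared signal.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
T = 2000
rates = np.concatenate([np.full(45, 2.0), np.full(5, 40.0)])   # 45 low-rate, 5 high-rate neurons
counts = rng.poisson(lam=rates, size=(T, 50))                  # (T, 50) independent spike counts

pc1_raw = np.abs(PCA(n_components=1).fit(counts).components_[0])
pc1_sqrt = np.abs(PCA(n_components=1).fit(np.sqrt(counts)).components_[0])  # variance-stabilized

print("raw PC1 weight, high-rate neurons:", pc1_raw[-5:].mean())
print("raw PC1 weight, low-rate neurons :", pc1_raw[:45].mean())
print("sqrt-transformed PC1 weights     :", pc1_sqrt[-5:].mean(), pc1_sqrt[:45].mean())
```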
Matt Angle:
Carsen, some of your well-known work right now is related to a variant of principal component analysis, a sort of cross validated principal component analysis. Can you tell us about why you started using that technique, the advantages that it has over principal component analysis, and what you can pull out?
Carsen Stringer:
Yeah. I think I should clarify first what it’s for. You don’t necessarily need to use it to get the dimensions like PCA. It’s more of a way to quantify how much variance is in the top maybe 100 linear dimensions of the data. You’re doing this cross validation step because of exactly this Poisson noise in single neurons. Every neuron is noisy. If you were to take the top components of that neural data using principal components, for instance, that noise is going to be inside of those principal components and inside the variances that you’re estimating when you project those principal components onto the data.
Carsen Stringer:
And so to avoid those problems, we actually do two splits. We split the data in time. You want to see our … We’re looking for components that are shared across the population. That’s what we care about. We care about neurons that are firing together. We care about how much variance those have. And we say, are they firing in the same way in one half of the data versus the other half of the data?
Carsen Stringer:
And then, to avoid this problem of single neuron noise, we also look at the covariance structure between neurons. We look at population of neurons A versus population of neurons B: how similar is that covariance matrix between those sets of neurons on one half of the data versus the other half of the data? And that gives us kind of an upper bound, the best model we could possibly make of the neural data, the most variance we could possibly explain, because of this Poisson variability and so on. It’s more of a technique to give you these kinds of upper bounds for the sorts of models you would be making, rather than something you use day-to-day, I would say.
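A simplified sketch of the cross-validated shared-variance idea Carsen is describing, written from her verbal description rather than taken from her code: the interleaved neuron split, the time split, and all of the simulated numbers are assumptions made for illustration.

```python
# Toy version of the idea: split neurons into two sets and time into halves, find
# shared dimensions on one half, and ask how reliably they still covary on the other.
import numpy as np

rng = np.random.default_rng(3)
T, N = 4000, 200
latents = rng.normal(size=(T, 10))                                    # 10 shared signals
X = latents @ rng.normal(size=(10, N)) + 2 * rng.normal(size=(T, N))  # shared structure + private noise

A, B = X[:, ::2], X[:, 1::2]                                # split neurons into two sets
train, test = np.arange(T) % 2 == 0, np.arange(T) % 2 == 1  # split time points into two halves

def center(M):
    return M - M.mean(axis=0)

# Shared dimensions from the training half: SVD of the cross-covariance of A and B.
covAB = center(A[train]).T @ center(B[train]) / train.sum()
u, s, vt = np.linalg.svd(covAB)

# On held-out time points, how much do those dimensions still covary?
projA = center(A[test]) @ u[:, :10]
projB = center(B[test]) @ vt[:10, :].T
shared_var = np.sum(projA * projB, axis=0) / test.sum()     # reliable (shared) variance
total_var = 0.5 * (projA.var(axis=0) + projB.var(axis=0))
print(shared_var / total_var)   # fraction of variance in each dimension that is reliable
```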
Konrad Kording:
And maybe we can group that space a little bit. I like what Vikash said before. The first thing for dimensionality reduction is: what do we care about? And we might care about just describing the data, in which case principal components get us there. And the second is that we might care about being good at a machine learning task, in which case we want the variance that gets us into the right space there. Or we might care about pulling out the dimensions of this data that are interesting in some other way. So there’s a whole set of approaches there.
Konrad Kording:
And then there’s the dimension that, who said it earlier? Where you can ask, what’s the nature of the data? There’s dimensionality reduction for [inaudible 00:36:06] variables and there’s dimensionality reduction for Poisson variables, and there’s a lot of these different things that go in there. And those two, in a way, what do we care about and what do we assume about the data, are ultimately what defines where we are. And Carsen has these cool approaches of basically finding out how good these things are, finding out how we can be [inaudible 00:36:30] about them. Which is, if you want, the thought part of the dimensionality reduction space.
Chethan Pandarinath:
I think all three of you touch on a point that, when you’re in neuroscience, and computational neuroscience especially, is like second nature, but to people outside the field maybe it’s not as obvious. When we’re talking about these dimensions, we’re typically saying that, let’s say, the firing rate or the spiking activity of a neuron, or the calcium activity of a neuron, might be one dimension. And these neurons are fundamentally, as far as we can tell, to a first [inaudible 00:37:10], fairly unreliable. That’s what we mean by saying they’re noisy.
Chethan Pandarinath:
I think that’s one thing that people outside of neuroscience might not be as used to: the measurements you make seem to be very unreliable. You ask a monkey to make the same movement twice and look at how the neurons respond, and it’s very different from repeat to repeat of the same movement. Even in sensory areas, I think things can be more reliable for sure, but to some degree, there’s just this fundamental amount of variability and unreliability that we in neuroscience have just gotten used to, working with these techniques at the single neuron level. And it seems to be that a lot of this low dimensional structure that we’re so focused on extracting is more reliable than the responses of individual neurons.
Matt Angle:
Vikash, we talked about principal component analysis, and certainly there are a lot of sophisticated methods for dimensional reduction, but one of the most interesting methods, random projections, is actually in some ways the dumbest possible method, yet it has been the basis for some really interesting work that your former advisor Krishna Shenoy and [inaudible 00:38:32] have done on latent dimensionality and how it relates to task complexity. Can you talk a little bit about how they took this framework of random projections and built on that?
Vikash Gilja:
Sure. Their work is very theoretically driven, and I think they’re taking a lot of ideas from the field of compressed sensing. They start with this basic idea that you have a set of dimensions that the neural activity lives on. What happens if you are given a random projection of that, random meaning that you have a random set of weights applied to those generating dimensions, and you’re observing the generating dimensions through this random projection? What can you infer about the generating dimensions? And one really interesting theoretical finding that they’ve described is that the core number of neurons that you need to measure to recover the generating dimensions scales with the complexity of the task and, by analogy, the likely complexity of the stimuli. Complexity of what …
Vikash Gilja:
Let me describe task complexity. We’ve been talking about reaching, and actually in their work they’re focused on reaching. So they describe task complexity with respect to the types of reaches: the directions you’re reaching, how long those reaches are. And so you can use the nature of those reaches to describe a set of dimensions that would allow you to generate those reaches. And what they find is that when you measure from a population of neurons, you only need to measure approximately that equivalent number of neurons, roughly as many neurons as generating dimensions, to recover the generating dimensions.
Vikash Gilja:
It’s a really important theoretical result, but I think it’s also important to dial into what it means empirically. To be able to get back to that generating set of dimensions, which would be the truth we’re after as scientists, you may need to record many, many trials, because we’ve been talking about noise in this conversation. So to be able to get back to seeing those generating dimensions, you may need to see the process over and over again.
Vikash Gilja:
One way to shortcut that is to have parallel measures of the process. It isn’t that the neural task dimensionality tells you … Sorry, the task complexity tells you how many neurons you need to record for a given application. It provides some theoretical guidance. But if you want to have certainty sooner, or certainty in real time, you may need to play with that model. I don’t know if that helps. It expanded a little bit beyond your question.
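A toy sketch of the random-projection intuition, with made-up numbers rather than anything from the work Vikash is describing: a handful of generating dimensions observed through random weights on more and more neurons, where the generating dimensions stand out more cleanly from the noise as the neuron count grows.

```python
# Illustrative simulation: K generating dimensions seen through random projections
# onto N neurons plus noise. With more neurons, the top K singular values of the
# recording separate more clearly from the noise floor.
import numpy as np

rng = np.random.default_rng(4)
K, T = 5, 300                                   # generating dimensions, time points
latent = rng.normal(size=(T, K))

for N in (10, 50, 200, 1000):                   # number of recorded neurons
    W = rng.normal(size=(K, N))                 # random projection: latents -> neurons
    X = latent @ W + 3.0 * rng.normal(size=(T, N))       # noisy observations
    s = np.linalg.svd(X - X.mean(0), compute_uv=False)
    gap = s[K - 1] / s[K]                       # how cleanly the top K dimensions stand out
    print(f"N={N:4d} neurons: singular-value gap at K={K} is {gap:.2f}")
```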
Matt Angle:
Yeah, no. I think that that’s something that neural engineers are very interested in right now is how many, for a given task, how many neurons do we need? Is there a way that we can [inaudible 00:42:08] or predict the number of neurons that we’ll need in a certain area based on the complexity of the task? I’m curious if anyone else wants to jump in.
Vikash Gilja:
Maybe before we get there. Maybe we’ll get there later, but I think we should also be asking whether tasks should be the way we look at this problem at all. Tasks have been a driving force in neuroscience, partially because we’ve been limited in our measurement capabilities. We can also invert this question and say, Hey, given the capabilities that we’re creating as a field, should we redefine the way we look at tasks? Great.
Konrad Kording:
Yeah, but task is important, no? You could say that as long as we always do almost the same thing… and neuroscience has this history of taking behaviors and making it so that they’re always almost the same. Like in motor science, we know we have lots of labs where monkeys are only doing this in eight different directions. And it’s a very popular idea. Then it might give us this false impression that everything is very simple, because in that local area, it kind of is. I think that we do need to worry about task complexity. If we build commercial devices that people will carry in their heads, it’s essential for them that it doesn’t just work while they always do the same thing, but that it carries over to the rest of their life. And I would certainly hope my life contains thousands of different tasks. In a way, do we expect that we need to visit all those tasks to be able to build working devices? And I think that that is a very important question.
Chethan Pandarinath:
I actually agree with both of you, and I think the way I interpreted what Vikash was saying there was: we maybe need to ask the question of whether it even makes sense to characterize the system using these simple tasks and then expect we can build something that will scale based on our knowledge … that will generalize, excuse me, based on the knowledge that we obtained from the simple task. Whereas we have recording technologies now where we might be able to record large volumes of data over long time periods, so maybe the right thing to do is scale up the complexity of the behaviors we’re recording, so that whatever kinds of representations we study span this large space of a complex system.
Chethan Pandarinath:
Put another way, if we keep studying, as Konrad mentioned, this classic task in neuroscience, we call it [inaudible 00:44:41] eight, where a human or a monkey will make movements in eight different directions. If we keep studying that, will we ever gain enough knowledge to build something that will work across activities of daily living for somebody who’s paralyzed? I think probably not.
Matt Angle:
Carsen, I think you’ve worked with the largest data sets of all of us. I’m curious, what is your take on this?
Carsen Stringer:
Yeah. I think, coming back to this idea of noise and how many neurons we need to record to overcome this noise, it’s a question of whether we can better characterize what we’re calling noise first. And maybe we can explain what I’m calling noise away and then have these underlying latent factors that correspond to the movement, for instance, of the arm. In the case of mice, not monkeys, where that’s not been shown, we see that the whole brain of the mouse is driven by the behaviors that the mouse does. For instance, if the mouse is running, visual cortex, which is an area for processing images, is activated. If the mouse is whisking, different neurons in visual cortex are activated. There are all these different patterns going on while the mouse is seeing images at the same exact time.
Carsen Stringer:
This, what people would normally call noise, actually has a certain structure that we can in fact subtract away, particularly if it’s a linear subspace. And then once we get rid of that, you could think of better characterizing these latents or you might …
Carsen Stringer:
And you could think of better characterizing these latents, or you might want to use that data in some cases.
Vikash Gilja:
Building on Carsen’s response, I think we have to think very carefully about this kind of classic definition of signal versus noise. I think in some ways the classic definitions are driven by being task-centric, right? When you’re task-centric, there is something you can repeat. There’s a critical set of variables that need to be repeated across trials of that task. And as Chethan was saying, we might need to completely move away from tasks. I think we’re all kind of suggesting that framework, or thinking in that direction. And as soon as you move away from tasks and you’re in quote-unquote “real-world behavior”, there’s very little repeatability, right? I mean, even if every morning I make coffee in a certain way, yeah, I might have my fixed ritual, but the exact environment is different, right?
Vikash Gilja:
The temperature is different. What’s on TV, what’s on the radio while I’m making that cup of coffee is going to change, right? There is no repeatability in the real world; there’s always variation. And so in the brain, with what we’re calling signal versus noise, I think we have to be really careful, because some of what we’re calling noise may just be variability in the environment, or subtle variations in the behavior that aren’t accounted for in our task variables.
Konrad Kording:
Yeah. So to be honest about it, noise is the part of the data that we don’t understand. And there could be things in the data that we don’t understand yet, and things that are truly un-understandable. And I don’t think we know which is which at the moment.
Matt Angle:
Potentially some of that would come out, in that something that’s truly what we would think of as noise, like a kind of random process, would be relatively unstructured. But I guess a lot of the things related to confounding variables you would expect to have some sort of structure, reflecting where they come from, even if you don’t know exactly what that confounding influence is. Can you tease that out? Is it… Konrad, you’re shaking your head?
Konrad Kording:
Oh yeah. The reason why I was smiling is that neuroscientists talk about noise correlations. Noise correlation means neurons do something together and I cannot predict that based on, say, what the stimulus is or what the animal is doing or something like that. Carsen’s work has nicely shown us that in part, that’s because we don’t really understand what the animal is doing. Traditionally we’re like, yeah, that’s the stimulus, we know which sound we are playing to the animal, we know which visual stimuli we play. Why would anyone think that movement matters? No way. And Carsen and her coworkers nicely showed it matters for everything.
Konrad Kording:
And so basically when neuroscientists talk about noise correlation, they say the part of the signal that we don’t understand has correlations. Now, to me that suggests that it’s not really noise, that there’s something going on, and it could be something as simple as: hey, while I’m in this boring experiment, I’m imagining the coffee I’ll have once I come home. And it just appears as noise, because no one can know that right at that moment I’m thinking about coffee.
Chethan Pandarinath:
Yeah. I really hope that we as a field can ditch the name noise from noise correlation, hopefully soon, because it’s really a misnomer. It’s really like we have to assume that everything is stimulus driven or task dependent and nothing else matters. So hopefully we as a field will ditch that. But I think that also kind of gives us a clue about other ways of thinking about signal, which is, especially as we start to get these larger and larger recordings, one way you might look at signal is just: if I look at a restricted portion of my recording, what fraction of that is predictive of the other portion of the recording? So split your data into train and test or whatever, but at least we can say that if this part of the data is informative about this other part of the data, there’s some structure there that we should maybe care about. And that might be kind of a helpful way of thinking about signal in these large recordings that we’re getting.
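One way to make that suggestion concrete, purely as an illustration: the neuron split, the ridge regression model, and the simulated data below are assumptions of this sketch, not anything specified in the conversation. "Signal" here is operationalized as the part of one subset of neurons that another subset can predict on held-out time points.

```python
# Hedged sketch: call "signal" the part of one half of the recording that is
# predictable from the other half on held-out time points.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
T, N = 3000, 120
shared = rng.normal(size=(T, 8))                                   # shared structure
X = shared @ rng.normal(size=(8, N)) + rng.normal(size=(T, N))     # plus private noise

half_a, half_b = X[:, : N // 2], X[:, N // 2 :]                    # split neurons into two halves
Xa_tr, Xa_te, Xb_tr, Xb_te = train_test_split(half_a, half_b, test_size=0.5, random_state=0)

model = Ridge(alpha=10.0).fit(Xa_tr, Xb_tr)                        # predict one half from the other
print("held-out R^2:", model.score(Xa_te, Xb_te))                  # > 0 means cross-predictable structure
```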
Vikash Gilja:
Yeah. And I think you were alluding to that… Those splits, they can be in terms of the dimensions we’re talking about, and they can also be with respect to time, with the key buzzword being dynamics, right? So these are perspectives that are gaining in popularity, but they are fundamental to understanding how a system works in the world. We’re starting with the assumption that there is some regularity, and that’s about it, right?
Matt Angle:
What sorts of priors have been helpful in neural decoding? Are there examples? I’m thinking maybe of treating this neural activity as a dynamical system, and then making certain assumptions about the dynamics. Where have we been successful in applying constraints and priors, what has yielded fruit, and where do we think maybe we’ve been too restrictive in trying to apply our priors [inaudible 00:51:43] on both questions.
Chethan Pandarinath:
I have a hard time talking about anything that has to do with priors, with Konrad on. I’d be too embarrassed to.
Konrad Kording:
Why? But maybe, Matt, let’s make sure that our audience knows what we mean when we talk about priors. When we build decoders or something, we usually build in some of the aspects that we know about brains. Now let’s list a few of the priors. We believe that there exists a relatively low dimensional manifold of neural activity that matters. We believe that a spike sent by a neuron at one point of time means something very similar to the spike being sent just a millisecond later or something. We believe that for any given decoding task, most neurons are probably mostly irrelevant.
Konrad Kording:
So we have all this prior knowledge about properties of the world, and by building in some of it, we can build better systems than by not building in these intuitions. And the reason why relates to what you said earlier, Matt, where you can say: we have this problem that if we’re in high dimensions, we have the curse of dimensionality. It’s very hard for us to know what the right solutions are. By building in things that we already know about the brain, in a way we make that problem easier, because there are effectively fewer possible decoders that we’re willing to admit.
Carsen Stringer:
Konrad, can I just quickly say, not everyone makes the assumption, though, that the data is low dimensional. So I have to add that. In terms of motor systems and output, and of behavior and limb movement, that seems like a relatively reasonable assumption. But in terms of our representation of the sensory world, like the images you see, you’re seeing many, many millions of pixels every time you move your head, and for that representation, for doing object recognition and figuring out where things are in the world, it’s helpful to be in a high dimensional space. So I will say in that context, we don’t assume a low dimensional space, but yeah. Otherwise, I [crosstalk 00:54:07] hear what you say-
Konrad Kording:
Yeah, I totally agree with Carsen there. The idea that our thoughts are very low dimensional is just insane if you start thinking it through; clearly the way we experience our visual world is very high dimensional. So when we say low dimensional, I’m not saying, and I think most of my colleagues aren’t saying, that it’s really low dimensional. It might be a little more low dimensional than the neurons could be if we had all of them doing independent things. And yet it’s-
Matt Angle:
Low dimensionality relative to the number of neurons participating in the encoding.
Konrad Kording:
Yeah, I guess so. Carsen, would you be willing to accept that it’s at least a little bit smaller than the number of neurons that we have?
Carsen Stringer:
Maybe a little bit, but it’s still open for debate. I think until we have a good model of how the brain is encoding these visual stimuli, we can’t really say how many dimensions are being shared among the neurons and get a true number on it that’s below a large number. [inaudible 00:55:13] right now the linear dimensionality is very large, I should say. You can think about squishing things in various ways, and there might be a lower dimensional non-linear manifold that we’re not seeing as well. So…
Matt Angle:
You could probably cap it, though, based on correlations across neurons. Neuronal firing is correlated enough that it can’t be the number of neurons. It has to be smaller than that.
Carsen Stringer:
But it could be close to that number, because there can be many dimensions that each neuron participates in. So you’re right, there will be directions that multiple neurons participate in, but each neuron could participate in many different directions. So it could still be as high dimensional as the number of neurons, even though there are correlations in the population. But yeah, that is a good point. And it could be very different from brain area to brain area too. So I’m talking very much about my experience with visual cortex.
Chethan Pandarinath:
So Carsen, I don’t want to misspeak about your work, but one of the key takeaways that I got was that as you look at higher and higher dimensions, there are lower and lower fractions of the total variance of the signal, which means we need larger and larger recordings. If we have a restricted 100-channel array or something like that, I’m very unlikely to see that the data is high dimensional, because those higher dimensions are such a small component of my signal power that I might not be able to distinguish them from noise with my minuscule recording. So I guess I would be very surprised if the motor system was high dimensional in the same way the visual system was, but I’m actually not even sure if we’ve been able to test that question, given the techniques we’ve brought to bear on the problem.
Matt Angle:
I guess that also kind of raises the question of whether the dimensionality may not be that relevant. You could have 10,000 dimensions, but if all of the power is in the first five dimensions, that might be more salient. If there’s such little power distributed across the kind of-
Vikash Gilja:
I think there we’re starting to get into correlation versus causation land, which is-
Matt Angle:
That’s Konrad’s favorite topic, isn’t it?
Konrad Kording:
Yeah, you don’t want to prompt me on causation and correlation… [crosstalk 00:57:30] and I will pretend for this time that I didn’t hear that one.
Matt Angle:
Carsen mentioned non-linear manifolds, and I wanted to give Chethan a chance to talk about LFADS and the introduction of neural networks to do dimensional reduction. And I’d be really curious to get the group’s thoughts about how that differs from more traditional linear methods, like PCA, and what we’ve learned from that.
Chethan Pandarinath:
Sure. Yeah. Happy to. I’d say this actually also relates, I think, to your previous question about priors and how they might be useful for modeling data. One of the things that a lot of classic dimensionality reduction techniques maybe don’t do very well is take into account structure over time. Take PCA, or another technique like factor analysis, for example: both of them sort of assume that the data can basically change arbitrarily from time point to time point. So you could shuffle your data in time and get the exact same result from PCA as you did from the original data. And we know that’s not true about neural data. We know that there’s very clear structure in time. So if you know that the neurons are doing something right now, they’re not going to be doing something completely different in the next time step, right? There’s some relation over time.
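A quick check of that property on invented data: shuffling the time points leaves the principal components unchanged (up to sign), which is exactly why PCA by itself cannot capture temporal structure. The data and dimensions here are arbitrary assumptions for the demonstration.

```python
# PCA ignores temporal order: shuffling the time points gives the same components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 40))   # (time, neurons) with low-D structure

pcs = PCA(n_components=3).fit(X).components_
pcs_shuffled = PCA(n_components=3).fit(rng.permutation(X)).components_   # shuffle rows (time)
print(np.allclose(np.abs(pcs), np.abs(pcs_shuffled)))                    # True: order did not matter
```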
Chethan Pandarinath:
So Matt, what we were effectively asking in our LFADS paper was: how could we model some of that temporal structure? What type of prior could we develop on how the activity changes over time? A reasonable prior is to treat the system as a dynamical system, which, if you get down to the details, isn’t really saying much. A dynamical system is just one in which the future activity is predictable from the current activity and potentially any inputs coming into the system.
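In symbols, the dynamical-systems assumption is roughly the following, where x_t is the population state, u_t is any external input, and y_t are the observed spike counts; the Poisson observation model shown is one common choice, not necessarily the exact one in any particular paper:

```latex
x_{t+1} = f(x_t, u_t), \qquad y_t \sim \mathrm{Poisson}\!\big(\exp(W x_t)\big)
```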
Chethan Pandarinath:
So basically what we’re asking is: can we develop a system that takes the current activity and predicts what’s going to happen in the future? If we believe the activity has a lot of structure in time, that’s a pretty good idea. And we did that with recurrent neural networks, which are themselves dynamical systems, non-linear dynamical systems at that, so they can model non-linear changes over time. That effectively gives us a way to capture structure in the activity that changes over time. Happy to go into more technical details, but-
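A highly simplified stand-in for that idea, not the actual LFADS architecture (the model sizes, names, and training loop here are placeholders):

```python
# Train a recurrent network to describe spike counts as time-varying firing rates,
# so temporal structure is carried by the network's dynamics. LFADS itself is a
# sequential autoencoder with more machinery; this is only a sketch of the flavor.
import torch
import torch.nn as nn

class RateRNN(nn.Module):
    def __init__(self, n_neurons, n_hidden=64, n_factors=8):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_neurons, hidden_size=n_hidden, batch_first=True)
        self.to_factors = nn.Linear(n_hidden, n_factors)   # low-dimensional "factors"
        self.to_rates = nn.Linear(n_factors, n_neurons)    # map factors back to log-rates

    def forward(self, spikes):
        hidden, _ = self.rnn(spikes)                       # dynamics over time
        factors = self.to_factors(hidden)
        log_rates = self.to_rates(factors)
        return factors, log_rates

model = RateRNN(n_neurons=100)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
spikes = torch.poisson(torch.rand(32, 200, 100))           # fake data: 32 trials, 200 bins, 100 neurons
for _ in range(10):
    factors, log_rates = model(spikes)
    # Poisson negative log-likelihood of observed counts given the inferred rates.
    loss = nn.functional.poisson_nll_loss(log_rates, spikes, log_input=True)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```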
Matt Angle:
What did we learn from applying non-linear methods that we didn’t know from linear methods? What does that bring to the table? Obviously those models are more complicated to train and more complicated to understand, but often in machine learning, neural networks end up being better. From a functional standpoint, that’s great; from an understanding standpoint, do we feel like we know what LFADS brought that was new?
Chethan Pandarinath:
Yeah, to be honest, I don’t know that the non-linearity is really the important thing. I think it’s really more about taking into account structure over time, which is a little harder to do with traditional methods.
Vikash Gilja:
Yeah, as an LFADS user, let me comment on a few things that come to mind about the structure of that model. Non-linearity across time is important for the expressiveness of the dynamics, the types of dynamics you can generate. But I think there’s another important comparison point relative to other techniques that bring in dynamics. One common strategy is to bring in a form of linear dynamics, and many of those approaches also assume stationarity, that you’re going to see constancy in the nature of those dynamics. The approach you took with LFADS, that particular model and the way it’s applied in many of the early papers, does allow for non-stationarity as well. And we were talking about getting away from trials, but coming back to trials: in most cases these approaches are applied to trials with a defined beginning and end, where different things happen at different time markers, and LFADS, in its generation of dynamics, can find structure related to absolute position in time.
Vikash Gilja:
Many of the other methods couldn’t, and the types of structure it can learn are much more complicated than those of the existing methods, so it’s allowed to be more expressive. And one of the key hallmarks of success, since we’ve been talking about dimensionality reduction: how do you know if your dimensionality reduction technique works well? One measure is how well it represents the high dimensional data. And that’s something we know from your paper, Chethan, and from follow-up work with LFADS that others have done: in many cases, the factors it generates, the representations across time, better capture what was happening in the high dimensional space. As a user of LFADS, that’s something that gives me confidence in the technique.
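One way to score a method on that criterion, sketched here with PCA on synthetic data; the same held-out reconstruction comparison can be run on any technique’s factors:

```python
# How much of held-out high-dimensional activity can be reconstructed from the
# low-dimensional representation? A simple cross-validated R^2 on reconstructions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

def reconstruction_r2(data, n_components):
    train, test = train_test_split(data, test_size=0.25, random_state=0)
    model = PCA(n_components=n_components).fit(train)
    reconstructed = model.inverse_transform(model.transform(test))
    residual = np.sum((test - reconstructed) ** 2)
    total = np.sum((test - test.mean(axis=0)) ** 2)
    return 1.0 - residual / total

rng = np.random.default_rng(1)
activity = rng.normal(size=(2000, 3)) @ rng.normal(size=(3, 80))
activity += rng.normal(scale=0.3, size=activity.shape)       # per-neuron noise
print([round(reconstruction_r2(activity, k), 3) for k in (1, 2, 3, 5, 10)])
```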
Chethan Pandarinath:
Sure. And maybe relating to the topic of decoding that’s been brought up: one thing we often wonder about dimensionality reduction techniques is whether this is just a convenient way to visualize our data, or whether there’s any reason to do it beyond the fact that we can’t plot anything higher than three dimensions, right? One of the things that was pretty surprising to us when we tried the LFADS method is that we’re taking these spike trains, and as we’ve all talked about throughout this conversation, on a given trial, on a given observation of your neurons’ activity, it seems pretty noisy; it seems pretty variable across trials. Then we train a recurrent neural network to try to describe that data.
Chethan Pandarinath:
And effectively we throw out things that can’t be captured by the structure of that network. We have the spiking activity, but instead we say we’re going to describe it by this firing rate, as if the firing rate reflects the underlying dynamics. What we found was that the predictions from that model ended up being really informative about behavior on a trial-to-trial basis. Even though the model itself didn’t know anything about, let’s say, the behaviors the animal was doing, we found that a lower dimensional representation captured by a non-linear dynamical system was really informative about the moment-by-moment behavior of the animal, so we could decode arm velocities with much higher accuracy than we previously could have. For me that was a little eye-opening: all this dimensionality reduction stuff is not just a convenient way to visualize our data, it’s actually more informative about other aspects, like the animal’s behavior.
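A sketch of that decoding comparison on synthetic data; the decoder, dimensionality, and simulated behavior are placeholders, not the settings from the LFADS work:

```python
# Decode a fake 2D behavior with a simple linear readout, once from binned spike
# counts and once from a low-dimensional representation of the same data (here
# factor analysis, fit on all data for simplicity). In the real comparison the
# low-dimensional, dynamics-aware representation decoded arm velocity much better;
# this toy only shows the shape of the comparison.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
T, n_neurons = 3000, 120
t = np.linspace(0, 30, T)
latent = np.column_stack([np.sin(f * t) for f in (1.0, 1.7, 2.3, 3.1, 4.2)])  # smooth latents
spikes = rng.poisson(np.exp(0.5 * latent @ rng.normal(size=(5, n_neurons))))  # Poisson counts
velocity = latent[:, :2] @ rng.normal(size=(2, 2))                            # fake 2D behavior

factors = FactorAnalysis(n_components=10).fit_transform(spikes)

score_from_spikes = cross_val_score(Ridge(), spikes, velocity, cv=5).mean()
score_from_factors = cross_val_score(Ridge(), factors, velocity, cv=5).mean()
print(f"R^2 from raw counts: {score_from_spikes:.3f}")
print(f"R^2 from factors:    {score_from_factors:.3f}")
```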
Vikash Gilja:
I’m going to try an analogy; admittedly, I think it’ll be a very imperfect one.
Matt Angle:
Good. Then we can all criticize it.
Vikash Gilja:
Yeah. One that comes to mind quite often: look at our understanding of planetary motion. There are the older models that put the earth at the center of the universe and said, okay, let’s model planetary motion with the idea that the earth sits at the center, and model all these celestial bodies based on that supposition. You can do it, but the models get increasingly complex, and it gets harder to explain the underlying principles and the underlying structure. Or you can shift, at least within our solar system, to the sun being the center, and you get a simpler explanation.
Vikash Gilja:
And I think in some ways, Chethan, what you’re describing, what you found, is that you could generate this set of factors that is a better or more robust explanation of movement, and that has some similarities. It’s not quite the same, because we can’t see those planets, right, we can’t measure them directly, but it’s in some ways a simpler rule set that gets you from where you’re starting to where you want to end up.
Carsen Stringer:
I just want to ask: given the way you’ve modeled it as a dynamical system, and going back to this idea of how high dimensional the system is, do you think, because it’s a dynamical system, this reflects the way the neurons are connected in some way? And if you had a different task, do you think you’d find similar factors? Is that something you’ve looked into?
Chethan Pandarinath:
That’s a really great question. I think... I don’t know. One of the challenges is that it’s honestly hard to get data that is more complex than what we see with two-dimensional reaches, but still has enough structure that we can tell the signal from the noise, getting back to what we’ve all been talking about: we often use tasks as a crutch to discriminate signal from noise in our data. So I don’t know what the ideal is here. We’ve been working with Konrad’s old buddy, Lee Miller at Northwestern, who’s been setting up these amazing recording setups where they’re monitoring monkeys wirelessly. You can monitor the monkey as it runs around its own cage, streaming out neural activity and muscle activity at the same time.
Chethan Pandarinath:
And so what we’re trying to find out is, as you get to these much more complex behaviors: one, what is the dimensionality, and how does it relate between the in-lab behaviors we’re used to and the complex behaviors? And two, can we build decoders that give you that predictability across multiple behaviors? I’d say it’s an open question, and certainly a challenging one.
Matt Angle:
Konrad, an influential professor of computational neuroscience recently tweeted that manifolds were a fad, and I’m curious for your take on that.
Konrad Kording:
So let’s give a little bit of background there. The first thing is, there is a community that likes manifolds, and “manifolds” is basically just a term used by the community that uses dimensionality reduction. So often, if people say there are manifolds in the brain, they just mean we can run dimensionality reduction and see interesting things there. So there is this split between the people who often work on motor cortex and who really like…
Konrad Kording:
…who often work on motor cortex and really like dimensionality reduction, because it just turns out that we study low dimensional behavior, which seems to be really well represented in a low dimensional projection of things. And then there are people who work more on the vision side, the whole-brain imaging or whole-brain recording side of things, where it looks like everything is very, very high dimensional. We saw that here: Chethan, in the motor community, likes the low dimensionality idea, while people like Carsen, who work more on visual things, suddenly have trouble with it.
Konrad Kording:
And the question is, for some people, dimensionality reduction and the resulting representations, which they call manifolds, obtain the role of being a model of how the brain works. I think a critique of the dimensionality reduction slash manifold idea is that, in a way, it doesn’t really produce an understanding. I can project into low dimensions and say, if I project into two dimensions, every time you move your hand forward, this [inaudible 01:10:17] of movement looks like this.
Konrad Kording:
It’s very hard to convert statements like that, from dimensionality reduction, into statements about how something works in the brain. I think the reason the idea that manifolds are a fad is, in a way, justified is that people take dimensionality reduction, which is a very useful technique that every neuroscientist needs in their toolkit, and elevate it to the status of a theory of how the brain works. That just hides the real thing, because something produces the dynamics, and that’s where it happens. It’s a weird way of producing a non-theory theory of the brain, and in that sense, I think the critique that it’s a fad seems justified to me.
Carsen Stringer:
Yeah. I think I’m on the side where you’re not getting a causal model of how the neural activity is working, and you really have these strong constraints. Basically the question I just posed to Chethan as well: you don’t know if these are the modes the system is always exploring. I think that’s why we’re at such an exciting point in neuroscience, where we can record this really large scale data with all these different behaviors, and we can really start to answer these questions. If a low dimensional manifold really is where neural activity is sitting, then we need models that recapitulate these manifolds, and so on. Or maybe we need a new way of looking at neural activity.
Vikash Gilja:
With any of these machine learning tools that we’re applying more readily in the neurosciences, we have to be really careful in how we wield them. We still need to be hypothesis driven. If you’re searching for a needle in a big enough haystack, you’re going to find tons of things that look like needles, and I think that’s where the danger is. Konrad, I believe that’s what you were alluding to: if we use these tools in an unguided way, and we’re not hypothesis driven or query driven, they could lead us to false impressions, impressions that aren’t the fundamental truth but just a simplifying description of the data. I think to really understand what’s at play, we have to bring causality back in, and we need to be able to tie these models to causal perturbations. That has been the lifeblood of neuroscience; that’s how we advance our knowledge, by introducing causal perturbations, and these techniques don’t get us away from that need.
Chethan Pandarinath:
I was just going to say... I don’t disagree with anything that’s been said, but to push back a little bit: one of the examples I’ve seen that has been really compelling, in terms of believing in these low dimensional representations we’ve found, is Aaron Batista and Byron Yu’s work studying learning in monkeys using BCIs. One of the key things they found in some of their experiments a few years ago was that if you train a monkey... Say a monkey is controlling a cursor using a BCI, and you look at the low dimensional structure of the data, in their case just using factor analysis. That’s telling you that there are these patterns of covariation among the neural population: neurons fire together in certain ways.
Chethan Pandarinath:
With the BCI, because you’re directly accessing the neural activity, you can change the relationship. You can change your decoder; you can change, essentially, what the monkey has to do in order to make the cursor move on the screen. One of their key experiments tried decoders that changed the relationship between neural activity and the cursor’s movements, but either preserved the correlation structure, meaning they preserved the low dimensional representation, or broke the correlation structure. And what they found was that it was fairly easy for the monkeys to learn to move the cursor as long as you preserved that correlation structure. But once you broke it, within a day it was almost impossible for the monkeys to learn new ways of controlling the cursor.
Chethan Pandarinath:
Maybe that’s not causal in the same sense as dropping in optogenetics or electrical stimulation, but it does say there is something special about that structure, in that the monkey’s ability to produce new patterns of neural activity was somehow fundamentally constrained by this low dimensional structure. And they’ve had several follow-up studies looking at long-term learning and how a monkey can learn to break that correlational structure. To me, it was really compelling evidence that there’s something meaningful about these representations we’re looking at.
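A rough sketch of the contrast behind those experiments, using factor analysis on synthetic data; this is an illustrative construction, not the actual procedure from the Batista and Yu studies:

```python
# Factor analysis identifies the low-dimensional subspace ("manifold") the
# population normally explores. The population varies a lot along directions
# inside that subspace and barely at all along directions outside it -- the
# structure the perturbed decoders either respected or broke.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n_neurons, n_factors = 90, 10
activity = rng.normal(size=(5000, n_factors)) @ rng.normal(size=(n_factors, n_neurons))
activity += 0.2 * rng.normal(size=activity.shape)             # private, per-neuron noise

fa = FactorAnalysis(n_components=n_factors).fit(activity)
subspace = np.linalg.qr(fa.components_.T)[0]                  # orthonormal basis, (n_neurons, n_factors)

# Two unit-norm readout directions inside the subspace, two orthogonal to it.
inside_dirs = subspace[:, :2]
outside_dirs = rng.normal(size=(n_neurons, 2))
outside_dirs -= subspace @ (subspace.T @ outside_dirs)
outside_dirs /= np.linalg.norm(outside_dirs, axis=0)

print("variance along within-manifold directions :", np.var(activity @ inside_dirs, axis=0).round(2))
print("variance along outside-manifold directions:", np.var(activity @ outside_dirs, axis=0).round(2))
```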
Vikash Gilja:
I just wanted to respond, because this might just be semantics, but I do see the experiments that Byron and Aaron did as causal perturbation experiments, because they are generating a new control system and testing the animal’s ability to engage with that control system. I think that’s a new type of causal experiment that BCI enables.
Chethan Pandarinath:
That’s true. And it’s important to note that there’s a rich history of studying motor learning with normal motor control, but it’s very hard to make exactly that perturbation in a standard experiment, where you have the entire system in between the neural activity and the thing being controlled, as opposed to a BCI, where you’re directly accessing the neural activity and can manipulate how it relates to the thing being controlled.
Konrad Kording:
Aaron Batista has pioneered a new kind of experiment, of a type we’ve never seen before, and I think it’s wonderful; it’s the best evidence we have in that area. But let me try to argue that it doesn’t, in a way… I’ll explain why I would interpret it slightly differently.
Konrad Kording:
Let’s take that experiment. We have two neurons; let’s say these two neurons are strongly correlated, and let’s assume we only record from two neurons, because the experiment still works with two neurons. What we find is that usually these two neurons are either both very active or both not very active at all. In that case, the equivalent of the experiment says: if I build a decoder that will only do what the animal wants when both neurons are high, that’s easy for the animal; if it will only do what the animal wants when one is high and the other is low, that’s hard for the animal.
Konrad Kording:
Now, the question is where the variance comes from, which is in a way the ultimate question for the dimensionality reduction we’re discussing today. If we assume it comes from the equivalent of thoughts or plans or something like that, then that just means the reason these two neurons are often active at the same time is that there are thought processes that happen often and make both of them go up or both go down, and thought processes that happen rarely and make one go up while the other goes down.
Konrad Kording:
So in that sense, the way I interpret the experiment is mostly that it shows it’s easy for an animal to produce one of the thought patterns they often produce, and hard for the animal to produce a thought pattern they rarely produce. In that sense, it highlights what the causal [inaudible 01:17:59] is. This is a case of so-called confounding: we have variables, namely thought patterns or intentions, that drive the variables of interest. The variables are therefore confounded, and in that space the animal can arguably do control. Imagine that I try to throw something at the experimentalist, or imagine that I try to move my hand forward; those seem to be the natural search space in which the animal would try to explore things.
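Konrad’s two-neuron picture in toy form (the numbers are purely illustrative):

```python
# A latent "thought" drives both neurons, so the correlated pattern (both up or
# both down) is visited constantly, while the anti-correlated pattern is rare.
# A decoder that demands the rare pattern asks for a state the animal almost
# never produces.
import numpy as np

rng = np.random.default_rng(4)
thought = rng.normal(size=100_000)                    # shared drive (the confounder)
neuron_a = thought + 0.3 * rng.normal(size=thought.size)
neuron_b = thought + 0.3 * rng.normal(size=thought.size)

aligned = (neuron_a + neuron_b) / np.sqrt(2)          # "both up / both down" axis
opposed = (neuron_a - neuron_b) / np.sqrt(2)          # "one up, one down" axis
print("spread along the correlated pattern     :", aligned.std().round(2))
print("spread along the anti-correlated pattern:", opposed.std().round(2))
```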
Konrad Kording:
But those are amazing experiments. It’s really one of the things that gets me most excited in all of motor control at the moment. They’re great.
Chethan Pandarinath:
Vikash, you see why I was trying not to use the word causal?
Vikash Gilja:
Yeah. It’s an overloaded term. That’s where the semantics come in. And I think as a field, as we generate these new techniques and these new paradigms, we’re going to have to refine the language.
Matt Angle:
It’s pretty difficult, because David Hume basically killed the word causal a long time ago. And yet causality is still a useful assumption in our everyday lives: if we do something and something happens, we say we caused it. It’s even a useful philosophical cheat in experimental science, but it’s still kind of a cheat.
Konrad Kording:
I’m not sure why you’d say that. There are domains where causality is perfectly well defined. Take randomized clinical trials, where half of the people get the COVID vaccine and the other half don’t, and neither the people administering the treatment nor the people receiving it know which one they got. And ultimately I think we are all very much…
Konrad Kording:
I’m personally so much looking forward to getting the COVID vaccine. I really hate being in isolation; I don’t like anything about the pandemic. And I will have absolutely no trouble accepting that the causal injection of the COVID vaccine makes it less likely I’m going to get COVID. The question is how far we can drive that principle.
Konrad Kording:
At the same time, I think all of you neuroscientists here would be perfectly fine if I go into a neuron and sometimes inject a few extra spikes while we record from another neuron, and I find that when I inject current, it makes the other neuron regularly spike; we’d be fine saying there’s a causal influence of the stimulated neuron on the neuron we record. It’s all complicated philosophically. For me, causality just means: if we go in and perturb the system, what are the things that change? And in many of the domains where I work, that seems perfectly well defined.
Konrad Kording:
And it embodies something that we believe in. In a way, the reason a lot of people do the research they do in neuroscience is that they believe we can cure diseases and so forth. Those are all causal questions. If we do the procedure, will it make you better than if we don’t do the procedure? If we stimulate that brain area, will your Parkinson’s disease be better? Those are real causal questions.
Vikash Gilja:
I think a lot of this comes down to interpretation, which is still open with these new paradigms. We can argue about how far to take a result and the level of conclusion you can draw about the fundamentals. In the case of the work we’re talking about, that Aaron and Byron completed, there is a change to a control system, a causal change to that control system. But I think what we’re questioning now is whether that tells us something fundamental about the underlying biological neural network. I think that is open to debate, and the interpretation is not clear. Hopefully it leads to follow-on experiments that get us to those answers.
Matt Angle:
Coming back to my interests, my selfish interest in BCI: Vikash, Chethan, and Konrad, most of your analysis has been in the hundreds-of-neurons territory. Carsen, you have the privilege of sitting with probably the most exciting data sets in neuroscience, and you’re making a lot of headway in the thousands-of-neurons territory. I’m curious what changes when you’re working with thousands instead of hundreds, and when we get to tens of thousands or hundreds of thousands, does anything change? Are there practical challenges? Do you run into hard problems? Where are we going as the experimental capabilities get much better?
Carsen Stringer:
Yeah. I think you’re able to take the task domain into this higher dimensional realm, definitely. If you only have a few neurons and you have this inherent noise in the single-neuron activity, trying not to use the word noise in the wrong way, but the single-neuron independent noise, it can be very hard to figure out what the circuit is doing, or what it’s encoding from the external world. Recording many neurons really opens up the possibility of studying how the brain encodes a really complex representation, like the natural world, for instance the images we see as we move around.
Carsen Stringer:
I think that’s what it allows: it gives us the possibility to study that, whereas before we were just getting glimpses of a few neurons and couldn’t take advantage of, for instance, averaging over neurons to reduce the noise. If we find a dimension in space where, say, these 10 neurons are always active at the same time, then if one of those neurons isn’t firing on a given trial, that’s fine, because we’re taking the average of those 10 neurons. And so we can reliably, trial by trial, moment by moment, second by second, say what the population is doing. Whereas if you were only recording a couple of neurons, you wouldn’t be able to find those spaces where neurons covary.
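Her ten-neuron example in miniature, with made-up numbers:

```python
# If ten neurons share a signal, averaging them keeps the signal but shrinks the
# independent noise by roughly sqrt(10), so the population readout is reliable on
# single trials even when any one neuron is unreliable.
import numpy as np

rng = np.random.default_rng(7)
n_trials, n_neurons = 1000, 10
signal = rng.normal(size=(n_trials, 1))                        # shared drive per trial
responses = signal + rng.normal(size=(n_trials, n_neurons))    # independent noise per neuron

single_neuron_corr = np.corrcoef(signal[:, 0], responses[:, 0])[0, 1]
population_avg_corr = np.corrcoef(signal[:, 0], responses.mean(axis=1))[0, 1]
print(f"correlation with the signal, single neuron : {single_neuron_corr:.2f}")
print(f"correlation with the signal, 10-neuron mean: {population_avg_corr:.2f}")
```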
Matt Angle:
Are there practical challenges to working with data sets of that size?
Carsen Stringer:
Yes. I would say we also often do principal component analysis to reduce the noise in the data, and then try to study the principal components. There are practical challenges: if you’re trying to fit an encoding model and you have many neurons, it’s going to be very slow. But I think most of the challenges come from the fact that we don’t really understand the structure of the data. It comes back to needing dimensionality reduction techniques that, if there is a low dimensional non-linear manifold, allow us to find it. And if we’re going to find it, it will be easier to find with more neurons. I would say the challenges are challenges that weren’t present before we got this big data.
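The PCA denoising step in sketch form, on synthetic data:

```python
# Project the population onto its top principal components and reconstruct, so the
# shared structure is kept and much of the independent single-neuron noise is
# averaged away.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
T, n_neurons = 4000, 500
shared = np.cumsum(rng.normal(size=(T, 20)), axis=0)
shared -= shared.mean(axis=0)
signal = shared @ rng.normal(size=(20, n_neurons)) / np.sqrt(20)
noisy = signal + rng.normal(scale=signal.std(), size=signal.shape)   # heavy independent noise

pca = PCA(n_components=20).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))

def r2(truth, estimate):
    return 1 - np.sum((truth - estimate) ** 2) / np.sum((truth - truth.mean(axis=0)) ** 2)

print("R^2 of noisy data vs true signal   :", round(r2(signal, noisy), 3))
print("R^2 of denoised data vs true signal:", round(r2(signal, denoised), 3))
```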
Matt Angle:
What do you think would be the most exciting experimental capabilities to see in the next five to ten years? What would really change your ability to ask new questions? What would you like to see from the hardware engineers who are tuning in?
Carsen Stringer:
I would say I want all my neurons at millisecond precision. But I don’t know what other people want.
Konrad Kording:
You guys know that I’m into causality. What I would love is the ability to stimulate lots of neurons while recording from lots of neurons; that would allow me to get causal effects. Because if you randomly stimulate one neuron, then all the neurons that correlate with it, which by definition correlate with the stimulation, and that stimulation is random, must be causally affected by that neuron. If we could do high dimensional read-write, we could ask a lot of the causal questions that I just happen to obsess about.
Carsen Stringer:
But the number of neurons you would need for that is also very large, because the probability of getting a connected pair is relatively low, right? Or what do you think, Konrad?
Konrad Kording:
Yeah, it depends on how we think about it. You could say, if you allow me to stimulate one neuron and I want a single-synapse connection, then the probability is very low. The alternative is to say, if I stimulate somewhere, I’m going to hit some local neurons, and I’m going to record the activity of a bunch of neurons that I can effectively average over, in which case I could basically build lower dimensional causal models that are still fully causal models. They’re just not at the level of “this is what a single neuron does.” But I think it would be a super cool experiment. So once that capability comes in, give me a call.
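A toy version of that read-write logic, with entirely hypothetical numbers: because the stimulation is randomized, regressing a downstream neuron’s activity on the stimulation recovers the causal effect even in the presence of shared background drive:

```python
# Randomized stimulation as the perturbation; a simple regression of recorded
# activity on the stimulation estimates the causal effect despite an unobserved
# shared drive.
import numpy as np

rng = np.random.default_rng(6)
trials = 20_000
stim = rng.binomial(1, 0.5, size=trials).astype(float)   # randomized stimulation
background = rng.normal(size=trials)                      # shared, unobserved drive
true_effect = 0.8
downstream = true_effect * stim + background + rng.normal(scale=0.5, size=trials)

estimated_effect = np.cov(stim, downstream)[0, 1] / np.var(stim)
print("true causal effect     :", true_effect)
print("estimated causal effect:", round(estimated_effect, 3))
```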
Chethan Pandarinath:
Relating to one of your previous questions, about what kinds of technologies we’d like to see: what I love about neuroscience in general right now is that it’s moving away from the model where a given lab studies a given brain area in a given behavior. I especially love that we’re starting to see a lot of multi-area recordings. And I hope that having multi-area recordings will allow us to start to test whether there are precise timing effects between areas, where you see activity in one area and ask whether it drives activity in another area with low latency. And hopefully, if we can also go in and perturb, we can say causally, with perturbations, that activity in this area drives activity in that other area.
Chethan Pandarinath:
That’s a case where I think we might be able to make measurements where smaller or shorter timescales matter, when we’re talking about inter-area communication, for example.
Vikash Gilja:
Wearing more of my neural engineering hat, and thinking about designing practical devices: what I’d like to see, for the science side of that development, are devices that give me access to multiple areas, as Chethan said, because that allows us to explore a wider search space of potential models to get prostheses working, as well as multi-scale recording.
Vikash Gilja:
We have been quite limited in the sampling we’ve done of cortex, and here I’m being cortex-centric. But even within cortex, we typically target specific layers for these devices and focus pretty heavily, at least in this group, on intracortical recording. I think other strategies have not been tested well enough to know how well they’d work in an engineering context. I’d like devices that give me surface recordings as well as laminar recordings, across many areas and many nuclei, so that I could test alternatives, like electrocorticography-style approaches versus depth approaches.
Carsen Stringer:
Maybe we don’t need these multi-area recordings if we’re additionally doing non-neural recordings of what a person is doing, for instance eye position, and so gaze and arousal and these other things. It might be interesting to see what kinds of signals those drive and how correlated they are to the types of movements people are making. You might find those signals with multi-area recordings, but then only need a single-area recording to ultimately create a good BCI.
Vikash Gilja:
Yeah. And maybe in complement with the neural devices, based on a lot of what we’ve been talking about, I think we just want systems that would record every aspect of behavior along with every single neuron. That’s the dream, right? We want all of the data about the entire state of the organism up on a cloud server that we can run virtual experiments against. Can you make that happen, Matt?
Konrad Kording:
Add to that the direct recording of the thoughts and dreams that you’re having.
Matt Angle:
Okay. Thank you all very much for your time. This has been a real pleasure.