Extracting neuronal spiking activity from large-scale two-photon recordings remains challenging, especially in mammals in vivo, where large noise often contaminates the signals. We propose a method, MLspike, which returns the most likely spike train underlying the measured calcium fluorescence. It relies on a physiological model including baseline fluctuations and distinct nonlinearities for synthetic and genetically encoded indicators. Model parameters can be either provided by the user or estimated from the data themselves. MLspike is computationally efficient thanks to its original discretization of probability representations; moreover, it can also return spike probabilities or samples. Benchmarked on extensive simulations and real data from seven different preparations, it outperformed state-of-the-art algorithms. Combined with the finding obtained from systematic data investigation (noise level, spiking rate and so on) that photonic noise is not necessarily the main limiting factor, our method allows spike extraction from large-scale recordings, as demonstrated on acousto-optical three-dimensional recordings of over 1,000 neurons in vivo.
To understand how local networks process information, we need experimental access to the activity of large sets of individual neurons in vivo. Unlike multi-electrode probes 1, 2, 3, two-photon laser scanning microscopy 4, 5, 6, 7 allows unbiased sampling and unambiguous three-dimensional (3D) localization of up to thousands of neurons. Because of the recent introduction of acousto-optic (AO) random-access scanning 8, 9, 10, 11, it has also become technically possible to rapidly scan such large populations in two and three dimensions. However, the maximal number of neurons from which workable functional signals can so far be obtained (a few hundred at best) is at least an order of magnitude smaller than what the current state of the technology allows to scan, because the signal-to-noise ratio (SNR) of the recorded fluorescence drops with the number of recorded cells. Indeed, action potentials (spikes) need to be extracted from the recorded fluorescence changes of a synthetic or genetically encoded (GECI) calcium (Ca²⁺) indicator 12, 13. Single spikes lead to intracellular Ca²⁺ increases with a fast rise time but a slow decay time (time-to-peak ∼8–40 ms, slightly longer in the case of certain GECIs; decay constant ∼0.3–1.5 s (refs 10, 13, 14)), causing the transients induced by individual spikes to overlap, often adding up nonlinearly 15. Moreover, the signals are often contaminated by large noise, including baseline fluctuations similar to the actual responses. Therefore, accurately reconstructing spikes from noisy calcium signals is a critical challenge on the road to optically monitoring the firing activity of large neuronal populations.
However, no existing method tackles all three of the following critical challenges. First, finding the optimal spike train is algorithmically challenging. In popular spike estimation methods based on template matching 10, 16, 17, 18, 19, 20, 21, 22, the time needed to find the optimal spike train underlying a recorded fluorescence time series grows exponentially with its number of time points, just like the number of possible spike trains. To make computation costs affordable, approximations become necessary, thus curbing estimation accuracy. Second, the baseline fluorescence level often fluctuates. Third, model parameters (for example, the unitary Ca²⁺ fluorescence transient’s amplitude A and decay time τ) are inhomogeneous across neurons and cortical areas. As a consequence, often only spiking rates or probabilities are extracted from Ca²⁺ signals, rather than the individual spikes 24, 25, 26, 27, 28, 29, 30, 31. Despite the advantages of determining such ‘activity levels’ at low SNRs, lacking the actual spike trains hampers investigating temporal coding, causal network relations and the like. Our method ‘MLspike’ tackles the first two challenges by finding the most likely spike train underlying the recorded fluorescence using a maximum-likelihood approach.
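To make the quantities discussed above concrete, the following Python sketch simulates a fluorescence trace from a spike train. It is a deliberately simplified, linear illustration and not the MLspike model: the function names, the multiplicative random-walk baseline, the Gaussian noise and the default value of A are all assumptions made for this example, while the default τ merely lies within the 0.3–1.5 s range quoted in the text. The sketch also counts candidate spike trains under the additional assumption of at most one spike per time bin, which shows why an exhaustive search scales exponentially with the number of time points.

```python
import math
import random


def simulate_fluorescence(spikes, dt, A=0.1, tau=1.0,
                          noise_sd=0.0, drift_sd=0.0, seed=0):
    """Illustrative linear forward model (hypothetical, not MLspike itself).

    spikes   : list of spike counts per time bin
    dt       : bin duration in seconds
    A        : relative amplitude of one unitary transient (assumed value)
    tau      : decay time constant in seconds
    noise_sd : s.d. of additive Gaussian measurement noise
    drift_sd : s.d. of the per-bin random-walk step of the baseline
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)      # per-bin exponential decay factor
    calcium = 0.0                    # normalized calcium, one unit per spike
    baseline = 1.0                   # baseline fluorescence (arbitrary units)
    trace = []
    for n in spikes:
        calcium = calcium * decay + n             # decay, then add new spikes
        baseline += rng.gauss(0.0, drift_sd)      # slow baseline fluctuation
        f = baseline * (1.0 + A * calcium)        # transients sum linearly here
        trace.append(f + rng.gauss(0.0, noise_sd))
    return trace


def count_binary_spike_trains(num_bins):
    """Number of candidate spike trains with at most one spike per bin."""
    return 2 ** num_bins


# Two spikes 20 ms apart: the second transient rides on the first one.
spike_train = [0] * 100
spike_train[10] = 1
spike_train[12] = 1
clean_trace = simulate_fluorescence(spike_train, dt=0.01)
noisy_trace = simulate_fluorescence(spike_train, dt=0.01,
                                    noise_sd=0.05, drift_sd=0.002)
```

With these assumed settings the noise s.d. is half the unitary amplitude, so single transients in `noisy_trace` are hard to tell apart from baseline drift, which is the regime the introduction describes. Real indicators additionally show the nonlinear summation mentioned above, which this sketch omits.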