Mapping Spiking Neural Networks: A Summary of Papers and Some Reflections

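First, thanks to the CSDN platform: it turns out I am not the only one struggling with SNN mapping. After reading up on mapping last year, I felt that doing anything novel here is fairly hard. After all, the optimization is mostly the same family of bio-inspired evolutionary algorithms, yet your own implementation never seems to match the results reported in other people's papers. So after a few papers last year I did not want to pursue this direction, but, as you can guess, for one reason or another I still have to work on mapping. I could cry, because this is grunt work: many experimental settings, a lot of simulation data, many variations of the optimization algorithm, and it is hard to stand out when comparing against others. Worst of all, papers in this area usually do not land in strong venues.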

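Still, someone has to do the hard and tiring work, and since I usually come across as very obedient, I am naturally a good laborer. Deep down I was unwilling; otherwise I would have written this up last year. It is 2021 now and it has landed on me anyway. I won't say much more; the honest truth is that I was reluctant, and my answer was still "well, all right". Anyway, I am starting on this now. I hope fellow CSDN readers will kindly point out anything I get wrong so that I don't take too many detours. Your help is saving a great many of my brain cells and a great deal of my hair. Thanks♪(･ω･)ﾉ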

The CSDN posts on mapping SNNs that I have found so far are:

1. Mapping Spiking Neural Networks onto a Manycore Neuromorphic Architecture

Lin C K , Wild A , Chinya G N , et al. Mapping spiking neural networks onto a manycore neuromorphic architecture[C]// Acm Sigplan Conference. ACM, 2018:78-89.

2. Optimized Mapping Spiking Neural Networks onto Network-on-Chip

3. Paper reading notes on neural network mapping (神经网络映射的论文阅读)

4. A Cross-layer based mapping for spiking neural network onto network on chip

These all seem to be by a blogger I follow, “嘀嗒一声小刺猬”; thanks to him.

5. Mapping Spiking Neural Networks to Neuromorphic Hardware

A. Balaji et al., "Mapping Spiking Neural Networks to Neuromorphic Hardware," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, pp. 76-86, Jan. 2020, doi: 10.1109/TVLSI.2019.2951493.

Neuromorphic hardware implements biological neurons and synapses to execute a spiking neural network (SNN)-based machine learning. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition an SNN into clusters of synapses, where intracluster local synapses are mapped within crossbars of the hardware and intercluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion and improves application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a metaheuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on a state-of-the-art neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and spike latency by 21%, compared to the best-performing SNN mapping technique.

6. Run-time Mapping of Spiking Neural Networks to Neuromorphic Hardware

Balaji, A., Marty, T., Das, A. et al. Run-time Mapping of Spiking Neural Networks to Neuromorphic Hardware. J Sign Process Syst 92, 1293–1302 (2020). https://doi.org/10.1007/s11265-020-01573-8

Balaji A , Marty T , Das A , et al. Run-time Mapping of Spiking Neural Networks to Neuromorphic Hardware[J]. arXiv e-prints, 2020.

Neuromorphic architectures implement biological neurons and synapses to execute machine learning algorithms with spiking neurons and bio-inspired learning algorithms. These architectures are energy efficient and therefore, suitable for cognitive information processing on resource and power-constrained environments, ones where sensor and edge nodes of internet-of-things (IoT) operate. To map a spiking neural network (SNN) to a neuromorphic architecture, prior works have proposed design-time based solutions, where the SNN is first analyzed offline using representative data and then mapped to the hardware to optimize some objective functions such as minimizing spike communication or maximizing resource utilization. In many emerging applications, machine learning models may change based on the input using some online learning rules. In online learning, new connections may form or existing connections may disappear at run-time based on input excitation. Therefore, an already mapped SNN may need to be re-mapped to the neuromorphic hardware to ensure optimal performance. Unfortunately, due to the high computation time, design-time based approaches are not suitable for remapping a machine learning model at run-time after every learning epoch. In this paper, we propose a design methodology to partition and map the neurons and synapses of online learning SNN-based applications to neuromorphic architectures at run-time. Our design methodology operates in two steps – step 1 is a layer-wise greedy approach to partition SNNs into clusters of neurons and synapses incorporating the constraints of the neuromorphic architecture, and step 2 is a hill-climbing optimization algorithm that minimizes the total spikes communicated between clusters, improving energy consumption on the shared interconnect of the architecture. We conduct experiments to evaluate the feasibility of our algorithm using synthetic and realistic SNN-based applications. 
We demonstrate that our algorithm reduces SNN mapping time by an average 780x compared to a state-of-the-art design-time based SNN partitioning approach with only 6.25% lower solution quality.

Challenge: To map a spiking neural network (SNN) to a neuromorphic architecture, prior works have proposed design-time based solutions, where the SNN is first analyzed offline using representative data and then mapped to the hardware to optimize some objective functions such as minimizing spike communication or maximizing resource utilization. In many emerging applications, machine learning models may change based on the input using some online learning rules. In online learning, new connections may form or existing connections may disappear at run-time based on input excitation. Therefore, an already mapped SNN may need to be re-mapped to the neuromorphic hardware to ensure optimal performance. Unfortunately, due to the high computation time, design-time based approaches are not suitable for remapping a machine learning model at run-time after every learning epoch.

Approach: Our design methodology operates in two steps – step 1 is a layer-wise greedy approach to partition SNNs into clusters of neurons and synapses incorporating the constraints of the neuromorphic architecture, and step 2 is a hill-climbing optimization algorithm that minimizes the total spikes communicated between clusters, improving energy consumption on the shared interconnect of the architecture.

Result: We demonstrate that our algorithm reduces SNN mapping time by an average 780x compared to a state-of-the-art design-time based SNN partitioning approach with only 6.25% lower solution quality.

Why crossbars and a NoC: A neuromorphic architecture is typically designed using crossbars, which can accommodate only a limited number of synapses per neuron to reduce energy consumption. To build a large neuromorphic chip, multiple crossbars are integrated using a shared interconnect such as a network-on-chip (NoC).

The usual mapping approach: To map an SNN to these architectures, the common practice is to partition the neurons and synapses of the SNN into clusters and map these clusters to the crossbars, optimizing hardware performance such as minimizing the number of spikes communicated between crossbars, which reduces energy consumption.

Prior methods to partition and map an SNN to neuromorphic hardware, such as PSOPART [16], SpiNeMap [6], PyCARL [4], NEUTRAMS [25] and DFSynthesizer [42] are design-time approaches that require significant exploration time to generate a good solution. Although suitable for mapping supervised machine learning models, these approaches cannot be used at run-time to remap SNNs frequently.

DFSynthesizer:S. Song et al., “Compiling spiking neural networks to neuromorphic hardware,” in LCTES, 2020.

The authors' method: For online learning, we propose an approach to perform run-time layer-wise mapping of SNNs on to crossbar-based neuromorphic hardware. The approach is implemented in two steps. First, we perform a layer-wise greedy clustering of the neurons in the SNN. Second, we use an instance of hill-climbing optimization (HCO) to lower the total number of spikes communicated between the crossbars.

Drawback of offline training: However, data collected by IoT sensors constantly evolve over time and may not resemble the representative data used to train the neural network model. This change in the relation between the input data and an offline trained model is referred to as concept drift [23]. Eventually, the concept drift will reduce the prediction accuracy of the model over time, lowering its quality. Therefore, there is a clear need to periodically re-train the model using recent data with adaptive learning algorithms.

Mapping decisions for a supervised SNN are made at design-time before the initial deployment of the trained model. However, in the case of online learning, when the model is re-trained, (1) synaptic connections within the SNN may change, i.e. new connections may form and existing connection may be removed as new events are learned, and (2) weights of existing synaptic connections may undergo changes after every learning epoch. In order to ensure the optimal hardware performance at all times, a run-time approach is required that remaps the SNN to the hardware after every learning epoch.

Adaptive algorithms: periodically re-train the model using recent data with adaptive learning algorithms. Examples of such algorithms include transfer learning [38], lifelong learning [43] and deep reinforcement learning.

Contributions:

Following are our key contributions.

We propose an algorithm to partition and map online learning SNNs on to neuromorphic hardware for IoT applications in run-time;
We demonstrate suitability of our approach for online mapping in terms of the exploration time and total number of spikes communicated between the crossbars, when compared to a state-of-the-art design time approach.

Figure 1. Overview of SNN hardware: (a) a connection of pre- and post-synaptic neurons via synapses in a spiking neural network, (b) a crossbar organization with fully connected pre- and post-synaptic neurons, and (c) modern neuromorphic hardware with multiple crossbars and a time-multiplexed interconnect.

A crossbar is a two-dimensional arrangement of synapses (n² synapses for n neurons). Figure 1b illustrates a single crossbar with n pre-synaptic neurons and n post-synaptic neurons. The pre- and post-synaptic neurons are connected via synaptic elements. Crossbar size (n) is limited (< 512) as scaling the size of the crossbar will lead to an exponential increase in dynamic and leakage energy. Therefore, to build large neuromorphic hardware, multiple crossbars are integrated using a shared interconnect, as illustrated in Figure 1c.
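To get a feel for these numbers, here is a small back-of-the-envelope sketch. It assumes each crossbar hosts at most n neurons and ignores per-neuron fan-in limits; the function names are my own and do not come from the paper.

```python
import math


def synapses_per_crossbar(crossbar_size: int) -> int:
    """An n x n crossbar offers n * n synaptic cross-points."""
    return crossbar_size ** 2


def crossbars_needed(num_neurons: int, crossbar_size: int) -> int:
    """Lower bound on the crossbar count when each holds at most `crossbar_size` neurons."""
    return math.ceil(num_neurons / crossbar_size)


cross_points = synapses_per_crossbar(256)
needed = crossbars_needed(1000, 256)
print(cross_points)  # 65536
print(needed)        # 4
```

This is only a lower bound: a real partition may need more crossbars once synapse limits and the spike-communication objective are taken into account.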

As these partitioning approaches aim to find the optimal hardware performance, their exploration time is relatively large and therefore not suitable for partitioning and re-mapping of online learning SNNs.

[44] Wen, W., Wu, C. R., Hu, X., Liu, B., Ho, T. Y., Li, X., & Chen, Y. (2015). An EDA framework for large scale hybrid neuromorphic computing systems. In 2015 52nd ACM/EDAC/IEEE design automation conference (DAC) (pp. 1–6): IEEE.

[45] Wijesinghe, P., Ankit, A., Sengupta, A., & Roy, K. (2018). An all-memristor deep spiking neural computing system: a step toward realizing the low-power stochastic brain. IEEE Transactions on Emerging Topics in Computational Intelligence, 2(5), 345–358.

[46] Xia, Q., & Yang, J. J. (2019). Memristive crossbar arrays for brain-inspired computing. Nature Materials, 18(4), 309.

Papers [44], [45], and [46] focus on improving crossbar performance.

In [44,45,46], the authors propose techniques to efficiently map the neurons and synapses on a crossbar. The aim of these techniques is to maximize the utilization of the crossbar. NEUTRAMS partitions the SNN for crossbar-based neuromorphic hardware [26]. The NEUTRAMS approach also looks to minimize the energy consumption of the neuromorphic hardware executing the SNN. PyCARL [4] facilitates the hardware-software co-simulation of SNN-based applications. The framework allows users to analyze and optimize the partitioning and mapping of an SNN on cycle-accurate models of neuromorphic hardware. DFSynthesizer [42] uses a greedy technique to partition the neurons and synapses of an SNN. The SNN partitions are mapped to the neuromorphic hardware using an algorithm that adapts to the available resources of the hardware. SpiNeMap [6] uses a greedy partitioning technique to partition the SNN followed by a meta-heuristic-based technique to map the partitions on the hardware. PSOPART maps SNNs to a crossbar architecture [17]. The objective of SpiNeMap and PSOPART is to minimize the spike communication on the time-multiplexed interconnect in order to improve the overall latency and power consumption of the DYNAP-SE hardware.

Other mapping tools:

PSOPART: 2018

[16] Das, A., Wu, Y., Huynh, K., Dell’Anna, F., Catthoor, F., & Schaafsma, S. (2018). Mapping of local and global synapses on spiking neuromorphic hardware. In Design, automation & test in europe conference & exhibition (DATE) (pp. 1217–1222). https://doi.org/10.23919/DATE.2018.8342201.

NEUTRAMS: 2016

[25] Ji, Y., Zhang, Y., Li, S., Chi, P., Jiang, C., Qu, P., Xie, Y., & Chen, W. (2016). NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints. In International symposium on microarchitecture (MICRO): IEEE.

SpiNeMap: also the authors' own work, 2019

[6] Balaji, A., Das, A., Wu, Y., Huynh, K., Dell’Anna, F., Indiveri, G., Krichmar, J. L., Dutt, N., Schaafsma, S., & Catthoor, F. (2019). Mapping Spiking Neural Networks on Neuromorphic Hardware. IEEE transactions on VLSI systems.

DFSynthesizer: the entry the authors list as [42] is actually wrong; it should be the following (2020):

S. Song et al., “Compiling spiking neural networks to neuromorphic hardware,” in LCTES, 2020.

PyCARL, which the authors propose using: also the authors' own work, 2020

PyCARL [4] facilitates the hardware-software co-simulation of SNN-based applications. The framework allows users to analyze and optimize the partitioning and mapping of an SNN on cycle-accurate models of neuromorphic hardware.

Balaji, A., Adiraju, P., Kashyap, H. J., Das, A., Krichmar, J. L., Dutt, N. D., & Catthoor, F. (2020). Pycarl: a pynn interface for hardware-software co-simulation of spiking neural network. In 2020 International joint conference on neural networks (IJCNN).

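What run-time mapping methods exist? This group self-cites heavily; it is fairly apparent that almost all of these citations are one group citing itself.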
Run-time approaches are proposed for task mapping on multiprocessor systems. A heuristic-based run-time manager is proposed in [12]. The run-time manager controls the thread allocation and voltage/frequency scaling for energy efficient execution of applications on multi processor systems. In [30], the authors propose a genetic algorithm-based run-time manager to schedule real-time tasks on Dynamic Voltage Scaling (DVS) enabled processors, with an aim to minimize energy consumption. A workload aware thread scheduler is proposed in [20] for multi-processor systems. In [14], the authors propose a multinomial logistic regression model to partition the input workload in run-time. Each partition is then executed at pre-determined frequencies to ensure minimum energy consumption. In [13], the authors propose a technique to remap tasks run on faulty processors with a minimal migration overhead. A thermal-aware task scheduling approach is proposed in [11] to estimate and reduce the temperature of the multi processor system at run-time. The technique performs an extensive design-time analysis of fault scenarios and determines the optimal mapping of tasks in run-time. However, such run-time techniques to remap SNN on neuromorphic hardware are not proposed. To the best of our knowledge, this is the first work to propose a run-time mapping approach with a significantly lower execution time when compared to existing design-time approaches. Our technique reduces the spikes communicated on the time-multiplexed interconnect, therefore reducing the energy consumption.

[11] Cui, J., & Maskell, D. L. (2012). A fast high-level event-driven thermal estimator for dynamic thermal aware scheduling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(6), 904–917.

[12] Das, A., Al-Hashimi, B. M., & Merrett, G. V. (2016). Adaptive and hierarchical runtime manager for energy-aware thermal management of embedded systems. ACM Trans. Embed. Comput. Syst 15(2). https://doi.org/10.1145/2834120.

[13] Das, A., & Kumar, A. (2012). Fault-aware task re-mapping for throughput constrained multimedia applications on noc-based mpsocs. In International symposium on rapid system prototyping (RSP): IEEE.

[14] Das, A., Kumar, A., Veeravalli, B., Shafik, R., Merrett, G., & Al-Hashimi, B. (2015). Workload uncertainty characterization and adaptive frequency scaling for energy minimization of embedded systems. In Design, automation & test in europe conference & exhibition (DATE).

[20] Dhiman, G., Ayoub, R., & Rosing, T. (2009). PDRAM: a hybrid PRAM and DRAM main memory system. In Proceedings of the Annual Design Automation Conference (DAC) (pp. 469–664).

[30] Mahmood, A., Khan, S. A., Albalooshi, F., & Awwad, N. (2017). Energy-aware real-time task scheduling in multiprocessor systems using a hybrid genetic algorithm. Electronics, 6(2), 40.

The algorithm in detail:

The network model is built using a directed graph, wherein each edge represents a synapse whose weight is the total number of spikes communicated between the two SNN neurons. The input to the mapping algorithm is a list of all the neurons (A), the total number of spikes communicated over each synapse and the size of a crossbar (k). The mapping algorithm is split into two steps, as shown in Figure 3.

Figure 3. Mapping of online learning SNN on neuromorphic hardware.

Directed graph: each edge represents a synapse, and its weight is the total number of spikes communicated between the two SNN neurons.
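As a concrete picture of this network model, the sketch below builds such a spike-weighted directed graph from a list of spike events. The event format, one (pre, post) pair per spike, and the function name are my own assumptions for illustration; in practice the counts would come from the simulator's spike records.

```python
from collections import defaultdict


def build_spike_graph(spike_events):
    """Return {pre: {post: spike_count}} for the synapses that carried spikes.

    spike_events: iterable of (pre, post) pairs, one entry per spike that
    travelled over the synapse pre -> post (hypothetical input format).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for pre, post in spike_events:
        graph[pre][post] += 1
    # Convert to plain dicts so that missing keys are not created on lookup.
    return {pre: dict(posts) for pre, posts in graph.items()}


events = [(0, 2), (0, 2), (1, 2), (2, 3), (2, 3), (2, 3)]
graph = build_spike_graph(events)
print(graph)  # {0: {2: 2}, 1: {2: 1}, 2: {3: 3}}
```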

Figure 4 illustrates the partitioning of an SNN with 6 neurons into 3 sub-lists. The spikes communicated between the neurons is indicated on the synapse. First, we divide the input list of neurons into sub-lists (Section 3.1), such that each sub-list can be mapped to an available crossbar. Second, we reduce the number of spikes communicated between the sub-lists (Section 3.2), by moving the neurons between the sub-list (indicated in blue).

Building Sub-lists

Algorithm 1 describes the greedy partitioning approach. The objective is to greedily cut the input list of neurons (A) into s sub-lists, where s is the total number of crossbars in the given design. The size of a sub-list is determined by the size of the crossbars (k) on the target hardware. A variable margin (line 3) is defined to store the unused neuron slots available in each sub-list. The mean (line 4) number of spikes generated per crossbar is computed using the total number of spikes communicated in the SNN-based application. A cost function (Algorithm 2) is defined to compute the total number of spikes communicated (cost) between each of the sub-lists.

The algorithm iterates over the neurons (ni) in the input list (A) and updates the slots in the current sub-list (line 8). Neurons are added to the current sub-list until one of the following two criteria is met: (1) the length of the sub-list equals k, or (2) the cost (number of spikes) is greater than the mean value and sufficient extra slots (margin) are still available. When either criterion is met, the current sub-list is validated and its boundary stored. When the penultimate sub-list is validated, the execution ends because the boundary of the last sub-list is already known (nth element in the list). The list p contains the sub-list boundaries.
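Algorithm 1 itself is not reproduced in these notes, so the following is only my reading of the description above, not the authors' code. I assume that the "cost" of the sub-list being built is the number of spikes emitted by its neurons, and that cutting a sub-list early consumes its unused slots from the margin; both are my interpretation.

```python
def greedy_boundaries(neurons, out_spikes, k, s):
    """Cut `neurons` (kept in layer order) into at most s contiguous sub-lists of size <= k.

    out_spikes[n]: spikes emitted by neuron n (assumed load metric).
    Returns boundaries p such that sub-list i is neurons[p[i]:p[i + 1]].
    """
    n = len(neurons)
    if n > s * k:
        raise ValueError("the SNN does not fit on the available crossbars")
    margin = s * k - n                       # unused neuron slots we can afford to waste
    mean = sum(out_spikes[x] for x in neurons) / s
    boundaries = [0]
    length, cost = 0, 0
    for i, neuron in enumerate(neurons):
        length += 1
        cost += out_spikes[neuron]
        if len(boundaries) == s:             # last sub-list simply takes what is left
            continue
        wasted = k - length                  # slots lost if we cut here
        if length == k or (cost > mean and wasted <= margin):
            margin -= wasted
            boundaries.append(i + 1)
            length, cost = 0, 0
    if boundaries[-1] != n:
        boundaries.append(n)
    return boundaries


spikes = {0: 5, 1: 1, 2: 1, 3: 4, 4: 2, 5: 2}
p = greedy_boundaries(list(range(6)), spikes, k=3, s=3)
print(p)  # [0, 2, 5, 6] -> sub-lists [0, 1], [2, 3, 4], [5]
```

Charging early cuts against the margin guarantees that the neurons left for the final sub-list never exceed k.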

Local Search

The solution obtained from Algorithm-1 is naive and not optimal. Although each sub-list s obtained from Algorithm-1 meets the cost criteria, it is possible to have unevenly distributed costs across the sub-lists. We search for a better solution by performing multiple local searches to balance the cost. This is done by using the hill-climbing optimization technique to iterate through the sub-lists and move their boundaries.

Algorithm 3 describes the hill-climbing optimization technique. The technique relies on a cost function (line 2) to compute and evaluate a solution. The cost function used in the optimization process is shown in Algorithm 2. The cost function computes the maximum cost (number of spikes) for a chosen sub-list. The optimal solution should contain the lowest cost. The algorithm iterates through each sub-list to search for the best solution (cost) of its neighbors. The algorithm begins by moving the boundary of a sub-list one position to the left or one position to the right. Each neuron (ni) in the sub-list is moved across the boundary to a neighboring sub-list and the cost of the neighbors is computed. The algorithm selects the solution with the local minimum cost. The process is repeated for every neuron in the list (A) until the sub-lists with the minimum cost are found.
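The boundary-moving local search can be sketched as follows. This is a simplified illustration rather than the paper's Algorithm 3: since Algorithm 2 is not reproduced here, I assume the cost is the total number of spikes on synapses that cross a sub-list boundary, and that neurons are numbered by their position in the list. All names are hypothetical.

```python
def cut_cost(boundaries, graph):
    """Total spikes on synapses whose two neurons sit in different sub-lists."""
    def owner(idx):
        for c in range(len(boundaries) - 1):
            if boundaries[c] <= idx < boundaries[c + 1]:
                return c
        raise ValueError(f"neuron {idx} lies outside every sub-list")

    return sum(w for pre, posts in graph.items()
               for post, w in posts.items() if owner(pre) != owner(post))


def hill_climb(boundaries, graph, k):
    """Shift interior boundaries by one position while the cost keeps decreasing."""
    best = list(boundaries)
    best_cost = cut_cost(best, graph)
    improved = True
    while improved:
        improved = False
        for b in range(1, len(best) - 1):        # interior boundaries only
            for step in (-1, 1):                 # one position left or right
                cand = list(best)
                cand[b] += step
                sizes = [cand[i + 1] - cand[i] for i in range(len(cand) - 1)]
                if min(sizes) < 1 or max(sizes) > k:
                    continue                     # empty or over-full crossbar
                cost = cut_cost(cand, graph)
                if cost < best_cost:
                    best, best_cost, improved = cand, cost, True
    return best, best_cost


graph = {0: {1: 10, 2: 1}, 1: {2: 8}, 2: {3: 1}, 3: {4: 9}, 4: {5: 7}}
best, best_cost = hill_climb([0, 2, 5, 6], graph, k=3)
print(best, best_cost)  # [0, 3, 5, 6] 8
```

In this toy example the starting partition costs 16 spikes; moving neuron 2 into the first sub-list halves that. Like any hill climb, the search stops at a local minimum, which is consistent with the quality gap versus PSOPART reported in the results.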

Evaluation

Simulation Environment

We conduct all experiments on a system with 8 CPUs, 32GB RAM, and NVIDIA Tesla GPU, running Ubuntu 16.04.

CARLsim [10] : A GPU accelerated simulator used to train and test SNN-based applications. CARLsim reports spike times for every synapse in the SNN.
DYNAP-SE [36]: Our approach is evaluated using the DYNAP-SE model, with 256-neuron crossbars interconnected using a NoC [47].

Evaluated Applications

In order to evaluate the online mapping algorithm, we use 2 synthetic and 2 realistic SNN-based applications. Synthetic applications are indicated with an ‘S_’ followed by the number of neurons in the application. Edge detection (EdgeDet) and MLP-based digit recognition (MLP-MNIST) are the two realistic applications used. Table 2 also indicates the number of synapses (column 3), the topology (column 4) and the number of spikes for the application obtained through simulations using CARLsim [10].

Evaluated Design-time vs run-time Approach

In order to compare the performance of our proposed run-time approach, we choose a state-of-the-art design-time approach as the baseline. The crossbar size for both the algorithms is set to 256 (k = 256). In this paper we compare the following approaches:

PSOPART [16]: The PSOPART approach is a design-time partitioning technique that uses an instance of particle swarm optimization (PSO) to minimize the number of spikes communicated on the time-multiplexed interconnect.
HCO-Partitioning: Our HCO-partitioning approach is a two-step layer-wise partitioning technique with a greedy partitioning followed by a HCO-based local search approach to reduce the number of spikes communicated between the crossbars.

Results

Table 3 reports the execution time (in seconds) of the design-time and run-time mapping algorithms for synthetic and realistic applications, respectively. We make the following two observations. First, on average, our HCO partitioning algorithm has an execution time 780x lower than that of the PSOPART algorithm. Second, the significantly lower run-time of the HCO partitioning algorithm (< 50 seconds) allows for the online learning SNN to be re-mapped on the edge devices, before the start of the next training epoch.

Figure 5 shows the lifetime of an online learning application with respect to the execution times of each training epoch (t) and the HCO partitioning algorithm (h). The execution time of the partitioning algorithm needs to be significantly lower than the time interval between training epochs. This is achieved with the HCO-partitioning algorithm as its execution time is significantly (780x) lower than the state-of-the-art design-time approaches.

In Figure 6, we compare the number of spikes communicated between the crossbars while partitioning the SNN using the HCO partitioning algorithm when compared to the design-time PSOPART approach. We see that, on average, the PSOPART algorithm reduces the number of spikes by a further 6.25%, when compared to the HCO partitioning algorithm. The PSOPART will contribute to a further reduction in the overall energy consumed on the neuromorphic hardware. However, this outcome is expected as the design-time partitioning approach is afforded far more exploration time to minimize the number of spikes communicated between the crossbars. Also, the effects of concept drift will soon lead to the design-time solution becoming outmoded. Therefore, a run-time partitioning and re-mapping of the SNN will significantly improve the performance of the SNN on the neuromorphic hardware and mitigate the effects of concept drift.

PSOPART communicates fewer spikes.

In this paper, we propose an algorithm to re-map online learning SNNs on neuromorphic hardware. Our approach performs the run-time mapping in two steps: (1) a layer-wise greedy partitioning of SNN neurons, and (2) a hill-climbing based optimization of the greedy partitions with an aim to reduce the number of spikes communicated between the crossbars. We demonstrate the infeasibility of using a state-of-the-art design-time approach to re-map online learning SNNs in run-time. We evaluate our approach using synthetic and realistic SNN applications. Our algorithm reduces SNN mapping time by an average 780x when compared to a state-of-the-art design-time approach with only 6.25% lower performance.

Discussion

In this section we discuss the scalability of our approach. Each iteration of Algorithm-1 performs basic math operations. The hill-climbing algorithm computes as many as 2x(s-2) solutions, and performs a comparison to find the minimum cost across all the solutions. In our case, the co-domain of the cost function are well-ordered positive integers. The cost function is also linear in n, however the hill-climb optimization algorithm only terminates when the local minimum cost function is computed. Therefore, it is in our interest to optimize the number of times the cost function is to be run.

The conclusion: the solution is about 6.25% worse in the number of spikes communicated, but the mapping is 780x faster.

Adarsha Balaji received a Bachelor’s degree from Visvesvaraya Technological University, India, in 2012 and a Master’s degree from Drexel University, Philadelphia, PA, in 2017. He is currently pursuing a Ph.D. degree from the Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA. His current research interests include design of neuromorphic computing systems, particularly data-flow and power optimization of spiking neural networks (SNN) hardware.

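Visvesvaraya Technological University (VTU) is one of the top 25 universities in India. Since its founding, it has been a first-rate institution committed to leading frontier research in science and technology and to training senior leaders in technological innovation; it ranks 15th among Indian universities. Drexel University (DU), founded in 1891, is a first-rate four-year comprehensive private university located in Philadelphia, the largest city in Pennsylvania on the US East Coast, with a branch campus in Sacramento, the capital of California. It is known as one of Philadelphia's "big three" universities (the other two being the University of Pennsylvania and Temple University). Philadelphia is the fifth-largest city in the United States and the fourth-largest metropolitan area after New York, Los Angeles, and Chicago. Drexel ranked 59th in the 2020 QS ranking of US universities.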

7. Compiling Spiking Neural Networks to Neuromorphic Hardware

S. Song et al., “Compiling spiking neural networks to neuromorphic hardware,” in LCTES, 2020.

Shihao Song, Adarsha Balaji, Anup Das, Nagarajan Kandasamy, and James Shackleford. 2020. Compiling Spiking Neural Networks to Neuromorphic Hardware. In The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES '20). Association for Computing Machinery, New York, NY, USA, 38–50. DOI:https://doi.org/10.1145/3372799.3394364

Machine learning applications that are implemented with spike-based computation model, e.g., Spiking Neural Network (SNN), have a great potential to lower the energy consumption when they are executed on a neuromorphic hardware. However, compiling and mapping an SNN to the hardware is challenging, especially when compute and storage resources of the hardware (viz. crossbar) need to be shared among the neurons and synapses of the SNN. We propose an approach to analyze and compile SNNs on a resource-constrained neuromorphic hardware, providing guarantee on key performance metrics such as execution time and throughput. Our approach makes the following three key contributions. First, we propose a greedy technique to partition an SNN into clusters of neurons and synapses such that each cluster can fit on to the resources of a crossbar. Second, we exploit the rich semantics and expressiveness of Synchronous Dataflow Graphs (SDFGs) to represent a clustered SNN and analyze its performance using Max-Plus Algebra, considering the available compute and storage capacities, buffer sizes, and communication bandwidth. Third, we propose a self-timed execution-based fast technique to compile and admit SNN-based applications to a neuromorphic hardware at run-time, adapting dynamically to the available resources on the hardware. We evaluate our approach with standard SNN-based applications and demonstrate a significant performance improvement compared to current practices.
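The abstract only names Max-Plus Algebra, so here is a small reminder of what the basic operation looks like; it is not the paper's actual analysis. In max-plus algebra, "addition" is max and "multiplication" is +, so a matrix-vector product gives the earliest next firing times of actors from their previous firing times. The two-actor delay matrix below is made up purely for illustration.

```python
NEG_INF = float("-inf")  # max-plus "zero": no dependency between two actors


def maxplus_matvec(A, x):
    """(A (x) x)[i] = max over j of (A[i][j] + x[j])."""
    return [max(a + xj for a, xj in zip(row, x)) for row in A]


# A[i][j]: delay from the previous firing of actor j to the next firing of actor i
# (hypothetical values).
A = [[2, NEG_INF],
     [2, 3]]

x = [0, 0]
history = []
for _ in range(3):
    x = maxplus_matvec(A, x)
    history.append(x)
print(history)  # [[2, 3], [4, 6], [6, 9]]
```

In this toy system the second actor ends up firing every 3 time units, so the slowest cycle bounds the steady-state throughput. As I understand it, this is the kind of quantity a dataflow-based analysis aims to bound.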

8. SNEAP: A Fast and Efficient Toolchain for Mapping Large-Scale Spiking Neural Network onto NoC-based Neuromorphic Platform

Li S , Guo S , Zhang L , et al. SNEAP: A Fast and Efficient Toolchain for Mapping Large-Scale Spiking Neural Network onto NoC-based Neuromorphic Platform[J]. 2020.

Spiking neural network (SNN), as the third generation of artificial neural networks, has been widely adopted in vision and audio tasks. Nowadays, many neuromorphic platforms support SNN simulation and adopt Network-on-Chips (NoC) architecture for multi-cores interconnection. However, interconnection brings huge area overhead to the platform. Moreover, run-time communication on the interconnection has a significant effect on the total power consumption and performance of the platform. In this paper, we propose a toolchain called SNEAP for mapping SNNs to neuromorphic platforms with multi-cores, which aims to reduce the energy and latency brought by spike communication on the interconnection. SNEAP includes two key steps: partitioning the SNN to reduce the spikes communicated between partitions, and mapping the partitions of SNN to the NoC to reduce average hop of spikes under the constraint of hardware resources. SNEAP can reduce more spikes communicated on the interconnection of NoC and spend less time than other toolchains in the partitioning phase. Moreover, the average hop of spikes is reduced more by SNEAP within a time period, which effectively reduces the energy and latency on the NoC-based neuromorphic platform. The experimental results show that SNEAP can achieve 418x reduction in end-to-end execution time, and reduce energy consumption and spike latency, on average, by 23% and 51% respectively, compared with SpiNeMap.
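The "average hop of spikes" metric in this abstract can be illustrated with a short sketch. I assume a 2D-mesh NoC with dimension-order (XY) routing, so that the hop count between two cores is their Manhattan distance; the abstract does not state the routing scheme, and the data layout and names below are my own.

```python
def average_hops(traffic, placement):
    """Spike-weighted mean hop count on a 2D mesh.

    traffic[(src, dst)]: spikes sent from partition src to partition dst.
    placement[p]: (x, y) mesh coordinate of the core hosting partition p.
    """
    total_spikes = sum(traffic.values())
    if total_spikes == 0:
        return 0.0
    weighted = 0
    for (src, dst), spikes in traffic.items():
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        weighted += spikes * (abs(x1 - x2) + abs(y1 - y2))
    return weighted / total_spikes


traffic = {(0, 1): 90, (0, 2): 10}
far = {0: (0, 0), 1: (1, 1), 2: (0, 1)}    # heavy pair placed two hops apart
near = {0: (0, 0), 1: (0, 1), 2: (1, 1)}   # heavy pair placed on adjacent cores
hops_far = average_hops(traffic, far)
hops_near = average_hops(traffic, near)
print(hops_far, hops_near)
```

Placing the partitions that exchange the most spikes on adjacent cores lowers the average (1.1 versus 1.9 hops here), which is the intuition behind the placement step.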

9. Liquid State Machine Applications Mapping for NoC-Based Neuromorphic Platforms

Li S , Wang L , Wang S , et al. Liquid State Machine Applications Mapping for NoC-Based Neuromorphic Platforms[M]// Advanced Computer Architecture, 13th Conference, ACA 2020, Kunming, China, August 13–15, 2020, Proceedings. 2020.

Liquid State Machine (LSM) is one of spiking neural network (SNN) containing recurrent connections in the reservoir. Nowadays, LSM is widely deployed on a variety of neuromorphic platforms to deal with vision and audio tasks. These platforms adopt Network-on-Chips (NoC) architecture for multi-cores interconnection. However, a large communication volume stemming from the reservoir of LSM has a significant effect on the performance of the platform. In this paper, we propose an LSM mapping method by using the toolchain - SNEAP for mapping LSM to neuromorphic platforms with multi-cores, which aims to reduce the energy and latency brought by spike communication on the interconnection. The method includes two key steps: partitioning the LSM to reduce the spikes communicated between partitions, and mapping the partitions of LSM to the NoC to reduce average hop of spikes under the constraint of hardware resources. This method is also effective for large-scale of LSM. The experimental results show that our method can achieve 1.5× reduction in end-to-end execution time, and reduce average energy consumption by 57% on 8 × 8 2D-mesh NoC and average spike latency by 23% on 4 × 4 2D-mesh NoC, compared to SpiNeMap.

10. End-to-End Implementation of Various Hybrid Neural Networks on a Cross-Paradigm Neuromorphic Chip

Wang G , Ma S , Wu Y , et al. End-to-End Implementation of Various Hybrid Neural Networks on a Cross-Paradigm Neuromorphic Chip[J]. Frontiers in Neuroscience, 2021, 15:615279.

Affiliation:

Department of Precision Instrument, Center for Brain-Inspired Computing Research (CBICR), Beijing Innovation Center for Future Chip, Optical Memory National Engineering Research Center, Tsinghua University, Beijing, China

Integration of computer-science oriented artificial neural networks (ANNs) and neuroscience oriented spiking neural networks (SNNs) has emerged as a highly promising direction to achieve further breakthroughs in artificial intelligence through complementary advantages. This integration needs to support individual modeling of ANNs and SNNs as well as their hybrid modeling, which not only simultaneously calculates single-paradigm networks but also converts their different information representations. It remains challenging to realize effective calculation and signal conversion on the existing dedicated hardware platforms. To solve this problem, we propose an end-to-end mapping framework for implementing various hybrid neural networks on many-core neuromorphic architectures based on the cross-paradigm Tianjic chip. We construct hardware configuration schemes for four typical signal conversions and establish a global timing adjustment mechanism among different heterogeneous modules. Experimental results show that our framework can implement these hybrid models with low execution latency and low power consumption with nearly no accuracy degradation. This work provides a new approach of developing hybrid neural network models for brain-inspired computing chips and further tapping the potential of these models.

Figure 1. Illustration of the Tianjic chip architecture: (A) fine-grained configurable operation modules; (B) unified communication format; (C) adjustable timing schedule.

Fine-Grained Configurable Operation Modules

Functional core (FCore) is the basic unit of the Tianjic chip, which consists of four modules, including an axon for input organization, a dendrite (with synapses) for integration operations, a soma for non-linear neuronal transformation, and a router for activation transmission (Figure 1A). Each module can be configured to work in different modes or perform different operations, which enables the chip to support both ANN and SNN models. Among these modules, the dendrite and the soma are the main computing engines. Equipped with the synapse memory, the dendrite constitutes a 256 × 256 virtual crossbar, which can realize various vector and matrix operations. Table 1 lists the vector and matrix operations used in this paper, including vector-matrix multiplication (VMM), vector-vector accumulation (VVA) and vector buffering (VB).
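To make the three operation names concrete, here are plain-Python stand-ins for VMM, VVA, and VB on small lists. They only show the arithmetic; they are not the Tianjic toolchain's API, and the function names are my own.

```python
def vmm(vector, matrix):
    """Vector-matrix multiplication: out[j] = sum over i of vector[i] * matrix[i][j]."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


def vva(a, b):
    """Vector-vector accumulation: element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def vb(vector):
    """Vector buffering: pass the vector through unchanged (as a copy)."""
    return list(vector)


inputs = [1, 0, 1]                     # e.g. binary spikes arriving on three axons
weights = [[1, 2], [3, 4], [5, 6]]     # 3 inputs x 2 neurons
integrated = vmm(inputs, weights)
accumulated = vva(integrated, [1, 1])
print(integrated, accumulated, vb(accumulated))  # [6, 8] [7, 9] [7, 9]
```

With binary spike inputs, the VMM reduces to summing the weight rows of the axons that fired, which is why one crossbar-style engine can serve both ANN and SNN modes.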

Table 1. Integration and transformation operations in Tianjic.