C/C++ 高性能计算之多线程简单基础入门教程

比起别人的盲目罗列函数接口,鹦鹉学舌式的解释每一个输入参数和输出参数,一味求全而无重点,我的文章更侧重于入门知识的讲解,宁缺毋滥,只有一些最简单的入门用法,先亮代码,让你以最快地速度感受到这个东西原来是这个样子。更深入的知识,可以查看文档以及其他参考。
不管什么东西,都是尽可能地用新不用旧,所以,文章涉及的库和工具,都是基于 C++11 标准的。

前面有相关的博文已经介绍过 MPI,OpenMP 等相关的并行技术,以及 hadoop 生态圈相关的一些知识。今天给大家介绍一下 C/C++(下面统称为 C)中的利用多线程并行或者说并发进行计算的一个简单基础入门教程。

我这篇教程适合的人群为那些没接触过 C 的多线程编程,想要入门的同学,以及那些有程序基础,单纯的想要给自己的程序譬如循环部分,加加速,想要拷贝一份模板进行修改的人。

让我们开始吧。

从一个极简的例子开始

假设你现在写了一段 C 代码,大概长成下面这样。

using namespace std;
void fun(para_ref,para)
{for (int j = 0; j < task_size; j++) {dosomething}
}int main(int argc, char *argv[])
{fun(para_ref,para);return 0;}

你哇的一下突然发现,你调用的这个函数里面有个 for 循环,这个 for 循环的每一次 j,似乎有没什么关系,各算各的,你也知道自己的电脑有四个核心,总不能浪费资源吧,你开始谋划并行的策略。你老板说不能用 OpenMp 的编译选项,MPI 的编程门槛有很高,然后你想到了似乎在哪里听到过的多线程。

一个最容易想到的就是把 for 循环给均分拆成几个部分,每个线程执行 for 中的一个部分。怎么做呢?非常简单,对于这种无需异步的需求,你像下面这样写就 OK 了。

#include <thread>//使用 c++11 标准的多线程库 thread
#define THREAD_NUM 4//定义多线程的个数
using namespace std;void fun(para_ref, para, int index_begin, int task_num)
//原来的函数,被拆成了一片片,需要添加循环起始索引,和个数
{for (int j = index_begin; j < index_begin + task_num; j++){dosomething;}
}int main(int argc, char *argv[])
{thread threads[THREAD_NUM];//定义4个线程int task_num = (int)(task_size / THREAD_NUM);//计算每个线程负责的任务数for (int i = 0; i < THREAD_NUM - 1; i++){threads[i] = thread(fun, ref(para_ref), para, i*task_num, task_num);//每个线程开始做各自的任务}//因为 task_size 可能不能被平均分成 4 分,最后一份要特别处理threads[THREAD_NUM - 1] = thread(func, ref(para_ref), para, (THREAD_NUM - 1)*task_num, int(task_num + (task_size% THREAD_NUM)));for (auto& it : threads){it.join();//4个线程同步}return 0;}

代码中有较为详尽的解释,仔细看一遍,就知道 C++ 多线程是怎么用了。你可以把它当成也给模板,拷贝到自己的代码中,随便改改就能用了。简单解释一下

threads[i] = thread(fun, ref(para_ref), para, i*task_num, task_num);

这里的 thread 表示开始多线程的调用,其中 fun 是你每个线程要调用的回调函数,回调函数要求必须是静态函数,或者说是全局函数,如果你写在 C++ 的某个 Class 里面,可能需要处理的问题就比较多了,因为静态成员函数只能类的静态成员,牵一发而动全身。

fun 后面紧跟的是 fun 的输入参数,这个没什么好说的,需要注意的是,如果 fun 的输入参数有引用的类型的(C++ 中用 & 符号标识),那么多线程调用就要用 ref 关键字说清楚。如上所示。

任务池的方式实现并行的负载均衡(无需等待的 mutex 互斥锁设计)

上面提到的方法是利用循环变量的均匀划分来实现任务的划分,有个问题在于,这样划分真的是负载均衡的吗?有没有可能三个线程秒算完了,一个线程在还在一直算?这样是不是没有拉满 CPU,并不能达到最大性能?

举例来说,循环前 1/4 的占了 99% 的时间,那么这种多线程的任务分配方式就不尽合理。有人提出除 4 取余,余数相同的用同一个线程来做的方法,能做到负载均衡吗?也是看情况而定,假如刚好某个循环变量点的计算占了大部分时间,那么,4 个线程几乎同时算完的概率就很低。

那要怎么搞?一个基本的想法就是维护一个任务池。就像工厂里面的多人同时作业一样。把所有的任务丢到一个筐里面,每个工人都去筐那里认领任务,做完了手头闲了,接着去领任务,这样能够保证没有空闲劳动力,是最好的剥削工人方式。

这个思路听着好像很有道理,但是其实面临着一个非常严峻的的问题——任务争抢和读写脏数据。什么意思呢?我还是用工人的例子来做个假设。

假设有甲乙丙丁四个工人去图书馆拿书,书都是按 1、2、3、4、5……这种顺序编好号并分区放好按号来取的。那么,这时候最好需要在图书馆入口处放一本本子,来登记工人们取的书的编号,以免不知道取没取,就搞乱了。登记的思路是,比如甲乙丙丁各自拿了1、2、3、4四本书并登记好出去了,乙手脚麻利,很快回来,然后翻看一下本子,发现1、2、3、4都被取走了,这时候我取 5 吧,做了登记,就去图书馆领书了……以此类推。这时候,如果甲乙丙干完一个任务,同时回到了图书馆做登记,就一个本子啊,为了多干点活多挣点钱,他们就会开始抢这个本子,一会儿,他们就打起来了,这样就出问题了……怎么办,一个解决的方法,就是让他们排队登记,谁先踏入图书馆的门,就排前面,正在登记的人的本子“锁起来”,不让后面的人碰他,直到他写完,把锁打开。这个锁在 C++ 中有很多类型,比如说互斥锁(mutex),所谓的互斥,就是我在用,排斥你,你在用,就排斥我。学过数据库的就很清楚。

用互斥锁的思路是可以实现的。但是,新问题来了,让工人们排队等,岂不是白白浪费时间,本来一天可以拿 100 本书,现在排队浪费了时间,一天只能拿 90 本书,血亏啊。一般,对于 C++ 程序,如果对性能有很高的要求的话,多线程加锁血崩。那怎么办呢?别急,我有一个好办法。

就是把所有的任务编号全部写到本子上,工人取走一个任务,划掉一个任务。比如说你干完一个任务 5 回到图书馆了,然后看本子上的任务 6 被划掉了,接着看任务 7 也被划掉了,直到任务 8,还没被划掉,那么你就可以认领并划掉任务 8。这里的划掉这个动作,也可以用加锁来实现。这为什么能解决问题呢?譬如说两个工人同时回到了图书馆,然后,甲看到乙正在划任务 10(被上锁了),他不需要等你划完,他可以接着往下找,找到一个没被划掉的,然后取划掉它。这个过程,就不需要等待。不争,才是最好的状态,对于人生来说,又何尝不是如此。

怎么实现呢?同样,我提供一个写法的模板,看代码,永远是最好的学习程序的方式。

同样的,假如你现在手头有这样一份代码

#include<thread>
using namespace std;
void run_dfs()
{for (int i = 0; i < task_num; ++i){xxxxxxxxxxi}
}
int main()
{run_dfs();exit(0);
}

你希望对 run_dfs 里面的 for 循环进行任务池式的并行。那么,你就可以这样子写:

#include<thread>
#include<vector>
using namespace std;
vector<mutex> mtx(task_num);
void dfs_thread(int thread_id)
{int i = thread_id;while (i<task_num){if (mtx[i].try_lock()){xxxxxxxxxxi//do_task(i);}i++;}}
void run_dfs()
{thread threads[4];for (int thread_id = 0; thread_id < 4; thread_id++){threads[thread_id] = thread(dfs_thread, thread_id);}for (auto& it : threads){it.join();}
}int main()
{run_dfs();exit(0);
}

是不是很简洁?用了锁,却不需要等待。当然,这里的 while if 也是可以改成 for 的。解释一下,这个模板里面没有什么新的东西。一个是声明了一个互斥锁向量 mtx,用来记录某任务是否被上锁(即是否被工人划掉了)。这里只有用到一个方法就是互斥锁的try_lock(),上锁成功返回 true,上锁失败返回 false,如果变量地址已经被锁住了,就会上锁失败,这个判断操作是很省时间的。

如果你的任务开始时有序的,做完任务之后还要求是有序的,即原来是按 1、2 、3……拍的任务,工人做好作品之后,需要按原来的顺序排好,那么在程序中就需要想办法记录每个工人认领了哪些任务号,或者说,每个任务依次是被那个任务认领的。

其他

  • 多线程只要不写同一个内存地址,是不会引起冲突的,比如说四个线程分别写数组 int a[4]的四个位置。
  • 如果你问我你的程序计算结果相互依赖,依赖关系还很复杂,要怎么设计程序才能最大化 CPU 利用,同步的 wait 和异步的 feature ,了解一下。
  • 线程跑完 join 和 detach 怎么理解?顾名思义呗。join 就像一条河流,走到了某个地方,在等待其他河流的加入,如果其他河流不加入(即没运行到这个位置),傲娇的河流就不往前流了。 detach 就是分离,单飞啦,不等你们,就像曹云金离开了郭德纲, detach 了,即使主线程德云社倒闭了,我照样无休止地飞奔下去,一直到东海。

最后,再附几个例子程序吧。
1、

#include <iostream>
#include <vector>
#include <sstream>
#include <fstream>
#include <cmath>
#include <cstdlib>
#include <numeric>
#include <string.h>
#include <thread>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>//用于进程的监听
#include <ctime> //要包含的头文件
#define TEST
#define FEATURE_NUM 1000
#define THREAD_NUM 4
#define TRAIN_DATA_NUM 8000
#define TEST_DATA_NUM 20000
using namespace std;//一些全局变量的定义
double features[TRAIN_DATA_NUM][FEATURE_NUM];
double features_test[TEST_DATA_NUM][FEATURE_NUM];
const int MAXS = 8 * (FEATURE_NUM + 1)*TRAIN_DATA_NUM;
char buf[MAXS];
const int MAXS_TEST = 8 * FEATURE_NUM*TEST_DATA_NUM;
char buf_test[MAXS_TEST];int main(int argc, char *argv[])
{//******************************函数申明**************************************void loadTrainData2(string trainFile, double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM]);void loadTrainData(string trainFile, double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM]);void initParam(double wtInitV, double(*wtSet)[FEATURE_NUM]);void train(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM], double(&wtSet)[FEATURE_NUM],double stepSize0, double rho, double c, double epsilon);void loadTestData2(string testFile, double(&features)[TEST_DATA_NUM][FEATURE_NUM]);void loadTestData(string testFile, double(&features)[TEST_DATA_NUM][FEATURE_NUM]);void predict(double(&features_test)[TEST_DATA_NUM][FEATURE_NUM], double(&wtSet)[FEATURE_NUM], string predictOutFile);void loadAnswerData(string awFile, int(&awVec)[TEST_DATA_NUM]);#ifdef TESTdouble ts = clock();
#endif//***************给定文件路径****************/*     string trainFile = "./data/train_data.txt";string testFile = "./data/test_data2.txt";string predictFile = "./projects/student/result.txt";string answerFile = "./projects/student/answer2.txt"; */string trainFile = "/data/train_data.txt";string testFile = "/data/test_data2.txt";string predictFile = "/projects/student/result.txt";string answerFile = "/projects/student/answer2.txt";//********************程序的多进程处理*********************//创建共享变量和内存块,创建了 ID 为WT_SET的共享内存块,用来放权重int shm;double* wtSet_p;double wtSet[FEATURE_NUM];shm = shmget(IPC_PRIVATE, FEATURE_NUM*sizeof(double), IPC_CREAT | 0644);if (shm < 0){perror("shmget");return 1;}//创建子进程pid_t pid;pid = fork();if (pid == -1){perror("fork");exit(1);}else if (pid == 0)  //子进程用来读训练数据和训练{
//      cout << "I'm parent, pid =  " << getpid() << endl;//******************给定训练参数***********const double stepSize0 = 0.5;//初始步长const double wtInitV = 0;//初始权重const int maxIterTimes = 9;//最大迭代步数,不一定用到double rho = 0.5;double c = 0.01;double epsilon = 0.01;//*****************读入训练数据***************cout << "loading train data ..." << endl;
#ifdef TESTdouble ts11 = clock();
#endifint labels[TRAIN_DATA_NUM];loadTrainData2(trainFile, features, labels);// cout << "size of train data:" << (*trainDataSet_p).size() << endl;//    cout << "label of train data:" << (*trainDataSet_p)[10].label << endl;//    cout << "feature of train data:" << (*trainDataSet_p)[10].features[10] << endl;//getchar();
#ifdef TESTdouble te11 = clock();cout << "loading train data time is " << (double)(te11 - ts11) / CLOCKS_PER_SEC << "s" << endl;
#endif//*********************开始训练************************//初始化权重,并开始训练cout << "training ..." << endl;
#ifdef TESTdouble ts12 = clock();
#endifinitParam(wtInitV, &wtSet);//初始化参数train(features, labels, wtSet, stepSize0, rho, c, epsilon);//存模型//storeModel(*wtSet_p);
#ifdef TESTdouble te12 = clock();cout << "train time is " << (double)(te12 - ts12) / CLOCKS_PER_SEC << "s" << endl;
#endif//取共享内存空间,写入入共享数据权重wtSet_p = (double*)shmat(shm, NULL, 0);for (int i = 0; i < FEATURE_NUM; i++){*(wtSet_p + i) = wtSet[i];/*          cout << wtSet[i]<<endl;getchar(); */}shmdt(wtSet_p);//lost connect#ifdef TESTdouble tes = clock();cout << "son process total time is******************** " << (double)(tes - ts) / CLOCKS_PER_SEC << "s" << endl;
#endif}else if (pid > 0)  //父进程用来读取测试数据{
//      cout << "I'm child, pid =  " << getpid() << endl;cout << "loading test data ..." << endl;
#ifdef TESTdouble ts21 = clock();
#endifloadTestData2(testFile, features_test);//载入测试数据
#ifdef TESTdouble te21 = clock();cout << "loading test data time is " << (double)(te21 - ts21) / CLOCKS_PER_SEC << "s" << endl;
#endif//预测,并保存数据cout << "predicting ..." << endl;wtSet_p = (double*)shmat(shm, NULL, 0);/*        if(int(wtSet_p) == -1){perror("shmat");return -1;} */
#ifdef TESTdouble tswait = clock();
#endifint status;wait(&status);//等待子进程 训练结束for (int i = 0; i < FEATURE_NUM; i++){wtSet[i] = *(wtSet_p + i);//写入数据/*             cout << wtSet[i]<<endl;getchar(); */}
#ifdef TESTdouble tewait = clock();cout << "waiting time is " << (double)(tewait - tswait) / CLOCKS_PER_SEC << "s" << endl;
#endif#ifdef TESTdouble ts22 = clock();
#endifpredict(features_test, wtSet, predictFile);
#ifdef TESTdouble te22 = clock();cout << "predicting time is " << (double)(te22 - ts22) / CLOCKS_PER_SEC << "s" << endl;
#endif
#ifdef TESTint answerVec[TEST_DATA_NUM];//载入答案和保存预测值,计算误差率cout << "calculating accuracy ..." << endl;//    cout << "loading answer data ..." << endl;loadAnswerData(answerFile, answerVec);//载入标准答案//    cout << "after loading answer data ..." << endl;int predictVec[TEST_DATA_NUM];//  cout << "loading predict data ..." << endl;loadAnswerData(predictFile, predictVec);int answerVec_size = sizeof(answerVec) / sizeof(int);//   std::cout << predictVec.size() << "   " << answerVec.size() << endl;int correctCount = 0;//  cout << "begin calculating accuracy ..." << endl;for (int j = 0; j < TEST_DATA_NUM; j++){if (j < answerVec_size){if (answerVec[j] == predictVec[j]){correctCount++;}}else{cout << "answer size less than the real predicted value" << endl;}}double accurate = ((double)correctCount) / TEST_DATA_NUM;//cout << "(double)correctCount: " << (double)correctCount << endl;//cout << "(*answerVec_p).size(): " << answerVec.size() << endl;cout << "the prediction accuracy is ******************** " << accurate << endl;double tep = clock();cout << "parent process total time is ******************** " << (double)(tep - ts) / CLOCKS_PER_SEC << "s" << endl;
#endif}return 0;}inline double Dot(double(&x)[FEATURE_NUM], double(&y)[FEATURE_NUM])
{double temp = 0;int i;for (i = 0; i < FEATURE_NUM; i++){temp = temp + x[i] * y[i];}return temp;
}void initParam(double wtInitV, double(*wtSet)[FEATURE_NUM])
{int i;for (i = 0; i < FEATURE_NUM; i++){(*wtSet)[i] = wtInitV;}
}void loadTrainData(string trainFile, double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM])
{
#ifdef TESTdouble ts111 = clock();
#endifFILE *fp;//int tick_end = TRAIN_DATA_NUM*(FEATURE_NUM + 1);fp = fopen(trainFile.c_str(), "r");if (fp == NULL){printf("file error\n");exit(1);}fseek(fp, 0L, SEEK_END);int len = ftell(fp);//自动判断 fread 要读入的量rewind(fp);//算完要读入的量后重置指针const int maxS = len;//先读个 60 Mchar* buf = new char[maxS];len = fread(buf, 1, maxS, fp);//因为ftell回车算的是两个字符,所以这里要重置一下lenfclose(fp);
#ifdef TESTdouble te111 = clock();cout << "fread train data time is " << (double)(te111 - ts111) / CLOCKS_PER_SEC << "s" << endl;double ts112 = clock();
#endiffor (int i = 0; i <= len; i++){//把换行符号替换掉if (buf[i] == '\n'){buf[i] = ',';}}const char * split = ",";char *p = strtok(buf, split);double temp;int row_ind = 0;int col_ind = 0;int tick = 0;//double ftf;while (p != NULL&&row_ind < TRAIN_DATA_NUM){tick = tick + 1;sscanf(p, "%lf", &temp);if (tick % (FEATURE_NUM + 1) == 0){labels[row_ind] = int(temp);row_ind = row_ind + 1;col_ind = 0;p = strtok(NULL, split);continue;}features[row_ind][col_ind] = temp;col_ind = col_ind + 1;p = strtok(NULL, split);}
#ifdef TESTdouble te112 = clock();cout << "split train data time is " << (double)(te112 - ts112) / CLOCKS_PER_SEC << "s" << endl;
#endif
}//char* buf = new char[MAXS];
void loadTrainData2(string trainFile, double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM])
{
#ifdef TESTdouble ts111 = clock();
#endifFILE *fp;fp = fopen(trainFile.c_str(), "rb");int len = fread(buf, 1, MAXS, fp);fclose(fp);
#ifdef TESTdouble te111 = clock();cout << "fread train data time is " << (double)(te111 - ts111) / CLOCKS_PER_SEC << "s" << endl;double ts112 = clock();
#endifbuf[len] = '\0';double position = 1.0;int symbol = 0;int point = 0;int i, col, row = 0;double line[FEATURE_NUM + 2] = { 0 };i = 0;for (char *p = buf; *p && p - buf < len; p++){if (row >= TRAIN_DATA_NUM){break;}if (*p == ','){if (symbol == 1)line[i] = -line[i];i++;point = 0;position = 1.0;symbol = 0;line[i] = 0.0;}else if (*p == '\n'){if (symbol == 1)line[i] = -line[i];int ftf = (int)line[FEATURE_NUM];for (col = 0; col < FEATURE_NUM; col++){features[row][col] = line[col];}labels[row] = ftf;row = row + 1;i = 0;point = 0;position = 1.0;symbol = 0;line[i] = 0.0;}else if (*p == '.'){point = 1;}else if (*p == '-'){symbol = 1;}else{if (point == 0)line[i] = line[i] * 10 + *p - '0';else{position = position*0.1;line[i] += position*(*p - '0');}}}#ifdef TESTdouble te112 = clock();cout << "split train data time is " << (double)(te112 - ts112) / CLOCKS_PER_SEC << "s" << endl;
#endif
}void wxbVecCalc(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], double(&w)[FEATURE_NUM], double wxbVec[TRAIN_DATA_NUM])
{//just matrix multiplicationdouble temp;int i, j;for (i = 0; i < TRAIN_DATA_NUM; i++){temp = 0;for (j = 0; j < FEATURE_NUM; j++){temp = temp + features[i][j] * w[j];}wxbVec[i] = temp;}
}void wxbVecCalc_test(double(&features)[TEST_DATA_NUM][FEATURE_NUM], double(&w)[FEATURE_NUM], double wxbVec[TEST_DATA_NUM])
{//just matrix multiplicationdouble temp;int i, j;for (i = 0; i < TEST_DATA_NUM; i++){temp = 0;for (j = 0; j < FEATURE_NUM; j++){temp = temp + features[i][j] * w[j];}wxbVec[i] = temp;}
}inline double sigmoidCalc(const double wxb)
{double expv = exp(-1 * wxb);double expvInv = 1 / (1 + expv);return expvInv;
}void sigmoidVecCalc(double(&wxbVec)[TRAIN_DATA_NUM], double(&sigmoidVec)[TRAIN_DATA_NUM]){int i;for (i = 0; i < TRAIN_DATA_NUM; i++){sigmoidVec[i] = sigmoidCalc(wxbVec[i]);}
}void sigmoidVecCalc_test(double(&wxbVec)[TEST_DATA_NUM], double(&sigmoidVec)[TEST_DATA_NUM]){int i;for (i = 0; i < TEST_DATA_NUM; i++){sigmoidVec[i] = sigmoidCalc(wxbVec[i]);}
}void wrxbmVecCalc(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], double(&labSigmoidVec)[TRAIN_DATA_NUM], double wrxbmVec[TRAIN_DATA_NUM])
{//just reverse matrix multiplicationdouble temp=0.0L;int i, j;for (i = 0; i < FEATURE_NUM; i++){temp = 0.0;for (j = 0; j < TRAIN_DATA_NUM; j++){temp = temp + features[j][i] * labSigmoidVec[j];}wrxbmVec[i] = temp/TRAIN_DATA_NUM;}
}void gradientVec(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM], double(&sigmoidVec)[TRAIN_DATA_NUM], double(&gVec)[FEATURE_NUM])
{double labSigmoidVec[TRAIN_DATA_NUM];for (int j = 0; j < TRAIN_DATA_NUM; j++){labSigmoidVec[j] = double(labels[j])-sigmoidVec[j];}wrxbmVecCalc(features, labSigmoidVec, gVec);
}double gradientSlope(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM], int index, double(&sigmoidVec)[TRAIN_DATA_NUM])
{//can still be modifieddouble gsV = 0.0L;int i;double sigv, label;for (i = 0; i < TRAIN_DATA_NUM; i++){sigv = sigmoidVec[i];label = labels[i];gsV += (label - sigv) * (features[i][index]);}// cout<<gsV<<endl;// getchar();gsV = gsV / TRAIN_DATA_NUM;return gsV;
}double lossCal(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM], double(&w)[FEATURE_NUM])
{double lossV = 0.0L;int i;double wxbVec[TRAIN_DATA_NUM];double sigmoidVec[TRAIN_DATA_NUM];wxbVecCalc(features, w, wxbVec);sigmoidVecCalc(wxbVec, sigmoidVec);for (i = 0; i < TRAIN_DATA_NUM; i++){lossV -= labels[i] * log(sigmoidVec[i]);lossV -= (1 - labels[i]) * log(1 - sigmoidVec[i]);}lossV /= TRAIN_DATA_NUM;return lossV;
}void train(double(&features)[TRAIN_DATA_NUM][FEATURE_NUM], int(&labels)[TRAIN_DATA_NUM], double(&wtSet)[FEATURE_NUM],double stepSize0, double rho, double c, double epsilon)
{double dfk0_norm = 0;double dfk1_norm = 0;double beta;double stepSize;double f1, f2;int i, j;double p[FEATURE_NUM];double dfk0[FEATURE_NUM];double dfk1[FEATURE_NUM];double sigmoidVec[TRAIN_DATA_NUM];double wTry[FEATURE_NUM];double wxbVec[TRAIN_DATA_NUM];//计算 Sigmoid(Aw)wxbVecCalc(features, wtSet, wxbVec);sigmoidVecCalc(wxbVec, sigmoidVec);//cout << "debug 1" << sigmoidVec[10] << endl;//计算 p 和 grad f的normgradientVec(features, labels, sigmoidVec, dfk0);dfk0_norm = Dot(dfk0, dfk0);for (i = 0; i < FEATURE_NUM; i++){p[i] = dfk0[i];}//cout << "debug 2" << p[10] << "dfk0_norm" << dfk0_norm << endl;#ifdef TESTint iter_num = 0;
#endif//开始进入算法的循环//for (i = 0; i < maxIterTimes; i++) {while (dfk0_norm > epsilon){
#ifdef TESTiter_num = iter_num + 1;
#endif//    cout << "dfk0_norm:"<<dfk0_norm << endl;stepSize = stepSize0;//用回退方法计算步长///*double lossV = 0.0L;for (i = 0; i < TRAIN_DATA_NUM; i++){lossV -= labels[i] * log(sigmoidVec[i]);lossV -= (1 - labels[i]) * log(1 - sigmoidVec[i]);}lossV /= TRAIN_DATA_NUM;f2 = lossV - c*stepSize*Dot(dfk0, p);//     while (true)for (;;){//     cout << "stepsize%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%:" << stepSize << endl;//  cout<<"stepsize:"<<stepSize<<endl;//计算 w+alpha*p_k//   cout<<"stepsize:"<<stepSize<<endl;for (j = 0; j < FEATURE_NUM; j++){wTry[j] = wtSet[j] + stepSize * p[j];}//        cout << "debug 3" << wTry[10] << endl;//    cout << "debug LL" << wtSet[10] << " " << stepSize << " " << p[10] << endl;f1 = lossCal(features, labels, wTry);//     cout << "f1:"<<f1<<endl<<"f2:"<< f2 << endl;//cout<<"debug L"<<lossCal()<<" "<<c<<" "<< stepSize<<" " <<dfk0[10]<<" "<<p[10]<<endl;//       cout << "debug 4" << "f1:" << f1 << "f2:" << f2 << endl;if (f1 <= f2 || stepSize < 0.001){break;}stepSize = rho*stepSize;//       cout << "stepSize" << stepSize << endl;}//*///stepSize = rho*stepSize0;//先走一步共轭方向步for (j = 0; j < FEATURE_NUM; j++){wtSet[j] += stepSize * p[j];}//cout << "debug 5" << .wtSet[10] << endl;//更新 beta 和 p//算下一步梯度以及梯度模//计算 Sigmoid(Aw)wxbVecCalc(features, wtSet, wxbVec);sigmoidVecCalc(wxbVec, sigmoidVec);// cout<<"param.wtSet"<<param.wtSet[3]<<endl;gradientVec(features, labels, sigmoidVec, dfk1);dfk1_norm = Dot(dfk1, dfk1);//       cout << "debug 6:" << endl << sigmoidVec[10] << dfk1[10] << dfk1_norm << endl;//  cout<<"dfk1_norm"<<dfk1_norm<<endl;//算 betabeta = dfk1_norm / dfk0_norm;//cout << beta << endl;//  cout << "debug 7:" << beta << endl;//   cout<<"p"<<p[10]<<endl;//更新一下 pfor (j = 0; j < FEATURE_NUM; j++){p[j] = dfk1[j] + beta*p[j];}//     cout << "debug 8:" << p[10] << endl;dfk0_norm = dfk1_norm;for (i = 0; i < FEATURE_NUM; i++){dfk0[i] = dfk1[i];}//   cout << "dfk1_norm:"<<dfk1_norm << endl;}
#ifdef TESTcout << "train total iter num: " << iter_num << endl;cout << "dfk0_norm: " << dfk0_norm << endl;
#endif
}void loadAnswerData(string awFile, int(&awVec)[TEST_DATA_NUM])
{// vector<int> awVec;int k = 0;ifstream infile(awFile.c_str());if (!infile){cout << "打开答案文件失败" << endl;exit(0);}while (infile){string line;int aw;getline(infile, line);if (line.size() > 0 && k <= TEST_DATA_NUM){stringstream sin(line);sin >> aw;awVec[k] = aw;k = k + 1;}}infile.close();
}void loadTestData(string testFile, double(&features)[TEST_DATA_NUM][FEATURE_NUM])
{FILE *fp;fp = fopen(testFile.c_str(), "r");if (fp == NULL){printf("file error\n");exit(1);}fseek(fp, 0L, SEEK_END);int len = ftell(fp);//自动判断 fread 要读入的量rewind(fp);//算完要读入的量后重置指针const int maxS = len;//先读个 60 Mchar* buf = new char[maxS];len = fread(buf, 1, maxS, fp);//因为ftell回车算的是两个字符,所以这里要重置一下lenfclose(fp);for (int i = 0; i <= len; i++){//把换行符号替换掉if (buf[i] == '\n'){buf[i] = ',';}}const char * split = ",";char *p = strtok(buf, split);double temp;int row_ind = 0;int col_ind = 0;int tick = 0;//int ftf;//double ftf;while (p != NULL&&row_ind < TEST_DATA_NUM){tick = tick + 1;sscanf(p, "%lf", &temp);features[row_ind][col_ind] = temp;col_ind = col_ind + 1;if (tick % (FEATURE_NUM) == 0){row_ind = row_ind + 1;col_ind = 0;}p = strtok(NULL, split);}
}void loadTestData2(string testFile, double(&features)[TEST_DATA_NUM][FEATURE_NUM])
{
#ifdef TESTdouble ts111 = clock();
#endifFILE *fp;fp = fopen(testFile.c_str(), "rb");int len = fread(buf_test, 1, MAXS_TEST, fp);
#ifdef TESTdouble te111 = clock();cout << "fread test data time is " << (double)(te111 - ts111) / CLOCKS_PER_SEC << "s" << endl;double ts112 = clock();
#endifbuf_test[len] = '\0';double position = 1.0;int symbol = 0;int point = 0;int i, col, row = 0;double line[FEATURE_NUM + 1] = { 0.0 };i = 0;for (char *p = buf_test; *p && p - buf_test < len; p++){if (row >= TEST_DATA_NUM){break;}if (*p == ','){if (symbol == 1)line[i] = -line[i];i++;point = 0;position = 1.0;symbol = 0;line[i] = 0.0;}else if (*p == '\n'){if (symbol == 1)line[i] = -line[i];for (col = 0; col < FEATURE_NUM; col++){features[row][col] = line[col];}row = row + 1;i = 0;point = 0;position = 1.0;symbol = 0;line[i] = 0.0;}else if (*p == '.'){point = 1;}else if (*p == '-'){symbol = 1;}else{if (point == 0)line[i] = line[i] * 10 + *p - '0';else{position = position*0.1;line[i] += position*(*p - '0');}}}fclose(fp);
#ifdef TESTdouble te112 = clock();cout << "split test data time is " << (double)(te112 - ts112) / CLOCKS_PER_SEC << "s" << endl;
#endif
}int storePredict(int(&predict)[TEST_DATA_NUM], string predictOutFile)
{string line;int i;ofstream fout(predictOutFile.c_str());if (!fout.is_open()){cout << "打开预测结果文件失败" << endl;}for (i = 0; i < TEST_DATA_NUM; i++){fout << predict[i] << endl;}fout.close();return 0;
}void predict(double(&features_test)[TEST_DATA_NUM][FEATURE_NUM], double(&wtSet)[FEATURE_NUM], string predictOutFile)
{double sigVal;int predictVal;int predictVec[TEST_DATA_NUM];double predictTrueThresh = 0.5;double wxbVec[TEST_DATA_NUM];double sVec[TEST_DATA_NUM];wxbVecCalc_test(features_test, wtSet, wxbVec);sigmoidVecCalc_test(wxbVec, sVec);for (int j = 0; j < TEST_DATA_NUM; j++){sigVal = sVec[j];predictVal = sigVal >= predictTrueThresh ? 1 : 0;predictVec[j] = predictVal;}storePredict(predictVec, predictOutFile);
}

2、

#include <map>
#include <list>
#include <set>
#include <stack>
#include <unordered_map>
#include <algorithm>
#include <iostream>
#include <stdio.h>
#include <chrono>
#include <vector>
#include <array>
#include <tuple>
#include <chrono>
#include <string.h>
#include <thread>
using namespace std;
typedef vector<vector<int>> Graph;
const int maxRecN = 280000;const int maxCycleN = 3000000;
const int MAXS_R = maxCycleN * 80;
char a[MAXS_R];
char L3[MAXS_R];
char L4[MAXS_R];
char L5[MAXS_R];
char L6[MAXS_R];
char L7[MAXS_R];int ID[maxRecN][2];
const int MAXS = maxRecN * 30;
char buf[MAXS];
int readTestData(string testDataPath, Graph &gTestData, vector<int> &idxMap, \Graph &pathL2SmHead, \Graph &pathL2SmTail)
{int i = 0, commaN = 0;FILE *fp;fp = fopen(testDataPath.c_str(), "rb");int len = fread(buf, 1, MAXS, fp);buf[len] = '\0';ID[0][0] = 0;ID[0][1] = 0;set<int> vertexSet;unordered_map<int, int> idxMapRev;for (char *p = buf; *p&&p - buf < len; p++){if (*p == '\n'){++i;ID[i][0] = 0;ID[i][1] = 0;commaN = 0;}else if (*p == ','){vertexSet.insert(ID[i][commaN]);commaN++;}else if (commaN < 2){ID[i][commaN] = ID[i][commaN] * 10 + *p - '0';}}int index = 0;for (auto &it : vertexSet){idxMap.emplace_back(it);idxMapRev.insert(make_pair(it, index));++index;}int from, to;gTestData.resize(index, vector<int>());pathL2SmHead.resize(index, vector<int>());pathL2SmTail.resize(index, vector<int>());for (int j = 0; j < i; ++j){from = idxMapRev[ID[j][0]];to = idxMapRev[ID[j][1]];if (from != to)//no loop,some off line data have loop{gTestData[from].emplace_back(to);if (from < to)pathL2SmHead[from].emplace_back(to);elsepathL2SmTail[from].emplace_back(to);}}for (auto &it : gTestData)sort(it.begin(), it.end());//sort for arranged output resultfor (auto &it : pathL2SmHead)sort(it.begin(), it.end());//sort for arranged output resultfor (auto &it : pathL2SmTail)sort(it.begin(), it.end());//sort for arranged output resultreturn i;
}int getDigits(int v) {if (v < 10) return 1;if (v < 100) return 2;if (v < 1000) return 3;if (v < 1000000000000) {    // 10^12if (v < 100000000) {    // 10^7if (v < 1000000) {  // 10^6if (v < 10000) return 4;return 5 + (v >= 100000); // 10^5}return 7 + (v >= 10000000); // 10^7}if (v < 10000000000) {  // 10^10return 9 + (v >= 1000000000); // 10^9}return 11 + (v >= 100000000000); // 10^11}return 12 + getDigits(v / 1000000000000); // 10^12
}
int intToAscii(int value, char* dst) {static const char digits[] = {"0001020304050607080910111213141516171819""2021222324252627282930313233343536373839""4041424344454647484950515253545556575859""6061626364656667686970717273747576777879""8081828384858687888990919293949596979899" };const int length = getDigits(value);int next = length - 1;while (value >= 100) {const int i = (value % 100) * 2;value /= 100;dst[next - 1] = digits[i];dst[next] = digits[i + 1];next -= 2;}// Handle last 1-2 digitsif (value < 10) {dst[next] = '0' + int(value);}else {int i = int(value) * 2;dst[next - 1] = digits[i];dst[next] = digits[i + 1];}return length;
}void arr2Ascii3(array<int, 3> &arr, char* &p, vector<int> &idxMap)
{int tmp;int dig;dig = intToAscii(idxMap[arr[0]], p);p += dig;for (int i = 1; i < 3; ++i){*p++ = ',';dig = intToAscii(idxMap[arr[i]], p);p += dig;}*p++ = '\n';
}
void arr2Ascii4(array<int, 4> &arr, char* &p, vector<int> &idxMap)
{int tmp;int dig;dig = intToAscii(idxMap[arr[0]], p);p += dig;for (int i = 1; i < 4; ++i){*p++ = ',';dig = intToAscii(idxMap[arr[i]], p);p += dig;}*p++ = '\n';
}
void arr2Ascii5(array<int, 5> &arr, char* &p, vector<int> &idxMap)
{int tmp;int dig;dig = intToAscii(idxMap[arr[0]], p);p += dig;for (int i = 1; i < 5; ++i){*p++ = ',';dig = intToAscii(idxMap[arr[i]], p);p += dig;}*p++ = '\n';
}
void arr2Ascii6(array<int, 6> &arr, char* &p, vector<int> &idxMap)
{int tmp;int dig;dig = intToAscii(idxMap[arr[0]], p);p += dig;for (int i = 1; i < 6; ++i){*p++ = ',';dig = intToAscii(idxMap[arr[i]], p);p += dig;}*p++ = '\n';
}
void arr2Ascii7(array<int, 7> &arr, char* &p, vector<int> &idxMap)
{int tmp;int dig;dig = intToAscii(idxMap[arr[0]], p);p += dig;for (int i = 1; i < 7; ++i){*p++ = ',';dig = intToAscii(idxMap[arr[i]], p);p += dig;}*p++ = '\n';
}
void writeResultData(string resultDataPath, char* &p3, char* &p4, char* &p5, char* p6, char* p7, int cycleCount)
{int sizeL3 = p3 - L3;int sizeL4 = p4 - L4;int sizeL5 = p5 - L5;int sizeL6 = p6 - L6;int sizeL7 = p7 - L7;int totalSize = sizeL3 + sizeL4 + sizeL5 + sizeL6 + sizeL7;memcpy(p3, L4, sizeL4);p3 += sizeL4;memcpy(p3, L5, sizeL5);p3 += sizeL5;memcpy(p3, L6, sizeL6);p3 += sizeL6;memcpy(p3, L7, sizeL7);
#ifdef TESTauto t1 = chrono::steady_clock::now();
#endif//string L3Path = "./L3.txt";FILE *pL3 = fopen(resultDataPath.c_str(), "w+");fprintf(pL3, "%d\n", cycleCount);//fprintf(pL3, "%s", L3);fwrite(L3, 1, totalSize, pL3);fclose(pL3);
#ifdef TESTauto t2 = chrono::steady_clock::now();cout << "fwrite time: " << chrono::duration_cast<chrono::duration<double>>(t2 - t1).count() << " s" << endl;
#endif
}void L3whole(Graph &gTestData, int &vtxNum, array<int, 2> arrL2, vector<vector<array<int, 2>>> &pathL3)
{for (int h = 0; h < vtxNum; h++){for (auto &h0 : gTestData[h]){for (auto &t0 : gTestData[h0]){if (h != t0){arrL2[0] = h0;arrL2[1] = t0;pathL3[h].emplace_back(arrL2);}}}}}void L3head(Graph &pathL2SmHead, Graph &gTestData, int &vtxNum, array<int, 2> arrL2, vector<vector<array<int, 2>>> &pathL3SmHead)
{for (int h = 0; h < vtxNum; h++){for (auto &h0 : pathL2SmHead[h]){for (auto &t0 : gTestData[h0]){if (h < t0){arrL2[0] = h0;arrL2[1] = t0;pathL3SmHead[h].emplace_back(arrL2);}}}}}void L3tail(Graph &gTestData, Graph &pathL2SmTail, int &vtxNum, array<int, 2> arrL2, vector<vector<array<int, 2>>> &pathL3SmTail)
{for (int h = 0; h < vtxNum; h++){for (auto &h0 : gTestData[h]){for (auto &t0 : pathL2SmTail[h0]){if (t0 < h){arrL2[0] = h0;arrL2[1] = t0;pathL3SmTail[h].emplace_back(arrL2);}}}}}void L3tailIdx(Graph &gTestData, Graph &pathL2SmTail, int &vtxNum, array<int, 2> arrL2, vector<vector<array<int, 2>>> &pathL3SmTailIdx)
{for (int h = 0; h < vtxNum; h++){for (auto &h0 : gTestData[h]){for (auto &t0 : pathL2SmTail[h0]){if (t0 < h){arrL2[0] = h;arrL2[1] = h0;pathL3SmTailIdx[t0].emplace_back(arrL2);}}}}}void L4head(vector<vector<array<int, 2>>> &pathL3SmHead, Graph &gTestData, int &vtxNum, array<int, 3> arrL3, vector<vector<array<int, 3>>> &pathL4SmHead)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL3SmHead[h]){//int h0 = it[0];//int h1 = it[1];for (auto &ite : gTestData[it[1]]){int t0 = ite;if (h < t0&&it[0] != t0)//no self loop{arrL3[0] = it[0];arrL3[1] = it[1];arrL3[2] = t0;pathL4SmHead[h].emplace_back(arrL3);}}}}}void L4tailIdx(Graph &gTestData, vector<vector<array<int, 2>>> &pathL3SmTail, int &vtxNum, array<int, 3> arrL3, vector<vector<array<int, 3>>> &pathL4SmTailIdx)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : gTestData[h]){//int h0 = it;for (auto &ite : pathL3SmTail[it]){//t0 = ite[0];//t1 = ite[1];if (h != ite[0] && ite[1] < h)//no self loop and last one is smallest{arrL3[0] = h;arrL3[1] = it;arrL3[2] = ite[0];pathL4SmTailIdx[ite[1]].emplace_back(arrL3);}}}}}void L3cycle(Graph &pathL2SmHead, vector<vector<array<int, 2>>> &pathL3SmTailIdx, int &vtxNum, array<int, 3> arrL3, char* &p3, vector<int> &idxMap, int &cycleCount)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL2SmHead[h]){//h0 = it;for (auto &ite : pathL3SmTailIdx[h]){//t0 = ite[0];//t1 = ite[1];if (it == ite[0])//no self loop{arrL3[0] = h;arrL3[1] = it;arrL3[2] = ite[1];arr2Ascii3(arrL3, p3, idxMap);cycleCount++;}}}}}void L5tailIdx(vector<vector<array<int, 2>>> &pathL3, vector<vector<array<int, 2>>> &pathL3SmTail, int &vtxNum, array<int, 4> arrL4, vector<vector<array<int, 4>>> &pathL5SmTailIdx)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL3[h]){int h0 = it[0];int h1 = it[1];for (auto &ite : pathL3SmTail[h1]){//t0 = ite[0];//t1 = ite[1];if (ite[1] < h&&ite[1] < h0&&h != ite[0] && h0 != ite[0])//small tail and no self loop{arrL4[0] = h;arrL4[1] = h0;arrL4[2] = h1;arrL4[3] = ite[0];pathL5SmTailIdx[ite[1]].emplace_back(arrL4);}}}}}void L4cycle(vector<vector<array<int, 2>>> &pathL3SmHead, vector<vector<array<int, 2>>> &pathL3SmTailIdx, int &vtxNum, array<int, 4> arrL4, char* &p4, vector<int> &idxMap, int &cycleCount)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL3SmHead[h]){int h0 = it[0];int h1 = it[1];for (auto &ite : pathL3SmTailIdx[h]){//t0 = ite[0];//t1 = ite[1];if (ite[0] == h1&&h0 != ite[1])//small tail and no self loop{arrL4[0] = h;arrL4[1] = h0;arrL4[2] = h1;arrL4[3] = ite[1];arr2Ascii4(arrL4, p4, idxMap);cycleCount++;}}}}}void L5cycle(vector<vector<array<int, 3>>> &pathL4SmHead, vector<vector<array<int, 2>>> &pathL3SmTailIdx, int &vtxNum, array<int, 5> arrL5, char* &p5, vector<int> &idxMap, int &cycleCount)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL4SmHead[h]){int h0 = it[0];int h1 = it[1];int h2 = it[2];for (auto &ite : pathL3SmTailIdx[h]){//    t0 = ite[0];// t1 = ite[1];if (h2 == ite[0] && h0 != ite[1] && h1 != ite[1])//cycle and no self loop{arrL5[0] = h;arrL5[1] = h0;arrL5[2] = h1;arrL5[3] = h2;arrL5[4] = ite[1];arr2Ascii5(arrL5, p5, idxMap);cycleCount++;}}}}}void L6cycle(vector<vector<array<int, 3>>> &pathL4SmHead, vector<vector<array<int, 3>>> &pathL4SmTailIdx, int &vtxNum, array<int, 6> arrL6, char* &p6, vector<int> &idxMap, int &cycleCount)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL4SmHead[h]){int h0 = it[0];int h1 = it[1];int h2 = it[2];for (auto &ite : pathL4SmTailIdx[h]){//    t0 = ite[0];// t1 = ite[1];// t2 = ite[2];if (h2 == ite[0] && h0 != ite[1] && h0 != ite[2] && h1 != ite[1] && h1 != ite[2]){arrL6[0] = h;arrL6[1] = h0;arrL6[2] = h1;arrL6[3] = h2;arrL6[4] = ite[1];arrL6[5] = ite[2];arr2Ascii6(arrL6, p6, idxMap);cycleCount++;}}}}}void L7cycle(vector<vector<array<int, 3>>> &pathL4SmHead, vector<vector<array<int, 4>>> &pathL5SmTailIdx, int &vtxNum, array<int, 7> arrL7, char* &p7, vector<int> &idxMap, int &cycleCount)
{for (int h = 0; h < vtxNum; ++h){for (auto &it : pathL4SmHead[h]){int h0 = it[0];int h1 = it[1];int h2 = it[2];for (auto &ite : pathL5SmTailIdx[h]){//t0 = ite[0];// t1 = ite[1];//t2 = ite[2];//  t3 = ite[3];if (!(h2^ite[0])\&&h0 != ite[1] && h1 != ite[1]\&&h0 != ite[2] && h1 != ite[2]\&&h0 != ite[3] && h1 != ite[3]\){arrL7[0] = h;arrL7[1] = h0;arrL7[2] = h1;arrL7[3] = h2;arrL7[4] = ite[1];arrL7[5] = ite[2];arrL7[6] = ite[3];arr2Ascii7(arrL7, p7, idxMap);cycleCount++;}}}}}int main()
{
#ifdef TESTauto s1 = chrono::steady_clock::now();
#endifstring testDataPath = "/data/test_data.txt";string resultDataPath = "/projects/student/result.txt";
#ifdef TESTtestDataPath = "./data/1004812/test_data.txt";resultDataPath = "./data/myresult.txt";
#endifchar* p3 = L3;char* p4 = L4;char* p5 = L5;char* p6 = L6;char* p7 = L7;int cycleCount = 0;int cycleCount3 = 0;int cycleCount4 = 0;int cycleCount5 = 0;int cycleCount6 = 0;int cycleCount7 = 0;Graph gTestData, pathL2SmHead, pathL2SmTail;vector<int> idxMap;int edgeNum = readTestData(testDataPath, gTestData, idxMap, pathL2SmHead, pathL2SmTail);int vtxNum = idxMap.size();vector<vector<array<int, 2>>> pathL3(vtxNum);vector<vector<array<int, 2>>> pathL3SmHead(vtxNum);vector<vector<array<int, 2>>> pathL3SmTail(vtxNum);vector<vector<array<int, 2>>> pathL3SmTailIdx(vtxNum);vector<vector<array<int, 3>>> pathL4(vtxNum);vector<vector<array<int, 3>>> pathL4SmHead(vtxNum);vector<vector<array<int, 3>>> pathL4SmTailIdx(vtxNum);vector<vector<array<int, 4>>> pathL5(vtxNum);vector<vector<array<int, 4>>> pathL5SmTailIdx(vtxNum);array<int, 2> arrL2;array<int, 3> arrL3;array<int, 4> arrL4;array<int, 5> arrL5;array<int, 6> arrL6;array<int, 7> arrL7;thread threads[4];//3whole=2whole+2whole//L3whole(gTestData, vtxNum, arrL2, pathL3);threads[0] = thread(L3whole, ref(gTestData), ref(vtxNum),(arrL2),ref(pathL3));   //threads[0].join();//3head=2head+2whole//L3head(pathL2SmHead, gTestData, vtxNum, arrL2, pathL3SmHead);threads[1] = thread(L3head, ref(pathL2SmHead), ref(gTestData),ref(vtxNum),(arrL2),ref(pathL3SmHead));//3tail=2whole+2tail//L3tailIdx(gTestData, pathL2SmTail, vtxNum, arrL2, pathL3SmTailIdx);threads[2] = thread(L3tailIdx, ref(gTestData), ref(pathL2SmTail),ref(vtxNum),(arrL2),ref(pathL3SmTailIdx));//L3tail(gTestData, pathL2SmTail, vtxNum, arrL2, pathL3SmTail);threads[3] = thread(L3tail, ref(gTestData), ref(pathL2SmTail),ref(vtxNum),(arrL2),ref(pathL3SmTail));for (auto& it : threads) {it.join();}thread threads2[5];//4head=3head+2whole//L4head(pathL3SmHead, gTestData, vtxNum, arrL3, pathL4SmHead);threads2[0] = thread(L4head, ref(pathL3SmHead), ref(gTestData),ref(vtxNum),(arrL3),ref(pathL4SmHead)); //4tail=2whole+3tail  //L4tailIdx(gTestData, pathL3SmTail, vtxNum, arrL3, pathL4SmTailIdx);threads2[1] = thread(L4tailIdx, ref(gTestData), ref(pathL3SmTail),ref(vtxNum),(arrL3),ref(pathL4SmTailIdx));  //3cycle=3head+2whole//L3cycle(pathL2SmHead, pathL3SmTailIdx, vtxNum, arrL3, p3, idxMap, cycleCount);threads2[2] = thread(L3cycle, ref(pathL2SmHead), ref(pathL3SmTailIdx),ref(vtxNum),(arrL3),ref(p3),ref(idxMap),ref(cycleCount3));    //5tail=3whole+3tail//L5tailIdx(pathL3, pathL3SmTail, vtxNum, arrL4, pathL5SmTailIdx);threads2[3] = thread(L5tailIdx, ref(pathL3), ref(pathL3SmTail),ref(vtxNum),(arrL4),ref(pathL5SmTailIdx));  //4cycle=3head+3tail//L4cycle(pathL3SmHead, pathL3SmTailIdx, vtxNum, arrL4, p4, idxMap, cycleCount);threads2[4] = thread(L4cycle, ref(pathL3SmHead), ref(pathL3SmTailIdx),ref(vtxNum),(arrL4),ref(p4),ref(idxMap),ref(cycleCount4));for (auto& it : threads2) {it.join();}thread threads3[3];//5cycle=4head+3tail//L5cycle(pathL4SmHead, pathL3SmTailIdx, vtxNum, arrL5, p5, idxMap, cycleCount);threads3[0] = thread(L5cycle, ref(pathL4SmHead), ref(pathL3SmTailIdx),ref(vtxNum),(arrL5),ref(p5),ref(idxMap),ref(cycleCount5));//6cycle=4head+4tail //L6cycle(pathL4SmHead, pathL4SmTailIdx, vtxNum, arrL6, p6, idxMap, cycleCount);threads3[1] = thread(L6cycle, ref(pathL4SmHead), ref(pathL4SmTailIdx),ref(vtxNum),(arrL6),ref(p6),ref(idxMap),ref(cycleCount6));//7cycle=4head+5tail//L7cycle(pathL4SmHead, pathL5SmTailIdx, vtxNum, arrL7, p7, idxMap, cycleCount);threads3[2] = thread(L7cycle, ref(pathL4SmHead), ref(pathL5SmTailIdx),ref(vtxNum),(arrL7),ref(p7),ref(idxMap),ref(cycleCount7));for (auto& it : threads3) {it.join();}cycleCount = cycleCount3+cycleCount4+cycleCount5+cycleCount6+cycleCount7;writeResultData(resultDataPath, p3, p4, p5, p6, p7, cycleCount);
#ifdef TESTcout << "cycle count: " << cycleCount << endl;
#endifreturn 0;
}

3、

//多线程实现
#include <iostream>
#include <algorithm>
#include <unordered_map>
#include <string.h>
#include <vector>
#include<chrono>
#include<mutex>
#include<thread>
#include <sys/mman.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
//#define TEST
#define is_digit(c) (c >= '0')
using namespace std;const int MAX_REC = 2000000;
const int MAX_CYCLE_COUNT = 20000000;const int MAX_ID_COUNT = 500000;//maximum:MAX_REC<<2
const int MAX_DEG = 300;//maximum:MAX_ID_COUNT-1
const int MAX_TEST_DATA_SIZE = MAX_REC * 33;//maximum: MAX_REC * 33
const int MAX_CYCLE3_CHAR_SIZE = MAX_CYCLE_COUNT*0.25 * 33;//maximum: MAX_CYCLE_COUNT * 33
const int MAX_CYCLE4_CHAR_SIZE = MAX_CYCLE_COUNT*0.25 * 44;//maximum: MAX_CYCLE_COUNT * 44
const int MAX_CYCLE5_CHAR_SIZE = MAX_CYCLE_COUNT*0.5 * 55;//maximum: MAX_CYCLE_COUNT * 55
const int MAX_CYCLE6_CHAR_SIZE = MAX_CYCLE_COUNT*0.8 * 66;//maximum: MAX_CYCLE_COUNT * 66
const int MAX_CYCLE7_CHAR_SIZE = MAX_CYCLE_COUNT*0.8 * 77;//maximum: MAX_CYCLE_COUNT * 77
const int MAX_CYCLE_CHAR_SIZE = MAX_CYCLE_COUNT * 77;//maximum: MAX_CYCLE_COUNT * 77struct Origin_Id{int id;char* pos;int len;Origin_Id() :id(), pos(), len(){};Origin_Id(int id_, char* pos_, int len_) :id(id_), pos(pos_), len(len_){};
};string test_data_path = "/data/test_data.txt";
string result_data_path = "/projects/student/result.txt";
//char file_ptr[MAX_TEST_DATA_SIZE];
//int Rec[MAX_REC][3];
int(*Rec)[3];
//Origin_Id id_set[MAX_REC];
Origin_Id* id_set;
int rec_count = 0;
int id_count = 0;
void read_raw_data()
{char* file_ptr = new char[MAX_TEST_DATA_SIZE];id_set = new Origin_Id[MAX_REC * 2];Rec = new int[MAX_REC][3];FILE *fp;fp = fopen(test_data_path.c_str(), "rb");int data_sz = fread(file_ptr, 1, MAX_TEST_DATA_SIZE, fp);char* buf_end = file_ptr + data_sz;char* buf = file_ptr;int comma_count = 0;int cur_num = 0;int id_len = 0;char* id_pos = buf;int comma_num = 0;while (buf < buf_end){switch (*buf){case '\n'://reset counter{rec_count++;Rec[rec_count][0] = 0;Rec[rec_count][1] = 0;Rec[rec_count][2] = 0;comma_count = 0;cur_num = 0;id_len = 0;id_pos = buf + 1;break;}case ',':{Origin_Id id_tmp(cur_num, id_pos, id_len);id_set[id_count++] = id_tmp;comma_count++;cur_num = 0;id_len = 0;id_pos = buf + 1;break;}default:{if (is_digit(*buf)){cur_num = cur_num * 10 + *buf - '0';Rec[rec_count][comma_count] = cur_num;id_len++;}}}buf++;}
}bool id_cmp(Origin_Id a, Origin_Id b)
{return a.id<b.id;
}
bool id_unq(Origin_Id a, Origin_Id b)
{return a.id == b.id;
}
int id_num;
void id_map(unordered_map<int, int> &id_map_rev)
{//unordered_map<int, int> id_map_rev;sort(id_set, id_set + id_count, id_cmp);id_num = unique(id_set, id_set + id_count, id_unq) - id_set;//now id_set is id_mapfor (int i = 0; i < id_num; ++i){id_map_rev.insert({ id_set[i].id, i });}}
struct Edge{int id;int money;Edge() :id(), money(){};Edge(int id_, int money_) :id(id_), money(money_){};
}__attribute__((packed));
//__attribute__((aligned(x))),
vector<vector<Edge>> Graph(MAX_ID_COUNT);
vector<vector<Edge>> ReGraph(MAX_ID_COUNT);//Edge Graph[MAX_ID_COUNT][MAX_DEG];
//Edge ReGraph[MAX_ID_COUNT][MAX_DEG];
int deg_G[MAX_ID_COUNT];
int deg_RG[MAX_ID_COUNT];
bool id_cmp_edge(Edge a, Edge b)
{return a.id<b.id;
}
void buid_graph()
{unordered_map<int, int> id_map_rev;id_map(id_map_rev);int id_from;int id_to;int money;for (int i = 0; i < rec_count; i++){id_from = id_map_rev[Rec[i][0]];id_to = id_map_rev[Rec[i][1]];if (id_from == id_to) continue;money = Rec[i][2];Edge edge_tmp(id_to, money);
/*      if (money > 715827882){edge_tmp.money3 = 2147483647;}if (money > 429496729){edge_tmp.money5 = 2147483647;} */Graph[id_from].emplace_back(edge_tmp);edge_tmp.id = id_from;ReGraph[id_to].emplace_back(edge_tmp);deg_G[id_from]++;deg_RG[id_to]++;}for (int i = 0; i<id_num; i++){sort(Graph[i].begin(), Graph[i].begin() + deg_G[i], id_cmp_edge);//sort(ReGraph[i].begin(), ReGraph[i].begin() + deg_RG[i], id_cmp_edge);}
}inline bool judgeMoneyRe(double money_3,long long int money5,int money)
{return (money_3<=money)&&(money<=money5);
}
inline int mtp3(int momey)
{int money3;
if(momey<=715827882){money3 = momey*3;}else{money3 = 2147483647;}
return money3;
}inline int mtp5(int momey)
{int money5;
if(momey<=429496729){money5 = momey*5;}else{money5 = 2147483647;}
return money5;
}bool DFSInv(int &minCur, \bool(&toMinCurL1)[MAX_ID_COUNT], \bool(&toMinCurL2)[MAX_ID_COUNT], \bool(&toMinCurL3)[MAX_ID_COUNT], \int(&toMinCurL1Idx)[MAX_ID_COUNT])
{int money[3];double money_3[3];long int money5[3];bool flg = false;for (int i = 0; i < deg_RG[minCur]; ++i){int it = ReGraph[minCur][i].id;int money0 = ReGraph[minCur][i].money;money[0] = money0;money_3[0] = money0/3.0;money5[0] = mtp5(money0);if (minCur < it){toMinCurL1[it] = true;toMinCurL1Idx[it] = i;for (int j = 0; j < deg_RG[it]; ++j){int ite = ReGraph[it][j].id;int money1 = ReGraph[it][j].money;          if (minCur < ite&&judgeMoneyRe(money_3[0],money5[0],money1)){//if(!flg)flg = true;toMinCurL2[ite] = true;money[1] = money1;money_3[1] = money1/3.0;money5[1] = mtp5(money1);for (int k = 0; k < deg_RG[ite]; ++k){int iter = ReGraph[ite][k].id;int money2 = ReGraph[ite][k].money;         if (minCur < iter&&iter != it&&judgeMoneyRe(money_3[1],money5[1],money2)){toMinCurL3[iter] = true;}}}}}}return flg;
}bool judgeMoneyCycle(int(&stk_mny)[7], int num)
{int money3;int money5;for (int i = 0; i < num - 1; ++i){if(stk_mny[i]<=715827882){money3 = stk_mny[i]*3;}else{money3 = 2147483647;}if(stk_mny[i + 1]<=429496729){money5 = stk_mny[i+1]*5;}else{money5 = 2147483647;}if (money3<stk_mny[i + 1] || stk_mny[i]>money5)return false;}   if(stk_mny[num-1]<=715827882){money3 = stk_mny[num-1]*3;}else{money3 = 2147483647;}if(stk_mny[0]<=429496729){money5 = stk_mny[0]*5;}else{money5 = 2147483647;}if (money3<stk_mny[0] || stk_mny[num - 1]>money5)return false;return true;}void piece_memcpy(char *dest,char *src, const int n) {switch(n) {case 33 : *dest++ = *src++;case 32 : *dest++ = *src++;case 31 : *dest++ = *src++;case 30 : *dest++ = *src++;case 29 : *dest++ = *src++;case 28 : *dest++ = *src++;case 27 : *dest++ = *src++;case 26 : *dest++ = *src++;case 25 : *dest++ = *src++;case 24 : *dest++ = *src++;case 23 : *dest++ = *src++;case 22 : *dest++ = *src++;case 21 : *dest++ = *src++;case 20 : *dest++ = *src++;case 19 : *dest++ = *src++;case 18 : *dest++ = *src++;case 17 : *dest++ = *src++;case 16 : *dest++ = *src++;case 15 : *dest++ = *src++;case 14 : *dest++ = *src++;case 13 : *dest++ = *src++;case 12 : *dest++ = *src++;case 11 : *dest++ = *src++;case 10 : *dest++ = *src++;case 9 : *dest++ = *src++;case 8 : *dest++ = *src++;case 7 : *dest++ = *src++;case 6 : *dest++ = *src++;case 5 : *dest++ = *src++;case 4 : *dest++ = *src++;case 3 : *dest++ = *src++;case 2 : *dest++ = *src++;case 1 : *dest++ = *src++; break;default:    exit(0);}
}inline int idToAscii(int value, char* dst) {int len = id_set[value].len;memcpy(dst, id_set[value].pos, len);return len;
}
char* L[5];
char* pL[5];void arr2Ascii(int(&arr)[7], int num)
{char* &p = pL[num - 3];int dig;dig = idToAscii(arr[0], p);p += dig;for (int i = 1; i < num; ++i){*p++ = ',';dig = idToAscii(arr[i], p);p += dig;}*p++ = '\n';
}char* pT[4][5];
char* T[4][5];
int cycle_counts[4];
char* task_cycle_pos[20000][10];
void arr2AsciiThread(int(&arr)[7], int num, int &thread_id, int &task_id)
{int cycle_type = num - 3;char* &p = pT[thread_id][cycle_type];int dig;dig = idToAscii(arr[0], p);p += dig;for (int i = 1; i < num; ++i){*p++ = ',';dig = idToAscii(arr[i], p);p += dig;}*p++ = '\n';task_cycle_pos[task_id][cycle_type + 5] = p;
}inline void pushMoney(int (&money)[6],double (&money_p2)[6],int (&money3)[6],int money_tmp, int layer)
{money[layer] = money_tmp;money_p2[layer] = 0.2*money_tmp;money3[layer] = mtp3(money_tmp);
}inline bool judgeMoney(double money_p2,int money3, int money)
{return (money_p2<=money&&money<=money3);
}void DFS(const int minCur, \int(&stk)[7], int(&stk_mny)[7], bool(&onStack)[MAX_ID_COUNT], bool(&toMinCurL1)[MAX_ID_COUNT], \bool(&toMinCurL2)[MAX_ID_COUNT], \bool(&toMinCurL3)[MAX_ID_COUNT], \int(&toMinCurL1Idx)[MAX_ID_COUNT], int &thread_id, int &task_id,\int (&money)[6],double (&money_p2)[6],int (&money3)[6])
{stk[0] = minCur;int s1 = minCur;onStack[s1] = true;int money_tmp;double moneyO_3;int moneyO5;for (int i2 = 0; i2 < deg_G[s1]; i2++){auto s2 = Graph[s1][i2].id;if (s2 <= minCur) continue;stk[1] = s2;onStack[s2] = true;money_tmp = Graph[s1][i2].money;pushMoney(money,money_p2,money3, money_tmp, 0);moneyO_3 = money_tmp/3.0;moneyO5 = mtp5(money_tmp);     for (int i3 = 0; i3 < deg_G[s2]; i3++){auto s3 = Graph[s2][i3].id;money_tmp = Graph[s2][i3].money;if (s3 <= minCur || onStack[s3] || !judgeMoney(money_p2[0],money3[0],money_tmp)) continue;pushMoney(money,money_p2,money3, money_tmp, 1);//pushMoney(stk_mny, s2, i3, 1);stk[2] = s3;onStack[s3] = true;if (toMinCurL1[s3]){//pushMoneyLast(stk_mny, minCur, toMinCurL1Idx[s3], 2);money_tmp = ReGraph[minCur][toMinCurL1Idx[s3]].money;//if (judgeMoneyCycle(stk_mny, 3))if (judgeMoney(money_p2[1],money3[1],money_tmp)&&judgeMoney(moneyO_3,moneyO5,money_tmp)){arr2AsciiThread(stk, 3, thread_id, task_id);cycle_counts[thread_id]++;}}for (int i4 = 0; i4 < deg_G[s3]; i4++){auto s4 = Graph[s3][i4].id;money_tmp = Graph[s3][i4].money;if (onStack[s4] || s4 <= minCur||!judgeMoney(money_p2[1],money3[1],money_tmp)) continue;stk[3] = s4;onStack[s4] = true;//pushMoney(stk_mny, s3, i4, 2);pushMoney(money,money_p2,money3, money_tmp, 2);if (toMinCurL1[s4]){money_tmp = ReGraph[minCur][toMinCurL1Idx[s4]].money;//pushMoneyLast(stk_mny, minCur, toMinCurL1Idx[s4], 3);//if (judgeMoneyCycle(stk_mny, 4))if (judgeMoney(money_p2[2],money3[2],money_tmp)&&judgeMoney(moneyO_3,moneyO5,money_tmp)){arr2AsciiThread(stk, 4, thread_id, task_id);cycle_counts[thread_id]++;}}for (int i5 = 0; i5 < deg_G[s4]; i5++){auto s5 = Graph[s4][i5].id;money_tmp = Graph[s4][i5].money;if (minCur >= s5 || onStack[s5] || !judgeMoney(money_p2[2],money3[2],money_tmp)||!toMinCurL3[s5] && !toMinCurL2[s5] && !toMinCurL1[s5]) continue;stk[4] = s5;pushMoney(money,money_p2,money3, money_tmp, 3);           if (toMinCurL1[s5]){//stk[4] = s5;//pushMoney(stk_mny, s4, i5, 3);//pushMoneyLast(stk_mny, minCur, toMinCurL1Idx[s5], 4);  money_tmp = ReGraph[minCur][toMinCurL1Idx[s5]].money;if (judgeMoney(money_p2[3],money3[3],money_tmp)&&judgeMoney(moneyO_3,moneyO5,money_tmp)){arr2AsciiThread(stk, 5, thread_id, task_id);cycle_counts[thread_id]++;}}if (toMinCurL2[s5]){//stk[4] = s5;//pushMoney(stk_mny, s4, i5, 3);//pushMoney(money,money_p2,money3, money_tmp, 3);for (int i6 = 0; i6 < deg_G[s5]; i6++){auto s6 = Graph[s5][i6].id;money_tmp = Graph[s5][i6].money;if (minCur<s6&&!onStack[s6] && toMinCurL1[s6]&&judgeMoney(money_p2[3],money3[3],money_tmp)){stk[5] = s6;//money_tmp = Graph[s5][i6].money;//pushMoney(stk_mny, s5, i6, 4);//pushMoneyLast(stk_mny, minCur, toMinCurL1Idx[s6], 5);pushMoney(money,money_p2,money3, money_tmp, 4);money_tmp = ReGraph[minCur][toMinCurL1Idx[s6]].money;if (judgeMoney(money_p2[4],money3[4],money_tmp)&&judgeMoney(moneyO_3,moneyO5,money_tmp)){arr2AsciiThread(stk, 6, thread_id, task_id);cycle_counts[thread_id]++;}}}}if (toMinCurL3[s5]){//stk[4] = s5;onStack[s5] = true;//pushMoney(stk_mny, s4, i5, 3);for (int i6 = 0; i6 < deg_G[s5]; i6++){auto s6 = Graph[s5][i6].id;money_tmp = Graph[s5][i6].money;if (toMinCurL2[s6] && !onStack[s6] && minCur <= s6&&judgeMoney(money_p2[3],money3[3],money_tmp)){stk[5] = s6;//pushMoney(stk_mny, s5, i6, 4);pushMoney(money,money_p2,money3, money_tmp, 4);for (int i7 = 0; i7 < deg_G[s6]; i7++){auto s7 = Graph[s6][i7].id;money_tmp = Graph[s6][i7].money;if (minCur<s7&&toMinCurL1[s7] && !onStack[s7]&&judgeMoney(money_p2[4],money3[4],money_tmp)){stk[6] = s7;pushMoney(money,money_p2,money3, money_tmp, 5);//pushMoney(stk_mny, s6, i7, 5);//pushMoneyLast(stk_mny, minCur, toMinCurL1Idx[s7], 6);money_tmp = ReGraph[minCur][toMinCurL1Idx[s7]].money;//if (judgeMoneyCycle(stk_mny, 7))if (judgeMoney(money_p2[5],money3[5],money_tmp)&&judgeMoney(moneyO_3,moneyO5,money_tmp)){arr2AsciiThread(stk, 7, thread_id, task_id);cycle_counts[thread_id]++;}}}}}onStack[s5] = false;}}onStack[s4] = false;}onStack[s3] = false;}onStack[s2] = false;}onStack[minCur] = false;
}void do_task(int cur_task_id, int(&stk)[7], int(&stk_mny)[7], bool(&onStack)[MAX_ID_COUNT], \bool(&toMinCurL1)[MAX_ID_COUNT], bool(&toMinCurL2)[MAX_ID_COUNT], bool(&toMinCurL3)[MAX_ID_COUNT], \int(&toMinCurL1Idx)[MAX_ID_COUNT], vector<int> &task_set, int &thread_id, int &task_id,\int (&money)[6],double (&money_p2)[6],int (&money3)[6])
{int from = task_set[cur_task_id];int to = task_set[cur_task_id + 1];for (int minCur = from; minCur < to; ++minCur){memset(toMinCurL1, false, id_num*sizeof(bool));memset(toMinCurL2, false, id_num*sizeof(bool));memset(toMinCurL3, false, id_num*sizeof(bool));bool flg = DFSInv(minCur, toMinCurL1, toMinCurL2, toMinCurL3, toMinCurL1Idx);if (flg == true){DFS(minCur, stk, stk_mny, onStack, toMinCurL1, toMinCurL2, toMinCurL3, toMinCurL1Idx, thread_id, task_id,money,money_p2,money3);}}
}
vector<mutex> mtx(20000);
void dfs_thread(int thread_id, vector<int> &task_set)
{bool onStack[MAX_ID_COUNT] = { false };int stk[7];int stk_mny[7];bool toMinCurL1[MAX_ID_COUNT] = { false };bool toMinCurL2[MAX_ID_COUNT] = { false };bool toMinCurL3[MAX_ID_COUNT] = { false };int toMinCurL1Idx[MAX_ID_COUNT];int cur_task_id = 0;char* ptr_from_cur[5] = { pT[thread_id][0], pT[thread_id][1], pT[thread_id][2],pT[thread_id][3],pT[thread_id][4] };int money[6];double money_p2[6];int money3[6];while(cur_task_id<task_set.size()-1){if(mtx[cur_task_id].try_lock()){memcpy(task_cycle_pos[cur_task_id], ptr_from_cur, sizeof(char*)* 5);memcpy(task_cycle_pos[cur_task_id]+5, ptr_from_cur, sizeof(char*)* 5);do_task(cur_task_id, stk, stk_mny, onStack, toMinCurL1, toMinCurL2, toMinCurL3, toMinCurL1Idx, task_set,thread_id,cur_task_id,\money,money_p2,money3);memcpy(ptr_from_cur, task_cycle_pos[cur_task_id]+5, sizeof(char*)* 5);}cur_task_id++;}
}int task_num;
void run_dfs()
{vector<int> task_set;task_num = id_num / 100+1;for (int i = 0; i < task_num; i++){task_set.push_back(i * 100);}task_set.push_back(id_num+1);thread threads[4];for (int i = 0; i < 4; i++){threads[i] = thread(dfs_thread, i, ref(task_set));}for (auto& it : threads) {it.join();}
}int getDigits(int v) {if (v < 10) return 1;if (v < 100) return 2;if (v < 1000) return 3;if (v < 1000000000000) {    // 10^12if (v < 100000000) {    // 10^7if (v < 1000000) {  // 10^6if (v < 10000) return 4;return 5 + (v >= 100000); // 10^5}return 7 + (v >= 10000000); // 10^7}if (v < 10000000000) {  // 10^10return 9 + (v >= 1000000000); // 10^9c}return 11 + (v >= 100000000000); // 10^11}return 12 + getDigits(v / 1000000000000); // 10^12
}
int intToAscii(int value, char* dst) {static const char digits[] = {"0001020304050607080910111213141516171819""2021222324252627282930313233343536373839""4041424344454647484950515253545556575859""6061626364656667686970717273747576777879""8081828384858687888990919293949596979899" };const int length = getDigits(value);int next = length - 1;while (value >= 100) {const int i = (value % 100) * 2;value /= 100;dst[next - 1] = digits[i];dst[next] = digits[i + 1];next -= 2;}// Handle last 1-2 digitsif (value < 10) {dst[next] = '0' + int(value);}else {int i = int(value) * 2;dst[next - 1] = digits[i];dst[next] = digits[i + 1];}return length;
}
char *LAll;
int cycleCount = 0;
void write_th(int th_id,int (&task_sz)[5],void *(&dst_ptr))
{int from = task_sz[th_id];int size = task_sz[th_id+1]-from;// cout<<endl<<th_id<<" "<<from<<" "<<size<<endl;// cout<<(char*)dst_ptr+size-(char*)dst_ptr<<endl;memcpy((char*)dst_ptr+from,LAll+from,size);
}
void write_result()
{char* pAll = LAll;int dig = intToAscii(cycleCount, pAll);int sz;pAll += dig;*pAll++ = '\n';for (int j = 0; j < 5; j++){for (int i = 0; i < task_num; i++){auto pos_ptr = task_cycle_pos[i];sz = pos_ptr[j + 5] - pos_ptr[j];memcpy(pAll, (pos_ptr[j]), sz);pAll += sz;}}int total = pAll - LAll;int task_sz = total/4;int task_set[5] = {0,task_sz,2*task_sz,3*task_sz,total+1};int fd = open(result_data_path.c_str(),O_RDWR|O_CREAT,0666);//cout<<111<<endl;truncate(result_data_path.c_str(),total);//cout<<total<<endl;//cout<<task_sz<<endl;// for(int i=0;i<5;i++)// {// cout<<task_set[i]<<endl;// }void *dst_ptr=mmap(NULL,total,PROT_WRITE|PROT_READ,MAP_SHARED,fd,0);//memcpy(dst_ptr,LAll,total);thread threads[4];for (int i = 0; i < 4; i++){threads[i] = thread(write_th, i, ref(task_set),ref(dst_ptr));}for (auto& it : threads) {it.join();}//exit(0);//munmap(dst_ptr,total);
}int main()
{
#ifdef TESTtest_data_path = "./data/std/test_data.txt";result_data_path = "./data/myresult.txt";auto s1 = chrono::steady_clock::now();
#endif// int a = 3;// double b = 3.3;// bool flg = (a>b);// cout << flg  <<endl;read_raw_data();
#ifdef TESTauto s2 = chrono::steady_clock::now();cout << "read data time: " << chrono::duration_cast<chrono::duration<double>>(s2 - s1).count() << " s" << endl;
#endif//id_map();buid_graph();
#ifdef TESTauto s3 = chrono::steady_clock::now();cout << "build graph time: " << chrono::duration_cast<chrono::duration<double>>(s3 - s2).count() << " s" << endl;
#endif// L[0] = new char[MAX_CYCLE3_CHAR_SIZE / 10];// L[1] = new char[MAX_CYCLE4_CHAR_SIZE / 100];// L[2] = new char[MAX_CYCLE5_CHAR_SIZE / 100];// L[3] = new char[MAX_CYCLE6_CHAR_SIZE / 100];// L[4] = new char[MAX_CYCLE7_CHAR_SIZE / 100];for (int i = 0; i<5; ++i){pL[i] = L[i];}LAll = new char[MAX_CYCLE_CHAR_SIZE];for (int i = 0; i < 4; i++){T[i][0] = new char[MAX_CYCLE3_CHAR_SIZE];pT[i][0] = T[i][0];T[i][1] = new char[MAX_CYCLE4_CHAR_SIZE];pT[i][1] = T[i][1];T[i][2] = new char[MAX_CYCLE5_CHAR_SIZE];pT[i][2] = T[i][2];T[i][3] = new char[MAX_CYCLE6_CHAR_SIZE];pT[i][3] = T[i][3];T[i][4] = new char[MAX_CYCLE7_CHAR_SIZE];pT[i][4] = T[i][4];}run_dfs();
#ifdef TESTauto s4 = chrono::steady_clock::now();cout << "dfs time: " << chrono::duration_cast<chrono::duration<double>>(s4 - s3).count() << " s" << endl;
#endiffor (int i = 0; i < 4; i++)cycleCount += cycle_counts[i];write_result();
#ifdef TESTauto s5 = chrono::steady_clock::now();cout << "write result time: " << chrono::duration_cast<chrono::duration<double>>(s5 - s4).count() << " s" << endl;
#endif
#ifdef TESTcout << id_num << endl;cout << cycleCount << endl;
#endifexit(0);
}

C++ 高性能计算之多线程简单基础入门教程相关推荐

  1. Python基础入门教程:使用 Python 3 协程快速获得一个代理池

    Python基础入门教程:使用 Python 3 协程快速获得一个代理池 前言 在执行 IO 密集型任务的时候,程序会因为等待 IO 而阻塞.比如我们使用 requests 库来进行网络爬虫请求的话, ...

  2. python怎么输出浮点数_python 零基础入门教程第 2 章:基本数据类型 (一)

    一.什么是数据类型 编程语言通过一些复杂的计算机物理底层机制,创造不同类型的数据,用来表示现实世界中的不同信息,以便于计算机更好的存储和计算. 每种编程语言都会有一些基本的数据类型用来表示现实世界中的 ...

  3. python免费入门手册-Python 基础入门教程

    Python是一种解释型.面向对象.动态数据类型的高级程序设计语言. Python由Guido van Rossum于1989年底发明,第一个公开发行版发行于1991年. <Python 基础入 ...

  4. android 编辑9图片,Android基础入门教程——1.6 .9(九妹)图片怎么玩

    Android基础入门教程--1.6 .9(九妹)图片怎么玩 Android基础入门教程 1.本节引言: 可能有的一些疑问: 1.什么是.9图片? 答:图片后缀名前有.9的图片,如pic1.9.png ...

  5. Mongodb最基础入门教程

    Mongodb最基础入门教程 如果想了解一下redis的入门教程,可以去看一下我的上一篇博客 Mongodb的安装大家可以参考一下其他博主的博客,这里我就不做介绍了.不过值得注意的是,在Linux版本 ...

  6. QR code 二维码基础入门教程(二)

    QR code 二维码基础入门教程(二) 承接上文,让我们继续下面的步骤 纠错码编码 先说说纠错容量 纠错码可以纠正两种错误: 拒读错误(错误码位置已知),是一个没有扫到或者无法译码的符号字符,需要一 ...

  7. QR code 二维码基础入门教程

    QR code 二维码基础入门教程 本文为 QR Code Tutorial: Introduction 的总结,详细内容请查看原文 Introduction History and Informat ...

  8. android设置webview缓存目录,Android基础入门教程——7.5.5 WebView缓存问题

    Android基础入门教程--7.5.5 WebView缓存问题 Android基础入门教程 本节引言:现在很多门户类信息网站,比如虎嗅,ifanr,钛媒体等等的APP,简单点说是信息阅读类的APP, ...

  9. Python新手基础入门教程:如何像面试官展示你的coding能力

    Python新手基础入门教程:如何像面试官展示你的coding能力 作为程序员,我们通常在面试的时候面试官会出一些题来让你解决,作为优秀的程秀员,除了需要具备解决问题的思路以外,代码的质量也很关键.因 ...

最新文章

  1. python字符串按长度分割_python 按照固定长度分割字符串的方法小结
  2. DNS(二)--正反解析及主从配置
  3. js去除字符串数字前面的0_JS基础复习-七种基本数据类型
  4. 第一部分 Mysql的基础
  5. 浏览器快捷键指南_快速但完整的IndexedDB指南以及在浏览器中存储数据
  6. 解决:Error response from daemon: Cannot restart container xxx: driver failed programming external
  7. 修改Windows 2003/2008/2012远程桌面服务端口号
  8. Java面试题300道
  9. Android中微信抢红包插件原理解析和开发实现
  10. 开源的看板管理工具Wekan
  11. 行为决策学入门书籍推荐《别做正常的傻瓜》
  12. Unity:骨骼动画
  13. 程序员七夕表白攻略:原来数学才是世界上最浪漫的学科!
  14. Netty第二章 2020 7-4——Netty在Dubbo中的应用(3)关于单一长连接
  15. WPF DataGrid MVVM 绑定 SelectedCells
  16. 人大金仓数据库:睡在甲骨文卧榻之侧
  17. upnp 播放器 android,基于Android系统的UPNP媒体播放器的研究与实现
  18. c语言如何乘分数,C语言分数相乘程序简化问题。
  19. DenseNet——密集连接的卷积神经网络
  20. 如何在win10系统上使用旧版本的IE浏览器

热门文章

  1. 一篇有关LightSwitch部署的指南,来自 Beth Massi 博客
  2. SM2SM4实现字符串和文件的数字信封(小文件不支持超过堆内存的大文件)
  3. 一个字节的网络漫游故事独白
  4. 基于UIAutomation+Python+Unittest+Beautifulreport的WindowsGUI自动化测试框架common目录解析
  5. 2022.2.28总结
  6. win10中关于用粘滞键留下后门
  7. CAD与ANSYS面域的创建与导入
  8. 做steam/csgo搬砖靠谱吗?
  9. Linux每日背诵(七)
  10. 小度关灯Blinker