A Python and R code comparison of the 10 most commonly used machine learning algorithms:

① Linear Regression

② Logistic Regression

③ Decision Tree

④ Support Vector Machine (SVM)

⑤ Naive Bayes

⑥ k-Nearest Neighbors (kNN)

⑦ k-Means

⑧ Random Forest

⑨ Principal Component Analysis (PCA)

⑩ Gradient Boosting

  1. GBM
  2. XGBoost
  3. LightGBM
  4. CatBoost

1. Linear Regression

Linear regression comes in two main types: simple linear regression and multiple linear regression. Simple linear regression uses a single independent variable, while multiple linear regression (as the name suggests) uses more than one. When searching for the best-fit line, you can also fit a polynomial or a curve; these are known as polynomial or curvilinear regression.
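
The polynomial (curvilinear) case mentioned above can be sketched in a few lines; the data below is synthetic, generated purely for illustration:

```python
import numpy as np

# synthetic data following y = 2x^2 + 3x + 1, plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2 * x**2 + 3 * x + 1 + rng.normal(0, 0.5, size=x.shape)

# fit a degree-2 polynomial (curvilinear regression) to the points
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

# the fitted coefficients land close to the true (2, 3, 1)
print('Recovered coefficients:', np.round(coeffs, 2))
```

The same idea carries over to scikit-learn by expanding the features with `PolynomialFeatures` and then fitting a plain `LinearRegression` on the expanded matrix.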

1) Python code

# importing required libraries
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# read the train and test dataset
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')
print(train_data.head())

# shape of the dataset
print('\nShape of training data :', train_data.shape)
print('\nShape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Item_Outlet_Sales

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Item_Outlet_Sales'], axis=1)
train_y = train_data['Item_Outlet_Sales']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Item_Outlet_Sales'], axis=1)
test_y = test_data['Item_Outlet_Sales']

'''
Create the object of the Linear Regression model
You can also add other parameters and test your code here
Some parameters are : fit_intercept and normalize
Documentation of sklearn LinearRegression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
'''
model = LinearRegression()

# fit the model with the training data
model.fit(train_x, train_y)

# coefficients of the trained model
print('\nCoefficient of model :', model.coef_)

# intercept of the model
print('\nIntercept of model', model.intercept_)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nItem_Outlet_Sales on training data', predict_train)

# Root Mean Squared Error on training dataset
rmse_train = mean_squared_error(train_y, predict_train)**(0.5)
print('\nRMSE on train dataset : ', rmse_train)

# predict the target on the testing dataset
predict_test = model.predict(test_x)
print('\nItem_Outlet_Sales on test data', predict_test)

# Root Mean Squared Error on testing dataset
rmse_test = mean_squared_error(test_y, predict_test)**(0.5)
print('\nRMSE on test dataset : ', rmse_test)

2) R code

# Load Train and Test datasets
# Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train,y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
# Predict Output
predicted <- predict(linear, x_test)

2. Logistic Regression

Don't be confused by its name! It is a classification algorithm, not a regression one. It is used to estimate discrete values (binary values such as 0/1, yes/no, true/false) from a given set of independent variables. In simple terms, it predicts the probability of an event occurring by fitting the data to a logit function, which is why it is also known as logit regression. Since it predicts probabilities, its output values lie between 0 and 1 (as expected).

Again, let's try to understand this through a simple example.

Suppose your friend gives you a puzzle to solve. There are only two outcome scenarios: either you solve it or you don't. Now imagine you are being given a wide range of puzzles and quizzes in an attempt to understand which subjects you are good at. The outcome of this study would be something like this: given a tenth-grade trigonometry problem, you have a 70% chance of solving it; given a fifth-grade history question, the probability of getting the answer right is only 30%. This is what logistic regression provides.
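
The "fitting data to a logit function" described above comes down to the sigmoid p = 1 / (1 + e^(-z)), which squashes any linear score z into a probability between 0 and 1. A minimal sketch (the scores here are illustrative, not from a real model):

```python
import math

def sigmoid(z):
    """Map a linear score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# a positive score maps above 0.5, a negative one below,
# mirroring the 70% / 30% puzzle example above
print(round(sigmoid(0.85), 2))   # about 0.70
print(round(sigmoid(-0.85), 2))  # about 0.30
print(round(sigmoid(0.0), 2))    # exactly 0.50
```

A logistic regression model learns the weights that produce the score z from the input features; the sigmoid then turns that score into the predicted probability.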

1) Python code

# importing required libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
print(train_data.head())

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the Logistic Regression model
You can also add other parameters and test your code here
Some parameters are : fit_intercept and penalty
Documentation of sklearn LogisticRegression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
'''
model = LogisticRegression()

# fit the model with the training data
model.fit(train_x, train_y)

# coefficients of the trained model
print('Coefficient of model :', model.coef_)

# intercept of the model
print('Intercept of model', model.intercept_)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

2) R code

x <- cbind(x_train,y_train)
# Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
# Predict Output (type = "response" returns probabilities rather than log-odds)
predicted <- predict(logistic, x_test, type = "response")

3. Decision Tree

1) Python code

# importing required libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the Decision Tree model
You can also add other parameters and test your code here
Some parameters are : max_depth and max_features
Documentation of sklearn DecisionTreeClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
'''
model = DecisionTreeClassifier()

# fit the model with the training data
model.fit(train_x, train_y)

# depth of the decision tree
print('Depth of the Decision Tree :', model.get_depth())

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

2) R code

library(rpart)
x <- cbind(x_train,y_train)
# grow tree
fit <- rpart(y_train ~ ., data = x, method = "class")
summary(fit)
# Predict Output (type = "class" returns class labels rather than probabilities)
predicted <- predict(fit, x_test, type = "class")

4. Support Vector Machine (SVM)

1) Python code

# importing required libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the Support Vector Classifier model
You can also add other parameters and test your code here
Some parameters are : kernel and degree
Documentation of sklearn Support Vector Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
'''
model = SVC()

# fit the model with the training data
model.fit(train_x, train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)
2) R code

library(e1071)
x <- cbind(x_train,y_train)
# Fitting model
fit <- svm(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)

5. Naive Bayes

1) Python code

# importing required libraries
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the Naive Bayes model
You can also add other parameters and test your code here
Some parameters are : var_smoothing
Documentation of sklearn GaussianNB: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
'''
model = GaussianNB()

# fit the model with the training data
model.fit(train_x, train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)
2) R code

library(e1071)
x <- cbind(x_train,y_train)
# Fitting model
fit <- naiveBayes(y_train ~ ., data = x)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)

6. k-Nearest Neighbors (kNN)

1) Python code

# importing required libraries
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the K-Nearest Neighbor model
You can also add other parameters and test your code here
Some parameters are : n_neighbors, leaf_size
Documentation of sklearn K-Neighbors Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
'''
model = KNeighborsClassifier()

# fit the model with the training data
model.fit(train_x, train_y)

# number of neighbors used to predict the target
print('\nThe number of neighbors used to predict the target : ', model.n_neighbors)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('accuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('Target on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('accuracy_score on test dataset : ', accuracy_test)

2) R code

# knn() lives in the 'class' package and takes the train and test matrices
# plus the training labels directly, rather than a formula
library(class)
# Fit the model and predict in one step
predicted <- knn(train = x_train, test = x_test, cl = y_train, k = 5)
summary(predicted)

7. k-Means

1) Python code

# importing required libraries
import pandas as pd
from sklearn.cluster import KMeans

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to divide the training data into different clusters
# and predict in which cluster a particular data point belongs.

'''
Create the object of the K-Means model
You can also add other parameters and test your code here
Some parameters are : n_clusters and max_iter
Documentation of sklearn KMeans: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
'''
model = KMeans()

# fit the model with the training data
model.fit(train_data)

# number of clusters
print('\nDefault number of Clusters : ', model.n_clusters)

# predict the clusters on the train dataset
predict_train = model.predict(train_data)
print('\nClusters on train data', predict_train)

# predict the clusters on the test dataset
predict_test = model.predict(test_data)
print('Clusters on test data', predict_test)

# Now, we will train a model with n_clusters = 3
model_n3 = KMeans(n_clusters=3)

# fit the model with the training data
model_n3.fit(train_data)

# number of clusters
print('\nNumber of Clusters : ', model_n3.n_clusters)

# predict the clusters on the train dataset
predict_train_3 = model_n3.predict(train_data)
print('\nClusters on train data', predict_train_3)

# predict the clusters on the test dataset
predict_test_3 = model_n3.predict(test_data)
print('Clusters on test data', predict_test_3)

2) R code

library(cluster)
fit <- kmeans(X, 3) # 3-cluster solution

8. Random Forest

1) Python code

# importing required libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# view the top 3 rows of the dataset
print(train_data.head(3))

# shape of the dataset
print('\nShape of training data :', train_data.shape)
print('\nShape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the Random Forest model
You can also add other parameters and test your code here
Some parameters are : n_estimators and max_depth
Documentation of sklearn RandomForestClassifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
'''
model = RandomForestClassifier()

# fit the model with the training data
model.fit(train_x, train_y)

# number of trees used
print('Number of Trees used : ', model.n_estimators)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('\nTarget on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)

2) R code

library(randomForest)
x <- cbind(x_train,y_train)
# Fitting model (the response column created by cbind is y_train)
fit <- randomForest(y_train ~ ., data = x, ntree = 500)
summary(fit)
# Predict Output
predicted <- predict(fit, x_test)

9. Principal Component Analysis (PCA)

1) Python code

# importing required libraries
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# read the train and test dataset
train_data = pd.read_csv('train.csv')
test_data = pd.read_csv('test.csv')

# view the top 3 rows of the dataset
print(train_data.head(3))

# shape of the dataset
print('\nShape of training data :', train_data.shape)
print('\nShape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Item_Outlet_Sales

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Item_Outlet_Sales'], axis=1)
train_y = train_data['Item_Outlet_Sales']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Item_Outlet_Sales'], axis=1)
test_y = test_data['Item_Outlet_Sales']

print('\nTraining model with {} dimensions.'.format(train_x.shape[1]))

# create object of model
model = LinearRegression()

# fit the model with the training data
model.fit(train_x, train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)

# Root Mean Squared Error on train dataset
rmse_train = mean_squared_error(train_y, predict_train)**(0.5)
print('\nRMSE on train dataset : ', rmse_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)

# Root Mean Squared Error on test dataset
rmse_test = mean_squared_error(test_y, predict_test)**(0.5)
print('\nRMSE on test dataset : ', rmse_test)

# create the object of the PCA (Principal Component Analysis) model
# reduce the dimensions of the data to 12
'''
You can also add other parameters and test your code here
Some parameters are : svd_solver, iterated_power
Documentation of sklearn PCA: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
'''
model_pca = PCA(n_components=12)

new_train = model_pca.fit_transform(train_x)
# apply the projection learned on the training data to the test data
# (calling fit_transform again would fit a different projection)
new_test = model_pca.transform(test_x)

print('\nTraining model with {} dimensions.'.format(new_train.shape[1]))

# create object of model
model_new = LinearRegression()

# fit the model with the training data
model_new.fit(new_train, train_y)

# predict the target on the new train dataset
predict_train_pca = model_new.predict(new_train)

# Root Mean Squared Error on train dataset
rmse_train_pca = mean_squared_error(train_y, predict_train_pca)**(0.5)
print('\nRMSE on new train dataset : ', rmse_train_pca)

# predict the target on the new test dataset
predict_test_pca = model_new.predict(new_test)

# Root Mean Squared Error on test dataset
rmse_test_pca = mean_squared_error(test_y, predict_test_pca)**(0.5)
print('\nRMSE on new test dataset : ', rmse_test_pca)

2) R code

library(stats)
pca <- princomp(train, cor = TRUE)
train_reduced  <- predict(pca,train)
test_reduced  <- predict(pca,test)

10. Gradient Boosting

  1. GBM

1) Python code

# importing required libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the GradientBoosting Classifier model
You can also add other parameters and test your code here
Some parameters are : learning_rate, n_estimators
Documentation of sklearn GradientBoosting Classifier: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
'''
model = GradientBoostingClassifier(n_estimators=100, max_depth=5)

# fit the model with the training data
model.fit(train_x, train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('\nTarget on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)

2) R code

library(caret)
x <- cbind(x_train,y_train)
# Fitting model
fitControl <- trainControl(method = "repeatedcv", number = 4, repeats = 4)
fit <- train(y_train ~ ., data = x, method = "gbm", trControl = fitControl, verbose = FALSE)
# Predict Output (probability of the second class)
predicted <- predict(fit, x_test, type = "prob")[,2]
  2. XGBoost

1) Python code

# importing required libraries
import pandas as pd
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :', train_data.shape)
print('Shape of testing data :', test_data.shape)

# Now, we need to predict the missing target variable in the test data
# target variable - Survived

# separate the independent and target variable on training data
train_x = train_data.drop(columns=['Survived'], axis=1)
train_y = train_data['Survived']

# separate the independent and target variable on testing data
test_x = test_data.drop(columns=['Survived'], axis=1)
test_y = test_data['Survived']

'''
Create the object of the XGBoost model
You can also add other parameters and test your code here
Some parameters are : max_depth and n_estimators
Documentation of xgboost: https://xgboost.readthedocs.io/en/latest/
'''
model = XGBClassifier()

# fit the model with the training data
model.fit(train_x, train_y)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('\nTarget on train data', predict_train)

# Accuracy Score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)

# predict the target on the test dataset
predict_test = model.predict(test_x)
print('\nTarget on test data', predict_test)

# Accuracy Score on test dataset
accuracy_test = accuracy_score(test_y, predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)

2) R code

require(caret)
x <- cbind(x_train,y_train)
# Fitting model
TrainControl <- trainControl(method = "repeatedcv", number = 10, repeats = 4)
model <- train(y_train ~ ., data = x, method = "xgbLinear", trControl = TrainControl, verbose = FALSE)
# or, with the tree booster:
# model <- train(y_train ~ ., data = x, method = "xgbTree", trControl = TrainControl, verbose = FALSE)
predicted <- predict(model, x_test)
  3. LightGBM

1) Python code

# importing required libraries
import numpy as np
import lightgbm as lgb

data = np.random.rand(500, 10)  # 500 entities, each contains 10 features
label = np.random.randint(2, size=500)  # binary target
train_data = lgb.Dataset(data, label=label)
test_data = train_data.create_valid('test.svm')

param = {'num_leaves': 31, 'num_trees': 100, 'objective': 'binary'}
param['metric'] = 'auc'

num_round = 10
bst = lgb.train(param, train_data, num_round, valid_sets=[test_data])
bst.save_model('model.txt')

# 7 entities, each contains 10 features
data = np.random.rand(7, 10)
ypred = bst.predict(data)

2) R code

library(RLightGBM)
data(example.binary)

# Parameters
num_iterations <- 100
config <- list(objective = "binary", metric = "binary_logloss,auc",
               learning_rate = 0.1, num_leaves = 63, tree_learner = "serial",
               feature_fraction = 0.8, bagging_freq = 5, bagging_fraction = 0.8,
               min_data_in_leaf = 50, min_sum_hessian_in_leaf = 5.0)

# Create data handle and booster
handle.data <- lgbm.data.create(x)
lgbm.data.setField(handle.data, "label", y)
handle.booster <- lgbm.booster.create(handle.data, lapply(config, as.character))

# Train for num_iterations iterations and eval every 5 steps
lgbm.booster.train(handle.booster, num_iterations, 5)

# Predict
pred <- lgbm.booster.predict(handle.booster, x.test)

# Test accuracy
sum(y.test == (pred > 0.5)) / length(y.test)

# Save model (can be loaded again via lgbm.booster.load(filename))
lgbm.booster.save(handle.booster, filename = "/tmp/model.txt")

# Alternatively, RLightGBM also plugs into caret:
require(caret)
require(RLightGBM)
data(iris)

model <- caretModel.LGBM()
fit <- train(Species ~ ., data = iris, method = model, verbosity = 0)
print(fit)
y.pred <- predict(fit, iris[,1:4])

# Sparse-matrix variant
library(Matrix)
model.sparse <- caretModel.LGBM.sparse()

# Generate a sparse matrix
mat <- Matrix(as.matrix(iris[,1:4]), sparse = T)
fit <- train(data.frame(idx = 1:nrow(iris)), iris$Species, method = model.sparse, matrix = mat, verbosity = 0)
print(fit)
  4. CatBoost

1) Python code

# importing required libraries
import pandas as pd
import numpy as np
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split

# Read training and testing files
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Imputing missing values for both train and test
train.fillna(-999, inplace=True)
test.fillna(-999, inplace=True)

# Creating a training set for modeling and validation set to check model performance
X = train.drop(['Item_Outlet_Sales'], axis=1)
y = train.Item_Outlet_Sales
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.7, random_state=1234)

# indices of the non-numeric (categorical) columns
categorical_features_indices = np.where(X.dtypes != float)[0]

# Building model
model = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE')
model.fit(X_train, y_train, cat_features=categorical_features_indices, eval_set=(X_validation, y_validation), plot=True)

submission = pd.DataFrame()
submission['Item_Identifier'] = test['Item_Identifier']
submission['Outlet_Identifier'] = test['Outlet_Identifier']
submission['Item_Outlet_Sales'] = model.predict(test)

2) R code

set.seed(1)
require(titanic)
require(caret)
require(catboost)

tt <- titanic::titanic_train[complete.cases(titanic::titanic_train),]
data <- as.data.frame(as.matrix(tt), stringsAsFactors = TRUE)

drop_columns <- c("PassengerId", "Survived", "Name", "Ticket", "Cabin")
x <- data[,!(names(data) %in% drop_columns)]
y <- data[,c("Survived")]

fit_control <- trainControl(method = "cv", number = 4, classProbs = TRUE)
grid <- expand.grid(depth = c(4, 6, 8), learning_rate = 0.1, iterations = 100,
                    l2_leaf_reg = 1e-3, rsm = 0.95, border_count = 64)

report <- train(x, as.factor(make.names(y)), method = catboost.caret,
                verbose = TRUE, preProc = NULL, tuneGrid = grid, trControl = fit_control)
print(report)

importance <- varImp(report, scale = FALSE)
print(importance)
