pdftools

pdftools is an R package dedicated to working with PDF files.

pdf_text()

pdf_text() returns the text of a PDF as a character vector, with one element per page.
> # For example
> library(pdftools)
> a <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # Number of pages
> length(a)
[1] 23
> # Look at the first page; formulas don't seem to be extracted
> a[1]



From there, just extract whatever information you need from the PDF.
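In this post the right page is found by eye. If you would rather locate a table programmatically, a minimal base-R sketch is to search every page for a keyword. It assumes `pages` is the character vector returned by `pdf_text()`; the helper name `find_pages()` and the two-page toy vector are hypothetical:

```r
# Return the indices of the pages whose text contains `pattern`.
# `pages` is assumed to be a character vector with one element per page,
# as returned by pdftools::pdf_text().
find_pages <- function(pages, pattern, fixed = TRUE) {
  which(grepl(pattern, pages, fixed = fixed))
}

# Toy stand-in for pdf_text() output (hypothetical data)
pages <- c("SUPPLEMENTAL METHODS\nPatients in training dataset\n",
           "SUPPLEMENTAL TABLES\nTable S1. EMC-92 gene signature.\n")
find_pages(pages, "Table S1")
```

With the real file, `find_pages(b, "Table S1")` should include page 10, but it may also return other pages if the methods text cites "Table S1".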

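> # Another example: extract the gene symbols on page 10
> library(stringr)  # str_split(), plus the %>% pipe it re-exports
> b <- pdf_text("41375_2012_BFleu2012127_MOESM29_ESM.pdf")
> # Number of pages
> length(b)
[1] 23
> # Look at the first page; formulas don't seem to be extracted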
> b[1]
[1] "                                      SUPPLEMENTAL METHODS\n\nPatients in training dataset\nThe HOVON-65/GMMG-HD4 randomized clinical trial (ISRCTN64455289) consists of newly diagnosed,\ntransplant-eligible patients with multiple myeloma. Patients were randomly assigned to either bortezomib based\ntreatment or vincristine based treatment. Vincristine based treatment: three cycles of induction treatment with\nvincristine 0.4 mg intravenously on days 1-4, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone\n40 mg orally on days 1-4, 9-12, and 17-20; bortezomib based treatment: bortezomib 1.3 mg/m² intravenously on\ndays 1, 4, 8, and 11, doxorubicin 9 mg/m² intravenously on days 1-4, and dexamethasone 40 mg orally on days 1-4,\n9-12, and 17-20. Stem-cells were mobilized by use of cyclophosphamide 1000 mg/m² intravenously on day 1,\ndoxorubicin 15 mg/m² intravenously on days 1-4, dexamethasone 40 mg orally on days 1-4, and granulocyte colony-\nstimulating factor (filgrastim) 10 μg/kg per day subcutaneously, divided in two doses per day, from day 5 until last\nstem cell collection. A minimum of 2.5 × 106 CD34+ cells per transplantation procedure was required. After\ninduction therapy, patients received one (HOVON-65) or two (GMMG-HD4) cycles of high-dose melphalan (200\nmg/m² intravenously) with autologous stem-cell rescue followed by maintenance treatment with thalidomide (50 mg\nper day orally; group assigned to vincristine-based induction treatment) or bortezomib (1.3 mg/m² intravenously\nonce every 2 weeks; group assigned to bortezomib-based induction treatment) for 2 years. Treatment was not\nmasked for physicians and patients (see Figure S1).\nInformed consent to treatment protocols and sample procurement was obtained for all cases included in this study, in\naccordance with the Declaration of Helsinki. 
Use of diagnostic tumour material was approved by the institutional\nreview board of the Erasmus Medical Centre.\n\nPatients in validation datasets\nUAMS-TT2 is a randomized trial in which patients received thalidomide during all treatment phases (UAMS-TT2;\nn=351; GSE2658; NCT00573391).1 UAMS-TT3 is a similar regimen with the addition of bortezomib to the\nthalidomide arm (UAMS-TT3; n=142; E-TABM-1138; NCT00081939).2 The MRC-IX trial (n=247; GSE15695;\nISRCTN68454111) included both transplant-eligible and non-transplant-eligible newly diagnosed patients. For\ntransplant-eligible patients treatment consisted of induction high-dose therapy while non-transplant-eligible patients\nwere treated initially with either thalidomide or melphalan. Maintenance for both age classes was a comparison of\nthalidomide vs. no thalidomide.3, 4 The trial and dataset denoted here as APEX consisted of the three trials APEX,\nSUMMIT and CREST (n=264; GSE9782; registered under M34100-024, M34100-025 and\nNCT00049478/NCT00048230).5-8 The APEX trial included patients with relapsed myeloma who received either\nbortezomib or high-dose dexamethasone, with the possibility to cross-over to receive bortezomib after disease\nprogression.5 In the SUMMIT trial patients received bortezomib. In patients with a suboptimal response, oral\ndexamethasone was added to the regimen.8 The CREST trial included relapsed or refractory patients who received\nbortezomib. Dexamethasone was permitted in patients with progressive or stable disease. 
7\n\nSurvival signature\nThe MAS5 normalized, log2 transformed and mean-variance scaled HOVON-65/GMMG-HD4 dataset was used as a\ntraining set for building a GEP based survival classifier.9, 10 The model was built using a Supervised Principal\nComponent Analysis (SPCA) framework.11 This technique is widely used in biological settings.12-19 The underlying\nassumption is the existence of a high-risk group which can be separated from a standard-risk group on the basis of\nprogression free survival. A Principal Component Analysis (PCA) is a rotation of a n  m centered feature space\nX in such a way that the largest variance in the data is projected on the top principal components. 20 This rotation\ncan be described by a m  m rotation matrix R pca\n\n                                                    X rot  XR pca\n\nX rot is rotated in such a way that the first principal component (PC) is the axis that points in the direction\nexhibiting the largest variance. Every subsequent PC is perpendicular to all previous while capturing as much as\npossible of the remaining variance. SPCA is a PCA whereby the feature space has undergone a selection X sel . In\nthis study the initial selection is based on selecting the top  probe sets that were ranked by a univariate Cox\nproportional hazard regression. This will result in high variance due to survival so it is likely survival is projected\nonto the top  PC's on which a Cox proportional hazard regression is applied. This yields regression coefficients β.\nThe resulting model can be summarized as:\n\n\n                                                          1\n"
> # Extract page 10
> b[10]  # messy!
[1] "                                      SUPPLEMENTAL TABLES\n\nTable S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient\n(beta)\n                              Weighting         Symbol\nRank   Probes                                                   GO-term/description1\n                           coefficient (beta)\n  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling\n  2    239054_at               -0.1088          SFMBT1          regulation of transcription\n  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane\n  4    208747_s_at             -0.0874          C1S             proteolysis\n  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation\n  6    214482_at                0.0861          ZBTB25          transcription\n  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling\n  8    217728_at                0.0773          S100A6          signal transduction\n  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly\n  10   225601_at                0.0750          HMGB3           multicellular organismal development\n  11   207618_s_at              0.0746          BCS1L           mitochondrion organization\n  12   231989_s_at              0.0730          LOC100271836    ---\n  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division\n  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion\n  15   238116_at                0.0661          DYNLRB2         microtubule-based movement\n  16   226218_at               -0.0644          IL7R            regulation of DNA recombination\n  17   202842_s_at             -0.0626          DNAJB9          
protein folding\n  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport\n  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade\n  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion\n  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent\n  22   209683_at               -0.0561          FAM49A          ---\n  23   219550_at                0.0559          ROBO3           axon guidance\n  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane\n  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter\n  26   212282_at                0.0530          TMEM97          cholesterol homeostasis\n  27   238780_s_at             -0.0529          EST/ BX647543   ---\n  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter\n  29   221041_s_at             -0.0520          SLC17A5         anion transport\n  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process\n  31   214612_x_at              0.0496          MAGEA6          ---\n  32   208232_x_at             -0.0493          ---             ---\n  33   238662_at                0.0490          ATPBD4          ---\n  34   206204_at                0.0477          GRB14           signal transduction\n  35   233437_at                0.0446          GABRA4          transport\n  36   200875_s_at              0.0437          NOP56           rRNA processing\n  37   38158_at                 0.0423          ESPL1           apoptosis\n  38   217548_at               -0.0423          C15orf38        ---\n  39   220351_at                0.0420          CCRL1           chemotaxis\n  40   213002_at               -0.0418          MARCKS          actin filament crosslinking\n  41   
243018_at                0.0407          EST/BE568408    ---\n  42   221755_at                0.0396          EHBP1L1         ---\n  43   208667_s_at             -0.0390          ST13            protein folding\n  44   212055_at                0.0384          C18orf10        cytoskeleton\n  45   201292_at               -0.0372          TOP2A           DNA ligation\n  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process\n  47   214150_x_at             -0.0349          ATP6V0E1        proton transport\n  48   226742_at               -0.0345          SAR1B           transport\n  49   215181_at               -0.0342          CDH22           cell adhesion\n  50   208904_s_at             -0.0334          RPS28           rRNA processing\n\n                                                         10\n"
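`b[10]` looks messy because `print()` shows the string with its escape sequences. A minimal sketch, using a hypothetical toy string: `cat()` writes the real line breaks so the table layout becomes readable, and base R's `strsplit()` can split a page into lines without stringr. One difference matters when you later drop lines by position: `strsplit()` discards the empty string after a final "\n", whereas `stringr::str_split()` keeps it, which is why the split of page 10 ends in an empty element.

```r
# Hypothetical page text ending in "\n", like the pages returned by pdf_text()
page <- "SUPPLEMENTAL TABLES\n\nTable S1. EMC-92 gene signature.\n"

# cat() renders "\n" as real line breaks instead of printing the escapes
cat(page)

# Base-R line split: the empty line in the middle is kept,
# but the final "\n" does not produce a trailing ""
lines_base <- strsplit(page, "\n", fixed = TRUE)[[1]]
lines_base
# stringr::str_split(page, "\n")[[1]] would end with an extra "" element
```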
> # Split the page into lines on the "\n" separator
> b[10] %>% str_split("\n")
[[1]]
[1] "                                      SUPPLEMENTAL TABLES"
[2] ""
[3] "Table S1. EMC-92 gene signature. Probe sets are ordered by decreasing magnitude of weighting coefficient"
[4] "(beta)"
[5] "                              Weighting         Symbol"
[6] "Rank   Probes                                                   GO-term/description1"
[7] "                           coefficient (beta)"
[8] "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling"
[9] "  2    239054_at               -0.1088          SFMBT1          regulation of transcription"
[10] "  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane"
[11] "  4    208747_s_at             -0.0874          C1S             proteolysis"
[12] "  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation"
[13] "  6    214482_at                0.0861          ZBTB25          transcription"
[14] "  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling"
[15] "  8    217728_at                0.0773          S100A6          signal transduction"
[16] "  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly"
[17] "  10   225601_at                0.0750          HMGB3           multicellular organismal development"
[18] "  11   207618_s_at              0.0746          BCS1L           mitochondrion organization"
[19] "  12   231989_s_at              0.0730          LOC100271836    ---"
[20] "  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division"
[21] "  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion"
[22] "  15   238116_at                0.0661          DYNLRB2         microtubule-based movement"
[23] "  16   226218_at               -0.0644          IL7R            regulation of DNA recombination"
[24] "  17   202842_s_at             -0.0626          DNAJB9          protein folding"
[25] "  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport"
[26] "  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade"
[27] "  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion"
[28] "  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent"
[29] "  22   209683_at               -0.0561          FAM49A          ---"
[30] "  23   219550_at                0.0559          ROBO3           axon guidance"
[31] "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane"
[32] "  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter"
[33] "  26   212282_at                0.0530          TMEM97          cholesterol homeostasis"
[34] "  27   238780_s_at             -0.0529          EST/ BX647543   ---"
[35] "  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter"
[36] "  29   221041_s_at             -0.0520          SLC17A5         anion transport"
[37] "  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process"
[38] "  31   214612_x_at              0.0496          MAGEA6          ---"
[39] "  32   208232_x_at             -0.0493          ---             ---"
[40] "  33   238662_at                0.0490          ATPBD4          ---"
[41] "  34   206204_at                0.0477          GRB14           signal transduction"
[42] "  35   233437_at                0.0446          GABRA4          transport"
[43] "  36   200875_s_at              0.0437          NOP56           rRNA processing"
[44] "  37   38158_at                 0.0423          ESPL1           apoptosis"
[45] "  38   217548_at               -0.0423          C15orf38        ---"
[46] "  39   220351_at                0.0420          CCRL1           chemotaxis"
[47] "  40   213002_at               -0.0418          MARCKS          actin filament crosslinking"
[48] "  41   243018_at                0.0407          EST/BE568408    ---"
[49] "  42   221755_at                0.0396          EHBP1L1         ---"
[50] "  43   208667_s_at             -0.0390          ST13            protein folding"
[51] "  44   212055_at                0.0384          C18orf10        cytoskeleton"
[52] "  45   201292_at               -0.0372          TOP2A           DNA ligation"
[53] "  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process"
[54] "  47   214150_x_at             -0.0349          ATP6V0E1        proton transport"
[55] "  48   226742_at               -0.0345          SAR1B           transport"
[56] "  49   215181_at               -0.0342          CDH22           cell adhesion"
[57] "  50   208904_s_at             -0.0334          RPS28           rRNA processing"
[58] ""
[59] "                                                         10"
[60] ""
> # Much tidier; now drop the 7 header lines and the trailing blank/page-number lines
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)]
[1] "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling"
[2] "  2    239054_at               -0.1088          SFMBT1          regulation of transcription"
[3] "  3    208942_s_at             -0.0997          SEC62           cotranslational protein targeting to membrane"
[4] "  4    208747_s_at             -0.0874          C1S             proteolysis"
[5] "  5    202542_s_at              0.0870          AIMP1           negative regulation of endothelial cell proliferation"
[6] "  6    214482_at                0.0861          ZBTB25          transcription"
[7] "  7    228416_at               -0.0778          ACVR2A          transmembrane receptor protein serine/threonine kinase signaling"
[8] "  8    217728_at                0.0773          S100A6          signal transduction"
[9] "  9    215177_s_at             -0.0768          ITGA6           cell-substrate junction assembly"
[10] "  10   225601_at                0.0750          HMGB3           multicellular organismal development"
[11] "  11   207618_s_at              0.0746          BCS1L           mitochondrion organization"
[12] "  12   231989_s_at              0.0730          LOC100271836    ---"
[13] "  13   202884_s_at              0.0714          PPP2R1B         control of cell growth and division"
[14] "  14   231738_at                0.0686          PCDHB7          calcium-dependent cell-cell adhesion"
[15] "  15   238116_at                0.0661          DYNLRB2         microtubule-based movement"
[16] "  16   226218_at               -0.0644          IL7R            regulation of DNA recombination"
[17] "  17   202842_s_at             -0.0626          DNAJB9          protein folding"
[18] "  18   208732_at               -0.0618          RAB2A           ER to Golgi vesicle-mediated transport"
[19] "  19   204379_s_at              0.0594          FGFR3           MAPKKK cascade"
[20] "  20   242180_at               -0.0585          TSPAN16         cellular activation and adhesion"
[21] "  21   216473_x_at             -0.0576          DUX4            regulation of transcription, DNA-dependent"
[22] "  22   209683_at               -0.0561          FAM49A          ---"
[23] "  23   219550_at                0.0559          ROBO3           axon guidance"
[24] "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane"
[25] "  25   202813_at                0.0548          TARBP1          regulation of transcription from RNA polymerase II promoter"
[26] "  26   212282_at                0.0530          TMEM97          cholesterol homeostasis"
[27] "  27   238780_s_at             -0.0529          EST/ BX647543   ---"
[28] "  28   M97935_MA_at2            0.0525          STAT1           transcription from RNA polymerase II promoter"
[29] "  29   221041_s_at             -0.0520          SLC17A5         anion transport"
[30] "  30   224009_x_at             -0.0520          DHRS9           androgen metabolic process"
[31] "  31   214612_x_at              0.0496          MAGEA6          ---"
[32] "  32   208232_x_at             -0.0493          ---             ---"
[33] "  33   238662_at                0.0490          ATPBD4          ---"
[34] "  34   206204_at                0.0477          GRB14           signal transduction"
[35] "  35   233437_at                0.0446          GABRA4          transport"
[36] "  36   200875_s_at              0.0437          NOP56           rRNA processing"
[37] "  37   38158_at                 0.0423          ESPL1           apoptosis"
[38] "  38   217548_at               -0.0423          C15orf38        ---"
[39] "  39   220351_at                0.0420          CCRL1           chemotaxis"
[40] "  40   213002_at               -0.0418          MARCKS          actin filament crosslinking"
[41] "  41   243018_at                0.0407          EST/BE568408    ---"
[42] "  42   221755_at                0.0396          EHBP1L1         ---"
[43] "  43   208667_s_at             -0.0390          ST13            protein folding"
[44] "  44   212055_at                0.0384          C18orf10        cytoskeleton"
[45] "  45   201292_at               -0.0372          TOP2A           DNA ligation"
[46] "  46   201102_s_at              0.0349          PFKL            fructose 6-phosphate metabolic process"
[47] "  47   214150_x_at             -0.0349          ATP6V0E1        proton transport"
[48] "  48   226742_at               -0.0345          SAR1B           transport"
[49] "  49   215181_at               -0.0342          CDH22           cell adhesion"
[50] "  50   208904_s_at             -0.0334          RPS28           rRNA processing"
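The fixed indices `-c(1:7)` and `-c(51:53)` only fit this particular page. A minimal sketch of a more layout-tolerant alternative: keep only the lines that start with a rank number, then split each one on runs of two or more spaces, so that multi-word GO terms and entries such as "SUN1 / GET4" stay in one field. It assumes every table row has exactly five columns separated by at least two spaces, with only single spaces inside a column; that holds for the rows shown here but is not guaranteed for other PDFs. The function name `parse_signature_table()` is hypothetical:

```r
# Parse the gene-signature rows of one pdf_text() page into a data frame.
# Assumption: every table row starts with a rank number, and its five
# columns (rank, probe set, beta, symbol, GO term) are separated by
# two or more spaces, while spaces inside a column are single.
parse_signature_table <- function(page) {
  lines <- strsplit(page, "\n", fixed = TRUE)[[1]]
  # Keep lines that begin with a rank number followed by 2+ spaces;
  # this drops the title, the column headers and the page-number footer
  rows <- grep("^\\s*\\d+\\s{2,}\\S", lines, value = TRUE)
  fields <- strsplit(trimws(rows), "\\s{2,}")
  stopifnot(all(lengths(fields) == 5))  # fail loudly if the layout differs
  field <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    rank   = as.integer(field(1)),
    probe  = field(2),
    beta   = as.numeric(field(3)),
    symbol = field(4),
    go     = field(5),
    stringsAsFactors = FALSE
  )
}

# Three rows copied from the page-10 output above, plus header and footer
page <- paste0(
  "                                      SUPPLEMENTAL TABLES\n\n",
  "Rank   Probes                                                   GO-term/description1\n",
  "  1    202728_s_at             -0.1105          LTBP1           negative regulation of TGFbeta receptor signaling\n",
  "  24   223811_s_at              0.0556          SUN1 / GET4     cytoskeletal anchoring at nuclear membrane\n",
  "  32   208232_x_at             -0.0493          ---             ---\n\n",
  "                                                         10\n"
)
tab <- parse_signature_table(page)
tab$symbol
```

On the real page, `parse_signature_table(b[10])$symbol` should return the 50 symbols, including placeholders such as `---` that you may want to drop. Splitting on two or more spaces, rather than on single spaces, avoids the many empty strings that a plain `" "` split produces.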
> # Cleaner still; now split each line on single spaces (every extra space becomes an empty string "")
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ")
[[1]]
[1] ""            ""            "1"           ""            ""            ""
[7] "202728_s_at" ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "-0.1105"     ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "LTBP1"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            "negative"    "regulation"
[43] "of"          "TGFbeta"     "receptor"    "signaling"

[[2]]
[1] ""              ""              "2"             ""              ""
[6] ""              "239054_at"     ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] ""              "-0.1088"       ""              ""              ""
[26] ""              ""              ""              ""              ""
[31] ""              "SFMBT1"        ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              "regulation"    "of"            "transcription"

[[3]]
[1] ""                ""                "3"               ""
[5] ""                ""                "208942_s_at"     ""
[9] ""                ""                ""                ""
[13] ""                ""                ""                ""
[17] ""                ""                ""                "-0.0997"
[21] ""                ""                ""                ""
[25] ""                ""                ""                ""
[29] ""                "SEC62"           ""                ""
[33] ""                ""                ""                ""
[37] ""                ""                ""                ""
[41] "cotranslational" "protein"         "targeting"       "to"
[45] "membrane"

[[4]]
[1] ""            ""            "4"           ""            ""            ""
[7] "208747_s_at" ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "-0.0874"     ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "C1S"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            ""            ""
[43] "proteolysis"

[[5]]
[1] ""              ""              "5"             ""              ""
[6] ""              "202542_s_at"   ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] "0.0870"        ""              ""              ""              ""
[26] ""              ""              ""              ""              ""
[31] "AIMP1"         ""              ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              "negative"      "regulation"    "of"            "endothelial"
[46] "cell"          "proliferation"

[[6]]
[1] ""              ""              "6"             ""              ""
[6] ""              "214482_at"     ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] ""              ""              "0.0861"        ""              ""
[26] ""              ""              ""              ""              ""
[31] ""              ""              "ZBTB25"        ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              ""              "transcription"

[[7]]
[1] ""                 ""                 "7"                ""
[5] ""                 ""                 "228416_at"        ""
[9] ""                 ""                 ""                 ""
[13] ""                 ""                 ""                 ""
[17] ""                 ""                 ""                 ""
[21] ""                 "-0.0778"          ""                 ""
[25] ""                 ""                 ""                 ""
[29] ""                 ""                 ""                 "ACVR2A"
[33] ""                 ""                 ""                 ""
[37] ""                 ""                 ""                 ""
[41] ""                 "transmembrane"    "receptor"         "protein"
[45] "serine/threonine" "kinase"           "signaling"

[[8]]
[1] ""             ""             "8"            ""             ""
[6] ""             "217728_at"    ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             ""
[21] ""             ""             "0.0773"       ""             ""
[26] ""             ""             ""             ""             ""
[31] ""             ""             "S100A6"       ""             ""
[36] ""             ""             ""             ""             ""
[41] ""             ""             "signal"       "transduction"

[[9]]
[1] ""               ""               "9"              ""               ""
[6] ""               "215177_s_at"    ""               ""               ""
[11] ""               ""               ""               ""               ""
[16] ""               ""               ""               ""               "-0.0768"
[21] ""               ""               ""               ""               ""
[26] ""               ""               ""               ""               "ITGA6"
[31] ""               ""               ""               ""               ""
[36] ""               ""               ""               ""               ""
[41] "cell-substrate" "junction"       "assembly"

[[10]]
[1] ""              ""              "10"            ""              ""
[6] "225601_at"     ""              ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] ""              "0.0750"        ""              ""              ""
[26] ""              ""              ""              ""              ""
[31] ""              "HMGB3"         ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              ""              "multicellular" "organismal"    "development"

[[11]]
[1] ""              ""              "11"            ""              ""
[6] "207618_s_at"   ""              ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              "0.0746"
[21] ""              ""              ""              ""              ""
[26] ""              ""              ""              ""              "BCS1L"
[31] ""              ""              ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] "mitochondrion" "organization"

[[12]]
[1] ""             ""             "12"           ""             ""
[6] "231989_s_at"  ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             "0.0730"
[21] ""             ""             ""             ""             ""
[26] ""             ""             ""             ""             "LOC100271836"
[31] ""             ""             ""             "---"

[[13]]
[1] ""            ""            "13"          ""            ""            "202884_s_at"
[7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "0.0714"      ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "PPP2R1B"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            "control"     "of"          "cell"        "growth"
[43] "and"         "division"

[[14]]
[1] ""                  ""                  "14"                ""
[5] ""                  "231738_at"         ""                  ""
[9] ""                  ""                  ""                  ""
[13] ""                  ""                  ""                  ""
[17] ""                  ""                  ""                  ""
[21] ""                  "0.0686"            ""                  ""
[25] ""                  ""                  ""                  ""
[29] ""                  ""                  ""                  "PCDHB7"
[33] ""                  ""                  ""                  ""
[37] ""                  ""                  ""                  ""
[41] ""                  "calcium-dependent" "cell-cell"         "adhesion"

[[15]]
[1] ""                  ""                  "15"                ""
[5] ""                  "238116_at"         ""                  ""
[9] ""                  ""                  ""                  ""
[13] ""                  ""                  ""                  ""
[17] ""                  ""                  ""                  ""
[21] ""                  "0.0661"            ""                  ""
[25] ""                  ""                  ""                  ""
[29] ""                  ""                  ""                  "DYNLRB2"
[33] ""                  ""                  ""                  ""
[37] ""                  ""                  ""                  ""
[41] "microtubule-based" "movement"

[[16]]
[1] ""              ""              "16"            ""              ""
[6] "226218_at"     ""              ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] "-0.0644"       ""              ""              ""              ""
[26] ""              ""              ""              ""              ""
[31] "IL7R"          ""              ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              ""              "regulation"    "of"            "DNA"
[46] "recombination"

[[17]]
 [1] ""            ""            "17"          ""            ""            "202842_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0626"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "DNAJB9"      ""
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            "protein"     "folding"

[[18]]
 [1] ""                 ""                 "18"               ""
 [5] ""                 "208732_at"        ""                 ""
 [9] ""                 ""                 ""                 ""
[13] ""                 ""                 ""                 ""
[17] ""                 ""                 ""                 ""
[21] "-0.0618"          ""                 ""                 ""
[25] ""                 ""                 ""                 ""
[29] ""                 ""                 "RAB2A"            ""
[33] ""                 ""                 ""                 ""
[37] ""                 ""                 ""                 ""
[41] ""                 "ER"               "to"               "Golgi"
[45] "vesicle-mediated" "transport"

[[19]]
 [1] ""            ""            "19"          ""            ""            "204379_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "0.0594"      ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "FGFR3"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            "MAPKKK"      "cascade"

[[20]]
 [1] ""           ""           "20"         ""           ""           "242180_at"
 [7] ""           ""           ""           ""           ""           ""
[13] ""           ""           ""           ""           ""           ""
[19] ""           ""           "-0.0585"    ""           ""           ""
[25] ""           ""           ""           ""           ""           ""
[31] "TSPAN16"    ""           ""           ""           ""           ""
[37] ""           ""           ""           "cellular"   "activation" "and"
[43] "adhesion"

[[21]]
 [1] ""               ""               "21"             ""               ""
 [6] "216473_x_at"    ""               ""               ""               ""
[11] ""               ""               ""               ""               ""
[16] ""               ""               ""               "-0.0576"        ""
[21] ""               ""               ""               ""               ""
[26] ""               ""               ""               "DUX4"           ""
[31] ""               ""               ""               ""               ""
[36] ""               ""               ""               ""               ""
[41] "regulation"     "of"             "transcription," "DNA-dependent"

[[22]]
 [1] ""          ""          "22"        ""          ""          "209683_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          "-0.0561"
[22] ""          ""          ""          ""          ""          ""          ""
[29] ""          ""          "FAM49A"    ""          ""          ""          ""
[36] ""          ""          ""          ""          ""          "---"

[[23]]
 [1] ""          ""          "23"        ""          ""          "219550_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          ""
[22] "0.0559"    ""          ""          ""          ""          ""          ""
[29] ""          ""          ""          "ROBO3"     ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          ""
[43] "axon"      "guidance"

[[24]]
 [1] ""             ""             "24"           ""             ""
 [6] "223811_s_at"  ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             "0.0556"
[21] ""             ""             ""             ""             ""
[26] ""             ""             ""             ""             "SUN1"
[31] "/"            "GET4"         ""             ""             ""
[36] ""             "cytoskeletal" "anchoring"    "at"           "nuclear"
[41] "membrane"

[[25]]
 [1] ""              ""              "25"            ""              ""
 [6] "202813_at"     ""              ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              ""              ""              ""
[21] ""              "0.0548"        ""              ""              ""
[26] ""              ""              ""              ""              ""
[31] ""              "TARBP1"        ""              ""              ""
[36] ""              ""              ""              ""              ""
[41] ""              "regulation"    "of"            "transcription" "from"
[46] "RNA"           "polymerase"    "II"            "promoter"

[[26]]
 [1] ""            ""            "26"          ""            ""            "212282_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            ""            ""            "0.0530"      ""            ""
[25] ""            ""            ""            ""            ""            ""
[31] ""            "TMEM97"      ""            ""            ""            ""
[37] ""            ""            ""            ""            ""            "cholesterol"
[43] "homeostasis"

[[27]]
 [1] ""            ""            "27"          ""            ""            "238780_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0529"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "EST/"        "BX647543"
[31] ""            ""            "---"

[[28]]
 [1] ""              ""              "28"            ""              ""
 [6] "M97935_MA_at2" ""              ""              ""              ""
[11] ""              ""              ""              ""              ""
[16] ""              ""              "0.0525"        ""              ""
[21] ""              ""              ""              ""              ""
[26] ""              ""              "STAT1"         ""              ""
[31] ""              ""              ""              ""              ""
[36] ""              ""              ""              "transcription" "from"
[41] "RNA"           "polymerase"    "II"            "promoter"

[[29]]
 [1] ""            ""            "29"          ""            ""            "221041_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0520"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "SLC17A5"     ""
[31] ""            ""            ""            ""            ""            ""
[37] ""            "anion"       "transport"

[[30]]
 [1] ""            ""            "30"          ""            ""            "224009_x_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0520"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "DHRS9"       ""
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            "androgen"    "metabolic"   "process"

[[31]]
 [1] ""            ""            "31"          ""            ""            "214612_x_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "0.0496"      ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "MAGEA6"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            "---"

[[32]]
 [1] ""            ""            "32"          ""            ""            "208232_x_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0493"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "---"         ""
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            ""            "---"

[[33]]
 [1] ""          ""          "33"        ""          ""          "238662_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          ""
[22] "0.0490"    ""          ""          ""          ""          ""          ""
[29] ""          ""          ""          "ATPBD4"    ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          "---"

[[34]]
 [1] ""             ""             "34"           ""             ""
 [6] "206204_at"    ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             ""
[21] ""             "0.0477"       ""             ""             ""
[26] ""             ""             ""             ""             ""
[31] ""             "GRB14"        ""             ""             ""
[36] ""             ""             ""             ""             ""
[41] ""             ""             "signal"       "transduction"

[[35]]
 [1] ""          ""          "35"        ""          ""          "233437_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          ""
[22] "0.0446"    ""          ""          ""          ""          ""          ""
[29] ""          ""          ""          "GABRA4"    ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          "transport"

[[36]]
 [1] ""            ""            "36"          ""            ""            "200875_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "0.0437"      ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "NOP56"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            "rRNA"        "processing"

[[37]]
 [1] ""          ""          "37"        ""          ""          "38158_at"  ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          ""
[22] ""          "0.0423"    ""          ""          ""          ""          ""
[29] ""          ""          ""          ""          "ESPL1"     ""          ""
[36] ""          ""          ""          ""          ""          ""          ""
[43] ""          "apoptosis"

[[38]]
 [1] ""          ""          "38"        ""          ""          "217548_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          "-0.0423"
[22] ""          ""          ""          ""          ""          ""          ""
[29] ""          ""          "C15orf38"  ""          ""          ""          ""
[36] ""          ""          ""          "---"

[[39]]
 [1] ""           ""           "39"         ""           ""           "220351_at"
 [7] ""           ""           ""           ""           ""           ""
[13] ""           ""           ""           ""           ""           ""
[19] ""           ""           ""           "0.0420"     ""           ""
[25] ""           ""           ""           ""           ""           ""
[31] ""           "CCRL1"      ""           ""           ""           ""
[37] ""           ""           ""           ""           ""           ""
[43] "chemotaxis"

[[40]]
 [1] ""             ""             "40"           ""             ""
 [6] "213002_at"    ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             ""
[21] "-0.0418"      ""             ""             ""             ""
[26] ""             ""             ""             ""             ""
[31] "MARCKS"       ""             ""             ""             ""
[36] ""             ""             ""             ""             ""
[41] "actin"        "filament"     "crosslinking"

[[41]]
 [1] ""             ""             "41"           ""             ""
 [6] "243018_at"    ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             ""
[21] ""             "0.0407"       ""             ""             ""
[26] ""             ""             ""             ""             ""
[31] ""             "EST/BE568408" ""             ""             ""
[36] "---"

[[42]]
 [1] ""          ""          "42"        ""          ""          "221755_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          ""
[22] "0.0396"    ""          ""          ""          ""          ""          ""
[29] ""          ""          ""          "EHBP1L1"   ""          ""          ""
[36] ""          ""          ""          ""          ""          "---"

[[43]]
 [1] ""            ""            "43"          ""            ""            "208667_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0390"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "ST13"        ""
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            "protein"     "folding"

[[44]]
 [1] ""             ""             "44"           ""             ""
 [6] "212055_at"    ""             ""             ""             ""
[11] ""             ""             ""             ""             ""
[16] ""             ""             ""             ""             ""
[21] ""             "0.0384"       ""             ""             ""
[26] ""             ""             ""             ""             ""
[31] ""             "C18orf10"     ""             ""             ""
[36] ""             ""             ""             ""             "cytoskeleton"

[[45]]
 [1] ""          ""          "45"        ""          ""          "201292_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          "-0.0372"
[22] ""          ""          ""          ""          ""          ""          ""
[29] ""          ""          "TOP2A"     ""          ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          "DNA"
[43] "ligation"

[[46]]
 [1] ""            ""            "46"          ""            ""            "201102_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] ""            "0.0349"      ""            ""            ""            ""
[25] ""            ""            ""            ""            ""            "PFKL"
[31] ""            ""            ""            ""            ""            ""
[37] ""            ""            ""            ""            ""            "fructose"
[43] "6-phosphate" "metabolic"   "process"

[[47]]
 [1] ""            ""            "47"          ""            ""            "214150_x_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0349"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "ATP6V0E1"    ""
[31] ""            ""            ""            ""            ""            ""
[37] "proton"      "transport"

[[48]]
 [1] ""          ""          "48"        ""          ""          "226742_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          "-0.0345"
[22] ""          ""          ""          ""          ""          ""          ""
[29] ""          ""          "SAR1B"     ""          ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          "transport"

[[49]]
 [1] ""          ""          "49"        ""          ""          "215181_at" ""
 [8] ""          ""          ""          ""          ""          ""          ""
[15] ""          ""          ""          ""          ""          ""          "-0.0342"
[22] ""          ""          ""          ""          ""          ""          ""
[29] ""          ""          "CDH22"     ""          ""          ""          ""
[36] ""          ""          ""          ""          ""          ""          "cell"
[43] "adhesion"

[[50]]
 [1] ""            ""            "50"          ""            ""            "208904_s_at"
 [7] ""            ""            ""            ""            ""            ""
[13] ""            ""            ""            ""            ""            ""
[19] "-0.0334"     ""            ""            ""            ""            ""
[25] ""            ""            ""            ""            "RPS28"       ""
[31] ""            ""            ""            ""            ""            ""
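[37] ""            ""            ""            "rRNA"        "processing"

> # All that's left is to drop the empty strings "" and pick out the gene symbol. A for loop or lapply() both work; the idea is the same.
> # gene <- character(0)
> # for (i in 1:50) {
> #   gene_name <- b2 %>% .[[i]] %>% .[. != ""] %>% .[4]  # the gene name is the 4th non-empty field here; adjust this for other data
> #   gene <- c(gene, gene_name)
> # }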
> b[10] %>% str_split("\n") %>% .[[1]] %>% .[-c(1:7)] %>% .[-c(51:53)] %>% str_split(" ") %>% lapply(\(x){x[x!=""]%>%.[4]})
[[1]]
[1] "LTBP1"

[[2]]
[1] "SFMBT1"

[[3]]
[1] "SEC62"

[[4]]
[1] "C1S"

[[5]]
[1] "AIMP1"

[[6]]
[1] "ZBTB25"

[[7]]
[1] "ACVR2A"

[[8]]
[1] "S100A6"

[[9]]
[1] "ITGA6"

[[10]]
[1] "HMGB3"

[[11]]
[1] "BCS1L"

[[12]]
[1] "LOC100271836"

[[13]]
[1] "PPP2R1B"

[[14]]
[1] "PCDHB7"

[[15]]
[1] "DYNLRB2"

[[16]]
[1] "IL7R"

[[17]]
[1] "DNAJB9"

[[18]]
[1] "RAB2A"

[[19]]
[1] "FGFR3"

[[20]]
[1] "TSPAN16"

[[21]]
[1] "DUX4"

[[22]]
[1] "FAM49A"

[[23]]
[1] "ROBO3"

[[24]]
[1] "SUN1"

[[25]]
[1] "TARBP1"

[[26]]
[1] "TMEM97"

[[27]]
[1] "EST/"

[[28]]
[1] "STAT1"

[[29]]
[1] "SLC17A5"

[[30]]
[1] "DHRS9"

[[31]]
[1] "MAGEA6"

[[32]]
[1] "---"

[[33]]
[1] "ATPBD4"

[[34]]
[1] "GRB14"

[[35]]
[1] "GABRA4"

[[36]]
[1] "NOP56"

[[37]]
[1] "ESPL1"

[[38]]
[1] "C15orf38"

[[39]]
[1] "CCRL1"

[[40]]
[1] "MARCKS"

[[41]]
[1] "EST/BE568408"

[[42]]
[1] "EHBP1L1"

[[43]]
[1] "ST13"

[[44]]
[1] "C18orf10"

[[45]]
[1] "TOP2A"

[[46]]
[1] "PFKL"

[[47]]
[1] "ATP6V0E1"

[[48]]
[1] "SAR1B"

[[49]]
[1] "CDH22"

[[50]]
[1] "RPS28"
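The pipeline above drops the header and footer lines by hard-coded position (`.[-c(1:7)]`, `.[-c(51:53)]`) and returns a list. Below is a minimal base-R sketch of the same idea that instead keeps only lines starting with a rank number and returns a plain character vector via `vapply()`. The helper name `extract_field()` is hypothetical, and the toy `page` string is an assumption: its header line is made up, and the three rows are copied from the output above. Like the original, it takes the 4th whitespace-separated token, so a symbol that itself contains a space (item 27, `"EST/"`) is still truncated.

```r
# Toy stand-in for one page of pdf_text() output: a made-up header line
# followed by three table rows copied from the output above
page <- paste(
  "Rank   Probe set    Weight   Gene      GO term",
  "14     231738_at    0.0686   PCDHB7    calcium-dependent cell-cell adhesion",
  "15     238116_at    0.0661   DYNLRB2   microtubule-based movement",
  "16     226218_at    -0.0644  IL7R      regulation of DNA recombination",
  sep = "\n"
)

# Hypothetical helper: return the `field`-th token of every table row
extract_field <- function(page, field = 4) {
  lines <- strsplit(page, "\n", fixed = TRUE)[[1]]
  # keep only rows that begin with a rank number, instead of dropping
  # header/footer lines by hard-coded position
  rows <- lines[grepl("^\\s*[0-9]+\\s", lines)]
  # split on runs of whitespace, so no empty strings need removing
  tokens <- strsplit(trimws(rows), "\\s+")
  # vapply() gives a plain character vector rather than a list
  vapply(tokens, function(x) x[field], character(1))
}

genes <- extract_field(page)
genes  # c("PCDHB7", "DYNLRB2", "IL7R")
```

On the real page this would presumably be `extract_field(b[10])`, though the row filter may need adjusting if any header or footer line on that page also begins with a number.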

pdf_data()

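pdf_data() returns a list with one data frame per page. Each row is one word, with its position and size on the page (x, y, width, height), whether a space follows it (space), and the word itself (text).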
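Because pdf_data() keeps each word's position, a table column can be picked out by its x coordinate instead of by counting tokens. A minimal sketch, assuming one page of pdf_data() output: the `words` data frame below is hypothetical (only x, y and text are filled in, and the coordinates are invented), and the 220 to 300 bounds of the gene column are an assumption you would read off the real data.

```r
# Hypothetical stand-in for one page of pdf_data() output;
# the coordinates are invented for illustration
words <- data.frame(
  x    = c(60, 95, 170, 230, 60, 95, 170, 230),
  y    = c(100, 100, 100, 100, 112, 112, 112, 112),
  text = c("14", "231738_at", "0.0686", "PCDHB7",
           "15", "238116_at", "0.0661", "DYNLRB2"),
  stringsAsFactors = FALSE
)

# Keep the words whose left edge falls inside the assumed gene column
gene_col <- words[words$x >= 220 & words$x < 300, ]

# Order the selected words by their vertical position on the page
genes <- gene_col$text[order(gene_col$y)]
genes  # c("PCDHB7", "DYNLRB2")
```

With the real file, `words` would be something like `pdf_data("41375_2012_BFleu2012127_MOESM29_ESM.pdf")[[10]]`.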

pdf_render_page()

Renders a page into a raw bitmap array for further processing in R.
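The post does not show pdf_render_page() in use, so here is a minimal, untested-by-the-author sketch. To stay self-contained it first writes a throwaway one-page PDF with base R's `pdf()` device; the temp file and the 100 dpi setting are arbitrary choices. Saving the bitmap with the png package is shown only as a comment, assuming that package is installed.

```r
library(pdftools)

# Build a throwaway one-page PDF with base R's pdf() device so the example
# does not depend on an external file
tmp_pdf <- tempfile(fileext = ".pdf")
pdf(tmp_pdf, width = 4, height = 3)
plot.new()
text(0.5, 0.5, "PCDHB7")
invisible(dev.off())

# Render page 1 at 100 dpi; with the default numeric = FALSE the result is
# a raw (byte-valued) 3-d array of pixels
bitmap <- pdf_render_page(tmp_pdf, page = 1, dpi = 100)
dim(bitmap)

# One way to save it, if the png package is installed:
# png::writePNG(bitmap, "page.png")
```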

pdf_convert()

High-quality conversion of PDF page(s) to PNG, JPEG, or TIFF format.
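Likewise, a minimal sketch of pdf_convert(), again on a throwaway two-page PDF made with base R's `pdf()` device. The output file names and the 150 dpi setting are arbitrary; `filenames` needs one entry per page listed in `pages`. JPEG or TIFF output would use `format = "jpeg"` or `format = "tiff"`, assuming the local poppler build supports them.

```r
library(pdftools)

# Throwaway two-page PDF made with base R's pdf() device
tmp_pdf <- tempfile(fileext = ".pdf")
pdf(tmp_pdf, width = 4, height = 3)
plot.new(); text(0.5, 0.5, "page 1")
plot.new(); text(0.5, 0.5, "page 2")
invisible(dev.off())

# Convert both pages to 150-dpi PNG files in the temp directory
out <- file.path(tempdir(), c("page_1.png", "page_2.png"))
files <- pdf_convert(tmp_pdf, format = "png", pages = 1:2,
                     filenames = out, dpi = 150, verbose = FALSE)
files
```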

I haven't used these functions yet; I'll come back and write about them once I have.
