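The project had a requirement: users paste a block of text directly into a rich-text editor on the front end, and that text may contain several URLs. The back end needs to find each URL in the text and wrap it in an <a> tag, so that users can click the link when the text is displayed. It took a long search to get this working, so here is the code.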

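     // Wrap every URL in an <a> tag. `message` is the string to process.
     Matcher matcher = Patterns.WEB_URL.matcher(message);
     StringBuffer sb = new StringBuffer();
     while (matcher.find()) {
         String url = matcher.group();
         // Prepend a scheme only when the matched URL does not already have one.
         String href = url.matches("(?i)(?:https?|rtsp)://.*") ? url : "http://" + url;
         // Double-quoted attributes: WEB_URL paths may contain a single quote.
         String link = "<a target=\"_blank\" href=\"" + href + "\">" + url + "</a>";
         // Rewrite each match in place so that repeated or overlapping URLs are not wrapped twice.
         matcher.appendReplacement(sb, Matcher.quoteReplacement(link));
     }
     matcher.appendTail(sb);
     message = sb.toString();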

The Patterns class is taken from the android.util package.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Commonly used regular expression patterns.
 */
public class Patterns {
    /**
     *  Regular expression to match all IANA top-level domains.
     *  List accurate as of 2011/07/18.  List taken from:
     *  http://data.iana.org/TLD/tlds-alpha-by-domain.txt
     *  This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py
     *
     *  @deprecated Due to the recent proliferation of gTLDs, this API is
     *  expected to become out-of-date very quickly. Therefore it is now
     *  deprecated.
     */
    @Deprecated
    public static final String TOP_LEVEL_DOMAIN_STR =
        "((aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
        + "|(biz|b[abdefghijmnorstvwyz])"
        + "|(cat|com|coop|c[acdfghiklmnoruvxyz])"
        + "|d[ejkmoz]"
        + "|(edu|e[cegrstu])"
        + "|f[ijkmor]"
        + "|(gov|g[abdefghilmnpqrstuwy])"
        + "|h[kmnrtu]"
        + "|(info|int|i[delmnoqrst])"
        + "|(jobs|j[emop])"
        + "|k[eghimnprwyz]"
        + "|l[abcikrstuvy]"
        + "|(mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])"
        + "|(name|net|n[acefgilopruz])"
        + "|(org|om)"
        + "|(pro|p[aefghklmnrstwy])"
        + "|qa"
        + "|r[eosuw]"
        + "|s[abcdeghijklmnortuvyz]"
        + "|(tel|travel|t[cdfghjklmnoprtvwz])"
        + "|u[agksyz]"
        + "|v[aceginu]"
        + "|w[fs]" +
"|(\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\\-\\-0zwm56d|xn\\-\\-11b5bs3a9aj6g|xn\\-\\-3e0b707e|xn\\-\\-45brj9c|xn\\-\\-80akhbyknj4f|xn\\-\\-90a3ac|xn\\-\\-9t4b11yi5a|xn\\-\\-clchc0ea0b2g2a9gcd|xn\\-\\-deba0ad|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-g6w251d|xn\\-\\-gecrj9c|xn\\-\\-h2brj9c|xn\\-\\-hgbk6aj7f53bba|xn\\-\\-hlcj6aya9esc7a|xn\\-\\-j6w193g|xn\\-\\-jxalpdlp|xn\\-\\-kgbechtv|xn\\-\\-kprw13d|xn\\-\\-kpry57d|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1ai|xn\\-\\-pgbs0dh|xn\\-\\-s9brj9c|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-yfro4i67o|xn\\-\\-ygbi2ammx|xn\\-\\-zckzah|xxx)"+ "|y[et]"+ "|z[amw])";/***  Regular expression pattern to match all IANA top-level domains.*  
     *  @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}.
     */
    @Deprecated
    public static final Pattern TOP_LEVEL_DOMAIN =
        Pattern.compile(TOP_LEVEL_DOMAIN_STR);

    /**
     *  Regular expression to match all IANA top-level domains for WEB_URL.
     *  List accurate as of 2011/07/18.  List taken from:
     *  http://data.iana.org/TLD/tlds-alpha-by-domain.txt
     *  This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py
     *
     *  @deprecated This API is deprecated. See {@link #TOP_LEVEL_DOMAIN_STR}.
     */
    @Deprecated
    public static final String TOP_LEVEL_DOMAIN_STR_FOR_WEB_URL =
        "(?:"
        + "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
        + "|(?:biz|b[abdefghijmnorstvwyz])"
        + "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])"
        + "|d[ejkmoz]"
        + "|(?:edu|e[cegrstu])"
        + "|f[ijkmor]"
        + "|(?:gov|g[abdefghilmnpqrstuwy])"
        + "|h[kmnrtu]"
        + "|(?:info|int|i[delmnoqrst])"
        + "|(?:jobs|j[emop])"
        + "|k[eghimnprwyz]"
        + "|l[abcikrstuvy]"
        + "|(?:mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])"
        + "|(?:name|net|n[acefgilopruz])"
        + "|(?:org|om)"
        + "|(?:pro|p[aefghklmnrstwy])"
        + "|qa"
        + "|r[eosuw]"
        + "|s[abcdeghijklmnortuvyz]"
        + "|(?:tel|travel|t[cdfghjklmnoprtvwz])"
        + "|u[agksyz]"
        + "|v[aceginu]"
        + "|w[fs]" +
"|(?:\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\\-\\-0zwm56d|xn\\-\\-11b5bs3a9aj6g|xn\\-\\-3e0b707e|xn\\-\\-45brj9c|xn\\-\\-80akhbyknj4f|xn\\-\\-90a3ac|xn\\-\\-9t4b11yi5a|xn\\-\\-clchc0ea0b2g2a9gcd|xn\\-\\-deba0ad|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-g6w251d|xn\\-\\-gecrj9c|xn\\-\\-h2brj9c|xn\\-\\-hgbk6aj7f53bba|xn\\-\\-hlcj6aya9esc7a|xn\\-\\-j6w193g|xn\\-\\-jxalpdlp|xn\\-\\-kgbechtv|xn\\-\\-kprw13d|xn\\-\\-kpry57d|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1ai|xn\\-\\-pgbs0dh|xn\\-\\-s9brj9c|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-yfro4i67o|xn\\-\\-ygbi2ammx|xn\\-\\-zckzah|xxx)"+ "|y[et]"+ "|z[amw]))";/***  Regular expression to match all IANA top-level domains.**  List 
accurate as of 2015/11/24.  List taken from:*  http://data.iana.org/TLD/tlds-alpha-by-domain.txt*  This pattern is auto-generated by frameworks/ex/common/tools/make-iana-tld-pattern.py**  @hide*/static final String IANA_TOP_LEVEL_DOMAINS ="(?:"+ "(?:aaa|aarp|abb|abbott|abogado|academy|accenture|accountant|accountants|aco|active"+ "|actor|ads|adult|aeg|aero|afl|agency|aig|airforce|airtel|allfinanz|alsace|amica|amsterdam"+ "|android|apartments|app|apple|aquarelle|aramco|archi|army|arpa|arte|asia|associates"+ "|attorney|auction|audio|auto|autos|axa|azure|a[cdefgilmoqrstuwxz])"+ "|(?:band|bank|bar|barcelona|barclaycard|barclays|bargains|bauhaus|bayern|bbc|bbva"+ "|bcn|beats|beer|bentley|berlin|best|bet|bharti|bible|bid|bike|bing|bingo|bio|biz|black"+ "|blackfriday|bloomberg|blue|bms|bmw|bnl|bnpparibas|boats|bom|bond|boo|boots|boutique"+ "|bradesco|bridgestone|broadway|broker|brother|brussels|budapest|build|builders|business"+ "|buzz|bzh|b[abdefghijmnorstvwyz])"+ "|(?:cab|cafe|cal|camera|camp|cancerresearch|canon|capetown|capital|car|caravan|cards"+ "|care|career|careers|cars|cartier|casa|cash|casino|cat|catering|cba|cbn|ceb|center|ceo"+ "|cern|cfa|cfd|chanel|channel|chat|cheap|chloe|christmas|chrome|church|cipriani|cisco"+ "|citic|city|cityeats|claims|cleaning|click|clinic|clothing|cloud|club|clubmed|coach"+ "|codes|coffee|college|cologne|com|commbank|community|company|computer|comsec|condos"+ "|construction|consulting|contractors|cooking|cool|coop|corsica|country|coupons|courses"+ "|credit|creditcard|creditunion|cricket|crown|crs|cruises|csc|cuisinella|cymru|cyou|c[acdfghiklmnoruvwxyz])"+ "|(?:dabur|dad|dance|date|dating|datsun|day|dclk|deals|degree|delivery|dell|delta"+ "|democrat|dental|dentist|desi|design|dev|diamonds|diet|digital|direct|directory|discount"+ "|dnp|docs|dog|doha|domains|doosan|download|drive|durban|dvag|d[ejkmoz])"+ "|(?:earth|eat|edu|education|email|emerck|energy|engineer|engineering|enterprises"+ 
"|epson|equipment|erni|esq|estate|eurovision|eus|events|everbank|exchange|expert|exposed"+ "|express|e[cegrstu])"+ "|(?:fage|fail|fairwinds|faith|family|fan|fans|farm|fashion|feedback|ferrero|film"+ "|final|finance|financial|firmdale|fish|fishing|fit|fitness|flights|florist|flowers|flsmidth"+ "|fly|foo|football|forex|forsale|forum|foundation|frl|frogans|fund|furniture|futbol|fyi"+ "|f[ijkmor])"+ "|(?:gal|gallery|game|garden|gbiz|gdn|gea|gent|genting|ggee|gift|gifts|gives|giving"+ "|glass|gle|global|globo|gmail|gmo|gmx|gold|goldpoint|golf|goo|goog|google|gop|gov|grainger"+ "|graphics|gratis|green|gripe|group|gucci|guge|guide|guitars|guru|g[abdefghilmnpqrstuwy])"+ "|(?:hamburg|hangout|haus|healthcare|help|here|hermes|hiphop|hitachi|hiv|hockey|holdings"+ "|holiday|homedepot|homes|honda|horse|host|hosting|hoteles|hotmail|house|how|hsbc|hyundai"+ "|h[kmnrtu])"+ "|(?:ibm|icbc|ice|icu|ifm|iinet|immo|immobilien|industries|infiniti|info|ing|ink|institute"+ "|insure|int|international|investments|ipiranga|irish|ist|istanbul|itau|iwc|i[delmnoqrst])"+ "|(?:jaguar|java|jcb|jetzt|jewelry|jlc|jll|jobs|joburg|jprs|juegos|j[emop])"+ "|(?:kaufen|kddi|kia|kim|kinder|kitchen|kiwi|koeln|komatsu|krd|kred|kyoto|k[eghimnprwyz])"+ "|(?:lacaixa|lancaster|land|landrover|lasalle|lat|latrobe|law|lawyer|lds|lease|leclerc"+ "|legal|lexus|lgbt|liaison|lidl|life|lifestyle|lighting|limited|limo|linde|link|live"+ "|lixil|loan|loans|lol|london|lotte|lotto|love|ltd|ltda|lupin|luxe|luxury|l[abcikrstuvy])"+ "|(?:madrid|maif|maison|man|management|mango|market|marketing|markets|marriott|mba"+ "|media|meet|melbourne|meme|memorial|men|menu|meo|miami|microsoft|mil|mini|mma|mobi|moda"+ "|moe|moi|mom|monash|money|montblanc|mormon|mortgage|moscow|motorcycles|mov|movie|movistar"+ "|mtn|mtpc|mtr|museum|mutuelle|m[acdeghklmnopqrstuvwxyz])"+ "|(?:nadex|nagoya|name|navy|nec|net|netbank|network|neustar|new|news|nexus|ngo|nhk"+ "|nico|ninja|nissan|nokia|nra|nrw|ntt|nyc|n[acefgilopruz])"+ 
"|(?:obi|office|okinawa|omega|one|ong|onl|online|ooo|oracle|orange|org|organic|osaka"+ "|otsuka|ovh|om)"+ "|(?:page|panerai|paris|partners|parts|party|pet|pharmacy|philips|photo|photography"+ "|photos|physio|piaget|pics|pictet|pictures|ping|pink|pizza|place|play|playstation|plumbing"+ "|plus|pohl|poker|porn|post|praxi|press|pro|prod|productions|prof|properties|property"+ "|protection|pub|p[aefghklmnrstwy])"+ "|(?:qpon|quebec|qa)"+ "|(?:racing|realtor|realty|recipes|red|redstone|rehab|reise|reisen|reit|ren|rent|rentals"+ "|repair|report|republican|rest|restaurant|review|reviews|rich|ricoh|rio|rip|rocher|rocks"+ "|rodeo|rsvp|ruhr|run|rwe|ryukyu|r[eosuw])"+ "|(?:saarland|sakura|sale|samsung|sandvik|sandvikcoromant|sanofi|sap|sapo|sarl|saxo"+ "|sbs|sca|scb|schmidt|scholarships|school|schule|schwarz|science|scor|scot|seat|security"+ "|seek|sener|services|seven|sew|sex|sexy|shiksha|shoes|show|shriram|singles|site|ski"+ "|sky|skype|sncf|soccer|social|software|sohu|solar|solutions|sony|soy|space|spiegel|spreadbetting"+ "|srl|stada|starhub|statoil|stc|stcgroup|stockholm|studio|study|style|sucks|supplies"+ "|supply|support|surf|surgery|suzuki|swatch|swiss|sydney|systems|s[abcdeghijklmnortuvxyz])"+ "|(?:tab|taipei|tatamotors|tatar|tattoo|tax|taxi|team|tech|technology|tel|telefonica"+ "|temasek|tennis|thd|theater|theatre|tickets|tienda|tips|tires|tirol|today|tokyo|tools"+ "|top|toray|toshiba|tours|town|toyota|toys|trade|trading|training|travel|trust|tui|t[cdfghjklmnortvwz])"+ "|(?:ubs|university|uno|uol|u[agksyz])"+ "|(?:vacations|vana|vegas|ventures|versicherung|vet|viajes|video|villas|vin|virgin"+ "|vision|vista|vistaprint|viva|vlaanderen|vodka|vote|voting|voto|voyage|v[aceginu])"+ "|(?:wales|walter|wang|watch|webcam|website|wed|wedding|weir|whoswho|wien|wiki|williamhill"+ "|win|windows|wine|wme|work|works|world|wtc|wtf|w[fs])"+ "|(?:\u03b5\u03bb|\u0431\u0435\u043b|\u0434\u0435\u0442\u0438|\u043a\u043e\u043c|\u043c\u043a\u0434"+ 
"|\u043c\u043e\u043d|\u043c\u043e\u0441\u043a\u0432\u0430|\u043e\u043d\u043b\u0430\u0439\u043d"+ "|\u043e\u0440\u0433|\u0440\u0443\u0441|\u0440\u0444|\u0441\u0430\u0439\u0442|\u0441\u0440\u0431"+ "|\u0443\u043a\u0440|\u049b\u0430\u0437|\u0570\u0561\u0575|\u05e7\u05d5\u05dd|\u0627\u0631\u0627\u0645\u0643\u0648"+ "|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629"+ "|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0627\u06cc\u0631\u0627\u0646"+ "|\u0628\u0627\u0632\u0627\u0631|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633"+ "|\u0633\u0648\u062f\u0627\u0646|\u0633\u0648\u0631\u064a\u0629|\u0634\u0628\u0643\u0629"+ "|\u0639\u0631\u0627\u0642|\u0639\u0645\u0627\u0646|\u0641\u0644\u0633\u0637\u064a\u0646"+ "|\u0642\u0637\u0631|\u0643\u0648\u0645|\u0645\u0635\u0631|\u0645\u0644\u064a\u0633\u064a\u0627"+ "|\u0645\u0648\u0642\u0639|\u0915\u0949\u092e|\u0928\u0947\u091f|\u092d\u093e\u0930\u0924"+ "|\u0938\u0902\u0917\u0920\u0928|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4"+ "|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd"+ "|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e04\u0e2d\u0e21|\u0e44\u0e17\u0e22"+ "|\u10d2\u10d4|\u307f\u3093\u306a|\u30b0\u30fc\u30b0\u30eb|\u30b3\u30e0|\u4e16\u754c"+ "|\u4e2d\u4fe1|\u4e2d\u56fd|\u4e2d\u570b|\u4e2d\u6587\u7f51|\u4f01\u4e1a|\u4f5b\u5c71"+ "|\u4fe1\u606f|\u5065\u5eb7|\u516b\u5366|\u516c\u53f8|\u516c\u76ca|\u53f0\u6e7e|\u53f0\u7063"+ "|\u5546\u57ce|\u5546\u5e97|\u5546\u6807|\u5728\u7ebf|\u5927\u62ff|\u5a31\u4e50|\u5de5\u884c"+ "|\u5e7f\u4e1c|\u6148\u5584|\u6211\u7231\u4f60|\u624b\u673a|\u653f\u52a1|\u653f\u5e9c"+ "|\u65b0\u52a0\u5761|\u65b0\u95fb|\u65f6\u5c1a|\u673a\u6784|\u6de1\u9a6c\u9521|\u6e38\u620f"+ 
"|\u70b9\u770b|\u79fb\u52a8|\u7ec4\u7ec7\u673a\u6784|\u7f51\u5740|\u7f51\u5e97|\u7f51\u7edc"+ "|\u8c37\u6b4c|\u96c6\u56e2|\u98de\u5229\u6d66|\u9910\u5385|\u9999\u6e2f|\ub2f7\ub137"+ "|\ub2f7\ucef4|\uc0bc\uc131|\ud55c\uad6d|xbox"+ "|xerox|xin|xn\\-\\-11b4c3d|xn\\-\\-1qqw23a|xn\\-\\-30rr7y|xn\\-\\-3bst00m|xn\\-\\-3ds443g"+ "|xn\\-\\-3e0b707e|xn\\-\\-3pxu8k|xn\\-\\-42c2d9a|xn\\-\\-45brj9c|xn\\-\\-45q11c|xn\\-\\-4gbrim"+ "|xn\\-\\-55qw42g|xn\\-\\-55qx5d|xn\\-\\-6frz82g|xn\\-\\-6qq986b3xl|xn\\-\\-80adxhks"+ "|xn\\-\\-80ao21a|xn\\-\\-80asehdb|xn\\-\\-80aswg|xn\\-\\-90a3ac|xn\\-\\-90ais|xn\\-\\-9dbq2a"+ "|xn\\-\\-9et52u|xn\\-\\-b4w605ferd|xn\\-\\-c1avg|xn\\-\\-c2br7g|xn\\-\\-cg4bki|xn\\-\\-clchc0ea0b2g2a9gcd"+ "|xn\\-\\-czr694b|xn\\-\\-czrs0t|xn\\-\\-czru2d|xn\\-\\-d1acj3b|xn\\-\\-d1alf|xn\\-\\-efvy88h"+ "|xn\\-\\-estv75g|xn\\-\\-fhbei|xn\\-\\-fiq228c5hs|xn\\-\\-fiq64b|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s"+ "|xn\\-\\-fjq720a|xn\\-\\-flw351e|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-gecrj9c"+ "|xn\\-\\-h2brj9c|xn\\-\\-hxt814e|xn\\-\\-i1b6b1a6a2e|xn\\-\\-imr513n|xn\\-\\-io0a7i"+ "|xn\\-\\-j1aef|xn\\-\\-j1amh|xn\\-\\-j6w193g|xn\\-\\-kcrx77d1x4a|xn\\-\\-kprw13d|xn\\-\\-kpry57d"+ "|xn\\-\\-kput3i|xn\\-\\-l1acc|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgb9awbf|xn\\-\\-mgba3a3ejt"+ "|xn\\-\\-mgba3a4f16a|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbab2bd|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e"+ "|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-mgbpl2fh|xn\\-\\-mgbtx2b|xn\\-\\-mgbx4cd0ab"+ "|xn\\-\\-mk1bu44c|xn\\-\\-mxtq1m|xn\\-\\-ngbc5azd|xn\\-\\-node|xn\\-\\-nqv7f|xn\\-\\-nqv7fs00ema"+ "|xn\\-\\-nyqy26a|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1acf|xn\\-\\-p1ai|xn\\-\\-pgbs0dh"+ "|xn\\-\\-pssy2u|xn\\-\\-q9jyb4c|xn\\-\\-qcka1pmc|xn\\-\\-qxam|xn\\-\\-rhqv96g|xn\\-\\-s9brj9c"+ "|xn\\-\\-ses554g|xn\\-\\-t60b56a|xn\\-\\-tckwe|xn\\-\\-unup4y|xn\\-\\-vermgensberater\\-ctb"+ "|xn\\-\\-vermgensberatung\\-pwb|xn\\-\\-vhquv|xn\\-\\-vuq861b|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a"+ 
"|xn\\-\\-xhq521b|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-y9a3aq|xn\\-\\-yfro4i67o"+ "|xn\\-\\-ygbi2ammx|xn\\-\\-zfr164b|xperia|xxx|xyz)"+ "|(?:yachts|yamaxun|yandex|yodobashi|yoga|yokohama|youtube|y[et])"+ "|(?:zara|zip|zone|zuerich|z[amw]))";/*** Kept for backward compatibility reasons.** @deprecated Deprecated since it does not include all IRI characters defined in RFC 3987*/@Deprecatedpublic static final String GOOD_IRI_CHAR ="a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";public static final Pattern IP_ADDRESS= Pattern.compile("((25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(25[0-5]|2[0-4]"+ "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(25[0-5]|2[0-4][0-9]|[0-1]"+ "[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}"+ "|[1-9][0-9]|[0-9]))");/*** Valid UCS characters defined in RFC 3987. Excludes space characters.*/private static final String UCS_CHAR = "[" +"\u00A0-\uD7FF" +"\uF900-\uFDCF" +"\uFDF0-\uFFEF" +"\uD800\uDC00-\uD83F\uDFFD" +"\uD840\uDC00-\uD87F\uDFFD" +"\uD880\uDC00-\uD8BF\uDFFD" +"\uD8C0\uDC00-\uD8FF\uDFFD" +"\uD900\uDC00-\uD93F\uDFFD" +"\uD940\uDC00-\uD97F\uDFFD" +"\uD980\uDC00-\uD9BF\uDFFD" +"\uD9C0\uDC00-\uD9FF\uDFFD" +"\uDA00\uDC00-\uDA3F\uDFFD" +"\uDA40\uDC00-\uDA7F\uDFFD" +"\uDA80\uDC00-\uDABF\uDFFD" +"\uDAC0\uDC00-\uDAFF\uDFFD" +"\uDB00\uDC00-\uDB3F\uDFFD" +"\uDB44\uDC00-\uDB7F\uDFFD" +"&&[^\u00A0[\u2000-\u200A]\u2028\u2029\u202F\u3000]]";/*** Valid characters for IRI label defined in RFC 3987.*/private static final String LABEL_CHAR = "a-zA-Z0-9" + UCS_CHAR;/*** Valid characters for IRI TLD defined in RFC 3987.*/private static final String TLD_CHAR = "a-zA-Z" + UCS_CHAR;/*** RFC 1035 Section 2.3.4 limits the labels to a maximum 63 octets.*/private static final String IRI_LABEL ="[" + LABEL_CHAR + "](?:[" + LABEL_CHAR + "\\-]{0,61}[" + LABEL_CHAR + "]){0,1}";/*** RFC 3492 references RFC 1034 and limits Punycode algorithm output to 63 characters.*/private static final String PUNYCODE_TLD = 
"xn\\-\\-[\\w\\-]{0,58}\\w";private static final String TLD = "(" + PUNYCODE_TLD + "|" + "[" + TLD_CHAR + "]{2,63}" +")";private static final String HOST_NAME = "(" + IRI_LABEL + "\\.)+" + TLD;public static final Pattern DOMAIN_NAME= Pattern.compile("(" + HOST_NAME + "|" + IP_ADDRESS + ")");private static final String PROTOCOL = "(?i:http|https|rtsp):\\/\\/";/* A word boundary or end of input.  This is to stop foo.sure from matching as foo.su */private static final String WORD_BOUNDARY = "(?:\\b|$|^)";private static final String USER_INFO = "(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)"+ "\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_"+ "\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@";private static final String PORT_NUMBER = "\\:\\d{1,5}";private static final String PATH_AND_QUERY = "\\/(?:(?:[" + LABEL_CHAR+ "\\;\\/\\?\\:\\@\\&\\=\\#\\~"  // plus optional query params+ "\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*";/***  Regular expression pattern to match most part of RFC 3987*  Internationalized URLs, aka IRIs.*/public static final Pattern WEB_URL = Pattern.compile("("+ "("+ "(?:" + PROTOCOL + "(?:" + USER_INFO + ")?" 
        + ")?"
        + "(?:" + DOMAIN_NAME + ")"
        + "(?:" + PORT_NUMBER + ")?"
        + ")"
        + "(" + PATH_AND_QUERY + ")?"
        + WORD_BOUNDARY
        + ")");

    /**
     * Regular expression that matches known TLDs and punycode TLDs
     */
    private static final String STRICT_TLD = "(?:" +
        IANA_TOP_LEVEL_DOMAINS + "|" + PUNYCODE_TLD + ")";

    /**
     * Regular expression that matches host names using {@link #STRICT_TLD}
     */
    private static final String STRICT_HOST_NAME = "(?:(?:" + IRI_LABEL + "\\.)+"
        + STRICT_TLD + ")";

    /**
     * Regular expression that matches domain names using either {@link #STRICT_HOST_NAME} or
     * {@link #IP_ADDRESS}
     */
    private static final Pattern STRICT_DOMAIN_NAME
        = Pattern.compile("(?:" + STRICT_HOST_NAME + "|" + IP_ADDRESS + ")");

    /**
     * Regular expression that matches domain names without a TLD
     */
    private static final String RELAXED_DOMAIN_NAME =
        "(?:" + "(?:" + IRI_LABEL + "(?:\\.(?=\\S))" + "?)+" + "|" + IP_ADDRESS + ")";

    /**
     * Regular expression to match strings that do not start with a supported protocol. The TLDs
     * are expected to be one of the known TLDs.
     */
    private static final String WEB_URL_WITHOUT_PROTOCOL = "("
        + WORD_BOUNDARY
        + "(?<!:\\/\\/)"
        + "("
        + "(?:" + STRICT_DOMAIN_NAME + ")"
        + "(?:" + PORT_NUMBER + ")?"
        + ")"
        + "(?:" + PATH_AND_QUERY + ")?"
        + WORD_BOUNDARY
        + ")";

    /**
     * Regular expression to match strings that start with a supported protocol. Rules for domain
     * names and TLDs are more relaxed. TLDs are optional.
     */
    private static final String WEB_URL_WITH_PROTOCOL = "("
        + WORD_BOUNDARY
        + "(?:"
        + "(?:" + PROTOCOL + "(?:" + USER_INFO + ")?" + ")"
        + "(?:" + RELAXED_DOMAIN_NAME + ")?"
        + "(?:" + PORT_NUMBER + ")?"
        + ")"
        + "(?:" + PATH_AND_QUERY + ")?"
        + WORD_BOUNDARY
        + ")";

    /**
     * Regular expression pattern to match IRIs. If a string starts with http(s):// the expression
     * tries to match the URL structure with a relaxed rule for TLDs.
     * If the string does not start with http(s):// the TLDs are expected to be one of the known TLDs.
     *
     * @hide
     */
    public static final Pattern AUTOLINK_WEB_URL = Pattern.compile(
        "(" + WEB_URL_WITH_PROTOCOL + "|" + WEB_URL_WITHOUT_PROTOCOL + ")");

    /**
     * Regular expression for valid email characters. Does not include some of the valid characters
     * defined in RFC5321: #&~!^`{}/=$*?|
     */
    private static final String EMAIL_CHAR = LABEL_CHAR + "\\+\\-_%'";

    /**
     * Regular expression for local part of an email address. RFC5321 section 4.5.3.1.1 limits
     * the local part to be at most 64 octets.
     */
    private static final String EMAIL_ADDRESS_LOCAL_PART =
        "[" + EMAIL_CHAR + "]" + "(?:[" + EMAIL_CHAR + "\\.]{1,62}[" + EMAIL_CHAR + "])?";

    /**
     * Regular expression for the domain part of an email address. RFC5321 section 4.5.3.1.2 limits
     * the domain to be at most 255 octets.
     */
    private static final String EMAIL_ADDRESS_DOMAIN =
        "(?=.{1,255}(?:\\s|$|^))" + HOST_NAME;

    /**
     * Regular expression pattern to match email addresses. It excludes double quoted local parts
     * and the special characters #&~!^`{}/=$*?| that are included in RFC5321.
     * @hide
     */
    public static final Pattern AUTOLINK_EMAIL_ADDRESS = Pattern.compile("(" + WORD_BOUNDARY +
        "(?:" + EMAIL_ADDRESS_LOCAL_PART + "@" + EMAIL_ADDRESS_DOMAIN + ")" +
        WORD_BOUNDARY + ")");

    public static final Pattern EMAIL_ADDRESS
        = Pattern.compile(
            "[a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}" +
            "\\@" +
            "[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}" +
            "(" +
                "\\." +
                "[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25}" +
            ")+");

    /**
     * This pattern is intended for searching for things that look like they
     * might be phone numbers in arbitrary text, not for validating whether
     * something is in fact a phone number.  It will miss many things that
     * are legitimate phone numbers.
     *
     * <p> The pattern matches the following:
     * <ul>
     * <li>Optionally, a + sign followed immediately by one or more digits.
     * Spaces, dots, or dashes may follow.
     * <li>Optionally, sets of digits in parentheses, separated by spaces, dots, or dashes.
     * <li>A string starting and ending with a digit, containing digits, spaces, dots, and/or dashes.
     * </ul>
     */
    public static final Pattern PHONE
        = Pattern.compile(                      // sdd = space, dot, or dash
            "(\\+[0-9]+[\\- \\.]*)?"            // +<digits><sdd>*
            + "(\\([0-9]+\\)[\\- \\.]*)?"       // (<digits>)<sdd>*
            + "([0-9][0-9\\- \\.]+[0-9])");     // <digit><digit|sdd>+<digit>

    /**
     *  Convenience method to take all of the non-null matching groups in a
     *  regex Matcher and return them as a concatenated string.
     *
     *  @param matcher      The Matcher object from which grouped text will
     *                      be extracted
     *
     *  @return             A String comprising all of the non-null matched
     *                      groups concatenated together
     */
    public static final String concatGroups(Matcher matcher) {
        StringBuilder b = new StringBuilder();
        final int numGroups = matcher.groupCount();
        for (int i = 1; i <= numGroups; i++) {
            String s = matcher.group(i);
            if (s != null) {
                b.append(s);
            }
        }
        return b.toString();
    }

    /**
     * Convenience method to return only the digits and plus signs
     * in the matching string.
     *
     * @param matcher      The Matcher object from which digits and plus will
     *                     be extracted
     *
     * @return             A String comprising all of the digits and plus in
     *                     the match
     */
    public static final String digitsAndPlusOnly(Matcher matcher) {
        StringBuilder buffer = new StringBuilder();
        String matchingRegion = matcher.group();
        for (int i = 0, size = matchingRegion.length(); i < size; i++) {
            char character = matchingRegion.charAt(i);
            if (character == '+' || Character.isDigit(character)) {
                buffer.append(character);
            }
        }
        return buffer.toString();
    }

    /**
     * Do not create this static utility class.
     */
    private Patterns() {}
}
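
For readers who want to try the idea without copying the whole Patterns class, here is a minimal, self-contained sketch. It is not the code from this post: it assumes a deliberately simplified pattern (the hypothetical SIMPLE_URL below) in place of Patterns.WEB_URL, and the names LinkifyDemo and linkify are made up for illustration. It rewrites each match in place with Matcher.appendReplacement, so a URL that is a prefix of another URL in the same text is not wrapped twice, and it adds the http:// prefix only when the match has no scheme.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkifyDemo {
    // Hypothetical, simplified stand-in for Patterns.WEB_URL (an assumption of
    // this sketch): optional http/https scheme, dotted ASCII host, optional
    // port, optional path. It ignores IP addresses, user info and IRIs.
    private static final Pattern SIMPLE_URL = Pattern.compile(
            "(?:(?i:https?)://)?"                     // optional scheme
            + "(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,63}"   // host ending in a TLD-like label
            + "(?::\\d{1,5})?"                        // optional port
            + "(?:/[^\\s<>\"']*)?");                  // optional path and query

    // Returns `message` with every match of SIMPLE_URL wrapped in an <a> tag.
    static String linkify(String message) {
        Matcher matcher = SIMPLE_URL.matcher(message);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String url = matcher.group();
            // Add a scheme only if the matched text does not start with one.
            String href = url.matches("(?i)https?://.*") ? url : "http://" + url;
            String link = "<a target=\"_blank\" href=\"" + href + "\">" + url + "</a>";
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(link));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Docs: www.example.com/a and https://example.org/b?x=1"));
    }
}
```

To use the Android pattern instead, swap SIMPLE_URL for Patterns.WEB_URL and extend the scheme check to rtsp, which that pattern also accepts.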
