Daily Practice
Exercise: parsing Word documents with Java
package jim.java.ExtracorWord;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.textmining.text.extraction.WordExtractor;

public class ExtracorWord {

    // Extracts the plain text of a DOC file; returns "" if extraction fails.
    public static String getText(String doc) {
        String text = "";
        // Open an input stream on the DOC file; try-with-resources closes it
        try (FileInputStream in = new FileInputStream(new File(doc))) {
            // Create the WordExtractor
            WordExtractor extractor = new WordExtractor();
            // Extract the text
            text = extractor.extractText(in);
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
        return text;
    }

    public static void toTextFile(String doc, String txt) throws IOException {
        // Extract the text
        String text = getText(doc);
        // Write the text; the writer is flushed and closed automatically
        try (PrintWriter pw = new PrintWriter(new FileWriter(new File(txt)))) {
            pw.write(text);
        }
        System.out.println(text);
        System.out.println("Text file written successfully: " + txt);
    }

    public static void main(String[] args) throws IOException {
        String text = getText("Hello.doc");
        System.out.println(text);
        toTextFile("Hello.doc", "word.txt");
    }
}
Exercise: parsing HTML with Java:
package jim.Lius;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

import lius.index.html.JTidyHtmlIndexer;

public class ExtractorAll {

    // Extracts the text of an HTML file; returns "" for other extensions or on failure.
    public static String getText(String doc) {
        String text = "";
        String ext = doc.substring(doc.lastIndexOf(".") + 1);
        // The original tested "html" twice; the second test was presumably meant to be "htm"
        if (ext.equalsIgnoreCase("html") || ext.equalsIgnoreCase("htm") || ext.equalsIgnoreCase("shtml")) {
            JTidyHtmlIndexer ji = new JTidyHtmlIndexer();
            File f = new File(doc);
            // Read the content while the stream is still open, then close it
            try (FileInputStream in = new FileInputStream(f)) {
                ji.setStreamToIndex(in);
                text = ji.getContent();
            } catch (IOException e) {
                e.printStackTrace();
            }
            // A Java String is already Unicode, so the getBytes("utf8") round trip
            // is dropped; the encoding is chosen when the text is written instead.
            System.out.println(text);
        }
        return text;
    }

    public static void toTextFile(String doc, String txt) throws IOException {
        // Extract the text
        String text = getText(doc);
        // Write the text as UTF-8; the writer is flushed and closed automatically
        try (PrintWriter pw = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(txt), StandardCharsets.UTF_8))) {
            pw.write(text);
        }
        System.out.println("Text file written successfully: " + txt);
    }

    public static void main(String[] args) {
        String text = getText("Hello.html");
        System.out.println(text);
        try {
            toTextFile("Hello.html", "Hello.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
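The extension check and the encoding step in the exercise above can be tried on their own, without LIUS on the classpath. The following is only a minimal sketch using the JDK standard library: the class HtmlTextSketch and its methods extensionOf, isHtmlFile, writeUtf8 and readUtf8 are hypothetical names introduced for illustration, not part of LIUS or of the original exercise. It assumes the goal of the encoding step is to end up with a UTF-8 text file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of LIUS or the original exercise.
final class HtmlTextSketch {

    // Returns the lower-cased text after the last dot, or "" when the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True for the three extensions the exercise appears to target.
    static boolean isHtmlFile(String fileName) {
        String ext = extensionOf(fileName);
        return ext.equals("html") || ext.equals("htm") || ext.equals("shtml");
    }

    // Writes the text as UTF-8 bytes, independent of the platform default charset.
    static void writeUtf8(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, decoding with the same charset used for writing.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("hello", ".txt");
        try {
            String sample = "\u4f60\u597d, HTML"; // non-ASCII sample text
            writeUtf8(tmp, sample);
            System.out.println(isHtmlFile("Hello.html"));
            System.out.println(readUtf8(tmp).equals(sample));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Encoding and decoding with the same explicit charset keeps non-ASCII text intact on any platform, whereas encoding to UTF-8 and then decoding with the platform default can corrupt it when the default is not UTF-8.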
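What I learned today:
Got a basic understanding of XML and made a first start with Dom4j.
Where I fell short today:
My productivity was a bit low today.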