Contents

Chapter 1: Introduction

Course Slides

Download: Introduction

Java-Based Text Analysis Tools

Environment Setup

JDK Installation and Configuration

JAVA_HOME: C:\Program Files\Java\jdk1.8.0_301
CLASSPATH: %JAVA_HOME%\lib\tools.jar;%JAVA_HOME%\lib\dt.jar;
PATH: %JAVA_HOME%\bin;%JAVA_HOME%\jre\bin;
> java -version
> javac -version
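
Beyond running the commands above, the configuration can also be checked from inside Java. The following is a minimal sketch, not part of the original course material; the class name `EnvCheck` and its `report` method are hypothetical names chosen for illustration. It prints the version of the running JVM and the value of `JAVA_HOME` as seen by the process.

```java
public class EnvCheck {
    // Builds a short report of the running JVM version and the JAVA_HOME variable.
    // EnvCheck and report() are illustrative names, not part of any library.
    static String report() {
        String version = System.getProperty("java.version");
        String home = System.getenv("JAVA_HOME"); // null if the variable is not set
        return "java.version=" + version + System.lineSeparator()
                + "JAVA_HOME=" + (home == null ? "(not set)" : home);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

If `JAVA_HOME` is reported as not set, the variable is probably missing or the terminal was opened before it was defined.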

Maven Installation and Configuration

> mvn -v
<localRepository>D:/dev/maven/repo</localRepository>
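
The `<localRepository>` element tells Maven where to store downloaded artifacts. As a rough illustration of what ends up in that directory, the sketch below maps Maven coordinates to a jar path, assuming the standard repository layout (dots in the `groupId` become directory separators). The class name `LocalRepoPath` and the method `jarPath` are hypothetical; the coordinates are those of the OpenNLP dependency used in this chapter.

```java
public class LocalRepoPath {
    // Maps Maven coordinates to the expected jar location under a local repository.
    // Assumption: standard layout <repo>/<groupId as dirs>/<artifactId>/<version>/<artifactId>-<version>.jar
    static String jarPath(String repo, String groupId, String artifactId, String version) {
        return String.join("/",
                repo,
                groupId.replace('.', '/'),
                artifactId,
                version,
                artifactId + "-" + version + ".jar");
    }

    public static void main(String[] args) {
        // Uses the repository directory configured above.
        System.out.println(jarPath("D:/dev/maven/repo",
                "org.apache.opennlp", "opennlp-tools", "1.9.3"));
    }
}
```

After a successful build, a file at the printed location suggests that the dependency was resolved into the configured repository.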

Eclipse IDE Installation and Configuration

Test Example

POM Configuration File

pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
 
  <groupId>cn.edu.bjut</groupId>
  <artifactId>text-mining</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
 
  <name>text-mining</name>
  <url>http://maven.apache.org</url>
 
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
 
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
 
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.9.3</version>
    </dependency>
  </dependencies>
</project>

Example Code

TokenizerExample.java
package cn.edu.bjut.chapter1;
 
import opennlp.tools.tokenize.SimpleTokenizer;
 
public class TokenizerExample {
	public static void main(String[] args) {
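		String text = "Text analysis and text mining are amazing!";
		// SimpleTokenizer is rule-based, so no model file has to be loaded.
		SimpleTokenizer tokenizer = SimpleTokenizer.INSTANCE;
 
		System.out.println("Tokens:");
		// Split the text into tokens and print one token per line.
		String[] tokens = tokenizer.tokenize(text);
		for (String token : tokens) {
			System.out.println(token);
		}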
	}
}
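
OpenNLP's `SimpleTokenizer` tokenizes using character classes. To make that idea concrete, here is a simplified, dependency-free sketch that starts a new token whenever the character class (whitespace, letter, digit, other) changes. It is an approximation for illustration only, not OpenNLP's implementation; the class name `CharClassTokenizerSketch` and its methods are hypothetical, and its output may differ from `SimpleTokenizer` on some inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class CharClassTokenizerSketch {
    private enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    // Assigns each character to one of four coarse classes.
    private static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Starts a new token whenever the character class changes; whitespace runs are dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean boundary = i == text.length()
                    || classify(text.charAt(i)) != classify(text.charAt(i - 1));
            if (boundary) {
                if (classify(text.charAt(start)) != CharClass.WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Text analysis and text mining are amazing!")) {
            System.out.println(token);
        }
    }
}
```

With the sample sentence, the sketch separates the trailing `!` from `amazing` because the character class changes from letter to other at that position.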