Greenplum利用gpload,gpfist实现数据入库

来源:互联网 发布:centos 6.7 64位下载 编辑:程序博客网 时间:2024/06/03 01:20

1.python版本要求2.4.4以上

[root@test install]# pythonPython 2.6.2 (r262:71600, May 14 2009, 10:46:21) [GCC 4.1.2 20080704 (Red Hat 4.1.2-44)] on linux2Type "help", "copyright", "credits" or "license" for more information.>>> 

2.PyYAML 3.10包配置

Downloads ↓

YAML parser and emitter for Python

YAML is a data serialization format designed for human readability and interaction with scripting languages. PyYAML is a YAML parser and emitter for Python.

PyYAML features a complete YAML 1.1 parser, Unicode support, pickle support, capable extension API, and sensible error messages. PyYAML supports standard YAML tags and provides Python-specific tags that allow to represent an arbitrary Python object.

PyYAML is applicable for a broad range of tasks from complex configuration files to object serialization and persistance.

[root@test dataload]# tar -zxvf PyYAML-3.10.tar.gz
[root@test PyYAML-3.10]# python setup.py install

3.yaml-0.1.4配置

下载源代码包:http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz。编译和安装LibYAML

[root@test dataload]# tar -zxvf yaml-0.1.4.tar.gz
[root@test dataload]# cd yaml-0.1.4
[root@test yaml-0.1.4]# ./configure
[root@test yaml-0.1.4]# make && make install

4.gpload.gpfdist工具配置

下载:greenplum-loaders-4.2.1.0-build-2-RHEL5-x86_64.zip

[root@test dataload]# ./greenplum-loaders-4.2.1.0-build-2-RHEL5-x86_64.bin 

********************************************************************************    Do you accept the Greenplum Loaders license agreement? [yes | no]******************************************************************************** 

选择:yes 到安装完成

修改greenplum_loaders_path.sh中GPHOME_LOADERS改为你安装的路径

.bash_profile中添加环境GP变量:

export PGDATABASE=gptestexport PGHOST=127.0.0.1export PGPORT=5432export PGUSER=gpadminexport PGPASSWORD=gpadmin

source  ~/.bash_profile

5.编写数据入库yaml控制文件

[root@test bin]# more gpload.yml ---VERSION: 1.0.0.1DATABASE: gptestUSER: gpadminHOST: 127.0.0.1PORT: 5432GPLOAD:  INPUT:    - SOURCE:        LOCAL_HOSTNAME:          - test        PORT: 55555        FILE:          - /home/tmp/test1    - COLUMNS:        - id: int        - name: text        - aa: text        - time: timestamp without time zone        - bb: text        - cc: text        - dd: int        - ee: int        - ff: text        - gg: text        - hh: text        - ii: text        - jj: text        - kk: text        - ll: text    - FORMAT: text    - DELIMITER: ','    - ERROR_LIMIT: 25  OUTPUT:    - TABLE: test_gpload    - MODE: INSERT
注:COLUMNS中字段应与数据库中表字段及数据类型匹配

6.执行gpload

[root@test bin]# gpload -f gpload.yml2012-10-30 00:06:42|INFO|gpload session started 2012-10-30 00:06:422012-10-30 00:06:43|INFO|started gpfdist -p 55555 -P 55556 -f "/home/tmp/test" -t 302012-10-30 00:06:50|INFO|running time: 7.92 seconds2012-10-30 00:06:50|INFO|rows Inserted          = 2050922012-10-30 00:06:50|INFO|rows Updated           = 02012-10-30 00:06:50|INFO|data formatting errors = 02012-10-30 00:06:50|INFO|gpload succeeded[root@test bin]# 







原创粉丝点击