Review Board 1.7.22


FLUME-2030 - Documentation of Configuration Changes JMSSource, HBaseSink, AsyncHBaseSink and ElasticSearchSink

Review Request #10817 - Created April 28, 2013 and updated

Submitter: Israel Ekpo
Branch: flume-1.4
Bugs: FLUME-1886, FLUME-1889, FLUME-1994, FLUME-2030
Reviewers (groups): Flume
Repository: flume-git

Description:

- Updated the user guide to illustrate replacing FQCNs with lowercased enum constants for built-in sources and sinks.
- Added the system requirements for running Flume.
- Added documentation encouraging users to migrate from 0.9.x to 1.x to take advantage of the improvements available in Flume NG.
- Added documentation noting that Apache Flume is not limited to log data aggregation.

Testing Done: N/A

Diff revision 1 (Latest)

--- flume-ng-doc/sphinx/FlumeUserGuide.rst (revision 38f2205)
+++ flume-ng-doc/sphinx/FlumeUserGuide.rst (new change)
@@ -26,20 +26,36 @@
 
 Apache Flume is a distributed, reliable, and available system for efficiently
 collecting, aggregating and moving large amounts of log data from many
 different sources to a centralized data store.
 
+The use of Apache Flume is not only restricted to log data aggregation.
+Since data sources are customizable, Flume can be used to transport massive quantities
+of event data including but not limited to network traffic data, social-media-generated data,
+email messages and pretty much any data source possible.
+
 Apache Flume is a top level project at the Apache Software Foundation.
+
 There are currently two release code lines available, versions 0.9.x and 1.x.
-This documentation applies to the 1.x codeline.
-Please click here for
+
+Documentation for the 0.9.x track is available at
 `the Flume 0.9.x User Guide <http://archive.cloudera.com/cdh/3/flume/UserGuide/>`_.
 
+This documentation applies to the 1.4.x track.
+
+New and existing users are encouraged to use the 1.x releases so as to
+leverage the performance improvements and configuration flexibilities available
+in the latest architecture.
+
+
 System Requirements
 -------------------
 
-TBD
+#. Java Runtime Environment - Java 1.6 or later (Java 1.7 Recommended)
+#. Memory - Sufficient memory for configurations used by sources, channels or sinks
+#. Disk Space - Sufficient disk space for configurations used by channels or sinks
+#. Directory Permissions - Read/Write permissions for directories used by agent
 
 Architecture
 ------------
 
 Data flow model
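The new System Requirements list asks for Java 1.6 or later. The following is a minimal sketch of how an operator might check that requirement programmatically, assuming only the standard `java.specification.version` system property. The class and method names are hypothetical and not part of Flume. Older JVMs report values such as `1.6`, while newer ones report `9`, `11` and so on, so both forms are handled.

```java
// Hypothetical helper, not part of Flume: checks the running JVM against the
// "Java 1.6 or later" requirement from the System Requirements list.
final class JavaVersionCheck {

    // Extracts the major version from a java.specification.version value:
    // legacy values look like "1.6" or "1.7", newer ones like "9" or "11".
    static int majorVersion(String spec) {
        String[] parts = spec.trim().split("\\.");
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    // True when the given specification version is at least minimumMajor.
    static boolean meetsMinimum(String spec, int minimumMajor) {
        return majorVersion(spec) >= minimumMajor;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java " + spec + " satisfies the 1.6 minimum: "
                + meetsMinimum(spec, 6));
    }
}
```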
@@ -1164,11 +1180,11 @@
 
 .. code-block:: properties
 
   a1.sources = r1
   a1.channels = c1
-  a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
+  a1.sources.r1.type = http
   a1.sources.r1.port = 5140
   a1.sources.r1.channels = c1
   a1.sources.r1.handler = org.example.rest.RestHandler
   a1.sources.r1.handler.nickname = random props
 
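In the HTTP source example, `handler.nickname` is a setting meant for the custom handler rather than for the source itself. The following minimal sketch shows how `handler.*` keys could be collected and stripped of their prefix before being handed to the handler. It assumes the map already holds keys scoped to the component, so the `a1.sources.r1.` part has been removed. The helper is hypothetical and only loosely mirrors what Flume's `Context.getSubProperties` does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: collects every "<prefix>.<key>" entry and strips the prefix,
// the way "handler.*" settings (e.g. handler.nickname) would reach a custom handler.
final class SubProperties {

    static Map<String, String> subProperties(Map<String, String> all, String prefix) {
        String p = prefix.endsWith(".") ? prefix : prefix + ".";
        Map<String, String> result = new HashMap<String, String>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            // "handler" itself does not start with "handler.", so it is skipped.
            if (e.getKey().startsWith(p)) {
                result.put(e.getKey().substring(p.length()), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<String, String>();
        source.put("type", "http");
        source.put("port", "5140");
        source.put("handler", "org.example.rest.RestHandler");
        source.put("handler.nickname", "random props");
        System.out.println(subProperties(source, "handler")); // {nickname=random props}
    }
}
```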
@@ -1674,11 +1690,11 @@
 
 ==================  ======================================================  ==============================================================================
 Property Name       Default                                                 Description
 ==================  ======================================================  ==============================================================================
 **channel**         --
-**type**            --                                                      The component type name, needs to be ``org.apache.flume.sink.hbase.HBaseSink``
+**type**            --                                                      The component type name, needs to be ``hbase``
 **table**           --                                                      The name of the table in Hbase to write to.
 **columnFamily**    --                                                      The column family in Hbase to write to.
 batchSize           100                                                     Number of events to be written per txn.
 serializer          org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
 serializer.*        --                                                      Properties to be passed to the serializer.
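The `batchSize` row says each transaction writes at most that many events, with a default of 100. The following simplified model illustrates that bound; it is not HBaseSink's implementation. The real sink takes events from the channel inside a transaction and may commit fewer events when the channel runs dry, whereas this sketch simply partitions an in-memory list. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Flume's HBaseSink: splits pending events into
// transactions that each carry at most batchSize events.
final class Batching {

    static <T> List<List<T>> toBatches(List<T> events, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int start = 0; start < events.size(); start += batchSize) {
            int end = Math.min(start + batchSize, events.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<T>(events.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> events = new ArrayList<Integer>();
        for (int i = 0; i < 250; i++) {
            events.add(i);
        }
        // With the default batchSize of 100: three transactions of 100, 100 and 50 events.
        for (List<Integer> batch : toBatches(events, 100)) {
            System.out.println("txn with " + batch.size() + " events");
        }
    }
}
```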
@@ -1690,11 +1706,11 @@
 
 .. code-block:: properties
 
   a1.channels = c1
   a1.sinks = k1
-  a1.sinks.k1.type = org.apache.flume.sink.hbase.HBaseSink
+  a1.sinks.k1.type = hbase
   a1.sinks.k1.table = foo_table
   a1.sinks.k1.columnFamily = bar_cf
   a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
   a1.sinks.k1.channel = c1
 
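In the HBaseSink table, the bold rows (`channel`, `type`, `table`, `columnFamily`) are the required properties. The following sketch loads the updated example with `java.util.Properties` and reports any required key that is missing for sink `k1`. It is a hypothetical pre-flight check, not Flume's own configuration validation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check: lists required HBaseSink properties that are
// absent or blank for one sink, given the sink's key prefix.
final class HBaseSinkConfigCheck {

    // Required (bold) properties from the HBaseSink table.
    static final String[] REQUIRED = {"channel", "type", "table", "columnFamily"};

    static List<String> missing(Properties props, String sinkPrefix) {
        List<String> out = new ArrayList<String>();
        for (String key : REQUIRED) {
            String value = props.getProperty(sinkPrefix + key);
            if (value == null || value.trim().isEmpty()) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String config =
                "a1.channels = c1\n"
                + "a1.sinks = k1\n"
                + "a1.sinks.k1.type = hbase\n"
                + "a1.sinks.k1.table = foo_table\n"
                + "a1.sinks.k1.columnFamily = bar_cf\n"
                + "a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer\n"
                + "a1.sinks.k1.channel = c1\n";
        Properties props = new Properties();
        props.load(new StringReader(config)); // whitespace around '=' is ignored
        System.out.println("Missing: " + missing(props, "a1.sinks.k1.")); // Missing: []
    }
}
```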
@@ -1713,11 +1729,11 @@
 
 ================  ============================================================  ====================================================================================
 Property Name     Default                                                       Description
 ================  ============================================================  ====================================================================================
 **channel**       --
-**type**          --                                                            The component type name, needs to be ``org.apache.flume.sink.hbase.AsyncHBaseSink``
+**type**          --                                                            The component type name, needs to be ``asynchbase``
 **table**         --                                                            The name of the table in Hbase to write to.
 zookeeperQuorum   --                                                            The quorum spec. This is the value for the property ``hbase.zookeeper.quorum`` in hbase-site.xml
 znodeParent       /hbase                                                        The base path for the znode for the -ROOT- region. Value of ``zookeeper.znode.parent`` in hbase-site.xml
 **columnFamily**  --                                                            The column family in Hbase to write to.
 batchSize         100                                                           Number of events to be written per txn.
@@ -1739,11 +1755,11 @@
 
 .. code-block:: properties
 
   a1.channels = c1
   a1.sinks = k1
-  a1.sinks.k1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
+  a1.sinks.k1.type = asynchbase
   a1.sinks.k1.table = foo_table
   a1.sinks.k1.columnFamily = bar_cf
   a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
   a1.sinks.k1.channel = c1
 
@@ -1760,11 +1776,11 @@
 
 ================  ==================================================================  =======================================================================================================
 Property Name     Default                                                             Description
 ================  ==================================================================  =======================================================================================================
 **channel**       --
-**type**          --                                                                  The component type name, needs to be ``org.apache.flume.sink.elasticsearch.ElasticSearchSink``
+**type**          --                                                                  The component type name, needs to be ``elasticsearch``
 **hostNames**     --                                                                  Comma separated list of hostname:port, if the port is not present the default port '9300' will be used
 indexName         flume                                                               The name of the index which the date will be appended to. Example 'flume' -> 'flume-yyyy-MM-dd'
 indexType         logs                                                                The type to index the document to, defaults to 'log'
 clusterName       elasticsearch                                                       Name of the ElasticSearch cluster to connect to
 batchSize         100                                                                 Number of events to be written per txn.
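The `indexName` row says the date is appended to the configured name, giving names like `flume-yyyy-MM-dd`. The following minimal sketch derives such a name from an event timestamp in milliseconds. It assumes UTC day boundaries, which the table does not specify, and the helper name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: appends the event date to indexName, producing
// names such as "flume-yyyy-MM-dd". Using UTC day boundaries is an assumption.
final class IndexNames {

    static String indexFor(String indexName, long timestampMillis) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat day = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        day.setTimeZone(TimeZone.getTimeZone("UTC"));
        return indexName + "-" + day.format(new Date(timestampMillis));
    }

    public static void main(String[] args) {
        // 1367107200000L is 2013-04-28T00:00:00Z.
        System.out.println(indexFor("flume", 1367107200000L));     // flume-2013-04-28
        System.out.println(indexFor("foo_index", 1367107200000L)); // foo_index-2013-04-28
    }
}
```

Under this assumption, all events from the same UTC day land in the same index, and a new index starts at midnight UTC.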
@@ -1778,11 +1794,11 @@
 
 .. code-block:: properties
 
   a1.channels = c1
   a1.sinks = k1
-  a1.sinks.k1.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink
+  a1.sinks.k1.type = elasticsearch
   a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
   a1.sinks.k1.indexName = foo_index
   a1.sinks.k1.indexType = bar_type
   a1.sinks.k1.clusterName = foobar_cluster
   a1.sinks.k1.batchSize = 500
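The `hostNames` value in this example is a comma-separated list of `hostname:port` entries, and the table says a missing port falls back to 9300. The following minimal sketch parses that format. It does not handle IPv6 literals, and the third host, `es3.example.com`, is a hypothetical addition to show the default port being applied. Note also that 9300 is Elasticsearch's default transport port, while 9200 is conventionally its HTTP port.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: normalizes a hostNames value into "host:port" entries,
// using 9300 for entries without an explicit port. IPv6 literals are not handled.
final class HostNames {

    static final int DEFAULT_PORT = 9300;

    static List<String> parse(String hostNames) {
        List<String> out = new ArrayList<String>();
        for (String raw : hostNames.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate stray commas
            }
            int colon = entry.lastIndexOf(':');
            String host = colon >= 0 ? entry.substring(0, colon) : entry;
            int port = colon >= 0 ? Integer.parseInt(entry.substring(colon + 1)) : DEFAULT_PORT;
            out.add(host + ":" + port);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(parse("127.0.0.1:9200,127.0.0.2:9300,es3.example.com"));
        // [127.0.0.1:9200, 127.0.0.2:9300, es3.example.com:9300]
    }
}
```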
@@ -2962,23 +2978,28 @@
 org.apache.flume.Source                                       seq                     org.apache.flume.source.SequenceGeneratorSource
 org.apache.flume.Source                                       exec                    org.apache.flume.source.ExecSource
 org.apache.flume.Source                                       syslogtcp               org.apache.flume.source.SyslogTcpSource
 org.apache.flume.Source                                       multiport_syslogtcp     org.apache.flume.source.MultiportSyslogTCPSource
 org.apache.flume.Source                                       syslogudp               org.apache.flume.source.SyslogUDPSource
+org.apache.flume.Source                                       spooldir                org.apache.flume.source.SpoolDirectorySource
+org.apache.flume.Source                                       http                    org.apache.flume.source.http.HTTPSource
+org.apache.flume.Source                                       thrift                  org.apache.flume.source.ThriftSource
+org.apache.flume.Source                                       jms                     org.apache.flume.source.jms.JMSSource
 org.apache.flume.Source                                       --                      org.apache.flume.source.avroLegacy.AvroLegacySource
 org.apache.flume.Source                                       --                      org.apache.flume.source.thriftLegacy.ThriftLegacySource
 org.apache.flume.Source                                       --                      org.example.MySource
 
 org.apache.flume.Sink                                         null                    org.apache.flume.sink.NullSink
 org.apache.flume.Sink                                         logger                  org.apache.flume.sink.LoggerSink
 org.apache.flume.Sink                                         avro                    org.apache.flume.sink.AvroSink
 org.apache.flume.Sink                                         hdfs                    org.apache.flume.sink.hdfs.HDFSEventSink
-org.apache.flume.Sink                                         --                      org.apache.flume.sink.hbase.HBaseSink
-org.apache.flume.Sink                                         --                      org.apache.flume.sink.hbase.AsyncHBaseSink
-org.apache.flume.Sink                                         --                      org.apache.flume.sink.elasticsearch.ElasticSearchSink
+org.apache.flume.Sink                                         hbase                   org.apache.flume.sink.hbase.HBaseSink
+org.apache.flume.Sink                                         asynchbase              org.apache.flume.sink.hbase.AsyncHBaseSink
+org.apache.flume.Sink                                         elasticsearch           org.apache.flume.sink.elasticsearch.ElasticSearchSink
 org.apache.flume.Sink                                         file_roll               org.apache.flume.sink.RollingFileSink
 org.apache.flume.Sink                                         irc                     org.apache.flume.sink.irc.IRCSink
+org.apache.flume.Sink                                         thrift                  org.apache.flume.sink.ThriftSink
 org.apache.flume.Sink                                         --                      org.example.MySink
 
 org.apache.flume.ChannelSelector                              replicating             org.apache.flume.channel.ReplicatingChannelSelector
 org.apache.flume.ChannelSelector                              multiplexing            org.apache.flume.channel.MultiplexingChannelSelector
 org.apache.flume.ChannelSelector                              --                      org.example.MyChannelSelector
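The review description says built-in components are now referenced by lowercased enum constants, and the component table pairs each alias with its implementation class. The following minimal sketch shows that lookup. It uses a plain `Map` in place of the enums Flume actually uses, covers only a few sink aliases from the table, and uses hypothetical class and method names. Any value that is not a known alias is treated as a fully qualified class name, so in this sketch the old FQCN form keeps working alongside the new alias.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: resolves a sink "type" value to a class name.
// A known alias (matched case-insensitively) maps to its built-in class;
// anything else is assumed to be an FQCN and returned unchanged.
final class SinkTypeResolver {

    private static final Map<String, String> ALIASES = new HashMap<String, String>();

    static {
        // Subset of the sink aliases from the component table.
        ALIASES.put("logger", "org.apache.flume.sink.LoggerSink");
        ALIASES.put("hbase", "org.apache.flume.sink.hbase.HBaseSink");
        ALIASES.put("asynchbase", "org.apache.flume.sink.hbase.AsyncHBaseSink");
        ALIASES.put("elasticsearch", "org.apache.flume.sink.elasticsearch.ElasticSearchSink");
    }

    static String resolve(String type) {
        String trimmed = type.trim();
        String cls = ALIASES.get(trimmed.toLowerCase(Locale.ROOT));
        return cls != null ? cls : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(resolve("hbase"));              // org.apache.flume.sink.hbase.HBaseSink
        System.out.println(resolve("org.example.MySink")); // org.example.MySink
    }
}
```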