Review Board 1.7.22


SQOOP-683 Documenting sqoop.mysql.export.sleep.ms - easy throttling feature for direct MySQL exports

Review Request #7880 - Created Nov. 5, 2012 and updated

Zoltán Tóth-Czifra
Reviewers
Sqoop
sqoop-trunk
Code review for SQOOP-683, see https://issues.apache.org/jira/browse/SQOOP-683.
Converted to XML with asciidoc, the affected part:

<simpara>Sometimes you need to export large data with Sqoop to a live MySQL cluster that
is under a high load serving random queries from the users of our product.
While data consistency issues during the export can be easily solved with a
staging table, there is still a problem: the performance impact caused by the
heavy export.</simpara>
<simpara>First off, the resources of MySQL dedicated to the import process can affect
the performance of the live product, both on the master and on the slaves.
Second, even if the servers can handle the import with no significant
performance impact (mysqlimport should be relatively "cheap"), importing big
tables can cause serious replication lag in the cluster risking data
inconsistency.</simpara>
<simpara>With <literal>-D sqoop.mysql.export.sleep.ms=time</literal>, where <emphasis>time</emphasis> is a value in
milliseconds, you can let the server relax between checkpoints and the replicas
catch up by pausing the export process after transferring the number of bytes
specified in <literal>sqoop.mysql.export.checkpoint.bytes</literal>. Experiment with different
settings of these two parameters to archieve an export pace that doesn&#8217;t
endanger the stability of your MySQL cluster.</simpara>
<important><simpara>Note that any arguments to Sqoop that are of the form <literal>-D
parameter=value</literal> are Hadoop <emphasis>generic arguments</emphasis> and must appear before
any tool-specific arguments (for example, <literal>--connect</literal>, <literal>--table</literal>, etc).
Don&#8217;t forget that these parameters only work with the <literal>--direct</literal> flag set.</simpara></important>

Diff revision 3 (Latest)

1 2 3
1 2 3

  1. src/docs/user/compatibility.txt: Loading...
src/docs/user/compatibility.txt
Revision 3576fd7 New Change
[20] 135 lines
[+20]
136
sqoop.mysql.export.checkpoint.bytes=size+, where _size_ is a value in
136
sqoop.mysql.export.checkpoint.bytes=size+, where _size_ is a value in
137
bytes. Set _size_ to 0 to disable intermediate checkpoints,
137
bytes. Set _size_ to 0 to disable intermediate checkpoints,
138
but individual files being exported will continue to be committed
138
but individual files being exported will continue to be committed
139
independently of one another.
139
independently of one another.
140

    
   
140

   

    
   
141
Sometimes you need to export large data with Sqoop to a live MySQL cluster that

    
   
142
is under a high load serving random queries from the users of your application.

    
   
143
While data consistency issues during the export can be easily solved with a

    
   
144
staging table, there is still a problem with the performance impact caused by

    
   
145
the heavy export.

    
   
146

   

    
   
147
First off, the resources of MySQL dedicated to the import process can affect

    
   
148
the performance of the live product, both on the master and on the slaves.

    
   
149
Second, even if the servers can handle the import with no significant

    
   
150
performance impact (mysqlimport should be relatively "cheap"), importing big

    
   
151
tables can cause serious replication lag in the cluster risking data

    
   
152
inconsistency.

    
   
153

   

    
   
154
With +-D sqoop.mysql.export.sleep.ms=time+, where _time_ is a value in

    
   
155
milliseconds, you can let the server relax between checkpoints and the replicas

    
   
156
catch up by pausing the export process after transferring the number of bytes

    
   
157
specified in +sqoop.mysql.export.checkpoint.bytes+. Experiment with different

    
   
158
settings of these two parameters to archieve an export pace that doesn't

    
   
159
endanger the stability of your MySQL cluster.

    
   
160

   
141
IMPORTANT: Note that any arguments to Sqoop that are of the form +-D
161
IMPORTANT: Note that any arguments to Sqoop that are of the form +-D
142
parameter=value+ are Hadoop _generic arguments_ and must appear before
162
parameter=value+ are Hadoop _generic arguments_ and must appear before
143
any tool-specific arguments (for example, +\--connect+, +\--table+, etc).
163
any tool-specific arguments (for example, +\--connect+, +\--table+, etc).

    
   
164
Don't forget that these parameters are only supported with the +\--direct+

    
   
165
flag set.
144

    
   
166

   
145
PostgreSQL
167
PostgreSQL
146
~~~~~~~~~~
168
~~~~~~~~~~
147

    
   
169

   
148
Sqoop supports JDBC-based connector for PostgreSQL: http://jdbc.postgresql.org/
170
Sqoop supports JDBC-based connector for PostgreSQL: http://jdbc.postgresql.org/
[+20] [20] 59 lines
  1. src/docs/user/compatibility.txt: Loading...