Review Board 1.7.22


SQOOP-1155: Sqoop 2 documentation for connector development

Review Request #13089 - Created July 30, 2013 and updated

Masatake Iwasaki
sqoop2
SQOOP-1155
Reviewers
Sqoop
sqoop-sqoop2
documentation for connector development
checked compiled form by sphinx.
docs/src/site/sphinx/ConnectorDevelopment.rst
New File

    
   
.. Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

    
   
=============================
Sqoop 2 Connector Development
=============================

This document describes how to implement a connector for Sqoop 2.

    
   
What is a Connector?
++++++++++++++++++++

A connector handles the interaction with an external database.
It reads data from the database for import
and writes data to the database for export.
Interaction with Hadoop is taken care of by the common modules of the Sqoop 2 framework.

    
   
Connector Implementation
++++++++++++++++++++++++

The SqoopConnector class defines the functionality
that every connector must provide.
Each connector must extend SqoopConnector and override the methods shown below.
::

  public abstract String getVersion();
  public abstract ResourceBundle getBundle(Locale locale);
  public abstract Class getConnectionConfigurationClass();
  public abstract Class getJobConfigurationClass(MJob.Type jobType);
  public abstract Importer getImporter();
  public abstract Exporter getExporter();
  public abstract Validator getValidator();
  public abstract MetadataUpgrader getMetadataUpgrader();

    
   
The getImporter method returns an Importer_ instance,
which is a placeholder for the modules needed for import.

The getExporter method returns an Exporter_ instance,
which is a placeholder for the modules needed for export.

Methods such as getBundle, getConnectionConfigurationClass,
getJobConfigurationClass and getValidator
relate to `Connector configurations`_.

    
   
Importer
========

The Connector#getImporter method returns an Importer instance,
which is a placeholder for the modules needed for import,
such as the Partitioner_ and the Extractor_.
The built-in GenericJdbcConnector defines its Importer like this:
::

  private static final Importer IMPORTER = new Importer(
      GenericJdbcImportInitializer.class,
      GenericJdbcImportPartitioner.class,
      GenericJdbcImportExtractor.class,
      GenericJdbcImportDestroyer.class);

  ...

  @Override
  public Importer getImporter() {
    return IMPORTER;
  }
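
The Importer only records which classes implement each step. It does not create them. The sketch below illustrates this with hypothetical stand-ins (ImportComponents and the Demo* classes are not Sqoop types). For illustration only, it assumes that a framework later creates each component reflectively through its no-argument constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Sqoop's Importer: it only stores component classes.
final class ImportComponents {
  private final Class<?>[] components;

  ImportComponents(Class<?> initializer, Class<?> partitioner,
                   Class<?> extractor, Class<?> destroyer) {
    // Same order as the GenericJdbcConnector example above.
    this.components = new Class<?>[] {initializer, partitioner, extractor, destroyer};
  }

  // Creates one instance per stored class through its no-argument
  // constructor (an assumption about how a framework could use the holder).
  List<Object> instantiateAll() {
    List<Object> instances = new ArrayList<>();
    for (Class<?> component : components) {
      try {
        instances.add(component.getDeclaredConstructor().newInstance());
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot instantiate " + component.getName(), e);
      }
    }
    return instances;
  }
}

// Hypothetical components; real ones would extend Sqoop's base classes.
class DemoInitializer {}
class DemoPartitioner {}
class DemoExtractor {}
class DemoDestroyer {}
```

Because the holder stores Class objects rather than instances, a connector can declare it as a static constant, as GenericJdbcConnector does with IMPORTER.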

    
   
Extractor
---------

The Extractor (the E in ETL) extracts data from an external database and
writes it to the Sqoop framework for import.

An Extractor must override the extract method.
::

  public abstract void extract(ExtractorContext context,
                               ConnectionConfiguration connectionConfiguration,
                               JobConfiguration jobConfiguration,
                               Partition partition);

    
   
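The extract method extracts data from the database in a connector-specific way and
writes it to the DataWriter (provided by the context) in the `Intermediate representation`_.

The Extractor must keep iterating in the extract method until the data from the database is exhausted.
For a JDBC source, the loop looks roughly like this:
::

  int columnCount = resultSet.getMetaData().getColumnCount();
  while (resultSet.next()) {
    // Copy the current row into an array of column values.
    Object[] array = new Object[columnCount];
    for (int i = 0; i < columnCount; i++) {
      array[i] = resultSet.getObject(i + 1); // JDBC columns are 1-based
    }
    context.getDataWriter().writeArrayRecord(array);
  }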
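
To show the extract contract without a database, the sketch below replaces the JDBC ResultSet with an in-memory table. It also replaces the framework's DataWriter with a hypothetical RecordWriter interface that has only writeArrayRecord. InMemoryExtractor, RecordWriter and CollectingWriter are illustrative names, not Sqoop classes. The lower and upper arguments stand in for a Partition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the DataWriter that ExtractorContext provides.
interface RecordWriter {
  void writeArrayRecord(Object[] array);
}

// Collects records in memory instead of handing them to the framework.
class CollectingWriter implements RecordWriter {
  final List<Object[]> records = new ArrayList<>();

  @Override
  public void writeArrayRecord(Object[] array) {
    records.add(array);
  }
}

// Hypothetical extractor that emits rows whose id (column 0) is in [lower, upper).
class InMemoryExtractor {
  private final List<Object[]> table;

  InMemoryExtractor(List<Object[]> table) {
    this.table = table;
  }

  void extract(RecordWriter writer, long lower, long upper) {
    // Iterate until the source is exhausted, as the real extract method must.
    for (Object[] row : table) {
      long id = ((Number) row[0]).longValue();
      if (id >= lower && id < upper) {
        // Hand over a copy so the writer never shares state with the source.
        writer.writeArrayRecord(Arrays.copyOf(row, row.length));
      }
    }
  }
}
```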

    
   
Partitioner
-----------

The Partitioner creates Partition instances based on the configurations.
Each Partition instance corresponds to one map task,
so the number of Partition instances determines the number of map tasks.
Partition instances are passed to the Extractor_ as an argument of the extract method,
and the Extractor_ uses its Partition to determine which portion of the data to extract.
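
One common partitioning strategy, assumed here for illustration, splits an inclusive numeric key range into contiguous slices of nearly equal size. IdRange and RangePartitioner are hypothetical names. A real Partitioner extends Sqoop's Partitioner class and returns Partition objects, and GenericJdbcImportPartitioner's actual algorithm may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical partition covering the half-open id range [lower, upper).
final class IdRange {
  final long lower;
  final long upper;

  IdRange(long lower, long upper) {
    this.lower = lower;
    this.upper = upper;
  }

  @Override
  public String toString() {
    return "[" + lower + ", " + upper + ")";
  }
}

// Splits the inclusive range [min, max] into at most numPartitions slices.
final class RangePartitioner {
  static List<IdRange> partition(long min, long max, int numPartitions) {
    if (numPartitions < 1 || max < min) {
      throw new IllegalArgumentException("invalid range or partition count");
    }
    long total = max - min + 1;                        // number of ids in [min, max]
    int count = (int) Math.min(numPartitions, total);  // never create empty slices
    List<IdRange> ranges = new ArrayList<>();
    long start = min;
    for (int i = 0; i < count; i++) {
      // Spread the remainder over the first (total % count) slices.
      long size = total / count + (i < total % count ? 1 : 0);
      ranges.add(new IdRange(start, start + size));
      start += size;
    }
    return ranges;
  }
}
```

For example, RangePartitioner.partition(1, 10, 3) yields [1, 5), [5, 8) and [8, 11). Under the one-partition-per-map-task rule, that gives three map tasks.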

    
   
Partition classes need not follow any particular convention
other than providing Writable-style readFields and write methods and a toString method.
::

  public abstract class Partition {
    public abstract void readFields(DataInput in) throws IOException;
    public abstract void write(DataOutput out) throws IOException;
    public abstract String toString();
  }

Each connector is free to design its own Partition subclasses.
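
Partitions are serialized with write and restored with readFields, so a subclass must read its fields back in the same order it wrote them. The sketch below restates the abstract Partition class shown above so that it compiles on its own. It adds a hypothetical ConditionPartition that carries a SQL condition string, and PartitionRoundTrip simulates sending a partition to a map task as bytes. The no-argument constructor reflects an assumption that the framework creates an empty instance before calling readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Local copy of the abstract Partition contract shown above.
abstract class Partition {
  public abstract void readFields(DataInput in) throws IOException;
  public abstract void write(DataOutput out) throws IOException;
  public abstract String toString();
}

// Hypothetical partition holding a WHERE-clause condition.
class ConditionPartition extends Partition {
  private String condition;

  public ConditionPartition() {}  // empty instance, filled by readFields

  public ConditionPartition(String condition) {
    this.condition = condition;
  }

  public String getCondition() {
    return condition;
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    condition = in.readUTF();  // same order as write()
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(condition);
  }

  @Override
  public String toString() {
    return condition;
  }
}

// Serializes a partition to bytes and reads it back into a fresh instance.
final class PartitionRoundTrip {
  static ConditionPartition copy(ConditionPartition original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      original.write(new DataOutputStream(bytes));
      ConditionPartition copy = new ConditionPartition();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```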

    
   
Initializer and Destroyer
-------------------------

The Initializer is instantiated before the MapReduce job is submitted,
to do preparation work such as adding dependent jar files.

The Destroyer is instantiated after the MapReduce job has finished, to clean up.
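
A small driver makes this ordering concrete. The sketch below is hypothetical: JobHook and JobLifecycle are not Sqoop types, and the real Initializer and Destroyer have their own signatures. It runs the initializer first, then the job, and the destroyer last. The destroyer is called from a finally block so that cleanup also happens when the job fails. That is a design choice of this sketch, not documented Sqoop behavior.

```java
import java.util.List;

// Hypothetical hook type; Sqoop's Initializer and Destroyer have their own APIs.
interface JobHook {
  void run(List<String> log);
}

// Hypothetical driver that orders the three phases of a job.
final class JobLifecycle {
  static void execute(List<String> log, JobHook initializer, Runnable job, JobHook destroyer) {
    // Preparation (e.g. registering dependent jars) happens before submission.
    initializer.run(log);
    try {
      job.run();
      log.add("job");
    } finally {
      // Cleanup runs after the job, even if it failed (choice of this sketch).
      destroyer.run(log);
    }
  }
}
```

If the job throws, the exception still propagates, but only after the destroyer has appended its entry to the log.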

    
   
Exporter
========

The Connector#getExporter method returns an Exporter instance,
which is a placeholder for the modules needed for export,
such as the Loader_.
The built-in GenericJdbcConnector defines its Exporter like this:
::

  private static final Exporter EXPORTER = new Exporter(
      GenericJdbcExportInitializer.class,
      GenericJdbcExportLoader.class,
      GenericJdbcExportDestroyer.class);

  ...

  @Override
  public Exporter getExporter() {
    return EXPORTER;
  }

    
   
Loader
------

The Loader (the L in ETL) receives data from the Sqoop framework and
loads it into an external database.

A Loader must override the load method.
::

  public abstract void load(LoaderContext context,
                            ConnectionConfiguration connectionConfiguration,
                            JobConfiguration jobConfiguration) throws Exception;

    
   
The load method reads data from the DataReader (provided by the context)
in the `Intermediate representation`_ and loads it into the database in a connector-specific way.

The Loader must keep iterating in the load method until the DataReader is exhausted,
which is signaled by readArrayRecord returning null.
::

  Object[] array;
  while ((array = context.getDataReader().readArrayRecord()) != null) {
    // Write the record to the database, for example as part of a JDBC batch.
    ...
  }
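
The sketch below exercises the load loop in memory. IteratorReader is a hypothetical stand-in for the DataReader that returns null once its records are used up. BatchingLoader groups records into fixed-size batches, roughly the way a JDBC-based loader might use addBatch and executeBatch. Here the batches are simply collected in a list. Neither class is part of Sqoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the DataReader provided by LoaderContext.
final class IteratorReader {
  private final Iterator<Object[]> records;

  IteratorReader(Iterator<Object[]> records) {
    this.records = records;
  }

  // Returns the next record, or null once all records have been consumed.
  Object[] readArrayRecord() {
    return records.hasNext() ? records.next() : null;
  }
}

// Hypothetical loader that "flushes" fixed-size batches into a list.
final class BatchingLoader {
  private final int batchSize;
  final List<List<Object[]>> flushedBatches = new ArrayList<>();

  BatchingLoader(int batchSize) {
    this.batchSize = batchSize;
  }

  void load(IteratorReader reader) {
    List<Object[]> batch = new ArrayList<>();
    Object[] array;
    // Keep reading until the reader signals exhaustion with null.
    while ((array = reader.readArrayRecord()) != null) {
      batch.add(array);
      if (batch.size() == batchSize) {
        flushedBatches.add(batch);  // a JDBC loader would call executeBatch here
        batch = new ArrayList<>();
      }
    }
    // Flush the final, possibly partial, batch.
    if (!batch.isEmpty()) {
      flushedBatches.add(batch);
    }
  }
}
```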

    
   
Initializer and Destroyer
-------------------------

The Initializer is instantiated before the MapReduce job is submitted,
to do preparation work such as adding dependent jar files.

The Destroyer is instantiated after the MapReduce job has finished, to clean up.

    
   
Connector Configurations
++++++++++++++++++++++++

Configurations
==============

The definitions of the configurations are represented
by the models defined in the org.apache.sqoop.model package.
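
As far as we know, Sqoop 2 configuration classes are plain Java classes marked with annotations from org.apache.sqoop.model, such as @ConfigurationClass, @FormClass, @Form and @Input. Check the exact names and attributes against the Sqoop version you build against. The sketch below declares local stand-ins for those annotations so that it compiles without Sqoop. It defines a hypothetical connection configuration with one form of two inputs, then uses reflection to list the inputs, roughly the way a framework could build its configuration model from such classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the annotations assumed to exist in org.apache.sqoop.model.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface ConfigurationClass {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE) @interface FormClass {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Form {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD) @interface Input {
  int size() default -1;  // maximum input length; -1 means unspecified here
}

// Hypothetical form grouping the inputs needed to open a connection.
@FormClass
class ConnectionForm {
  @Input(size = 128) public String jdbcDriver;
  @Input(size = 128) public String connectionString;
}

// Hypothetical connection configuration made of a single form.
@ConfigurationClass
class DemoConnectionConfiguration {
  @Form public ConnectionForm connection = new ConnectionForm();
}

// Lists "form.input" names by reflection.
final class ConfigInspector {
  static List<String> inputNames(Class<?> configClass) {
    if (!configClass.isAnnotationPresent(ConfigurationClass.class)) {
      throw new IllegalArgumentException(configClass + " is not a configuration class");
    }
    List<String> names = new ArrayList<>();
    for (Field formField : configClass.getFields()) {
      Class<?> formType = formField.getType();
      if (!formField.isAnnotationPresent(Form.class)
          || !formType.isAnnotationPresent(FormClass.class)) {
        continue;  // not a form: skip it
      }
      for (Field input : formType.getFields()) {
        if (input.isAnnotationPresent(Input.class)) {
          names.add(formField.getName() + "." + input.getName());
        }
      }
    }
    return names;
  }
}
```

Reflection does not guarantee field order, so code that consumes the names should not rely on declaration order.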

    
   
ConnectionConfigurationClass
----------------------------


JobConfigurationClass
---------------------


ResourceBundle
==============

Resources for the Configurations_ are stored in a properties file
that is accessed through the getBundle method of the Connector.
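
A connector would typically implement getBundle(Locale) by calling ResourceBundle.getBundle with a base name and the requested locale, which loads a properties file packaged in the connector jar. To run without resource files, the sketch below inlines the properties text and parses it with PropertyResourceBundle. DemoConnectorResources and the key names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical connector resources, inlined instead of read from the jar.
final class DemoConnectorResources {
  // Hypothetical keys: a label and help text per form and per input.
  private static final String PROPERTIES =
      "connection.label = Connection configuration\n"
      + "connection.help = Settings used to reach the database\n"
      + "connection.jdbcDriver.label = JDBC Driver Class\n"
      + "connection.jdbcDriver.help = Fully qualified class name of the JDBC driver\n";

  static ResourceBundle getBundle(Locale locale) {
    // The locale is ignored here; ResourceBundle.getBundle would use it to
    // pick a localized file such as <baseName>_ja.properties.
    try {
      return new PropertyResourceBundle(new StringReader(PROPERTIES));
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }
}
```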

    
   
Validator
=========

The Validator validates the configurations set by users.
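
The sketch below shows the kind of checks a Validator might perform on connection settings. ValidationStatus, ConnectionSettings and DemoValidator are hypothetical stand-ins, not Sqoop's validation API. A three-level status lets the validator report a problem either as a warning or as a hard error, and the most severe status found wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical status levels, declared from least to most severe.
enum ValidationStatus { FINE, ACCEPTABLE, UNACCEPTABLE }

// Hypothetical connection settings entered by a user.
final class ConnectionSettings {
  final String jdbcDriver;
  final String connectionString;

  ConnectionSettings(String jdbcDriver, String connectionString) {
    this.jdbcDriver = jdbcDriver;
    this.connectionString = connectionString;
  }
}

// Hypothetical validator that records one message per offending input.
final class DemoValidator {
  final Map<String, String> messages = new LinkedHashMap<>();

  ValidationStatus validateConnection(ConnectionSettings settings) {
    messages.clear();
    ValidationStatus status = ValidationStatus.FINE;
    if (isBlank(settings.jdbcDriver)) {
      messages.put("jdbcDriver", "JDBC driver class is required");
      status = worse(status, ValidationStatus.UNACCEPTABLE);
    }
    String url = settings.connectionString;
    if (isBlank(url) || !url.startsWith("jdbc:")) {
      messages.put("connectionString", "Connection string must start with jdbc:");
      status = worse(status, ValidationStatus.UNACCEPTABLE);
    } else if (url.contains("localhost")) {
      // Map tasks run on cluster nodes, where localhost may not be the database host.
      messages.put("connectionString", "localhost is resolved on each node that runs a task");
      status = worse(status, ValidationStatus.ACCEPTABLE);
    }
    return status;
  }

  private static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  // Constants are ordered by severity, so the larger ordinal is the worse status.
  private static ValidationStatus worse(ValidationStatus a, ValidationStatus b) {
    return a.compareTo(b) >= 0 ? a : b;
  }
}
```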

    
   
Internals of the Sqoop 2 MapReduce Job
++++++++++++++++++++++++++++++++++++++

Sqoop 2 provides common MapReduce modules, such as SqoopMapper and SqoopReducer,
for both import and export.

- The InputFormat creates splits using the Partitioner.

- SqoopMapper invokes the Extractor's extract method.

- SqoopReducer does no actual work.

- The OutputFormat invokes the Loader's load method (via SqoopOutputFormatLoadExecutor).
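
The stages listed above can be condensed into a single-process simulation. PipelineSimulation is hypothetical and does not reflect how Sqoop actually runs the job. It calls the extractor once per partition (one simulated map task each), skips the reducer because that stage does no work, and then hands the records to the loader. A real MapReduce job does not gather every record into one in-memory list; the list here is only a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical single-process model of the job stages.
final class PipelineSimulation {
  // partitions: what the Partitioner produced (one InputFormat split each)
  // extractor:  writes the records of one partition to the given consumer
  // loader:     receives the records, like the Loader behind the OutputFormat
  static <P, R> void run(List<P> partitions,
                         BiConsumer<P, Consumer<R>> extractor,
                         Consumer<List<R>> loader) {
    List<R> records = new ArrayList<>();
    for (P partition : partitions) {
      // One simulated map task per partition.
      extractor.accept(partition, records::add);
    }
    // The reducer stage does no work, so the records go straight to the loader.
    loader.accept(records);
  }
}
```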

    
   
.. todo: sequence diagram like figure.

For import, the Extractor provided by the connector extracts data from the database,
and the Loader provided by Sqoop 2 loads the data into Hadoop.

For export, the Extractor provided by Sqoop 2 extracts data from Hadoop,
and the Loader provided by the connector loads the data into the database.

    
   
.. _`Intermediate representation`: https://cwiki.apache.org/confluence/display/SQOOP/Sqoop2+Intermediate+representation
docs/src/site/sphinx/index.rst
Revision 15ddfbb New Change
 