158 159
159 160
160 161
161 162
162 163
163 164
164 165
165 166
166 167
167 168
168 169
169 170
170 171
171 172
172 173
173 174
174 175
175 176
176 177
177 178
178 179
179 180
180 181
181 182
182 183
183 184
184 185
185 187
186 188
187 189
188 190
189 191
190 192
191 193
192 194
193 195
194 196
195 197
196 198
197 199
198 200
199 201
200 202
201 203
202 204
203 205
204 206
205 207
206 208
207 209
208 210
209 211
210 212
211 213
212 214
213 215
214 216
215 217
216 218
217 219
218 220
219 221
220 222
221 223
222 224
223 225
224 226
225 227
226 228
227 229
228 230
229 231
230 232
231 233
232 234
233 235
234 236
235 237
236 238
237 239
238 240
239 241
240 242
241 243
242 244
243 245
244 246
245 247
246 248
247 249
248 250
249 251
250 252
251 253
252 254
253 255
254 256
255 257
256 258
257 259
258 260
259 261
260 262
261 263
262 264
263 265
264 266
265 267
266 268
267 269
268 270
269 271
270 272
271 273
272 274
273 275
274 276
275 277
276 278
277 279
278 280
279 281
280 282
281 283
282 284
283 285
284 286
285 287
286 288
287 289
288 290
289 291
290 292
291 293
292 294
293 295
294 296
295 297
296 298
297 299
298 300
299 301
300 302
301 303
302 304
303 305
304 306
305 307
306 308
307 309
308 310
309 311
310 312
311 313
312 314
313 315
314 316
315 317
316 318
317 319
318 320
319 321
320 322
321 323
322 325
323 326
324 327
325 328
326 329
327 330
328 331
329 332
330 333
331 334
332 335
333 336
334 337
335 338
336 339
337 340
338 341
339 342
340 343
341 344
342 345
343 346
344 347
345 348
346 349
347 350
348 351
349 352
350 353
351 354
352 355
353 356
354 357
355 358
356 359
357 360
358 361
359 362
360 363
361 364
362 365
363 366
364 367
365 369
366 370
367 371
In December 1995, I wrote an article for Database Programming & Design magazine entitled “I Want a Data Warehouse, So What Is It Again?” A few months later, I began writing Data Warehousing For Dummies (Wiley), building on the article’s content to help readers make sense of first-generation data warehousing.
Fast-forward a quarter of a century, and I could very easily write an article entitled “I Want a Data Lake, So What Is It Again?” This time, I’m cutting right to the chase with Data Lakes For Dummies. To quote a famous former baseball player named Yogi Berra, it’s déjà vu all over again!
Nearly every large and upper-midsize company and governmental agency is building a data lake or at least has an initiative on the drawing board. That’s the good news.
The not-so-good news, though, is that you’ll find a disturbing lack of agreement about data lake architecture, best practices for data lake development, data lake internal data flows, even what a data lake actually is! In fact, many first-generation data lakes have fallen short of original expectations and need to be rearchitected and rebuilt.
As with data warehousing in the mid-’90s, the data lake concept today is still a relatively new one. Consequently, almost everything about data lakes — from its very definition to alternatives for integration with or migration from existing data warehouses — is still very much a moving target. Software product vendors, cloud service providers, consulting firms, industry analysts, and academics often have varying — and sometimes conflicting — perspectives on data lakes. So, how do you navigate your way across a data lake when the waters are especially choppy and you’re being tossed from side to side?
That’s where Data Lakes For Dummies comes in.
Data Lakes For Dummies helps you make sense of the ABCs — acronym anarchy, buzzword bingo, and consulting confusion — of today’s and tomorrow’s data lakes.
This book is not only a tutorial about data lakes; it also serves as a reference that you may find yourself consulting on a regular basis. So, you don’t need to memorize large blocks of content (there’s no final exam!) because you can always go back to take a second or third or fourth look at any particular point during your own data lake efforts.
Right from the start, you find out what your organization should expect from all the time, effort, and money you’ll put into your data lake initiative, as well as see what challenges are lurking. You’ll dig deep into data lake architecture and leading cloud platforms and get your arms around the big picture of how all the pieces fit together.
One of the disadvantages of being an early adopter of any new technology is that you sometimes make mistakes or at least have a few false starts. Plenty of early data lake efforts have turned into more of a data dump, with tons of data that just isn’t very accessible or well organized. If you find yourself in this situation, fear not: You’ll see how to turn that data dump into the data lake you originally envisioned.
I don’t use many special conventions in this book, but you should be aware that sidebars (the gray boxes you see throughout the book) and anything marked with the Technical Stuff icon are all skippable. So, if you’re short on time, you can pass over these pieces without losing anything essential. On the other hand, if you have the time, you’re sure to find fascinating information here!
Within this book, you may note that some web addresses break across two lines of text. If you’re reading this book in print and want to visit one of these web pages, simply key in the web address exactly as it’s noted in the text, pretending as though the line break doesn’t exist. If you’re reading this as an e-book, you’ve got it easy — just click the web address to be taken directly to the web page.
The most relevant assumption I’ve made is that if you’re reading this book, you either are or will soon be working on a data lake initiative.
Maybe you’re a data strategist and architect, and what’s most important to you is sifting through mountains of sometimes conflicting — and often incomplete — information about data lakes. Your organization already makes use of earlier-generation data warehouses and data marts, and now it’s time to take that all-important next step to a data lake. If that’s the case, you’re definitely in the right place.
If you’re a developer or data architect who is working on a small subset of the overall data lake, your primary focus is how a particular software package or service works. Still, you’re curious about where your daily work fits into your organization’s overall data lake efforts. That’s where this book comes in: to provide context and that “aha!” factor to the big picture that surrounds your day-to-day tasks.
Читать дальше