Okay. This article is going to become very technical very quickly so here’s the “too long; didn’t read” (tl;dr) version:
- Eliminate unnecessary database operations
- Use READ BINARY instead of LOOP WHERE
Not too many Business Intelligence Architects can program proficiently in ABAP, so they rely on approaches such as using standard function modules and standard data operations to get the work done.
I get that. Most projects start with fuzzy requirements and grow in complexity as they near completion. A senior developer or architect in that position would most likely be focused on delivering a finished product that is aligned with what the business had requested.
Too often, I am asked to look into slow-running data loads to uncover areas for improvement. Reviewing a “finished” product comes with the small advantage of hindsight. My programming background helps too.
Recently, a semiconductor manufacturing company asked me to review 3 transformations that were taking an unusually long time to complete each night. Using the “Quick & Dirty” approach that I mentioned in my Run Faster SAP BW e-book (and further explained in the email course), I identified two no-no’s that are all too prevalent in custom development for transformations.
If your nightly data load is taking more than 30 minutes, then see if the custom code in its transformation falls into what I have described below.
Eliminate unnecessary database operations
Database reads have a big impact on performance because of the time required to query and retrieve the data. A well-designed program (extraction, transformation, or otherwise) takes some precaution to minimize this operation.
So imagine my surprise when I found 3 calls to the function module DATE_TO_PERIOD_CONVERT in the object routines. Put another way, the program was accessing the underlying table T009B 150,000 times per data package. This particular data load required a little over 20 minutes to finish each night.
Adjusting the logic meant implementing an optimized version of this operation in the transformation end-routine. The result: a single database read (fetching all records of table T009B at once), which cut the run time from 20 minutes to under 1 minute.
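Here’s a minimal sketch of the idea, not the exact code from that project. The field names, the fiscal year variant ‘K4’, and the year/month lookup are all illustrative assumptions; a real replacement must also handle T009B’s day limits (BUTAG) and year shifts, which DATE_TO_PERIOD_CONVERT takes care of internally.

```abap
" Buffer T009B once per data package instead of calling
" DATE_TO_PERIOD_CONVERT for every record.
DATA: lt_t009b TYPE SORTED TABLE OF t009b
                 WITH NON-UNIQUE KEY periv bdatj bumon.

FIELD-SYMBOLS: <ls_t009b>      TYPE t009b,
               <result_fields> LIKE LINE OF result_package.

SELECT * FROM t009b INTO TABLE lt_t009b.    " 1 database read

LOOP AT result_package ASSIGNING <result_fields>.
  " simplified lookup by variant / year / month (illustrative)
  READ TABLE lt_t009b ASSIGNING <ls_t009b>
       WITH KEY periv = 'K4'                       " assumed variant
                bdatj = <result_fields>-calday(4)
                bumon = <result_fields>-calday+4(2).
  IF sy-subrc = 0.
    <result_fields>-poper = <ls_t009b>-poper.
  ENDIF.
ENDLOOP.
```

Because lt_t009b is declared as a SORTED TABLE with a matching key, each READ TABLE is a fast key access in memory rather than a trip to the database.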
Use READ BINARY instead of LOOP WHERE
The next two transformations had large processing blocks in their end-routines. From experience, I’ve seen performance degradation issues when using the LOOP WHERE clause to access large internal tables.
Let’s say that your DTP processes 50,000 records per data package. During the end-routine, your logic needs to process an internal table that contains 200,000 records. For each of the 50,000 records, you may need to scan the 200,000 records sequentially. In the worst-case scenario, you’re looking at 10 billion operations (10,000,000,000) per data package. That’s just crazy!
Binary reads are much better than sequential LOOP WHERE operations. Wikipedia explains it well. “A binary search halves the number of items to check with each iteration, so locating an item (or determining its absence) takes logarithmic time.” In the example above, that means at most log2(200,000) ≈ 18 comparisons per lookup, so 50,000 lookups cost on the order of 900,000 comparisons instead of 10 billion. You can read more about the binary search algorithm here.
So how do you replace a LOOP WHERE with a READ BINARY operation?
Assuming that you’re already looping over the RESULT_PACKAGE into the field symbol <f_results>, your LOOP WHERE may look like this …
LOOP AT t_internal_table ASSIGNING <f_internal_table>
  WHERE field = <f_results>-field.
  " ... perform operation ...
ENDLOOP.
While the above statement is simple enough, the performance hit is killer. Rewriting the logic with a READ TABLE … BINARY SEARCH approach takes some care, but the results speak for themselves.
Here’s the “pseudo” code. Try it out, but don’t simply lift and paste. I wrote this bit in an online HTML editor after all. 🙂
" sort once per data package, before the loop over RESULT_PACKAGE
SORT t_internal_table BY field ASCENDING.

" find the first match
CLEAR l_index.
READ TABLE t_internal_table ASSIGNING <f_internal_table>
     WITH KEY field = <f_results>-field
     BINARY SEARCH.
IF sy-subrc = 0.
  l_index = sy-tabix.
ENDIF.

" process all consecutive matches
WHILE l_index IS NOT INITIAL.
  " ... perform operation on <f_internal_table> ...

  " advance to the next row; stop when the key no longer matches
  l_index = l_index + 1.
  READ TABLE t_internal_table INDEX l_index ASSIGNING <f_internal_table>.
  IF sy-subrc <> 0 OR <f_internal_table>-field <> <f_results>-field.
    CLEAR l_index.
  ENDIF.
ENDWHILE.
Rewriting the LOOP WHERE statement may be a little involved but the performance gains were clear. Both of the transformations (which required 20-30 minutes to complete each night) finished in under 1.5 minutes.
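As a side note that goes beyond the fix described above: on systems where you control the internal table’s declaration, you can get the same effect without rewriting the loop at all. If the table is declared as a SORTED TABLE with the lookup field in its key, the ABAP kernel optimizes LOOP AT … WHERE on the key columns into a key access automatically (ty_line below stands in for your actual line type):

```abap
" alternative: let the kernel do the binary search for you
DATA: t_internal_table TYPE SORTED TABLE OF ty_line
                         WITH NON-UNIQUE KEY field.

" no explicit SORT needed; key order is maintained on insert
LOOP AT t_internal_table ASSIGNING <f_internal_table>
     WHERE field = <f_results>-field.   " optimized key access, no full scan
  " ... perform operation ...
ENDLOOP.
```

The trade-off is slightly slower inserts (the table keeps itself sorted), which is usually a bargain compared to scanning 200,000 rows per record.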
So there you go! Two key improvements that you can easily complete in one afternoon for large performance gains.
Give this a try and let me know how it speeds up your data loads.
I build analytical applications, dashboards, and reports for Fortune 500 companies that run on the SAP Platform.