Friday, April 27, 2012

Generic Delta Mechanism


Generic delta - How does it work (and not).

Generic delta in custom extractors allows you to build delta capability into your datasource. If this is set up for your datasource, BW can request and extract deltas from it. Now, if it is as easy as choosing this option in RSO2, why would I not enable delta on all my custom datasources (regardless of whether I use it or not) and use it when needed? Let us see if it is as easy as it seems.

How does it work?

You define a generic extractor based on a table/view, an InfoSet query, or a function module. The S-API, called when a BW extraction is scheduled, gets the data from the underlying extractor (e.g. the view) and applies the delta functionality to filter delta records before passing them to BW. For identifying the delta, it needs an identifier: a field that is part of the extract structure. It also needs to know how to apply the delta filter on that field. It cannot know what is going on in the underlying data tables, or what has changed there; it can only build selections based on some marker value up to which data was extracted, with the next selection starting beyond that.
Let us see how it works for each of the cases it allows you to choose:

Limitations of each of the delta identifier options

Numeric pointer

You specify a (numeric) field from the extract structure as the delta identifier. Every time a delta extraction is called, the S-API selects only the records whose pointer value is greater than the maximum value extracted in the last run. Let us dig into what the system does here (true for the other delta identifiers too), so we know where to debug when we don't get our delta:
When the first initialization call is made, a delta queue entry is created (RSA7), in addition to selecting all the data under the datasource and sending it to BW. The current value of the delta identifier field is written to the delta table ROOSGENDLM (in the current pointer field). The same value is visible in RSA7 if you click the status icon for the datasource. It tells us the maximum value (of the delta field) extracted to BW.
In the next delta extraction, this value is read to filter records with values greater than it (with safety interval adjustments, which are mentioned later). The pointer value is updated at the end of the run, and this repeats for each delta extraction.
An additional 'repeat pointer' field in the same table stores the last run's value, so we can do a 'repeat delta' if our delta does not succeed.
This process is similar for other delta identifier types.
A numeric pointer field is not easy to find in most cases. Sure, we have fields like document number, but such a field will not capture a document that was extracted earlier and has since been changed (as the pointer value can only go up). Records are extracted only once in this case, and it is not easy to find good examples of such delta extraction. (An additional problem: records out of sequence will also cause trouble. Say records 1001 and 1002 were for one type of document, and 2002 and 2003 for another, and all were extracted. The pointer now holds 2003; if documents 1004, 1005 and 2004 are created next, the next delta will only extract 2004. This is quite a common scenario with tables holding multiple document number ranges.)
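Purely as an illustration (a Python simulation, not SAP code), the numeric-pointer mechanism and the out-of-sequence problem above can be sketched as:

```python
# Illustrative simulation of numeric-pointer delta: select values greater
# than the stored pointer, then advance the pointer to the maximum value
# just extracted.

def delta_extract(values, pointer):
    """Return (extracted values, new pointer) for one delta run."""
    extracted = sorted(v for v in values if v > pointer)
    new_pointer = max(extracted, default=pointer)
    return extracted, new_pointer

# Two document number ranges, as in the example above.
docs = [1001, 1002, 2002, 2003]
run1, pointer = delta_extract(docs, 0)        # pointer advances to 2003

# New documents are created in both ranges...
docs += [1004, 1005, 2004]
run2, pointer = delta_extract(docs, pointer)  # ...but only 2004 is selected
```

The 1004 and 1005 documents are permanently lost to the delta, because the pointer can only move forward.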
Let us now look at the other two options.

Calendar day

This seems like a good candidate, what with most data tables having fields like 'change date'. It is. However, before using it, we need to understand how BW/S-API uses it. Let us say AEDAT is the field in the view from which we are extracting the delta.
For one, the S-API filters data based on the system date, and extracts data up to today only (if no safety interval is specified). Another thing to note (for both timestamp and date) is that the delta pointer stores the system's calendar day (or system timestamp), not the largest value found in the extracted data.

To clarify, let us say we run the extraction on 08/04 (MM/DD); records will be selected up to an AEDAT value of 08/04. Say we had records for 08/01 and 08/02, and these were extracted successfully. The pointer in RSA7 will not show 08/02, but 08/04. If, for any reason, records are now added with a value of 08/03 (say, because they are updated from some other system with a one-day delay), they will not be extracted in the next run. The next run extracts data from 08/05 onwards, and you may wonder how you got data up to 08/02 and then for 08/05, while the 08/03 data was never selected. But that is logical, as it is the calendar day, not AEDAT, that the system essentially uses to store the pointer.
Also, if you post data for a future date expecting it to be selected (as it is higher than the current pointer value), that will not work either. In effect, the S-API applies the filter as 'AEDAT GT pointer_value AND AEDAT LE current_date'.
One limitation is that you can get data only up to a given date, so it can bring the delta only once a day. Another thing to note: what happens if records are posted on the same date after the extraction has run? Those records will not be selected. To avoid this, it makes sense to extract data only up to 'yesterday'. This is achieved by setting the upper safety interval limit to 1, which adjusts the selection to 'up to current date minus one'.
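The resulting filter can be sketched as follows (illustrative Python, not SAP code; the dates are the ones from the example above):

```python
from datetime import date, timedelta

# Sketch of the CALDAY delta filter the S-API builds:
#   AEDAT > pointer AND AEDAT <= today - upper_safety
# where the pointer is the calendar day of the LAST RUN, not the maximum
# AEDAT actually found in the extracted data.

def in_calday_delta(aedat, pointer, today, upper_safety=0):
    """True if a record with this AEDAT would be selected by the next delta."""
    return pointer < aedat <= today - timedelta(days=upper_safety)

pointer = date(2006, 8, 4)   # system date of the last delta run (stored pointer)
today   = date(2006, 8, 6)

late = date(2006, 8, 3)      # posted late from another system, after the 08/04 run
print(in_calday_delta(late, pointer, today))    # False: never extracted

future = date(2006, 8, 7)    # future-dated record
print(in_calday_delta(future, pointer, today))  # False: above current date

# With an upper safety limit of 1, today's own postings are excluded too:
print(in_calday_delta(today, pointer, today, upper_safety=1))           # False
print(in_calday_delta(date(2006, 8, 5), pointer, today, upper_safety=1)) # True
```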

Timestamp

Timestamp seems the most obvious preference, since you can get the latest delta to BW. However, in most cases it is not easy to find a timestamp field (most tables have change-date and change-time fields, but not a single field combining both). You also need to be aware that SAP applies system-timestamp values when selecting the data. So, similar to the example above, let us say you are extracting at 20060804000000: you get all data up to 20060804000000, and this value is stored in the pointer. However, suppose your source table is updated with a timestamp value from another system (say via a file, or an RFC call) that is not in sync with the SAP server time. If, after this extraction, another record arrives in the underlying table with a timestamp value of 20060803230001 (because it comes from Puerto Rico instead of Hawaii, say), that record will not be extracted in the next delta (or any delta afterwards), because its timestamp value is lower than the pointer from the last extraction.
The case where the timestamp field is updated with a value other than the source system's current timestamp is a different issue from the system taking a finite amount of time to update the record; the latter is handled using safety interval limits.
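To make the pitfall concrete, here is a sketch (illustrative Python, not SAP code; the timestamps are from the example above):

```python
from datetime import datetime

# The timestamp pointer stores the system time of the extraction run, so a
# record stamped by an out-of-sync source can fall below the pointer and be
# skipped by every later delta.

def in_timestamp_delta(ts, pointer, now):
    """True if this record would be selected: pointer < ts <= now."""
    return pointer < ts <= now

pointer = datetime(2006, 8, 4, 0, 0, 0)       # system time of the last run
late_record = datetime(2006, 8, 3, 23, 0, 1)  # stamped by a server an hour behind

# The record arrives AFTER the last run, but no later delta ever selects it:
now = datetime(2006, 8, 5, 8, 0, 0)
print(in_timestamp_delta(late_record, pointer, now))  # False
```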

Safety intervals: How safe?

Safety interval limits exist to take into account the fact that records are being modified in the source system at the time of the delta extraction. If an update is happening while the delta extraction runs, how do we make sure it is extracted? The safety interval adjusts the selection filter on the delta field by a 'reasonable' extent (specified by the user), making sure the datasource covers this possibility.
Safety intervals are applicable to all delta types; however, they are mostly useful with 'Timestamp' (with CALDAY, it is normal to leave the lower limit blank and set the upper limit to '1', as explained earlier). An understanding of how safety intervals work will make this clear.
Assume an upper limit of 5 and a lower limit of 1 specified as safety intervals.
Case 1 - Numeric pointer (last extracted value 10011; data in the underlying table with a maximum value of 10025)
Data would be extracted from a minimum value of 10010 (the lower limit adjusts the last counter, 10011, down by 1) to a maximum value of 10020 (after the adjustment by 5 for the upper safety limit). The current pointer would now show 10020 in RSA7.
Case 2 - Timestamp (last extracted value 20060804113000; current system time 20060805080000)
The next extraction will select records from timestamp value 20060804112959 (after adjusting by 1 second, as specified in the lower limit) to timestamp value 20060805075955 (adjusting by 5 seconds, as specified in the upper limit).
The case would be similar for calendar days; however, since records are unlikely to overlap across days, these limits are not normally used for that delta type.
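The arithmetic of the two cases above can be checked with a small sketch (illustrative Python, not SAP code):

```python
from datetime import datetime, timedelta

# Case 1 - numeric pointer: lower limit 1, upper limit 5.
low  = 10011 - 1   # selection starts at 10010
high = 10025 - 5   # selection ends at 10020, which becomes the new pointer

# Case 2 - timestamp: lower limit 1 second, upper limit 5 seconds.
# Note that timestamps cannot simply be subtracted as integers:
# 113000 minus one second is 112959, not 112999.
last = datetime(2006, 8, 4, 11, 30, 0)  # 20060804113000
now  = datetime(2006, 8, 5, 8, 0, 0)    # 20060805080000
ts_low  = (last - timedelta(seconds=1)).strftime("%Y%m%d%H%M%S")
ts_high = (now  - timedelta(seconds=5)).strftime("%Y%m%d%H%M%S")
print(ts_low, ts_high)  # 20060804112959 20060805075955
```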
For a deeper understanding of these limits, press F1 on these fields in RSO2; the help also explains why and when one may need to specify them.
Since the lower safety interval limit lowers the starting selection value of the delta identifier field, some records could be brought in twice. One needs to be mindful of this fact (so the data should first go to an 'overwrite' type of ODS), and for this reason the lower limit is only available if the datasource is defined with 'new status for changed records', not with 'additive delta'.

When should I be using these safety interval limits?

If you are extracting on a timestamp, you can think of using them. In most cases the upper limit will suffice, if you can reasonably estimate what the value should be. If your system takes at most 2 minutes to post a corresponding record, specify 120 (for timestamp) as the upper limit. If it is a numeric pointer type, and a maximum of 20 transactions can be posted to the system in those 2 minutes, specify 20. For a CALDAY-type field, it is pretty much standard to set '1' as the upper safety limit and leave the lower limit blank.
If you specify an upper limit, you will not be extracting everything changed in the source system (since the upper bound of the selection is being adjusted down); so if you see records in the source which should be, but are not, coming through in your delta, do check whether it is due to the safety interval settings.
If you still want to make sure you are getting everything (since there can be cases where the assumption above does not hold), you can set the lower limit. If you think that at most 100 new records are created while your extraction is running, you can specify 100 (for a numeric pointer) as the lower limit. Remember that in this case there is a possibility of extracting some records that were already extracted in the last run.

What is this option of additive delta/new records?

This sets the delta property of the datasource. Depending on its value, the datasource may be incompatible with certain data targets. For example, if it is additive delta (ROOSOURCE-DELTA = 'ADD'), it may not be fit for updating an ODS in overwrite mode. For the 'new image' option, the field ROOSOURCE-DELTA is set to 'AIE'.

What does my function module get to work with in case of delta (in addition to the template FM)?

This is not in the scope of this document; however, one point is worth mentioning briefly. You get the selections coming from the InfoPackage (or RSA3) in the internal table I_T_SELECT when the first call to the FM is made. What is not obvious is that you also get the selection for the delta-pointer field in this same table, appended by the S-API.
Let us say you specified AEDAT as the delta identifier field, and your InfoPackage sends a request to select data based on the CO_CODE and CURR_TYPE fields. You will get corresponding entries in the I_T_SELECT table with the selection values for CO_CODE and CURR_TYPE, which you can read and use to select data in your FM. In addition, I_T_SELECT will also hold selection values for AEDAT. These are appended by the S-API, based on the 'current pointer' entry you see in RSA7 and the current date (in the case of a CALDAY-type delta).
The case would be similar for any other type of delta identifier field.
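A hypothetical picture of those rows (in Python, not ABAP; the CO_CODE/CURR_TYPE values and the AEDAT range are made-up examples) might look like this:

```python
# Rows an extractor FM might see in I_T_SELECT on the first call. The first
# two come from the InfoPackage selections; the AEDAT row is the one the
# S-API appends, built from the RSA7 pointer and the current date (CALDAY
# delta). All values here are invented for illustration.
i_t_select = [
    {"FIELDNM": "CO_CODE",   "SIGN": "I", "OPTION": "EQ", "LOW": "1000",     "HIGH": ""},
    {"FIELDNM": "CURR_TYPE", "SIGN": "I", "OPTION": "EQ", "LOW": "10",       "HIGH": ""},
    {"FIELDNM": "AEDAT",     "SIGN": "I", "OPTION": "BT", "LOW": "20060805", "HIGH": "20060806"},
]

# Inside the FM, the delta selection is read just like any user selection:
aedat_range = [r for r in i_t_select if r["FIELDNM"] == "AEDAT"]
```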

What of deleted records? Well, what of it?

You do not want everything to work perfectly, do you? You would be jobless otherwise.
Before you get happy with your testing and declare it working as it should, consider the case where data is deleted from the source tables. If it is not physically deleted, just marked as deleted, you can include this marker field in your extractor and build your update accordingly. But what if the data in the source table is deleted physically? You no longer have the record, so there is no chance of getting it as a delta.
There are no easy solutions for this. One option is to check whether the deletion is captured in the change document tables. If it is, you can build the delta record from there (at least the key), which will then need to be handled in the update rules so as to properly cancel out the existing record. This, of course, means building an FM-based extractor.
Another option is to enhance the application logic (at the point where the transaction deletes the record) to capture the deletion information in a custom Z table, and to read and clear that table while extracting.

5 comments:

  1. finally I understood, Very good example about safety upper limit and lower limit.., I didnot get anywhere this kind of explanation abt this topic.

    Thanks,
    Naresh

  2. Very nicely illustrated all the generic delta settings.. Thanks a lot

  3. Hi,

    I have a requirement wherein I need to a custom delta extractor based on several ztables
    Here during delta extraction, deletion too takes place from some ztables. Can you please let me know how to create a delta extractor which is full proof in capturing all deltas.
    As per my proposed design, the extracted data would be stored into a write optimized dso and then to a cube.
    So what all settings should I keep in mind.
    its urgent, so it would be gr8 if you could guide me in doin this.
    I am reachable on savvysap2013@gmail.com.
    Regards
    Savvy

  4. hi Savvy,

    First Create a view is the combination of required tables. Based on view we can create a generic data source.

    Creating a database view
    http://www.saptechnical.com/Tutorials/ABAP/View/Database2.htm

    Thanks&Regards,
    Nageswararao Jampani

  5. THANKS A lot.finally i too understood..
