Home » SQL & PL/SQL » SQL & PL/SQL » Optimization of SQL (12.2)
Optimization of SQL [message #684317] Sun, 09 May 2021 03:10 Go to next message
swasora
Messages: 1
Registered: May 2021
Junior Member
Hi Team



I have the below code where it processes trillions of records and i would ideally want to break this down to a more simpler query or number of queries that will allow the repetetive condition checks in case statements to be simplified and helps in performance.Any advise will be really helpful

Here if you can see the date - interval condition for a similar let of columns are repeated while the sum of the computations is carried out.What would be the best way to tune the query so that the aaggregations are done in an optimal manner




select a.cust_id,a.cust_dt,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '14' day) ) and c.ea_flg in ('6') then c.some_value else 0 end) column 1,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '14' day) ) and c.ea_flg in ('7') then c.some_value else 0 end) column 2,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '14' day) ) and c.ea_flg in ('8') then c.some_value else 0 end) column 3,






sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '7' day) ) and c.ea_flg in ('5') then c.some_value else 0 end) column 4,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '7' day) ) and c.ea_flg in ('34') then c.some_value else 0 end) column 5,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '7' day) ) and c.ea_flg in ('56') then c.some_value else 0 end) column 6,






sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '30' day) ) and c.ea_flg in ('76') then c.some_value else 0 end) column 7,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '30' day) ) and c.ea_flg in ('23') then c.some_value else 0 end) column 8,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '30' day) ) and c.ea_flg in ('54') then c.some_value else 0 end) column 9,










sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '60' day) ) and c.ea_flg in ('78') then c.some_value else 0 end) column 10,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '60' day) ) and c.ea_flg in ('99') then c.some_value else 0 end) column 11,


sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '60' day) ) and c.ea_flg in ('21') then c.some_value else 0 end) column 12






from 


table_1 a,table_2 b,table_3 c

join conditions on a,b,c  
where a.cust_dt = sysdate -1





group by 1,2; 


table a = 24k. Here cust_dt is unique

Table b = 50-60 million cust_dt is repeated

Table C = 80 million (approx) cust_dt is repeated



joins between the 3 tables are on below condition

a.cust_id=b.cust_id

b.some_col1 = c.some_col1

b.cust_dt=c.cust_dt
Re: Optimization of SQL [message #684318 is a reply to message #684317] Sun, 09 May 2021 05:21 Go to previous messageGo to next message
John Watson
Messages: 8560
Registered: January 2010
Location: Global Village
Senior Member
Welcome to the forum.
Please read the OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.

You have not said what the problem is, can I assume that it is performance? You would like the query to run faster?
If so, it is unusual that column projections such as the CASE statements to which you refer are a significant issue. More likely the problem is likely to do with the execution plan. You had better show it, together with the real query and some idea of the data volumes. For example, we know that the query returns only one row, because you have an equality predicate on a unique column of A. But I have no idea how may rows come from B and C to be aggregated.

[Updated on: Sun, 09 May 2021 05:22]

Report message to a moderator

Re: Optimization of SQL [message #684319 is a reply to message #684318] Sun, 09 May 2021 06:55 Go to previous message
Solomon Yakobson
Messages: 3063
Registered: January 2010
Location: Connecticut, USA
Senior Member
Code you posted is not a working code:

sum(case when a.cust_dt < b.cust_dt and a.cust_dt >= (b.cust_dt - interval '14' day) ) and c.ea_flg in ('6') then c.some_value else 0 end) column 1
Extra ), space is select list aliases: column 1 and RTFM on aggregations and how group by 1,2 would work vs order by 1,2.

Anyway, I'd try something like:

with t1 as (
            select  /*+ materialize */
                    a.cust_id a_cust_dt,
                    a.cust_dt
              from  table_1 a
              where a.cust_dt = sysdate - 1
           ),
     t2 as (
            select  /*+ materialize */ -- index on table_2(cust_id,cust_dt,some_col1) would increase performance
                    t1.cust_id,
                    t1.a_cust_dt,
                    b.cust_dt b_cust_dt,
                    b.cust_dt - t1.cust_dt cust_dt_diff
                    b.some_col1
             from  t1,
                   table_2 b
             where b.cust_id = t1.cust_id
               and t1.a_cust_dt >= b.cust_dt - 60
               and t1.a_cust_dt < b.cust_dt
          )
select  -- index on table_3(cust_dt,some_col1,ea_flg) would increase performance
        t2.cust_id,
        t2.a_cust_dt cust_dt
        sum(
            case
              when t2.cust_dt_diff <= 14 and c.ea_flg in ('6') then c.some_value
              else 0
            end
           ) column1,
        sum(
            case
              when t2.cust_dt_diff <= 14 and c.ea_flg in ('7') then c.some_value
              else 0
            end
           ) column2,
        sum(
            case
              when t2.cust_dt_diff <= 14 and c.ea_flg in ('8') then c.some_value
              else 0
            end
           ) column3,
        sum(
            case
              when t2.cust_dt_diff <= 7 and c.ea_flg in ('5') then c.some_value
              else 0
            end
           ) column4,
        sum(
            case
              when t2.cust_dt_diff <= 7 and c.ea_flg in ('34') then c.some_value
              else 0
            end
           ) column5,
        sum(
            case
              when t2.cust_dt_diff <= 7 and c.ea_flg in ('56') then c.some_value
              else 0
            end
           ) column6,
        sum(
            case
              when t2.cust_dt_diff <= 30 and c.ea_flg in ('76') then c.some_value
              else 0
            end
           ) column7,
        sum(
            case
              when t2.cust_dt_diff <= 30 and c.ea_flg in ('23') then c.some_value
              else 0
            end
           ) column8,
        sum(
            case
              when t2.cust_dt_diff <= 30 and c.ea_flg in ('54') then c.some_value
              else 0
            end
           ) column9,
        sum(
            case
              when t2.cust_dt_diff <= 60 and c.ea_flg in ('78') then c.some_value
              else 0
            end
           ) column10,
        sum(
            case
              when t2.cust_dt_diff <= 60 and c.ea_flg in ('99') then c.some_value
              else 0
            end
           ) column11,
        sum(
            case
              when t2.cust_dt_diff <= 60 and c.ea_flg in ('21') then c.some_value
              else 0
            end
           ) column12
  from  t2,
        table_3 c
  where c.cust_dt = t2.b_cust_dt
    and c.some_col1 = t2.some_col1
    and c.ea_flg in (5,6,7,8,21,23,34,54,56,76,78,99) 
  group by t2.cust_id,
           t2.a_cust_dt
/
SY.
Previous Topic: multiple queries and 1 output window/result
Next Topic: Timezone Europe/Paris
Goto Forum:
  


Current Time: Mon Jun 21 23:06:49 CDT 2021