OK, so I got a bit bored and decided to add several hundred thousand rows to a test table.
I wasn't sure how values would really be distributed across the categories, transaction codes and dates, so I made the test data fairly even throughout the range.
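For anyone wanting to reproduce it, the test data can be knocked up with something like this (the column names match the query further down, but the types, row count and value ranges here are illustrative guesses rather than exactly what I ran):

CREATE TABLE [dbo].[my_table]
(
    [unique_id]       INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, -- the assumed clustered PK
    [transactioncode] CHAR(1) NOT NULL,
    [category]        INT NOT NULL,
    [date]            DATE NOT NULL
);

-- spread ~500k rows evenly across codes, categories and a year of dates
;WITH n AS
(
    SELECT TOP (500000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects a CROSS JOIN sys.all_objects b
)
INSERT INTO [dbo].[my_table] ([transactioncode], [category], [date])
SELECT CASE WHEN i % 2 = 0 THEN 'A' ELSE 'R' END, -- alternate the two codes
       i % 10,                                     -- ten even categories
       DATEADD(d, i % 365, '20230101')             -- one year of dates
FROM n;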
I then tried half a dozen different approaches and analysed the data and table structures.
Assuming the unique identity is a clustered primary key (yeah, big assumption), I'd suggest creating the following index:
CREATE NONCLUSTERED INDEX [idx_my_table_trans_cat_id] ON [dbo].[my_table]
(
    [transactioncode] ASC,
    [category] ASC,
    [unique_id] ASC
)
INCLUDE ([date])
WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF)
and then the query:
;with cte2 as
(
    select a.unique_id, a.category, a.date,
           (select min(unique_id) from my_table c where c.unique_id > a.unique_id) as next_unique_id
    from my_table a
    where a.transactioncode = 'A'
)
select *
from cte2 a
inner join my_table b on a.next_unique_id = b.unique_id and a.category = b.category and b.transactioncode = 'R'
where datediff(d, a.date, b.date) = 1
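As an aside, on SQL Server 2012 or later the same "next row" lookup could be written with LEAD() instead of the correlated subquery. This is just a sketch for comparison, not something I've benchmarked against the same index; note that LEAD has to be computed before filtering to 'A', otherwise it would skip over intervening rows and change the meaning:

;with cte2 as
(
    select unique_id, transactioncode, category, date,
           lead(unique_id) over (order by unique_id) as next_unique_id -- next unique_id over ALL rows
    from my_table
)
select *
from cte2 a
inner join my_table b on a.next_unique_id = b.unique_id and a.category = b.category and b.transactioncode = 'R'
where a.transactioncode = 'A'
  and datediff(d, a.date, b.date) = 1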
It seems to work really well, using the covering index without ever going back to the base table...
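If you want to verify the covering behaviour yourself, turning on I/O statistics and checking the actual execution plan will show it:

SET STATISTICS IO ON;
-- run the query above; the Messages tab shows the logical reads for my_table,
-- and the actual execution plan shows the seek on idx_my_table_trans_cat_id
-- with no Key Lookup operator if the index really does cover the query
SET STATISTICS IO OFF;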
That said, there are still a few assumptions in play (as per my prior posting as well) that really do need clearing up; given those assumptions, though, I'm pretty happy with the results so far.