I have been asked to investigate another interesting problem worth writing about. It’s worth mentioning here because it deals with a problem I believed to have long since been solved: tablespace fragmentation. However, in this case it was a locally managed tablespace (LMT) with Automatic Segment Space Management (ASSM) enabled. It should be difficult to have fragmentation on one of these, but as you will see it is not impossible.
So the story started innocently enough with an ORA-01653 while shuffling around subpartitions to another tablespace:
01653, 00000, "unable to extend table %s.%s by %s in tablespace %s" // *Cause: Failed to allocate an extent of the required number of blocks for // a table segment in the tablespace indicated. // *Action: Use ALTER TABLESPACE ADD DATAFILE statement to add one or more // files to the tablespace indicated.
Well except that there was plenty of free space available, the sum(bytes) in dba_free_tablespaces for the tablespace showed in excess of 20 GB. So it could only be: fragmentation. And indeed I quickly found out that the largest contiguous free block was 8 MB. Which should really have been enough for most use cases… After a little digging around I noticed that all subpartitions on the tablespace had at least 9 extents, some more. The sum of bytes per subpartition ever so slightly exceeded 8MB which was the cause for the fragmentation. Here is an example:
SQL> select segment_name,partition_name,bytes 2 from dba_extents 3 where segment_name = 'T1_SUBPART' 4 and partition_name = 'SYS_SUBP26737'; SEGMENT_NAME PARTITION_NAME BYTES ------------------------------ ------------------------------ ---------- T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 131072 T1_SUBPART SYS_SUBP26737 131072 T1_SUBPART SYS_SUBP26737 131072 T1_SUBPART SYS_SUBP26737 131072 T1_SUBPART SYS_SUBP26737 196608 T1_SUBPART SYS_SUBP26737 1048576 T1_SUBPART SYS_SUBP26737 1048576 ---------- sum 9109504 13 rows selected.
Some of the extents are nicely aligned along 1MB boundaries, but there are a few that are 128kb in size. Multiply this by a few thousand and you get the picture. It’s not good if you have 20,000 entries in DBA_FREE_SPACE for a given tablespace!
The error message was thrown during the execution of an anonymous PL/SQL block that moved lots of subpartitions from one tablespace to another. There were these already mentioned roughly few thousand subpartitions to be moved plus a few unpartitioned tables. The process aborted halfway. The following is more or less a transcript of my troubleshooting efforts.
The test case
To reproduce the problem in our lab I created a table with lots of sub partitions, using the following SQL statement. It doesn’t matter that the subpartitions are more or less empty in this case.
CREATE TABLE t1_subpart ( id, t_pad, date_Created, date_completed, state, spcol ) partition BY range (id) interval (1000) subpartition BY list (spcol) subpartition template ( subpartition sp1 VALUES ('0'), subpartition sp2 VALUES ('1'), subpartition sp3 VALUES ('2'), subpartition sp4 VALUES ('3'), subpartition spdef VALUES (DEFAULT) ) ( partition p_man VALUES less than (10000) ) AS WITH v1 AS (SELECT rownum n FROM dual CONNECT BY level <= 10000 ) SELECT mod(rownum,1000000) id, rpad(rownum,100) t_pad, TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 date_created, TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 + dbms_random.value(1800, 86400)/86400 date_completed, CASE WHEN mod(rownum,100000) = 0 THEN CAST('RARE' AS VARCHAR2(12)) WHEN mod(rownum,10000) = 0 THEN CAST('FAIRLY RARE' AS VARCHAR2(12)) WHEN mod(rownum,1000) = 0 THEN CAST('NOT RARE' AS VARCHAR2(12)) WHEN mod(rownum,100) = 0 THEN CAST('COMMON' AS VARCHAR2(12)) ELSE CAST('THE REST' AS VARCHAR2(12)) END state, TO_CHAR(mod(rownum, 4)) spcol FROM v1, v1 WHERE rownum <= 1e6;
(If this looks familiar to you then you probably know this presentation by Jonathan Lewis)
A couple of things to remember on this 11.2.0.3 database: Oracle introduced large extents for partitions, as explained in “Initial Extent Size of a Partition Changed to 8MB from 64KB After Upgrade to 11.2.0.2 or Later (Doc ID 1295484.1)”. So having empty partitions is now a huge waste of space!
But back to the problem: is there anything that would lead you to suspect fragmentation?
SQL> select max(blocks) from dba_segments where segment_name = 'T1_SUBPART'; MAX(BLOCKS) ----------- 1024 Elapsed: 00:00:00.15
No, all subpartitions are 8 MB, most of them 1x8M, < 15 different:
select s.segment_name,s.partition_name, s.blocks, s.extents from user_segments s where s.segment_name = 'T1_SUBPART' and extents <> 1 order by extents desc; SEGMENT_NAME PARTITION_NAME BLOCKS EXTENTS ------------------------------ ------------------------------ ---------- ---------- T1_SUBPART SYS_SUBP29210 1024 8 T1_SUBPART SYS_SUBP32032 1024 5 T1_SUBPART SYS_SUBP29180 1024 5 T1_SUBPART SYS_SUBP28738 1024 4 T1_SUBPART SYS_SUBP32624 1024 4 T1_SUBPART SYS_SUBP26953 1024 4 T1_SUBPART SYS_SUBP30400 1024 4 T1_SUBPART SYS_SUBP30996 1024 4 T1_SUBPART SYS_SUBP31591 1024 4 T1_SUBPART SYS_SUBP28143 1024 4 T1_SUBPART SYS_SUBP27548 1024 4 T1_SUBPART SYS_SUBP29805 1024 4
Most of the subpartitions have 1×8 MB extents, very few have more, but all sum up to 8MB:
SQL> select count(extents),extents from user_segments s where s.segment_name = 'T1_SUBPART' group by extents COUNT(EXTENTS) EXTENTS -------------- ---------- 4942 1 9 4 2 5 1 8 Elapsed: 00:00:00.66 SQL> SQL> select distinct bytes from user_segments where segment_name = 'T1_SUBPART'; BYTES ---------- 8388608
Next I am trying to reproduce the problem as closely as possible. Here, HCC compression has been enabled by the customer on the table, after which the partitions have been moved to the destination tablespace. Note that this problem has nothing to do with HCC. I have since tried to reproduce this case outside of Exadata with the same result.
SQL> alter table t1_subpart compress for query high; Table altered.
Next I’ll show you DBA_FREE_SPACE for the tablepace:
SQL> select * from dba_free_space 2* where tablespace_name = 'DESTINATION'; TABLESPACE_NAME FILE_ID BLOCK_ID BYTES BLOCKS RELATIVE_FNO ------------------------------ ---------- ---------- ---------- ---------- ------------ DESTINATION 10 128 1072693248 130944 10 DESTINATION 11 128 1072693248 130944 11 Elapsed: 00:00:00.01
So now I’m going to move all of the subpartitions. At first I couldn’t reproduce the fragmentation at all when I moved these serially-everything stayed just as it was, some of the subpartitions which consisted of more than 1 extent actually were coalesced into just 1. I didn’t pay attention to the “parallel” keyword as I thought that was just a means of speeding things up. When after the 3rd time I had no luck reproducing the fragmentation I actually tried with parallel, and here is how it went. First the anonymous PL/SQL block:
SQL> !cat movemove.sql begin for i in (select subpartition_name from user_tab_subpartitions where table_name = 'T1_SUBPART' and tablespace_name <> 'DESTINATION') loop begin execute immediate 'alter table t1_subpart move subpartition ' || i.subpartition_name || ' tablespace DESTINATION parallel'; exception when others then dbms_output.put_line('could not move ' || i.subpartition_name || ' due to ' || sqlerrm(sqlcode)); end; end loop; end; /
This takes a long time, and the quality of the script is not production-worthy! In this case the anonymous block completed without issues, but that was just 1 table and not the same volume as the real system where this happened. Look at the extent distribution now:
SQL> select count(extents),extents from user_segments s 2 where s.segment_name = 'T1_SUBPART' 3 group by extents; COUNT(EXTENTS) EXTENTS -------------- ---------- 634 1 3959 13 356 5 4 48 1 8
Also consider the impact on DBA_FREE_SPACE:
SQL> select count(1) from dba_free_Space where tablespace_name = 'DESTINATION'; COUNT(1) ---------- 19796
Now looking at a random subpartition shows the dilemma:
SQL> select PARTITION_NAME, SEGMENT_TYPE, TABLESPACE_NAME, EXTENT_ID, BYTES, BLOCKS 2 from dba_extents where segment_name = 'T1_SUBPART' 3 and partition_name = 'SYS_SUBP26737' 4 order by extent_id; PARTITION_NAME SEGMENT_TYPE TABLESPACE_NAME EXTENT_ID BYTES BLOCKS ------------------------------ ------------------ ------------------------------ ---------- ---------- ---------- SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 0 131072 16 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 1 131072 16 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 2 131072 16 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 3 131072 16 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 4 196608 24 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 5 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 6 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 7 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 8 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 9 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 10 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 11 1048576 128 SYS_SUBP26737 TABLE SUBPARTITION DESTINATION 12 1048576 128 ---------- ---------- sum 9109504 1112
So there you go-I don’t really know what to say other than “d’oh”. It probably doesn’t hurt you too much but it’s still worth knowing that if you are consolidating thousands of segments in a previously empty tablespace you can get into trouble. If I work out why it does this I’ll post a follow-up. For now the work around is to move segments serially. Yes I know.
Hi Martin,
it probably makes sense to link this post here:
http://oracle-randolf.blogspot.com/2013/04/asm-au-size-and-lmt-autoallocate.html
because it stresses another important point that can lead to the “unable to extend” error messages, although there is still plenty of space available in a tablespace.
Randolf
One of the reasons all my LMTs with ASSM also have uniform allocation, size depending on the size of the segments I keep there. Life is too short to worry about tablespace fragmentation…
Martin,
The problem is a side effect of extent trimming in parallel CTAS, move, insert, and merge. Every PX slave thinks it is creating its own segment, but eventually the query co-ordinator takes over and tries to claw back space from all (or all but the last) PX-created segment.
This can leave holes which are multiples of 64KB or 1MB in the tablespace – and if all you do is those parallel operations and if Oracle always starts allocating at 8MB there’s no way those holes can be filled up.
Nuno’s solution isn’t (generally) any better, since the absence of extent trimming means the empty space is inside the object rather than visible in the tablespace.
I’ll be writing this up one day – it’s on my length “to write” list.
Hi Jonathan,
Thanks for your explanation, I am looking forward to this future post.
Martin
Ditto! Will be interesting to revisit LMT in the context of partitions and space management. Not sure this 8MB thing that Oracle has now for partitions was such a good idea…
I think I am missing where the ‘parallel’ keyword appears in your PL/SQL block?
LarryD
You are right, the PL/SQL block shows what I tried _before_ adding the parallel keyword to the alter table … move … command. Should be ok now.
Martin