Martins Blog

Trying to explain complex things in simple terms

Massive tablespace fragmentation on LMT with ASSM

Posted by Martin Bach on January 27, 2014

I have been asked to investigate another interesting problem worth writing about. It’s worth mentioning here because it deals with a problem I believed to have long since been solved: tablespace fragmentation. However, in this case it was a locally managed tablespace (LMT) with Automatic Segment Space Management (ASSM) enabled. It should be difficult to have fragmentation on one of these, but as you will see it is not impossible.

So the story started innocently enough with an ORA-01653 while shuffling around subpartitions to another tablespace:

01653, 00000, "unable to extend table %s.%s by %s in tablespace %s"
// *Cause:  Failed to allocate an extent of the required number of blocks for
//          a table segment in the tablespace indicated.
// *Action: Use ALTER TABLESPACE ADD DATAFILE statement to add one or more
//          files to the tablespace indicated.

Well except that there was plenty of free space available, the sum(bytes) in dba_free_tablespaces for the tablespace showed in excess of 20 GB. So it could only be: fragmentation. And indeed I quickly found out that the largest contiguous free block was 8 MB. Which should really have been enough for most use cases… After a little digging around I noticed that all subpartitions on the tablespace had at least 9 extents, some more. The sum of bytes per subpartition ever so slightly exceeded 8MB which was the cause for the fragmentation. Here is an example:

SQL> select segment_name,partition_name,bytes
  2  from dba_extents
  3  where segment_name = 'T1_SUBPART'
  4  and partition_name = 'SYS_SUBP26737';

SEGMENT_NAME                   PARTITION_NAME                      BYTES
------------------------------ ------------------------------ ----------
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      131072
T1_SUBPART                     SYS_SUBP26737                      196608
T1_SUBPART                     SYS_SUBP26737                     1048576
T1_SUBPART                     SYS_SUBP26737                     1048576
                                                              ----------
sum                                                              9109504

13 rows selected.

Some of the extents are nicely aligned along 1MB boundaries, but there are a few that are 128kb in size. Multiply this by a few thousand and you get the picture. It’s not good if you have 20,000 entries in DBA_FREE_SPACE for a given tablespace!

The error message was thrown during the execution of an anonymous PL/SQL block that moved lots of subpartitions from one tablespace to another. There were these already mentioned roughly few thousand subpartitions to be moved plus a few unpartitioned tables. The process aborted halfway. The following is more or less a transcript of my troubleshooting efforts.

The test case

To reproduce the problem in our lab I created a table with lots of sub partitions, using the following SQL statement. It doesn’t matter that the subpartitions are more or less empty in this case.

CREATE TABLE t1_subpart
  (
    id, t_pad, date_Created, date_completed, state, spcol
  )
  partition BY range (id) interval (1000)
  subpartition BY list (spcol) subpartition template
  (
    subpartition sp1 VALUES ('0'),
    subpartition sp2 VALUES ('1'),
    subpartition sp3 VALUES ('2'),
    subpartition sp4 VALUES ('3'),
    subpartition spdef VALUES (DEFAULT)
  )
  (
    partition p_man VALUES less than (10000)
  ) AS
WITH v1 AS
  (SELECT rownum n FROM dual CONNECT BY level <= 10000
  )
SELECT mod(rownum,1000000) id,
  rpad(rownum,100) t_pad,
  TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 date_created,
  TRUNC(sysdate) - 180 + TRUNC((rownum-1)/3)/86400 + dbms_random.value(1800, 86400)/86400 date_completed,
  CASE
    WHEN mod(rownum,100000) = 0
    THEN CAST('RARE' AS VARCHAR2(12))
    WHEN mod(rownum,10000) = 0
    THEN CAST('FAIRLY RARE' AS VARCHAR2(12))
    WHEN mod(rownum,1000) = 0
    THEN CAST('NOT RARE' AS VARCHAR2(12))
    WHEN mod(rownum,100) = 0
    THEN CAST('COMMON' AS   VARCHAR2(12))
    ELSE CAST('THE REST' AS VARCHAR2(12))
  END state,
  TO_CHAR(mod(rownum, 4)) spcol
FROM v1,
  v1
WHERE rownum <= 1e6; 

(If this looks familiar to you then you probably know this presentation by Jonathan Lewis)

A couple of things to remember on this 11.2.0.3 database: Oracle introduced large extents for partitions, as explained in “Initial Extent Size of a Partition Changed to 8MB from 64KB After Upgrade to 11.2.0.2 or Later (Doc ID 1295484.1)”. So having empty partitions is now a huge waste of space!

But back to the problem: is there anything that would lead you to suspect fragmentation?

SQL> select max(blocks) from dba_segments where segment_name = 'T1_SUBPART';

MAX(BLOCKS)
-----------
       1024

Elapsed: 00:00:00.15

No, all subpartitions are 8 MB, most of them 1x8M, < 15 different:

select s.segment_name,s.partition_name, s.blocks, s.extents
from user_segments s
where s.segment_name = 'T1_SUBPART'
and extents <> 1
order by extents desc;

SEGMENT_NAME                   PARTITION_NAME                     BLOCKS    EXTENTS
------------------------------ ------------------------------ ---------- ----------
T1_SUBPART                     SYS_SUBP29210                        1024          8
T1_SUBPART                     SYS_SUBP32032                        1024          5
T1_SUBPART                     SYS_SUBP29180                        1024          5
T1_SUBPART                     SYS_SUBP28738                        1024          4
T1_SUBPART                     SYS_SUBP32624                        1024          4
T1_SUBPART                     SYS_SUBP26953                        1024          4
T1_SUBPART                     SYS_SUBP30400                        1024          4
T1_SUBPART                     SYS_SUBP30996                        1024          4
T1_SUBPART                     SYS_SUBP31591                        1024          4
T1_SUBPART                     SYS_SUBP28143                        1024          4
T1_SUBPART                     SYS_SUBP27548                        1024          4
T1_SUBPART                     SYS_SUBP29805                        1024          4

Most of the subpartitions have 1×8 MB extents, very few have more, but all sum up to 8MB:

SQL> select count(extents),extents from user_segments s
   where s.segment_name = 'T1_SUBPART'
   group by extents

COUNT(EXTENTS)    EXTENTS
-------------- ----------
          4942          1
             9          4
             2          5
             1          8

Elapsed: 00:00:00.66
SQL>

SQL> select distinct bytes from user_segments where segment_name = 'T1_SUBPART';

     BYTES
----------
   8388608

Next I am trying to reproduce the problem as closely as possible. Here, HCC compression has been enabled by the customer on the table, after which the partitions have been moved to the destination tablespace. Note that this problem has nothing to do with HCC. I have since tried to reproduce this case outside of Exadata with the same result.

SQL> alter table t1_subpart compress for query high;

Table altered.

Next I’ll show you DBA_FREE_SPACE for the tablepace:

SQL> select * from dba_free_space
  2* where tablespace_name = 'DESTINATION';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
DESTINATION                            10        128 1072693248     130944           10
DESTINATION                            11        128 1072693248     130944           11

Elapsed: 00:00:00.01

So now I’m going to move all of the subpartitions. At first I couldn’t reproduce the fragmentation at all when I moved these serially-everything stayed just as it was, some of the subpartitions which consisted of more than 1 extent actually were coalesced into just 1. I didn’t pay attention to the “parallel” keyword as I thought that was just a means of speeding things up. When after the 3rd time I had no luck reproducing the fragmentation I actually tried with parallel, and here is how it went. First the anonymous PL/SQL block:

SQL> !cat movemove.sql
begin
 for i in (select subpartition_name from user_tab_subpartitions
           where table_name = 'T1_SUBPART' and tablespace_name <> 'DESTINATION')
 loop
  begin
   execute immediate 'alter table t1_subpart move subpartition ' || i.subpartition_name || ' tablespace DESTINATION parallel';
  exception
   when others then
    dbms_output.put_line('could not move ' || i.subpartition_name || ' due to ' || sqlerrm(sqlcode));
  end;
 end loop;
end;
/

This takes a long time, and the quality of the script is not production-worthy! In this case the anonymous block completed without issues, but that was just 1 table and not the same volume as the real system where this happened. Look at the extent distribution now:

SQL> select count(extents),extents from user_segments s
  2  where s.segment_name = 'T1_SUBPART'
  3  group by extents;

COUNT(EXTENTS)    EXTENTS
-------------- ----------
           634          1
          3959         13
           356          5
             4         48
             1          8

Also consider the impact on DBA_FREE_SPACE:

SQL> select count(1) from dba_free_Space where tablespace_name = 'DESTINATION';

  COUNT(1)
----------
     19796

Now looking at a random subpartition shows the dilemma:

SQL> select PARTITION_NAME, SEGMENT_TYPE, TABLESPACE_NAME, EXTENT_ID, BYTES, BLOCKS
  2  from dba_extents where segment_name = 'T1_SUBPART'
  3  and partition_name = 'SYS_SUBP26737'
  4  order by extent_id;

PARTITION_NAME                 SEGMENT_TYPE       TABLESPACE_NAME                 EXTENT_ID      BYTES     BLOCKS
------------------------------ ------------------ ------------------------------ ---------- ---------- ----------
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             0     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             1     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             2     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             3     131072         16
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             4     196608         24
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             5    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             6    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             7    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             8    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                             9    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            10    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            11    1048576        128
SYS_SUBP26737                  TABLE SUBPARTITION DESTINATION                            12    1048576        128
                                                                                            ---------- ----------
sum                                                                                            9109504       1112

So there you go-I don’t really know what to say other than “d’oh”. It probably doesn’t hurt you too much but it’s still worth knowing that if you are consolidating thousands of segments in a previously empty tablespace you can get into trouble. If I work out why it does this I’ll post a follow-up. For now the work around is to move segments serially. Yes I know.

About these ads

7 Responses to “Massive tablespace fragmentation on LMT with ASSM”

  1. Hi Martin,

    it probably makes sense to link this post here:

    http://oracle-randolf.blogspot.com/2013/04/asm-au-size-and-lmt-autoallocate.html

    because it stresses another important point that can lead to the “unable to extend” error messages, although there is still plenty of space available in a tablespace.

    Randolf

  2. One of the reasons all my LMTs with ASSM also have uniform allocation, size depending on the size of the segments I keep there. Life is too short to worry about tablespace fragmentation…

  3. Martin,

    The problem is a side effect of extent trimming in parallel CTAS, move, insert, and merge. Every PX slave thinks it is creating its own segment, but eventually the query co-ordinator takes over and tries to claw back space from all (or all but the last) PX-created segment.

    This can leave holes which are multiples of 64KB or 1MB in the tablespace – and if all you do is those parallel operations and if Oracle always starts allocating at 8MB there’s no way those holes can be filled up.

    Nuno’s solution isn’t (generally) any better, since the absence of extent trimming means the empty space is inside the object rather than visible in the tablespace.

    I’ll be writing this up one day – it’s on my length “to write” list.

  4. Larry said

    I think I am missing where the ‘parallel’ keyword appears in your PL/SQL block?

    LarryD

    • Martin Bach said

      You are right, the PL/SQL block shows what I tried _before_ adding the parallel keyword to the alter table … move … command. Should be ok now.

      Martin

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

Join 2,271 other followers

%d bloggers like this: