17 March 2023

How do you create XML files from an Oracle database? And how do you collect and load these files into a reporting database? The answers don’t have to be complicated. In this blog, I will show you how I developed a system that creates, collects, and loads XML files from an Oracle database to a reporting database. 

This system mainly focuses on collecting data about Oracle databases, but it can also store data from other sources, such as operating systems. In addition, it can be used to report on trends in size and load.

Once a day, this system can collect data about “anything” you want to report on. For example:

  • sizes of tablespaces and schemas 
  • all kinds of performance information collected from dictionary views like v$sysstat
  • run times from database jobs and backups

The information is collected once a day and stored in XML files. These files are transported to a server with our reporting database, where all the information about various objects is collected. 

But let’s look at the basics first.

Step 1: creating XML files

Creating basic XML files from an Oracle database is actually quite simple when you use the SQL/XML standard functions. To illustrate this, I will write a query to create an XML file with information about the size of the database schemas. At the end of this blog, I’ll give you a few other examples.

The XML file contains two parts. First, we have header XML elements with general information about where the file comes from, such as the database name, server name and timestamp. Next, we add XML elements with the actual data (in this case, the schemas): the owner and size of each schema.
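Abbreviated, the target document we are working towards looks like this:

```xml
<DATABASE>
  <!-- header: identifies the source database -->
  <DBNAME>...</DBNAME>
  <HOST_NAME>...</HOST_NAME>
  <DATE_TIME>...</DATE_TIME>
  <!-- data: one element per schema -->
  <SCHEMAS>
    <SCHEMA><OWNER>...</OWNER><BYTES>...</BYTES></SCHEMA>
  </SCHEMAS>
</DATABASE>
```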

For the schema sizes, we sum the sizes of all database extents per owner:

SELECT e.owner      AS "OWNER"
,      SUM(e.bytes) AS "BYTES"
FROM dba_extents e
GROUP BY e.owner
 
 
OWNER                   BYTES
------------------ ----------
SYS                1692065792
SYSTEM               14376960
DBSNMP                 196608
METRIC              125042688
FLOWS_FILES           1638400
ORDS_METADATA         1048576
TSTUSR                2416640
APEX_200100         246415360
GSMADMIN_INTERNAL      950272
QMETRIC              13959168
OUTLN                  155648
XDB                 613220352
 
12 rows selected.

 

Next step: translate each row of the result into an XML element. To do this, we use the functions xmlelement and xmlforest.

SELECT xmlelement
  ( SCHEMA
  , xmlforest
    (  e.owner      AS "OWNER"
    ,  SUM(e.bytes) AS "BYTES"
    )
   ) xmldata
FROM   dba_extents e
GROUP  BY e.owner
/
 
XMLDATA
-------------------------------------------------------------------------
<SCHEMA><OWNER>SYS</OWNER><BYTES>1692065792</BYTES></SCHEMA>
<SCHEMA><OWNER>SYSTEM</OWNER><BYTES>14376960</BYTES></SCHEMA>
<SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES></SCHEMA>
<SCHEMA><OWNER>METRIC</OWNER><BYTES>125042688</BYTES></SCHEMA>
<SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</BYTES></SCHEMA>
<SCHEMA><OWNER>ORDS_METADATA</OWNER><BYTES>1048576</BYTES></SCHEMA>
<SCHEMA><OWNER>TSTUSR</OWNER><BYTES>2416640</BYTES></SCHEMA>
<SCHEMA><OWNER>APEX_200100</OWNER><BYTES>246415360</BYTES></SCHEMA>
<SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER><BYTES>950272</BYTES></SCHEMA>
<SCHEMA><OWNER>QMETRIC</OWNER><BYTES>13959168</BYTES></SCHEMA>
<SCHEMA><OWNER>OUTLN</OWNER><BYTES>155648</BYTES></SCHEMA>
<SCHEMA><OWNER>XDB</OWNER><BYTES>613220352</BYTES></SCHEMA>
 
12 rows selected.

Now we have 12 XML elements, one per schema. Each element has two child elements: the owner and the number of bytes. Using XMLAGG, we aggregate these 12 elements into one value:

SET feedback 1
col XMLDATA format a60
 
SELECT xmlagg (xmlelement
   ( SCHEMA
    , xmlforest
     ( e.owner      AS "OWNER"
     , SUM(e.bytes) AS "BYTES"
     )
    )) xmldata
FROM   dba_extents e
GROUP  BY e.owner
/
 
 
XMLDATA
------------------------------------------------------------
<SCHEMA><OWNER>SYS</OWNER><BYTES>1692098560</BYTES></SCHEMA>
<SCHEMA><OWNER>SYSTEM</OWNER><BYTES>14376960</BYTES></SCHEMA
><SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES></SCHEMA>
<SCHEMA><OWNER>METRIC</OWNER><BYTES>125042688</BYTES></SCHEM
A><SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</BYTES></
SCHEMA><SCHEMA><OWNER>ORDS_METADATA</OWNER><BYTES>1048576</B
YTES></SCHEMA><SCHEMA><OWNER>TSTUSR</OWNER><BYTES>2416640</B
YTES></SCHEMA><SCHEMA><OWNER>APEX_200100</OWNER><BYTES>24641
5360</BYTES></SCHEMA><SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER
><BYTES>950272</BYTES></SCHEMA><SCHEMA><OWNER>QMETRIC</OWNER
><BYTES>13959168</BYTES></SCHEMA><SCHEMA><OWNER>OUTLN</OWNER
><BYTES>155648</BYTES></SCHEMA><SCHEMA><OWNER>XDB</OWNER><BY
TES>613220352</BYTES></SCHEMA>
 
1 row selected.

The XML file should also contain header information about the database and instance, so let’s write a query to achieve this. 

SELECT 'SCHEMAS' docname
,      dbid
,      name
,      instance_name
,      host_name
,      db_unique_name
,      to_char(sysdate,'DD-MM-YYYY-HH24:MI:SS') DATE_TIME
FROM   v$database
,      v$instance
/

You can also add extra columns to the header information when needed. If you work with a container database, for example, you could add the name of the pluggable database. Make sure the header contains enough information to uniquely identify a database. Also include a document name, which is used to recognize the structure of the XML when we load it into our monitoring database later.
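As a sketch, on a multitenant database you could add the container (PDB) name with a USERENV context; the extra PDB_NAME column below is an illustration, not part of the query used in the rest of this blog:

```sql
-- Hypothetical extension of the header query for container databases:
-- sys_context('USERENV','CON_NAME') returns the current container name.
SELECT 'SCHEMAS' docname
,      dbid
,      name
,      sys_context('USERENV','CON_NAME') pdb_name
,      to_char(sysdate,'DD-MM-YYYY-HH24:MI:SS') DATE_TIME
FROM   v$database
/
```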

The header information query is also rewritten in XML format and combined with the SCHEMAS element built from the schema-size query:

SELECT xmlelement("DATABASE",
            xmlelement("DOCNAME",'SCHEMAS'),
            xmlelement("DBID",dbid),
            xmlelement("DBNAME",name),
            xmlelement("INSTANCE_NAME",instance_name),
            xmlelement("HOST_NAME",host_name),
            xmlelement("DB_UNIQUE_NAME",db_unique_name),
            xmlelement("DATE_TIME",to_char(sysdate,'DD-MM-YYYY-HH24:MI:SS')),
            xmlelement("SCHEMAS",
                         ( SELECT xmlagg( xmlelement
                             ( SCHEMA
                             , xmlforest
                               ( e.owner      AS "OWNER"
                               , SUM(e.bytes) AS "BYTES"
                             )
                           ))
                           FROM dba_extents e
                           GROUP BY e.owner
                         )
                      ) AS schema
            ) AS xmldata
FROM v$database
,    v$instance
/

When executed, the query results in a single XML document.

XMLDATA
------------------------------------------------------------
<DATABASE><DOCNAME>SCHEMAS</DOCNAME><DBID>1203838370</DBID><
DBNAME>XMIXP</DBNAME><INSTANCE_NAME>XMIXP</INSTANCE_NAME><HO
ST_NAME>orclmetricdbhst</HOST_NAME><DB_UNIQUE_NAME>XMIXP</DB
_UNIQUE_NAME><DATE_TIME>14-10-2021-15:08:24</DATE_TIME><SCHE
MAS><SCHEMA><OWNER>APEX_200100</OWNER><BYTES>246415360</BYTE
S></SCHEMA><SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES
></SCHEMA><SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</
BYTES></SCHEMA><SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER><BYTE
S>950272</BYTES></SCHEMA><SCHEMA><OWNER>METRIC</OWNER><BYTES
>125042688</BYTES></SCHEMA><SCHEMA><OWNER>ORDS_METADATA</OWN
ER><BYTES>1048576</BYTES></SCHEMA><SCHEMA><OWNER>OUTLN</OWNE
R><BYTES>155648</BYTES></SCHEMA><SCHEMA><OWNER>QMETRIC</OWNE
R><BYTES>13959168</BYTES></SCHEMA><SCHEMA><OWNER>TSTUSR</OWN
ER><BYTES>2416640</BYTES></SCHEMA><SCHEMA><OWNER>SYS</OWNER>
<BYTES>1692098560</BYTES></SCHEMA><SCHEMA><OWNER>SYSTEM</OWN
ER><BYTES>14376960</BYTES></SCHEMA><SCHEMA><OWNER>XDB</OWNER
><BYTES>613220352</BYTES></SCHEMA></SCHEMAS></DATABASE>
 
1 row selected.

Schedule a daily script that runs the query above. If we use the SQL*Plus spool command with a unique output name and the “.xml” extension, we get an XML file with the schema sizes. Make sure that each file has a unique name. This can be done by including the timestamp, database name and document name in the file name.
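A minimal SQL*Plus wrapper for the spool step could look like the sketch below; the settings suppress headings and page breaks, and the file name is only an example (generate it with a timestamp in your own scheduling script):

```sql
-- Example SQL*Plus settings for spooling the XML query to a file.
set heading off
set feedback off
set pagesize 0
set linesize 32767
set trimspool on
set long 2000000

spool /u01/xmldir/SCHEMAS_XMIXP_20211014150824.xml

-- run the XML query from above here

spool off
```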

When creating the file, it is important that the complete XML document is on one line. The file cannot contain line breaks, so use a large linesize in SQL*Plus, or remove any remaining line breaks with Linux tools such as sed, tr or xargs.
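For example, removing the line breaks can be done with tr; the file names below are just an illustration:

```shell
# Create a small pretty-printed sample file (illustration only).
printf '<DATABASE>\n<DOCNAME>SCHEMAS</DOCNAME>\n</DATABASE>\n' > sample.xml

# Remove all line breaks so the whole document ends up on a single line.
tr -d '\n' < sample.xml > sample_oneline.xml

cat sample_oneline.xml
```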

Step 2: collect the XML files

All XML files should be loaded into an Oracle database. To do this, we need to collect the XML files daily. 

Copy the XML file to the server where we collect all our information. Depending on your situation, you can use scp, email, FTP or any other protocol to collect the files. All XML files for all monitored databases can be stored in one directory.

Step 3: read XML into the database staging table

All files can be loaded into the reporting database with SQL*Loader. At this point, it isn’t necessary to know anything about the actual structure of the XML files. We just need to create a staging table that can hold XML data:

CREATE TABLE STAGE_XMLDATA
(filename varchar2(120)
, XMLDATA SYS.XMLTYPE )
;

The script for loading the XML files has several steps. We use a bash shell script in combination with SQL*Loader and SQL*Plus.

The basic script:

  • makes a list of all XML files in the LOAD directory
  • creates a SQL*Loader control file
  • starts SQL*Loader to load all XML files into the staging table

# ##################################
# specify directories
# ##################################
 
HOMEDIR=/home/oracle/scripts/load_data
LOAD_DIR=/u01/xmldir
INBOX=/u01/metrics/attachments
 
# ##################################
# specify file names
# ##################################
 
TIMESTAMP=$(date +%Y%m%d%H%M%S)
 
LST_FILE=${HOMEDIR}/tmp/load_${TIMESTAMP}.txt
CTL_FILE=${HOMEDIR}/tmp/load_${TIMESTAMP}.ctl
LOG_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.log
BAD_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.bad
SQL_LOG_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.sql_log
 
# ##################################
# specify database connect string
# ##################################
 
CONNECT_STRING="qmetric/secretpasswd@myserver:1521/METRICDB"
 
# ##################################
# first we collect the files into one directory,
# and create a list containing all the file names
# ##################################
 
find ${LOAD_DIR} -type f -name '*.xml' | sort > ${LST_FILE}
 
# ##################################
# create a SQL*Loader control file
# ##################################
 
cat <<EOF > ${CTL_FILE}
LOAD DATA
INFILE '${LST_FILE}'
REPLACE
INTO TABLE STAGE_XMLDATA
(
filename filler CHAR(120),
XMLDATA  lobfile(filename) TERMINATED BY eof
)
EOF
 
# #############################
# load files with SQL*Loader
# #############################
 
sqlldr userid=${CONNECT_STRING} control=${CTL_FILE} log=${LOG_FILE} bad=${BAD_FILE}

 

Using SQL*Loader, the XML files are loaded into the STAGE_XMLDATA table. Each XML file results in one row.

To see the XML again, simply query the table: 

SELECT XMLDATA FROM STAGE_XMLDATA
/
 
XMLDATA
-----------------------------------------------------------------------
<DATABASE>
  <DOCNAME>SCHEMAS</DOCNAME>
  <DBID>1203838370</DBID>
  <DBNAME>XMIXP</DBNAME>
  <INSTANCE_NAME>XMIXP</INSTANCE_NAME>
  <HOST_NAME>orclmetricdbhst</HOST_NAME>
  <DB_UNIQUE_NAME>XMIXP</DB_UNIQUE_NAME>
  <DATE_TIME>14-10-2021-10:40:00</DATE_TIME>
  <SCHEMAS>
    <SCHEMA>
      <OWNER>APEX_200100</OWNER>
      <BYTES>246415360</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>DBSNMP</OWNER>
      <BYTES>196608</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>FLOWS_FILES</OWNER>
      <BYTES>1638400</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>GSMADMIN_INTERNAL</OWNER>
      <BYTES>950272</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>METRIC</OWNER>
      <BYTES>106233856</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>ORDS_METADATA</OWNER>
      <BYTES>1048576</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>OUTLN</OWNER>
      <BYTES>155648</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>QMETRIC</OWNER>
      <BYTES>13959168</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>TSTUSR</OWNER>
      <BYTES>2416640</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>SYS</OWNER>
      <BYTES>1555423232</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>SYSTEM</OWNER>
      <BYTES>14376960</BYTES>
    </SCHEMA>
    <SCHEMA>
      <OWNER>XDB</OWNER>
      <BYTES>613220352</BYTES>
    </SCHEMA>
  </SCHEMAS>
</DATABASE>
 
1 row selected.

Step 4: convert XML back to normal data

All XML files are now loaded into the STAGE_XMLDATA table. Our next step is to convert the data into a normal relational format, so it can be stored in the history tables, which contain the daily size of all schemas.

In the first step, we included a document name in each XML file. This name is used to recognize the structure of each file, as there can be multiple scripts that create XML files. For each unique document name, we now create an SQL script that moves the data from the staging table to the history tables. In this case, we query the rows with docname = 'SCHEMAS'.

The actual conversion from XML to a normal table format is done with the XMLTable function. The column names here match the XML tags used when we created the XML files. At this point, you also need to do a data type conversion for all columns you want to store in a number or date format. For readability, I left out some of the header columns.

SELECT DBID
,      DBNAME
,      to_date(DATE_TIME,'DD-MM-YYYY-HH24:MI:SS')  DATE_TIME
,      OWNER
,      to_number(BYTES)  BYTES
FROM   stage_xmldata x,
           XMLTABLE('/DATABASE'
             PASSING x.xmldata
             COLUMNS
               DBID           VARCHAR2(20)  PATH 'DBID',
               DBNAME         VARCHAR2(20)  PATH 'DBNAME',
               DATE_TIME      VARCHAR2(20)  PATH 'DATE_TIME',
               DOCNAME        VARCHAR2(20)  PATH 'DOCNAME'
             )
,          XMLTABLE('/DATABASE/SCHEMAS/SCHEMA'
             PASSING x.xmldata
             COLUMNS
               OWNER           VARCHAR2(20)  PATH 'OWNER',
               BYTES           VARCHAR2(40)  PATH 'BYTES'
             )
WHERE docname = 'SCHEMAS'
ORDER BY DATE_TIME
/
 
DBID                 DBNAME  DATE_TIME OWNER                     BYTES
-------------------- ------- --------- -------------------- ----------
1203838370           XMIXP   14-OCT-21 APEX_200100           246415360
1203838370           XMIXP   14-OCT-21 DBSNMP                   196608
1203838370           XMIXP   14-OCT-21 FLOWS_FILES             1638400
1203838370           XMIXP   14-OCT-21 GSMADMIN_INTERNAL        950272
1203838370           XMIXP   14-OCT-21 METRIC                106233856
1203838370           XMIXP   14-OCT-21 XDB                   613220352
1203838370           XMIXP   14-OCT-21 OUTLN                    155648
1203838370           XMIXP   14-OCT-21 QMETRIC                13959168
1203838370           XMIXP   14-OCT-21 TSTUSR                  2416640
1203838370           XMIXP   14-OCT-21 SYS                  1555423232
1203838370           XMIXP   14-OCT-21 SYSTEM                 14376960
1203838370           XMIXP   14-OCT-21 ORDS_METADATA           1048576
 
12 rows selected.

The output from this query can be used to fill the history tables:

insert into history_schemas
select …..
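As a sketch, assuming a HISTORY_SCHEMAS table with matching columns (its definition is not shown in this blog), the insert could look like this:

```sql
-- Hypothetical history table and columns; adjust names to your own schema.
INSERT INTO history_schemas (dbid, dbname, date_time, owner, bytes)
SELECT DBID
,      DBNAME
,      to_date(DATE_TIME,'DD-MM-YYYY-HH24:MI:SS')
,      OWNER
,      to_number(BYTES)
FROM   stage_xmldata x
,      XMLTABLE('/DATABASE' PASSING x.xmldata
         COLUMNS DBID      VARCHAR2(20) PATH 'DBID',
                 DBNAME    VARCHAR2(20) PATH 'DBNAME',
                 DATE_TIME VARCHAR2(20) PATH 'DATE_TIME',
                 DOCNAME   VARCHAR2(20) PATH 'DOCNAME')
,      XMLTABLE('/DATABASE/SCHEMAS/SCHEMA' PASSING x.xmldata
         COLUMNS OWNER VARCHAR2(20) PATH 'OWNER',
                 BYTES VARCHAR2(40) PATH 'BYTES')
WHERE  docname = 'SCHEMAS'
/
```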

Other examples

As promised, here are a few other examples of creating XML files from the database or the operating system.

How to create XML elements for all physical I/O statistics:

SELECT xmlagg (xmlelement
        (PHYSICAL_IO
        , xmlforest
          ( sn.name     AS "NAME"
          , stat.value  AS "VALUE"
          )
        )) XML
FROM  v$statname  sn
,     v$sysstat   stat
WHERE sn.name LIKE 'physical%'
AND   sn.STATISTIC# = stat.STATISTIC#
/
 
XML
----------------------------------------
<PHYSICAL_IO><NAME>physical read total I
O requests</NAME><VALUE>34392312</VALUE>
</PHYSICAL_IO><PHYSICAL_IO><NAME>physica
l read total multi block requests</NAME>
<VALUE>1107651</VALUE></PHYSICAL_IO><PHY
SICAL_IO><NAME>physical read requests op
timized</NAME><VALUE>0</VALUE></PHYSICAL
_IO><PHYSICAL_IO><NAME>physical read tot
al bytes optimized</NAME><VALUE>0</VALUE
[....]

How to create an XML element for the run times of database jobs:

SELECT  xmlagg ( xmlelement
           ( JOBRUN
           , xmlforest
            ( trunc(ACTUAL_START_DATE)  start_date
            , owner
            , job_name
            , run_duration
            )
           )) JOBRUNS
FROM   dba_scheduler_job_run_details
WHERE trunc(ACTUAL_START_DATE) = trunc(sysdate - 1)
/
 
JOBRUNS
----------------------------------------
<JOBRUN><START_DATE>2021-10-14</START_DA
TE><OWNER>ORACLE_OCM</OWNER><JOB_NAME>MG
MT_CONFIG_JOB_1</JOB_NAME><RUN_DURATION>
+000 00:00:00</RUN_DURATION></JOBRUN><JO
BRUN><START_DATE>2021-10-14</START_DATE>
<OWNER>SYS</OWNER><JOB_NAME>ORA$AT_OS_OP
T_SY_81711</JOB_NAME><RUN_DURATION>+000
00:09:10</RUN_DURATION></JOBRUN><JOBRUN>
<START_DATE>2021-10-14</START_DATE><OWNE
[....]

The following example is a script that creates an XML file with server statistics, in this case swapping information.

# set the variables used below (example values)
DOCNAME=SWAPDATA
SERVER=`hostname`
XMLFILE=/u01/xmldir/${DOCNAME}_${SERVER}_`date +%Y%m%d`.xml
 
SWAP_IN=`vmstat -s |grep "pages swapped in" |sed 's/pages swapped in//' |sed 's/ //g'`
SWAP_OUT=`vmstat -s|grep "pages swapped out"|sed 's/pages swapped out//'|sed 's/ //g'`
TIME=`date +%H:%M:%S`
DATE=`date +%d-%m-%Y`
 
#create XML file
 
cat <<EOF > ${XMLFILE}
<METRIC>
<DOCNAME>${DOCNAME}</DOCNAME>
<DATE>${DATE}</DATE>
<TIME>${TIME}</TIME>
<HOST_NAME>${SERVER}</HOST_NAME>
<SWAPDATA>
<SWAP_IN>${SWAP_IN}</SWAP_IN>
<SWAP_OUT>${SWAP_OUT}</SWAP_OUT>
</SWAPDATA>
</METRIC>
EOF
