15-11-2023
17 March 2023
How do you create XML files from an Oracle database? And how do you collect and load these files into a reporting database? The answers don’t have to be complicated. In this blog, I will show you how I developed a system that creates, collects, and loads XML files from an Oracle database to a reporting database.
This system mainly focuses on collecting data about Oracle databases, but it can also store data from other sources, such as operating systems. In addition, it can be used to report on trends in size and load.
Once a day, this system can collect data about “anything” you want to report about. For example:
- sizes of tablespaces and schemas
- all kinds of performance information collected from dictionary views like v$sysstat
- run times from database jobs and backups
The information is collected once a day and stored in XML files. These files are transported to the server that hosts our reporting database, where the information from all monitored databases and servers is collected.
But let’s look at the basics first.
Step 1: creating XML files
Creating basic XML files from an Oracle database is actually quite simple when you use the SQL/XML standard functions. To illustrate this, I will write a query to create an XML file with information about the size of the database schemas. At the end of this blog, I’ll give you a few other examples.
The XML file contains two parts. First, we have header XML elements with general information about where the file comes from: the database name, server name and a timestamp. Next, we add XML elements with the actual data (in this case, the schemas). In other words: the owner and size of each schema.
For the schema sizes, we sum the extent sizes from dba_extents:
SELECT e.owner AS "OWNER"
, SUM(e.bytes) AS "BYTES"
FROM dba_extents e
GROUP BY e.owner
OWNER BYTES
------------------ ----------
SYS 1692065792
SYSTEM 14376960
DBSNMP 196608
METRIC 125042688
FLOWS_FILES 1638400
ORDS_METADATA 1048576
TSTUSR 2416640
APEX_200100 246415360
GSMADMIN_INTERNAL 950272
QMETRIC 13959168
OUTLN 155648
XDB 613220352
12 rows selected.
Next step: translate each row of the result into an XML element. To do this, we use the functions xmlelement and xmlforest.
SELECT xmlelement
( SCHEMA
, xmlforest
( e.owner AS "OWNER"
, SUM(e.bytes) AS "BYTES"
)
) xmldata
FROM dba_extents e
GROUP BY e.owner
/
XMLDATA
-------------------------------------------------------------------------
<SCHEMA><OWNER>SYS</OWNER><BYTES>1692065792</BYTES></SCHEMA>
<SCHEMA><OWNER>SYSTEM</OWNER><BYTES>14376960</BYTES></SCHEMA>
<SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES></SCHEMA>
<SCHEMA><OWNER>METRIC</OWNER><BYTES>125042688</BYTES></SCHEMA>
<SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</BYTES></SCHEMA>
<SCHEMA><OWNER>ORDS_METADATA</OWNER><BYTES>1048576</BYTES></SCHEMA>
<SCHEMA><OWNER>TSTUSR</OWNER><BYTES>2416640</BYTES></SCHEMA>
<SCHEMA><OWNER>APEX_200100</OWNER><BYTES>246415360</BYTES></SCHEMA>
<SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER><BYTES>950272</BYTES></SCHEMA>
<SCHEMA><OWNER>QMETRIC</OWNER><BYTES>13959168</BYTES></SCHEMA>
<SCHEMA><OWNER>OUTLN</OWNER><BYTES>155648</BYTES></SCHEMA>
<SCHEMA><OWNER>XDB</OWNER><BYTES>613220352</BYTES></SCHEMA>
12 rows selected.
Now we have 12 XML elements, one per schema, each with two child elements: the owner and the number of bytes. Using xmlagg, we aggregate these 12 elements into a single XML value:
SET feedback 1
col XMLDATA format a60
SELECT xmlagg (xmlelement
( SCHEMA
, xmlforest
( e.owner AS "OWNER"
, SUM(e.bytes) AS "BYTES"
)
)) xmldata
FROM dba_extents e
GROUP BY e.owner
/
XMLDATA
------------------------------------------------------------
<SCHEMA><OWNER>SYS</OWNER><BYTES>1692098560</BYTES></SCHEMA>
<SCHEMA><OWNER>SYSTEM</OWNER><BYTES>14376960</BYTES></SCHEMA
><SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES></SCHEMA>
<SCHEMA><OWNER>METRIC</OWNER><BYTES>125042688</BYTES></SCHEM
A><SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</BYTES></
SCHEMA><SCHEMA><OWNER>ORDS_METADATA</OWNER><BYTES>1048576</B
YTES></SCHEMA><SCHEMA><OWNER>TSTUSR</OWNER><BYTES>2416640</B
YTES></SCHEMA><SCHEMA><OWNER>APEX_200100</OWNER><BYTES>24641
5360</BYTES></SCHEMA><SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER
><BYTES>950272</BYTES></SCHEMA><SCHEMA><OWNER>QMETRIC</OWNER
><BYTES>13959168</BYTES></SCHEMA><SCHEMA><OWNER>OUTLN</OWNER
><BYTES>155648</BYTES></SCHEMA><SCHEMA><OWNER>XDB</OWNER><BY
TES>613220352</BYTES></SCHEMA>
1 row selected.
The XML file should also contain header information about the database and instance, so let’s write a query to achieve this.
SELECT 'SCHEMAS' docname
, dbid
, name
, instance_name
, host_name
, db_unique_name
, to_char(sysdate,'DD-MM-YYYY-HH24:MI:SS') DATE_TIME
FROM v$database
, v$instance
/
You can also add extra columns for the header information when needed. If you work with a container database for example, you could add the name of the pluggable database. Make sure you have enough information in the header to identify a unique database. Also include a document name that is used to recognize the structure of the XML when we load it into our monitoring database later.
We rewrite this header query to produce XML as well, and combine it with a SCHEMAS element that wraps the schema-size query from before:
SELECT xmlelement("DATABASE",
         xmlelement("DOCNAME", 'SCHEMAS'),
         xmlelement("DBID", dbid),
         xmlelement("DBNAME", name),
         xmlelement("INSTANCE_NAME", instance_name),
         xmlelement("HOST_NAME", host_name),
         xmlelement("DB_UNIQUE_NAME", db_unique_name),
         xmlelement("DATE_TIME", to_char(sysdate,'DD-MM-YYYY-HH24:MI:SS')),
         xmlelement("SCHEMAS",
           ( SELECT xmlagg( xmlelement
                    ( SCHEMA
                    , xmlforest
                      ( e.owner AS "OWNER"
                      , SUM(e.bytes) AS "BYTES"
                      )
                    ))
             FROM dba_extents e
             GROUP BY e.owner
           )
         )
       ) AS xmldata
FROM v$database
   , v$instance
/
When executed, the query results in a single XML document.
XMLDATA
------------------------------------------------------------
<DATABASE><DOCNAME>SCHEMAS</DOCNAME><DBID>1203838370</DBID><
DBNAME>XMIXP</DBNAME><INSTANCE_NAME>XMIXP</INSTANCE_NAME><HO
ST_NAME>orclmetricdbhst</HOST_NAME><DB_UNIQUE_NAME>XMIXP</DB
_UNIQUE_NAME><DATE_TIME>14-10-2021-15:08:24</DATE_TIME><SCHE
MAS><SCHEMA><OWNER>APEX_200100</OWNER><BYTES>246415360</BYTE
S></SCHEMA><SCHEMA><OWNER>DBSNMP</OWNER><BYTES>196608</BYTES
></SCHEMA><SCHEMA><OWNER>FLOWS_FILES</OWNER><BYTES>1638400</
BYTES></SCHEMA><SCHEMA><OWNER>GSMADMIN_INTERNAL</OWNER><BYTE
S>950272</BYTES></SCHEMA><SCHEMA><OWNER>METRIC</OWNER><BYTES
>125042688</BYTES></SCHEMA><SCHEMA><OWNER>ORDS_METADATA</OWN
ER><BYTES>1048576</BYTES></SCHEMA><SCHEMA><OWNER>OUTLN</OWNE
R><BYTES>155648</BYTES></SCHEMA><SCHEMA><OWNER>QMETRIC</OWNE
R><BYTES>13959168</BYTES></SCHEMA><SCHEMA><OWNER>TSTUSR</OWN
ER><BYTES>2416640</BYTES></SCHEMA><SCHEMA><OWNER>SYS</OWNER>
<BYTES>1692098560</BYTES></SCHEMA><SCHEMA><OWNER>SYSTEM</OWN
ER><BYTES>14376960</BYTES></SCHEMA><SCHEMA><OWNER>XDB</OWNER
><BYTES>613220352</BYTES></SCHEMA></SCHEMAS></DATABASE>
1 row selected.
Schedule a daily script for the query above. If we use the SQL*Plus spool command with a unique output name and the “.xml” extension, we have an XML file with the schema sizes. Make sure that each file has a unique name, for example by including the timestamp, database name and document name in the file name.
When creating the file, it is important that the complete XML is on one line. You cannot have line breaks in the file, so use a large linesize in SQL*Plus, or remove any line breaks with Linux tools like tr, sed or xargs.
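The daily job could look roughly like this. This is a minimal sketch: the sqlplus call is only shown in comments because it needs an Oracle client and credentials, and the directory and file names (XML_DIR, DB_NAME, DOCNAME) are example values, not part of the method itself. The parts that matter are the unique file name and the line-break removal.

```shell
#!/bin/sh
# Sketch of the daily spool job. XML_DIR, DB_NAME and DOCNAME are
# example names; adjust them to your environment.
XML_DIR=/tmp/xmldir_spool
DB_NAME=XMIXP
DOCNAME=SCHEMAS
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
XMLFILE=${XML_DIR}/${DB_NAME}_${DOCNAME}_${TIMESTAMP}.xml
mkdir -p ${XML_DIR}

# Here you would spool the query from step 1, roughly:
#   sqlplus -s qmetric/secretpasswd@myserver:1521/METRICDB @schemas_xml.sql
# with "set pagesize 0 feedback off linesize 32767 long 2000000"
# and "spool ${XMLFILE}" in the SQL script.
# A stand-in for the spooled output keeps this sketch self-contained:
printf '<DATABASE>\n<DOCNAME>SCHEMAS</DOCNAME>\n</DATABASE>\n' > ${XMLFILE}

# Remove all line breaks so the complete XML ends up on one line:
tr -d '\n' < ${XMLFILE} > ${XMLFILE}.tmp && mv ${XMLFILE}.tmp ${XMLFILE}
```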
Step 2: collect the XML files
All XML files should be loaded into an Oracle database. To do this, we need to collect the XML files daily.
Copy the XML file to the server where we collect all our information. Depending on your situation, you can use scp, email, FTP or any other protocol to collect the files. All XML files for all monitored databases can be stored in one directory.
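A transfer script on each monitored server could look like the sketch below: ship every new file, then move it to a “sent” directory so each file is transferred only once. The directory names are examples, and cp to a local inbox stands in for scp here; with scp you would use something like `scp "$f" oracle@reportserver:/u01/xmldir` instead.

```shell
#!/bin/sh
# Ship new XML files to the central inbox, then move them to a "sent"
# directory so every file is transferred only once.
# Directory names are examples; cp stands in for scp.
XML_DIR=/tmp/xmldir_collect
SENT_DIR=${XML_DIR}/sent
INBOX=/tmp/metrics_inbox
mkdir -p ${XML_DIR} ${SENT_DIR} ${INBOX}

# demo file so the sketch is runnable on its own:
: > ${XML_DIR}/XMIXP_SCHEMAS_demo.xml

for f in ${XML_DIR}/*.xml; do
  [ -f "$f" ] || continue          # skip the literal glob when no files match
  cp "$f" ${INBOX}/ && mv "$f" ${SENT_DIR}/
done
```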
Step 3: read XML into the database staging table
All files can be loaded into the reporting database with SQL Loader. At this point, it isn’t necessary to know anything about the actual structure of the XML files. We just need to create a staging table that can contain XML data:
CREATE TABLE STAGE_XMLDATA
(filename varchar2(120)
, XMLDATA SYS.XMLTYPE )
;
The script for loading XML files has several steps. We use a bash shell script in combination with SQL Loader and SQL Plus.
The basic script:
- makes a list of all XML files in the LOAD directory
- creates a SQL Loader control file
- starts SQL Loader to load all XML files in the staging table
# ##################################
# specify directories
# ##################################
HOMEDIR=/home/oracle/scripts/load_data
LOAD_DIR=/u01/xmldir
INBOX=/u01/metrics/attachments
# ##################################
# specify file names
# ##################################
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LST_FILE=${HOMEDIR}/tmp/load_${TIMESTAMP}.txt
CTL_FILE=${HOMEDIR}/tmp/load_${TIMESTAMP}.ctl
LOG_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.log
BAD_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.bad
SQL_LOG_FILE=${HOMEDIR}/log/load_${TIMESTAMP}.sql_log
# ##################################
# specify DATABASE CONNECT string
# ##################################
CONNECT_STRING="qmetric/secretpasswd@myserver:1521/METRICDB"
# ##################################
# First we need to collect the files into one directory,
# and create a list containing all the file names.
# ##################################
find ${LOAD_DIR} -type f -name '*.xml' | sort > ${LST_FILE}
# ##################################
# create a SQL*Loader control file
# ##################################
cat <<EOF > ${CTL_FILE}
LOAD DATA
INFILE '${LST_FILE}'
REPLACE
INTO TABLE STAGE_XMLDATA
(
filename filler CHAR(120),
XMLDATA lobfile(filename) TERMINATED BY eof
)
EOF
# #############################
# load files with SQL*Loader
# #############################
sqlldr userid=${CONNECT_STRING} control=${CTL_FILE} log=${LOG_FILE} bad=${BAD_FILE}
Using SQL Loader, the XML files are loaded into the STAGE_XMLDATA table. Each XML file results in one row.
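After a successful load, you will want to move the processed files out of the load directory, so the next run does not pick them up again. A minimal housekeeping sketch; the paths are example values, and a demo file is created so the sketch runs on its own:

```shell
#!/bin/sh
# Housekeeping after a successful SQL*Loader run: archive the processed
# XML files so the next run starts with an empty load directory.
# Paths are examples; the demo file makes the sketch self-contained.
LOAD_DIR=/tmp/xmldir_load
ARCH_DIR=${LOAD_DIR}/processed
mkdir -p ${LOAD_DIR} ${ARCH_DIR}
: > ${LOAD_DIR}/XMIXP_SCHEMAS_demo.xml

# -maxdepth 1 keeps find out of the processed/ subdirectory
find ${LOAD_DIR} -maxdepth 1 -type f -name '*.xml' \
  -exec mv {} ${ARCH_DIR}/ \;
```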
To see the XML again, simply query the table:
SELECT XMLDATA FROM STAGE_XMLDATA
/
XMLDATA
-----------------------------------------------------------------------
<DATABASE>
<DOCNAME>SCHEMAS</DOCNAME>
<DBID>1203838370</DBID>
<DBNAME>XMIXP</DBNAME>
<INSTANCE_NAME>XMIXP</INSTANCE_NAME>
<HOST_NAME>orclmetricdbhst</HOST_NAME>
<DB_UNIQUE_NAME>XMIXP</DB_UNIQUE_NAME>
<DATE_TIME>14-10-2021-10:40:00</DATE_TIME>
<SCHEMAS>
<SCHEMA>
<OWNER>APEX_200100</OWNER>
<BYTES>246415360</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>DBSNMP</OWNER>
<BYTES>196608</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>FLOWS_FILES</OWNER>
<BYTES>1638400</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>GSMADMIN_INTERNAL</OWNER>
<BYTES>950272</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>METRIC</OWNER>
<BYTES>106233856</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>ORDS_METADATA</OWNER>
<BYTES>1048576</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>OUTLN</OWNER>
<BYTES>155648</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>QMETRIC</OWNER>
<BYTES>13959168</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>TSTUSR</OWNER>
<BYTES>2416640</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>SYS</OWNER>
<BYTES>1555423232</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>SYSTEM</OWNER>
<BYTES>14376960</BYTES>
</SCHEMA>
<SCHEMA>
<OWNER>XDB</OWNER>
<BYTES>613220352</BYTES>
</SCHEMA>
</SCHEMAS>
</DATABASE>
1 row selected.
Step 4: convert XML back to normal data
All XML files are now loaded into the STAGE_XMLDATA table. Our next step is to convert the data into a normal format, so it can be stored into the history tables. These contain the daily size of all schemas.
In the first step, we included a document name in each XML file. This name is used to recognize the structure of each file, as there can be multiple scripts that create XML files. For each unique document, we now create an SQL script to move data from the staging table to the history tables. In this case, we query rows with docname = 'SCHEMAS'.
The actual conversion from XML to a normal table format is done with the XMLTable function. The column names here match the XML tags used when we created the XML files. At this point, you also need to convert the data types of all columns you want to store as a number or date. For readability, I left out some of the header columns.
SELECT DBID
, DBNAME
, to_date(DATE_TIME,'DD-MM-YYYY-HH24:MI:SS') DATE_TIME
, OWNER
, to_number(BYTES) BYTES
FROM stage_xmldata x,
XMLTABLE('/DATABASE'
PASSING x.xmldata
COLUMNS
DBID VARCHAR2(20) PATH 'DBID',
DBNAME VARCHAR2(20) PATH 'DBNAME',
DATE_TIME VARCHAR2(20) PATH 'DATE_TIME',
DOCNAME VARCHAR2(20) PATH 'DOCNAME'
)
, XMLTABLE('/DATABASE/SCHEMAS/SCHEMA'
PASSING x.xmldata
COLUMNS
OWNER VARCHAR2(20) PATH 'OWNER',
BYTES VARCHAR2(40) PATH 'BYTES'
)
WHERE docname = 'SCHEMAS'
ORDER BY DATE_TIME
/
DBID DBNAME DATE_TIME OWNER BYTES
-------------------- ------- --------- -------------------- ----------
1203838370 XMIXP 14-OCT-21 APEX_200100 246415360
1203838370 XMIXP 14-OCT-21 DBSNMP 196608
1203838370 XMIXP 14-OCT-21 FLOWS_FILES 1638400
1203838370 XMIXP 14-OCT-21 GSMADMIN_INTERNAL 950272
1203838370 XMIXP 14-OCT-21 METRIC 106233856
1203838370 XMIXP 14-OCT-21 XDB 613220352
1203838370 XMIXP 14-OCT-21 OUTLN 155648
1203838370 XMIXP 14-OCT-21 QMETRIC 13959168
1203838370 XMIXP 14-OCT-21 TSTUSR 2416640
1203838370 XMIXP 14-OCT-21 SYS 1555423232
1203838370 XMIXP 14-OCT-21 SYSTEM 14376960
1203838370 XMIXP 14-OCT-21 ORDS_METADATA 1048576
12 rows selected.
The output from this query can be used to fill the history tables:
insert into history_schemas
select …..
Other examples
As promised, here are a few other examples of creating XML files from the database or the operating system.
How to create an XML element for all physical IO:
SELECT xmlagg (xmlelement
(PHYSICAL_IO
, xmlforest
( sn.name AS "NAME"
, stat.value AS "VALUE"
)
)) XML
FROM v$statname sn
, v$sysstat stat
WHERE sn.name LIKE 'physical%'
AND sn.STATISTIC# = stat.STATISTIC#
/
XML
----------------------------------------
<PHYSICAL_IO><NAME>physical READ total I
O requests</NAME><VALUE>34392312</VALUE>
</PHYSICAL_IO><PHYSICAL_IO><NAME>physica
l READ total multi block requests</NAME>
<VALUE>1107651</VALUE></PHYSICAL_IO><PHY
SICAL_IO><NAME>physical READ requests op
timized</NAME><VALUE>0</VALUE></PHYSICAL
_IO><PHYSICAL_IO><NAME>physical READ tot
al bytes optimized</NAME><VALUE>0</VALUE
[....]
How to create an XML element for the run times of database jobs:
SELECT xmlagg ( xmlelement
( JOBRUN
, xmlforest
( trunc(ACTUAL_START_DATE) start_date
, owner
, job_name
, run_duration
)
)) JOBRUNS
FROM dba_scheduler_job_run_details
WHERE trunc(ACTUAL_START_DATE) = trunc(sysdate - 1)
/
JOBRUNS
----------------------------------------
<JOBRUN><START_DATE>2021-10-14</START_DA
TE><OWNER>ORACLE_OCM</OWNER><JOB_NAME>MG
MT_CONFIG_JOB_1</JOB_NAME><RUN_DURATION>
+000 00:00:00</RUN_DURATION></JOBRUN><JO
BRUN><START_DATE>2021-10-14</START_DATE>
<OWNER>SYS</OWNER><JOB_NAME>ORA$AT_OS_OP
T_SY_81711</JOB_NAME><RUN_DURATION>+000
00:09:10</RUN_DURATION></JOBRUN><JOBRUN>
<START_DATE>2021-10-14</START_DATE><OWNE
[....]
The following example is a script that can be used to create an XML file with server statistics, in this case swap activity.
SWAP_IN=`vmstat -s |grep "pages swapped in" |sed 's/pages swapped in//' |sed 's/ //g'`
SWAP_OUT=`vmstat -s|grep "pages swapped out"|sed 's/pages swapped out//'|sed 's/ //g'`
TIME=`date +%H:%M:%S`
DATE=`date +%d-%m-%Y`
# example values for the remaining variables; adjust them to your environment
DOCNAME=SWAPDATA
SERVER=`hostname`
XMLFILE=/u01/xmldir/${SERVER}_${DOCNAME}_`date +%Y%m%d_%H%M%S`.xml
#create XML file
cat <<EOF > ${XMLFILE}
<METRIC>
<DOCNAME>${DOCNAME}</DOCNAME>
<DATE>${DATE}</DATE>
<TIME>${TIME}</TIME>
<HOST_NAME>${SERVER}</HOST_NAME>
<SWAPDATA>
<SWAP_IN>${SWAP_IN}</SWAP_IN>
<SWAP_OUT>${SWAP_OUT}</SWAP_OUT>
</SWAPDATA>
</METRIC>
EOF