One of the most common tasks is loading data into a database, specifically the Oracle database in our case. There are many different approaches to loading or importing data into the Oracle database. And when we are talking about CSV file data format, which is separated by a comma, Oracle has an excellent tool to do this called SQL*Loader.
SQL*Loader has more features and capabilities to do this, and now we are going to list down the steps to load or import data that resides in CSV files into the Oracle database using SQL*Loader.
Step 1 : Prepare the Data to Import
Consider that you have an external system that is extracted some data into a CSV file comma separated format. You have it somewhere on the server or locally in your machine doesn’t matter for now. Still, we have a dummy dataset to demonstrate our case, so let’s download this sample CSV file data that we want to import into the Oracle database from here.
Case Number,NASA Center,sra_date,SRA Final,NTR Title
ARC-14136-1,ARC,10/19/2001,Academic Worldwide,Adaptive Relevance-Learning Software Component (ARNIE)
ARC-14293-1,ARC,09/19/2005,Open Source,Genetic Graphs (JavaGenes)
"ARC-14297-1,ARC,11/06/2003,General US,""Automated Domain Decomposition Software, PEGASUS Version 5.0"""
ARC-14379-1,ARC,03/27/2002,General US,Man-machine Integration Design And Analysis System (MIDAS)
ARC-14400-1,ARC,01/29/2001,General US,PLOT3D Version 4.0
ARC-14467-1A,ARC,01/24/2014,Open Source,CFD Utility Software Library (previously Called Aerodynamics Division Software Library ARC-14467-1)
"ARC-14487-1,ARC,02/14/2005,Open Source,""CAPTools-based Automatic Parallelizer Using OpenMP (or """"CAPO"""")"
We have
Step 2 : Create a Database Table
In this step, we’ll create a new database table with columns in our CSV file.
CREATE TABLE XX_DUMMY_DATA_CSV
(
CASE_NUMBER VARCHAR2 (4000)
,NASA_CENTER VARCHAR2 (4000)
,SRA_DATE DATE
,SRA_FINAL VARCHAR2 (4000)
,NTR_TITLE VARCHAR2 (4000)
);
Step 3 : Create a Control File
To use SQL*Loader, we have to create a control file to list the loader’s instructions to upload our data into the Oracle database. We will create this control file with “loaddata.ctl”. observe the extension “.ctl”.
options (
skip=1,
PARALLEL=true,
DIRECT=true,
MULTITHREADING=false
)
load data
infile 'dataset.csv'
APPEND INTO TABLE XX_DUMMY_DATA_CSV
FIELDS TERMINATED BY ","
optionally enclosed by '"'
(
CASE_NUMBER,
NASA_CENTER,
SRA_DATE DATE "mm/dd/YYYY HH24:MI",
SRA_FINAL,
NTR_TITLE
)
Here is the clarification of the keywords used to create this file
SKIP=1 : Skip the first line of CSV, as it contains a header.
PARALLEL=false : Specifies that loads can operate in multiple concurrent sessions to load data into the same table.
DIRECT=true : A value of true specifies a direct path load. A value of false specifies a conventional path load.
MULTITHREADING=false : The default is true on multiple-CPU systems and false on single-CPU systems. This parameter is available only for direct path loads. If you have multiple CPU systems, this parameter will be enabled (set to true) by default, and be aware that multithreading functionality is operating system-dependent. Not all operating systems support multithreading.
DATE “mm/dd/YYYY” : The date format that match CSV file data column.
APPEND : If the data already exists in the table, Oracle SQL*Loader will add the new rows; otherwise, the rows will be inserted. You must have the “SELECT” privilege on the target table for using APPEND.
TRUNCATE | REPLACE : TRUNCATE is used to truncate the table before loading the new data to it. It is used for better performance compared to REPLACE.
REPLACE replaces the table, not individual rows or a set of rows. SQL*Loader will not update the existing rows. Moreover, REPLACE using delete command, so any delete trigger will fire unless DELETE CASCADE has been specified for the table itself.
Caution:
If we used TRUNCATE or REPLACE, all rows will be deleted from the table, and a COMMIT statement will be issued; hence, we can’t recover the data before using those clauses unless it was saved before.
Step 4 : Load the data using sqlldr command
Now it’s time to run the sqlldr command to upload the data from the CSV file to the Oracle database table that we have created earlier.
sqlldr hr/hr@localhost:1521/orcl.localdomain control='loaddata.ctl' log='Results.log'
Since my operating system is windows, I have a folder on drive “D” called “dataset“, so I went to that folder and executed the sqlldr command above from that place only.
After executing the above command, you should get the following message
E:\dataset>sqlldr hr/hr@localhost:1521/orcl.localdomain control='loaddata.ctl' l
og='Results.log'
SQL*Loader: Release 11.1.0.7.0 - Production on Sat Feb 19 00:39:51 2022
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Load completed - logical record count 7.
As you can see, seven logical records have been found inside the CSV file; however, only five have been inserted successfully into our table, and we’ll see why two records are missing.
Step 5 : Check the Database Table
In this step, we will check our database table by executing a simple select statement as below.
SELECT * FROM XX_DUMMY_DATA_CSV;
and the result of this query would be
CASE_NUMBER | NASA_CENTER | SRA_DATE | SRA_FINAL | NTR_TITLE |
---|---|---|---|---|
ARC-14136-1 | ARC | 19/10/2001 | Academic Worldwide | Adaptive Relevance-Learning Software Component (ARNIE) |
ARC-14293-1 | ARC | 19/09/2005 | Open Source | Genetic Graphs (JavaGenes) |
ARC-14379-1 | ARC | 27/03/2002 | General US | Man-machine Integration Design And Analysis System (MIDAS) |
ARC-14400-1 | ARC | 29/01/2001 | General US | PLOT3D Version 4.0 |
ARC-14467-1A | ARC | 24/01/2014 | Open Source | CFD Utility Software Library (previously Called Aerodynamics Division Software Library ARC-14467-1) |
Step 6 : Check the Log File
Oracle SQL*Loader automatically created a log file called “Results.log” to log succeeded or failure records with the reason in case of failure to give you more detail to fix the data issue or take any other appropriate action. And here, you can find the content of this file.
SQL*Loader: Release 11.1.0.7.0 - Production on Sat Feb 19 02:43:38 2022
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Control File: loaddata.ctl
Data File: dataset.csv
Bad File: dataset.bad
Discard File: none specified
(Allow all discards)
Number to load: ALL
Number to skip: 1
Errors allowed: 50
Continuation: none specified
Path used: Direct
Table XX_DUMMY_DATA_CSV, loaded from every logical record.
Insert option in effect for this table: TRUNCATE
Column Name Position Len Term Encl Datatype
------------------------------ ---------- ----- ---- ---- ---------------------
CASE_NUMBER FIRST * , O(") CHARACTER
NASA_CENTER NEXT * , O(") CHARACTER
SRA_DATE NEXT * , O(") DATE mm/dd/YYYY HH24:MI
SRA_FINAL NEXT * , O(") CHARACTER
NTR_TITLE NEXT * , O(") CHARACTER
Record 3: Rejected - Error on table XX_DUMMY_DATA_CSV, column NASA_CENTER.
Column not found before end of logical record (use TRAILING NULLCOLS)
Record 7: Rejected - Error on table XX_DUMMY_DATA_CSV, column NASA_CENTER.
Column not found before end of logical record (use TRAILING NULLCOLS)
Table XX_DUMMY_DATA_CSV:
5 Rows successfully loaded.
2 Rows not loaded due to data errors.
0 Rows not loaded because all WHEN clauses were failed.
0 Rows not loaded because all fields were null.
Date cache:
Max Size: 1000
Entries : 5
Hits : 0
Misses : 0
Bind array size not used in direct path.
Column array rows : 5000
Stream buffer bytes: 256000
Read buffer bytes: 1048576
Total logical records skipped: 1
Total logical records read: 7
Total logical records rejected: 2
Total logical records discarded: 0
Direct path multithreading optimization is disabled
Run began on Sat Feb 19 02:43:38 2022
Run ended on Sat Feb 19 02:43:39 2022
Elapsed time was: 00:00:00.10
CPU time was: 00:00:00.00
As you can see, the total logical records are 7, but out of 7, only five have been loaded or inserted successfully; moreover, two records have been rejected. And if you observe the middle of the file, you will find why those records have been rejected or not loaded.
Record 3: Rejected – Error on table XX_DUMMY_DATA_CSV, column NASA_CENTER. Column not found before the end of the logical record (use TRAILING NULLCOLS) Record 7: Rejected – Error on table XX_DUMMY_DATA_CSV, column NASA_CENTER. Column not found before the end of the logical record (use TRAILING NULLCOLS).
Step 7 : Check bad File
Oracle created another file to log the bad records that are not loaded to let you know them in particular. And here is the content of this file.
"ARC-14297-1,ARC,11/06/2003,General US,""Automated Domain Decomposition Software, PEGASUS Version 5.0"""
"ARC-14487-1,ARC,02/14/2005,Open Source,""CAPTools-based Automatic Parallelizer Using OpenMP (or """"CAPO"""")"
If you observe these records, you will find an extra double quote is there while are not needed; maybe this is one of the reasons why SQL*Loader failed to load those records.
Conclusion
We can use the SQL*Loader tool to load any CSV file, whatever the structure of the file, into the Oracle database by using a set of features coming with this tool. Also, there are many more options available to use in this tool, like loading conditional data.
There is a complete guide that Oracle has created in this link if you want to deep dive into a lot more details of the capabilities of this impressive tool.
Eventually, If you have any questions, please don’t hesitate to ask them in the comments section below. or ask your question from up right corner and an expert will reply your inquery.
Hopefully, it was clear and concise.
If you have an addition to this guide to add extra value for the readers, kindly let me know.