Do you set TdDataAdapter.UpdateBatchSize property? if yes, did it enhance the overall performance?
BTW, you can use TPT API but you must use C++/CLI.
Another option is to partition the data into multiple DataTables; use one thread per connection per DataTable to load the data into empty staging tables. Finally Insert-Select into target table.
I wrote "empty staging tables".
That is each thread/connection/DataTable will load into a separate empty table.
For example lets assume that you want to load 1,000,000 rows. You can partition the rows across 30 threads. Each thread will read X rows into a DataTable, establish a connection to the Teradata database, create a new table, and insert the rows into the table.
Later you can use a single connection to INSERT-SELECT from the 30 staging tables into the target table.
I wrote "C++/CLI". In this case CLI refers to "Common Language Infrastructure".
TPT API is a layer on top of Call-Level-Interface. It should greatly simplify the process of creating custom Extract and Load applications. See http://www.info.teradata.com/eDownload.cfm? itemid=071790014.
Small snippet from TPT API manual:
The Teradata Parallel Transporter Application Programming Interface (Teradata PT API) is a
set of application programming interfaces used to load and export data to and from Teradata
systems. Teradata PT API enables an application to access the Teradata Database using
proprietary Teradata load and export protocols (Teradata FastLoad, Teradata FastExport, and
Teradata MultiLoad). Unlike the Teradata utilities and Teradata Parallel Transporter that are
driven by scripts, Teradata PT API is a functional library that is part of your applications. This
allows the applications to have more control during the load and export processes.
Ron,
My experience relates to extracting data so it may not be 100% applicable so take it for what it's worth :)
I recently completed a project to compare the extract performance characteristics of TPT API, Fast Export, .NET Provider (v1.02) & ODBC. Without getting into all of the details, I was able to achieve raw extract performance equal and in some cases better (depending on volume of extract) using the .NET provider than I could with TPT API and Fast Export.
Bottom line, physical location of the server in relation to the Teradata servers is critical. The .NET provider is essentially a "single session" approach whereas I can increase the # of sessions in TPT & FExp. Across the WAN (multiple network hops), multiple sessions in TPT/FExp will drastically outperform .NET however when on the same LAN (or a max of 2 hops), multiple sessions in TPT/FExp actually showed a significant decrease in performance (possibly related to thread blocking due to more data coming than the utils can handle). In this scenario, where TPT/FExp were forced to run with a single session, .NET performed just as fast and in some cases even faster.
I did write managed C++ wrappers around the TPT API for extract only but my code was not the source of the performance drop as using the stand alone utilities by themselves behaved exactly the same.
Gary
I was reading in the Orange Book, Indices and Active Ingest, it mentioned using Iterative Arrays to achieve 1000+ rows/sec throughput. I have not been able to find anything more about how to code/implement this? Has anyone else read/heard of using iterative arrays with the .Net Provider/
TdDataAdapter class supports Batch Update, which is a feature of ADO.NET 2.0:
http://msdn.microsoft.com/en-us/library/ kbbwt18a(VS.80).aspx
TdDataAdapter.UpdateBatchSize must be set to a number higher than 1, for example 100. It will use Iterative Array to batch all 100 rows together in one call to Teradata Database.
A BulkUpdate implementation for the teradata .net provider would be great.. Cross fingers this is something that will happen soon..
I wonder, has anybody had issues using parameter arrays and teradata odbc to do bulk inserts??
Hi Ron and Shaw,
I have been using .NET Data Provider 1.2.0.0 for a while accessing our TD DW but all has been extracting data into a .NET DataSet/DataTable.
I am working on a project right now that needs to load records from a DataTable back into our TD table. I read your posts and I have been trying to use the TdDataAdapter object to achieve this but just very very frustrating; I have not been able to get it figured out. Getting the same error again and again as below:
Teradata.Client.Provider.TdException: [Teradata Database] [3939] There is a mismatch between the number of parameters specified and the number of parameters required.
I'm attaching my code below; I know I must have done something not right but have not been able to figure out what went wrong. If you can look at my code or post a working code, that would be GREAT HELP and will be GREATLY APPRICIATED!
My TD table is like below (just for testing purpose, I made it a single Char column):
[color=#FF0000]CREATE SET TABLE WORK_TBLS.PROCESSED_JOBS,
NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
MLOAD_NAME CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC
)
PRIMARY INDEX ( MLOAD_NAME );[/color]
My code is as below (in C#, .NET 2.0):
[color=#FF0000] public void UpdateRecordTable()
{
// GetConnectionString() will return a working connection string.
string connectionString = GetConnectionString();
// Connect to the database.
using (TdConnection connection = new TdConnection(connectionString))
{
connection.Open();
DataSet ds = new DataSet();
// Create a TdDataAdapter and fill.
TdDataAdapter adapter = new TdDataAdapter("sel * from WORK_TBLS.PROCESSED_JOBS", connection);
adapter.Fill(ds);
// Set the INSERT command and parameter ANOTHER WAY BUT NOT WORK EITHER.
//adapter.InsertCommand = new TdCommand(
// "INSERT INTO WORK_TBLS.PROCESSED_JOBS (MLOAD_NAME) VALUES (?);",
// connection);
// Set the INSERT command and parameter.
adapter.InsertCommand = new TdCommand(
"INSERT INTO WORK_TBLS.PROCESSED_JOBS (MLOAD_NAME) VALUES (@MLOAD_NAME);",
connection);
adapter.InsertCommand.Parameters.Add("@MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
// Set the batch size.
adapter.UpdateBatchSize = 0;
// Add a row for testing.
ds.Tables[0].Rows.Add("test");
// Execute the update.
adapter.Update(ds);
}
}
[/color]
Thanks very much again!
John
I tried the following sample with .NET Data Provider 13.0; it works.
Note that Teradata support Parameter Markers not parameter names.
public static void Main(String[] args)
{
using (TdConnection cn = new TdConnection(@"Data Source=x;user id=y;password=z;"))
{
cn.Open();
DataSet ds = new DataSet();
TdDataAdapter adapter = new TdDataAdapter("Select * from PROCESSED_JOBS", cn);
adapter.Fill(ds);
adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME) VALUES (?);", cn);
adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
ds.Tables[0].Rows.Add("test20");
adapter.UpdateBatchSize = 0;
try
{
adapter.Update(ds);
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}
}
Hi Shaw,
IT DOES WORK! THANK YOU SO MUCH! The key is the two lines below:
adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME) VALUES (?);", cn);
adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
In my old code and all my old testing, I mostly tested the SQL Server's @xxx style params, which worked for my SQL Server connections. I did suspect it could be the params' problem but was not able to figure out what exactly is the problem.
Thanks a lot again Shaw. I do appreciate it very much.
My test is just a simple one column table. Now I'm having a TD table has a Date field. Off hand do you know how to get a DateTime field from a .NET DataTable inserted into the TD table's Date field?
Another quick question, will .NET Data Provider 13.0 work for Visual Studio 2005 and TD V2R5 or V2R6? Any advantage of using this newer version over the 1.2.0.0?
Take care.
John
When I work with a real TD table with different data types other than string/char, i.e. DateTime and int, I had to convert them specifically. Please see my code below - the for loop statement maybe a performance issue but I could not figure out a work around.
public void UpdateRecordTable(TdConnection conn, DataSet ds)
{
using (conn)
{
TdDataAdapter adapter = new TdDataAdapter();
adapter.InsertCommand = new TdCommand(
"INSERT INTO WORK_TBLS.PROCESSED_JOBS VALUES (?,?,?,?,?,CURRENT_TIMESTAMP);",
connection);
adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.Parameters.Add("JOB_DATE", TdType.Char, 10, "JOB_DATE");
adapter.InsertCommand.Parameters.Add("JOB_TIME", TdType.Integer, 0, "JOB_TIME");
adapter.InsertCommand.Parameters.Add("DATABASE_NAME", TdType.Char, 36, "DATABASE_NAME");
adapter.InsertCommand.Parameters.Add("TABLE_NAME", TdType.Char, 36, "TABLE_NAME");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.UpdateBatchSize = 0;
DataTable dt = new DataTable();
dt.Columns.Add("MLOAD_NAME", typeof(string));
dt.Columns.Add("JOB_DATE", typeof(string));
dt.Columns.Add("JOB_TIME", typeof(int));
dt.Columns.Add("DATABASE_NAME", typeof(string));
dt.Columns.Add("TABLE_NAME", typeof(string));
for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
string mname = ds.Tables[0].Rows[0].ToString();
string date = DateTime.Parse(ds.Tables[0].Rows[1].ToString()).ToString("s").Substring(0, 10);
int time = int.Parse(ds.Tables[0].Rows[2].ToString());
string dname = ds.Tables[0].Rows[3].ToString();
string tname = ds.Tables[0].Rows[4].ToString();
dt.Rows.Add(mname, date, time, dname, tname);
}
conn.Open();
int cnt = adapter.Update(dt);
}
}
The for loop populates a rows in a new DataTable object, which is extremely similar to the ds.Tables[0], except having fewer columns, like below:
The first 5 columns will be mapped into the new DataTable and populated by the for loop. Is there a way that I can directly use the ds.Tables[0] to insert data into the TD table rather then having to create a new DataTable and populate/transform data into this new DataTable first then use it to insert data into the Teradata?
Any comments/suggestions/advices will be greatly appreciated!
Thanks very much.
John
Sorry, it doesn't allow me to paste XML layout...deleted all the brackets and pasted below:
xs:element name="MLOAD_NAME" type="xs:string" minOccurs="0"
xs:element name="JOB_DATE_TIME" type="xs:dateTime" minOccurs="0"
xs:element name="JOB_TIME" type="xs:int" minOccurs="0"
xs:element name="DATABASE_NAME" type="xs:string" minOccurs="0"
xs:element name="TABLE_NAME" type="xs:string" minOccurs="0"
xs:element name="ROWS_INSERTED" type="xs:decimal" minOccurs="0"
xs:element name="DUPLICATE_ROWS" type="xs:decimal" minOccurs="0"
xs:element name="ERROR_ROWS" type="xs:decimal" minOccurs="0"
xs:element name="ROWS_UPDATED" type="xs:decimal" minOccurs="0"
xs:element name="ROWS_DELETED" type="xs:decimal" minOccurs="0"
Thanks.
John
Table DDL that I used:
Create table Processed_jobs
(
MLOAD_NAME CHAR(30),
JOB_DATE DATE,
JOB_TIME TIME(6),
DATABASE_NAME CHAR(30),
TABLE_NAME CHAR(30),
ROWS_INSERTED DECIMAL(38, 0),
DUPLICATE_ROWS DECIMAL(38, 0),
ERROR_ROWS DECIMAL(38, 0),
ROWS_UPDATED DECIMAL(38, 0),
ROWS_DELETED DECIMAL(38, 0)
)
PRIMARY INDEX (MLOAD_NAME)
Try this sample;
public static void Main(String[] args)
{
using (TdConnection cn = new TdConnection(@"Data Source=x;user id=y;password=z;"))
{
cn.Open();
DataSet ds = new DataSet();
TdDataAdapter adapter = new TdDataAdapter("Select * from PROCESSED_JOBS", cn);
adapter.FillSchema(ds, SchemaType.Source);
adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME, JOB_DATE, JOB_TIME, DATABASE_NAME, TABLE_NAME) VALUES (?, ?, ?, ?, ?);", cn);
adapter.InsertCommand.Parameters.Add("P_MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.Parameters.Add("P_JOB_DATE", TdType.Date, 10, "JOB_DATE");
adapter.InsertCommand.Parameters.Add("P_JOB_TIME", TdType.Time, 15, "JOB_TIME");
adapter.InsertCommand.Parameters.Add("P_DATABASE_NAME" , TdType.Char, 30, "DATABASE_NAME");
adapter.InsertCommand.Parameters.Add("P_TABLE_NAME", TdType.Char, 30, "TABLE_NAME");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
DataRow row = ds.Tables[0].NewRow();
row["MLOAD_NAME"] = "Load 1";
row["JOB_DATE"] = DateTime.Now;
row["JOB_TIME"] = DateTime.Now.TimeOfDay;
row["DATABASE_NAME"] = "My Database";
row["TABLE_NAME"] = "My Table";
ds.Tables[0].Rows.Add(row);
adapter.UpdateBatchSize = 0;
try
{
adapter.Update(ds);
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}
}
Hi Shaw, thank you so much again for your help! Have a nice evening. Take care. John
If you want to get the best possible performance importing data into Teradata you need to use FastLoad or the Parallel Transport API. If you want to do Bulk Insert to Teradata from VB.Net, C# or some other CLR language, I've created a Managed C++ wrapper for the Teradata Parallel Transport API (equivalent to FAST LOAD). This should go dramatically faster than using the DataAdapter.
http://tptapiwrapper.codeplex.com/
how about creating a console app that will run fastload then accepts a parameter which is the name of the file("test.fld") then calling that console app in your program... will that work.. assuming you have fastload.exe...
fool

Is there anyway to get the .NET provider to perform anywhere near as well as fastload for loading data? We have several applications, in c#, that need to monitor directories and then load those files into a Teradata warehouse. I would prefer not to have to shell out to call fastload.
Currently I am reading/parsing the file into a dataset and then using a dataadapter to update the table. Performance is no where near what fastload accomplishes. Does anyone have any suggestions/tips/tricks as to how to improve the .NET provider performance?
R