All Forums Connectivity
rluebke 47 posts Joined 11/05
11 Jun 2008
.NET Provider performance

Is there anyway to get the .NET provider to perform anywhere near as well as fastload for loading data? We have several applications, in c#, that need to monitor directories and then load those files into a Teradata warehouse. I would prefer not to have to shell out to call fastload.
Currently I am reading/parsing the file into a dataset and then using a dataadapter to update the table. Performance is no where near what fastload accomplishes. Does anyone have any suggestions/tips/tricks as to how to improve the .NET provider performance?

R

NetFx-3203 155 posts Joined 09/06
11 Jun 2008

Do you set TdDataAdapter.UpdateBatchSize property? if yes, did it enhance the overall performance?

NetFx-3203 155 posts Joined 09/06
12 Jun 2008

BTW, you can use TPT API but you must use C++/CLI.

Another option is to partition the data into multiple DataTables; use one thread per connection per DataTable to load the data into empty staging tables. Finally Insert-Select into target table.

rluebke 47 posts Joined 11/05
13 Jun 2008

Yes I did. It had some improvement, but no where near what fastload was doing. I also tried using a stored procedure with a short name in an attempt to get more rows in the package.

R

rluebke 47 posts Joined 11/05
13 Jun 2008

I tried the multiple data tables (with individual TDConnections) route also. It was actually slower and sometimes I received a deadlock error. My guess is that the INSERT statements generated by the DataAdapter Update method were getting in the way of each other (?). I don't mind shifting into C++ for access to the CLI. Is there any documentation available on how to program to it? Ideally, if I went down this road I would like to create a .NET wrapper DLL for the CLI.

R

NetFx-3203 155 posts Joined 09/06
13 Jun 2008

I wrote "empty staging tables".
That is each thread/connection/DataTable will load into a separate empty table.

For example lets assume that you want to load 1,000,000 rows. You can partition the rows across 30 threads. Each thread will read X rows into a DataTable, establish a connection to the Teradata database, create a new table, and insert the rows into the table.

Later you can use a single connection to INSERT-SELECT from the 30 staging tables into the target table.

I wrote "C++/CLI". In this case CLI refers to "Common Language Infrastructure".

TPT API is a layer on top of Call-Level-Interface. It should greatly simplify the process of creating custom Extract and Load applications. See http://www.info.teradata.com/eDownload.cfm? itemid=071790014.


Small snippet from TPT API manual:

The Teradata Parallel Transporter Application Programming Interface (Teradata PT API) is a
set of application programming interfaces used to load and export data to and from Teradata
systems. Teradata PT API enables an application to access the Teradata Database using
proprietary Teradata load and export protocols (Teradata FastLoad, Teradata FastExport, and
Teradata MultiLoad). Unlike the Teradata utilities and Teradata Parallel Transporter that are
driven by scripts, Teradata PT API is a functional library that is part of your applications. This
allows the applications to have more control during the load and export processes.


gmbehan 1 post Joined 09/08
04 Sep 2008

Ron,

My experience relates to extracting data so it may not be 100% applicable so take it for what it's worth :)

I recently completed a project to compare the extract performance characteristics of TPT API, Fast Export, .NET Provider (v1.02) & ODBC. Without getting into all of the details, I was able to achieve raw extract performance equal and in some cases better (depending on volume of extract) using the .NET provider than I could with TPT API and Fast Export.

Bottom line, physical location of the server in relation to the Teradata servers is critical. The .NET provider is essentially a "single session" approach whereas I can increase the # of sessions in TPT & FExp. Across the WAN (multiple network hops), multiple sessions in TPT/FExp will drastically outperform .NET however when on the same LAN (or a max of 2 hops), multiple sessions in TPT/FExp actually showed a significant decrease in performance (possibly related to thread blocking due to more data coming than the utils can handle). In this scenario, where TPT/FExp were forced to run with a single session, .NET performed just as fast and in some cases even faster.

I did write managed C++ wrappers around the TPT API for extract only but my code was not the source of the performance drop as using the stand alone utilities by themselves behaved exactly the same.

Gary

rdp010 1 post Joined 09/08
23 Sep 2008

I was reading in the Orange Book, Indices and Active Ingest, it mentioned using Iterative Arrays to achieve 1000+ rows/sec throughput. I have not been able to find anything more about how to code/implement this? Has anyone else read/heard of using iterative arrays with the .Net Provider/

NetFx-3203 155 posts Joined 09/06
24 Sep 2008

TdDataAdapter class supports Batch Update, which is a feature of ADO.NET 2.0:

http://msdn.microsoft.com/en-us/library/ kbbwt18a(VS.80).aspx



TdDataAdapter.UpdateBatchSize must be set to a number higher than 1, for example 100. It will use Iterative Array to batch all 100 rows together in one call to Teradata Database.

rluebke 47 posts Joined 11/05
25 Sep 2008

I normally set the UpdateBatchSize to 0 now for my DataAdapter. From previous disucssions in this forum, it appears that the window size of the data going to the database is roughly 1 MB. So when you set the UpdateBatchSize to 0, it will put as many rows in as it can in that 1 MB window.
How you set up the UPDATE statement for the DataAdapter will affect your performance. By default, if you let the framework create the UPDATE statement it tries to compare the values on every column, and as such, must transport alot more data than you may need. That can eat into the 1MB window size pretty quick. On large updates, I manually set the UpdateCommand so that it uses just the primary index columns and the fields I need to change. Using this, I have been able to pack alot more rows in the 1 MB window.
For inserts, you pretty much get what you get but with one caveat. If you have default values in your destination table and can use them, do so.
In a nutshell, any technique you can use to reduce the amount of information per row that needs to be sent back ( and thus giving you more rows per window) will give you better performance.
What I would really like to see implemented in the .NET provider is something similar to the BulkUpdate that the SQL Server provider has.

R

ayman 1 post Joined 11/07
24 Oct 2008

A BulkUpdate implementation for the teradata .net provider would be great.. Cross fingers this is something that will happen soon..

I wonder, has anybody had issues using parameter arrays and teradata odbc to do bulk inserts??

zqi 5 posts Joined 06/04
17 Mar 2009

Hi Ron and Shaw,

I have been using .NET Data Provider 1.2.0.0 for a while accessing our TD DW but all has been extracting data into a .NET DataSet/DataTable.

I am working on a project right now that needs to load records from a DataTable back into our TD table. I read your posts and I have been trying to use the TdDataAdapter object to achieve this but just very very frustrating; I have not been able to get it figured out. Getting the same error again and again as below:

Teradata.Client.Provider.TdException: [Teradata Database] [3939] There is a mismatch between the number of parameters specified and the number of parameters required.

I'm attaching my code below; I know I must have done something not right but have not been able to figure out what went wrong. If you can look at my code or post a working code, that would be GREAT HELP and will be GREATLY APPRICIATED!

My TD table is like below (just for testing purpose, I made it a single Char column):

[color=#FF0000]CREATE SET TABLE WORK_TBLS.PROCESSED_JOBS,
NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
MLOAD_NAME CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC
)
PRIMARY INDEX ( MLOAD_NAME );[/color]


My code is as below (in C#, .NET 2.0):

[color=#FF0000] public void UpdateRecordTable()
{
// GetConnectionString() will return a working connection string.
string connectionString = GetConnectionString();

// Connect to the database.
using (TdConnection connection = new TdConnection(connectionString))
{
connection.Open();
DataSet ds = new DataSet();

// Create a TdDataAdapter and fill.
TdDataAdapter adapter = new TdDataAdapter("sel * from WORK_TBLS.PROCESSED_JOBS", connection);
adapter.Fill(ds);

// Set the INSERT command and parameter ANOTHER WAY BUT NOT WORK EITHER.
//adapter.InsertCommand = new TdCommand(
// "INSERT INTO WORK_TBLS.PROCESSED_JOBS (MLOAD_NAME) VALUES (?);",
// connection);

// Set the INSERT command and parameter.
adapter.InsertCommand = new TdCommand(
"INSERT INTO WORK_TBLS.PROCESSED_JOBS (MLOAD_NAME) VALUES (@MLOAD_NAME);",
connection);
adapter.InsertCommand.Parameters.Add("@MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");

// Set the batch size.
adapter.UpdateBatchSize = 0;

// Add a row for testing.
ds.Tables[0].Rows.Add("test");

// Execute the update.
adapter.Update(ds);
}
}
[/color]


Thanks very much again!
John

NetFx-3203 155 posts Joined 09/06
17 Mar 2009

I tried the following sample with .NET Data Provider 13.0; it works.
Note that Teradata support Parameter Markers not parameter names.



public static void Main(String[] args)
{
using (TdConnection cn = new TdConnection(@"Data Source=x;user id=y;password=z;"))
{
cn.Open();

DataSet ds = new DataSet();

TdDataAdapter adapter = new TdDataAdapter("Select * from PROCESSED_JOBS", cn);
adapter.Fill(ds);

adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME) VALUES (?);", cn);
adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;

ds.Tables[0].Rows.Add("test20");

adapter.UpdateBatchSize = 0;

try
{
adapter.Update(ds);
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}

}

zqi 5 posts Joined 06/04
17 Mar 2009

Hi Shaw,

IT DOES WORK! THANK YOU SO MUCH! The key is the two lines below:

adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME) VALUES (?);", cn);
adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");

In my old code and all my old testing, I mostly tested the SQL Server's @xxx style params, which worked for my SQL Server connections. I did suspect it could be the params' problem but was not able to figure out what exactly is the problem.

Thanks a lot again Shaw. I do appreciate it very much.

My test is just a simple one column table. Now I'm having a TD table has a Date field. Off hand do you know how to get a DateTime field from a .NET DataTable inserted into the TD table's Date field?

Another quick question, will .NET Data Provider 13.0 work for Visual Studio 2005 and TD V2R5 or V2R6? Any advantage of using this newer version over the 1.2.0.0?

Take care.

John

zqi 5 posts Joined 06/04
18 Mar 2009

When I work with a real TD table with different data types other than string/char, i.e. DateTime and int, I had to convert them specifically. Please see my code below - the for loop statement maybe a performance issue but I could not figure out a work around.

public void UpdateRecordTable(TdConnection conn, DataSet ds)
{
using (conn)
{
TdDataAdapter adapter = new TdDataAdapter();

adapter.InsertCommand = new TdCommand(
"INSERT INTO WORK_TBLS.PROCESSED_JOBS VALUES (?,?,?,?,?,CURRENT_TIMESTAMP);",
connection);

adapter.InsertCommand.Parameters.Add("MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.Parameters.Add("JOB_DATE", TdType.Char, 10, "JOB_DATE");
adapter.InsertCommand.Parameters.Add("JOB_TIME", TdType.Integer, 0, "JOB_TIME");
adapter.InsertCommand.Parameters.Add("DATABASE_NAME", TdType.Char, 36, "DATABASE_NAME");
adapter.InsertCommand.Parameters.Add("TABLE_NAME", TdType.Char, 36, "TABLE_NAME");

adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.UpdateBatchSize = 0;

DataTable dt = new DataTable();

dt.Columns.Add("MLOAD_NAME", typeof(string));
dt.Columns.Add("JOB_DATE", typeof(string));
dt.Columns.Add("JOB_TIME", typeof(int));
dt.Columns.Add("DATABASE_NAME", typeof(string));
dt.Columns.Add("TABLE_NAME", typeof(string));

for (int i = 0; i < ds.Tables[0].Rows.Count; i++)
{
string mname = ds.Tables[0].Rows[0].ToString();
string date = DateTime.Parse(ds.Tables[0].Rows[1].ToString()).ToString("s").Substring(0, 10);
int time = int.Parse(ds.Tables[0].Rows[2].ToString());
string dname = ds.Tables[0].Rows[3].ToString();
string tname = ds.Tables[0].Rows[4].ToString();

dt.Rows.Add(mname, date, time, dname, tname);
}

conn.Open();
int cnt = adapter.Update(dt);
}
}


The for loop populates a rows in a new DataTable object, which is extremely similar to the ds.Tables[0], except having fewer columns, like below:















The first 5 columns will be mapped into the new DataTable and populated by the for loop. Is there a way that I can directly use the ds.Tables[0] to insert data into the TD table rather then having to create a new DataTable and populate/transform data into this new DataTable first then use it to insert data into the Teradata?

Any comments/suggestions/advices will be greatly appreciated!

Thanks very much.

John

zqi 5 posts Joined 06/04
18 Mar 2009

Sorry, it doesn't allow me to paste XML layout...deleted all the brackets and pasted below:

xs:element name="MLOAD_NAME" type="xs:string" minOccurs="0"
xs:element name="JOB_DATE_TIME" type="xs:dateTime" minOccurs="0"
xs:element name="JOB_TIME" type="xs:int" minOccurs="0"
xs:element name="DATABASE_NAME" type="xs:string" minOccurs="0"
xs:element name="TABLE_NAME" type="xs:string" minOccurs="0"
xs:element name="ROWS_INSERTED" type="xs:decimal" minOccurs="0"
xs:element name="DUPLICATE_ROWS" type="xs:decimal" minOccurs="0"
xs:element name="ERROR_ROWS" type="xs:decimal" minOccurs="0"
xs:element name="ROWS_UPDATED" type="xs:decimal" minOccurs="0"
xs:element name="ROWS_DELETED" type="xs:decimal" minOccurs="0"

Thanks.
John

NetFx-3203 155 posts Joined 09/06
18 Mar 2009

Table DDL that I used:

Create table Processed_jobs
(
MLOAD_NAME CHAR(30),
JOB_DATE DATE,
JOB_TIME TIME(6),
DATABASE_NAME CHAR(30),
TABLE_NAME CHAR(30),
ROWS_INSERTED DECIMAL(38, 0),
DUPLICATE_ROWS DECIMAL(38, 0),
ERROR_ROWS DECIMAL(38, 0),
ROWS_UPDATED DECIMAL(38, 0),
ROWS_DELETED DECIMAL(38, 0)
)
PRIMARY INDEX (MLOAD_NAME)



Try this sample;

public static void Main(String[] args)
{
using (TdConnection cn = new TdConnection(@"Data Source=x;user id=y;password=z;"))
{
cn.Open();

DataSet ds = new DataSet();

TdDataAdapter adapter = new TdDataAdapter("Select * from PROCESSED_JOBS", cn);
adapter.FillSchema(ds, SchemaType.Source);

adapter.InsertCommand = new TdCommand("INSERT INTO PROCESSED_JOBS (MLOAD_NAME, JOB_DATE, JOB_TIME, DATABASE_NAME, TABLE_NAME) VALUES (?, ?, ?, ?, ?);", cn);
adapter.InsertCommand.Parameters.Add("P_MLOAD_NAME", TdType.Char, 30, "MLOAD_NAME");
adapter.InsertCommand.Parameters.Add("P_JOB_DATE", TdType.Date, 10, "JOB_DATE");
adapter.InsertCommand.Parameters.Add("P_JOB_TIME", TdType.Time, 15, "JOB_TIME");
adapter.InsertCommand.Parameters.Add("P_DATABASE_NAME" , TdType.Char, 30, "DATABASE_NAME");
adapter.InsertCommand.Parameters.Add("P_TABLE_NAME", TdType.Char, 30, "TABLE_NAME");

adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;

DataRow row = ds.Tables[0].NewRow();

row["MLOAD_NAME"] = "Load 1";
row["JOB_DATE"] = DateTime.Now;
row["JOB_TIME"] = DateTime.Now.TimeOfDay;
row["DATABASE_NAME"] = "My Database";
row["TABLE_NAME"] = "My Table";

ds.Tables[0].Rows.Add(row);

adapter.UpdateBatchSize = 0;

try
{
adapter.Update(ds);
}
catch (Exception e)
{
Console.WriteLine(e.ToString());
}
}

}

zqi 5 posts Joined 06/04
18 Mar 2009

Hi Shaw, thank you so much again for your help! Have a nice evening. Take care. John

rluebke 47 posts Joined 11/05
12 May 2009

At the risk of beating a dead horse, I still cannot get performance anywhere near comparable to the CLI out of the .NET provider. I have tried playing with window sizes, using stored procedures, parameters, parallel temporary tables, etc. I load/unload multi-gigabyte files in C# and have been trying to avoid shelling out to fastexport or fastload. This seems to me to defeat the purpose of having a .NET provider for anything but the smallest queries.
There are other threads in the forum for the equivalent of a BULK load statement in the .NET provider (and this seems like a good idea given the size and scope of the normal Teradata database). Are there any improvements in TD13, or beyond, which we can hope for to resolve this issue?

R



sflanker 5 posts Joined 06/09
7 weeks ago

If you want to get the best possible performance importing data into Teradata you need to use FastLoad or the Parallel Transport API. If you want to do Bulk Insert to Teradata from VB.Net, C# or some other CLR language, I've created a Managed C++ wrapper for the Teradata Parallel Transport API (equivalent to FAST LOAD). This should go dramatically faster than using the DataAdapter.
http://tptapiwrapper.codeplex.com/

rluebke 47 posts Joined 11/05
1 week ago

I will download that wrapper and see how it works out. Is there any way to get the source code from you?

R

xer 1 post Joined 03/10
1 week ago

how about creating a console app that will run fastload then accepts a parameter which is the name of the file("test.fld") then calling that console app in your program... will that work.. assuming you have fastload.exe...

fool

You must sign in to leave a comment.