Are you reading Excel file using Interop library? Is the application performance sluggish? Well, Interop is slow but that does not mean that you cannot improve performance. Understanding the root cause and avoiding pitfalls will surely help you improve the performance. Making a minor tweaks in code, I was able to reduce the time of reading 4000 rows from 40 minutes to few seconds. That’s a drastic improvement.
So Why Excel Interop is Slow?
From C# / VB .Net or any other code when you call Excel Interop functions, .Net application is trying to work with COM objects and functions. Each function call or property you access requires marshalling data. That is the bottleneck. If you are using a long object chain again and again, that’s where you get a performance hit. The more you cross Interop boundaries, your code will be penalized with performance.
How to Improve Excel Interop Performance?
Your code will be faster, lesser you cross Interop boundaries. You may also drop Interop from code and use alternative methods to access data. Depending on your situation, you may use any of the following methods to improve performance that best suits your application and need.
Reduce Interop Calls
If you are reading each cell one at a time, don’t do it. This is the main culprit for sluggish performance. Instead, you want to read everything in a single function call.
Here is a sample code that causes performance issue. In this code, I am reading one Excel cell at a time. Reading 200 rows and 9 columns take around 8 seconds.
Microsoft.Office.Interop.Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
Workbook xlWorkbook = xlApp.Workbooks.Open(pathToExcelFile);
_Worksheet xlWorksheet = (_Worksheet)xlWorkbook.Sheets[1];
Range xlRange = xlWorksheet.UsedRange;
int rowCount = xlRange.Rows.Count;
// we already know that range starts from 1st row with header and has 9 columns. Skipping 1st header row and reading cell value one by one
DateTime startTime = DateTime.Now;
// Excel range index starts with 1
for(int row = 2; row <= rowCount; row++)
{
// for xlWorksheet.Cells, 1st index is column, 2nd index is row
dynamic firstName = xlWorksheet.Cells[1][row].Value2;
dynamic lastName = xlWorksheet.Cells[2][row].Value2;
dynamic gender = xlWorksheet.Cells[3][row].Value2;
dynamic age = xlWorksheet.Cells[4][row].Value2;
dynamic email = xlWorksheet.Cells[5][row].Value2;
dynamic phone = xlWorksheet.Cells[6][row].Value2;
dynamic education = xlWorksheet.Cells[7][row].Value2;
dynamic occupation = xlWorksheet.Cells[8][row].Value2;
dynamic experience = xlWorksheet.Cells[9][row].Value2;
// now you have values so check for null, cast to appropriate data type and then use it
}
TimeSpan totalRuntime = DateTime.Now - startTime;
xlWorkbook.Close();
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(xlWorkbook);
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(xlApp);
MessageBox.Show(string.Format("Completed in {0}", totalRuntime.ToString()));
Here is a revised code that reads everything in a single function call. Reading 200 rows and 9 columns takes around 200 milliseconds.
Microsoft.Office.Interop.Excel.Application xlApp = new Microsoft.Office.Interop.Excel.Application();
Workbook xlWorkbook = xlApp.Workbooks.Open(pathToExcelFile);
_Worksheet xlWorksheet = (_Worksheet)xlWorkbook.Sheets[1];
Range xlRange = xlWorksheet.UsedRange;
DateTime startTime = DateTime.Now;
dynamic[,] excelData = xlRange.Value2;
int rowCount = excelData.GetUpperBound(0);
// this is how to get cellCount in case you need it.
int cellCount = excelData.GetUpperBound(1);
// even though it is a multidimentional .net array, index starts with 1 so skipping 1st header row
for (int row = 2; row <= rowCount; row++)
{
dynamic firstName = excelData[row, 1] ;
dynamic lastName = excelData[row, 2];
dynamic gender = excelData[row, 3];
dynamic age = excelData[row, 4];
dynamic email = excelData[row, 5];
dynamic phone = excelData[row, 6];
dynamic education = excelData[row, 7];
dynamic occupation = excelData[row, 8];
dynamic experience = excelData[row, 9];
// now you have values so check for null, cast to appropriate data type and then use it
}
TimeSpan totalRuntime = DateTime.Now - startTime;
xlWorkbook.Close();
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(xlWorkbook);
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(xlApp);
MessageBox.Show(string.Format("Completed in {0}", totalRuntime.ToString()));
When you need to read data from Excel and using it within your application, read everything that you need in a variable and only use .Net variable value for any data comparison or manipulation.
For example if you want to validate Excel data, don’t access it multiple times. Here each call takes about 50 ms so total cost is 150 ms.
int age = -1;
if (xlWorksheet.Cells[4][row].Value2 != null && !string.IsNullOrEmpty(xlWorksheet.Cells[4][row].Value2.ToString())) {
age = Convert.ToInt(xlWorksheet.Cells[4][row].Value2.ToString());
}
Instead, read data in variable and only use variable for any data comparison or manipulation. Here total cost is only 50 ms.
// This code is NOT optimized to demonstrate exact equivalent code for reducing Interop calls.
int age = -1;
dynamic excelAge = xlWorksheet.Cells[4][row].Value2;
if (excelAge != null && !string.IsNullOrEmpty(excelAge.ToString())) {
age = Convert.ToInt(excelAge);
}
As the number of cells you read and number of times you access Interop, it dramatically affects performance. Try to avoid Interop calls as much as you can.
Use CSV Instead of Interop
If possible, you should deal only with CSV file and not support Excel.
If you must support Excel file, depending on your situation, you may use Interop to save Excel file as CSV. Once you have CSV, you may use CSVHelper to read file.
Use OLEDB
It is better to avoid Interop altogether. Try to use OLEDB to read from Excel file just like reading data from SQL database. It is way easier and faster compared to Interop.
Leave a Reply