MSSQL: Backup/Restore logically a large table
Objective: We perform 3 data processing steps in this tutorial
1.Download an opensource data in pieces and join them together using powershell and python
2.Load this data into sqlserver
3.Use logical partition function to process a subset of rows; evaluate its performance against 1 shot process
YouTube:
Step 1: Download an opensource data in pieces and join them together using powershell and python
Source of data: https://github.com/JeffSackmann/tennis_atp
Data downloaded: ATP tennis data
Poweshell code:
Python Code:
Step 2) option 1: open SSMS to load the data
Right click on database -> tasks -> Import from Flat file (! import data) -> create a new table -> next -> validate auto column detection -> Load the data
Validate the loaded data by selecting top 1000 rows.
Step 2) Load data using option 2
Use bulk insert
Load data: Same file as above with an additional column to accommodate the recid primary key.
BULK INSERT [dbo].[atp_matches_2M]
FROM 'C:\Users\lab\Documents\Lab\ATP_Merged_bulkcopy.csv'
WITH (FIELDTERMINATOR = ',', FIRSTROW = 2);
Step 3) Use logical partition function to process a subset of rows; evaluate its performance.
Before we try the partition, let us first extract the data as is using the options on sql server options & measure time taken for each option.
a) as bacpac file (export data tier application)
output: C:\Users\lab\Documents\Lab\load_ATP_Data\backup\04nov2025_export_atp_matches_2M.bacpac
Actual:
C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.bacpac
10s
b) as export to a flat file (export data)
output: C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.csv
c) using bcp command
bcp tsqltestdb.dbo.atp_matches_2M out "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -S server_name -T -c
actual:
echo %time%
bcp tsqltestdb.dbo.ATP_DATA_5NOV25_BC out "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -T -c
echo %time%
Output:
194996 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 4219 Average : (46218.54 rows per sec.)
Partition the table logically using ntile function:
Avg: 2secs
Load bacpac:
d) actual partitioning
Partition the data using ntile function (https://learn.microsoft.com/en-us/sql/t-sql/functions/ntile-transact-sql?view=sql-server-ver17)
with t as
(
select recid
,ntile(4) over (order by recid asc) as partid
from tsqltestdb.dbo.ATP_DATA_5NOV25_BC
)
select partid
,min(recid)
,max(recid)
from t
group by partid
order by 1;
Ex.:
bcp "SELECT column1, column2 FROM database_name.schema_name.table_name WHERE condition" queryout "C:\path\to\output.txt" -S server_name -T -c
Actual: servername skipped, since it is localhost
bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 1 and 48749" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part1.txt" -T -c
bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 48750 and 97498" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part2.txt" -T -c
bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 97499 and 146247" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part3.txt" -T -c
bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 146248 and 194996" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part4.txt" -T -c
Time overall: 2s
ex. output:
48749 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1875 Average : (25999.47 rows per sec.)
C:\Users\lab>
Step 4) Are we able to load such extracted data?
a) Load bacpac:
Download sqlpackage.exe app and install it
url: https://aka.ms/dacfx-msi
If your planning to restore the database and schema along with table you extracted this method is ok.
# Import full BACPAC to temp database
"C:\Program Files\Microsoft SQL Server\170\DAC\bin\"SqlPackage.exe /Action:Import /sf:"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25.bacpac" /tsn:"192.168.1.100" /tdn:"TempDB_Import" /tec:"False"
Actual:
"C:\Program Files\Microsoft SQL Server\170\DAC\bin\"SqlPackage.exe /Action:Import /sf:"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.bacpac" /tsn:"192.168.1.100" /tdn:"TempDB5nov_Import" /tec:"False"
Query the data to see if the data is loaded properly.
>> it worked just fine, all records part of the only table we were backing up is loaded fine, others DDL is loaded.
But in our case only 1 table; this wont benefit us; so use one of these options...
b) load data alone from bacpac
bcp TempDB_Import.dbo.atp_matches_2M_loadback in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25\Data\dbo.atp_matches_2M\TableData-002-00000.bcp" -T -n
or
BULK INSERT [TempDB5nov_Import].[dbo].[atp_matches_2M]
FROM 'C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25\Data\dbo.atp_matches_2M\TableData-041-00000.bcp'
WITH (DATAFILETYPE = 'native');
actual:
BULK INSERT TempDB_Import.dbo.atp_matches_2M_loadback
FROM 'C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M\Data\dbo.ATP_DATA_5NOV25_BC\TableData-000-00000'
WITH (DATAFILETYPE = 'native');
verify the record count.
>> all ok
1 sample thread was tested above.
b) Load bcp data back
bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -T -c
verify the record count.
Output:
194996 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 4297 Average : (45379.57 rows per sec.)
C:\Users\lab>
Truncate table:
c) Load partitioned data extract back using bcp or bulk copy
bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part1.txt" -T -c
bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part2.txt" -T -c
bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part3.txt" -T -c
bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part4.txt" -T -c
verify the record count.
This should conclude our test scenario.
Duration: 2 secs
Output of 1 thread:
48749 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1250 Average : (38999.20 rows per sec.)
C:\Users\lab>
Foot notes:
select * from sys.dm_os_wait_stats where waiting_tasks_count !=0;
this can help fetch wait statistics aggregated across instance in sql server, running this few seconds apart will let us know what is happening majorly in SQL Server.
No comments:
Post a Comment