Thursday, November 6, 2025

MSSQL: Backup/Restore logically a large table



Objective: We perform three data-processing steps in this tutorial:

1. Download open-source data in pieces and join the pieces together using PowerShell and Python

2. Load this data into SQL Server

3. Use a logical partition function to process subsets of rows, and evaluate its performance against a one-shot process



Step 1: Download open-source data in pieces and join the pieces together using PowerShell and Python


Source of data: https://github.com/JeffSackmann/tennis_atp

Data downloaded: ATP tennis data


PowerShell code:


# Define source and output paths

$sourcePath = "C:\Users\lab\Documents\Lab\load_ATP_Data"          # Folder containing CSVs

$outputFile = "C:\Users\lab\Documents\Lab\ATP_Merged.csv"        # Output file


# Get all CSV files in the folder

$csvFiles = Get-ChildItem -Path $sourcePath -Filter *.csv


# Reset the output file if it exists

if (Test-Path $outputFile) { Remove-Item $outputFile }


# Track first file for header handling

$firstFile = $true


foreach ($file in $csvFiles) {

    if ($firstFile) {

        # Write header + content from first file

        Get-Content $file.FullName | Out-File -FilePath $outputFile -Encoding UTF8

        $firstFile = $false

    } else {

        # Skip header (first line) from subsequent files

        Get-Content $file.FullName | Select-Object -Skip 1 | Out-File -FilePath $outputFile -Append -Encoding UTF8

    }

}


Write-Host "CSV merge complete. Output saved to $outputFile"




Python Code:

import pandas as pd

import glob


# Define source and output paths; the r prefix marks a raw string, so backslashes are not treated as escape sequences

source_path = r"C:\Users\lab\Documents\Lab\load_ATP_Data\*.csv"

output_file = r"C:\Users\lab\Documents\Lab\ATP_Merged_py.csv"


# Get all CSV files

all_files = glob.glob(source_path)


# Read and concatenate all CSV files

df_list = [pd.read_csv(file) for file in all_files]

merged_df = pd.concat(df_list, ignore_index=True)


# Export to CSV

merged_df.to_csv(output_file, index=False)


print(f"CSV merge complete. Output saved to {output_file}")
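The pandas approach reads every file into memory at once. For inputs too big for that, the merge can be streamed with the standard csv module instead (a sketch; merge_csvs is a hypothetical helper, and the paths in the usage comment are the same ones assumed above):

```python
import csv
import glob

def merge_csvs(paths, output_file):
    """Merge CSV files that share a header, writing the header only once."""
    with open(output_file, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        header_written = False
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                header = next(reader)            # every file starts with a header
                if not header_written:
                    writer.writerow(header)      # keep the first file's header
                    header_written = True
                writer.writerows(reader)         # stream the remaining rows

# Usage, mirroring the pandas version above:
# merge_csvs(sorted(glob.glob(r"C:\Users\lab\Documents\Lab\load_ATP_Data\*.csv")),
#            r"C:\Users\lab\Documents\Lab\ATP_Merged_py.csv")
```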


Step 2) option 1: use SSMS to load the data


Right-click the database -> Tasks -> Import Flat File (not Import Data) -> create a new table -> Next -> validate the auto-detected columns -> load the data


Validate the loaded data by selecting top 1000 rows.




Step 2) option 2: load the data using BULK INSERT


CREATE TABLE DDL:


CREATE TABLE [dbo].[atp_matches_2M](

    [recid] int not null identity(1,1) constraint pk_recid1 primary key,

[tourney_id] [nvarchar](50) NULL,

[tourney_name] [nvarchar](50) NULL,

[surface] [nvarchar](50) NULL,

[draw_size] [tinyint] NULL,

[tourney_level] [nvarchar](50) NULL,

[tourney_date] [date] NULL,

[match_num] [smallint] NULL,

[winner_id] [int] NULL,

[winner_seed] [tinyint] NULL,

[winner_entry] [nvarchar](50) NULL,

[winner_name] [nvarchar](50) NULL,

[winner_hand] [nvarchar](50) NULL,

[winner_ht] [tinyint] NULL,

[winner_ioc] [nvarchar](50) NULL,

[winner_age] [float] NULL,

[loser_id] [int] NULL,

[loser_seed] [tinyint] NULL,

[loser_entry] [nvarchar](50) NULL,

[loser_name] [nvarchar](50) NULL,

[loser_hand] [nvarchar](50) NULL,

[loser_ht] [tinyint] NULL,

[loser_ioc] [nvarchar](50) NULL,

[loser_age] [float] NULL,

[score] [nvarchar](50) NULL,

[best_of] [tinyint] NULL,

[round] [nvarchar](50) NULL,

[minutes] [nvarchar](5) NULL,

[w_ace] [nvarchar](5) NULL,

[w_df] [nvarchar](5) NULL,

[w_svpt] [nvarchar](5) NULL,

[w_1stIn] [nvarchar](5) NULL,

[w_1stWon] [nvarchar](5) NULL,

[w_2ndWon] [nvarchar](5) NULL,

[w_SvGms] [nvarchar](5) NULL,

[w_bpSaved] [nvarchar](5) NULL,

[w_bpFaced] [nvarchar](5) NULL,

[l_ace] [nvarchar](5) NULL,

[l_df] [nvarchar](5) NULL,

[l_svpt] [nvarchar](5) NULL,

[l_1stIn] [nvarchar](5) NULL,

[l_1stWon] [nvarchar](5) NULL,

[l_2ndWon] [nvarchar](5) NULL,

[l_SvGms] [nvarchar](5) NULL,

[l_bpSaved] [nvarchar](5) NULL,

[l_bpFaced] [nvarchar](5) NULL,

[winner_rank] [nvarchar](5) NULL,

[winner_rank_points] [nvarchar](5) NULL,

[loser_rank] [nvarchar](5) NULL,

[loser_rank_points] [nvarchar](5) NULL

) ON [PRIMARY]

GO






Load data: Same file as above with an additional column to accommodate the recid primary key.


BULK INSERT [dbo].[atp_matches_2M]

FROM 'C:\Users\lab\Documents\Lab\ATP_Merged_bulkcopy.csv'

WITH (FIELDTERMINATOR = ',', FIRSTROW = 2);
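One caveat: BULK INSERT with a bare comma FIELDTERMINATOR splits on every comma, so a quoted field that itself contains a comma shifts the columns of that row. A pre-check that mimics this naive splitting can flag suspect lines before loading (a sketch; check_field_counts is a hypothetical helper, and the path comes from the command above):

```python
def check_field_counts(path, delimiter=","):
    """Mimic BULK INSERT's naive splitting: count delimiter-separated fields
    per line and report lines whose count differs from the header's."""
    bad = []
    with open(path, encoding="utf-8") as f:
        expected = f.readline().rstrip("\r\n").count(delimiter) + 1
        for lineno, line in enumerate(f, start=2):
            if not line.strip():
                continue                     # ignore trailing blank lines
            fields = line.rstrip("\r\n").count(delimiter) + 1
            if fields != expected:
                bad.append((lineno, fields))
    return expected, bad

# expected, bad = check_field_counts(r"C:\Users\lab\Documents\Lab\ATP_Merged_bulkcopy.csv")
# bad should be empty before running the BULK INSERT
```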




Step 3) Use logical partition function to process a subset of rows; evaluate its performance.


Before we try partitioning, let us first extract the data as-is using SQL Server's built-in options and measure the time taken for each.


a) as a BACPAC file (Export Data-tier Application)


output: C:\Users\lab\Documents\Lab\load_ATP_Data\backup\04nov2025_export_atp_matches_2M.bacpac

Actual: 

C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.bacpac


Export Data-tier Application 'tsqltestdb'


Time taken: 10s


b) as an export to a flat file (Export Data)


output: C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.csv




c) using the bcp command


bcp tsqltestdb.dbo.atp_matches_2M out "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -S server_name -T -c


actual:

echo %time%

bcp tsqltestdb.dbo.ATP_DATA_5NOV25_BC out "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -T -c

echo %time%


Output:


194996 rows copied.

Network packet size (bytes): 4096

Clock Time (ms.) Total     : 4219   Average : (46218.54 rows per sec.)


Partition the table logically using the NTILE function (step d below). Avg: 2 secs





d) actual partitioning


Partition the data using ntile function (https://learn.microsoft.com/en-us/sql/t-sql/functions/ntile-transact-sql?view=sql-server-ver17)


with t as

(

select recid

,ntile(4) over (order by recid asc) as partid

from tsqltestdb.dbo.ATP_DATA_5NOV25_BC

)

select partid

,min(recid)

,max(recid)

from t

group by partid

order by 1;
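NTILE(n) assigns ceil(count/n) rows to each of the first count mod n buckets and floor(count/n) to the rest. Assuming recid is gap-free from 1 (true here, since it is a freshly populated identity column), the boundary values can be reproduced outside SQL with a few lines (a sketch; ntile_ranges is a hypothetical helper):

```python
def ntile_ranges(row_count, buckets, first_id=1):
    """Reproduce NTILE bucketing over a gap-free id sequence:
    the first (row_count % buckets) buckets get one extra row."""
    base, extra = divmod(row_count, buckets)
    ranges, start = [], first_id
    for b in range(buckets):
        size = base + (1 if b < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges

# 194,996 rows split four ways reproduces the min/max recids from the query above:
# ntile_ranges(194996, 4)
# → [(1, 48749), (48750, 97498), (97499, 146247), (146248, 194996)]
```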



Ex.:

bcp "SELECT column1, column2 FROM database_name.schema_name.table_name WHERE condition" queryout "C:\path\to\output.txt" -S server_name -T -c


Actual: server name omitted, since bcp connects to the local default instance when -S is not given


bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 1 and 48749" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part1.txt" -T -c

bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 48750 and 97498" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part2.txt" -T -c

bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 97499 and 146247" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part3.txt" -T -c

bcp "SELECT * FROM tsqltestdb.dbo.ATP_DATA_5NOV25_BC WHERE recid between 146248 and 194996" queryout "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part4.txt" -T -c
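The 2 s overall (versus roughly 4 s for the single-shot bcp in option c) suggests the four exports ran concurrently. One way to drive them in parallel is from Python (a sketch under the assumptions above: bcp on PATH, a trusted connection to the local instance, and the same table, ranges, and output paths; bcp_command and export_parallel are hypothetical helpers):

```python
import subprocess

TABLE = "tsqltestdb.dbo.ATP_DATA_5NOV25_BC"
OUT = r"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part{}.txt"
RANGES = [(1, 48749), (48750, 97498), (97499, 146247), (146248, 194996)]

def bcp_command(part, lo, hi):
    """Build one bcp queryout command for a recid range."""
    query = f"SELECT * FROM {TABLE} WHERE recid BETWEEN {lo} AND {hi}"
    return ["bcp", query, "queryout", OUT.format(part), "-T", "-c"]

def export_parallel():
    """Launch all four exports at once and wait; returns the exit codes."""
    procs = [subprocess.Popen(bcp_command(i + 1, lo, hi))
             for i, (lo, hi) in enumerate(RANGES)]
    return [p.wait() for p in procs]
```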


Time overall: 2s


ex. output:


48749 rows copied.

Network packet size (bytes): 4096

Clock Time (ms.) Total     : 1875   Average : (25999.47 rows per sec.)




Step 4) Can we load the extracted data back?


a) Load bacpac:

Download sqlpackage.exe app and install it


url: https://aka.ms/dacfx-msi


If you're planning to restore the database and schema along with the table you extracted, this method is fine.


# Import full BACPAC to temp database

"C:\Program Files\Microsoft SQL Server\170\DAC\bin\"SqlPackage.exe /Action:Import /sf:"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25.bacpac" /tsn:"192.168.1.100" /tdn:"TempDB_Import" /tec:"False"


Actual:

"C:\Program Files\Microsoft SQL Server\170\DAC\bin\"SqlPackage.exe /Action:Import /sf:"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.bacpac" /tsn:"192.168.1.100" /tdn:"TempDB5nov_Import" /tec:"False"





Query the data to see if the data is loaded properly.


>> It worked just fine: all records of the one table we backed up were loaded, and the DDL of the other objects was created as well.


But since we only need one table, restoring the whole database doesn't benefit us, so use one of the following options instead...


b) load the data alone from the BACPAC (a .bacpac is a zip archive; extract it to reach the Data\*.bcp files)

bcp TempDB_Import.dbo.atp_matches_2M_loadback in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25\Data\dbo.atp_matches_2M\TableData-002-00000.bcp" -T -n


or


BULK INSERT [TempDB5nov_Import].[dbo].[atp_matches_2M]

FROM 'C:\Users\lab\Documents\Lab\load_ATP_Data\backup\atp_04nov25\Data\dbo.atp_matches_2M\TableData-041-00000.bcp'

WITH (DATAFILETYPE = 'native');



actual:


BULK INSERT TempDB_Import.dbo.atp_matches_2M_loadback

FROM 'C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M\Data\dbo.ATP_DATA_5NOV25_BC\TableData-000-00000'

WITH (DATAFILETYPE = 'native');


verify the record count.


>> all ok



Only one sample thread was tested above.



c) Load the bcp export back


bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M.txt" -T -c


verify the record count.


Output:


194996 rows copied.

Network packet size (bytes): 4096

Clock Time (ms.) Total     : 4297   Average : (45379.57 rows per sec.)




Truncate the table (to reset it before the next load test):





d) Load the partitioned data extracts back using bcp or BULK INSERT


bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part1.txt" -T -c

bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part2.txt" -T -c

bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part3.txt" -T -c

bcp [TempDB5nov_Import].[dbo].[atp_matches_2M] in "C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part4.txt" -T -c


verify the record count.
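A quick server-free cross-check: since bcp -c writes one newline-terminated line per row, the four part files together should still account for all 194,996 rows before comparing against SELECT COUNT(*) on the target table (a sketch; total_rows is a hypothetical helper, and the paths are the part files above):

```python
def total_rows(paths):
    """Sum newline-terminated rows across bcp -c output files."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                total += chunk.count(b"\n")
    return total

# parts = [rf"C:\Users\lab\Documents\Lab\load_ATP_Data\backup\05nov2025_export_atp_matches_2M_part{i}.txt"
#          for i in range(1, 5)]
# total_rows(parts) should equal 194996
```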


This should conclude our test scenario.


Duration: 2 secs


Output of 1 thread:

48749 rows copied.

Network packet size (bytes): 4096

Clock Time (ms.) Total     : 1250   Average : (38999.20 rows per sec.)





Footnotes:


select * from sys.dm_os_wait_stats where waiting_tasks_count !=0;


This fetches wait statistics aggregated across the SQL Server instance; running it a few seconds apart and comparing snapshots shows which waits dominate.

