Objective: Ingest CSV data into an Apache Iceberg table using a Spark notebook
Steps:
1. Start the VM
2. Start the Docker containers:
cd /opt/de
[root@localhost de]# docker container start minio nessie notebook dremio
minio
nessie
notebook
dremio
[root@localhost de]# docker ps -a
CONTAINER ID   IMAGE                         COMMAND                  CREATED       STATUS                   PORTS                                                                                                                                                      NAMES
a2ff384ff8bd   projectnessie/nessie          "/usr/local/s2i/run"     5 days ago    Up 8 minutes             8080/tcp, 8443/tcp, 0.0.0.0:19120->19120/tcp, [::]:19120->19120/tcp                                                                                       nessie
5b600d98db32   alexmerced/spark33-notebook   "/bin/sh -c '~/.loca…"   5 days ago    Up 8 minutes             0.0.0.0:8888->8888/tcp, [::]:8888->8888/tcp                                                                                                                notebook
f458b4d5aa97   dremio/dremio-oss:latest      "bin/dremio start-fg"    5 days ago    Up 8 minutes             0.0.0.0:9047->9047/tcp, [::]:9047->9047/tcp, 0.0.0.0:31010->31010/tcp, [::]:31010->31010/tcp, 0.0.0.0:32010->32010/tcp, [::]:32010->32010/tcp, 45678/tcp   dremio
886545b2f25f   minio/minio                   "/usr/bin/docker-ent…"   5 days ago    Up 8 minutes             0.0.0.0:9000-9001->9000-9001/tcp, [::]:9000-9001->9000-9001/tcp                                                                                           minio
5ab3191a896b   hello-world                   "/hello"                 5 weeks ago   Exited (0) 5 weeks ago                                                                                                                                                              tender_montalcini
[root@localhost de]#
3. Log in to the Spark notebook in the browser
Enter the token printed in the notebook container's output (for example, run docker logs notebook and copy the token from the URL it prints):
http://127.0.0.1:8888/?token=22f81663131ec8ebf8d67b15a9a59d0748e875a4c717ba49
4. Observe the homepage; it contains two main folders:
a. notebook
b. sampledata
5. Create a new notebook inside the notebook folder and load the CSV content into an Iceberg table
*** notebook content ***
import pyspark
from pyspark.sql import SparkSession
import os
## DEFINE SENSITIVE VARIABLES
NESSIE_URI = os.environ.get("NESSIE_URI") ## Nessie Server URI
WAREHOUSE = os.environ.get("WAREHOUSE") ## BUCKET TO WRITE DATA TO
AWS_ACCESS_KEY = os.environ.get("AWS_ACCESS_KEY") ## AWS CREDENTIALS
AWS_SECRET_KEY = os.environ.get("AWS_SECRET_KEY") ## AWS CREDENTIALS
AWS_S3_ENDPOINT = os.environ.get("AWS_S3_ENDPOINT") ## MINIO ENDPOINT
print(AWS_S3_ENDPOINT)
print(NESSIE_URI)
print(WAREHOUSE)
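## Optional guard (an addition, not part of the original notebook): fail fast with a
## clear message if any of the environment variables above was not set on the container.
_required = {
    "NESSIE_URI": NESSIE_URI,
    "WAREHOUSE": WAREHOUSE,
    "AWS_ACCESS_KEY": AWS_ACCESS_KEY,
    "AWS_SECRET_KEY": AWS_SECRET_KEY,
    "AWS_S3_ENDPOINT": AWS_S3_ENDPOINT,
}
_missing = [name for name, value in _required.items() if not value]
if _missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(_missing)}")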
conf = (
pyspark.SparkConf()
.setAppName('app_name')
.set('spark.jars.packages', 'org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.3.1,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.67.0,software.amazon.awssdk:bundle:2.17.178,software.amazon.awssdk:url-connection-client:2.17.178')
.set('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions')
.set('spark.sql.catalog.nessie', 'org.apache.iceberg.spark.SparkCatalog')
.set('spark.sql.catalog.nessie.uri', NESSIE_URI)
.set('spark.sql.catalog.nessie.ref', 'main')
.set('spark.sql.catalog.nessie.authentication.type', 'NONE')
.set('spark.sql.catalog.nessie.catalog-impl', 'org.apache.iceberg.nessie.NessieCatalog')
.set('spark.sql.catalog.nessie.s3.endpoint', AWS_S3_ENDPOINT)
.set('spark.sql.catalog.nessie.warehouse', WAREHOUSE)
.set('spark.sql.catalog.nessie.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
.set('spark.hadoop.fs.s3a.access.key', AWS_ACCESS_KEY)
.set('spark.hadoop.fs.s3a.secret.key', AWS_SECRET_KEY)
)
## Start Spark Session
spark = SparkSession.builder.config(conf=conf).getOrCreate()
print("Spark Running")
### load the CSV into a temporary SQL view
csv_df=spark.read.format("csv").option("header", "true").load("../sampledata/Worker_Coops.csv")
csv_df.createOrReplaceTempView("csv_open_2025")
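## Optional check (an addition, not part of the original walkthrough): confirm the CSV
## parsed as expected before writing it to Iceberg. Note that without
## .option("inferSchema", "true") every column is read as a string.
csv_df.printSchema()
print(csv_df.count(), "rows read from Worker_Coops.csv")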
### create an Apache Iceberg table from the view (CTAS)
spark.sql("create table if not exists nessie.Worker_Coops_2025 using iceberg as select * from csv_open_2025;").show()
spark.sql("select * from nessie.Worker_Coops_2025 limit 10;").show()
>>> The table is created successfully and the records are fetched as well. Capture the output next time.
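As an optional follow-up (not captured in the original run), you can confirm the commit on the Nessie main branch by describing the table and querying its snapshots through Spark SQL; the nessie.Worker_Coops_2025.snapshots identifier below assumes Iceberg's standard metadata-table naming applied to the table created above.
spark.sql("describe table nessie.Worker_Coops_2025").show()
spark.sql("select snapshot_id, committed_at, operation from nessie.Worker_Coops_2025.snapshots").show(truncate=False)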