Basic data science with PowerShell
satya - 2/21/2020, 11:27:06 AM
My PowerShell coding journal is here
satya - 2/21/2020, 11:28:21 AM
Consider a JSON document like this
{
"success": true,
"data": [
{
"turbine": "TO02",
"tag_name": "ActivePower",
"tag_value": "1709.2",
"timestamp": "2020-01-30 00:00:04"
},
{
"turbine": "TO02",
"tag_name": "WindSpeed",
"tag_value": "8.82",
"timestamp": "2020-01-30 00:00:04"
},
...
]
}
satya - 2/21/2020, 11:29:58 AM
The same data as an object structure (pseudocode)
class datum
{
    String turbine;
    String tag_name;
    String tag_value;
    String timestamp;
}
class jsonData {
    bool success;
    //An array of datum objects
    datum[] data;
}
satya - 2/21/2020, 11:34:15 AM
I can do this in PowerShell
#Read a JSON file and convert it to an object
$s = Get-Content -Raw $jsonfile | ConvertFrom-Json
#Access the success field of the object
$successCode = $s.success
#Get the data array of objects
$data = $s.data
#Report how many objects there are in the data array
#(p is a custom print helper, not shown here)
p -message "There are $($data.Count) records"
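The same steps can be tried without a file by parsing an inline JSON string. This is only a minimal sketch: the two records are copied from the sample above, and the variable names $json and $values are illustrative. It also shows that tag_value arrives as a string and needs a cast before any arithmetic.

```powershell
# Two records copied from the sample JSON above
$json = @'
{
  "success": true,
  "data": [
    { "turbine": "TO02", "tag_name": "ActivePower", "tag_value": "1709.2", "timestamp": "2020-01-30 00:00:04" },
    { "turbine": "TO02", "tag_name": "WindSpeed",   "tag_value": "8.82",   "timestamp": "2020-01-30 00:00:04" }
  ]
}
'@

# Convert the JSON text to an object and pick out the data array
$s = $json | ConvertFrom-Json
$data = $s.data

# tag_value is a JSON string, so cast it before using it as a number
$values = @($data | ForEach-Object { [double]$_.tag_value })

"success: $($s.success), records: $($data.Count)"
```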
satya - 2/21/2020, 11:35:05 AM
Let's read the first 5 rows of that array
#Get first 5
$data | Select-Object -First 5
turbine tag_name    tag_value timestamp
------- --------    --------- ---------
TO02    ActivePower 1709.2    2020-01-30 00:00:04
TO02    WindSpeed   8.82      2020-01-30 00:00:04
TO02    ActivePower 1710.2    2020-01-30 00:00:09
TO02    WindSpeed   8.16      2020-01-30 00:00:09
TO02    WindSpeed   8.54      2020-01-30 00:00:18
satya - 2/21/2020, 11:36:14 AM
Get the unique time instances from the 20,000-row array
function getUniqueTimeInstances($data)
{
    #Object (turbine, tag_name, tag_value, timestamp)
    $timeArray = $data | Select-Object -Property timestamp -Unique
    return $timeArray
}
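Note that Select-Object -Property timestamp -Unique returns objects that each carry a single timestamp property, not plain strings. The sketch below contrasts the two forms on three rows copied from the output shown earlier; the variable names $timeObjects and $timeStrings are illustrative.

```powershell
# Three rows copied from the first rows shown earlier
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
)

# Objects with one property each: read the value with .timestamp
$timeObjects = @($data | Select-Object -Property timestamp -Unique)

# Plain strings: expand the property first, then de-duplicate
$timeStrings = @($data | Select-Object -ExpandProperty timestamp | Select-Object -Unique)

"unique objects: $($timeObjects.Count), unique strings: $($timeStrings.Count)"
```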
satya - 2/21/2020, 11:38:39 AM
Where clause: Get all rows where the turbine is TO02 and the tag name is ActivePower
function getNumberOfActivePowerInstancesForTurbine($turbine)
{
    #$data is not a parameter here; it is read from the enclosing scope
    $turbineArray = $data | Where-Object {$_.turbine -eq $turbine -and $_.tag_name -eq "ActivePower"}
    return $turbineArray
}
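Once the rows are filtered, they can be summarized with Measure-Object. The following is a rough sketch, not part of the original journal code: getActivePowerRows is a hypothetical variant of the function above that takes the data array as an explicit parameter, and the sample rows are copied from the output shown earlier.

```powershell
# Four rows copied from the first rows shown earlier
$data = @(
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1709.2'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.82';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'ActivePower'; tag_value = '1710.2'; timestamp = '2020-01-30 00:00:09' }
    [pscustomobject]@{ turbine = 'TO02'; tag_name = 'WindSpeed';   tag_value = '8.16';   timestamp = '2020-01-30 00:00:09' }
)

# Hypothetical helper: same filter as above, but the data array is a parameter
function getActivePowerRows($data, $turbine)
{
    return $data | Where-Object { $_.turbine -eq $turbine -and $_.tag_name -eq 'ActivePower' }
}

$rows = @(getActivePowerRows -data $data -turbine 'TO02')

# tag_value is a string, so cast to double before measuring
$stats = $rows | ForEach-Object { [double]$_.tag_value } | Measure-Object -Average -Maximum

"rows: $($rows.Count), average: $($stats.Average), max: $($stats.Maximum)"
```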
satya - 2/21/2020, 11:39:22 AM
Sort the data array by timestamp
$sortedData = $data | Sort-Object -Property "timestamp"
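The timestamps are strings in yyyy-MM-dd HH:mm:ss form, so sorting them as text happens to agree with chronological order. If real date arithmetic is needed, the strings can be parsed first. The sketch below assumes that format holds for every row; the rows are a subset of the sample output, and $format, $culture and $spanSeconds are illustrative names.

```powershell
# Rows taken from the sample output, deliberately out of order
$data = @(
    [pscustomobject]@{ tag_name = 'WindSpeed';   timestamp = '2020-01-30 00:00:18' }
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:09' }
)

# Assumed timestamp format, parsed independently of the machine's locale
$format = 'yyyy-MM-dd HH:mm:ss'
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Sort on the parsed DateTime instead of the raw string
$sortedData = @($data | Sort-Object -Property { [datetime]::ParseExact($_.timestamp, $format, $culture) })

# Seconds between the first and the last reading
$first = [datetime]::ParseExact($sortedData[0].timestamp, $format, $culture)
$last  = [datetime]::ParseExact($sortedData[-1].timestamp, $format, $culture)
$spanSeconds = ($last - $first).TotalSeconds
```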
satya - 2/21/2020, 11:41:02 AM
Here is how to walk through the dataset
function walkthrough($data)
{
    $sortedData = $data | Sort-Object -Property "timestamp"
    $c = 0
    foreach ($row in $sortedData) {
        #Stop after the first 466 rows
        if ($c -eq 466)
        {
            break
        }
        $c = $c + 1
        $time = $row.timestamp
        #Look up the record already stored for this time, if any
        $tid = getRecordForTime $time
        if ($null -eq $tid)
        {
            #record doesn't exist
            #p -message 'no record'
            addTimeInstance -time $time -row $row
        }
        else {
            #record exists
            #p -message "record exists"
            updateTimeInstance -time $time -tid $tid -row $row
        }
    }
}
Note: Not all functions are shown here
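For readers who want to run the loop, here is one possible sketch of the three helpers it calls. These are not the journal's actual implementations. The class shape is inferred from the column names of the table printed later, the hashtable name comes from printTIDArray, and only the two tag names visible in the sample (ActivePower and WindSpeed) are counted.

```powershell
# Assumed shape, inferred from the column names of the printed table
class TimeInstanceDetailData
{
    [string]$time
    [int]$apCount
    [int]$wsCount
    [int]$wdCount
    [int]$tCount
}

# Hashtable keyed by the timestamp string
$script:timeInstanceDataTable = @{}

function getRecordForTime($time)
{
    # Indexing a hashtable with a missing key returns $null
    return $script:timeInstanceDataTable[$time]
}

function updateTimeInstance($time, $tid, $row)
{
    # Only the tag names visible in the sample are mapped (an assumption)
    switch ($row.tag_name)
    {
        'ActivePower' { $tid.apCount++ }
        'WindSpeed'   { $tid.wsCount++ }
    }
}

function addTimeInstance($time, $row)
{
    # Create the record, store it, then count the row that triggered it
    $tid = [TimeInstanceDetailData]::new()
    $tid.time = $time
    $script:timeInstanceDataTable[$time] = $tid
    updateTimeInstance -time $time -tid $tid -row $row
}

# Usage: three rows copied from the sample output
$rows = @(
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ tag_name = 'WindSpeed';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:09' }
)
foreach ($row in $rows)
{
    $tid = getRecordForTime $row.timestamp
    if ($null -eq $tid) { addTimeInstance -time $row.timestamp -row $row }
    else { updateTimeInstance -time $row.timestamp -tid $tid -row $row }
}
```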
satya - 2/21/2020, 11:43:54 AM
Working with a few arrays
function printTIDArray()
{
    #Declare an array of objects
    [TimeInstanceDetailData[]]$tidArray = @()
    #Read the values from a hashtable and put them in the array
    foreach ($item in $script:timeInstanceDataTable.Values)
    {
        # p -message "item: $($item.time)"
        $tidArray += $item
    }
    #Print a header line
    hp -message "The TID array"
    #Sort the array
    $sortedTidArray = $tidArray | Sort-Object -Property "time"
    #Print the array in a tabular format
    $sortedTidArray | Format-Table
}
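One thing worth knowing about the loop above: += on an array builds a new array on every pass, which can get slow for large inputs. Two common alternatives are sketched below. The small hashtable $table merely stands in for $script:timeInstanceDataTable, with two time/apCount pairs copied from the table in the next entry.

```powershell
# Illustrative hashtable standing in for $script:timeInstanceDataTable
$table = @{
    '2020-01-30 00:00:09' = [pscustomobject]@{ time = '2020-01-30 00:00:09'; apCount = 6 }
    '2020-01-30 00:00:04' = [pscustomobject]@{ time = '2020-01-30 00:00:04'; apCount = 19 }
}

# Option 1: let the foreach statement emit the items and collect them in one step
$items = @(foreach ($item in $table.Values) { $item })

# Option 2: a generic list, which grows in place
$list = [System.Collections.Generic.List[object]]::new()
foreach ($item in $table.Values) { $list.Add($item) }

# Sort and print, as printTIDArray does
$sorted = @($list | Sort-Object -Property time)
$sorted | Format-Table
```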
satya - 2/21/2020, 11:44:44 AM
Here is an example of that array
time                apCount wsCount wdCount tCount
----                ------- ------- ------- ------
2020-01-30 00:00:04      19      19       2      0
2020-01-30 00:00:07       0       0       1      0
2020-01-30 00:00:08       0       1       0      0
2020-01-30 00:00:09       6       5       1      0
2020-01-30 00:00:12       0       0       1      0
2020-01-30 00:00:14      13      13       0      0
2020-01-30 00:00:18       0       1       0      0
2020-01-30 00:00:19      19      18       1      0
2020-01-30 00:00:24      19      19       2      0
2020-01-30 00:00:28       0       1       1      0
2020-01-30 00:00:29      19      18       1      0
2020-01-30 00:00:30       0       0       1      0
2020-01-30 00:00:33       0       0       1      0
2020-01-30 00:00:39      19      19       1      0
2020-01-30 00:00:42       0       0       1      0
2020-01-30 00:00:44      19      19       0      0
2020-01-30 00:00:49      19      19       2      0
2020-01-30 00:00:53       0       1       1      0
2020-01-30 00:00:54      18      17       1      1
2020-01-30 00:00:55       1       1       0      0
2020-01-30 00:00:56       0       0       1      1
2020-01-30 00:01:03       0       2       0      0
2020-01-30 00:01:04      19      17       1      0
2020-01-30 00:01:09      18      18       1      1
2020-01-30 00:01:10       1       1       0      1
2020-01-30 00:01:14      13      10       0      0
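Per-timestamp counts of this kind can also be produced directly with Group-Object, without a hand-maintained hashtable. This is an alternative sketch rather than the approach used in the journal. It counts only the two tag names visible in the sample data, and the five input rows are copied from the first rows shown earlier.

```powershell
# Five rows copied from the first rows shown earlier
$data = @(
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ tag_name = 'WindSpeed';   timestamp = '2020-01-30 00:00:04' }
    [pscustomobject]@{ tag_name = 'ActivePower'; timestamp = '2020-01-30 00:00:09' }
    [pscustomobject]@{ tag_name = 'WindSpeed';   timestamp = '2020-01-30 00:00:09' }
    [pscustomobject]@{ tag_name = 'WindSpeed';   timestamp = '2020-01-30 00:00:18' }
)

# One group per timestamp, then one count per tag name inside each group
$summary = @($data | Group-Object -Property timestamp | ForEach-Object {
    [pscustomobject]@{
        time    = $_.Name
        apCount = @($_.Group | Where-Object { $_.tag_name -eq 'ActivePower' }).Count
        wsCount = @($_.Group | Where-Object { $_.tag_name -eq 'WindSpeed' }).Count
    }
} | Sort-Object -Property time)

$summary | Format-Table
```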
satya - 2/21/2020, 11:45:06 AM
This is one way to analyze what's going on with a large set of input data