It is not uncommon to arrive where we have found ourselves to be. The main thing is to understand this week is that no matter how well you design a solution or start to extend it within your Continual Improvement cycles. In this case, a lapse led to discovering a more significant issue that was not immediately visible with the way the HTML extraction was running.
Taking apart the issue that we identified within the dataflows – the HTML Table extraction not delivering a header was relatively simple. The problem it caused was that the tool could not process future events. While it seems trivial, after all, future events will be blank. The benefit is that being able to process into the future means that you can “stage” your data, so only review and make updates once a month or so (five weeks within each dataflow).
The lessons to take from today are relatively simple; however, they are vital. While we are still looking to understand why there is a difference, it is essential to remember that when reading an HTML source, you should use a “Table” version and be aware of steps that follow on, “Promote Headers” can cause issues in later stages.
- Name everything
Power Query creates new columns normally giving you the option of going back into the script and naming them. Take this opportunity whenever it is available - Use the Applied Steps properties
Renaming steps makes it easioer to go back and debug code, the use of comments (descriptions) adds to this.
Name Everything
Make sure you consider what you are doing and avoid bringing columns into your model without a proper name. Our errors showed an issue with “Column7”. This meant that we had to go through all the steps to identify the source of the problem.
Use the Applied Steps properties.
Within the applied steps, you should take time to consider how you can make your future self (or your replacements) lives more manageable. Name steps in line with what they are doing and the function they are performing, not just the basics of what is being carried out. Where appropriate, you can use the comments to provide more details of why a step is being carried out.
let
Eventnumber = "36",
Source = Web.Contents("https://www.parkrun.org.uk/jesmonddene/results/weeklyresults/?runSeqNumber=" & Event),
#"Extracted Table From Html" = Html.Table(Source, {
{"Link", "a", each [Attributes][href]},
{"Name", ".Results-table-td--name .compact *"},
{"Gender", ".Results-table-td--gender .compact"},
{"GP_Full", ".Results-table-td--gender .detailed"},
{"Category", ".Results-table-td--ageGroup A"},
{"Age Grade", ".Results-table-td--ageGroup .detailed"},
{"FinishTime", ".Results-table-td--time .compact"},
{"PB", ".Results-table-td--time SPAN"},
{"Club", ".Results-table-club A"},
{"r10", ".milestone-r10"},
{"r25", ".milestone-r25"},
{"r50", ".milestone-r50"},
{"r100", ".milestone-r100"},
{"r250", ".milestone-r250"},
{"r500", ".milestone-r500"},
{"v25", ".milestone-v25"},
{"v50", ".milestone-v50"},
{"v100", ".milestone-v100"},
{"v250", ".milestone-v250"},
{"v500", ".milestone-v500"}},
[RowSelector=".Results-table-td--name .compact *"]),
// The URL will need to be rebuilt if a "Go to participant page" is ever to be added?
#"Extract the ParkrunID from link URL" = Table.SplitColumn(Table.TransformColumnTypes(#"Extracted Table From Html", {{"Link", type text}}), "Link", Splitter.SplitTextByEachDelimiter({"/"}, QuoteStyle.Csv, true), {"Link.1", "parkrunID"}),
#"Added Finish Positions" = Table.AddIndexColumn(#"Extract the ParkrunID from link URL", "FinishPosition", 1, 1, Int64.Type),
#"Remove extra text to convert Age Grade to a Perventage" = Table.ReplaceValue(#"Added Finish Positions", " age grade", "", Replacer.ReplaceText, {"Age Grade"}),
#"Added custom" = Table.TransformColumnTypes(Table.AddColumn(#"Remove extra text to convert Age Grade to a Perventage", "Time", each if Text.Length([FinishTime]) =5 then "00:" & [FinishTime] else [FinishTime]), {{"Time", type time}}),
// Link.1
// FinishTIme
#"Remove original columns that are not needed" = Table.RemoveColumns(#"Added custom", {"Link.1", "FinishTime"}),
// Age Grade - Percentage
// ParkrunID - Whole Number
#"Changed column types" = Table.TransformColumnTypes(#"Remove original columns that are not needed", {{"Age Grade", Percentage.Type}, {"parkrunID", Int64.Type}}),
#"Added EventNumber" = Table.TransformColumnTypes(Table.AddColumn(#"Changed column types", "EventNumber", each Eventnumber), {{"EventNumber", Int64.Type}})
in
#"Added EventNumber"