For a few weeks now, I've been participating in Kaggle data science competitions. First I entered the Titanic survival competition, and
now I'm trying to predict sales for the German Rossmann stores. I wrote a Jupyter notebook with some basic exploration. You can view the notebook on GitHub or get an even nicer view on NBviewer.
Enjoy,
Lode
Cast for two
Monday, November 16, 2015
Kaggle competitions
Posted by
cast42
at
9:06 PM
1 comments
Wednesday, July 30, 2014
Calculating Heart Rate zone based on Lactate Threshold Heart Rate
I already investigated the issue of setting heart rate zones in the blog post "Setting Strava heart rate zones based on Lactate threshold heart rate (LTHR)". The article "HOW TO FIND YOUR HEART RATE ZONES" defines all zones, along with a protocol to determine your lactate threshold yourself:
- Z1 < 81% of LTHR
- Z2 81% - 89% of LTHR
- Z3 90% - 93% of LTHR
- Z4 94%-99% of LTHR
- Z5a 100%-102% of LTHR
- Z5b 103%-106% of LTHR
- Z5c > 106% of LTHR
- Z1 Endurance: < 141 bpm
- Z2 Moderate: 142 - 156 bpm
- Z3 Tempo: 157 - 163 bpm
- Z4 Threshold: 164 - 173 bpm
- Z5a: 175 - 179 bpm
- Z5b: 180 - 185 bpm
- Z5c: 186 - 187 bpm
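The percentage bands above can be turned into a small lookup. The sketch below is only an illustration: the function name `zone_for` is made up, the LTHR of 175 bpm used in the example is inferred from the bpm list (it is not stated in the post), and the gaps between bands (for example between 89% and 90%) are closed by using the start of the next band as the cut-off. Because of rounding, the boundaries can differ by a beat from the bpm list.

```python
# Exclusive upper bounds in % of LTHR, taken from the percentage list above.
# Assumption: a zone ends where the next one starts (closes the 89%-90% style gaps).
ZONE_LIMITS = [
    ("Z1", 81),
    ("Z2", 90),
    ("Z3", 94),
    ("Z4", 100),
    ("Z5a", 103),
]


def zone_for(heart_rate, lthr):
    """Return the zone name for a heart rate (bpm), given the LTHR (bpm)."""
    percent = 100.0 * heart_rate / lthr
    for name, upper in ZONE_LIMITS:
        if percent < upper:
            return name
    # Z5b runs up to and including 106%; everything above is Z5c.
    return "Z5b" if percent <= 106 else "Z5c"


# Example with an assumed LTHR of 175 bpm.
for bpm in (140, 150, 160, 170, 175, 183, 187):
    print(bpm, zone_for(bpm, 175))
```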
Posted by
cast42
at
11:19 AM
0
comments
Sunday, June 22, 2014
Parsing Strava GPX file with python minidom
GPX is increasingly becoming the lingua franca for storing cycling rides. Basically it is a list of points, called track points, with satellite positions (latitude and longitude) decorated with extra information (elevation, heart rate, cadence, power or temperature). A track (<trk>) has one or more segments (<trkseg>), each with a list of track points (<trkpt>):
<trk>
  <name>Track Name</name>
  <trkseg>
    <trkpt lat="50.653007542714477" lon="5.558940963819623" />
    Lots of track point...
    <trkpt lat="50.653007542714477" lon="5.558940963819623" />
  </trkseg>
</trk>
Sometimes elevation is also included for every track point:
<trk>
  <name>Track Name</name>
  <trkseg>
    <trkpt lat="50.653007542714477" lon="5.558940963819623" > <ele>60.0</ele> </trkpt>
    Lots of track point...
    <trkpt lat="50.653007542714477" lon="5.558940963819623" > <ele>60.0</ele> </trkpt>
  </trkseg>
</trk>
Those GPX files can be used as tracks that you can follow. You can copy those files to the New File directory on your Garmin and select them to follow. If <ele> … </ele> elements are included, you will see how the height evolves in front of you.
If you download the GPX file from a ride you rode with a bike computer that registers cadence and heartrate, the file will look like this: You can see the whole file here: https://gist.github.com/cast42/727f48a0358fa67e60fb
The elevation element is part of the standard GPX specification as defined by the XML schema provided here: http://www.topografix.com/gpx.asp The heart rate and cadence are stored using track point extensions as specified here: http://www8.garmin.com/xmlschemas/GpxExtensionsv3.xsd
Here is a simple Python example to parse a Strava GPX file with extensions using minidom: It is required that cadence and heartrate are added to every trackpoint but temperature is optional.
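The original listing is not reproduced on this page. As a stand-in, here is a minimal sketch of how such a parser could look with minidom. It is not the original script: the element names `gpxtpx:hr`, `gpxtpx:cad` and `gpxtpx:atemp` are an assumption based on the Garmin track point extension, the sample data is invented, and `parse_gpx` and `text_of` are hypothetical helper names.

```python
from xml.dom import minidom

# Invented two-point sample; the gpxtpx element names are an assumption.
SAMPLE_GPX = """<gpx version="1.1" creator="example"
     xmlns="http://www.topografix.com/GPX/1/1"
     xmlns:gpxtpx="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">
  <trk>
    <name>Track Name</name>
    <trkseg>
      <trkpt lat="50.653007542714477" lon="5.558940963819623">
        <ele>60.0</ele>
        <extensions>
          <gpxtpx:TrackPointExtension>
            <gpxtpx:atemp>21</gpxtpx:atemp>
            <gpxtpx:hr>139</gpxtpx:hr>
            <gpxtpx:cad>92</gpxtpx:cad>
          </gpxtpx:TrackPointExtension>
        </extensions>
      </trkpt>
      <trkpt lat="50.653107542714477" lon="5.558840963819623">
        <ele>60.5</ele>
        <extensions>
          <gpxtpx:TrackPointExtension>
            <gpxtpx:hr>141</gpxtpx:hr>
            <gpxtpx:cad>90</gpxtpx:cad>
          </gpxtpx:TrackPointExtension>
        </extensions>
      </trkpt>
    </trkseg>
  </trk>
</gpx>"""


def text_of(parent, tag):
    """Return the text of the first <tag> below parent, or None if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data.strip()


def parse_gpx(xml_text):
    """Return one dict per track point; hr and cad are required, temp is optional."""
    doc = minidom.parseString(xml_text)
    points = []
    for trkpt in doc.getElementsByTagName("trkpt"):
        temp = text_of(trkpt, "gpxtpx:atemp")
        points.append({
            "lat": float(trkpt.getAttribute("lat")),
            "lon": float(trkpt.getAttribute("lon")),
            "ele": float(text_of(trkpt, "ele")),
            "hr": int(text_of(trkpt, "gpxtpx:hr")),     # fails if missing
            "cad": int(text_of(trkpt, "gpxtpx:cad")),   # fails if missing
            "temp": None if temp is None else float(temp),
        })
    return points


for point in parse_gpx(SAMPLE_GPX):
    print(point)
```

Looking elements up by their prefixed tag name keeps the sketch short, but it only works when the file uses the `gpxtpx` prefix; a more robust version would match on the namespace URI.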
Posted by
cast42
at
10:08 PM
3
comments
Monday, May 19, 2014
Setting Strava heart rate zones based on Lactate threshold heart rate
- Z1 Recovery 65% - 81%
- Z2 Aerobic 82% -88%
- Z3 Tempo 89% - 93%
- Z4 Subthreshold 94% - 100%
- Z5 Superthreshold 100% - 102%
- Z6 Anaerobic > 102%
- Active recovery < 68% of FTHR
- Endurance 69% - 83% of FTHR
- Tempo 84% - 94% of FTHR
- Lactate threshold 95%-105% of FTHR
- VO2 Max >106% of FTHR
- Anaerobic Capacity N/A
- Neuromuscular N/A
- Endurance : < 68% of LTHR
- Moderate: 69% - 83% of LTHR
- Tempo: 84% - 94% of LTHR
- Threshold: 95%-105% of LTHR
- Anaerobic: > 106% of LTHR
Posted by
cast42
at
1:22 PM
2
comments
Saturday, March 22, 2014
Comparing power models for cycling
On January 23, 1984, Francesco Moser set a new Hour Record of 51.151 km/h at altitude in Mexico City. What probably contributed to that success was the theory developed by Prof. Conconi. He theorized that heart rate could be correlated with perceived exertion, allowing Moser to cycle at the absolute maximum of his capability. Ever since, training with a heart rate monitor has become more and more popular.
Training with a heart rate monitor has its limitations. Suppose that last week you rode your favorite time trial lap for 20 minutes at an average heart rate of 155 beats per minute (bpm). This week you did the same test, riding the same distance, but your heart rate was 5 beats higher on average. Does that mean that your fitness declined? The answer is that you can't be sure. Maybe you had more headwind and had to push harder to ride at the same speed. Maybe you ate something before the test that still had to be digested. Maybe you didn't sleep well. Maybe you were stressed. Heart rate is only an indication of what is going on in the black box of your body, and it is influenced by many external factors that can't be controlled during a ride.
Enter the power meter. It measures the power you are producing at every instant. With a headwind, you'll ride slower but the power readings will be higher. This lets you assess whether your body is performing better without guessing. Power meters give objective values about your performance, so they can be trusted more to evaluate your training efforts and they allow you to train more scientifically.
A central curve for gauging your performance is the power duration curve: how much power can you generate, and for how long? A typical power curve looks like this:
The horizontal axis is the duration in seconds (usually on a logarithmic scale) and the vertical axis is the delivered power in watts. The maximum power one can generate for 1 second during a sprint is called the Peak Power. The power one can deliver for one hour is called the Functional Threshold Power (FTP). The limit of the power one can generate forever is called the Critical Power (CP).
But how do we obtain such a curve? If you record your rides with a bike computer, software can derive it from the measured values. Since the curves of different riders have similar shapes, researchers started to believe that a model could predict the curve from a few measurements. For example, record how much power you can deliver for 1 minute, 3 minutes and 20 minutes, and the whole curve can be reconstructed. The paper "Rationale and resources for teaching the mathematical modeling of athletic training and performance" by Clarke DC and Skiba PF contains a good overview of the state of the art. The formula for power as a function of duration is:
$P = AWC (1/t) + CP$ where AWC is the Anaerobic Work Capacity in joules, $t$ is the duration in seconds and CP is the Critical Power in watts. The AWC (called W' nowadays) represents the finite amount of energy that is available above the critical power. CP is the power that can be sustained without fatigue for a very long time (longer than 10 hours). See also the paper by Charles Dauwe, "Critical Power and Anaerobic Capacity of Grand Cycling Tour Winners". Recently, @acoggan, @veloclinic and @djconnel have been working on more sophisticated models.
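Because the critical power formula is linear in $1/t$, AWC and CP can be estimated from a handful of best efforts with an ordinary least-squares line fit. The sketch below is one way to do that; it is not the fitting code used for this post, the function names are made up, and the three efforts are synthetic values generated from an assumed AWC of 20000 J and CP of 250 W.

```python
def critical_power_model(t, awc, cp):
    """Power (W) that can be held for t seconds: P = AWC / t + CP."""
    return awc / t + cp


def fit_critical_power(durations, powers):
    """Least-squares fit of P against x = 1/t; slope is AWC, intercept is CP."""
    xs = [1.0 / t for t in durations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(powers) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, powers))
    awc = sxy / sxx
    cp = mean_y - awc * mean_x
    return awc, cp


# Synthetic efforts of 1, 3 and 20 minutes (assumed values, not measured data).
durations = [60.0, 180.0, 1200.0]
powers = [critical_power_model(t, 20000.0, 250.0) for t in durations]

awc, cp = fit_critical_power(durations, powers)
print("AWC = %.1f J, CP = %.1f W" % (awc, cp))
```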
@veloclinic proposed in "Cycling Research Study Pre Plan":
$P(t) = \frac{W'_1}{t+\tau_1} + \frac{W'_2}{t+\tau_2}$
and since $W' = P \times \tau $
$P(t) = \frac{P_1 \tau_1}{t+\tau_1} + \frac{P_2 \tau_2}{t+\tau_2}$
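As a quick sanity check of the two-component formula, it can be written as a small function. The parameter values in the example are invented for illustration and are not the fitted values from this post; the function name is also made up.

```python
def veloclinic_power(t, p1, tau1, p2, tau2):
    """Two-component model: P(t) = P1*tau1/(t+tau1) + P2*tau2/(t+tau2)."""
    return p1 * tau1 / (t + tau1) + p2 * tau2 / (t + tau2)


# Invented parameters: a slow component (300 W, 3000 s) and a fast one (800 W, 20 s).
params = (300.0, 3000.0, 800.0, 20.0)

for seconds in (0, 5, 60, 300, 1200, 3600):
    print(seconds, round(veloclinic_power(seconds, *params), 1))
```

At $t = 0$ the model returns $P_1 + P_2$, and each term has dropped to half of its starting value when $t$ equals its own $\tau$.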
@veloclinic guesses that the new TrainingPeaks model to be included in WKO4 is:
$P(t) = \frac{FRC}{t} (1-e^{-\frac{t}{\frac{FRC}{P_{max}-FTP}}}) + FTP + \alpha (t-3600)$
For more information about WKO's new model, watch the youtube video.
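The guessed WKO4 formula above is easy to evaluate directly. The sketch below plugs in the FRC, $P_{max}$, FTP and $\alpha$ estimates reported further down in this post; the function name is made up, and the formula remains @veloclinic's guess rather than a confirmed model.

```python
import math


def wko4_guess_power(t, frc, p_max, ftp, alpha):
    """Guessed model: FRC/t * (1 - exp(-t/tau)) + FTP + alpha*(t - 3600),
    with tau = FRC / (Pmax - FTP). Requires t > 0."""
    tau = frc / (p_max - ftp)
    return frc / t * (1.0 - math.exp(-t / tau)) + ftp + alpha * (t - 3600.0)


# Estimates reported in this post for the WKO4 model.
FRC = 20463.525586
P_MAX = 1011.29658965
FTP = 257.543501763
ALPHA = -0.00512616028401

for seconds in (1, 60, 300, 1200, 3600):
    print(seconds, round(wko4_guess_power(seconds, FRC, P_MAX, FTP, ALPHA), 1))
```

At one hour the $\alpha$ term vanishes, so the model gives FTP plus the small remaining contribution of FRC spread over 3600 seconds.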
And Dan Connelly arrived at:
$P(t) = P_1 \frac{\tau_1}{t}( 1 - e^{-\frac{t}{\tau_1}} ) + \frac{P_2}{(1 + \frac{t}{\tau_2})^ {\alpha_2}}$
Fitting those models to my data gives the following result (source code):
The estimated veloclinic parameters and their 95% confidence interval are:
- $P_1$ = 291.910642398 Watt [252.385602829 331.435681968]
- $\tau_1$ = 28.6068269179 seconds [17.5113005392 39.7023532965]
- $P_2$ = 800.776804574 Watt [726.249462145 875.304147003]
- $\tau_2$ = 0.312268982434 seconds [-0.16239984039 0.786937805257]
The estimated WKO4 parameters and their 95% confidence interval are:
- FRC = 20463.525586 Joule [13518.0176693 27409.0335028]
- $P_{max}$ = 1011.29658965 Watt [937.344413201 1085.2487661]
- FTP = 257.543501763 Watt [226.388909604 288.698093922]
- $\alpha$ = -0.00512616028401 [-0.00777859140802 -0.00247372916]
The estimated djconnel parameters and their 95% confidence interval are:
- $P_1$ = 730.972437292 Watt, [638.464114912 823.480759673]
- $\tau_1$ = 19.9194404736 seconds [9.71207387339 30.1268070738]
- $P_2$ = 324.66235674 Watt [246.999942142 402.324771339]
- $\tau_2$ = 0.312268982434 seconds [-0.16239984039 0.786937805257]
- $\alpha_2$ = 0.312268982434 [-0.16239984039 0.786937805257]
Update: Apparently, the formula at the end of Dan Connelly's blog post, which I used in this post, was not correct and must be:
$P(t) = P_1 \frac{\tau_1}{t}( 1 - e^{\frac{-t}{\tau_1}} ) + \frac{P_2}{( 1 + \frac{t}{\alpha_2\tau_2} )^{\alpha_2}}$
Note the extra $\alpha_2$ in the denominator under $t$ in the last term of the formula.
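To make the role of the extra $\alpha_2$ concrete, here is a minimal sketch of the corrected formula as a function. The function name is made up; the parameter values are the estimates reported for the updated model in this post.

```python
import math


def djconnel_power(t, p1, tau1, p2, tau2, alpha2):
    """Corrected model: P1*(tau1/t)*(1 - exp(-t/tau1)) + P2/(1 + t/(alpha2*tau2))**alpha2.
    Requires t > 0."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    fast = p1 * (tau1 / t) * -math.expm1(-t / tau1)
    slow = p2 / (1.0 + t / (alpha2 * tau2)) ** alpha2
    return fast + slow


# Estimates reported in this post for the updated model.
P1, TAU1 = 730.972437292, 19.9194404736
P2, TAU2, ALPHA2 = 324.66235674, 8793.18635004, 0.312268982434

for seconds in (1, 60, 300, 1200, 3600):
    print(seconds, round(djconnel_power(seconds, P1, TAU1, P2, TAU2, ALPHA2), 1))
```

For very short durations the result approaches $P_1 + P_2$; for long durations the first term fades and the second term decays as a power law governed by $\alpha_2$.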
Oddly, the confidence intervals behave strangely; I have to look into it. The estimated djconnel parameters of the updated model and their 95% confidence interval are:
- $P_1$ = 730.972437292 Watt, [638.464114912 823.480759673]
- $\tau_1$ = 19.9194404736 seconds [9.71207387339 30.1268070738]
- $P_2$ = 324.66235674 Watt [246.999942142 402.324771339]
- $\tau_2$ = 8793.18635004 seconds [-12682.9310423 30269.3037424]
- $\alpha_2$ = 0.312268982434 [-0.16239984039 0.786937805257]
The parameters of this simplified model are:
- $P_1$ = 753.75304433 Watt [676.513606926 830.992481733]
- $\tau_1$ = 27.1488626906 seconds [17.4753205281 36.822404853]
- $P_2$ = 275.99764045 Watt [238.468417448 313.526863451]
- $\tau_2$ = 53841.0314734 seconds [30850.1607097 76831.9022372]
- $P_1$ = 743.251464989 Watt [666.223087684 820.279842294]
- $\tau_1$ = 23.648384644 seconds [14.5239744624 32.7727948257]
- $P_2$ = 297.690593484 Watt [253.858306164 341.522880804]
- $\tau_2$ = 24294.3750137 seconds [8540.40600978 40048.3440176]
Posted by
cast42
at
1:56 PM
2
comments
Labels: cycling
Friday, January 31, 2014
Adding virtual power to TCX for the Tacx Blue Motion Cycling Trainer
I recently bought a cycle trainer for indoor training: a Tacx Blue Motion T2600 for 185€ at fiets.be, a local cycling store. Using my Garmin 800, I could record my heart rate, cadence and speed while riding a workout. Since I have no power meter on my bike, I was barred from a feature that higher-end, more expensive trainers offer. But in the documentation of the trainer, I found a graph showing a linear relation between speed and power. So if I could just add this to the recorded file before submitting it to Strava, I would have a trainer with power measurement for less than 200€. This is the graph:
<Trackpoint>
  <Time>2014-01-29T20:38:59Z</Time>
  <AltitudeMeters>157.4000244</AltitudeMeters>
  <DistanceMeters>14850.7099609</DistanceMeters>
  <HeartRateBpm xsi:type="HeartRateInBeatsPerMinute_t"> <Value>139</Value> </HeartRateBpm>
  <Cadence>92</Cadence>
  <Extensions>
    <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike">
      <Speed>8.4530001</Speed>
    </TPX>
  </Extensions>
</Trackpoint>
At this time, the speed was 8.4530001 meters per second. To convert this to km/h, we divide by 1000 and multiply by 3600 (the number of seconds in an hour). So speed_in_kmperh = speed / 1000.0 * 60 * 60 = 30.43080036 km/h. The power developed at that moment was 30.43080036 / 6.0 * 50.0 = 253.590003 watts. We truncate to an integer: 253 watts. To add this to the TCX file, we add a line
<Watts>253</Watts>
as follows:
<Trackpoint>
  <Time>2014-01-29T20:38:59Z</Time>
  <AltitudeMeters>157.4000244</AltitudeMeters>
  <DistanceMeters>14850.7099609</DistanceMeters>
  <HeartRateBpm xsi:type="HeartRateInBeatsPerMinute_t"> <Value>139</Value> </HeartRateBpm>
  <Cadence>92</Cadence>
  <Extensions>
    <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike">
      <Speed>8.4530001</Speed>
      <Watts>253</Watts>
    </TPX>
  </Extensions>
</Trackpoint>
The next step was to automate the calculation of the power and adding it to the TCX file. I wrote the following Python script to do that:
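The script itself is not reproduced on this page. The sketch below shows one way the same idea could be implemented with minidom; it is not the original vpower.py. The helper names are made up, and the sample document is a stripped-down stand-in for a real TCX file (a real file nests track points inside activities, laps and tracks).

```python
from xml.dom import minidom

# Linear relation used in the text: 50 W per 6 km/h
# (only valid for the lever position the ride was recorded at).
WATTS_PER_KMH = 50.0 / 6.0

# Stripped-down stand-in for a TCX file, with a single track point.
SAMPLE_TCX = """<TrainingCenterDatabase
    xmlns="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2">
  <Trackpoint>
    <Time>2014-01-29T20:38:59Z</Time>
    <Cadence>92</Cadence>
    <Extensions>
      <TPX xmlns="http://www.garmin.com/xmlschemas/ActivityExtension/v2" CadenceSensor="Bike">
        <Speed>8.4530001</Speed>
      </TPX>
    </Extensions>
  </Trackpoint>
</TrainingCenterDatabase>"""


def virtual_power(speed_mps):
    """Convert speed in m/s to whole watts using the linear trainer curve."""
    speed_kmh = speed_mps / 1000.0 * 3600.0
    return int(speed_kmh * WATTS_PER_KMH)


def add_virtual_power(tcx_text):
    """Return the TCX text with a <Watts> element added next to every <Speed>."""
    doc = minidom.parseString(tcx_text)
    for tpx in doc.getElementsByTagName("TPX"):
        speeds = tpx.getElementsByTagName("Speed")
        if not speeds or speeds[0].firstChild is None:
            continue  # no speed recorded for this point
        power = virtual_power(float(speeds[0].firstChild.data))
        watts = doc.createElement("Watts")
        watts.appendChild(doc.createTextNode(str(power)))
        tpx.appendChild(watts)
    return doc.toxml()


print(add_virtual_power(SAMPLE_TCX))
```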
prompt> python vpower.py > vpower_29-01-14\ 20-53-27.tcx
I uploaded the resulting TCX file to Strava and obtained this:
Of course, the power in this workout is based on the fact that I left the lever on my trainer in position 5 during the whole workout. If you change the position of the lever during the workout, this approach will give wrong results.
Is this approach of "virtual power" accurate? Not as accurate as a power meter on the bike, but usable, I would argue. The concept of "virtual power" is also supported by the TrainerRoad software. Later, I found an interactive graph of the speed-power relation for the Tacx Blue Motion on the Tacx website. From that graph, I could obtain more precise data points: at 60 km/h the power is 407 watts. So next time I use my script, I will use power = speed_in_kmperh / 60.0 * 407.0 as the power formula.
Posted by
cast42
at
12:14 PM
3
comments
Labels: cycling
Wednesday, January 15, 2014
Euler : great talk by William Dunham
Having a Google Chromecast at home increased my long-form consumption on YouTube dramatically. For instance, I discovered this talk about the great mathematician Euler by William Dunham:
You can check it by running the following Python code:
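The listing is missing from this page. Assuming the claim being checked is the classic one about Euler's polynomial $n^2 + n + 41$, which yields a prime for every $n$ from 0 to 39, a minimal check could look like this (the helper `is_prime` is a plain trial-division test written for this sketch):

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Euler's polynomial n^2 + n + 41 for n = 0..39.
values = [n * n + n + 41 for n in range(40)]
all_prime = all(is_prime(v) for v in values)

print(values)
print("all prime:", all_prime)
# The streak ends at n = 40, where the value is 41 * 41.
print("n = 40 gives", 40 * 40 + 40 + 41, "prime:", is_prime(40 * 40 + 40 + 41))
```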
You may wonder if there are other polynomials that generate primes. The article "Prime Generating Polynomials" claims that the polynomial $(x^5 - 133x^4 + 6729x^3 - 158379x^2 + 1720294x - 6823316)/4$ generates 57 consecutive primes for $x \in [0,56]$. The Wolfram article "Prime-Generating Polynomial" also lists that polynomial as a winner. There are of course other formulas to generate prime numbers, but I think the Euler polynomial is the fastest.
The solution to Euler problem 27 is also interesting. A second-degree polynomial generates 71 primes (but they are not consecutive). The polynomial $x^2 - 61x + 971$ generates 71 primes for $ x \in [0,70]$. This is the longest run when the absolute values of the coefficients are restricted to one thousand. If we drop that restriction, an even stronger polynomial is found: $x^2 -79 x + 1601$ generates 80 primes. Note that the generated primes are not unique. For example, the numbers 1601, 41, 197, 797, 1373 and 1523 are generated twice. Of the 80 primes generated, 40 are unique. I think Euler would not have been impressed.
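The counts for the two quadratics can be verified with a few lines of Python. This is a small sketch written for this purpose; `count_leading_primes` is a made-up helper name.

```python
def is_prime(n):
    """Trial division; fine for the small numbers used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_leading_primes(a, b):
    """Count the x = 0, 1, 2, ... for which x^2 + a*x + b is prime, stopping at the first miss."""
    x = 0
    while is_prime(x * x + a * x + b):
        x += 1
    return x


run_971 = count_leading_primes(-61, 971)
run_1601 = count_leading_primes(-79, 1601)
unique_1601 = len({x * x - 79 * x + 1601 for x in range(run_1601)})

print("x^2 - 61x + 971 :", run_971, "primes in a row")
print("x^2 - 79x + 1601:", run_1601, "primes in a row,", unique_1601, "unique")
```

The second polynomial is Euler's $n^2 + n + 41$ shifted by 40, which is why every value appears twice.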
Posted by
cast42
at
4:48 PM
0
comments
Tuesday, December 11, 2012
Random Forests are the new kid in machine learning town
I was reading "Specialist Knowledge Is Useless and Unhelpful: When data prediction is a game, the experts lose out." There I learned about an algorithm that seems like a silver bullet for data mining problems: random forests. An explanation in layman's terms can be found on Quora. I still don't fully grasp the idea, but I think it's worth exploring further as an addition to the problem-solving toolbox. There is an R package for random forests for some quick exploration on your data. Happy hacking.
Posted by
cast42
at
9:39 AM
2
comments
Labels: algorithms, machine learing
Monday, October 10, 2011
Design inspiration
This weekend I was browsing with the Zite app on my iPad. Especially in the "webdesign and user experience" section I encountered some interesting links:
- UX: The Power of Getting Things Designed
- 16 PIXELS For Body Copy. Anything Less Is A Costly Mistake
- Fundamentals of Good UI Design PDF (23 MB) from Ghost in the pixel
Posted by
cast42
at
9:42 AM
0
comments
Labels: usability, user experience, webdesign
Monday, August 22, 2011
Back from holiday
Posted by
cast42
at
11:14 AM
0
comments
Saturday, June 25, 2011
It's not the story, the people, the technology, it's the team
- constant review
- it must be safe for people to tell the truth
- communication should not mirror the organizational structure
- people and how they function is more important than ideas
- do not let success mask problems, do a deep assessment
- mix up creative and technical people
Posted by
cast42
at
2:32 PM
0
comments
Labels: pixar
Wednesday, March 16, 2011
Tuesday, December 07, 2010
Bye bye battery of MacBook Pro
Posted by
cast42
at
7:27 PM
1 comments
Labels: battery, macbook pro
Saturday, September 11, 2010
Fireworks shot with Canon Ixus 300HS
Yesterday evening, I went to the yearly fireworks show held in Leuven, Belgium. I filmed the finale with my new Canon Ixus 300HS (aka Powershot 4000 SD) at 720p. Here's the result:
I also took some pictures in fireworks mode. I think the movie captures the moment much better than the pictures. But for those interested, here is a Picasa album containing the original unedited files. Here's a nice one:
Posted by
cast42
at
11:40 AM
0
comments
Labels: canon Ixus 300HS, firework, youtube
Saturday, July 31, 2010
Tuesday, July 27, 2010
Hosting my pictures in the cloud: Google Storage
My workflow is now as follows:
- Copy the pictures from my digital camera (currently a Canon Ixus 300 HS) to my Mac with iPhoto. iPhoto puts the pictures into new events (= pictures taken at the same time of the day)
- Then I select the pictures I like and put them into an iPhoto album.
- Next I export the album to Picasa Web using Picasa Web Albums Uploader, selecting "Actual Size" so that it archives the original files in the cloud. This may take a bit longer to upload but will save the day when all my pictures disappear from my Mac and backup.
If I manage to fill up the 20 gigabytes, I can upgrade to 80 GB for 20 dollars per year. So that's safe for the future ;-)
Posted by
cast42
at
9:08 PM
1 comments
Labels: google, picasa, technology
Wednesday, May 19, 2010
HTML5 example to change the opacity of an image via CSS3
Here's a simple HTML5 example that changes the opacity of an image via CSS3 using an input range element and some Javascript.
<!DOCTYPE html>
<html>
<head>
<title>HTML5 example to change the opacity of an image via CSS3</title>
</head>
<body>
<img id="img_0576" src="IMG_0576.jpg" alt="My bike" style="opacity: 0.5;" />
<input id="img_op" type='range' min='0' max='100' value='50' onchange="changeOpacity()">
<script>
function changeOpacity() {
  var opacity = document.getElementById('img_op').value/100;
  document.getElementById('img_0576').style.opacity = opacity;
}
</script>
</body>
</html>
This example is mainly to test out the use of code highlighting on Blogger as explained by Luka Marinko. It seems to work well. Hooray!
If you're interested in HTML5 you can follow the Friendfeed on HTML5.
Posted by
cast42
at
2:04 PM
2
comments
Labels: code, friendfeed, HTML5
Monday, April 26, 2010
Trailer for the episode 3 of the virtual revolution on the Flemish television
On Tuesday 24 April 2010, VRT will air a Dutch-language version of the third episode of BBC's Virtual Revolution. Here's the trailer:
The code to embed this:
<!-- BEGIN EMBEDCODE CANVAS-->
<div id='canvasvideo_container_47281' style="width: 507px; height: 320px; border: 1px solid black;">
<object id="canvasvideo_47281" width="507" height="320">
<param name="movie" value="http://static.vrt.be/swf/jwplayer45.swf"/>
<param name="allowScriptAccess" value="always" />
<param name="allowFullScreen" value="true" />
<param name="flashvars" value="config=http://video.canvas.be/embed%3Fvideo%3D47281"/>
<param name="wmode" value="transparent">
<embed type="application/x-shockwave-flash" wmode="transparent" name="media" src="http://static.vrt.be/swf/jwplayer45.swf" quality="high" allowscriptaccess="always" allowfullscreen="true" flashvars="config=http://video.canvas.be/embed%3Fvideo%3D47281" width="507" height="320">
</embed>
</object>
</div>
<!-- EINDE EMBEDCODE CANVAS-->
It's a pity the embed code is not working on Blogger.....
UPDATE: I tried to solve this by using an iframe:
using this code:
<iframe src ="http://programmas.canvas.be/wp-content/uploads/2010/04/The-virtual-revolution-Aflevering-3-trailer.html" width="507" height="320">
But that is still not working. Strange.
UPDATE 3: maybe I have to URL unescape the value of flashvars to
http://video.canvas.be/embed?video=47281
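For reference, the unescaping step can be reproduced with Python's standard library. This only shows the URL decoding; whether the player accepts the unescaped value is a separate question that this sketch does not answer.

```python
from urllib.parse import unquote

# The escaped config URL as it appears in the flashvars value of the embed code.
escaped = "http://video.canvas.be/embed%3Fvideo%3D47281"

# %3F decodes to "?" and %3D decodes to "=".
unescaped = unquote(escaped)
print(unescaped)
```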
Posted by
cast42
at
10:34 PM
3
comments
Saturday, March 06, 2010
Skiing in Stuben am Arlberg, Austria
In a yearly tradition, I publish a short movie about the skiing holiday. It's not as cool as with a GoPro Hero cam but still interesting. This year I edited again with iMovie on the Mac. iMovie is really the tool you need for such a job. When finished, I pushed the button to upload to YouTube and half an hour later:
This year we stayed in Stuben am Arlberg in Austria. They say that Stuben is the capital of off-piste skiing, but this year we stayed on the slopes because the danger of avalanches was very real.
I only noticed one annoying bug. Although the Star Wars end trailer is in iMovie:
the end of the movie on YouTube is just the black background with white stars but without the moving end credits. Annoying bug!
Posted by
cast42
at
3:07 PM
2
comments
Labels: austria, imovie, skill level, stuben
Monday, February 01, 2010
How to deal with search crawlers for your mobile site
So you've set up your mobile site (for example hosted at http://m.yoursite.com) derived from a desktop version (for example hosted at http://www.yoursite.com). Typically, you're using a Content Management System, and by providing adapted templates for your mobile items, you can provide a mobile version of your site. The question now is how to deal with robots that crawl the web to build a search index. The danger is that the robots detect duplicate content, because the mobile version of a content item might contain the same text and pictures, only wrapped in another template. I think the following steps should be taken:
- Only allow the mobile web crawlers with the following robots.txt in the root of the mobile site (for example http://m.yoursite.com), which allows bots with user agent "Googlebot-Mobile" or "YahooSeeker/M1A1-R2D2" and disallows all others:
User-agent: Googlebot-Mobile
Disallow:
User-agent: YahooSeeker/M1A1-R2D2
Disallow:
User-agent: *
Disallow: /
Also, keep mobile crawlers away from the desktop version of your site by adding the following robots.txt in the root of your site (for example http://www.yoursite.com/robots.txt ):
User-agent: Googlebot-Mobile
Disallow: /
User-agent: YahooSeeker/M1A1-R2D2
Disallow: /
User-agent: *
Disallow:
With the first robots.txt in the mobile root and the second one (above) in the root of your desktop site, your mobile site items should only appear when people search with a mobile search engine (for example by using http://m.google.com ) but not when searching with the desktop version (for example http://www.google.com ).
As far as I know, the MSNbot that crawls for the Microsoft Bing index does not have a version that crawls strictly for the mobile Bing search engine at http://m.bing.com.
- Add your mobile site to Google ( http://www.google.com/support/webmasters/bin/answer.py?answer=40348 ), Bing ( http://www.bing.com/webmaster/SubmitSitePage.aspx ), Yahoo ( http://siteexplorer.search.yahoo.com/mobilesubmit ) and other relevant mobile indexes
- Create a mobile sitemap: http://www.google.com/support/webmasters/bin/answer.py?answer=34648&cbid=-1rt6r3us7wrvl&src=cb&lev=answer
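The two rule sets from the first step can be tried out locally with Python's standard `urllib.robotparser` before deploying them. The sketch below is only a local check: it covers the Googlebot-Mobile and catch-all groups (the Yahoo group is left out for brevity), the helper name `allowed` is made up, and real crawlers may interpret the rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules for the mobile site: only the mobile crawler is welcome.
MOBILE_ROBOTS = """\
User-agent: Googlebot-Mobile
Disallow:

User-agent: *
Disallow: /
"""

# Rules for the desktop site: everyone except the mobile crawler.
DESKTOP_ROBOTS = """\
User-agent: Googlebot-Mobile
Disallow: /

User-agent: *
Disallow:
"""


def allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url according to robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot-Mobile", "Googlebot"):
    print(agent,
          "mobile site:", allowed(MOBILE_ROBOTS, agent, "http://m.yoursite.com/page"),
          "desktop site:", allowed(DESKTOP_ROBOTS, agent, "http://www.yoursite.com/page"))
```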
Let me know in the comments if you have anything to add to this strategy.
Posted by
cast42
at
5:51 PM
0
comments