# Breeding Algo 2.0

Before we dive into the traits, let’s walk through what we think we know about the new breeding algo and how it differs from 1.0. Let’s break down 1.0 first:

### Breeding Algo 1.0:

- Only BA is passed down and only from direct parents
- Variance and Distance Preference were 100% randomly assigned
- The general Breeding Algo equation took an average of the parents’ BA, applied a solid degradation, and spread the results in a natural distribution around that point, rarely exceeding the parents’ average.
- Using our tool, it’s roughly Ave(Parents’ BA) - 15, with a natural distribution around that point whose +3 SD mark lands pretty close to the parents’ average.
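
To make the 1.0 rule of thumb concrete, here is a minimal Python sketch. It assumes a normal distribution and reads "3 SDs pretty close to the parents’ average" as an SD of about 5 (15 / 3). The function names and the SD value are our own illustration, not anything confirmed by ZED.

```python
import random
from statistics import mean

# Rough values read off the bullets above. SD_V1 is an assumption:
# if +3 SD lands near the parents' average, then SD is about 15 / 3 = 5.
DEGRADATION_V1 = 15.0
SD_V1 = 5.0


def expected_foal_ba_v1(sire_ba, dam_ba):
    """Centre of the foal BA distribution under the rough 1.0 formula."""
    return mean([sire_ba, dam_ba]) - DEGRADATION_V1


def sample_foal_ba_v1(sire_ba, dam_ba, rng=random):
    """Draw one hypothetical foal BA around the expected value."""
    return rng.gauss(expected_foal_ba_v1(sire_ba, dam_ba), SD_V1)


if __name__ == "__main__":
    # Two parents scoring 80 and 70 on our 0-100 scale.
    print(expected_foal_ba_v1(80, 70))
    print(sample_foal_ba_v1(80, 70))
```

Under these assumptions, a foal beating its parents’ average needs a draw about 3 SDs above the mean, which is why it so rarely happened in 1.0.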

And by contrast, here’s what we know (and in some cases think) about 2.0:

### Breeding Algo 2.0

- All three traits (BA, VAR, DP) are used in the new breeding algo.
- Ancestor traits can be used in the equation.
- There is far less degradation in BA, even before accounting for ancestor pulls.
- The correlation between parents and offspring is strongest with BA, then DP, with VAR a distant third.
- The algo aims to curb results on the high and low end, i.e. two bad horses have a much better chance of producing a foal better than either parent than two good horses do.
- Using our metrics, a rough set of equations looks something like: BA = Ave(SireBAinput, DamBAinput) - 5; DP = Ave(SireDPinput, DamDPinput) - 5; VAR = Ave(SireVARinput, DamVARinput) - 7. Here 1) each input has some X chance of being pulled from an ancestor and 2) the subtraction slides higher the higher the total, and lower the lower the total.
- My best guess at this point, based on how the parent-to-offspring BA delta differs between legendary offspring (no ancestor pulls) and exclusives, is that each side with ancestors has a 10% chance of pulling an ancestor.
- Based on my research, I believe that each trait is pulled from ancestors (or not) individually. As in, an entire ancestor isn’t pulled. At 10% per trait per side, this theory gives you a bit under a 50% chance (1 - 0.9^6 ≈ 47%) of getting at least ONE trait pulled from an ancestor (if there are ancestors on both sides of the breed), and a 1 in a million chance of hitting all 6 traits.
- Frankly I have no idea how to tell how Zed selects the ancestor within the tree or what the odds are within the ancestor tree of getting a specific ancestor. I can’t comprehend how we’ll ever have enough data to even approximate that with how often traits align and match others in the tree.
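
The rough 2.0 equations and the ancestor-pull odds can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the offsets are the rough values quoted above, the sliding adjustment is left out, the 10% pull chance is the best guess from the list, and all six trait slots (3 traits x 2 sides) are treated as independent. The names `expected_trait_v2` and `pull_odds` are hypothetical.

```python
from statistics import mean

# Rough subtraction per trait from the equations above. The sliding part
# (bigger subtraction for higher totals) is NOT modelled here.
OFFSETS_V2 = {"BA": 5.0, "DP": 5.0, "VAR": 7.0}

# Best-guess chance that one trait on one side is pulled from an ancestor.
PULL_CHANCE = 0.10


def expected_trait_v2(trait, sire_input, dam_input):
    """Expected foal score for one trait, given the two inputs.

    Each input is either the parent's own score or, if a pull happened,
    the score of an ancestor on that side.
    """
    return mean([sire_input, dam_input]) - OFFSETS_V2[trait]


def pull_odds(p=PULL_CHANCE, slots=6):
    """Odds of at least one pull and of every slot pulling.

    slots = 3 traits x 2 sides, each assumed to pull independently.
    """
    at_least_one = 1 - (1 - p) ** slots
    all_slots = p ** slots
    return at_least_one, all_slots


if __name__ == "__main__":
    print(expected_trait_v2("BA", 80, 70))
    any_pull, every_pull = pull_odds()
    print(f"at least one pull: {any_pull:.1%}, all six: {every_pull:.7f}")
```

With a 10% chance per slot, the sketch gives about 47% for at least one pull and one in a million for all six. Changing `p` shows how sensitive those odds are to the guess.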

## BA: Base Ability

In ZED, Base Ability refers to a horse’s innate mean speed, before the addition/subtraction of distance preference on either side. Simply put, it’s your “base” mean speed before the distance preference addition/subtraction. Base Ability is independent from the horse’s distance preference and variance assignments at birth.

**Update:** If you’ve used The Zedge, you know the importance of sample size in trusting the scores. As part of our upcoming metric update, we’ve done a massive overhaul of the main base ability metric, adding conditional logic to find the most reliable base ability metric based on where we have sample. Your horse’s primary “BA” score will use either our updated eBA, 16BA, or dpBA method depending on where the most data lies. Let’s start with every horse’s default BA metric:

## eBA: *Expected Base Ability of Offspring*

eBA is a horse’s expected base ability based on our breeding degradation formula using the expected base ability of its parents (or the population average of the Z-level for Genesis). It’s the “default” BA metric every horse starts with until it’s built a reliable sample at 1600m or at either/both extremes.

## 16BA: 1600m Derived Base Ability

In order to best isolate base ability from the other two variables, we’ve chosen to use a normalized mean speed at 1600m, as it’s the only distance without preference. There are numerous ways to normalize speeds/times in ZED (to remove ZED’s manipulation of times), each with unique strengths and weaknesses. After creating 3 funnel ELOs for every horse in ZED, we looked at how groups of average race ELOs correlated with mean times. We found shockingly high correlations (0.97-0.98) for the middle finishes, with variance influencing some wiggle on either extreme. That was enough for us to normalize mean speeds for BA using the average race ELO. This ends up completely removing variance influences from the mean times as well, a nice little bonus. It also allows us to appropriately weight the competition in every race, as the day-to-day levels change so drastically based on tournaments, etc.

As with all of our trait metrics, we use a 0-100 scale and conditional highlighting for easier consumption of the data.

## Race Count: *Confidence Level*

The downside of using only 1600m results is that we often lack sufficient sample. The race counts should be used as confidence levels in the 16BA scores (we’ve added conditional highlighting to help with this). For horses with above-average variance, the data can be very noisy with fewer than 25 races, where a cold or hot streak can lead to a misleading finish pattern. At 50+ races, we’re feeling good about the data.

As part of our update, we’ll apply 16BA as your BA metric when your horse has 50 or more races at 1600m, OR has at least 25 races at 1600m and a VAR score at or below average.

## dpBA: Base Ability based on Distance Preference

The final BA metric we use to approximate base ability is dpBA. It’s particularly useful for strong distance preference horses with data at both extremes but very little at 1600m. Using average race ELO again, this metric calculates a horse’s normalized mean speed at 1800m+ and at 1400m and below, then uses the average of the two (with some special sauce) to get at a 1600m equivalent BA. It will also use eBA and data at one extreme if that’s all it has.

While our main BA metric automates all of this and pulls the most meaningful metric, we now also provide all of the BA metrics for your horse.
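
As a rough illustration of that conditional logic, here is a minimal Python sketch. The 50-race and 25-race thresholds for 16BA come from the rule described earlier. Everything else is an assumption on our part for the example: treating 50 as the "average" VAR score, the 25-race cutoff for falling back to dpBA, and the names `HorseSample` and `pick_ba_metric`.

```python
from dataclasses import dataclass


@dataclass
class HorseSample:
    races_1600: int    # races run at 1600m
    races_short: int   # races run at 1400m and below
    races_long: int    # races run at 1800m and above
    var_score: float   # VAR score on the 0-100 scale


AVERAGE_VAR = 50.0       # assumption: midpoint of the 0-100 scale
MIN_EXTREME_RACES = 25   # hypothetical cutoff, not stated in the article


def pick_ba_metric(horse):
    """Return which BA metric would be the primary one for this horse."""
    # 16BA: 50+ races at 1600m, or 25+ with average-or-lower variance.
    if horse.races_1600 >= 50:
        return "16BA"
    if horse.races_1600 >= 25 and horse.var_score <= AVERAGE_VAR:
        return "16BA"
    # dpBA: enough data at either or both extremes (cutoff is a guess).
    if horse.races_short + horse.races_long >= MIN_EXTREME_RACES:
        return "dpBA"
    # Default: expected BA from the breeding formula.
    return "eBA"


if __name__ == "__main__":
    print(pick_ba_metric(HorseSample(30, 5, 5, 40.0)))
    print(pick_ba_metric(HorseSample(5, 20, 20, 80.0)))
    print(pick_ba_metric(HorseSample(0, 0, 0, 50.0)))
```

The order of the checks mirrors the priority described above: trust 1600m data first, then the extremes, and fall back to eBA when there is not enough racing data anywhere.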

## DP: Distance Preference

Distance preference is, simply put, a speed add-on/subtraction that results in your horse having an ideal distance away from 1600m. If BA is a stake in the ground at 1600m, distance preference is a plank lying across all distances whose degree of “tilt” represents the strength of the distance preference. As such, it adds ability to your horse’s strong side and removes ability from its weak side. A DP plank lying flat across your BA stake would represent no DP and likely a horse whose best distance is 1600m.

Using the same normalized times from our average race ELO, removing the effects of variance completely, we can create a BA+DP metric at every distance. Once that’s done, we can run a weighted average on either side of 1600 to derive a delta from the strong side to the weak side. That range represents a horse’s distance preference. It’s adjusted to a -100 to 100 scale where -100 equals the strongest short distance preference, 100 equals the strongest long distance preference and 0 represents no distance preference.
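
The strong-side versus weak-side delta can be sketched as follows. This is a simplified illustration, not our production formula: weighting by race count and the `max_delta` value used to stretch the delta onto the -100 to 100 scale are both assumptions, and the function names are hypothetical.

```python
def weighted_mean(scores):
    """scores: list of (score, race_count) pairs, weighted by race count."""
    total = sum(count for _, count in scores)
    return sum(score * count for score, count in scores) / total


def distance_preference(by_distance, max_delta=20.0):
    """Estimate DP on a -100 (short) to 100 (long) scale.

    by_distance maps distance in metres -> (BA+DP score, race count).
    max_delta is a hypothetical delta that maps to a full +/-100 score.
    Returns None when one side of 1600m has no races.
    """
    short = [v for d, v in by_distance.items() if d < 1600 and v[1] > 0]
    long_ = [v for d, v in by_distance.items() if d > 1600 and v[1] > 0]
    if not short or not long_:
        return None
    delta = weighted_mean(long_) - weighted_mean(short)
    scaled = 100.0 * delta / max_delta
    # Clamp so extreme deltas stay on the published scale.
    return max(-100.0, min(100.0, scaled))


if __name__ == "__main__":
    sprinter = {
        1000: (60.0, 10),
        1200: (58.0, 10),
        1600: (55.0, 5),
        2000: (50.0, 10),
        2400: (48.0, 10),
    }
    print(distance_preference(sprinter))
```

A negative result means the short side is the strong side, and a result near 0 corresponds to the flat plank described above.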

## VAR: Variance

Variance is probably the easiest racing trait to visualize, and less easy to calculate. It’s a measure of a horse’s range of possible outcomes. A horse with extreme variance will typically have more placements at 1st and 12th (with a caveat for horses dominating the field in BA or DP), while a lower variance horse will find its placements bunched much closer together. To calculate a horse’s variance, we return in part to the average race ELO, using the highly correlated average of the 6th and 7th place times in the race. We then compare that average (the 6.5th place time) to the ACTUAL race time, and adjust all horses’ times in the race by that amount to account for the ZED factor (a manipulation of the race times).

While the most likely way ZED adjusts race times is using Z-scores and standard deviations (as in, lowering/raising every horse’s time by 0.5 of their SD), using that method is super difficult and requires having enough data on every horse in the race at that distance to adjust accurately. That almost never happens, so it often ends up less accurate than a clean raw time addition/subtraction. At least for our purposes.
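
The raw time addition/subtraction can be sketched like this. It reflects one reading of the method: the expected 6.5th place time for a race’s average ELO comes from a separate fit (not shown here), and every horse’s time is shifted by the gap between that expectation and the actual 6.5th place time. The function name and argument names are hypothetical.

```python
def adjust_race_times(times_by_place, expected_mid_time):
    """Shift every time in a race by one raw amount.

    times_by_place: the 12 finishing times in seconds, in finishing order.
    expected_mid_time: the 6.5th place time we would expect for this
        race's average ELO (assumed to come from a separate fit).
    """
    # Actual 6.5th place time: average of the 6th and 7th finishers.
    actual_mid_time = (times_by_place[5] + times_by_place[6]) / 2
    shift = expected_mid_time - actual_mid_time
    # The same shift for everyone keeps the gaps between horses intact.
    return [t + shift for t in times_by_place]


if __name__ == "__main__":
    race = [95.0 + 0.1 * place for place in range(12)]
    adjusted = adjust_race_times(race, expected_mid_time=96.0)
    print(adjusted[0], adjusted[-1])
```

Because the shift is the same for every horse, the spread of finishing times within the race is untouched. Only the race-level manipulation is removed, which is what lets a horse’s adjusted times be compared across races.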