PHP Trait MathPHP\Statistics\Regression\Methods\LeastSquares

Project: markrogoyski/math-php

Public Methods

Method / Description
CI ( number $x, $p ) : number The confidence interval of the regression for Simple Linear Regression: CI(x, p) = t * sy * √(1/n + (x - x̄)²/SSx)
DFFITS ( ) : array DFFITS Measures the effect on the regression if each data point is excluded.
FProbability ( ) : number The probability associated with the regression F Statistic
FStatistic ( ) : number The F statistic of the regression (F test)
PI ( number $x, number $p, integer $q = 1 ) : number The prediction interval of the regression: PI(x, p, q) = t * sy * √(1/q + 1/n + (x - x̄)²/SSx)
coefficientOfDetermination ( ) : number R² - coefficient of determination
cooksD ( ) : array Cook's Distance: a measure of the influence of each data point on the regression.
correlationCoefficient ( ) : number R - correlation coefficient (Pearson's r)
createDesignMatrix ( mixed $xs ) : Matrix The Design Matrix contains all the independent variables needed for the least squares regression
degreesOfFreedom ( ) : number The degrees of freedom of the regression
errorSD ( ) : number Error Standard Deviation
getProjectionMatrix ( ) : Matrix Projection matrix (influence matrix, hat matrix H). Maps the vector of response values (dependent variable values) to the vector of fitted values (predicted values).
leastSquares ( array $ys, array $xs, integer $order = 1, integer $fit_constant = 1 ) : Matrix Linear least squares fitting using Matrix algebra (Polynomial).
leverages ( ) : array Regression Leverages A measure of how far away the independent variable values of an observation are from those of the other observations.
meanSquareRegression ( ) : number Mean square regression MSR = SSᵣ / p
meanSquareResidual ( ) : number Mean of squares for error MSE = SSₑ / ν
meanSquareTotal ( ) : number Mean of squares total MSTO = SStot / (n - 1)
r ( ) : number R - correlation coefficient Convenience wrapper for correlationCoefficient
r2 ( ) : number R² - coefficient of determination Convenience wrapper for coefficientOfDetermination
regressionVariance ( number $x ) : number Regression variance
residuals ( ) : array Get the regression residuals eᵢ = yᵢ - ŷᵢ or in matrix form e = (I - H)y
standardErrors ( ) : array Standard error of the regression parameters (coefficients)
sumOfSquaresRegression ( ) : number SSreg - The sum of squares of the regression (explained sum of squares)
sumOfSquaresResidual ( ) : number SSres - The sum of squares of the residuals (RSS - residual sum of squares)
sumOfSquaresTotal ( ) : number SStot - The total sum of squares
tProbability ( ) : array The probability associated with each parameter's t value
tValues ( ) : array The t values associated with each of the regression parameters (coefficients)

Method Details

CI() Public Method

  CI(x, p) = t * sy * √(1/n + (x - x̄)²/SSx)

Where:
  t  is the critical t for the p value
  sy is the estimated standard deviation of y
  n  is the number of data points
  x̄  is the average of the x values
  SSx = ∑(x - x̄)²

If $p = .05, then we can say we are 95% confident the actual regression line will be within an interval of evaluate($x) ± CI($x, .05).
public CI ( number $x, $p ) : number
$x number
$p number 0 < p < 1 The P value to use
Returns number
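
A minimal sketch of computing a confidence band around a fitted line, assuming the trait is consumed through MathPHP\Statistics\Regression\Linear (which uses it) and that PHP's case-insensitive method dispatch resolves ci() to CI(); the data is hypothetical:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

// Hypothetical data: fit a simple linear regression to (x, y) points.
$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

// 95% confidence interval for the regression line at x = 4 (p = .05).
$x  = 4;
$ci = $regression->ci($x, .05);

// The actual regression line at $x lies within evaluate($x) ± $ci.
$lower = $regression->evaluate($x) - $ci;
$upper = $regression->evaluate($x) + $ci;
```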

DFFITS() Public Method

https://en.wikipedia.org/wiki/DFFITS

  DFFITS = (ŷᵢ - ŷᵢ₍ᵢ₎) / (s₍ᵢ₎√hᵢᵢ)

where:
  ŷᵢ    is the prediction for point i with i included in the regression
  ŷᵢ₍ᵢ₎ is the prediction for point i without i included in the regression
  s₍ᵢ₎  is the standard error estimated without the point in question
  hᵢᵢ   is the leverage for the point

Putting it another way, sᵢ is the studentized residual:

  sᵢ = eᵢ / √(MSₑ(1 - hᵢ))

where eᵢ is the residual and MSₑ is the mean square residual.

Then s₍ᵢ₎ is the studentized residual with the i-th observation removed:

  s₍ᵢ₎ = eᵢ / √(MSₑ₍ᵢ₎(1 - hᵢ))

where

  MSₑ₍ᵢ₎ = [MSₑ - eᵢ²/((1 - hᵢ)ν)] · ν/(ν - 1)

Then

  DFFITS = s₍ᵢ₎ · √(hᵢ/(1 - hᵢ))
public DFFITS ( ) : array
Returns array
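
A sketch of screening for influential points, again assuming the hypothetical Linear fit from above; the |DFFITS| > 2√(p/n) cutoff is a common rule of thumb, not part of this library:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

// Hypothetical data; the last point is a deliberate outlier.
$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 30]];
$regression = new Linear($points);

// One DFFITS value per observation.
$dffits = $regression->dffits();

// Flag |DFFITS| > 2√(p/n) as influential, where p is the number
// of model parameters (2 for m and b).
$n      = count($points);
$p      = 2;
$cutoff = 2 * sqrt($p / $n);

$influential = array_filter($dffits, fn ($d) => abs($d) > $cutoff);
```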

FProbability() Public Method

  F probability = F distribution CDF(F, d₁, d₂)

where:
  F  = F statistic
  d₁ = degrees of freedom 1
  d₂ = degrees of freedom 2
public FProbability ( ) : number
Returns number

FStatistic() Public Method

  F₀ = MSm / MSₑ = (SSᵣ/p) / (SSₑ/(n - p - α))

where:
  MSm = mean square model (regression mean square) = SSᵣ / df(SSᵣ) = SSᵣ/p
  MSₑ = mean square error (estimate of variance σ² of the random error) = SSₑ/(n - p - α)
  p   = the order of the fitted polynomial
  α   = 1 if the model includes a constant term, 0 otherwise (p + α = total number of model parameters)
  SSᵣ = sum of squares of the regression
  SSₑ = sum of squares of residuals
public FStatistic ( ) : number
Returns number
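
A sketch of the overall significance test on the hypothetical Linear fit used above; since FProbability() is documented as the CDF value, the p-value shown here is its complement (an interpretation, not a library call):

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

$F   = $regression->fStatistic();    // F₀ = MSm / MSₑ
$cdf = $regression->fProbability();  // F distribution CDF(F, d₁, d₂)

// Since FProbability() is documented as the CDF, the p-value of the
// overall F test is its complement.
$pValue = 1 - $cdf;
```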

PI() Public Method

  PI(x, p, q) = t * sy * √(1/q + 1/n + (x - x̄)²/SSx)

Where:
  t  is the critical t for the p value
  sy is the estimated standard deviation of y
  q  is the number of replications
  n  is the number of data points
  x̄  is the average of the x values
  SSx = ∑(x - x̄)²

If $p = .05, then we can say we are 95% confident that the future averages of $q trials at $x will be within an interval of evaluate($x) ± PI($x, .05, $q).
public PI ( number $x, number $p, integer $q = 1 ) : number
$x number
$p number 0 < p < 1 The P value to use
$q integer Number of trials
Returns number
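
A sketch, on the same hypothetical Linear fit, contrasting the interval for a single future observation with the interval for the average of several replications:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

// Prediction interval at x = 4 for a single future observation (q = 1)...
$pi1 = $regression->pi(4, .05);

// ...and for the average of q = 3 future trials; averaging replications
// shrinks the 1/q term, so the interval narrows.
$pi3 = $regression->pi(4, .05, 3);
```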

coefficientOfDetermination() Public Method

Indicates the proportion of the variance in the dependent variable that is predictable from the independent variable. Ranges from 0 to 1; close to 1 means the regression line is a good fit. https://en.wikipedia.org/wiki/Coefficient_of_determination
public coefficientOfDetermination ( ) : number
Returns number

cooksD() Public Method

Points with excessive influence may be outliers, or may warrant a closer look. https://en.wikipedia.org/wiki/Cook%27s_distance

  Dᵢ = [eᵢ² / (s²p)] · [hᵢ / (1 - hᵢ)²]

where
  s² ≡ (n - p)⁻¹eᵀe is the mean squared error of the residual model
  e  is the vector of residuals
public cooksD ( ) : array
Returns array
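
A sketch of an influence screen on a hypothetical Linear fit; the Dᵢ > 4/n threshold is a common rule of thumb, not something the library enforces:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

// Hypothetical data; the last point is a deliberate outlier.
$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 30]];
$regression = new Linear($points);

// One Cook's distance per observation; Dᵢ > 4/n is a common screen
// for points worth a closer look.
$cooksD = $regression->cooksD();
$n      = count($points);

$influential = array_filter($cooksD, fn ($d) => $d > 4 / $n);
```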

correlationCoefficient() Public Method

A measure of the strength and direction of the linear relationship between two variables, defined as the (sample) covariance of the variables divided by the product of their (sample) standard deviations.

  r = (n∑⟮xy⟯ − ∑⟮x⟯∑⟮y⟯) / √[(n∑x² − ⟮∑x⟯²)(n∑y² − ⟮∑y⟯²)]
public correlationCoefficient ( ) : number
Returns number
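
A sketch relating r to R² on the hypothetical Linear fit; for simple linear regression R² is the square of Pearson's r:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

$r  = $regression->correlationCoefficient();     // Pearson's r, in [-1, 1]
$r2 = $regression->coefficientOfDetermination(); // R², in [0, 1]

// For simple linear regression, R² is the square of r.
assert(abs($r2 - $r ** 2) < 1e-9);
```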

createDesignMatrix() Public Method

The design matrix contains all the independent variables needed for the least squares regression. https://en.wikipedia.org/wiki/Design_matrix
public createDesignMatrix ( mixed $xs ) : Matrix
$xs mixed
Returns MathPHP\LinearAlgebra\Matrix

degreesOfFreedom() Public Method

The degrees of freedom of the regression
public degreesOfFreedom ( ) : number
Returns number

errorSD() Public Method

Error standard deviation; also called the standard error of the residuals.
public errorSD ( ) : number
Returns number

getProjectionMatrix() Public Method

The diagonal elements of the projection matrix are the leverages. https://en.wikipedia.org/wiki/Projection_matrix

  H = X⟮XᵀX⟯⁻¹Xᵀ

where X is the design matrix.
public getProjectionMatrix ( ) : Matrix
Returns MathPHP\LinearAlgebra\Matrix
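
A sketch of the ŷ = Hy relationship on the hypothetical Linear fit; the getDiagonalElements() call is an assumed helper on the Matrix class used here only to illustrate that the diagonal of H holds the leverages:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

// H maps the observed response vector y to the fitted values: ŷ = Hy.
$H = $regression->getProjectionMatrix();

// Its diagonal entries are the leverages (compare with leverages() below).
$diagonal = $H->getDiagonalElements();
```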

leastSquares() Public Method

Generalizing from a straight line (first-degree polynomial) to a kᵗʰ-degree polynomial:

  y = a₀ + a₁x + ⋯ + aₖxᵏ

leads to equations in matrix form:

  [n     Σxᵢ     ⋯  Σxᵢᵏ  ] [a₀]   [Σyᵢ   ]
  [Σxᵢ   Σxᵢ²    ⋯  Σxᵢᵏ⁺¹] [a₁]   [Σxᵢyᵢ ]
  [ ⋮     ⋮      ⋱   ⋮    ] [ ⋮ ] = [  ⋮   ]
  [Σxᵢᵏ  Σxᵢᵏ⁺¹  ⋯  Σxᵢ²ᵏ ] [aₖ]   [Σxᵢᵏyᵢ]

This is equivalent to the Vandermonde system:

  [1  x₁  ⋯  x₁ᵏ] [a₀]   [y₁]
  [1  x₂  ⋯  x₂ᵏ] [a₁]   [y₂]
  [⋮   ⋮  ⋱   ⋮ ] [ ⋮ ] = [ ⋮]
  [1  xₙ  ⋯  xₙᵏ] [aₖ]   [yₙ]

which can be written as the equation y = Xa. Premultiplying by the transpose Xᵀ gives Xᵀy = XᵀXa, and inverting yields the vector solution:

  a = (XᵀX)⁻¹Xᵀy

(http://mathworld.wolfram.com/LeastSquaresFittingPolynomial.html)

For reference, the traditional way to do simple least squares:

  m = (mean(x)·mean(y) − mean(xy)) / (mean(x)² − mean(x²))
  b = mean(y) − m·mean(x)
public leastSquares ( array $ys, array $xs, integer $order = 1, integer $fit_constant = 1 ) : Matrix
$ys array y values
$xs array x values
$order integer The order of the polynomial. 1 = linear, 2 = x², etc.
$fit_constant integer 1 if we are fitting a constant to the regression
Returns MathPHP\LinearAlgebra\Matrix [[m], [b]]
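
In typical use, leastSquares() is invoked internally when a regression object is constructed. A minimal sketch, assuming the MathPHP\Statistics\Regression\Linear class (which uses this trait); the solved [[m], [b]] coefficients surface through getParameters() and getEquation():

```php
<?php
use MathPHP\Statistics\Regression\Linear;

// Constructing the regression runs the least squares fit a = (XᵀX)⁻¹Xᵀy
// internally; the solved coefficients surface as the model parameters.
$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

$parameters = $regression->getParameters();  // [m => slope, b => intercept]
$equation   = $regression->getEquation();    // "y = mx + b" with fitted values
$y          = $regression->evaluate(5);      // ŷ at x = 5 on the fitted line
```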

leverages() Public Method

A measure of how far away the independent variable values of an observation are from those of the other observations. https://en.wikipedia.org/wiki/Leverage_(statistics)

The leverage score for the i-th data unit is defined as:

  hᵢᵢ = [H]ᵢᵢ

the i-th diagonal element of the projection matrix H, where

  H = X⟮XᵀX⟯⁻¹Xᵀ

and X is the design matrix.
public leverages ( ) : array
Returns array
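
A sketch of a leverage screen on a hypothetical Linear fit; the hᵢᵢ > 2p/n cutoff is a common rule of thumb, not part of the library:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

// Hypothetical data; the last x value sits far from the others.
$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [20, 8]];
$regression = new Linear($points);

// One leverage hᵢᵢ per observation; hᵢᵢ > 2p/n is a common screen,
// where p is the number of model parameters (2 for m and b).
$leverages = $regression->leverages();
$n         = count($points);
$p         = 2;

$highLeverage = array_filter($leverages, fn ($h) => $h > 2 * $p / $n);
```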

meanSquareRegression() Public Method

Mean square regression: MSR = SSᵣ / p
public meanSquareRegression ( ) : number
Returns number

meanSquareResidual() Public Method

Mean of squares for error: MSE = SSₑ / ν
public meanSquareResidual ( ) : number
Returns number

meanSquareTotal() Public Method

Mean of squares total: MSTO = SStot / (n - 1)
public meanSquareTotal ( ) : number
Returns number

r() Public Method

R - correlation coefficient. Convenience wrapper for correlationCoefficient().
public r ( ) : number
Returns number

r2() Public Method

R² - coefficient of determination. Convenience wrapper for coefficientOfDetermination().
public r2 ( ) : number
Returns number

regressionVariance() Public Method

Regression variance
public regressionVariance ( number $x ) : number
$x number
Returns number

residuals() Public Method

Get the regression residuals: eᵢ = yᵢ - ŷᵢ, or in matrix form, e = (I - H)y
public residuals ( ) : array
Returns array
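
A sketch on the hypothetical Linear fit; the near-zero sum is a standard property of least squares fits that include a constant term:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

// eᵢ = yᵢ - ŷᵢ, one residual per observation.
$residuals = $regression->residuals();

// For a least squares fit that includes a constant term,
// the residuals sum to (numerically) zero.
$sum = array_sum($residuals);
```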

standardErrors() Public Method

  se(m) = √[ (∑eᵢ²/ν) / ∑⟮xᵢ - μ⟯² ]

where:
  eᵢ = residual (difference between observed value and value predicted by the model)
  ν  = n - 2 degrees of freedom

  se(b) = se(m) · √(∑xᵢ²/n)
public standardErrors ( ) : array
Returns array [m => se(m), b => se(b)]
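
A sketch of putting the standard errors to use on the hypothetical Linear fit; $t here stands for a critical t value the caller would supply, and the interval line is illustrative only:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

$se = $regression->standardErrors();  // [m => se(m), b => se(b)]
$m  = $regression->getParameters()['m'];

// A rough confidence interval for the slope would be m ± t·se(m),
// with $t the critical t for ν = n - 2 degrees of freedom (not computed here).
// $slopeInterval = [$m - $t * $se['m'], $m + $t * $se['m']];
```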

sumOfSquaresRegression() Public Method

The sum of the squares of the deviations of the predicted values from the mean value of a response variable, in a standard regression model. https://en.wikipedia.org/wiki/Explained_sum_of_squares

  SSreg = ∑(ŷᵢ - ȳ)²

When a constant is fit to the regression, the average of y equals the average of ŷ. In the case where the constant is not fit, we use the sum of squares of the predicted values:

  SSreg = ∑ŷᵢ²
public sumOfSquaresRegression ( ) : number
Returns number

sumOfSquaresResidual() Public Method

The sum of the squares of residuals (deviations of predicted from actual empirical values of data). It is a measure of the discrepancy between the data and an estimation model. https://en.wikipedia.org/wiki/Residual_sum_of_squares

  SSres = ∑(yᵢ - f(xᵢ))² = ∑(yᵢ - ŷᵢ)²

where
  yᵢ is an observed value
  ŷᵢ is a value predicted by the regression model
public sumOfSquaresResidual ( ) : number
Returns number

sumOfSquaresTotal() Public Method

The sum, over all observations, of the squared differences of each observation from the overall mean. https://en.wikipedia.org/wiki/Total_sum_of_squares

For simple linear regression:
  SStot = ∑(yᵢ - ȳ)²

For regression through a point:
  SStot = ∑yᵢ²
public sumOfSquaresTotal ( ) : number
Returns number

tProbability() Public Method

  t probability = Student's t CDF(t, ν)

where:
  t = t value
  ν = n - p - α degrees of freedom
  α = 1 if the regression includes a constant term, 0 otherwise
public tProbability ( ) : array
Returns array [m => p, b => p]

tValues() Public Method

  t = β / se(β)

where:
  β     = regression parameter (coefficient)
  se(β) = standard error of the regression parameter (coefficient)
public tValues ( ) : array
Returns array [m => t, b => t]
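
A sketch of a per-coefficient significance test on the hypothetical Linear fit; treating tProbability() as the Student's t CDF, the doubled complement shown here is one way to form a two-sided p-value, and it assumes a positive t value:

```php
<?php
use MathPHP\Statistics\Regression\Linear;

$points     = [[1, 2], [2, 3], [4, 5], [5, 7], [6, 8]];
$regression = new Linear($points);

// t = β / se(β) for each coefficient.
$t = $regression->tValues();  // [m => t for slope, b => t for intercept]

// tProbability() is documented as the Student's t CDF; a two-sided
// significance p-value for the slope would be the doubled complement
// (assuming a positive t value).
$cdf    = $regression->tProbability();
$pSlope = 2 * (1 - $cdf['m']);
```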