This is an **archive** for **R-sig-geo**, not a forum. It is not possible to post through Nabble - you may not start a new thread nor follow up an existing thread. If you wish to post, but are not subscribed to the list through the list homepage, subscribe first **(through the list homepage, not via Nabble)** and post from your subscribed email address. Until 2015-06-20, subscribers could post through Nabble, but the policy was changed because too many non-subscribers misunderstood the interface.

### Re: raster: stackApply problems..

Unfortunately the names are not always in ascending order. This is the

result of my data.

names : index_4, index_5, index_6, index_7, index_1, index_2, index_3

min values : 3, 3, 3, 3, 3, 3, 3

max values : 307.0, 297.5, 311.0, 313.0, 468.0, 290.0, 302.0

Worst of all, the names do not properly match the indices.

If I run it with clusterR then the result is different:

names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7

min values : 3, 3, 3, 3, 3, 3, 3

max values : 307.0, 297.5, 311.0, 313.0, 468.0, 290.0, 302.0

The solution is to reorder the layers of the stack so that the

stackApply indices are in ascending order e.g. 1,1,1,2,2,2,3,3,3 ...

The indices of my data were like this:

4 5 6 7 1 2 3 4 5 6 7 1 2 3 5 6 7
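A minimal base-R sketch of that reordering (first cycle of the index vector above shown; `s` is a hypothetical RasterStack, and `order()` gives the permutation that sorts the indices ascending):

```r
# Out-of-order stackApply indices, as in the data above (first cycle shown)
indices <- c(4, 5, 6, 7, 1, 2, 3)

# Permutation that puts the indices in ascending order
ord <- order(indices)

indices_sorted <- indices[ord]
print(indices_sorted)  # 1 2 3 4 5 6 7

# With a RasterStack 's' (hypothetical), apply the same permutation to the
# layers before calling stackApply, so layers and indices stay aligned:
#   s_sorted <- s[[ord]]
#   res <- raster::stackApply(s_sorted, indices_sorted, fun = mean)
```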

I've reported this behavior here

https://github.com/rspatial/raster/issues/82

On 11/20/19 3:05 PM, Ben Tupper wrote:

> Hi,

>

> That certainly is unexpected to have two different naming styles.

> It's not really a solution to take to the bank, but you could simply

> compose your own names assuming that the layer orders are always

> returned in ascending index order.

> Would that work for you?

>

> ### start

> library(raster)

>

> # Compute layer names for stackApply output

> #

> # @param index numeric, 1-based layer indices used for stackApply function

> # @param prefix character, prefix for names

> # @return character, layer names in index order

> layer_names <- function(index = c(2,2,3,3,1,1), prefix = c("layer.",

> "index_")[1]){

> paste0(prefix, sort(unique(index)))

> }

>

> indices <- c(2,2,3,3,1,1)

>

> r <- raster()

> values(r) <- 1

> # simple sequential stack from 1 to 6 in all cells

> s <- stack(r, r*2, r*3, r*4, r*5, r*6)

> s

>

> beginCluster(2)

> res <- clusterR(s, stackApply, args = list(indices=indices, fun = mean))

> raster::endCluster()

> names(res) <- layer_names(indices, prefix = "foobar.")

> res

>

> res2 <- stackApply(s, indices, mean)

> names(res2) <- layer_names(indices, prefix = "foobar.")

> res2

> ### end

>

>

> On Wed, Nov 20, 2019 at 1:36 AM Leonidas Liakos via R-sig-Geo

> <[hidden email]> wrote:

>> This is not a reasonable solution. It is not efficient to run stackApply

>> twice to get the right names. Each execution can take hours.

>>

>>

>> Στις 20/11/2019 3:30 π.μ., ο Frederico Faleiro έγραψε:

>>> Hi Leonidas,

>>>

>>> both results are in the same order, but the names are different.

>>> You can rename the first as in the second:

>>> names(res) <- names(res2)

>>>

>>> I provided an example to help you understand the logic.

>>>

>>> library(raster)

>>> beginCluster(2)

>>> r <- raster()

>>> values(r) <- 1

>>> # simple sequential stack from 1 to 6 in all cells

>>> s <- stack(r, r*2, r*3, r*4, r*5, r*6)

>>> s

>>> res <- clusterR(s, stackApply, args = list(indices=c(2,2,3,3,1,1), fun =

>>> mean))

>>> res

>>> res2 <- stackApply(s, c(2,2,3,3,1,1), mean)

>>> res2

>>> dif <- res - res2

>>> # exactly the same order because the difference is zero for all layers

>>> dif

>>> # rename

>>> names(res) <- names(res2)

>>>

>>> Best regards,

>>>

>>> Frederico Faleiro

>>>

>>> On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via R-sig-Geo <

>>> [hidden email]> wrote:

>>>

>>>> I ran the example with clusterR:

>>>>

>>>> no_cores <- parallel::detectCores() -1

>>>> raster::beginCluster(no_cores)

>>>> res <- raster::clusterR(inp, raster::stackApply, args =

>>>> list(indices=c(2,2,3,3,1,1),fun = mean))

>>>> raster::endCluster()

>>>>

>>>> And the result is:

>>>>

>>>>> res

>>>> class      : RasterBrick

>>>> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

>>>> resolution : 1, 1 (x, y)

>>>> extent     : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

>>>> crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

>>>> source     : memory

>>>> names      : layer.1, layer.2, layer.3

>>>> min values : 1.5, 3.5, 5.5

>>>> max values : 1.5, 3.5, 5.5

>>>>

>>>>

>>>> layer.1, layer.2, layer.3 (?)

>>>>

>>>> So what corresponds to what?

>>>>

>>>>

>>>> If I run:

>>>>

>>>> res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)

>>>>

>>>> The result is:

>>>>

>>>>> res2

>>>> class : RasterBrick

>>>> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

>>>> resolution : 1, 1 (x, y)

>>>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

>>>> crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

>>>> source : memory

>>>> names : index_2, index_3, index_1

>>>> min values : 1.5, 3.5, 5.5

>>>> max values : 1.5, 3.5, 5.5

>>>>

>>>> The output layer names are inconsistent, and in the clusterR case their

>>>> correspondence with the indices is obscure.

>>>>

>>>>

>>>> [[alternative HTML version deleted]]

>>>>

>>>> _______________________________________________

>>>> R-sig-Geo mailing list

>>>> [hidden email]

>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>>>

>>

>> --

>> Λιάκος Λεωνίδας, Γεωγράφος

>> https://www.geographer.gr

>> PGP fingerprint: 5237 83F8 E46C D91A 9FBB C7E7 F943 C9B6 8231 0937

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>

>

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: raster: stackApply problems..

Hi,

That certainly is unexpected to have two different naming styles.

It's not really a solution to take to the bank, but you could simply

compose your own names assuming that the layer orders are always

returned in ascending index order.

Would that work for you?

### start

library(raster)

# Compute layer names for stackApply output

#

# @param index numeric, 1-based layer indices used for stackApply function

# @param prefix character, prefix for names

# @return character, layer names in index order

layer_names <- function(index = c(2,2,3,3,1,1), prefix = c("layer.",

"index_")[1]){

paste0(prefix, sort(unique(index)))

}

indices <- c(2,2,3,3,1,1)

r <- raster()

values(r) <- 1

# simple sequential stack from 1 to 6 in all cells

s <- stack(r, r*2, r*3, r*4, r*5, r*6)

s

beginCluster(2)

res <- clusterR(s, stackApply, args = list(indices=indices, fun = mean))

raster::endCluster()

names(res) <- layer_names(indices, prefix = "foobar.")

res

res2 <- stackApply(s, indices, mean)

names(res2) <- layer_names(indices, prefix = "foobar.")

res2

### end

On Wed, Nov 20, 2019 at 1:36 AM Leonidas Liakos via R-sig-Geo

<[hidden email]> wrote:

>

> This is not a reasonable solution. It is not efficient to run stackApply

> twice to get the right names. Each execution can take hours.

>

>

> Στις 20/11/2019 3:30 π.μ., ο Frederico Faleiro έγραψε:

> > Hi Leonidas,

> >

> > both results are in the same order, but the names are different.

> > You can rename the first as in the second:

> > names(res) <- names(res2)

> >

> > I provided an example to help you understand the logic.

> >

> > library(raster)

> > beginCluster(2)

> > r <- raster()

> > values(r) <- 1

> > # simple sequential stack from 1 to 6 in all cells

> > s <- stack(r, r*2, r*3, r*4, r*5, r*6)

> > s

> > res <- clusterR(s, stackApply, args = list(indices=c(2,2,3,3,1,1), fun =

> > mean))

> > res

> > res2 <- stackApply(s, c(2,2,3,3,1,1), mean)

> > res2

> > dif <- res - res2

> > # exactly the same order because the difference is zero for all layers

> > dif

> > # rename

> > names(res) <- names(res2)

> >

> > Best regards,

> >

> > Frederico Faleiro

> >

> > On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via R-sig-Geo <

> > [hidden email]> wrote:

> >

> >> I ran the example with clusterR:

> >>

> >> no_cores <- parallel::detectCores() -1

> >> raster::beginCluster(no_cores)

> >> res <- raster::clusterR(inp, raster::stackApply, args =

> >> list(indices=c(2,2,3,3,1,1),fun = mean))

> >> raster::endCluster()

> >>

> >> And the result is:

> >>

> >>> res

> >> class      : RasterBrick

> >> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

> >> resolution : 1, 1 (x, y)

> >> extent     : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

> >> crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

> >> source     : memory

> >> names      : layer.1, layer.2, layer.3

> >> min values : 1.5, 3.5, 5.5

> >> max values : 1.5, 3.5, 5.5

> >>

> >>

> >> layer.1, layer.2, layer.3 (?)

> >>

> >> So what corresponds to what?

> >>

> >>

> >> If I run:

> >>

> >> res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)

> >>

> >> The result is:

> >>

> >>> res2

> >> class : RasterBrick

> >> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

> >> resolution : 1, 1 (x, y)

> >> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

> >> crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

> >> source : memory

> >> names : index_2, index_3, index_1

> >> min values : 1.5, 3.5, 5.5

> >> max values : 1.5, 3.5, 5.5

> >>

> >> The output layer names are inconsistent, and in the clusterR case their

> >> correspondence with the indices is obscure.

> >>

> >>

> >> [[alternative HTML version deleted]]

> >>

> >> _______________________________________________

> >> R-sig-Geo mailing list

> >> [hidden email]

> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> >>

> >

>

>

> --

> Λιάκος Λεωνίδας, Γεωγράφος

> https://www.geographer.gr

> PGP fingerprint: 5237 83F8 E46C D91A 9FBB C7E7 F943 C9B6 8231 0937

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

--

Ben Tupper

Bigelow Laboratory for Ocean Science

West Boothbay Harbor, Maine

http://www.bigelow.org/

https://eco.bigelow.org



### Re: raster: stackApply problems..

This is not a reasonable solution. It is not efficient to run stackApply

twice to get the right names. Each execution can take hours.

Στις 20/11/2019 3:30 π.μ., ο Frederico Faleiro έγραψε:

> Hi Leonidas,

>

> both results are in the same order, but the names are different.

> You can rename the first as in the second:

> names(res) <- names(res2)

>

> I provided an example to help you understand the logic.

>

> library(raster)

> beginCluster(2)

> r <- raster()

> values(r) <- 1

> # simple sequential stack from 1 to 6 in all cells

> s <- stack(r, r*2, r*3, r*4, r*5, r*6)

> s

> res <- clusterR(s, stackApply, args = list(indices=c(2,2,3,3,1,1), fun =

> mean))

> res

> res2 <- stackApply(s, c(2,2,3,3,1,1), mean)

> res2

> dif <- res - res2

> # exactly the same order because the difference is zero for all layers

> dif

> # rename

> names(res) <- names(res2)

>

> Best regards,

>

> Frederico Faleiro

>

> On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via R-sig-Geo <

> [hidden email]> wrote:

>

>> I ran the example with clusterR:

>>

>> no_cores <- parallel::detectCores() -1

>> raster::beginCluster(no_cores)

>> res <- raster::clusterR(inp, raster::stackApply, args =

>> list(indices=c(2,2,3,3,1,1),fun = mean))

>> raster::endCluster()

>>

>> And the result is:

>>

>>> res

>> class      : RasterBrick

>> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

>> resolution : 1, 1 (x, y)

>> extent     : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

>> crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

>> source     : memory

>> names      : layer.1, layer.2, layer.3

>> min values : 1.5, 3.5, 5.5

>> max values : 1.5, 3.5, 5.5

>>

>>

>> layer.1, layer.2, layer.3 (?)

>>

>> So what corresponds to what?

>>

>>

>> If I run:

>>

>> res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)

>>

>> The result is:

>>

>>> res2

>> class : RasterBrick

>> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

>> resolution : 1, 1 (x, y)

>> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

>> crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

>> source : memory

>> names : index_2, index_3, index_1

>> min values : 1.5, 3.5, 5.5

>> max values : 1.5, 3.5, 5.5

>>

>> The output layer names are inconsistent, and in the clusterR case their

>> correspondence with the indices is obscure.

>>

>>

>> [[alternative HTML version deleted]]

>>

>> _______________________________________________

>> R-sig-Geo mailing list

>> [hidden email]

>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>>

>

--

Λιάκος Λεωνίδας, Γεωγράφος

https://www.geographer.gr

PGP fingerprint: 5237 83F8 E46C D91A 9FBB C7E7 F943 C9B6 8231 0937



### Re: raster: stackApply problems..

Hi Leonidas,

both results are in the same order, but the names are different.

You can rename the first as in the second:

names(res) <- names(res2)

I provided an example to help you understand the logic.

library(raster)

beginCluster(2)

r <- raster()

values(r) <- 1

# simple sequential stack from 1 to 6 in all cells

s <- stack(r, r*2, r*3, r*4, r*5, r*6)

s

res <- clusterR(s, stackApply, args = list(indices=c(2,2,3,3,1,1), fun =

mean))

res

res2 <- stackApply(s, c(2,2,3,3,1,1), mean)

res2

dif <- res - res2

# exactly the same order because the difference is zero for all layers

dif

# rename

names(res) <- names(res2)

Best regards,

Frederico Faleiro

On Tue, Nov 19, 2019 at 4:15 PM Leonidas Liakos via R-sig-Geo <

[hidden email]> wrote:

> I ran the example with clusterR:

>

> no_cores <- parallel::detectCores() -1

> raster::beginCluster(no_cores)

> res <- raster::clusterR(inp, raster::stackApply, args =

> list(indices=c(2,2,3,3,1,1),fun = mean))

> raster::endCluster()

>

> And the result is:

>

> > res

> class      : RasterBrick

> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

> resolution : 1, 1 (x, y)

> extent     : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

> crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

> source     : memory

> names      : layer.1, layer.2, layer.3

> min values : 1.5, 3.5, 5.5

> max values : 1.5, 3.5, 5.5

>

>

> layer.1, layer.2, layer.3 (?)

>

> So what corresponds to what?

>

>

> If I run:

>

> res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)

>

> The result is:

>

> > res2

> class : RasterBrick

> dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

> resolution : 1, 1 (x, y)

> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

> crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

> source : memory

> names : index_2, index_3, index_1

> min values : 1.5, 3.5, 5.5

> max values : 1.5, 3.5, 5.5

>

> The output layer names are inconsistent, and in the clusterR case their

> correspondence with the indices is obscure.

>

>

> [[alternative HTML version deleted]]

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

>



### Re: raster: stackApply problems..

I ran the example with clusterR:

no_cores <- parallel::detectCores() -1

raster::beginCluster(no_cores)

res <- raster::clusterR(inp, raster::stackApply, args =

list(indices=c(2,2,3,3,1,1),fun = mean))

raster::endCluster()

And the result is:

> res

class      : RasterBrick

dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

resolution : 1, 1 (x, y)

extent     : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

crs        : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

source     : memory

names      : layer.1, layer.2, layer.3

min values : 1.5, 3.5, 5.5

max values : 1.5, 3.5, 5.5

layer.1, layer.2, layer.3 (?)

So what corresponds to what?

If I run:

res2 <- stackApply(inp,c(2,2,3,3,1,1),mean)

The result is:

> res2

class : RasterBrick

dimensions : 180, 360, 64800, 3 (nrow, ncol, ncell, nlayers)

resolution : 1, 1 (x, y)

extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)

crs : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

source : memory

names : index_2, index_3, index_1

min values : 1.5, 3.5, 5.5

max values : 1.5, 3.5, 5.5

The output names are not consistent, and in the clusterR case their correspondence with the indices is obscure.
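As noted elsewhere in the thread, a workaround is to reorder the layers so that the indices passed to stackApply are in ascending order; output layer k then corresponds unambiguously to index k. A minimal sketch (the toy stack `inp` below is hypothetical, mirroring the example above):

```r
library(raster)

# Toy input: six constant layers holding the values 1..6
r   <- raster(nrows = 18, ncols = 36)
inp <- stack(lapply(1:6, function(i) setValues(r, i)))

idx <- c(2, 2, 3, 3, 1, 1)

# Reorder the layers so the indices become 1,1,2,2,3,3;
# output layer k then summarises exactly the layers with index k
ord <- order(idx)
res <- stackApply(inp[[ord]], idx[ord], fun = mean)

names(res)  # expected: "index_1" "index_2" "index_3", now in index order
```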



### Re: Fwd: Highlight an area using symbol

On Mon, 18 Nov 2019, Raman Mishra wrote:

> Raman Mishra

> Senior Research Scholar

> International Institute for Population Sciences, Mumbai

> Mobile: +91-8879317106

>

>

> ---------- Forwarded message ---------

> From: Raman Mishra <[hidden email]>

> Date: Mon, Nov 18, 2019 at 12:19 PM

> Subject: Highlight an area using symbol

> To: <[hidden email]>

>

>

> My data have a missing value for one district; I gave that district a 0

> value and did spatial analysis (univariate, regression, etc.).

>

> I tried using NA instead of the 0 value for that district, but most of the

> code available online does not work with NA, and it would not be ethical to

> delete that area from my map.

>

> I want to highlight that district using symbols.

>

> Is it possible, after the maps are created, to highlight that

> district using some function?

>

> Please provide a solution as I am new to R.

library(sf)

nc <- st_read(system.file("shape/nc.shp", package="sf"))

library(tmap)

tm_shape(nc) + tm_fill("SID74")

set.seed(1)

is.na(nc$SID74) <- sample(1:nrow(nc), 5)

tm_shape(nc) + tm_fill("SID74")

tm_fill() denotes NA values automatically, and provides the easiest

solution. You could also add a symbol, but I don't think this is

necessary. tmap handles the older sp objects too, but new vector data work

should use sf, not sp.
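If a symbol is wanted anyway, one possibility is to overlay a marker on the NA features only. A sketch building on the nc example above (the shape and colour choices are just illustrative):

```r
library(sf)
library(tmap)

nc <- st_read(system.file("shape/nc.shp", package = "sf"))
set.seed(1)
is.na(nc$SID74) <- sample(1:nrow(nc), 5)

# Base choropleth, then a cross symbol on each county whose SID74 is NA
tm_shape(nc) +
  tm_fill("SID74") +
  tm_shape(nc[is.na(nc$SID74), ]) +
  tm_symbols(shape = 4, col = "red", size = 0.3)
```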

Hope this helps,

Roger

>

> Thanks

> Raman Mishra

> Senior Research Scholar

> International Institute for Population Sciences, Mumbai

> Mobile: +91-8879317106

>


--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en




### Fwd: Highlight an area using symbol

Raman Mishra

Senior Research Scholar

International Institute for Population Sciences, Mumbai

Mobile: +91-8879317106

---------- Forwarded message ---------

From: Raman Mishra <[hidden email]>

Date: Mon, Nov 18, 2019 at 12:19 PM

Subject: Highlight an area using symbol

To: <[hidden email]>

My data have a missing value for one district; I gave that district a 0

value and did spatial analysis (univariate, regression, etc.).

I tried using NA instead of the 0 value for that district, but most of the

code available online does not work with NA, and it would not be ethical to

delete that area from my map.

I want to highlight that district using symbols.

Is it possible, after the maps are created, to highlight that

district using some function?

Please provide a solution as I am new to R.

Thanks

Raman Mishra

Senior Research Scholar

International Institute for Population Sciences, Mumbai

Mobile: +91-8879317106



### Re: Spatial Autocorrelation Estimation Method

Dear Roger,

Thank you for your message and sorry for my late answer.

Regarding the number of listings (lettings) for my data set (2.216.642 observations), each listing contains an individual id:

unique ids: 180.004

time periods: 54 (2015-01 to 2019-09)

number of ids that appear only once: 28.486 (of 180.004 ids) (15,8%)

number of ids that appear/repeat 2-10 times: 82.641 (of 180.004 ids) (45,9%)

number of ids that appear/repeat 11-30 times: 46.465 (of 180.004 ids) (25,8%)

number of ids that appear/repeat 31-54 times: 22.412 (of 180.004 ids) (12,5%)

Important to note is that hosts can change the room_category (between entire home/apt, private room and shared room) while keeping the same listing id number. In my data, 7.204 unique ids changed the room_type at some point.

--

For the OLS model, I was using only a fixed effect model, where each time period (date_compiled) (54 in total) is a time dummy.

plm::plm(formula = model, data = listings, model = "pooling", index = c("id", "date_compiled"))

--

Osland et al. (2016) (https://doi.org/10.1111/jors.12281) use a spatial fixed effects (SFE) hedonic model, where each defined neighborhood zone in the study area is represented by dummy variables.

Dong et al. (2015) (https://doi.org/10.1111/gean.12049) outline four model specifications to accommodate geographically hierarchical data structures: (1) groupwise W and fixed regional effects; (2) groupwise W and random regional effects; (3) proximity-based W and fixed regional effects; and (4) proximity-based W and random regional effects.

--

I created a new column/variable containing the borough where the zipcode is found (Manhattan, Brooklyn, Queens, Bronx, Staten Island).

If I understood it right, the (two-level) Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) considers the occurrence of spatial relations at the (lower) individual (geographical coordinates - in my case, the listing location) and (higher) group level (territorial units - in my case, zipcodes).

According to Bivand et al. (2017): "(...) W is a spatial weights matrix. The HSAR model may also be estimated without this component.". So, in this case I only estimate the Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) in a "one-level" basis, i.e., at the higher-level.

HSAR::hsar(model, data = listings, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)

(Where the "model" formula contains the 54 time dummy variables)

Do you think I can proceed with this model? I was able to calculate it.

If I remove all observations/rows with an NA in one of the chosen variables, 884.183 observations remain. If I were to create a W matrix for HSAR::hsar, I would have a gigantic 884.183 by 884.183 matrix. This is the reason why I set W = NULL.
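A non-spatial baseline along the lines Roger suggested earlier in the thread (an IID zipcode random effect via lme4::lmer) could look like the sketch below; the variable names are taken from the formulas above, and the data frame `listings` is assumed to hold one row per listing-period:

```r
library(lme4)

# Hedonic baseline: time dummies as fixed effects, zipcode as an
# IID random intercept (no spatial structure yet)
m_lmm <- lmer(log_price ~ guests_included + minimum_nights +
                factor(date_compiled) + (1 | zipcode),
              data = listings)

# Compare the fixed-effect estimates with plain lm(); if they are
# similar, proceed to add an MRF random effect at the zipcode level
# (e.g. with mgcv::gam or hglm::hglm)
summary(m_lmm)
```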

Thank you and best regards

________________________________________

From: Roger Bivand <[hidden email]>

Sent: Monday, November 11, 2019 11:31

To: Robert R

Cc: [hidden email]

Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

On Sun, 10 Nov 2019, Robert R wrote:

> Dear Roger,

>

> Again, thank you for your answer. I read the material provided and

> decided that Hierarchical Spatial Autoregressive (HSAR) could be the

> right model for me.

>

> I indeed have the precise latitude and longitude information for all my

> listings for NYC.

>

> I created a stratified sample (group = zipcode) with 22172 (1%) of my

> observations called listings_sample and tried to replicate the hsar

> model, please see below.

>

> For now W = NULL, because otherwise I would have a 22172 x 22172 matrix.

Unless you know definitely that you want to relate the response to its

lagged value, you do not need this. Do note that the matrix is very

sparse, so could be fitted without difficulty with ML in a cross-sectional

model.

>

> You recommended then to introduce a Markov random field (MRF) random

> effect (RE) at the zipcode level, but I did not understand it so well.

> Could you develop a litte more?

>

Did you read the development in

https://doi.org/10.1016/j.spasta.2017.01.002? It is explained there, and

includes code for fitting the Beijing housing parcels data set from HSAR

with many other packages (MCMC, INLA, hglm, etc.). I guess that you should

try to create a model that works on a single borough, using the zipcodes

in that borough as a proxy for unobserved neighbourhood effects. Try for

example using lme4::lmer() with only a zipcode IID random effect, see if

the hedonic estimates are similar to lm(), and leave adding an MRF RE

(with for example mgcv::gam() or hglm::hglm()) until you have a working

testbed. Then advance step-by-step from there.

You still have not said how many repeat lettings you see - it will affect

the way you specify your model.

Roger

> ##############

> library(spdep)

> library(HSAR)

> library(dplyr)

> library(splitstackshape)

>

>

> # Stratified sample per zipcode (size = 1%) listings_sample <-

> splitstackshape::stratified(indt = listings, group = "zipcode", size =

> 0.01)

>

> # Removing zipcodes from polygon_nyc which are not observable in

> listings_sample polygon_nyc_listings <- polygon_nyc %>% filter(zipcode

> %in% c(unique(as.character(listings_sample$zipcode))))

>

>

> ## Random effect matrix (N by J)

>

> # N: 22172

> # J: 154

>

> # Arrange listings_sample by zipcode (ascending)

> listings_sample <- listings_sample %>% arrange(zipcode)

>

> # Count number of listings per zipcode

> MM <- listings_sample %>% st_drop_geometry() %>% group_by(zipcode) %>% summarise(count = n()) %>% as.data.frame()

> # sum(MM$count)

>

> # N by J nulled matrix creation

> Delta <- matrix(data = 0, nrow = nrow(listings_sample), ncol = dim(MM)[1])

>

> # The total number of neighbourhood

> Uid <- rep(c(1:dim(MM)[1]), MM[,2])

>

> for(i in 1:dim(MM)[1]) {

> Delta[Uid==i,i] <- 1

> }

> rm(i)

>

> Delta <- as(Delta,"dgCMatrix")

>

>

> ## Higher-level spatial weights matrix or neighbourhood matrix (J by J)

>

> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)

> polygon_nyc_nb <- poly2nb(polygon_nyc_listings, row.names = polygon_nyc$zipcode, queen = TRUE)

>

> # Include neighbour itself as a neighbour

> polygon_nyc_nb <- include.self(polygon_nyc_nb)

>

> # Spatial weights matrix for nb

> polygon_nyc_nb_matrix <- nb2mat(neighbours = polygon_nyc_nb, style = "W", zero.policy = NULL)

> M <- as(polygon_nyc_nb_matrix,"dgCMatrix")

>

>

> ## Fit HSAR SAR upper level random effect

> model <- as.formula(log_price ~ guests_included + minimum_nights)

>

> betas = coef(lm(formula = model, data = listings_sample))

> pars = list(rho = 0.5, lambda = 0.5, sigma2e = 2.0, sigma2u = 2.0, betas = betas)

>

> m_hsar <- hsar(model, data = listings_sample, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)

>

> ##############

>

> Thank you and best regards

> Robert

>

> ________________________________________

> From: Roger Bivand <[hidden email]>

> Sent: Friday, November 8, 2019 13:29

> To: Robert R

> Cc: [hidden email]

> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>

> On Fri, 8 Nov 2019, Robert R wrote:

>

>> Dear Roger,

>>

>> Thank you for your answer.

>>

>> I successfully used the function nb2blocknb() for a smaller dataset.

>>

>> But for a dataset of over 2 million observations, I get the following

>> error: "Error: cannot allocate vector of size 840 Kb".

>

> I don't think the observations are helpful. If you have repeat lets in the

> same property in a given month, you need to handle that anyway. I'd go for

> making the modelling exercise work (we agree that this is not panel data,

> right?) on a small subset first. I would further argue that you need a

> multi-level approach rather than spdep::nb2blocknb(), with a zipcode IID

> RE. You could very well take (stratified) samples per zipcode to represent

> your data. Once that works, introduce an MRF RE at the zipcode level,

> where you do know relative position. Using SARAR is going to be a waste of

> time unless you can geocode the letting addresses. A multi-level approach

> will work. Having big data in your case with no useful location

> information per observation is just adding noise and over-smoothing, I'm

> afraid. The approach used in https://doi.org/10.1016/j.spasta.2017.01.002

> will work, also when you sample the within zipcode lets, given a split

> into training and test sets, and making CV possible.

>

> Roger

>

>>

>> I am expecting that at least 500.000 observations will be dropped due to

>> the lack of values for the chosen variables for the regression model, so

>> probably I will filter and remove the observations/rows that will not be

>> used anyway - do you know if there is any package that does this

>> automatically, given the variables/columns chosen by me?

>>

>> Or would you recommend me another approach to avoid the above mentioned

>> error?

>>

>> Thank you and best regards,

>> Robert

>>

>> ________________________________________

>> From: Roger Bivand <[hidden email]>

>> Sent: Thursday, November 7, 2019 10:13

>> To: Robert R

>> Cc: [hidden email]

>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>

>> On Thu, 7 Nov 2019, Robert R wrote:

>>

>>> Dear Roger,

>>>

>>> Many thanks for your help.

>>>

>>> I have an additional question:

>>>

>>> Is it possible to create a "separate" lw (nb2listw) (with different

>>> rownumbers) from my data set? For now, I am taking my data set and

>>> merging with the sf object polygon_nyc with the function

>>> "merge(polygon_nyc, listings, by=c("zipcode" = "zipcode"))", so I create

>>> a huge n x n matrix (depending of the size of my data set).

>>>

>>> Taking the polygon_nyc alone and turning it to a lw (weights list)

>>> object has only n = 177.

>>>

>>> Of course running

>>>

>>> spatialreg::lagsarlm(formula=model, data = listings_sample,

>>> spatialreg::polygon_nyc_lw, tol.solve=1.0e-10)

>>>

>>> does not work ("Input data and weights have different dimensions").

>>>

>>> The only option is to take my data set, merge it to my polygon_nyc (by

>>> zipcode) and then create the weights list lw? Or there another option?

>>

>> I think we are getting more clarity. You do not know the location of the

>> lettings beyond their zipcode. You do know the boundaries of the zipcode

>> areas, and can create a neighbour object from these boundaries. You then

>> want to treat all the lettings in a zipcode area i as neighbours, and

>> additionally lettings in zipcode areas neighbouring i as neighbours of

>> lettings in i. This is the data structure that motivated the

>> spdep::nb2blocknb() function:

>>

>> https://r-spatial.github.io/spdep/reference/nb2blocknb.html

>>

>> Try running the examples to get a feel for what is going on.

>>

>> I feel that most of the variability will vanish in the very large numbers

>> of neighbours, over-smoothing the outcomes. If you do not have locations

>> for the lettings themselves, I don't think you can make much progress.

>>

>> You could try a linear mixed model (or gam with a spatially structured

>> random effect) with a temporal and a spatial random effect. See the HSAR

>> package, articles by Dong et al., and maybe

>> https://doi.org/10.1016/j.spasta.2017.01.002 for another survey. Neither

>> this nor Dong et al. handle spatio-temporal settings. MRF spatial random

>> effects at the zipcode level might be a way forward, together with an IID

>> random effect at the same level (equivalent to self-neighbours).

>>

>> Hope this helps,

>>

>> Roger

>>

>>>

>>> Best regards,

>>> Robert

>>>

>>> ________________________________________

>>> From: Roger Bivand <[hidden email]>

>>> Sent: Wednesday, November 6, 2019 15:07

>>> To: Robert R

>>> Cc: [hidden email]

>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>>

>>> On Tue, 5 Nov 2019, Robert R wrote:

>>>

>>>> Dear Roger,

>>>>

>>>> Thank you for your reply. I disabled HTML; my e-mails should be now in

>>>> plain text.

>>>>

>>>> I will give a better context for my desired outcome.

>>>>

>>>> I am taking Airbnb's listings information for New York City available

>>>> on: http://insideairbnb.com/get-the-data.html

>>>>

>>>> I save every listings.csv.gz file available for NYC (2015-01 to 2019-09)

>>>> - in total, 54 files/time periods - as a YYYY-MM-DD.csv file into a

>>>> Listings/ folder. When importing all these 54 files into one single data

>>>> set, I create a new "date_compiled" variable/column.

>>>>

>>>> In total, after the data cleansing process, I have a little more than 2

>>>> million observations.

>>>

>>> You have repeat lettings for some, but not all properties. So this is at

>>> best a very unbalanced panel. For those properties with repeats, you may

>>> see temporal movement (trend/seasonal).

>>>

>>> I suggest (strongly) taking a single borough or even zipcode with some

>>> hundreds of properties, and working from there. Do not include the

>>> observation as its own neighbour, perhaps identify repeats and handle them

>>> specially (create or use a property ID). Unbalanced panels may also create

>>> a selection bias issue (why are some properties only listed sometimes?).

>>>

>>> So this, although promising, isn't simple, and getting to a hedonic model

>>> may be hard, but not (just) because of spatial autocorrelation. I wouldn't

>>> necessarily trust OLS output either, partly because of the repeat property

>>> issue.

>>>

>>> Roger

>>>

>>>>

>>>> I created 54 timedummy variables for each time period available.

>>>>

>>>> I want to estimate using a hedonic spatial timedummy model the impact of

>>>> a variety of characteristics which potentially determine the daily rate

>>>> on Airbnb listings through time in New York City (e.g. characteristics

>>>> of the listing as number of bedrooms, if the host if professional,

>>>> proximity to downtown (New York City Hall) and nearest subway station

>>>> from the listing, income per capita, etc.).

>>>>

>>>> My dependent variable is price (log price, common in the related

>>>> literature for hedonic prices).

>>>>

>>>> The OLS model is done.

>>>>

>>>> For the spatial model, I am assuming that hosts, when deciding the

>>>> pricing of their listings, take not only into account its structural and

>>>> location characteristics, but also the prices charged by near listings

>>>> with similar characteristics - spatial autocorrelation is then present,

>>>> at least spatial dependence is present in the dependent variable.

>>>>

>>>> As I wrote in my previous post, I was willing to consider the neighbor

>>>> itself as a neighbor.

>>>>

>>>> Parts of my code can be found below:

>>>>

>>>> ########

>>>>

>>>> ## packages

>>>>

>>>> packages_install <- function(packages){

>>>> new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]

>>>> if (length(new.packages))

>>>> install.packages(new.packages, dependencies = TRUE)

>>>> sapply(packages, require, character.only = TRUE)

>>>> }

>>>>

>>>> packages_required <- c("bookdown", "cowplot", "data.table", "dplyr", "e1071", "fastDummies", "ggplot2", "ggrepel", "janitor", "kableExtra", "knitr", "lubridate", "nngeo", "plm", "RColorBrewer", "readxl", "scales", "sf", "spdep", "stargazer", "tidyverse")

>>>> packages_install(packages_required)

>>>>

>>>> # Working directory

>>>> setwd("C:/Users/User/R")

>>>>

>>>>

>>>>

>>>> ## shapefile_us

>>>>

>>>> # Shapefile zips import and Coordinate Reference System (CRS) transformation

>>>> # Shapefile download: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip

>>>> shapefile_us <- sf::st_read(dsn = "Shapefile", layer = "cb_2018_us_zcta510_500k")

>>>>

>>>> # Columns removal

>>>> shapefile_us <- shapefile_us %>% select(-c(AFFGEOID10, GEOID10, ALAND10, AWATER10))

>>>>

>>>> # Column rename: ZCTA5CE10

>>>> setnames(shapefile_us, old=c("ZCTA5CE10"), new=c("zipcode"))

>>>>

>>>> # Column class change: zipcode

>>>> shapefile_us$zipcode <- as.character(shapefile_us$zipcode)

>>>>

>>>>

>>>>

>>>> ## polygon_nyc

>>>>

>>>> # Zip code not available in shapefile: 11695

>>>> polygon_nyc <- shapefile_us %>% filter(zipcode %in% zips_nyc)

>>>>

>>>>

>>>>

>>>> ## weight_matrix

>>>>

>>>> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)

>>>> polygon_nyc_nb <- poly2nb((polygon_nyc %>% select(-borough)), queen=TRUE)

>>>>

>>>> # Include neighbour itself as a neighbour

>>>> # for(i in 1:length(polygon_nyc_nb)){polygon_nyc_nb[[i]]=as.integer(c(i,polygon_nyc_nb[[i]]))}

>>>> polygon_nyc_nb <- include.self(polygon_nyc_nb)

>>>>

>>>> # Weights to each neighboring polygon

>>>> lw <- nb2listw(neighbours = polygon_nyc_nb, style="W", zero.policy=TRUE)

>>>>

>>>>

>>>>

>>>> ## listings

>>>>

>>>> # Data import

>>>> files <- list.files(path="Listings/", pattern=".csv", full.names=TRUE)

>>>> listings <- setNames(lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE, encoding="UTF-8")), files)

>>>> listings <- mapply(cbind, listings, date_compiled = names(listings))

>>>> listings <- listings %>% bind_rows

>>>>

>>>> # Characters removal

>>>> listings$date_compiled <- gsub("Listings/", "", listings$date_compiled)

>>>> listings$date_compiled <- gsub(".csv", "", listings$date_compiled)

>>>> listings$price <- gsub("\\$", "", listings$price)

>>>> listings$price <- gsub(",", "", listings$price)

>>>>

>>>>

>>>>

>>>> ## timedummy

>>>>

>>>> timedummy <- sapply("date_compiled_", paste, unique(listings$date_compiled), sep="")

>>>> timedummy <- paste(timedummy, sep = "", collapse = " + ")

>>>> timedummy <- gsub("-", "_", timedummy)

>>>>

>>>>

>>>>

>>>> ## OLS regression

>>>>

>>>> # Pooled cross-section data - Randomly sampled cross sections of Airbnb listings price at different points in time

>>>> regression <- plm(formula=as.formula(paste("log_price ~ #some variables", timedummy, sep = "", collapse = " + ")), data=listings, model="pooling", index="id")

>>>>

>>>> ########

>>>>

>>>> Some of my id's repeat in multiple time periods.

>>>>

>>>> I use NYC's zip codes to left join my data with the neighborhood zip code specific characteristics, such as income per capita to that specific zip code, etc.

>>>>

>>>> Now I want to apply the hedonic model with the timedummy variables.

>>>>

>>>> Do you know how to proceed? 1) Which package to use (spdep/splm)?; 2) Do I have to join the polygon_nyc (by zip code) to my listings data set, and then calculate the weight matrix "lw"?

>>>>

>>>> Again, thank you very much for the help provided until now.

>>>>

>>>> Best regards,

>>>> Robert

>>>>

>>>> ________________________________________

>>>> From: Roger Bivand <[hidden email]>

>>>> Sent: Tuesday, November 5, 2019 15:30

>>>> To: Robert R

>>>> Cc: [hidden email]

>>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>>>

>>>> On Tue, 5 Nov 2019, Robert R wrote:

>>>>

>>>>> I have a large pooled cross-section data set. I would like to

>>>>> estimate/regress using spatial autocorrelation methods. I am assuming

>>>>> for now that spatial dependence is present in both the dependent

>>>>> variable and the error term. My data set is over a period of 4 years,

>>>>> monthly data (54 periods). To this end, I've created a time dummy

>>>>> variable for each time period. I also created a weight matrix using the

>>>>> functions "poly2nb" and "nb2listw". Now I am trying to figure out a way

>>>>> to estimate my model, which contains a really big data set. Basically, my

>>>>> model is as follows: y = δD + ρW1y + Xβ + λW2u + ε. My questions are: 1)

>>>>> My spatial weight matrix for the whole data set will probably be an

>>>>> enormous matrix with submatrices for each time period itself. I don't

>>>>> think it would be possible to calculate this. What I would like to know

>>>>> is a way to estimate each time dummy/period separately (to compare

>>>>> different periods alone). How to do it? 2) Which package to use: spdep

>>>>> or splm? Thank you and best regards, Robert

>>>>

>>>> Please do not post HTML, only plain text. Almost certainly your model

>>>> specification is wrong (SARAR/SAC is always a bad idea if alternatives are

>>>> untried). What is your cross-sectional size? Using sparse kronecker

>>>> products, the "enormous" matrix may not be very big. Does it make any

>>>> sense using time dummies (54 x N x T will be mostly zero anyway)? Are most

>>>> of the covariates time-varying? Please provide motivation and use area

>>>> (preferably with affiliation (your email and user name are not

>>>> informative) - this feels like a real estate problem, probably wrongly

>>>> specified. You should use splm if time make sense in your case, but if it

>>>> really doesn't, simplify your approach, as much of the data will be

>>>> subject to very large temporal autocorrelation.

>>>>

>>>> If this is a continuation of your previous question about using

>>>> self-neighbours, be aware that you should not use self-neighbours in

>>>> modelling, they are only useful for the Getis-Ord local G_i^* measure.

>>>>

>>>> Roger

>>>>

>>>>>


--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Thank you for your message and sorry for my late answer.

Regarding the number of listings (lettings) for my data set (2,216,642 observations), each listing contains an individual id:

unique ids: 180,004

time periods: 54 (2015-01 to 2019-09)

number of ids that appear only once: 28,486 of 180,004 ids (15.8%)

number of ids that appear 2-10 times: 82,641 of 180,004 ids (45.9%)

number of ids that appear 11-30 times: 46,465 of 180,004 ids (25.8%)

number of ids that appear 31-54 times: 22,412 of 180,004 ids (12.5%)

It is important to note that hosts can change the room category (between entire home/apt, private room and shared room) while keeping the same listing id number. In my data, the number of unique ids that at some point changed the room_type is 7,204.

--

For the OLS model, I was using only a fixed effect model, where each time period (date_compiled) (54 in total) is a time dummy.

plm::plm(formula = model, data = listings, model = "pooling", index = c("id", "date_compiled"))

--

Osland et al. (2016) (https://doi.org/10.1111/jors.12281) use a spatial fixed effects (SFE) hedonic model, where each defined neighborhood zone in the study area is represented by dummy variables.

Dong et al. (2015) (https://doi.org/10.1111/gean.12049) outline four model specifications to accommodate geographically hierarchical data structures: (1) groupwise W and fixed regional effects; (2) groupwise W and random regional effects; (3) proximity-based W and fixed regional effects; and (4) proximity-based W and random regional effects.

--

I created a new column/variable containing the borough where the zipcode is found (Manhattan, Brooklyn, Queens, Bronx, Staten Island).

If I understood it right, the (two-level) Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) considers the occurrence of spatial relations at the (lower) individual (geographical coordinates - in my case, the listing location) and (higher) group level (territorial units - in my case, zipcodes).

According to Bivand et al. (2017): "(...) W is a spatial weights matrix. The HSAR model may also be estimated without this component.". So, in this case I only estimate the Hierarchical Spatial Simultaneous Autoregressive Model (HSAR) on a "one-level" basis, i.e., at the higher level.

HSAR::hsar(model, data = listings, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)

(Where the "model" formula contains the 54 time dummy variables)

Do you think I can proceed with this model? I was able to calculate it.

If I remove all observations/rows with NAs in any of the chosen variables, 884,183 observations remain. If I were to create a W matrix for HSAR::hsar, I would have a gigantic 884,183 by 884,183 matrix. This is the reason why I set W = NULL.
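For scale, the W = NULL choice is easy to justify with a back-of-envelope check, and if a lower-level W were ever needed, only a sparse specification would be realistic. A hypothetical sketch follows; the `coords` matrix of listing coordinates and k = 10 are assumptions, not values from this thread:

```r
library(spdep)
library(Matrix)

# Dense storage for an 884,183 x 884,183 matrix of doubles, in terabytes:
884183^2 * 8 / 1024^4  # several terabytes - clearly infeasible

# A k-nearest-neighbour W kept sparse holds only k entries per row instead.
# `coords` (a 2-column matrix of listing coordinates) is hypothetical here.
knn <- knearneigh(coords, k = 10)  # k nearest neighbours per listing
nb  <- knn2nb(knn)                 # convert to an nb object
lw  <- nb2listw(nb, style = "W")   # row-standardised weights list
W   <- as(lw, "CsparseMatrix")     # sparse dgCMatrix, as hsar() expects
```

The sparse form stores roughly k non-zero entries per row, so even at the full n it stays well under a gigabyte.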

Thank you and best regards

________________________________________

From: Roger Bivand <[hidden email]>

Sent: Monday, November 11, 2019 11:31

To: Robert R

Cc: [hidden email]

Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

On Sun, 10 Nov 2019, Robert R wrote:

> Dear Roger,

>

> Again, thank you for your answer. I read the material provided and

> decided that Hierarchical Spatial Autoregressive (HSAR) could be the

> right model for me.

>

> I indeed have the precise latitude and longitude information for all my

> listings for NYC.

>

> I created a stratified sample (group = zipcode) with 22172 (1%) of my

> observations called listings_sample and tried to replicate the hsar

> model, please see below.

>

> For now W = NULL, because otherwise I would have a 22172 x 22172 matrix.

Unless you know definitely that you want to relate the response to its

lagged value, you do not need this. Do note that the matrix is very

sparse, so could be fitted without difficulty with ML in a cross-sectional

model.

>

> You recommended then to introduce a Markov random field (MRF) random

> effect (RE) at the zipcode level, but I did not understand it so well.

> Could you elaborate a little more?

>

Did you read the development in

https://doi.org/10.1016/j.spasta.2017.01.002? It is explained there, and

includes code for fitting the Beijing housing parcels data set from HSAR

with many other packages (MCMC, INLA, hglm, etc.). I guess that you should

try to create a model that works on a single borough, using the zipcodes

in that borough as a proxy for unobserved neighbourhood effects. Try for

example using lme4::lmer() with only a zipcode IID random effect, see if

the hedonic estimates are similar to lm(), and leave adding an MRF RE

(with for example mgcv::gam() or hglm::hglm()) until you have a working

testbed. Then advance step-by-step from there.
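That suggested first step might look like the following sketch; the formula and object names are borrowed from the hsar() call elsewhere in the thread, and are assumptions rather than the full model:

```r
# Testbed: zipcode as an IID random effect, compared against plain lm()
library(lme4)

m_lm  <- lm(log_price ~ guests_included + minimum_nights,
            data = listings_sample)
m_mlm <- lmer(log_price ~ guests_included + minimum_nights + (1 | zipcode),
              data = listings_sample)

# If the hedonic (fixed-effect) estimates are similar, move on to adding
# an MRF random effect with mgcv::gam() or hglm::hglm()
cbind(lm = coef(m_lm), lmer = fixef(m_mlm))
```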

You still have not said how many repeat lettings you see - it will affect

the way you specify your model.

Roger

> ##############

> library(spdep)

> library(HSAR)

> library(dplyr)

> library(splitstackshape)

>

>

> # Stratified sample per zipcode (size = 1%) listings_sample <-

> splitstackshape::stratified(indt = listings, group = "zipcode", size =

> 0.01)

>

> # Removing zipcodes from polygon_nyc which are not observable in

> listings_sample polygon_nyc_listings <- polygon_nyc %>% filter(zipcode

> %in% c(unique(as.character(listings_sample$zipcode))))

>

>

> ## Random effect matrix (N by J)

>

> # N: 22172

> # J: 154

>

> # Arrange listings_sample by zipcode (ascending)

> listings_sample <- listings_sample %>% arrange(zipcode)

>

> # Count number of listings per zipcode

> MM <- listings_sample %>% st_drop_geometry() %>% group_by(zipcode) %>% summarise(count = n()) %>% as.data.frame()

> # sum(MM$count)

>

> # N by J nulled matrix creation

> Delta <- matrix(data = 0, nrow = nrow(listings_sample), ncol = dim(MM)[1])

>

> # Zipcode (group) index for each listing

> Uid <- rep(c(1:dim(MM)[1]), MM[,2])

>

> for(i in 1:dim(MM)[1]) {

> Delta[Uid==i,i] <- 1

> }

> rm(i)

>

> Delta <- as(Delta,"dgCMatrix")
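As an aside, the Delta loop above can be written in one step as a sparse indicator matrix; a sketch assuming, as in the code above, that listings_sample is already sorted by zipcode:

```r
# Sparse N x J matrix linking each listing (row) to its zipcode (column)
library(Matrix)

zip   <- factor(listings_sample$zipcode)
Delta <- sparseMatrix(i = seq_along(zip), j = as.integer(zip), x = 1,
                      dims = c(length(zip), nlevels(zip)))  # dgCMatrix
```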

>

>

> ## Higher-level spatial weights matrix or neighbourhood matrix (J by J)

>

> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)

> polygon_nyc_nb <- poly2nb(polygon_nyc_listings, row.names = polygon_nyc$zipcode, queen = TRUE)

>

> # Include neighbour itself as a neighbour

> polygon_nyc_nb <- include.self(polygon_nyc_nb)

>

> # Spatial weights matrix for nb

> polygon_nyc_nb_matrix <- nb2mat(neighbours = polygon_nyc_nb, style = "W", zero.policy = NULL)

> M <- as(polygon_nyc_nb_matrix,"dgCMatrix")

>

>

> ## Fit HSAR SAR upper level random effect

> model <- as.formula(log_price ~ guests_included + minimum_nights)

>

> betas = coef(lm(formula = model, data = listings_sample))

> pars = list(rho = 0.5, lambda = 0.5, sigma2e = 2.0, sigma2u = 2.0, betas = betas)

>

> m_hsar <- hsar(model, data = listings_sample, W = NULL, M = M, Delta = Delta, burnin = 5000, Nsim = 10000, thinning = 1, parameters.start = pars)

>

> ##############

>

> Thank you and best regards

> Robert

>

> ________________________________________

> From: Roger Bivand <[hidden email]>

> Sent: Friday, November 8, 2019 13:29

> To: Robert R

> Cc: [hidden email]

> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>

> On Fri, 8 Nov 2019, Robert R wrote:

>

>> Dear Roger,

>>

>> Thank you for your answer.

>>

>> I successfully used the function nb2blocknb() for a smaller dataset.

>>

>> But for a dataset of over 2 million observations, I get the following

>> error: "Error: cannot allocate vector of size 840 Kb".

>

> I don't think the observations are helpful. If you have repeat lets in the

> same property in a given month, you need to handle that anyway. I'd go for

> making the modelling exercise work (we agree that this is not panel data,

> right?) on a small subset first. I would further argue that you need a

> multi-level approach rather than spdep::nb2blocknb(), with a zipcode IID

> RE. You could very well take (stratified) samples per zipcode to represent

> your data. Once that works, introduce an MRF RE at the zipcode level,

> where you do know relative position. Using SARAR is going to be a waste of

> time unless you can geocode the letting addresses. A multi-level approach

> will work. Having big data in your case with no useful location

> information per observation is just adding noise and over-smoothing, I'm

> afraid. The approach used in https://doi.org/10.1016/j.spasta.2017.01.002

> will work, also when you sample the within zipcode lets, given a split

> into training and test sets, and making CV possible.

>

> Roger

>

>>

>> I am expecting that at least 500,000 observations will be dropped due to

>> the lack of values for the chosen variables for the regression model, so

>> probably I will filter and remove the observations/rows that will not be

>> used anyway - do you know if there is any package that does this

>> automatically, given the variables/columns chosen by me?

>>

>> Or would you recommend me another approach to avoid the above mentioned

>> error?

>>

>> Thank you and best regards,

>> Robert

>>

>> ________________________________________

>> From: Roger Bivand <[hidden email]>

>> Sent: Thursday, November 7, 2019 10:13

>> To: Robert R

>> Cc: [hidden email]

>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>

>> On Thu, 7 Nov 2019, Robert R wrote:

>>

>>> Dear Roger,

>>>

>>> Many thanks for your help.

>>>

>>> I have an additional question:

>>>

>>> Is it possible to create a "separate" lw (nb2listw) (with different

>>> rownumbers) from my data set? For now, I am taking my data set and

>>> merging with the sf object polygon_nyc with the function

>>> "merge(polygon_nyc, listings, by=c("zipcode" = "zipcode"))", so I create

>>> a huge n x n matrix (depending of the size of my data set).

>>>

>>> Taking the polygon_nyc alone and turning it to a lw (weights list)

>>> object has only n = 177.

>>>

>>> Of course running

>>>

>>> spatialreg::lagsarlm(formula=model, data = listings_sample,

>>> spatialreg::polygon_nyc_lw, tol.solve=1.0e-10)

>>>

>>> does not work ("Input data and weights have different dimensions").

>>>

>>> The only option is to take my data set, merge it to my polygon_nyc (by

>>> zipcode) and then create the weights list lw? Or is there another option?

>>

>> I think we are getting more clarity. You do not know the location of the

>> lettings beyond their zipcode. You do know the boundaries of the zipcode

>> areas, and can create a neighbour object from these boundaries. You then

>> want to treat all the lettings in a zipcode area i as neighbours, and

>> additionally lettings in zipcode areas neighbouring i as neighbours of

>> lettings in i. This is the data structure that motivated the

>> spdep::nb2blocknb() function:

>>

>> https://r-spatial.github.io/spdep/reference/nb2blocknb.html

>>

>> Try running the examples to get a feel for what is going on.
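A minimal sketch of that structure, reusing object names from earlier in the thread (note the block ids must match the region ids of the zipcode neighbour object):

```r
# Expand zipcode-level contiguity to listing-level neighbours
library(spdep)

zip_nb <- poly2nb(polygon_nyc, row.names = polygon_nyc$zipcode)
blks   <- as.character(listings$zipcode)  # one block id per listing

# Each listing becomes a neighbour of all listings in its own zipcode and
# of all listings in contiguous zipcodes
listing_nb <- nb2blocknb(zip_nb, blks)
```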

>>

>> I feel that most of the variability will vanish in the very large numbers

>> of neighbours, over-smoothing the outcomes. If you do not have locations

>> for the lettings themselves, I don't think you can make much progress.

>>

>> You could try a linear mixed model (or gam with a spatially structured

>> random effect) with a temporal and a spatial random effect. See the HSAR

>> package, articles by Dong et al., and maybe

>> https://doi.org/10.1016/j.spasta.2017.01.002 for another survey. Neither

>> this nor Dong et al. handle spatio-temporal settings. MRF spatial random

>> effects at the zipcode level might be a way forward, together with an IID

>> random effect at the same level (equivalent to self-neighbours).

>>

>> Hope this helps,

>>

>> Roger

>>

>>>

>>> Best regards,

>>> Robert

>>>

>>> ________________________________________

>>> From: Roger Bivand <[hidden email]>

>>> Sent: Wednesday, November 6, 2019 15:07

>>> To: Robert R

>>> Cc: [hidden email]

>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>>

>>> On Tue, 5 Nov 2019, Robert R wrote:

>>>

>>>> Dear Roger,

>>>>

>>>> Thank you for your reply. I disabled HTML; my e-mails should be now in

>>>> plain text.

>>>>

>>>> I will give a better context for my desired outcome.

>>>>

>>>> I am taking Airbnb's listings information for New York City available

>>>> on: http://insideairbnb.com/get-the-data.html

>>>>

>>>> I save every listings.csv.gz file available for NYC (2015-01 to 2019-09)

>>>> - in total, 54 files/time periods - as a YYYY-MM-DD.csv file into a

>>>> Listings/ folder. When importing all these 54 files into one single data

>>>> set, I create a new "date_compiled" variable/column.

>>>>

>>>> In total, after the data cleansing process, I have a little more than

>>>> 2 million observations.

>>>

>>> You have repeat lettings for some, but not all properties. So this is at

>>> best a very unbalanced panel. For those properties with repeats, you may

>>> see temporal movement (trend/seasonal).

>>>

>>> I suggest (strongly) taking a single borough or even zipcode with some

>>> hundreds of properties, and working from there. Do not include the

>>> observation as its own neighbour, perhaps identify repeats and handle them

>>> specially (create or use a property ID). Unbalanced panels may also create

>>> a selection bias issue (why are some properties only listed sometimes?).

>>>

>>> So this, although promising, isn't simple, and getting to a hedonic model

>>> may be hard, but not (just) because of spatial autocorrelation. I wouldn't

>>> necessarily trust OLS output either, partly because of the repeat property

>>> issue.

>>>

>>> Roger

>>>

>>>>

>>>> I created 54 timedummy variables for each time period available.

>>>>

>>>> I want to estimate using a hedonic spatial timedummy model the impact of

>>>> a variety of characteristics which potentially determine the daily rate

>>>> on Airbnb listings through time in New York City (e.g. characteristics

>>>> of the listing as number of bedrooms, if the host if professional,

>>>> proximity to downtown (New York City Hall) and nearest subway station

>>>> from the listing, income per capita, etc.).

>>>>

>>>> My dependent variable is price (log price, common in the related

>>>> literature for hedonic prices).

>>>>

>>>> The OLS model is done.

>>>>

>>>> For the spatial model, I am assuming that hosts, when deciding the

>>>> pricing of their listings, take not only into account its structural and

>>>> location characteristics, but also the prices charged by near listings

>>>> with similar characteristics - spatial autocorrelation is then present,

>>>> at least spatial dependence is present in the dependent variable.

>>>>

>>>> As I wrote in my previous post, I was willing to consider the neighbor

>>>> itself as a neighbor.

>>>>

>>>> Parts of my code can be found below:

>>>>

>>>> ########

>>>>

>>>> ## packages

>>>>

>>>> packages_install <- function(packages){

>>>> new.packages <- packages[!(packages %in% installed.packages()[, "Package"])]

>>>> if (length(new.packages))

>>>> install.packages(new.packages, dependencies = TRUE)

>>>> sapply(packages, require, character.only = TRUE)

>>>> }

>>>>

>>>> packages_required <- c("bookdown", "cowplot", "data.table", "dplyr", "e1071", "fastDummies", "ggplot2", "ggrepel", "janitor", "kableExtra", "knitr", "lubridate", "nngeo", "plm", "RColorBrewer", "readxl", "scales", "sf", "spdep", "stargazer", "tidyverse")

>>>> packages_install(packages_required)

>>>>

>>>> # Working directory

>>>> setwd("C:/Users/User/R")

>>>>

>>>>

>>>>

>>>> ## shapefile_us

>>>>

>>>> # Shapefile zips import and Coordinate Reference System (CRS) transformation

>>>> # Shapefile download: https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_zcta510_500k.zip

>>>> shapefile_us <- sf::st_read(dsn = "Shapefile", layer = "cb_2018_us_zcta510_500k")

>>>>

>>>> # Columns removal

>>>> shapefile_us <- shapefile_us %>% select(-c(AFFGEOID10, GEOID10, ALAND10, AWATER10))

>>>>

>>>> # Column rename: ZCTA5CE10

>>>> setnames(shapefile_us, old=c("ZCTA5CE10"), new=c("zipcode"))

>>>>

>>>> # Column class change: zipcode

>>>> shapefile_us$zipcode <- as.character(shapefile_us$zipcode)

>>>>

>>>>

>>>>

>>>> ## polygon_nyc

>>>>

>>>> # Zip code not available in shapefile: 11695

>>>> polygon_nyc <- shapefile_us %>% filter(zipcode %in% zips_nyc)

>>>>

>>>>

>>>>

>>>> ## weight_matrix

>>>>

>>>> # Neighboring polygons: list of neighbors for each polygon (queen contiguity neighbors)

>>>> polygon_nyc_nb <- poly2nb((polygon_nyc %>% select(-borough)), queen=TRUE)

>>>>

>>>> # Include neighbour itself as a neighbour

>>>> # for(i in 1:length(polygon_nyc_nb)){polygon_nyc_nb[[i]]=as.integer(c(i,polygon_nyc_nb[[i]]))}

>>>> polygon_nyc_nb <- include.self(polygon_nyc_nb)

>>>>

>>>> # Weights to each neighboring polygon

>>>> lw <- nb2listw(neighbours = polygon_nyc_nb, style="W", zero.policy=TRUE)

>>>>

>>>>

>>>>

>>>> ## listings

>>>>

>>>> # Data import

>>>> files <- list.files(path="Listings/", pattern=".csv", full.names=TRUE)

>>>> listings <- setNames(lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE, encoding="UTF-8")), files)

>>>> listings <- mapply(cbind, listings, date_compiled = names(listings))

>>>> listings <- listings %>% bind_rows

>>>>

>>>> # Characters removal

>>>> listings$date_compiled <- gsub("Listings/", "", listings$date_compiled)

>>>> listings$date_compiled <- gsub(".csv", "", listings$date_compiled)

>>>> listings$price <- gsub("\\$", "", listings$price)

>>>> listings$price <- gsub(",", "", listings$price)

>>>>

>>>>

>>>>

>>>> ## timedummy

>>>>

>>>> timedummy <- sapply("date_compiled_", paste, unique(listings$date_compiled), sep="")

>>>> timedummy <- paste(timedummy, sep = "", collapse = " + ")

>>>> timedummy <- gsub("-", "_", timedummy)

>>>>

>>>>

>>>>

>>>> ## OLS regression

>>>>

>>>> # Pooled cross-section data - Randomly sampled cross sections of Airbnb listings price at different points in time

>>>> regression <- plm(formula=as.formula(paste("log_price ~ #some variables", timedummy, sep = "", collapse = " + ")), data=listings, model="pooling", index="id")

>>>>

>>>> ########

>>>>

>>>> Some of my ids repeat in multiple time periods.

>>>>

>>>> I use NYC's zip codes to left join my data with the neighborhood zip code specific characteristics, such as income per capita to that specific zip code, etc.

>>>>

>>>> Now I want to apply the hedonic model with the timedummy variables.

>>>>

>>>> Do you know how to proceed? 1) Which package to use (spdep/splm)?; 2) Do I have to join the polygon_nyc (by zip code) to my listings data set, and then calculate the weight matrix "lw"?

>>>>

>>>> Again, thank you very much for the help provided until now.

>>>>

>>>> Best regards,

>>>> Robert

>>>>

>>>> ________________________________________

>>>> From: Roger Bivand <[hidden email]>

>>>> Sent: Tuesday, November 5, 2019 15:30

>>>> To: Robert R

>>>> Cc: [hidden email]

>>>> Subject: Re: [R-sig-Geo] Spatial Autocorrelation Estimation Method

>>>>

>>>> On Tue, 5 Nov 2019, Robert R wrote:

>>>>

>>>>> I have a large pooled cross-section data set. I would like to

>>>>> estimate/regress using spatial autocorrelation methods. I am assuming

>>>>> for now that spatial dependence is present in both the dependent

>>>>> variable and the error term. My data set is over a period of 4 years,

>>>>> monthly data (54 periods). To this end, I've created a time dummy

>>>>> variable for each time period. I also created a weight matrix using the

>>>>> functions "poly2nb" and "nb2listw". Now I am trying to figure out a way

>>>>> to estimate my model, which contains a really big data set. Basically, my

>>>>> model is as follows: y = δD + ρW1y + Xβ + λW2u + ε. My questions are: 1)

>>>>> My spatial weight matrix for the whole data set will probably be an

>>>>> enormous matrix with submatrices for each time period itself. I don't

>>>>> think it would be possible to calculate this. What I would like to know

>>>>> is a way to estimate each time dummy/period separately (to compare

>>>>> different periods alone). How to do it? 2) Which package to use: spdep

>>>>> or splm? Thank you and best regards, Robert

>>>>

>>>> Please do not post HTML, only plain text. Almost certainly your model

>>>> specification is wrong (SARAR/SAC is always a bad idea if alternatives are

>>>> untried). What is your cross-sectional size? Using sparse kronecker

>>>> products, the "enormous" matrix may not be very big. Does it make any

>>>> sense using time dummies (54 x N x T will be mostly zero anyway)? Are most

>>>> of the covariates time-varying? Please provide motivation and use area

>>>> (preferably with affiliation; your email and user name are not

>>>> informative) - this feels like a real estate problem, probably wrongly

>>>> specified. You should use splm if time makes sense in your case, but if it

>>>> really doesn't, simplify your approach, as much of the data will be

>>>> subject to very large temporal autocorrelation.

>>>>

>>>> If this is a continuation of your previous question about using

>>>> self-neighbours, be aware that you should not use self-neighbours in

>>>> modelling, they are only useful for the Getis-Ord local G_i^* measure.

>>>>

>>>> Roger

>>>>



--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


### Re: sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

The development version of rgdal on R-Forge is now at rev 894, and is now

ready for trying out with PROJ6/GDAL3 workflows, and workflows that may

migrate within 6 months to modern CRS representations. The motivating RFC

is also updated to cover coordinate operations, the use of prepared

(pre-searched) coordinate operations, and should be read carefully by

anyone using rgdal::spTransform(). Note further that rgdal::project() will

not be adapted for PROJ6, and is effectively deprecated.
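Under the proposal, the WKT2 representation travels as a comment() on sp CRS objects while the PROJ string stays in place for backward compatibility. A hedged sketch against the development sp/rgdal described here (behaviour may still change before release):

```r
# Inspect both CRS representations (development sp/rgdal assumed)
library(sp)
library(rgdal)

p <- CRS("+proj=longlat +datum=WGS84")
slot(p, "projargs")  # legacy PROJ string; unchanged workflows keep using this
cat(comment(p))      # WKT2 string, if the new mechanism has attached one
```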

I'll be running reverse dependency checks, and may be bugging package

maintainers. I would really prefer that maintainers of packages using

spTransform() checked themselves and joined this thread or the associated

twitter thread: https://twitter.com/RogerBivand/status/1194586193108914177

Be ready for modern PROJ and GDAL, they are already being deployed across

open source geospatial software, like GRASS, QGIS, pyproj, spatialite etc.

Waiting, hopefully not in vain, for contributions.

Roger

On Wed, 13 Nov 2019, Roger Bivand wrote:

> And this link explains the CDN proposal for grid distribution:

>

> https://www.spatialys.com/en/crowdfunding/

>

> Roger

>

> On Wed, 13 Nov 2019, Roger Bivand wrote:

>

>> Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings

>> (representations of coordinate reference systems) are handled, steps are

>> being taken to find ways to adapt sp/rgdal workflows. A current proposal

>> is to store the WKT2_2018 string as a comment to CRS objects as defined in

>> the sp package.

>>

>> A draft development-in-progress version of rgdal is available at

>> https://r-forge.r-project.org/R/?group_id=884, and for sp at

>> https://github.com/rsbivand/sp (this version of sp requires rgdal >=

>> 1.5-1). This adds the WKT comments to CRS objects on reading vector and

>> raster data sources, and uses WKT comments if found when writing vector

>> and raster objects (or at least does as far as I've checked, possibly

>> fragile).

>>

>> An RFC with tersely worked cases for using CRS object comments to carry

>> WKT strings but maintaining full backward compatibility is online at

>> http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.

>>

>> If you have other ideas or concerns about trying to use this mechanism for

>> sp CRS objects, please contribute at your earliest convenience.

>>

>> http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the

>> beginning of the next step, to query transformation operations to find

>> viable coordinate operation pipelines.

>>

>> I'm assuming that the previous behaviour (transform without considering

>> accuracy with whatever is to hand) is not viable going forward, and that

>> we will need two steps: list coordinate operations between source and

>> target CRS (using the WKT comments as better specifications than the PROJ

>> strings), possibly intervene manually to install missing grids, then

>> undertake the coordinate operation.

>>

>> The fallback may be simply to choose the least inaccurate available

>> coordinate operation, but this should be a fallback. This means that all

>> uses of spTransform() will require intervention.

>>

>> Is this OK (it is tiresome but modernises workflows once), or is it not OK

>> (no user intervention is crucial)?

>>

>> These behaviours may be set in an option, so that package maintainers and

>> users may delay modernisation, but all are undoubtedly served by rapid

>> adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS

>> development versions all state that they list candidate coordinate

>> operations).

>>

>> We cannot ship all the grids, they are very bulky, and probably nobody

>> needs sub-metre accuracy world-wide. Work in PROJ is starting to create a

>> content delivery network for trusted download and mechanisms for

>> registering downloaded grids on user platforms. We would for example not

>> want Windows users of rgdal and sf to have to download the same grid

>> twice.

>>

>> Comments welcome here and at

>> https://github.com/r-spatial/discuss/issues/28 or

>> https://github.com/r-spatial/sf/issues/1187

>>

>> Roger

>>

>>

>

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en


Roger Bivand

Department of Economics

Norwegian School of Economics

Helleveien 30

N-5045 Bergen, Norway

ready for trying out with PROJ6/GDAL3 workflows, and workflows that may

migrate within 6 months to modern CRS representations. The motivating RFC

is also updated to cover coordinate operations, the use of prepared

(pre-searched) coordinate operations, and should be read carefully by

anyone using rgdal::spTransform(). Note further that rgdal::project() will

not be adapted for PROJ6, and is effectively deprecated.

I'll be running reverse dependency checks, and may be bugging package

maintainers. I would really prefer that mainainers of packages using

spTransform() checked themselves and joined this thread or the associated

twitter thread: https://twitter.com/RogerBivand/status/1194586193108914177

Be ready for modern PROJ and GDAL, they are already being deployed across

open source geospatial software, like GRASS, QGIS, pyproj, spatialite etc.

Waiting, hopefully not in vain, for contributions.

Roger

On Wed, 13 Nov 2019, Roger Bivand wrote:

> And this link explains the CDN proposal for grid distribution:

>

> https://www.spatialys.com/en/crowdfunding/

>

> Roger

>

> On Wed, 13 Nov 2019, Roger Bivand wrote:

>

>> Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings

>> (representations of coordinate reference systems) are handled, steps are

>> being taken to find ways to adapt sp/rgdal workflows. A current proposal

>> is to store the WKT2_2018 string as a comment to CRS objects as defined in

>> the sp package.

>>

>> A draft development-in-progress version of rgdal is available at

>> https://r-forge.r-project.org/R/?group_id=884, and for sp at

>> https://github.com/rsbivand/sp (this version of sp requires rgdal >=

>> 1.5-1). This adds the WKT comments to CRS objects on reading vector and

>> raster data sources, and uses WKT comments if found when writing vector

>> and raster objects (or at least does as far as I've checked, possibly

>> fragile).

>>

>> An RFC with tersely worked cases for using CRS object comments to carry

>> WKT strings but maintaining full backward compatibility is online at

>> http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.

>>

>> If you have other ideas or concerns about trying to use this mechanism for

>> sp CRS objects, please contribute at your earliest convenience.

>>

>> http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the

>> beginning of the next step, to query transformation operations to find

>> viable coordinate operation pipelines.

>>

>> I'm assuming that the previous behaviour (transform without considering

>> accuracy with whatever is to hand) is not viable going forward, and that

>> we will need two steps: list coordinate operations between source and

>> target CRS (using the WKT comments as better specifications than the PROJ

>> strings), possibly intervene manually to install missing grids, then

>> undertake the coordinate operation.

>>

>> The fallback may be simply to choose the least inaccurate available

>> coordinate operation, but this should be a fallback. This means that all

>> uses of spTransform() will require intervention.

>>

>> Is this OK (it is tiresome but modernises workflows once), or is it not OK

>> (no user intervention is crucial)?

>>

>> These behaviours may be set in an option, so that package maintainers and

>> users may delay modernisation, but all are undoubtedly served by rapid

>> adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS

>> development versions all state that they list candidate coordinate

>> operations).

>>

>> We cannot ship all the grids, they are very bulky, and probably nobody

>> needs sub-metre accuracy world-wide. Work in PROJ is starting to create a

>> content delivery network for trusted download and mechanisms for

>> registering downloaded grids on user platforms. We would for example not

>> want Windows users of rgdal and sf to have to download the same grid

>> twice.

>>

>> Comments welcome here and at

>> https://github.com/r-spatial/discuss/issues/28 or

>> https://github.com/r-spatial/sf/issues/1187

>>

>> Roger

>>

>>

>

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Calculating weighted values of precipitation annually

Hi there,

I am currently trying to calculate "weighted" spatial annual global values for precipitation using the object "RCP1pctCO2Mean", which is a raster brick. I need to do this for each of the 138 years (i.e. 138 layers) of global precipitation data that I have. The idea would be to apply a weight to each grid cell value for each year using the cosine of its latitude, which means grid cells at the equator would have a weight of 1 (the cosine of 0 degrees is 1) and grid cells at the poles would have a weight of 0 (the cosine of 90 degrees is 0). The idea would then be to average the newly weighted grid cell values for each year, creating 138 weighted averages, and make a time series of these values from Year 1 to Year 138. The object "RCP1pctCO2Mean" looks like this:

class : RasterBrick

dimensions : 64, 128, 8192, 140 (nrow, ncol, ncell, nlayers)

resolution : 2.8125, 2.789327 (x, y)

extent : -181.4062, 178.5938, -89.25846, 89.25846 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

data source : C:/Users/Travis/Documents/Other documents/All netCDF files/netcdffiles/MaxPrecIPSLIPSL-CM5B-LR1pctCO2.nc

names : X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, ...

z-value : 1, 140 (min, max)

varname : onedaymax

Here is what I have done so far:

newraster <- rasterToPoints(RCP1pctCO2Mean) #Not sure if this is necessary?

I then proceeded to assign weights like this:

weight <- cos(newraster*(pi/180))

However, this yields strange, near-identical values (all between 0.97 and 0.99, which is odd) across each grid cell for each layer. I am not sure what I am doing incorrectly (if anything) - could it be that the "pi/180" is not necessary? Also, once this is done, how do I convert back to a raster stack with the new values?

I also saw another function, called "getWeights", but I am not sure how relevant this is. I am not sure about its associated package, but I was thinking about using it like this

weight <- getWeights(newraster, f = (newraster) cos(newraster))

Would this approach apply the weights appropriately?

I would greatly appreciate any help with this! Thanks,
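The key point the question is circling is that the cosine must be taken of the cell *latitudes*, not of the precipitation values themselves (which is why `cos(newraster*(pi/180))` produces near-constant values). A minimal sketch of one way to do this with the raster package, using a small made-up brick in place of `RCP1pctCO2Mean` (object names and dimensions here are illustrative, not the poster's actual data):

```r
library(raster)

# Toy stand-in for RCP1pctCO2Mean: 3 layers of random "precipitation"
b <- brick(replicate(3, raster(matrix(runif(64 * 128), 64, 128),
                               xmn = -180, xmx = 180, ymn = -90, ymx = 90)))

# Latitude of every cell centre, as a raster layer
lat <- init(b, "y")

# Weights: cos(latitude in radians); ~1 at the equator, ~0 at the poles
w <- cos(lat * pi / 180)

# Weighted mean per layer: sum(w * x) / sum(w), one value per year
wmeans <- cellStats(b * w, "sum") / cellStats(w, "sum")
```

`b * w` stays a raster brick throughout, so there is no need for `rasterToPoints()` and no conversion back; `wmeans` is a plain numeric vector ready for a time-series plot.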



### Averaging values across grid cells for each layer of a raster brick object

Hi there,

I am trying to average precipitation values across grid cells of a raster (which is masked to only account for land areas). This raster brick object, called "overlap.sub", has 138 layers (years). It was created as follows:

World_land <- readOGR("ne_110m_land.shp")

newprojection1 <- spTransform(World_land, CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

overlap <- crop(RCP1pctCO2Median, extent(newprojection1))

overlap.sub <- mask(overlap, newprojection1) #This isolates grid cells on all land areas

The object "overlap.sub" has the following attributes:

class : RasterBrick

dimensions : 62, 127, 7874, 138 (nrow, ncol, ncell, nlayers)

resolution : 2.8125, 2.789327 (x, y)

extent : -178.5938, 178.5938, -89.25846, 83.67981 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

data source : C:/Users/Travis/AppData/Local/Temp/Rtmp0GALJn/raster/r_tmp_2019-11-14_201408_9732_60407.grd

names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7, layer.8, layer.9, layer.10, layer.11, layer.12, layer.13, layer.14, layer.15, ...

min values : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730, 0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 0.51857087, 0.41005131, 0.45956529, 0.47497867, ...

max values : 87.85833, 86.73710, 88.92577, 76.44161, 82.43909, 85.03188, 77.36673, 75.86527, 94.32226, 101.58247, 86.67574, 90.96802, 99.59785, 76.78394, 88.31423, ...

Now, to obtain the annual averages across the masked grid cells, so that I may plot a time series of the average precipitation across grid cells for Year 1, then for Year 2, then Year 3....all the way to Year 138 (essentially 138 averages):

Averageprec <- colMeans(overlap.sub)

However, this yields this error:

Error in colMeans(overlap.sub) :

'x' must be an array of at least two dimensions

I do believe that colMeans is the appropriate function for this, but why am I receiving this error? Unless there is another way to do this? Also, this would only average those grid cells that I masked (i.e. only the land areas), correct? Thanks, and any help/feedback would be greatly appreciated!!
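The error arises because `colMeans()` expects a matrix or array, while `overlap.sub` is a RasterBrick. A sketch of two equivalent ways to get one mean per layer, assuming the poster's `overlap.sub` object; both skip the NA cells that `mask()` created over the ocean:

```r
library(raster)

# Per-layer mean over the non-NA (land) cells: 138 values, one per year
Averageprec <- cellStats(overlap.sub, stat = "mean", na.rm = TRUE)

# Equivalently, pull the values out as a (ncell x nlayers) matrix first,
# at which point colMeans() does work:
# Averageprec <- colMeans(getValues(overlap.sub), na.rm = TRUE)

plot(Averageprec, type = "l", xlab = "Year", ylab = "Mean precipitation")
```

Note this is an unweighted mean over land cells; for a global lat/lon grid an area (cosine-of-latitude) weighting is usually preferable, as discussed elsewhere in this thread.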



### Re: Isolating only land areas on a global map for computing averages

Hi Tom and others,

I was able to use rasterToPolygons(RCP1pctCO2Mean) to convert my raster into polygons, which is now an object called "trans". However, does this command retain the original 138 layers/years of data, but just now in polygon form?

Also, I am not too clear as to how to generate id values for polygons as row and column indices from this new object. I was looking online for a procedure, but nothing too specific shows how to do this directly. Would it be possible to provide an example in this context?

Finally, I am unsure of what was meant by "get land polygon SpatialPolygons object "landPoly" with polygons of land only, not ocean." Do you mean the SpatialPolygonsDataframe that was created from rasterToPolygons?

Thank you, and I appreciate your help!

-----Original Message-----

From: Tom Philippi <[hidden email]>

To: rain1290 <[hidden email]>; tephilippi <[hidden email]>

Cc: r-sig-geo <[hidden email]>

Sent: Fri, Nov 8, 2019 11:04 pm

Subject: Re: [R-sig-Geo] Isolating only land areas on a global map for computing averages

Rain--Yes, it is possible to do it with your extant raster stack. In fact, pretty much all reasonable approaches will do that. Anything you do will create a raster layer with values at every one of your 8192 raster cells: some values will be 0 or NA.

The trivial answer if you only had a small range of latitude, and small raster cell sizes relative to your polygon, would be the raster::rasterize() function to generate a raster mask.

But, with a global raster from +90 to -90, any interpretable answer averaging over area needs to take into account the different area of a grid cell at 70 deg latitude vs 0 deg. See: https://gis.stackexchange.com/questions/29734/how-to-calculate-area-of-1-x-1-degree-cells-in-a-raster. At that point, you should go ahead and account for fractions of grid cells over land so your averaging would be over land area.

I'm reluctant to give you a complete code solution because, without a name, you may well be a student with this as an assignment. But my approach would be:

- create a SpatialPolygonsDataFrame object "gridPoly" from any of your raster layers via raster::rasterToPolygons()

- generate an id value for each of those polygons as row & column indices from the raster, or as the cell number

- get a land polygon SpatialPolygons object "landPoly" with polygons of land only, not ocean

- create a new SpatialPolygons object via rgeos::gIntersection(gridPoly, landPoly, byid = c(TRUE, FALSE))

- calculate the area of each polygon in that object via geosphere::areaPolygon() {rgeos::gArea() only works for projected CRS}

- create your mask/weights raster layer with either all NA or all 0

- either: parse the id values to row & column values, and use the triplets of row, column, and area to replace the corresponding NA or 0 values in that mask; or: if you used cell numbers, just use the cell numbers and area values in replacement in that mask

- create a second weight raster via raster::area() on one of your raster layers

- raster-multiply your polygon-area and your raster::area values to give the actual weights to use

This still is an approximation, but likely +/- 1-2%.
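A deliberately simplified sketch of the weighting idea above, using a whole-cell land mask rather than the fractional land areas Tom describes (so it is the cruder approximation). It assumes the poster's `RCP1pctCO2Mean` stack and a hypothetical land SpatialPolygons object `landPoly`:

```r
library(raster)

# 1 on cells whose centre falls on land, NA elsewhere
landMask <- mask(RCP1pctCO2Mean[[1]] * 0 + 1, landPoly)

# True cell area in km^2; raster::area() handles lonlat grids
cellArea <- area(RCP1pctCO2Mean[[1]])

# Combined weights: cell area over land, NA over ocean
w <- cellArea * landMask

# Area-weighted mean over land, one value per layer (year)
landMeans <- cellStats(RCP1pctCO2Mean * w, "sum", na.rm = TRUE) /
             cellStats(w, "sum", na.rm = TRUE)
```

This skips the polygon-intersection step entirely, so coastal cells are counted as all-land or all-ocean; for the ~1-2% accuracy Tom mentions, the fractional-area route is the one to take.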

If this is still complete gibberish to you, either I need more coffee or you need to consult a good reference on working with spatial data in general.

On Thu, Nov 7, 2019 at 8:25 AM <[hidden email]> wrote:

Hi Tom and others,

Thank you for your response and suggestions!

Yes, I loaded and used the maptools package previously to create continents on my world map, among other things. I do think that the easiest approach would be to create a raster layer for land, and then water, with the values that I have. However, my precipitation values are globally distributed - every grid cell has a precipitation value for each year (i.e. each grid cell has 138 values/layers/years). So, if I were to create a raster of only land areas, how would I have my grid cells coincide with the land areas only on that raster?

Also, would it be possible to accomplish this with the raster stack that I already have? If so, is there a way to separate all land/water areas this way using the maptools package?

Thanks, again, and I really appreciate your help!

-----Original Message-----

From: Tom Philippi <[hidden email]>

To: rain1290 <[hidden email]>

Cc: r-sig-geo <[hidden email]>

Sent: Thu, Nov 7, 2019 12:44 am

Subject: Re: [R-sig-Geo] Isolating only land areas on a global map for computing averages

The easiest approach would be to create a separate aligned raster layer for land vs water. There are plenty of coastline polygons available out there (e.g., maptools, rworldmap, rnaturalearth packages): choose one in your raster CRS (or choose one and spTransform() it). Then, use a grid version of your raster to extract values from that land/water SpatialPolygons object.

1: Your idea of extracting the land/water value at each grid cell centroid gives one estimate. Instead of TRUE/FALSE, think of the numeric equivalents 1, 0, then use those as weights for averaging across your grid cells.

2: A "better" estimate would be to compute the fraction of each grid cell that is land, then use those fractional [0, 1] values as weights to compute a weighted average of precipitation over land. At 2.8 deg grid cells, a lot of heavy-rainfall coastal areas would have the grid cell centroid offshore and be omitted by approach #1.

3: I recommend that you think hard about averaging across cells in lat/lon to estimate average precipitation over land. The actual area of a ~2.8 by 2.8 deg grid cell at the equator is much larger than the same at 70 deg N. I would spend the extra hour computing the actual area (in km^2) of land in each of your 8192 grid cells, then using those in a raster as weights for whatever calculations you do on the raster stack. [Or you can cheat, as the area of a grid cell in degrees is a function of only the latitudes, and your required weights are multiplicative.]

Your mileage may vary...

Tom

On Wed, Nov 6, 2019 at 6:18 PM rain1290--- via R-sig-Geo <[hidden email]> wrote:

Hi there,

I am interested in calculating precipitation medians globally. However, I only want to isolate land areas to compute the median. I already first created a raster stack, called "RCP1pctCO2median", which contains the median values. There are 138 layers, with each layer representing one year. This raster stack has the following attributes:

class : RasterStack

dimensions : 64, 128, 8192, 138 (nrow, ncol, ncell, nlayers)

resolution : 2.8125, 2.789327 (x, y)

extent : -181.4062, 178.5938, -89.25846, 89.25846 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7, layer.8, layer.9, layer.10, layer.11, layer.12, layer.13, layer.14, layer.15, ...

min values : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730, 0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 0.51857087, 0.41005131, 0.45956529, 0.47497867, ...

max values : 96.30350, 104.08584, 88.92751, 97.49373, 89.57201, 90.58570, 86.67651, 88.33519, 96.94720, 101.58247, 96.07792, 93.21948, 99.59785, 94.26218, 90.62138, ...

Previously, I was isolating a specific region by specifying a range of longitudes and latitudes to obtain the medians for that region, like this:

expansion1 <- expand.grid(103:120, 3:15)

lonlataaa <- extract(RCP1pctCO2Median, expansion1)

Columnaaa <- colMeans(lonlataaa)

However, with this approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the median meaningfully.

Therefore, with this data, would it be possible to use an IF/ELSE statement to tell R that if the "center point" of each grid cell happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE)? Even if a grid cell happens to have water mixed with land, but the center point of the grid is on land, that would be considered land. But can R even tell what is land or water in this case?

Thank you, and I would greatly appreciate any assistance!



### Computing the means of gridded data over land only using Raster Stack

Greetings,

I am interested in calculating precipitation averages globally. However, I only want to isolate land and/or oceanic areas to compute the mean of those separately. What I would like to do is somehow isolate those grid cells whose centers overlap with either land or ocean and then compute the annual mean. I already first created a raster stack, called "RCP1pctCO2Mean", which contains the mean values of interest. There are 138 layers, with each layer representing one year. This raster stack has the following attributes:

class : RasterStack

dimensions : 64, 128, 8192, 138 (nrow, ncol, ncell, nlayers)

resolution : 2.8125, 2.789327 (x, y)

extent : -181.4062, 178.5938, -89.25846, 89.25846 (xmin, xmax, ymin,

ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

names : layer.1, layer.2, layer.3, layer.4, layer.5,

layer.6, layer.7, layer.8, layer.9, layer.10, layer.11,

layer.12, layer.13, layer.14, layer.15, ...

min values : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730,

0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687,

0.51857087, 0.41005131, 0.45956529, 0.47497867, ...

max values : 96.30350, 104.08584, 88.92751, 97.49373, 89.57201,

90.58570, 86.67651, 88.33519, 96.94720, 101.58247, 96.07792,

93.21948, 99.59785, 94.26218, 90.62138, ...

Previously, I tried isolating a specific region by specifying a range of longitudes and latitudes to obtain the means and medians for that region, just like this:

>expansion1<-expand.grid(103:120, 3:15) #This is a range of longitudes and then latitudes

>lonlataaa<-extract(RCP1pctCO2Mean, expansion1)

>Columnaaa<-colMeans(lonlataaa)

# Packages loaded

library(raster)

library(maps)

library(maptools)

library(rasterVis)

However, with this approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the mean meaningfully.

Therefore, with this RasterStack, would it be possible to create a condition that tells R that if the "center point" or centroid of each grid cell (with each grid cell center representing a specific latitude/longitude coordinate) happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE, or maybe somehow use 0s or 1s)? Even if a grid cell happens to have water mixed with land, but the center point/centroid of the grid is on land, that would be considered as land. I would like to do this for specific countries, too.

I want the individual 138 layers/years to be retained, so that all the Year 1s can be averaged across all relevant grid cells, then all Year 2s, then all Year 3s, then all Year 4s, etc. (to create a time series later). I'm not sure if this is the correct way to do this, but what I did first was take the "RCP1pctCO2Mean" RasterStack and created a SpatialPolygonsDataframe using:

>trans <- raster::rasterToPolygons(RCP1pctCO2Mean)

>trans

class : SpatialPolygonsDataFrame

features : 8192

extent : -181.4062, 178.5938, -89.25846, 89.25846 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

variables : 138

names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7, layer.8, layer.9, layer.10, layer.11, layer.12, layer.13, layer.14, layer.15, ...

min values : 0.429645141288708, 0.433756527757047, 0.517493712584313, 0.508389826053265, 0.453667300300907, 0.530991463885754, 0.4975718597839, 0.456977516231847, 0.413821991744321, 0.460824014230889, 0.45516687008843, 0.518570869929649, 0.410051312472821, 0.459565291388527, 0.474978673081429, ...

max values : 96.3034965348338, 104.085840813594, 88.9275127089197, 97.4937326695693, 89.5720053000712, 90.5857030396531, 86.6765123781949, 88.3351859796546, 96.947199473011, 101.582468961459, 96.0779212204739, 93.2194802269814, 99.5978503841538, 94.2621847475029, 90.6213755054263, ...

Could generating an id value for each of those land (or water) polygons, such that only grid cells whose centers (i.e. latitude/longitude coordinates) fall on land are counted, be a logical next step? Is it possible to somehow isolate an all-land polygon, specify which cells are considered land, and then compute averages for each year across those grid cells? If so, there are a lot of cells to assign weights to individually, so is there a way to do this quickly?

Thank you, and I would greatly appreciate any assistance! I look forward to your response!
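The centroid-on-land condition described above can be had without converting the raster to polygons at all: `rasterize()` with a land polygon layer marks exactly the cells whose centres fall on land. A sketch, assuming the poster's `RCP1pctCO2Mean` stack and a land shapefile such as the Natural Earth `ne_110m_land.shp` used earlier in the archive:

```r
library(raster)
library(rgdal)

land <- readOGR("ne_110m_land.shp")  # any land (or country) polygon layer

# 1 where the cell centre is on land, NA elsewhere; all 138 layers retained
landMask <- rasterize(land, RCP1pctCO2Mean[[1]], field = 1)
overLand <- mask(RCP1pctCO2Mean, landMask)

# One mean per layer/year, over land cells only
annual <- cellStats(overLand, "mean")
```

Restricting `land` to a single country's polygon before rasterizing gives the per-country version; and, as noted in the replies, cell-area weighting is still advisable before averaging over such a wide latitude range.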


I am interested in calculating precipitation averages globally. However, I only want to isolate land and/or oceanic areas to compute the mean of those separately. What I would like to do is somehow isolate those grid cells whose centers overlap with either land or ocean and then compute the annual mean. I already first created a raster stack, called "RCP1pctCO2Mean", which contains the mean values of interest. There are 138 layers, with each layer representing one year. This raster stack has the following attributes:

class       : RasterStack

dimensions  : 64, 128, 8192, 138  (nrow, ncol, ncell, nlayers)

resolution  : 2.8125, 2.789327  (x, y)

extent      : -181.4062, 178.5938, -89.25846, 89.25846  (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

names       : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7, layer.8, layer.9, layer.10, layer.11, layer.12, layer.13, layer.14, layer.15, ...

min values  : 0.42964514, 0.43375653, 0.51749371, 0.50838983, 0.45366730, 0.53099146, 0.49757186, 0.45697752, 0.41382199, 0.46082401, 0.45516687, 0.51857087, 0.41005131, 0.45956529, 0.47497867, ...

max values  : 96.30350, 104.08584, 88.92751, 97.49373, 89.57201, 90.58570, 86.67651, 88.33519, 96.94720, 101.58247, 96.07792, 93.21948, 99.59785, 94.26218, 90.62138, ...

Previously, I tried isolating a specific region by specifying a range of longitudes and latitudes to obtain the means and medians for that region, just like this:

> # Packages loaded

> library(raster)

> library(maps)

> library(maptools)

> library(rasterVis)

> expansion1 <- expand.grid(103:120, 3:15)  # a range of longitudes, then latitudes

> lonlataaa <- extract(RCP1pctCO2Mean, expansion1)

> Columnaaa <- colMeans(lonlataaa)

However, with this approach, too much water can mix with land areas, and if I narrow the latitude/longitude range on land, I might miss too much land to compute the mean meaningfully.

Therefore, with this RasterStack, would it be possible to create a condition that tells R that if the "center point" or centroid of each grid cell (with each grid cell center representing a specific latitude/longitude coordinate) happens to fall on land, then it would be considered as land (i.e. that would be TRUE - if not, then FALSE, or maybe somehow use 0s or 1s)? Even if a grid cell happens to have water mixed with land, but the center point/centroid of the grid is on land, that would be considered as land. I would like to do this for specific countries, too.

I want the individual 138 layers/years to be retained, so that all the Year 1s can be averaged across all relevant grid cells, then all Year 2s, then all Year 3s, then all Year 4s, etc. (to create a time series later). I'm not sure if this is the correct way to do this, but what I did first was take the "RCP1pctCO2Mean" RasterStack and create a SpatialPolygonsDataFrame using:

> trans <- raster::rasterToPolygons(RCP1pctCO2Mean)

> trans

class : SpatialPolygonsDataFrame

features : 8192

extent : -181.4062, 178.5938, -89.25846, 89.25846 (xmin, xmax, ymin, ymax)

coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

variables : 138

names : layer.1, layer.2, layer.3, layer.4, layer.5, layer.6, layer.7, layer.8, layer.9, layer.10, layer.11, layer.12, layer.13, layer.14, layer.15, ...

min values : 0.429645141288708, 0.433756527757047, 0.517493712584313, 0.508389826053265, 0.453667300300907, 0.530991463885754, 0.4975718597839, 0.456977516231847, 0.413821991744321, 0.460824014230889, 0.45516687008843, 0.518570869929649, 0.410051312472821, 0.459565291388527, 0.474978673081429, ...

max values : 96.3034965348338, 104.085840813594, 88.9275127089197, 97.4937326695693, 89.5720053000712, 90.5857030396531, 86.6765123781949, 88.3351859796546, 96.947199473011, 101.582468961459, 96.0779212204739, 93.2194802269814, 99.5978503841538, 94.2621847475029, 90.6213755054263, ...

Could a logical next step be to generate an id value for each of those land (or water) polygons, so that only the grid cells whose centers (i.e. latitude/longitude coordinates) fall on land are counted? Is it possible to somehow isolate an all-land polygon, specify which cells are considered land, and then compute averages for each year across those grid cells? If so, there are a lot of cells to assign a weight to individually, so I am not sure whether there is a way to do this quickly.
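A minimal sketch of the centroid-on-land idea, assuming the "RCP1pctCO2Mean" stack from above and the coarse wrld_simpl world polygons shipped with maptools (which the poster already loads); object names other than these are illustrative:

```r
library(raster)
library(maptools)
data(wrld_simpl)

# rasterize() assigns a polygon value to a cell only when the cell CENTRE
# falls inside the polygon -- exactly the centroid-on-land rule described
# above. Land cells get 1, everything else NA.
land <- rasterize(wrld_simpl, RCP1pctCO2Mean[[1]], field = 1)

# NB: the stack's extent (-181.4 .. 178.6) is shifted about half a cell
# relative to wrld_simpl (-180 .. 180), so cells straddling the date line
# may need special handling.
land_ppt  <- mask(RCP1pctCO2Mean, land)
ocean_ppt <- mask(RCP1pctCO2Mean, land, inverse = TRUE)

# One mean per layer, i.e. one value per year -- ready for a time series.
annual_land_mean  <- cellStats(land_ppt,  stat = "mean", na.rm = TRUE)
annual_ocean_mean <- cellStats(ocean_ppt, stat = "mean", na.rm = TRUE)
```

For a single country, the same recipe should work after subsetting the polygons first, e.g. `wrld_simpl[wrld_simpl$NAME == "Brazil", ]`, before rasterizing.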

Thank you, and I would greatly appreciate any assistance! I look forward to your response!

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

### Re: sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

And this link explains the CDN proposal for grid distribution:

https://www.spatialys.com/en/crowdfunding/

Roger

On Wed, 13 Nov 2019, Roger Bivand wrote:

> Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings

> (representations of coordinate reference systems) are handled, steps are

> being taken to find ways to adapt sp/rgdal workflows. A current proposal is

> to store the WKT2_2018 string as a comment to CRS objects as defined in the

> sp package.

>

> A draft development-in-progress version of rgdal is available at

> https://r-forge.r-project.org/R/?group_id=884, and for sp at

> https://github.com/rsbivand/sp (this version of sp requires rgdal >= 1.5-1).

> This adds the WKT comments to CRS objects on reading vector and raster data

> sources, and uses WKT comments if found when writing vector and raster

> objects (or at least does as far as I've checked, possibly fragile).

>

> An RFC with tersely worked cases for using CRS object comments to carry WKT

> strings but maintaining full backward compatibility is online at

> http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.

>

> If you have other ideas or concerns about trying to use this mechanism for sp

> CRS objects, please contribute at your earliest convenience.

>

> http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the

> beginning of the next step, to query transformation operations to find viable

> coordinate operation pipelines.

>

> I'm assuming that the previous behaviour (transform without considering

> accuracy with whatever is to hand) is not viable going forward, and that we

> will need two steps: list coordinate operations between source and target CRS

> (using the WKT comments as better specifications than the PROJ strings),

> possibly intervene manually to install missing grids, then undertake the

> coordinate operation.

>

> The fallback may be simply to choose the least inaccurate available

> coordinate operation, but this should be a fallback. This means that all uses

> of spTransform() will require intervention.

>

> Is this OK (it is tiresome but modernises workflows once), or is it not OK

> (no user intervention is crucial)?

>

> These behaviours may be set in an option, so that package maintainers and

> users may delay modernisation, but all are undoubtedly served by rapid

> adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS

> development versions all state that they list candidate coordinate

> operations).

>

> We cannot ship all the grids, they are very bulky, and probably nobody needs

> sub-metre accuracy world-wide. Work in PROJ is starting to create a content

> delivery network for trusted download and mechanisms for registering

> downloaded grids on user platforms. We would for example not want Windows

> users of rgdal and sf to have to download the same grid twice.

>

> Comments welcome here and at https://github.com/r-spatial/discuss/issues/28

> or https://github.com/r-spatial/sf/issues/1187

>

> Roger

>

>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### sp/rgdal workflows with PROJ >= 6 and GDAL >= 3

Because PROJ >= 6 and GDAL >= 3 change the way that PROJ strings (representations of coordinate reference systems) are handled, steps are being taken to find ways to adapt sp/rgdal workflows. A current proposal is to store the WKT2_2018 string as a comment to CRS objects as defined in the sp package.

A draft development-in-progress version of rgdal is available at https://r-forge.r-project.org/R/?group_id=884, and for sp at https://github.com/rsbivand/sp (this version of sp requires rgdal >= 1.5-1). This adds the WKT comments to CRS objects on reading vector and raster data sources, and uses WKT comments if found when writing vector and raster objects (or at least does as far as I've checked, possibly fragile).

An RFC with tersely worked cases for using CRS object comments to carry WKT strings but maintaining full backward compatibility is online at http://rgdal.r-forge.r-project.org/articles/PROJ6_GDAL3.html.

If you have other ideas or concerns about trying to use this mechanism for sp CRS objects, please contribute at your earliest convenience.

http://rgdal.r-forge.r-project.org/reference/list_coordOps.html shows the beginning of the next step, to query transformation operations to find viable coordinate operation pipelines.

I'm assuming that the previous behaviour (transform without considering accuracy with whatever is to hand) is not viable going forward, and that we will need two steps: list coordinate operations between source and target CRS (using the WKT comments as better specifications than the PROJ strings), possibly intervene manually to install missing grids, then undertake the coordinate operation.
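A hedged sketch of what the proposed two-step workflow might look like with the development rgdal described above; list_coordOps() was in flux at the time, so the call shapes here are illustrative only:

```r
library(sp)
library(rgdal)  # development version from R-Forge, per the links above

# Step 1: list candidate coordinate operations between source and target
# CRS (the WKT comments give a fuller specification than PROJ strings).
# A missing datum grid (e.g. for EPSG:27700 -> EPSG:4326) would show up
# here with its accuracy, prompting manual installation.
ops <- list_coordOps("EPSG:27700", "EPSG:4326")
print(ops)

# Step 2: after installing any missing grid, transform as before;
# 'pts' stands for any Spatial* object carrying the source CRS.
pts_ll <- spTransform(pts, CRS(SRS_string = "EPSG:4326"))
```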

The fallback may be simply to choose the least inaccurate available coordinate operation, but this should be a fallback. This means that all uses of spTransform() will require intervention.

Is this OK (it is tiresome but modernises workflows once), or is it not OK (no user intervention is crucial)?

These behaviours may be set in an option, so that package maintainers and users may delay modernisation, but all are undoubtedly served by rapid adaptation (GRASS 7.8.1 released yesterday, libspatialite, pyproj, QGIS development versions all state that they list candidate coordinate operations).

We cannot ship all the grids, they are very bulky, and probably nobody needs sub-metre accuracy world-wide. Work in PROJ is starting to create a content delivery network for trusted download and mechanisms for registering downloaded grids on user platforms. We would for example not want Windows users of rgdal and sf to have to download the same grid twice.

Comments welcome here and at https://github.com/r-spatial/discuss/issues/28 or https://github.com/r-spatial/sf/issues/1187

Roger

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: Error in impacts() after lagsarlm (spdep package)

On Wed, 13 Nov 2019, Denys Dukhovnov via R-sig-Geo wrote:

> I posted this question on StackOverflow a while back but received no answer.

I maintain the spatialreg package to which I think you are referring. I never visit SO, both to save time and because the signal/noise ratio there is not encouraging (with exceptions where reproducible examples are used). While your post was plain text (thanks), what you copied from SO got rather mangled. The spdep package does warn you that the functions you are using are deprecated and are actually to be called from spatialreg.

>

> I have 9,150 polygons in my data set. I was trying to run a spatial

> autoregressive model (SAR) in spdep to test spatial dependence of my

> outcome variable. After running the model, I wanted to examine the

> direct/indirect impacts, but encountered an error that seems to have

> something to do with the length of neighbors in the weights matrix not

> being equal to n. I tried running the very same equation as SLX model

> (Spatial Lag X), and impacts() worked fine, even though there were some

> polygons in my set that had no neighbors.

>

>> # Defining queen contiguity neighbors for polyset and storing the

>> matrix as list

>> q.nbrs <- poly2nb(polyset)

>> listweights <- nb2listw(q.nbrs, zero.policy = TRUE)

>

>> # Defining the model

>> model.equation <- TIME ~ A + B + C

>

>> # Run SAR model reg <- lagsarlm(model.equation, data = polyset, listw =

>> listweights, zero.policy = TRUE)

You are completely unnecessarily using the "eigen" method. If your weights are symmetric, use Cholesky decomposition ("Matrix"): much faster, same output.

>

>> # Run impacts() to show direct/indirect impacts

>> impacts(reg, listw = listweights, zero.policy = TRUE)

>

>> Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, :

> length(listweights$neighbours) == n is not TRUE

>

Never, ever, do this. Did you read LeSage & Pace? Use traces, not the weights themselves. With the listw object, you need to invert an n x n matrix once in this case, 1+R times if you run Monte Carlo simulations.
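A minimal sketch of the traces approach Roger describes, reusing the poster's 'reg' and 'listweights' objects from the thread (their data are not shown, so this is not runnable as-is):

```r
library(spdep)       # poly2nb, nb2listw
library(spatialreg)  # lagsarlm, trW, impacts

# Coerce the listw object to a sparse matrix; no dense n x n work needed.
W <- as(listweights, "CsparseMatrix")

# Traces of powered weights matrices (LeSage & Pace), computed once and
# reusable across models fitted with the same weights.
trMat <- trW(W, type = "mult")

# impacts() with tr= avoids inverting (I - rho W) altogether; R= requests
# Monte Carlo draws for inference on the direct/indirect impacts.
impacts(reg, tr = trMat, R = 200)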

If you provide a reproducible example using built-in data, I can try to provide a more informative error message.

> What am I doing wrong? I am running Windows 10 machine with R 3.5.3 with

> the most up-to-date spdep package, if it helps.

>

R is at 3.6.1.

Hope this clarifies,

Roger

> Thank you very much.

>

> Regards,

> Denys Dukhovnov

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> --

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Error in impacts() after lagsarlm (spdep package)

I posted this question on StackOverflow a while back but received no answer.

I have 9,150 polygons in my data set. I was trying to run a spatial autoregressive model (SAR) in spdep to test spatial dependence of my outcome variable. After running the model, I wanted to examine the direct/indirect impacts, but encountered an error that seems to have something to do with the length of neighbors in the weights matrix not being equal to n. I tried running the very same equation as SLX model (Spatial Lag X), and impacts() worked fine, even though there were some polygons in my set that had no neighbors.

> # Defining queen contiguity neighbors for polyset and storing the matrix as list

> q.nbrs <- poly2nb(polyset)

> listweights <- nb2listw(q.nbrs, zero.policy = TRUE)

> # Defining the model

> model.equation <- TIME ~ A + B + C

> # Run SAR model

> reg <- lagsarlm(model.equation, data = polyset, listw = listweights, zero.policy = TRUE)

> # Run impacts() to show direct/indirect impacts

> impacts(reg, listw = listweights, zero.policy = TRUE)

Error in intImpacts(rho = rho, beta = beta, P = P, n = n, mu = mu, Sigma = Sigma, :
  length(listweights$neighbours) == n is not TRUE

What am I doing wrong? I am running a Windows 10 machine with R 3.5.3 and the most up-to-date spdep package, if that helps.

Thank you very much.

Regards,

Denys Dukhovnov

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo


### Re: A surface of GWR predicted values

Dear Binbin,

Yes, it helped a lot. Thanks for the clarification.

Best,

Fred.

Sent from Mail for Windows 10

From: [hidden email]

Sent: Tuesday, November 12, 2019 4:19 PM

To: [hidden email]; fred ramos

Cc: [hidden email]

Subject: Re: Re: [R-sig-Geo] A surface of GWR predicted values

Dear Fred,

It seems that you were using gwr.predict from the GWmodel package. Note that the condition for outputting predictions at specific locations is that observations of the corresponding explanatory variables are available there.

The predictions are output when you use the following routine:

gwr_test <- gwr.predict(job_density_log~distances+pop_density_log, data=zonas_OD_P_sp, kernel="gaussian", bw=21720)

because you were predicting the observations.

When you specified a point grid, NAs were returned. I assume that you didn't have any observed explanatory variables at those points, and that's why. Note that GWR is not an interpolation technique.
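A minimal sketch of Binbin's point: gwr.predict() can only predict where the explanatory variables exist, so the prediction grid must carry 'distances' and 'pop_density_log' itself. Object names follow the thread; how the grid's covariate columns are populated depends on data not shown here, so the placeholders below are deliberately left unfilled:

```r
library(GWmodel)

# grade_g_DF is the poster's prediction grid (a SpatialPointsDataFrame);
# it must gain the right-hand-side variables before prediction:
grade_g_DF$distances       <- ...  # e.g. distance of each grid point to the centre
grade_g_DF$pop_density_log <- ...  # e.g. density observed/interpolated at each point

gwr_grid <- gwr.predict(job_density_log ~ distances + pop_density_log,
                        data = zonas_OD_P_sp, predictdata = grade_g_DF,
                        kernel = "gaussian", bw = 21720)
```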

Hope it helps.

Cheers,

Binbin

> -----Original Message-----

> From: "Roger Bivand" <[hidden email]>

> Sent: 2019-11-12 20:59:17 (Tuesday)

> To: "Fred Ramos" <[hidden email]>

> Cc: "[hidden email]" <[hidden email]>

> Subject: Re: [R-sig-Geo] A surface of GWR predicted values

>

> Please do not post HTML, only plain text.

>

> You do need to say which package gwr.predict() comes from. Better, provide

> a reproducible example with a built-in data set showing your problem.

>

> Roger

>

> On Tue, 12 Nov 2019, Fred Ramos wrote:

>

> > Dear all,

> >

> > I'm trying to build a surface with predicted values using gwr.predict.

> >

> > When I run gwr.predict without giving fitting points it runs without problems. But when I supply a SpatialPointsDataFrame (a point grid) as fitting points, the prediction values return NA. Is there anything I could do to get these results at the fitting points?

> >

> >

> >> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

> >

> >> gwr_test$SDF

> > class : SpatialPointsDataFrame

> > features : 423

> > extent : 292782, 414055.3, 7349266, 7424404 (xmin, xmax, ymin, ymax)

> > crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> > variables : 5

> > names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> > min values : -0.958964210431336, -0.000128871984509238, 0.287006700717343, -3.26862056578432, 0.48618073025782

> > max values : 4.17712723602899, -1.8499577687439e-06, 0.979959974621219, 5.64311734285255, 0.693583031013022

> >

> >> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)

> >

> >

> >> gwr_out_grid_test$SDF

> > class : SpatialPointsDataFrame

> > features : 9075

> > extent : 293282, 413282, 7349904, 7423904 (xmin, xmax, ymin, ymax)

> > crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> > variables : 5

> > names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> > min values : -1.18701861474003, -0.000127242449368378, 0.310710138723114, NA, NA

> > max values : 4.04479455484814, 2.02812418949068e-06, 0.995539141570355, NA, NA

> >

> > Many thanks,

> > Fred.

> >

> >

> >

> >

> > Sent from Mail for Windows 10

> >

> >

> > [[alternative HTML version deleted]]

> >

> > _______________________________________________

> > R-sig-Geo mailing list

> > [hidden email]

> > https://stat.ethz.ch/mailman/listinfo/r-sig-geo

> >

>

> --

> Roger Bivand

> Department of Economics, Norwegian School of Economics,

> Helleveien 30, N-5045 Bergen, Norway.

> voice: +47 55 95 93 55; e-mail: [hidden email]

> https://orcid.org/0000-0003-2392-6140

> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>

> _______________________________________________

> R-sig-Geo mailing list

> [hidden email]

> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

[[alternative HTML version deleted]]

_______________________________________________

R-sig-Geo mailing list

[hidden email]

https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Yes, it helped a lot. Thanks for the clarification.

Best,

Fred.

Sent from Mail for Windows 10

From: [hidden email]

Sent: Tuesday, November 12, 2019 4:19 PM

To: [hidden email]; fred ramos

Cc: [hidden email]

Subject: Re: Re: [R-sig-Geo] A surface of GWR predicted values

Dear Fred,

It seems that you were using the gwr.predict from the GWmodel package.

Note that the condition of outputing predictions at specific locations is that observations of the corresponding exploratory variables are available.

The predictions are output when you use the following routine:

gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

because you were predicting the observations.

When you specified a point grid, NA were returned. I assumed that you didn't have any observed exploratory variables at them, and that's why. Note that GWR is not an interpolation technique.

Hope it helps.

Cheers,

Binbin

> -----Original Message-----

> From: "Roger Bivand" <[hidden email]>

> Sent: 2019-11-12 20:59:17 (Tuesday)

> To: "Fred Ramos" <[hidden email]>

> Cc: "[hidden email]" <[hidden email]>

> Subject: Re: [R-sig-Geo] A surface of GWR predicted values

>

> Please do not post HTML, only plain text.

>

> You do need to say which package gwr.predict() comes from. Better, provide

> a reproducible example with a built-in data set showing your problem.

>

> Roger

>

> On Tue, 12 Nov 2019, Fred Ramos wrote:

>

> > Dear all,

> >

> > I'm trying to build a surface of predicted values using gwr.predict().

> >

> > When I run gwr.predict() without supplying fitting points, it runs without problems. But when I pass a SpatialPointsDataFrame (a point grid) as the fitting points, the predicted values are all NA. Is there anything I can do to get predictions at the fitting points?

> >

> >

> >> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

> >

> >> gwr_test$SDF

> > class : SpatialPointsDataFrame

> > features : 423

> > extent : 292782, 414055.3, 7349266, 7424404 (xmin, xmax, ymin, ymax)

> > crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> > variables : 5

> > names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> > min values : -0.958964210431336, -0.000128871984509238, 0.287006700717343, -3.26862056578432, 0.48618073025782

> > max values : 4.17712723602899, -1.8499577687439e-06, 0.979959974621219, 5.64311734285255, 0.693583031013022

> >

> >> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)

> >

> >

> >> gwr_out_grid_test$SDF

> > class : SpatialPointsDataFrame

> > features : 9075

> > extent : 293282, 413282, 7349904, 7423904 (xmin, xmax, ymin, ymax)

> > crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> > variables : 5

> > names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> > min values : -1.18701861474003, -0.000127242449368378, 0.310710138723114, NA, NA

> > max values : 4.04479455484814, 2.02812418949068e-06, 0.995539141570355, NA, NA

> >

> > Many thanks,

> > Fred.

> >

> >

> >

> >


> >

> >


> >

>

> --

> Roger Bivand

> Department of Economics, Norwegian School of Economics,

> Helleveien 30, N-5045 Bergen, Norway.

> voice: +47 55 95 93 55; e-mail: [hidden email]

> https://orcid.org/0000-0003-2392-6140

> https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

>


### Re: A surface of GWR predicted values

Dear Fred,

It seems that you were using gwr.predict() from the GWmodel package.

Note that predictions can only be output at locations where observations of the corresponding explanatory variables are available.

Predictions are returned when you use the following call:

gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

because there you are predicting at the observation points themselves.

When you supplied a point grid, NAs were returned: I assume you have no observed explanatory variables at those grid points, and that is why. Note that GWR is not an interpolation technique.

Hope it helps.

Cheers,

Binbin
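[Editor's note] The point above can be made concrete with a hedged R sketch (untested): for gwr.predict() to return non-NA values at a prediction grid, the grid object must carry the same explanatory variables named in the formula. The covariate rasters dist_r and pop_r below are hypothetical, as is the structure of the grade_g_DF grid.

```r
## Sketch only: gwr.predict() needs the explanatory variables
## ("distances", "pop_density_log") present on the prediction points.
## dist_r and pop_r are hypothetical rasters of those covariates.
library(GWmodel)
library(raster)

# Attach covariate values to the grid points by sampling the rasters:
grade_g_DF$distances       <- extract(dist_r, grade_g_DF)
grade_g_DF$pop_density_log <- extract(pop_r,  grade_g_DF)

# Drop grid points where a covariate is missing, then predict:
ok <- complete.cases(grade_g_DF@data[, c("distances", "pop_density_log")])
gwr_grid <- gwr.predict(job_density_log ~ distances + pop_density_log,
                        data        = zonas_OD_P_sp,
                        predictdata = grade_g_DF[ok, ],
                        kernel      = "gaussian",
                        bw          = 21720)
```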



### Re: A surface of GWR predicted values

Please do not post HTML, only plain text.

You do need to say which package gwr.predict() comes from. Better, provide

a reproducible example with a built-in data set showing your problem.

Roger
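[Editor's note] A minimal reproducible example along the lines Roger suggests might use GWmodel's built-in London house-price data. This is a hedged, untested sketch; the londonhp object and the PURCHASE and FLOORSZ columns are assumed from the GWmodel documentation.

```r
library(GWmodel)

data(LondonHP)  # loads the londonhp SpatialPointsDataFrame

# Select a bandwidth, then predict at the observation points:
bw  <- bw.gwr(PURCHASE ~ FLOORSZ, data = londonhp, kernel = "gaussian")
fit <- gwr.predict(PURCHASE ~ FLOORSZ, data = londonhp,
                   kernel = "gaussian", bw = bw)
summary(fit$SDF$prediction)
```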

On Tue, 12 Nov 2019, Fred Ramos wrote:

> Dear all,

>

> I'm trying to build a surface of predicted values using gwr.predict().

>

> When I run gwr.predict() without supplying fitting points, it runs without problems. But when I pass a SpatialPointsDataFrame (a point grid) as the fitting points, the predicted values are all NA. Is there anything I can do to get predictions at the fitting points?

>

>

>> gwr_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,kernel="gaussian",bw=21720)

>

>> gwr_test$SDF

> class : SpatialPointsDataFrame

> features : 423

> extent : 292782, 414055.3, 7349266, 7424404 (xmin, xmax, ymin, ymax)

> crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> variables : 5

> names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> min values : -0.958964210431336, -0.000128871984509238, 0.287006700717343, -3.26862056578432, 0.48618073025782

> max values : 4.17712723602899, -1.8499577687439e-06, 0.979959974621219, 5.64311734285255, 0.693583031013022

>

>> gwr_out_grid_test <- gwr.predict(job_density_log~distances+pop_density_log,data=zonas_OD_P_sp,predictdata=grade_g_DF,kernel="gaussian",bw=21720)

>

>

>> gwr_out_grid_test$SDF

> class : SpatialPointsDataFrame

> features : 9075

> extent : 293282, 413282, 7349904, 7423904 (xmin, xmax, ymin, ymax)

> crs : +proj=utm +zone=23 +south +ellps=intl +units=m +no_defs

> variables : 5

> names : Intercept_coef, distances_coef, pop_density_log_coef, prediction, prediction_var

> min values : -1.18701861474003, -0.000127242449368378, 0.310710138723114, NA, NA

> max values : 4.04479455484814, 2.02812418949068e-06, 0.995539141570355, NA, NA

>

> Many thanks,

> Fred.

>

>

>

>


>

>


>

--

Roger Bivand

Department of Economics, Norwegian School of Economics,

Helleveien 30, N-5045 Bergen, Norway.

voice: +47 55 95 93 55; e-mail: [hidden email]

https://orcid.org/0000-0003-2392-6140

https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

