python - Excessive NaNs when resampling + interpolating in Pandas


I am trying to down-sample a time series in Pandas from 8 seconds to 10 seconds. For the purposes of this example, I've generated fake data that linearly increases with the number of seconds, over a minute. Importantly, for this example, the time intervals of the two time series are not multiples of each other.

When using .resample().interpolate() in Pandas, it seems unable to interpolate for the first few points, for which there is sufficient data. How can I work around it? Here's the example:

import numpy as np
import pandas as pd
import datetime

a = datetime.datetime(2025, 12, 2, 17, 39, 6)
interval8df = pd.DataFrame(np.linspace(60, 124, 9), columns=['Hi'], index=pd.date_range(a, periods=9, freq='8s'))
interval8df['Hi']

2025-12-02 17:39:06     60.0
2025-12-02 17:39:14     68.0
2025-12-02 17:39:22     76.0
2025-12-02 17:39:30     84.0
2025-12-02 17:39:38     92.0
2025-12-02 17:39:46    100.0
2025-12-02 17:39:54    108.0
2025-12-02 17:40:02    116.0
2025-12-02 17:40:10    124.0
Freq: 8s, Name: Hi, dtype: float64

When using resample interpolate, this is the result:

interval8df.resample('10s').interpolate(method='time')['Hi']

2025-12-02 17:39:00      NaN
2025-12-02 17:39:10      NaN
2025-12-02 17:39:20      NaN
2025-12-02 17:39:30     84.0
2025-12-02 17:39:40     94.0
2025-12-02 17:39:50    104.0
2025-12-02 17:40:00    114.0
2025-12-02 17:40:10    124.0
Freq: 10s, Name: Hi, dtype: float64

While I can understand 17:39:00 being NaN, 17:39:10 and 17:39:20 are each surrounded by points in the original time series (17:39:06 and 17:39:14, then 17:39:14 and 17:39:22 respectively). Why does this occur?

I've tried using mean instead, and that produces no NaNs:

interval8df.resample('10s').mean()['Hi']

2025-12-02 17:39:00     60.0
2025-12-02 17:39:10     68.0
2025-12-02 17:39:20     76.0
2025-12-02 17:39:30     88.0
2025-12-02 17:39:40    100.0
2025-12-02 17:39:50    108.0
2025-12-02 17:40:00    116.0
2025-12-02 17:40:10    124.0
Freq: 10s, Name: Hi, dtype: float64

Changing the interpolation method does not help either.

The workaround I've been using is up-sampling from 8 seconds to 1 second using interpolate, then down-sampling from 1 second to 10 seconds using the mean, which is obviously clunky. I would like to be able to do this directly in one step.
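
For reference, the two-step workaround described above might look like the sketch below. It rebuilds the example as a Series named `s` (a name chosen here for illustration) and uses `asfreq()` followed by `interpolate()` for the up-sampling step, which is one possible way to write it. Note that the final mean averages all 1-second values inside each 10-second bin, so the results are bin averages rather than values at the exact 10-second marks.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question, held in a Series (illustrative name `s`).
start = pd.Timestamp("2025-12-02 17:39:06")
s = pd.Series(
    np.linspace(60, 124, 9),
    index=pd.date_range(start, periods=9, freq="8s"),
    name="Hi",
)

# Step 1: up-sample to a 1-second grid and fill the gaps linearly in time.
per_second = s.resample("1s").asfreq().interpolate(method="time")

# Step 2: down-sample to 10 seconds by averaging the values in each bin.
workaround = per_second.resample("10s").mean()
print(workaround)
```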


asked Mar 29 at 21:59 by user30106177; edited Mar 30 at 15:56 by halfer

A suitable kludge may be to back-project a linear extrapolation of the first few samples so that pandas doesn't turn the first few data points of the actual series into NaN. Whether this is an acceptable solution depends on what your final use is. It is usually safer to use your raw data as it was sampled to infer whatever it is that you want to compute. Interpolation always introduces artefacts that might or might not matter. – Martin Brown, commented Mar 29 at 22:11
Unfortunately resample isn't quite as powerful as it sounds. If your data is sampled exactly every 8s and you want it resampled to 10s, it's probably easiest to (1) upsample (with resample) to 2s (the greatest common divisor of 8 and 10), (2) interpolate, then (3) resample down to 10s. This will avoid NaNs being produced where the timestamps don't align exactly with 10s intervals (as per Scott Boston's answer below). – Paul Wilson, commented Mar 30 at 6:58
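
The three steps from the second comment could be sketched roughly as follows. The names `s`, `fine`, `target` and `result` are illustrative, and picking out the 10-second timestamps with `reindex` is an assumption about how step (3) might be done, not something the comment specifies. The sketch only works because every original timestamp lies on the 2-second grid.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question, held in a Series (illustrative name `s`).
start = pd.Timestamp("2025-12-02 17:39:06")
s = pd.Series(
    np.linspace(60, 124, 9),
    index=pd.date_range(start, periods=9, freq="8s"),
    name="Hi",
)

# (1) Up-sample to 2s, the greatest common divisor of 8s and 10s.
#     Every original timestamp survives on this grid.
fine = s.resample("2s").asfreq()

# (2) Interpolate linearly in time between the original points.
fine = fine.interpolate(method="time")

# (3) Keep only the 10-second marks that lie inside the original time range.
target = pd.date_range(s.index[0].ceil("10s"), s.index[-1].floor("10s"), freq="10s")
result = fine.reindex(target)
print(result)
```

Because the 10-second marks are read directly off the interpolated 2-second grid, the values are point samples, not bin averages.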


1 Answer

To see what is happening, add asfreq after the resample; this shows what is passed to the next function in the chain:

interval8df.resample('10s').asfreq()

Output:

Hi
2025-12-02 17:39:00    NaN
2025-12-02 17:39:10    NaN
2025-12-02 17:39:20    NaN
2025-12-02 17:39:30   84.0
2025-12-02 17:39:40    NaN
2025-12-02 17:39:50    NaN
2025-12-02 17:40:00    NaN
2025-12-02 17:40:10  124.0

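Interpolation runs on this re-indexed frame, in which the original values at 17:39:06, 17:39:14 and 17:39:22 are no longer present. The rows at seconds 00, 10 and 20 therefore have no known value before them to serve as a lower bound, hence the NaNs. The mean, by contrast, does no interpolation: it simply averages the original values that fall inside each 10s window. Since every 10s window contains at least one value, a mean is returned for each.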
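
One way to keep the original points visible to the interpolation, offered here as a sketch and not as part of the answer above, is to reindex onto the union of the original timestamps and the 10-second grid, interpolate, and then select only the 10-second grid. The names `s`, `target`, `combined` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Same fake data as in the question, held in a Series (illustrative name `s`).
start = pd.Timestamp("2025-12-02 17:39:06")
s = pd.Series(
    np.linspace(60, 124, 9),
    index=pd.date_range(start, periods=9, freq="8s"),
    name="Hi",
)

# The 10-second grid we want values on (17:39:00 through 17:40:10).
target = pd.date_range(s.index[0].floor("10s"), s.index[-1].floor("10s"), freq="10s")

# Keep the original samples alongside the new grid points, then interpolate
# in time so each grid point uses its nearest original neighbours.
combined = s.reindex(s.index.union(target)).interpolate(method="time")

# Drop the original 8-second timestamps, leaving only the 10-second grid.
result = combined.reindex(target)
print(result)
```

With the default settings of `interpolate`, 17:39:00 stays NaN because it precedes the first sample, while 17:39:10 and 17:39:20 are filled from the surrounding 8-second points.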
