
I am working with a large and nested JSON dataset in Apache Spark and encountering a "max buffer size exceeded" exception during the writing process.

My Processing Steps (a code sketch follows the list):

  1. Read the JSON file.
  2. Explode nested structures.
  3. Filter out unnecessary data.
  4. Select relevant columns.
  5. Count the records.
  6. Write the final DataFrame.
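Roughly, the job looks like this (a simplified sketch only; the input/output paths, the exploded field items, and the column names are placeholders, not my real schema):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("nested-json-flatten").getOrCreate()

    # 1. Read the nested JSON file (multiLine in case each record spans several lines)
    df = spark.read.option("multiLine", "true").json("s3://bucket/input/")

    # 2. Explode a nested array into one row per element
    exploded = df.withColumn("item", F.explode("items"))

    # 3. Filter out unnecessary records
    filtered = exploded.filter(F.col("item.status") == "ACTIVE")

    # 4. Select only the relevant columns
    result = filtered.select("id", "item.name", "item.value")

    # 5. Count the records (an action, so it triggers the whole plan)
    print(result.count())

    # 6. Write the final DataFrame (another action, which triggers the plan again)
    result.write.mode("overwrite").parquet("s3://bucket/output/")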

Issue: During the count() or write() operations, Spark is recomputing all transformations from the beginning, leading to excessive memory usage and eventually the max buffer size exceeded error.

What I Tried:

Initially, I got a GC (Garbage Collection) error, so I increased the driver and executor memory (spark.driver.memory, spark.executor.memory).
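For reference, the memory increase was applied at submit time, roughly like this (the values and the script name are placeholders, not my actual settings):

    spark-submit \
      --conf spark.driver.memory=8g \
      --conf spark.executor.memory=16g \
      my_job.py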

Now, the GC error is gone, but I still get the "max buffer size exceeded" error during count() or write().

Spark seems to recompute all transformations during these actions, leading to excessive memory usage.

Questions:

  1. How can I prevent Spark from recomputing all transformations at the final stage?
  2. Is caching or checkpointing an effective solution here?
  3. Are there any specific configurations to handle this buffer size limitation?

Asked Mar 27 at 9:29 by Ram Shan
  • Filter first and select before exploding. If you can. – Ged Commented Mar 27 at 9:58
  • Cache? Appropriate level. – Ged Commented Mar 27 at 10:11

1 Answer


Spark is not recomputing in the final stage; it is doing lazy evaluation, i.e. it only performs the actual computation when it sees an action. So when it reaches the write action, it starts reading the data, transforms it, and writes it to the destination. This is what Spark usually does. Since you have not attached your actual code, I am assuming that is the case here. Cache would not help. This error comes up when a single column value turns out to be very large, so you need to check what exactly your explode is doing. Below is a thread on the same error which you can refer to; a couple of solutions are provided in the thread.
https://community.databricks.com/t5/data-engineering/bufferholder-exceeded-on-json-flattening/td-p/12873
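If you want to verify that a single exploded value is the culprit, you can measure serialized sizes. A rough sketch (assuming a PySpark DataFrame named exploded_df taken right after the explode; the column name item is illustrative):

    from pyspark.sql import functions as F

    # Serialized length of each full row as a JSON string
    sized = exploded_df.withColumn(
        "row_json_len", F.length(F.to_json(F.struct(*exploded_df.columns)))
    )

    # Largest row: an unusually big value here points at what is overflowing the buffer
    sized.agg(F.max("row_json_len").alias("max_row_json_len")).show()

    # Maximum serialized length of one suspect nested column
    exploded_df.agg(
        F.max(F.length(F.to_json(F.col("item")))).alias("max_item_len")
    ).show()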

Btw, you can attach code and insert the image in the question itself instead of linking to it; it will be easier to check.
