Far Harbor
Gork@sopuli.xyz to Programmer Humor@programming.dev · 23 hours ago

You can do anything at Zombocom

sopuli.xyz


  • yetAnotherUser@lemmy.ca · 22 hours ago · 20 upvotes

    Hey, you guys got any cool tips for website scraping?

    • MonkderVierte@lemmy.zip · 3 hours ago · 1 upvote

      Consider a free API first, if possible.

    • irelephant [he/him]@lemmy.dbzer0.com · 8 hours ago · 2 upvotes

      What do you want to scrape?

    • MalReynolds@piefed.social · 17 hours ago · 13 upvotes

      Beautiful Soup (Python library, bs4) is also fren

    • luciole (he/him)@beehaw.org · 21 hours ago · 33 upvotes

      They’re gonna tell you not to parse HTML with regular expressions. Heed this warning, and do it anyway.

      • yetAnotherUser@lemmy.ca · 13 hours ago · 3 upvotes, 1 downvote

        Thanks for your reply. What are your arguments in favour of parsing HTML with regex instead of using another method?

        • luciole (he/him)@beehaw.org · 2 hours ago · 1 upvote

          You basically have two options: treat HTML as a string, or parse it and then process it with higher-level DOM features.

          The problem with the second approach is that HTML may look like an XML dialect, but it is actually immensely quirky and tolerant. Moreover, the modern web page is crazy bloated, so mass-processing pages can be surprisingly demanding. And in the end you still need to write custom code to grab the data you’re after.

          On the other hand, string searching is as lightweight as it gets, and as a scraper you typically don’t need to care about document structure anyways.
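
          To make the two options concrete, here is a rough sketch that pulls link targets out of the same snippet both ways, using only the Python standard library. The sample markup and the names `SAMPLE_HTML`, `links_by_regex`, `LinkCollector`, and `links_by_parser` are made up for illustration and do not come from any particular scraper.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page, invented for this example.
SAMPLE_HTML = """
<html><body>
  <a href="/first">First</a>
  <A HREF='/second' class="x">Second</a>
  <!-- <a href="/commented-out">old link</a> -->
</body></html>
"""

# Option 1: treat the HTML as a plain string and search it.
# The pattern looks for an <a ...> tag and captures a quoted href value.
HREF_PATTERN = re.compile(
    r"""<a\b[^>]*?\bhref\s*=\s*["']([^"']*)["']""",
    re.IGNORECASE,
)


def links_by_regex(html: str) -> list[str]:
    """Return every href the pattern finds, with no notion of structure."""
    return HREF_PATTERN.findall(html)


# Option 2: parse the markup first, then react to the tags the parser reports.
class LinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_by_parser(html: str) -> list[str]:
    """Return hrefs from real <a> start tags only."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("regex: ", links_by_regex(SAMPLE_HTML))
    print("parser:", links_by_parser(SAMPLE_HTML))
```

          On this sample the regex also picks up the link inside the HTML comment, while the parser skips it. Whether that difference matters depends on what you are scraping; the string search is less code and does less work, and the parser is the one that knows what a comment is.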

        • lime!@feddit.nu · 12 hours ago · 7 upvotes

          it’s quick, it’s easy and it’s free

        • MonkderVierte@lemmy.zip · 12 hours ago · 6 upvotes

          Are you an LLM?

    • TropicalDingdong@lemmy.world · 22 hours ago · 25 upvotes

      Selenium is your fren
