{"id":2104,"date":"2024-01-18T03:59:14","date_gmt":"2024-01-18T11:59:14","guid":{"rendered":"https:\/\/gantovnik.com\/bio-tips\/?p=2104"},"modified":"2024-01-18T03:59:14","modified_gmt":"2024-01-18T11:59:14","slug":"411-clustering-using-dbscan-algorithm-in-sklearn-cluster-in-python","status":"publish","type":"post","link":"https:\/\/gantovnik.com\/bio-tips\/2024\/01\/411-clustering-using-dbscan-algorithm-in-sklearn-cluster-in-python\/","title":{"rendered":"#411 Clustering using DBSCAN algorithm in sklearn.cluster in python"},"content":{"rendered":"<p><a href=\"https:\/\/gantovnik.com\/bio-tips\/2024\/01\/411-clustering-using-dbscan-algorithm-in-sklearn-cluster-in-python\/ex411\/\" rel=\"attachment wp-att-2105\"><img data-recalc-dims=\"1\" decoding=\"async\" src=\"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex411.png?resize=515%2C256&#038;ssl=1\" alt=\"\" width=\"515\" height=\"256\" class=\"alignnone size-full wp-image-2105\" srcset=\"https:\/\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex411.png 515w, https:\/\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex411-480x239.png 480w\" sizes=\"(min-width: 0px) and (max-width: 480px) 480px, (min-width: 481px) 515px, 100vw\" \/><\/a><\/p>\n<p>DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points by local density. A point is a <em>core point<\/em> if at least <code>min_samples<\/code> points, including itself, lie within a radius <code>eps<\/code> of it. A cluster starts at a core point and grows iteratively: every point within <code>eps<\/code> of a core point in the cluster is added, and the expansion stops when no further core points can be reached. 
Points that are neither core points nor within <code>eps<\/code> of a core point are labeled as noise (label <code>-1<\/code> in scikit-learn). Unlike k-means, DBSCAN does not need the number of clusters in advance and does not force outliers into a cluster, so it copes much better than k-means with noisy data such as the uniform background in the example below.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport numpy as np\r\nimport matplotlib.pyplot as mpl\r\nfrom sklearn.cluster import DBSCAN\r\n# Fixing random state for reproducibility\r\nnp.random.seed(19680801)\r\n# Creating data: two Gaussian blobs of 100 and 50 points\r\nc1 = np.random.randn(100, 2) + 5\r\nc2 = np.random.randn(50, 2)\r\n# Creating a uniformly distributed background (the noise)\r\nu1 = np.random.uniform(low=-10, high=10, size=100)\r\nu2 = np.random.uniform(low=-10, high=10, size=100)\r\nc3 = np.column_stack(&#x5B;u1, u2])\r\n# Pooling all the data into one 250 x 2 array\r\ndata = np.vstack(&#x5B;c1, c2, c3])\r\n# Clustering with DBSCAN using the defaults eps=0.5 and\r\n# min_samples=5. Both parameters belong to the constructor,\r\n# not to fit(), e.g.\r\n# db = DBSCAN(eps=0.95, min_samples=10).fit(data)\r\ndb = DBSCAN().fit(data)\r\n# db.labels_ is an array with one cluster label per point.\r\nlabels = db.labels_\r\n# Retrieving coordinates for the points in each\r\n# identified cluster. For this data DBSCAN usually finds\r\n# two clusters, labeled 0 and 1, and the noise is labeled\r\n# as -1. 
Here we split the data\r\n# according to the cluster each point belongs to.\r\n# Any further clusters (labels 2 and higher) are not plotted.\r\ndbc1 = data&#x5B;labels == 0]\r\ndbc2 = data&#x5B;labels == 1]\r\nnoise = data&#x5B;labels == -1]\r\n# Setting up plot details\r\nx1, x2 = -12, 12\r\ny1, y2 = -12, 12\r\nfig = mpl.figure()\r\nfig.subplots_adjust(hspace=0.1, wspace=0.1)\r\n# Left panel: points colored by how they were generated\r\nax1 = fig.add_subplot(121, aspect='equal')\r\nax1.scatter(c1&#x5B;:,0], c1&#x5B;:,1], lw=0.1, color='#00CC00', marker=&quot;.&quot;)\r\nax1.scatter(c2&#x5B;:,0], c2&#x5B;:,1], lw=0.1, color='#028E9B', marker=&quot;.&quot;)\r\nax1.scatter(c3&#x5B;:,0], c3&#x5B;:,1], lw=0.1, color='#FF7800', marker=&quot;.&quot;)\r\nax1.xaxis.set_visible(False)\r\nax1.yaxis.set_visible(False)\r\nax1.set_xlim(x1, x2)\r\nax1.set_ylim(y1, y2)\r\nax1.text(-11, 10, 'Original')\r\n# Right panel: the same points colored by DBSCAN label\r\nax2 = fig.add_subplot(122, aspect='equal')\r\nax2.scatter(dbc1&#x5B;:,0], dbc1&#x5B;:,1], lw=0.1, color='#00CC00', marker=&quot;.&quot;)\r\nax2.scatter(dbc2&#x5B;:,0], dbc2&#x5B;:,1], lw=0.1, color='#028E9B', marker=&quot;.&quot;)\r\nax2.scatter(noise&#x5B;:,0], noise&#x5B;:,1], lw=0.1, color='#FF7800', marker=&quot;.&quot;)\r\nax2.xaxis.set_visible(False)\r\nax2.yaxis.set_visible(False)\r\nax2.set_xlim(x1, x2)\r\nax2.set_ylim(y1, y2)\r\nax2.text(-11, 10, 'DBSCAN identified')\r\nfig.savefig(&quot;ex411.png&quot;, dpi=100, bbox_inches='tight')\r\nfig.savefig('ex411.pdf', bbox_inches='tight')\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points by local density. A point is a core point if at least min_samples points, including itself, lie within a radius eps of it, and clusters grow iteratively from these core points. Points that belong to no dense region are labeled as noise, so DBSCAN copes much better than k-means with noisy data. 
import numpy [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"nf_dc_page":"","_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","_lmt_disableupdate":"yes","_lmt_disable":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[98,2,97],"tags":[],"class_list":["post-2104","post","type-post","status-publish","format-standard","hentry","category-cluster","category-python","category-sklearn"],"modified_by":"gantovnik","jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8bH0k-xW","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":2094,"url":"https:\/\/gantovnik.com\/bio-tips\/2024\/01\/409-rain-drops-animation-using-matplotlib-in-python\/","url_meta":{"origin":2104,"position":0},"title":"#409 Rain drops &#8211; animation using matplotlib in python","author":"gantovnik","date":"2024-01-14","format":false,"excerpt":"[code language=\"python\"] import matplotlib.pyplot as plt import numpy as np from matplotlib.animation import FuncAnimation # Fixing random state for reproducibility np.random.seed(19680801) # Create new Figure and an Axes which fills it. 
fig = plt.figure(figsize=(7, 7)) ax = fig.add_axes([0, 0, 1, 1], frameon=False) ax.set_xlim(0, 1), ax.set_xticks([]) ax.set_ylim(0, 1), ax.set_yticks([]) # Create\u2026","rel":"","context":"In &quot;animation&quot;","block_context":{"text":"animation","link":"https:\/\/gantovnik.com\/bio-tips\/category\/animation\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex409.gif?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex409.gif?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex409.gif?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2024\/01\/ex409.gif?resize=700%2C400&ssl=1 2x"},"classes":[]},{"id":926,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/06\/166-solution-of-a-differential-equation-using-bubnov-galerkin-method-with-sympy-package\/","url_meta":{"origin":2104,"position":1},"title":"#166 Solution of a differential equation using Bubnov-Galerkin method with Sympy package","author":"gantovnik","date":"2021-06-15","format":false,"excerpt":"#166 Solution of a differential equation using Bubnov-Galerkin method with Sympy package The problem and solution in this pdf file: ex166 [code language=\"python\"] import sympy from matplotlib import pyplot as plt import seaborn as sns import numpy as np from sympy.utilities.lambdify import lambdify from sympy import simplify from sympy import\u2026","rel":"","context":"In &quot;python&quot;","block_context":{"text":"python","link":"https:\/\/gantovnik.com\/bio-tips\/category\/python\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":449,"url":"https:\/\/gantovnik.com\/bio-tips\/2019\/08\/smoothing-the-noise-in-real-world-data\/","url_meta":{"origin":2104,"position":2},"title":"#65 Smoothing the Noise in Real-world 
Data","author":"gantovnik","date":"2019-08-22","format":false,"excerpt":"#Smoothing the noise in real-world data #This window rolls over the data and is used to compute the average over that window. import matplotlib.pyplot as plt import numpy as np import os os.chdir(r'D:\\projects\\wordpress\\ex65') def moving_average(interval, window_size): #Compute convoluted window for given size window = np.ones(int(window_size)) \/ float(window_size) return np.convolve(interval, window,\u2026","rel":"","context":"In &quot;python&quot;","block_context":{"text":"python","link":"https:\/\/gantovnik.com\/bio-tips\/category\/python\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=525%2C300&ssl=1 1.5x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=700%2C400&ssl=1 2x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=1050%2C600&ssl=1 3x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/08\/ex65.png?resize=1400%2C800&ssl=1 4x"},"classes":[]},{"id":1211,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/11\/210-parametric-curve-in-3d-2-2-2-2\/","url_meta":{"origin":2104,"position":3},"title":"#214 Animated 3D random walk","author":"gantovnik","date":"2021-11-28","format":false,"excerpt":"[code language=\"python\"] import numpy as np import matplotlib.pyplot as plt import matplotlib.animation as animation # Fixing random state for reproducibility np.random.seed(19680801) def random_walk(num_steps, max_step=0.05): \"\"\"Return a 3D random walk as (num_steps, 3) array.\"\"\" start_pos = np.random.random(3) steps = np.random.uniform(-max_step, 
max_step, size=(num_steps, 3)) walk = start_pos + np.cumsum(steps, axis=0) return walk\u2026","rel":"","context":"In &quot;python&quot;","block_context":{"text":"python","link":"https:\/\/gantovnik.com\/bio-tips\/category\/python\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2021\/11\/ex214.png?resize=350%2C200&ssl=1","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2021\/11\/ex214.png?resize=350%2C200&ssl=1 1x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2021\/11\/ex214.png?resize=525%2C300&ssl=1 1.5x"},"classes":[]},{"id":1208,"url":"https:\/\/gantovnik.com\/bio-tips\/2021\/11\/210-parametric-curve-in-3d-2-2-2\/","url_meta":{"origin":2104,"position":4},"title":"#213 Annotate Text Arrow","author":"gantovnik","date":"2021-11-27","format":false,"excerpt":"[code language=\"python\"] import numpy as np import matplotlib.pyplot as plt # Fixing random state for reproducibility np.random.seed(19680801) fig, ax = plt.subplots(figsize=(5, 5)) ax.set_aspect(1) x1 = -1 + np.random.randn(100) y1 = -1 + np.random.randn(100) x2 = 1. + np.random.randn(100) y2 = 1. 
+ np.random.randn(100) ax.scatter(x1, y1, color=\"r\") ax.scatter(x2, y2, color=\"g\") bbox_props\u2026","rel":"","context":"In &quot;python&quot;","block_context":{"text":"python","link":"https:\/\/gantovnik.com\/bio-tips\/category\/python\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2021\/11\/ex213.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]},{"id":139,"url":"https:\/\/gantovnik.com\/bio-tips\/2019\/01\/multidimensional-spline\/","url_meta":{"origin":2104,"position":5},"title":"Multidimensional Spline","author":"gantovnik","date":"2019-01-04","format":false,"excerpt":"import os import matplotlib.pyplot as plt import numpy as np from scipy import interpolate os.chdir(r'D:\\data\\scripts\\web1\\ex29') os.getcwd() np.random.seed(115925231) x = y = np.linspace(-1, 1, 100) X, Y = np.meshgrid(x, y) def f(x, y): return np.exp(-x**2 - y**2) * np.cos(4*x) * np.sin(6*y) Z = f(X, Y) N = 500 xdata = np.random.uniform(-1,\u2026","rel":"","context":"In &quot;python&quot;","block_context":{"text":"python","link":"https:\/\/gantovnik.com\/bio-tips\/category\/python\/"},"img":{"alt_text":"example29","src":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/01\/example29.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/01\/example29.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/gantovnik.com\/bio-tips\/wp-content\/uploads\/2019\/01\/example29.png?resize=525%2C300 
1.5x"},"classes":[]}],"_links":{"self":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts\/2104","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/comments?post=2104"}],"version-history":[{"count":2,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts\/2104\/revisions"}],"predecessor-version":[{"id":2107,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/posts\/2104\/revisions\/2107"}],"wp:attachment":[{"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/media?parent=2104"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/categories?post=2104"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/gantovnik.com\/bio-tips\/wp-json\/wp\/v2\/tags?post=2104"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}